Re: [sqlite] Multiple NATURAL JOIN precedence?

Pavel Ivanov Wed, 07 Oct 2009 06:29:06 -0700

> Q: When joining two tables, does SQLite choose the smaller one as the driving 
> table? If I understand things correctly, this yields a performance gain.


In fact this is not always true. For example let's say you're doing

SELECT * FROM t1 INNER JOIN t2 ON t1.x = t2.x ORDER BY t2.y

If t1 and t2 do not have any indexes then it will not matter much what
table is driving one - SQLite will have to make cartesian join anyway,
doing cnt_t1 * cnt_t2 iterations (if cnt_t1 is number of rows in t1
and cnt_t2 is number of rows in t2). If the only index in the system
is on t2.y then making t2 driving table will have benefit no matter
big it or small. If the only index is on t2.x then t1 is better to be
driving no matter how big it is, although if t2 is too small benefit
will be negligible. But the biggest benefit will be if there're 2
indexes - on t2.y and t1.x and if t2 is driving table. And it doesn't
matter again if t2 big or small.
But in case if you do another query:

SELECT * FROM t1 INNER JOIN t2 ON t1.x = t2.x WHERE t1.y = c1 AND t2.z = c2

And you have 2 indexes on t1.x and t2.x then indeed having smaller
table as driving table will be better.

All these are pretty simple theoretical examples but I believe they
all are implemented in SQLite optimizer.


Pavel

On Wed, Oct 7, 2009 at 9:08 AM, Kristoffer Danielsson
<kristoffer.daniels...@live.se> wrote:
>
> I see my problem now. You said it below: "cartesian product". I believe 
> "NATURAL JOIN" should be used with caution.
>
>
>
> I'll demonstrate my mistake with a small sample:
>
>
>
> CREATE TABLE t1 (x INTEGER PRIMARY KEY, a INTEGER NOT NULL);
>
> CREATE TABLE t2 (x INTEGER PRIMARY KEY, b INTEGER NOT NULL);
>
> CREATE TABLE t3 (y INTEGER PRIMARY KEY, c INTEGER NOT NULL);
>
> CREATE TABLE t4 (y INTEGER PRIMARY KEY, x INTEGER NOT NULL);
>
>
>
> SELECT COUNT(*) FROM t1 NATURAL JOIN t2 NATURAL JOIN t3 NATURAL JOIN t4; -- 
> wrong... and slow due to cartesian product!
>
> SELECT COUNT(*) FROM t1 NATURAL JOIN t2 NATURAL JOIN t4 NATURAL JOIN t3; -- 
> correct... and fast!
>
>
>
> So SQLite did the right thing here. I have one more question though:
>
>
>
> Q: When joining two tables, does SQLite choose the smaller one as the driving 
> table? If I understand things correctly, this yields a performance gain.
>
>
>
>
>
> Thanks.
>
>
>
>
>> Date: Tue, 6 Oct 2009 20:33:17 -0700
>> From: dar...@darrenduncan.net
>> To: sqlite-users@sqlite.org
>> Subject: Re: [sqlite] Multiple NATURAL JOIN precedence?
>>
>> Kristoffer Danielsson wrote:
>> > All my tables have unique column names, so I tend to use "NATURAL JOIN" for
>> > my joins (all join-columns are indexed). When I have multiple tables, I
>> > sometimes get incredibly slow queries (not sure if they even terminate).
>>
>> If all of your tables have unique column names, then a natural join would
>> degenerate to a cartesian product, because there are no column pairs across
>> different tables for which only rows having matching values are kept. A
>> cartesian product would almost always be slow regardless of your JOIN syntax.
>> I'll assume that you meant to say that all of your columns *except* the ones 
>> you
>> are joining on, have unique column names, which makes more sense. Correct me 
>> if
>> I'm wrong.
>>
>> > SELECT SUM(x) FROM t1 NATURAL JOIN t2 NATURAL JOIN t3 NATURAL JOIN t4 GROUP
>> > BY x ORDER BY x; -- takes forever, whereas:
>> >
>> > SELECT SUM(x) FROM t2 NATURAL JOIN t3 NATURAL JOIN t4 NATURAL JOIN t1 GROUP
>> > BY x ORDER BY x; -- takes one second
>>
>> Are all of those "x" supposed to be the same column?
>>
>> I don't think it makes sense to do all of [SUM(x), GROUP BY x, ORDER BY x] in
>> the same query.
>>
>> If the query is supposed to have exactly 1 output row, counting the number of
>> groups resulting from the GROUP BY, then the ORDER BY is useless, and makes 
>> the
>> query slower (unless a smart optimizer eliminates it from ever running).
>>
>> If the query is supposed to have an output row for each distinct value of x 
>> from
>> the GROUP BY, then SUM(x) would presumably be the same number as x for every 
>> row.
>>
>> Did you mean this?:
>>
>> SELECT foo, SUM(bar) FROM ... GROUP BY foo ORDER BY foo
>>
>> -- Darren Duncan
>> _______________________________________________
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
> _________________________________________________________________
> Windows Live: Gör det enklare för dina vänner att se vad du håller på med på 
> Facebook.
> http://www.microsoft.com/windows/windowslive/see-it-in-action/social-network-basics.aspx?ocid=PID23461::T:WLMTAGL:ON:WL:sv-se:SI_SB_2:092009
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Multiple NATURAL JOIN precedence?

Reply via email to