Re: [sqlite] Multiple NATURAL JOIN precedence?

Kristoffer Danielsson Wed, 07 Oct 2009 08:20:11 -0700
Thanks for your clarification, Pavel!
 
> From: paiva...@gmail.com
> Date: Wed, 7 Oct 2009 09:28:37 -0400
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] Multiple NATURAL JOIN precedence?
> 
> > Q: When joining two tables, does SQLite choose the smaller one as the 
> > driving table? If I understand things correctly, this yields a performance 
> > gain.
> 
> In fact this is not always true. For example let's say you're doing
> 
> SELECT * FROM t1 INNER JOIN t2 ON t1.x = t2.x ORDER BY t2.y
> 
> If t1 and t2 do not have any indexes then it will not matter much what
> table is driving one - SQLite will have to make cartesian join anyway,
> doing cnt_t1 * cnt_t2 iterations (if cnt_t1 is number of rows in t1
> and cnt_t2 is number of rows in t2). If the only index in the system
> is on t2.y then making t2 driving table will have benefit no matter
> big it or small. If the only index is on t2.x then t1 is better to be
> driving no matter how big it is, although if t2 is too small benefit
> will be negligible. But the biggest benefit will be if there're 2
> indexes - on t2.y and t1.x and if t2 is driving table. And it doesn't
> matter again if t2 big or small.
> But in case if you do another query:
> 
> SELECT * FROM t1 INNER JOIN t2 ON t1.x = t2.x WHERE t1.y = c1 AND t2.z = c2
> 
> And you have 2 indexes on t1.x and t2.x then indeed having smaller
> table as driving table will be better.
> 
> All these are pretty simple theoretical examples but I believe they
> all are implemented in SQLite optimizer.
> 
> 
> Pavel
> 
> On Wed, Oct 7, 2009 at 9:08 AM, Kristoffer Danielsson
> <kristoffer.daniels...@live.se> wrote:
> >
> > I see my problem now. You said it below: "cartesian product". I believe 
> > "NATURAL JOIN" should be used with caution.
> >
> >
> >
> > I'll demonstrate my mistake with a small sample:
> >
> >
> >
> > CREATE TABLE t1 (x INTEGER PRIMARY KEY, a INTEGER NOT NULL);
> >
> > CREATE TABLE t2 (x INTEGER PRIMARY KEY, b INTEGER NOT NULL);
> >
> > CREATE TABLE t3 (y INTEGER PRIMARY KEY, c INTEGER NOT NULL);
> >
> > CREATE TABLE t4 (y INTEGER PRIMARY KEY, x INTEGER NOT NULL);
> >
> >
> >
> > SELECT COUNT(*) FROM t1 NATURAL JOIN t2 NATURAL JOIN t3 NATURAL JOIN t4; -- 
> > wrong... and slow due to cartesian product!
> >
> > SELECT COUNT(*) FROM t1 NATURAL JOIN t2 NATURAL JOIN t4 NATURAL JOIN t3; -- 
> > correct... and fast!
> >
> >
> >
> > So SQLite did the right thing here. I have one more question though:
> >
> >
> >
> > Q: When joining two tables, does SQLite choose the smaller one as the 
> > driving table? If I understand things correctly, this yields a performance 
> > gain.
> >
> >
> >
> >
> >
> > Thanks.
> >
> >
> >
> >
> >> Date: Tue, 6 Oct 2009 20:33:17 -0700
> >> From: dar...@darrenduncan.net
> >> To: sqlite-users@sqlite.org
> >> Subject: Re: [sqlite] Multiple NATURAL JOIN precedence?
> >>
> >> Kristoffer Danielsson wrote:
> >> > All my tables have unique column names, so I tend to use "NATURAL JOIN" 
> >> > for
> >> > my joins (all join-columns are indexed). When I have multiple tables, I
> >> > sometimes get incredibly slow queries (not sure if they even terminate).
> >>
> >> If all of your tables have unique column names, then a natural join would
> >> degenerate to a cartesian product, because there are no column pairs across
> >> different tables for which only rows having matching values are kept. A
> >> cartesian product would almost always be slow regardless of your JOIN 
> >> syntax.
> >> I'll assume that you meant to say that all of your columns *except* the 
> >> ones you
> >> are joining on, have unique column names, which makes more sense. Correct 
> >> me if
> >> I'm wrong.
> >>
> >> > SELECT SUM(x) FROM t1 NATURAL JOIN t2 NATURAL JOIN t3 NATURAL JOIN t4 
> >> > GROUP
> >> > BY x ORDER BY x; -- takes forever, whereas:
> >> >
> >> > SELECT SUM(x) FROM t2 NATURAL JOIN t3 NATURAL JOIN t4 NATURAL JOIN t1 
> >> > GROUP
> >> > BY x ORDER BY x; -- takes one second
> >>
> >> Are all of those "x" supposed to be the same column?
> >>
> >> I don't think it makes sense to do all of [SUM(x), GROUP BY x, ORDER BY x] 
> >> in
> >> the same query.
> >>
> >> If the query is supposed to have exactly 1 output row, counting the number 
> >> of
> >> groups resulting from the GROUP BY, then the ORDER BY is useless, and 
> >> makes the
> >> query slower (unless a smart optimizer eliminates it from ever running).
> >>
> >> If the query is supposed to have an output row for each distinct value of 
> >> x from
> >> the GROUP BY, then SUM(x) would presumably be the same number as x for 
> >> every row.
> >>
> >> Did you mean this?:
> >>
> >> SELECT foo, SUM(bar) FROM ... GROUP BY foo ORDER BY foo
> >>
> >> -- Darren Duncan
> >> _______________________________________________
> >> sqlite-users mailing list
> >> sqlite-users@sqlite.org
> >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> >
> > _________________________________________________________________
> > Windows Live: Gör det enklare för dina vänner att se vad du håller på med 
> > på Facebook.
> > http://www.microsoft.com/windows/windowslive/see-it-in-action/social-network-basics.aspx?ocid=PID23461::T:WLMTAGL:ON:WL:sv-se:SI_SB_2:092009
> > _______________________________________________
> > sqlite-users mailing list
> > sqlite-users@sqlite.org
> > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> >
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
                                          
_________________________________________________________________
Hitta kärleken nu i vår!
http://dejting.se.msn.com/channel/index.aspx?trackingid=1002952
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Multiple NATURAL JOIN precedence?

Reply via email to