I see my problem now. You said it below: "cartesian product". I believe 
"NATURAL JOIN" should be used with caution.

 

I'll demonstrate my mistake with a small sample:

 

CREATE TABLE t1 (x INTEGER PRIMARY KEY, a INTEGER NOT NULL);

CREATE TABLE t2 (x INTEGER PRIMARY KEY, b INTEGER NOT NULL);

CREATE TABLE t3 (y INTEGER PRIMARY KEY, c INTEGER NOT NULL);

CREATE TABLE t4 (y INTEGER PRIMARY KEY, x INTEGER NOT NULL);

 

SELECT COUNT(*) FROM t1 NATURAL JOIN t2 NATURAL JOIN t3 NATURAL JOIN t4; -- 
wrong... and slow due to cartesian product!

SELECT COUNT(*) FROM t1 NATURAL JOIN t2 NATURAL JOIN t4 NATURAL JOIN t3; -- 
correct... and fast!

 

So SQLite did the right thing here. I have one more question though:

 

Q: When joining two tables, does SQLite choose the smaller one as the driving 
table? If I understand things correctly, this yields a performance gain.

 

 

Thanks.

 

 
> Date: Tue, 6 Oct 2009 20:33:17 -0700
> From: dar...@darrenduncan.net
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] Multiple NATURAL JOIN precedence?
> 
> Kristoffer Danielsson wrote:
> > All my tables have unique column names, so I tend to use "NATURAL JOIN" for
> > my joins (all join-columns are indexed). When I have multiple tables, I
> > sometimes get incredibly slow queries (not sure if they even terminate).
> 
> If all of your tables have unique column names, then a natural join would 
> degenerate to a cartesian product, because there are no column pairs across 
> different tables for which only rows having matching values are kept. A 
> cartesian product would almost always be slow regardless of your JOIN syntax. 
> I'll assume that you meant to say that all of your columns *except* the ones 
> you 
> are joining on, have unique column names, which makes more sense. Correct me 
> if 
> I'm wrong.
> 
> > SELECT SUM(x) FROM t1 NATURAL JOIN t2 NATURAL JOIN t3 NATURAL JOIN t4 GROUP
> > BY x ORDER BY x; -- takes forever, whereas:
> > 
> > SELECT SUM(x) FROM t2 NATURAL JOIN t3 NATURAL JOIN t4 NATURAL JOIN t1 GROUP
> > BY x ORDER BY x; -- takes one second
> 
> Are all of those "x" supposed to be the same column?
> 
> I don't think it makes sense to do all of [SUM(x), GROUP BY x, ORDER BY x] in 
> the same query.
> 
> If the query is supposed to have exactly 1 output row, counting the number of 
> groups resulting from the GROUP BY, then the ORDER BY is useless, and makes 
> the 
> query slower (unless a smart optimizer eliminates it from ever running).
> 
> If the query is supposed to have an output row for each distinct value of x 
> from 
> the GROUP BY, then SUM(x) would presumably be the same number as x for every 
> row.
> 
> Did you mean this?:
> 
> SELECT foo, SUM(bar) FROM ... GROUP BY foo ORDER BY foo
> 
> -- Darren Duncan
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
                                          
_________________________________________________________________
Windows Live: Gör det enklare för dina vänner att se vad du håller på med på 
Facebook.
http://www.microsoft.com/windows/windowslive/see-it-in-action/social-network-basics.aspx?ocid=PID23461::T:WLMTAGL:ON:WL:sv-se:SI_SB_2:092009
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to