I see my problem now. You said it below: "cartesian product". I believe "NATURAL JOIN" should be used with caution.
I'll demonstrate my mistake with a small sample: CREATE TABLE t1 (x INTEGER PRIMARY KEY, a INTEGER NOT NULL); CREATE TABLE t2 (x INTEGER PRIMARY KEY, b INTEGER NOT NULL); CREATE TABLE t3 (y INTEGER PRIMARY KEY, c INTEGER NOT NULL); CREATE TABLE t4 (y INTEGER PRIMARY KEY, x INTEGER NOT NULL); SELECT COUNT(*) FROM t1 NATURAL JOIN t2 NATURAL JOIN t3 NATURAL JOIN t4; -- wrong... and slow due to cartesian product! SELECT COUNT(*) FROM t1 NATURAL JOIN t2 NATURAL JOIN t4 NATURAL JOIN t3; -- correct... and fast! So SQLite did the right thing here. I have one more question though: Q: When joining two tables, does SQLite choose the smaller one as the driving table? If I understand things correctly, this yields a performance gain. Thanks. > Date: Tue, 6 Oct 2009 20:33:17 -0700 > From: dar...@darrenduncan.net > To: sqlite-users@sqlite.org > Subject: Re: [sqlite] Multiple NATURAL JOIN precedence? > > Kristoffer Danielsson wrote: > > All my tables have unique column names, so I tend to use "NATURAL JOIN" for > > my joins (all join-columns are indexed). When I have multiple tables, I > > sometimes get incredibly slow queries (not sure if they even terminate). > > If all of your tables have unique column names, then a natural join would > degenerate to a cartesian product, because there are no column pairs across > different tables for which only rows having matching values are kept. A > cartesian product would almost always be slow regardless of your JOIN syntax. > I'll assume that you meant to say that all of your columns *except* the ones > you > are joining on, have unique column names, which makes more sense. Correct me > if > I'm wrong. > > > SELECT SUM(x) FROM t1 NATURAL JOIN t2 NATURAL JOIN t3 NATURAL JOIN t4 GROUP > > BY x ORDER BY x; -- takes forever, whereas: > > > > SELECT SUM(x) FROM t2 NATURAL JOIN t3 NATURAL JOIN t4 NATURAL JOIN t1 GROUP > > BY x ORDER BY x; -- takes one second > > Are all of those "x" supposed to be the same column? > > I don't think it makes sense to do all of [SUM(x), GROUP BY x, ORDER BY x] in > the same query. > > If the query is supposed to have exactly 1 output row, counting the number of > groups resulting from the GROUP BY, then the ORDER BY is useless, and makes > the > query slower (unless a smart optimizer eliminates it from ever running). > > If the query is supposed to have an output row for each distinct value of x > from > the GROUP BY, then SUM(x) would presumably be the same number as x for every > row. > > Did you mean this?: > > SELECT foo, SUM(bar) FROM ... GROUP BY foo ORDER BY foo > > -- Darren Duncan > _______________________________________________ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users _________________________________________________________________ Windows Live: Gör det enklare för dina vänner att se vad du håller på med på Facebook. http://www.microsoft.com/windows/windowslive/see-it-in-action/social-network-basics.aspx?ocid=PID23461::T:WLMTAGL:ON:WL:sv-se:SI_SB_2:092009 _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users