Re: [sqlite] Automatic join reordering doesn't seem to work?

Ryan Johnson Mon, 23 Jan 2012 16:25:07 -0800

On 23/01/2012 3:09 PM, Ryan Johnson wrote:

On 23/01/2012 12:51 PM, Richard Hipp wrote:
On Mon, Jan 23, 2012 at 12:48 PM, Simon Slavin<[email protected]>wrote:
I don't know if Dr Hipp is pursuing this privately or expecting itto be
solved collaboratively on this list.
I don't have a test case to work on.
My database file is ~150MB, but was generated by the TPC-H datagenerator program. Assuming a linux-like environment (includingcygwin), the following will reproduce the setup in under five minutes:The problem persists with a freshly-generated database on my machine,using a just-compiled sqlite-3.7.10.

OK, it looks like I didn't install the 3.7.10 binary properly, becausethe situation is significantly different once ANALYZE has run.

The first query from the OP is remarkably similar to a query mentionedin where.c (from the 3.7.5 sources, which I happened to have handy):

Previous versions of SQLite [did not always find the lowest-cost plan]for scripts such as the following:
    **
    **   CREATE TABLE t1(a, b);
    **   CREATE TABLE t2(c, d);
    **   SELECT * FROM t2, t1 WHERE t2.rowid = t1.a;
    **
The best strategy is to iterate through table t1 first. However it isnot possible to determine this with a simple greedy algorithm. Sincethe cost of a linear scan through table t2 is the same as the cost ofa linear scan through table t1, a simple greedy algorithm may chooseto use t2 for the outer loop, which is a much costlier approach.


Re-casting my first query as the above gives:

sqlite> explain query plan select * from customer t2, orders t1 wheret2.rowid = t1.custkey;
0|0|0|SCAN TABLE customer AS t2 (~15000 rows)
0|1|1|SEARCH TABLE orders AS t1 USING INDEX OrderCustomers (custKey=?)(~15 rows)

The same query plan is chosen if t1 comes first; execution times andstats confirm the improvement.

For query 2, the situation is less clear. First of all, it appears toexecute more than 4x faster overall (10s and .2s -- WOW!) but theoptimizer still seems to choose the wrong plan (13.6M rows examined):

1|0|0|SCAN TABLE orders (~37500 rows)
1|0|0|USE TEMP B-TREE FOR DISTINCT
0|0|2|SCAN SUBQUERY 1 AS Y (~37500 rows)
0|1|0|SCAN TABLE orders AS O (~75000 rows)
0|2|1|SEARCH TABLE lineitem AS L USING INDEX LineItemOrders(orderKey=?) (~2 rows)

If I force the proper ordering, "X cross join Y", then the following,better-than-expected plan is used instead (only 300k rows examined!):

1|0|0|SCAN TABLE orders (~37500 rows)
1|0|0|USE TEMP B-TREE FOR DISTINCT
0|0|0|SCAN TABLE orders AS O (~75000 rows)
0|1|1|SEARCH TABLE lineitem AS L USING INDEX LineItemOrders(orderKey=?) (~5 rows)
0|2|2|SCAN SUBQUERY 1 AS Y (~18750 rows)

The main difference seems to be that Y should be the innermost loop butisn't.


Thoughts?
Ryan


_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Automatic join reordering doesn't seem to work?

Reply via email to