Nitpick:  A "bug" means it gets the wrong answer, which is not the case
here.  What you are reporting here is not a bug but an optimization
opportunity.

On Thu, Mar 14, 2013 at 2:07 PM, Ryan Johnson
<ryan.john...@cs.utoronto.ca>wrote:

> The offending query (slightly modified version of Q7:
>
> select count(*)
> from supplier, lineitem, orders, customer, nation n1, nation n2
> where s_suppkey = l_suppkey and o_orderkey = l_orderkey
>       and c_custkey = o_custkey and s_nationkey = n1.n_nationkey
>       and c_nationkey = n2.n_nationkey and (
>           (n1.n_name = 'ALGERIA' and n2.n_name = 'EGYPT')
>           or (n1.n_name = 'EGYPT' and n2.n_name = 'ALGERIA')
>       ) and l_shipdate between  '1995-01-01' and  '1996-12-31';
>
> The optimizer chooses this plan:
>
> 0|0|4|SCAN TABLE nation AS n1 (~25 rows)
> 0|1|5|SEARCH TABLE nation AS n2 USING INDEX nni (N_NAME=?) (~1 rows)
> 0|1|5|SEARCH TABLE nation AS n2 USING INDEX nni (N_NAME=?) (~1 rows)
> 0|2|0|SEARCH TABLE supplier USING INDEX snki (S_NATIONKEY=?) (~40 rows)
> 0|3|3|SEARCH TABLE customer USING INDEX cnki (C_NATIONKEY=?) (~600 rows)
> 0|4|2|SEARCH TABLE orders USING INDEX ocki (O_CUSTKEY=?) (~15 rows)
> 0|5|1|SEARCH TABLE lineitem USING INDEX lpki (L_ORDERKEY=?) (~2 rows)
>
> Dropping index nni and disabling automatic indexing improves things a bit
> (3.3 s, 50% speedup):
>
> 0|0|4|SCAN TABLE nation AS n1 (~25 rows)
> 0|1|0|SEARCH TABLE supplier USING INDEX snki (S_NATIONKEY=?) (~40 rows)
> 0|2|1|SEARCH TABLE lineitem USING INDEX lski (L_SUPPKEY=?) (~300 rows)
> 0|3|2|SEARCH TABLE orders USING INDEX opki (O_ORDERKEY=?) (~1 rows)
> 0|4|3|SEARCH TABLE customer USING INDEX cpki (C_CUSTKEY=?) (~1 rows)
> 0|5|5|SEARCH TABLE nation AS n2 USING INDEX npki (N_NATIONKEY=?) (~1 rows)
>

SQLite version 3.7.16 chooses the second plan regardless.  So that much has
already been addressed.


>
>
> Presumably it's slow because predicates don't really hit until after all
> the joins finish.
>

Sort of.  SQLite does evaluate predicates as soon as it can.  It doesn't
wait until after the inner join finishes to evaluate the predicates.  As
soon as all information needed for a predicate is available, it is
evaluated.

The problem is that SQLite does not do a lot of algebraic manipulation of
predicates to try to factor out terms that can be evaluated early.  So the
predicate:

      (n1.n_name='ALGERIA' and n2.n_name='EGYPT')
      OR (n1.n_name='EGYPT' and n2.n_name='ALGERIA')

is treated as a unit and cannot be evaluated until both n1 and n2 are
available.  If SQLite were to be enhanced to deal with this case, what it
would need to do is factor this into three separate conjuncts, as follows:

      (n1.n_name='ALGERIA' AND n2.n_name='EGYPT')
      OR (n1.n_name='EGYPT' and n2.n_name='ALGERIA')
   AND
      (n1.n_name='ALGERIA OR n1.n_name='EGYPT')
   AND
      (n2.n_name='EGYPT' OR n2.n_name='ALGERIA')

The second two are entirely redundant in the sense that if the first is
true then the second two are also true.  (SQLite has a method of marking
them so and making sure they are not evaluated if the first is evaluated.
Similar auxiliary conjuncts are used to help optimize LIKE, GLOB, and
BETWEEN operators)  But the second two conjuncts also depend on just a
single table, so they have the option of being evaluated early whereas the
first must wait until both tables have been evaluated.

If you augment the WHERE clause of your query by adding "AND
(n1.n_name='ALGERIA' OR n1.n_name='EGYPT')" you get the observed speedup.


-- 
D. Richard Hipp
d...@sqlite.org
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to