On Thu, Mar 27, 2014 at 7:59 AM, Richard Hipp <d...@sqlite.org> wrote:

>
> SQLite 3.8.* is not using the PRIMARY KEY to sort in this case because its
> cost function thinks that there is no performance advantage to using the
> PRIMARY KEY to sort rather than sorting in a separate step.  We might need
> to tweak the cost function a little to fix this in 3.8.5.  Investigating...
>
>
A simplified version of the problem is this:

   CREATE TABLE t1(a,b,c,d,e,f,PRIMARY KEY(a,b));
   SELECT * FROM t1
   WHERE a LIKE '%1%' AND c BETWEEN 1 AND 9999999 AND d=31
   ORDER BY a;

Suppose that there are N rows in the table, of which K rows make it through
the WHERE clause and are output. There are many ways to do this query, but
the query planner quickly settles down to a choice of two:

(1) Scan the original table from beginning to end.  Add each row that
matches the WHERE clause to a list.  Sort the list.  Then walk the list and
output rows one by one.  This approach uses time roughly equal to N + K +
KlogK.  The N is for the initial scan.  The KlogK is for the sort.  And the
K is for the final output pass.

(2) Scan the PRIMARY KEY index from beginning to end.  For each row, look
up the corresponding table row and evaluate the WHERE clause.  If the WHERE
clause is true, then output that row.  This approach requires roughly NlogN
time.  Scanning the PRIMARY KEY index requires N steps.  On each step, the
lookup of the corresponding table row requires logN time.
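One way to watch which of the two algorithms the planner picks is EXPLAIN QUERY PLAN. This is a small sketch using Python's built-in sqlite3 module against the simplified schema above; the exact plan text varies between SQLite versions (for algorithm (1) you would expect a table scan plus a "USE TEMP B-TREE FOR ORDER BY" step, for algorithm (2) a scan using the automatic PRIMARY KEY index), so the code just prints whatever the bundled SQLite reports:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1(a,b,c,d,e,f,PRIMARY KEY(a,b))")

# Ask the planner how it would execute the query.  The detail text
# differs between SQLite versions, so we only print and inspect it.
rows = con.execute("""
    EXPLAIN QUERY PLAN
    SELECT * FROM t1
    WHERE a LIKE '%1%' AND c BETWEEN 1 AND 9999999 AND d=31
    ORDER BY a
""").fetchall()
for row in rows:
    print(row)  # (id, parent, notused, detail) tuples
```

On a table with no ANALYZE data the plan you see here reflects the planner's default guesses, which is exactly the situation discussed in this thread.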

So the question comes down to whether N+K+KlogK is less than NlogN.  (There
are also constants of proportionality on each term that account for the
relative speeds of the various operations.)  This in turn becomes a
question of the size of K relative to N.  If the WHERE clause is very
selective - if very few rows of the original table end up being output -
then K will be small relative to N and algorithm (1) will be best.  If, on
the other hand, most or all of the table rows match the WHERE clause then K
will be very close in size to N and algorithm (2) is preferred.
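The trade-off above can be sketched numerically. This is a back-of-the-envelope illustration only, not the planner's actual cost model: the constants of proportionality mentioned above are assumed to be 1, and base-2 logarithms are assumed:

```python
import math

def cost_scan_then_sort(n, k):
    # Algorithm (1): full scan (N) + output pass (K) + sort (K log K)
    if k == 0:
        return n
    return n + k + k * math.log2(k)

def cost_index_scan(n):
    # Algorithm (2): N index steps, each with a log N table lookup
    return n * math.log2(n)

n = 1_000_000
for k in (100, 10_000, 1_000_000):
    better = "(1)" if cost_scan_then_sort(n, k) < cost_index_scan(n) else "(2)"
    print(f"K={k}: algorithm {better} is cheaper")
```

With these toy constants, algorithm (1) wins for small K and algorithm (2) takes over as K approaches N, matching the argument above.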

Unfortunately, there is no way for the query planner to know in advance how
big K will be.  It has to guess.  SQLite version 3.7.17 guesses a larger K
than SQLite 3.8.4.2, and so 3.7.17 picks algorithm (2) whereas 3.8.4.2
picks algorithm (1).

You can override this by giving the query planner hints using the
likelihood() SQL function.  Change the query to:

   SELECT * FROM t1
   WHERE likelihood(a LIKE '%1%' AND c BETWEEN 1 AND 9999999 AND d=31, 0.99)
   ORDER BY a;
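The hinted query can be exercised from Python's sqlite3 module, provided the bundled SQLite is 3.8.1 or later (when likelihood() was introduced). The sample rows below are made up purely for illustration; the point is that likelihood(X, P) evaluates to X, so the hint can never change the result set, only the plan:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1(a,b,c,d,e,f,PRIMARY KEY(a,b))")
# Hypothetical sample data: every 'a' contains a '1', d is always 31.
con.executemany("INSERT INTO t1 VALUES(?,?,?,?,?,?)",
                [(f"a{i}1", i, i, 31, 0, 0) for i in range(50)])

# likelihood(expr, 0.99) returns expr unchanged; the 0.99 only tells
# the planner the expression is almost always true.
hinted = con.execute("""
    SELECT * FROM t1
    WHERE likelihood(a LIKE '%1%' AND c BETWEEN 1 AND 9999999 AND d=31, 0.99)
    ORDER BY a
""").fetchall()
plain = con.execute("""
    SELECT * FROM t1
    WHERE a LIKE '%1%' AND c BETWEEN 1 AND 9999999 AND d=31
    ORDER BY a
""").fetchall()
print(hinted == plain)  # the hint affects the plan, never the rows
```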

That tells the query planner that the WHERE clause is probably
always true.  The query planner will then know that K is very close in size
to N and will choose algorithm (2), which is, I think, what you want.

I looked into tuning the query planner (by adjusting the constants of
proportionality on the various cost terms) so that your particular query
chooses algorithm (2) by default.  But after running some experiments, I
think I am going to leave them alone for now.  That decision might be
revisited in the future, though.


-- 
D. Richard Hipp
d...@sqlite.org
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users