On Thu, Mar 27, 2014 at 7:59 AM, Richard Hipp <d...@sqlite.org> wrote:
> SQLite 3.8.* is not using the PRIMARY KEY to sort in this case because
> its cost function thinks that there is no performance advantage to using
> the PRIMARY KEY to sort rather than sorting in a separate step. We might
> need to tweak the cost function a little to fix this in 3.8.5.
> Investigating...

A simplified version of the problem is this:

    CREATE TABLE t1(a,b,c,d,e,f,PRIMARY KEY(a,b));
    SELECT * FROM t1
     WHERE a LIKE '%1%' AND c BETWEEN 1 AND 9999999 AND d=31
     ORDER BY a;

Suppose that there are N rows in the table, of which K rows make it through the WHERE clause and are output. There are many ways to do this query, but the query planner quickly settles down to a choice of two:

(1) Scan the original table from beginning to end. Add each row that matches the WHERE clause to a list. Sort the list. Then walk the list and output rows one by one. This approach uses time roughly equal to N + K + KlogK. The N is for the initial scan, the KlogK is for the sort, and the K is for the final output pass.

(2) Scan the PRIMARY KEY index from beginning to end. For each index entry, look up the corresponding table row and evaluate the WHERE clause. If the WHERE clause is true, then output that row. This approach requires roughly NlogN time: scanning the PRIMARY KEY index requires N steps, and on each step the lookup of the corresponding table row requires logN time.

So the question comes down to whether N + K + KlogK is less than NlogN. (There are also constants of proportionality on each term that account for the relative speeds of the various operations.) This in turn becomes a question of the size of K relative to N. If the WHERE clause is very selective - if very few rows of the original table end up being output - then K will be small relative to N and algorithm (1) will be best. If, on the other hand, most or all of the table rows match the WHERE clause, then K will be very close in size to N and algorithm (2) is preferred.
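As an illustration (not part of the original message), the two cost formulas can be compared numerically. A minimal sketch in Python, assuming unit constants of proportionality on every term (real planners weight the terms differently):

```python
import math

def cost_scan_then_sort(n, k):
    # Algorithm (1): full table scan (N), sort the K matching rows
    # (KlogK), then a final output pass over those K rows.
    return n + k + (k * math.log2(k) if k > 1 else 0)

def cost_index_scan(n, k):
    # Algorithm (2): walk the PRIMARY KEY index (N steps), with a
    # logN table-row lookup at each step.  K does not appear because
    # every row is visited whether or not it survives the WHERE clause.
    return n * math.log2(n)

n = 1_000_000
for k in (100, 10_000, n):  # very selective ... not selective at all
    c1 = cost_scan_then_sort(n, k)
    c2 = cost_index_scan(n, k)
    winner = "(1) scan+sort" if c1 < c2 else "(2) index scan"
    print(f"K={k:>9}: cost1={c1:>12.0f}  cost2={c2:>12.0f}  -> {winner}")
```

With these toy constants, algorithm (1) wins easily for small K and only loses once K approaches N, which matches the selectivity argument above.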
Unfortunately, there is no way for the query planner to know in advance how big K will be. It has to guess. SQLite version 3.7.17 guesses a larger K than SQLite 3.8.4.2, and so 3.7.17 picks algorithm (2) whereas 3.8.4.2 picks algorithm (1).

You can override this by giving the query planner hints using the likelihood() SQL function. Change the query to:

    SELECT * FROM t1
     WHERE likelihood(a LIKE '%1%' AND c BETWEEN 1 AND 9999999 AND d=31, 0.99)
     ORDER BY a;

That will tell the query planner that the WHERE clause is probably always true. The query planner will then know that K is very close in size to N and will choose algorithm (2), which is, I think, what you want.

I looked into tuning the query planner (by adjusting the constants of proportionality on the various cost terms) so that your particular query chooses algorithm (2) by default. But after running some experiments, I think I am going to leave them alone for now. That decision might be revisited in the future, though.

-- 
D. Richard Hipp
d...@sqlite.org

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
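[Editor's note, not part of the thread: the likelihood() hint discussed above can be exercised from Python's stdlib sqlite3 module, which bundles SQLite (likelihood() requires SQLite 3.8.1 or later). The table contents below are invented toy data; which plan the optimizer actually picks still depends on the SQLite version, so this sketch only demonstrates that likelihood() is a pure planner hint: at run time it returns its first argument unchanged, so the hinted query yields exactly the same rows.]

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1(a,b,c,d,e,f,PRIMARY KEY(a,b))")
# Hypothetical toy rows: every a contains '1', d is always 31,
# and c cycles 0..99 so rows with c=0 fail the BETWEEN test.
con.executemany(
    "INSERT INTO t1 VALUES (?,?,?,?,?,?)",
    [(f"row{i}1", i, i % 100, 31, None, None) for i in range(500)],
)

plain = con.execute(
    "SELECT * FROM t1 "
    "WHERE a LIKE '%1%' AND c BETWEEN 1 AND 9999999 AND d=31 "
    "ORDER BY a"
).fetchall()

hinted = con.execute(
    "SELECT * FROM t1 "
    "WHERE likelihood(a LIKE '%1%' AND c BETWEEN 1 AND 9999999 AND d=31, 0.99) "
    "ORDER BY a"
).fetchall()

assert hinted == plain            # hint changes the plan, never the answer
assert hinted == sorted(hinted)   # rows arrive in ORDER BY a order
print(len(hinted), "rows")
```

The second argument to likelihood() must be a constant between 0.0 and 1.0; 0.99 here encodes "this WHERE clause almost always passes", which steers the planner toward the K-close-to-N assumption.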