Re: [sqlite] Hints for the query planner

Hick Gunter Wed, 11 Sep 2013 04:37:44 -0700

(6) maybe(EXPR)

-----Ursprüngliche Nachricht-----
Von: Richard Hipp [mailto:[email protected]]
Gesendet: Dienstag, 10. September 2013 21:27
An: General Discussion of SQLite Database
Betreff: [sqlite] Hints for the query planner


There is a survey question at the bottom of this message.  But first some 
context...

Over on the sqlite-dev mailing list, a debate has been going on about the best 
way to provide some useful hints to the query planner.  The query under 
discussion looks like this:

SELECT DISTINCT aname
  FROM album, composer, track
 WHERE cname LIKE '%bach%'
   AND composer.cid=track.cid
   AND album.aid=track.aid;

Assuming that the schema has appropriate indices and ANALYZE has been run, 
SQLite does a good job of selecting an efficient query plan for the above.
But the query planner lacks a key piece of information that could help it to do 
a better job.  In particular, the query planner does not know how often the 
subexpression "cname LIKE '%bach%'" will be true.  But, it turns out, the best 
query plan depends critically on this one fact.

By default, the query planner (in SQLite 3.8.0) assumes that a subexpression 
that cannot use an index will always be true.  Probably this will be tweaked in 
3.8.1 so that such subexpressions will be assumed to usually, but not always, 
be true.  Either way, it would be useful to be able to convey to the query 
planner the other extreme - that a subexpression is usually not true.

(Pedantic detail:  "not true" is not the same as "false" in SQL because NULL is 
neither true nor false.)

There is currently code in a branch that provides a hinting mechanism using a 
magic "unlikely()" function.  Subexpressions contained within "unlikely()" are 
assumed to usually not be true.  Other than this hint to the query planner, the 
unlikely() function is a complete no-op and optimized out of the VDBE code so 
that it does not consume any CPU cycles.
The only purpose of the unlikely() function is to let the query planner know 
that the subexpression contained in its argument is not commonly true.  So, if 
an application developer knows that the string "bach" seldom occurs in composer 
names, then she might rewrite the query like this:

SELECT DISTINCT aname
  FROM album, composer, track
 WHERE unlikely(cname LIKE '%bach%')
   AND composer.cid=track.cid
   AND album.aid=track.aid;

The query planner might use this "likelihood" hint to choose a different query 
plan that works better when the subexpression is commonly false.  Or it might 
decide that the original query plan was good enough and ignore the hint.  The 
query planner gets to make that decision.  The application developer is not 
telling the query planner what to do. The application developer has merely 
provided a small amount of meta-information about the likelihood of the 
subexpression being true, meta-information which the query planner may or may 
not use.

Note that the subexpression does not have to be a LIKE operator.
PostgreSQL, to name one example, estimates how often a LIKE operator will be 
true based on the pattern on its right-hand side, and adjust query plans 
accordingly, and some have argued for this sort of thing in SQLite.  But I want 
a more general solution.  Suppose the subexpression involves one or more calls 
to application-defined functions about which the query planner cannot possible 
know anything.  A general mechanism for letting the query planner know that 
subexpressions are commonly not true is what is desired - not a technique for 
making LIKE operators more efficient.

SURVEY QUESTION:

The question for today is what to call this magic hint function:

(1)  unlikely(EXPR)
(2)  selective(EXPR)
(3)  seldom(EXPR)
(4)  seldom_true(EXPR)
(5)  usually_not_true(EXPR)

Please feel free to suggest other names if you think of any.

ADDITIONAL INFORMATION:

The current implementation allows a second argument which must be a floating 
point constant between 0.0 and 1.0, inclusive. The second argument is an 
estimate of the probability that the expression in the first argument will be 
true.  The default is 0.05.  Names like "unlikely" or "seldom" work well when 
this probability is small, but if the second argument is close to 1.0, then 
those names seem backwards.  I don't know if this matters.  The optional second 
argument is not guaranteed to make it into an actually release.
--
D. Richard Hipp
[email protected]
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


--------------------------------------------------------------------------
 Gunter Hick
Software Engineer
Scientific Games International GmbH
Klitschgasse 2 – 4, A - 1130 Vienna, Austria
FN 157284 a, HG Wien
Tel: +43 1 80100 0
E-Mail: [email protected]

This e-mail is confidential and may well also be legally privileged. If you 
have received it in error, you are on notice as to its status and accordingly 
please notify us immediately by reply e-mail and then delete this message from 
your system. Please do not copy it or use it for any purposes, or disclose its 
contents to any person as to do so could be a breach of confidence. Thank you 
for your cooperation.
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Hints for the query planner

Reply via email to