Hi Richard,

What about "probability" or "likelyhood"? This works in both the case where the likelyhood is great as well as when it is low. From the list you provided, I would pick "unlikely".

Kind regards,

Philip Bennefall
----- Original Message ----- From: "Richard Hipp" <d...@sqlite.org>
To: "General Discussion of SQLite Database" <sqlite-users@sqlite.org>
Sent: Tuesday, September 10, 2013 9:26 PM
Subject: [sqlite] Hints for the query planner


There is a survey question at the bottom of this message.  But first some
context...

Over on the sqlite-dev mailing list, a debate has been going on about the
best way to provide some useful hints to the query planner.  The query
under discussion looks like this:

SELECT DISTINCT aname
 FROM album, composer, track
WHERE cname LIKE '%bach%'
  AND composer.cid=track.cid
  AND album.aid=track.aid;

Assuming that the schema has appropriate indices and ANALYZE has been run,
SQLite does a good job of selecting an efficient query plan for the above.
But the query planner lacks a key piece of information that could help it
to do a better job.  In particular, the query planner does not know how
often the subexpression "cname LIKE '%bach%'" will be true.  But, it turns
out, the best query plan depends critically on this one fact.

By default, the query planner (in SQLite 3.8.0) assumes that a
subexpression that cannot use an index will always be true.  Probably this
will be tweaked in 3.8.1 so that such subexpressions will be assumed to
usually, but not always, be true.  Either way, it would be useful to be
able to convey to the query planner the other extreme - that a
subexpression is usually not true.

(Pedantic detail:  "not true" is not the same as "false" in SQL because
NULL is neither true nor false.)

There is currently code in a branch that provides a hinting mechanism using
a magic "unlikely()" function.  Subexpressions contained within
"unlikely()" are assumed to usually not be true.  Other than this hint to
the query planner, the unlikely() function is a complete no-op and
optimized out of the VDBE code so that it does not consume any CPU cycles.
The only purpose of the unlikely() function is to let the query planner
know that the subexpression contained in its argument is not commonly
true.  So, if an application developer knows that the string "bach" seldom
occurs in composer names, then she might rewrite the query like this:

SELECT DISTINCT aname
 FROM album, composer, track
WHERE unlikely(cname LIKE '%bach%')
  AND composer.cid=track.cid
  AND album.aid=track.aid;

The query planner might use this "likelihood" hint to choose a different
query plan that works better when the subexpression is commonly false.  Or
it might decide that the original query plan was good enough and ignore the
hint.  The query planner gets to make that decision.  The application
developer is not telling the query planner what to do. The application
developer has merely provided a small amount of meta-information about the
likelihood of the subexpression being true, meta-information which the
query planner may or may not use.

Note that the subexpression does not have to be a LIKE operator.
PostgreSQL, to name one example, estimates how often a LIKE operator will
be true based on the pattern on its right-hand side, and adjust query plans
accordingly, and some have argued for this sort of thing in SQLite.  But I
want a more general solution.  Suppose the subexpression involves one or
more calls to application-defined functions about which the query planner
cannot possible know anything.  A general mechanism for letting the query
planner know that subexpressions are commonly not true is what is desired -
not a technique for making LIKE operators more efficient.

SURVEY QUESTION:

The question for today is what to call this magic hint function:

(1)  unlikely(EXPR)
(2)  selective(EXPR)
(3)  seldom(EXPR)
(4)  seldom_true(EXPR)
(5)  usually_not_true(EXPR)

Please feel free to suggest other names if you think of any.

ADDITIONAL INFORMATION:

The current implementation allows a second argument which must be a
floating point constant between 0.0 and 1.0, inclusive. The second argument
is an estimate of the probability that the expression in the first argument
will be true.  The default is 0.05.  Names like "unlikely" or "seldom" work
well when this probability is small, but if the second argument is close to
1.0, then those names seem backwards.  I don't know if this matters.  The
optional second argument is not guaranteed to make it into an actually
release.
--
D. Richard Hipp
d...@sqlite.org
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to