Re: [sqlite] Min/Max and skip-scan optimizations

Gerlando Falauto Tue, 05 Feb 2019 14:13:04 -0800

Hi Ryan,

first of all thank you for your patience and contribution.


[....]

>
> Add to that the fact that an SQLite TABLE is, in and of itself, nothing
> less than a covering Index with row_id as a key (or a custom key for
> WITHOUT ROWID tables), and as such it is a rather good Index and a
> mostly preferred Index by the query planner (because "using" any other
> index adds cycles plus an extra row_id lookup).


That's quite interesting indeed.
I actually started off with source1,source2,ts as the primary key and for
some reason (which I no longer remember) I thought it would be wise to use
a ROWID and add an index instead.
Please bear in mind this table is nothing more than a continuous log with a
two-columned source identifier and a timestamp.
The use case involves retaining as much data as the storage can possibly
hold (so a bunch of gigabytes).
I could've just used directories and logfiles instead of abusing a
relational database but I just thought it would more convenient to issue a
query and use a cursor.

Due to this, scanning
> the table is often more efficient than threading a lookup via another
> index into the query plan. Sometimes crafting a new temp BTree Index for
> (a) specific field(s) on a materialized set of data might also be judged
> faster than re-establishing links between said data and its original Index.
>

Do you think restoring the original primary key (instead of ROWID) and
dropping the index would make any difference?

>
> The method by which the query planner decides which other Index (if any)
> should be used involves a bit of game theory, typically looking at some
> ANALYZE result data along with with some tried and tested weights in the
> decision tree (which I'm not going into since A - It's not important,
> and B - I don't know enough of how SQLite does it). If the end score
> finds that there is no remarkable advantage to using a separate index,
> then it WILL opt to use the more-efficient table scan.
>

I pre-populated the table with a realistic use case scenario and ran
ANALYZE.
I'm not planning on using ANALYZE on the real system -- though I might
indeed pre-populate sqlite_stat1 with typical values as suggested in the
docs.

It might be that the adding of the "ORDER BY" simply pushes one such
> decision weight over the edge in this use case, and, once the table data
> evolved to be more complex or hefty, it may again turn to the Index.
>

> To add to another poster's comment: Do not second-guess the
> Query-planner, leave it to its devices. You may even be able to
> construct a scenario where the specific use case causes the QP to choose
> an execution path that is slightly slower than an alternate one, but if
> it is looked at in the general case, then other similar query scenarios
> might again be faster with that chosen path. Further to this, if you
> construct a weird query now to force a path of execution with some gain,
> you possibly prohibit it from capitalizing on an even better improvement
> that might be inherent to the next SQLite update (possibly thanks to
> your very own report here).
>
>
I could not agree more.



> If you can demonstrate a true degradation (one that slows down a
> significant time slice that trespasses on human-perceptible time) for a
> general query, an optimization will surely be considered, but this case,
> unless I've misunderstood the severity, does not seem to warrant that.
>

Yes, in the worst case, adding the ORDER BY clause (2 vs.1, 4 vs.3) leads
to a perceivable degradation in terms of both seek time (several seconds
vs. milliseconds to get the first row) and occupied disk space.

Keith's approach (5) seems to somehow mitigate the effect by only adding
some +33% execution time (40s vs. 30s) at the cost of a "weird" query.
A query which I would never have thought of by myself, and whose query plan
I don't quite understand.


> [PS: this is not a discouragement, it's great to hear of every possible
> quirk and make other users aware of a possible query scenario that might
> not be optimal - thanks for that, and I'm certain the devs would notice
> this, perhaps even get on fixing it right away, or maybe only keep it in
> the back of their minds for when the next round of query-planner
> refinement happens. I'm simply saying that there is possibly no
> satisfying answer to your question right now - best we can do is:
> "Sometimes the QP correctly evaluates the best path to be one that is
> not obviously best to us, or maybe even worse for a specific case, but
> typically better in the general case".]
>

As I already said, my use case *is* quite unusual. Definitely not something
you'd normally use a relational database for.
So I'm not surprised it's not the use case SQLite is optimized against.

Thank you everyone for contributing to this conversation!
Gerlando
_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Min/Max and skip-scan optimizations

Reply via email to