On 2019/02/06 12:12 AM, Gerlando Falauto wrote:

The use case involves retaining as much data as the storage can possibly
hold (so a bunch of gigabytes).
I could've just used directories and logfiles instead of abusing a
relational database, but I thought it would be more convenient to issue a
query and use a cursor.

There is nothing wrong with using a database to store a list of values; as another poster pointed out, [insert high made-up number]% of schemata out there are basically just that. But I like Keith's suggestion: since you did decide to DB it, why not make it nicely relational too?

the table is often more efficient than threading a lookup via another
index into the query plan. Sometimes crafting a new temp B-Tree index for
a specific field (or fields) on a materialized set of data might also be judged
faster than re-establishing links between said data and its original index.

Do you think restoring the original primary key (instead of ROWID) and
dropping the index would make any difference?

I do think it would make a difference (almost any change would), but I am not sure it would make all the difference. I would, however, suggest at the very least testing this and seeing.
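To make that concrete, here is a minimal sketch of the two layouts (the schema is entirely hypothetical, just to show the shape of the choice):

  -- Variant 1: the natural key *is* the table's B-Tree key, so there is
  -- no separate index to maintain or consult:
  CREATE TABLE samples(
    sensor  INTEGER,
    ts      INTEGER,            -- timestamp
    payload BLOB,
    PRIMARY KEY (sensor, ts)
  ) WITHOUT ROWID;

  -- Variant 2: an implicit-rowid table plus a secondary index; a query
  -- ordered by (sensor, ts) must hop from the index back into the table:
  CREATE TABLE samples2(
    id      INTEGER PRIMARY KEY,
    sensor  INTEGER,
    ts      INTEGER,
    payload BLOB
  );
  CREATE INDEX samples2_sensor_ts ON samples2(sensor, ts);

Whether the WITHOUT ROWID variant wins depends a lot on row size (large payloads bloat the table's B-Tree), which is exactly why testing on your own data is the only real answer.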



I pre-populated the table with a realistic use case scenario and ran
ANALYZE.
I'm not planning on using ANALYZE on the real system -- though I might
indeed pre-populate sqlite_stat1 with typical values as suggested in the
docs.

This is fine. I would ask: did your "test" data include a gigabyte or more of data? The amount, cardinality and shape of the data all matter greatly for ANALYZE to provide good information.
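For reference, a minimal sketch of seeding sqlite_stat1 by hand, per those docs (the table, index and stat values below are made up; the real numbers must come from a representative data set):

  -- sqlite_stat1 only exists once ANALYZE has run at least once:
  ANALYZE;
  -- stat is "<total rows> <avg rows per distinct value of the first
  -- indexed column> [<same for each longer column prefix> ...]":
  DELETE FROM sqlite_stat1 WHERE tbl = 'samples2';
  INSERT INTO sqlite_stat1(tbl, idx, stat)
    VALUES ('samples2', 'samples2_sensor_ts', '50000000 25000 10');
  -- Make the planner reload the modified statistics:
  ANALYZE sqlite_master;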

If you can demonstrate a true degradation //...
Yes, in the worst case, adding the ORDER BY clause (2 vs. 1, 4 vs. 3) leads
to a perceptible degradation in terms of both seek time (several seconds
vs. milliseconds to get the first row) and occupied disk space.

I must have missed this, apologies - that is certainly a very real degradation. Note, though, that the entire query delivery should be taken into consideration. The first row can often be delivered near-instantaneously, with the following rows taking progressively longer. Very often the time it takes to deliver the first row is compensated for by time saved along the subsequent rows. The QP takes this into consideration.

A good example is getting a set of data, say 100 rows, sorting it first, and then just spitting it out from memory. The preparation (i.e. first-row delivery) will take time; all the rest will be instant. Contrast that with a query that needs no sorting: it might produce rows as it scans the table, the first of which might appear instantly (since it sits at the top of the table and satisfies the WHERE clause), but each subsequent qualifying row might take a long while to produce depending on where it falls within the table. In the end, fully traversing the cursor may take a similar amount of time either way.
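You can watch the two shapes directly with EXPLAIN QUERY PLAN (reusing the hypothetical samples2 table from above; the exact output wording varies between SQLite versions):

  -- No index satisfies the ORDER BY, so SQLite sorts first: slow first
  -- row, then the rest stream straight out of the sorter.
  EXPLAIN QUERY PLAN
    SELECT * FROM samples2 WHERE sensor = 7 ORDER BY payload;
  -- ...SEARCH samples2 USING INDEX samples2_sensor_ts (sensor=?)
  -- ...USE TEMP B-TREE FOR ORDER BY

  -- No ORDER BY: rows are delivered as the search finds them, so the
  -- first row can appear immediately.
  EXPLAIN QUERY PLAN
    SELECT * FROM samples2 WHERE sensor = 7;
  -- ...SEARCH samples2 USING INDEX samples2_sensor_ts (sensor=?)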

The QP cannot know beforehand how many "hits" it will encounter, so it has to use a basic pre-made guide and/or help from the ANALYZE data to best-guess which route is better - and you can easily construct an unusual set of data for which it will choose wrong every time, and for which "fixing" it would negatively affect more common sets of data.
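As a made-up illustration of such a construction: with one heavily skewed column, equality on the popular value makes the index nearly useless while equality on any rare value makes it ideal, and a single plan cannot be right for both unless the statistics expose the skew:

  -- 100,000 rows; 99% share sensor = 1, the other 1% are unique values.
  CREATE TABLE skewed(sensor INTEGER, payload BLOB);
  CREATE INDEX skewed_sensor ON skewed(sensor);
  WITH RECURSIVE n(i) AS (
    SELECT 1 UNION ALL SELECT i + 1 FROM n WHERE i < 100000
  )
  INSERT INTO skewed
    SELECT CASE WHEN i % 100 = 0 THEN i ELSE 1 END, zeroblob(64) FROM n;
  -- 'WHERE sensor = 1' matches ~99% of the table, so a full scan beats
  -- the index; 'WHERE sensor = 4200' matches one row, so the index is
  -- perfect. Plain ANALYZE only records the *average* selectivity in
  -- sqlite_stat1; seeing the per-value skew needs SQLITE_ENABLE_STAT4.
  ANALYZE;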


As I already said, my use case *is* quite unusual. Definitely not something
you'd normally use a relational database for.

That does not matter - if the query planner can do better, it should - unless, of course, changing the decision tree would negatively affect another, more widely used query case. (This is the hard part to establish.)

