Hi, I see, thanks for the explanation. I still don't understand how the whole skip-scan optimization works though. My use-case involves a table pretty much like this one:
CREATE TABLE `rolling` ( `source1` TEXT NOT NULL, `source2` TEXT NOT NULL, `ts` INTEGER NOT NULL, `value` TEXT ); CREATE INDEX `sources` ON `rolling` ( `source1`, `source2`, `ts` ); INSERT INTO rolling WITH RECURSIVE src1( source1 ) AS ( VALUES("aaa") UNION ALL VALUES("bbb") ), src2( source2 ) AS ( VALUES("X1") UNION ALL VALUES("X2") UNION ALL VALUES("X3") UNION ALL VALUES("X4") ), cnt( ts, value) AS ( VALUES( 0, "ZZZZ") UNION ALL SELECT ts+1, value FROM cnt LIMIT 1000000) select src1.source1, src2.source2, cnt.* from src1, src2, cnt; So we have 2*4*1M = 8M rows, all indexed. The first two columns (source1, source2) just serve the purpose of identifying the data source and therefore have a very limited set of possible values. That's a good case for skip-scan, isn't it? Here's a session where I try to run some benchmarks (notice this is running on a Raspberry pi with USB storage). SQLite version 3.25.0 2018-09-15 04:01:47 Enter ".help" for usage hints. sqlite> .eqp on sqlite> .timer on sqlite> select distinct source1,source2 from rolling; QUERY PLAN `--SCAN TABLE rolling USING COVERING INDEX sources aaa|X1 aaa|X2 aaa|X3 aaa|X4 bbb|X1 bbb|X2 bbb|X3 bbb|X4 Run Time: real 6.767 user 6.200000 sys 0.390000 OK, so that's pretty slow. So I run analyze: sqlite> analyze; Run Time: real 12.572 user 10.890000 sys 1.400000 sqlite> select distinct source1,source2 from rolling; QUERY PLAN `--SCAN TABLE rolling USING COVERING INDEX sources aaa|X1 aaa|X2 aaa|X3 aaa|X4 bbb|X1 bbb|X2 bbb|X3 bbb|X4 Run Time: real 0.001 user 0.000000 sys 0.000000 sqlite> That's quite an improvement! There's two things I don't understand though: 1) Why that needs the DB to be ANALYZEd first. That's just the first 2 columns of an existing index. This optimization should always pay off, no matter how big the tables 2) Why the two query plans look identical (no indication of any skip-scan kicking in) Now I just want to get the first ts value for each (source1,source2) pair: sqlite> select distinct source1,source2,min(ts) from rolling group by source1, source2; QUERY PLAN |--SCAN TABLE rolling USING COVERING INDEX sources `--USE TEMP B-TREE FOR DISTINCT aaa|X1|0 aaa|X2|0 aaa|X3|0 aaa|X4|0 bbb|X1|0 bbb|X2|0 bbb|X3|0 bbb|X4|0 Run Time: real 11.394 user 10.650000 sys 0.630000 sqlite> Here I would expect to query to be as fast as the previous one. After all, ts is also part of the index, so min(ts) should just be the first value. OK, so perhaps it's the distinct clause? sqlite> select source1,source2,min(ts) from rolling group by source1, source2; QUERY PLAN `--SCAN TABLE rolling USING COVERING INDEX sources aaa|X1|0 aaa|X2|0 aaa|X3|0 aaa|X4|0 bbb|X1|0 bbb|X2|0 bbb|X3|0 bbb|X4|0 Run Time: real 10.760 user 10.360000 sys 0.330000 Not much of an improvement indeed. Perhaps ordering? sqlite> select source1,source2,min(ts) from rolling group by source1, source2 order by source1, source2; QUERY PLAN `--SCAN TABLE rolling USING COVERING INDEX sources aaa|X1|0 aaa|X2|0 aaa|X3|0 aaa|X4|0 bbb|X1|0 bbb|X2|0 bbb|X3|0 bbb|X4|0 Run Time: real 10.699 user 10.410000 sys 0.260000 Nope. What if I try some seek? sqlite> select source1,source2,min(ts) from rolling where ts>=999999 group by source1, source2; QUERY PLAN `--SEARCH TABLE rolling USING COVERING INDEX sources (ANY(source1) AND ANY(source2) AND ts>?) aaa|X1|999999 aaa|X2|999999 aaa|X3|999999 aaa|X4|999999 bbb|X1|999999 bbb|X2|999999 bbb|X3|999999 bbb|X4|999999 Run Time: real 0.001 user 0.000000 sys 0.000000 YES! That's skip-scan kicking in, and it's finally telling me with the ANY() clauses. However, I still can't seem to get the min(ts) query to be optimized. Perhaps I need some sub-query? Any ideas? Thank you! Gerlando On Thu, Jan 24, 2019 at 4:47 PM Richard Hipp <d...@sqlite.org> wrote: > On 1/24/19, Gerlando Falauto <gerlando.fala...@gmail.com> wrote: > > What is the purpose of ANALYZE sqlite_master; ? > > Causes the content of sqlite_stat1 to be reloaded into the query > planner after making out-of-band changes using UPDATE. > -- > D. Richard Hipp > d...@sqlite.org > _______________________________________________ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users