Hi,

I see, thanks for the explanation.
I still don't understand how the whole skip-scan optimization works though.
My use-case involves a table pretty much like this one:

CREATE TABLE `rolling` (
    `source1`    TEXT NOT NULL,
    `source2`    TEXT NOT NULL,
    `ts`    INTEGER NOT NULL,
    `value`    TEXT
);

CREATE INDEX `sources` ON `rolling` (
    `source1`,
    `source2`,
    `ts`
);

INSERT INTO rolling
    WITH RECURSIVE
      src1( source1 ) AS ( VALUES("aaa") UNION ALL VALUES("bbb") ),
      src2( source2 ) AS ( VALUES("X1") UNION ALL VALUES("X2") UNION ALL
VALUES("X3") UNION ALL VALUES("X4") ),
      cnt( ts, value) AS (
      VALUES( 0, "ZZZZ")
        UNION ALL
      SELECT ts+1, value FROM cnt LIMIT 1000000)

    select src1.source1, src2.source2, cnt.* from src1, src2, cnt;


So we have 2*4*1M = 8M rows, all indexed.
The first two columns (source1, source2) just serve the purpose of
identifying the data source and therefore have a very limited set of
possible values.
That's a good case for skip-scan, isn't it?

Here's a session where I try to run some benchmarks (notice this is running
on a Raspberry pi with USB storage).

SQLite version 3.25.0 2018-09-15 04:01:47
Enter ".help" for usage hints.
sqlite> .eqp on
sqlite> .timer on
sqlite> select distinct source1,source2 from rolling;
QUERY PLAN
`--SCAN TABLE rolling USING COVERING INDEX sources
aaa|X1
aaa|X2
aaa|X3
aaa|X4
bbb|X1
bbb|X2
bbb|X3
bbb|X4
Run Time: real 6.767 user 6.200000 sys 0.390000

OK, so that's pretty slow.
So I run analyze:

sqlite> analyze;
Run Time: real 12.572 user 10.890000 sys 1.400000
sqlite> select distinct source1,source2 from rolling;
QUERY PLAN
`--SCAN TABLE rolling USING COVERING INDEX sources
aaa|X1
aaa|X2
aaa|X3
aaa|X4
bbb|X1
bbb|X2
bbb|X3
bbb|X4
Run Time: real 0.001 user 0.000000 sys 0.000000
sqlite>

That's quite an improvement! There's two things I don't understand though:
1) Why that needs the DB to be ANALYZEd first. That's just the first 2
columns of an existing index.
This optimization should always pay off, no matter how big the tables
2) Why the two query plans look identical (no indication of any skip-scan
kicking in)

Now I just want to get the first ts value for each (source1,source2) pair:

sqlite> select distinct source1,source2,min(ts) from rolling group by
source1, source2;
QUERY PLAN
|--SCAN TABLE rolling USING COVERING INDEX sources
`--USE TEMP B-TREE FOR DISTINCT
aaa|X1|0
aaa|X2|0
aaa|X3|0
aaa|X4|0
bbb|X1|0
bbb|X2|0
bbb|X3|0
bbb|X4|0
Run Time: real 11.394 user 10.650000 sys 0.630000
sqlite>

Here I would expect to query to be as fast as the previous one. After all,
ts is also part of the index, so min(ts) should just be the first value.
OK, so perhaps it's the distinct clause?

sqlite> select source1,source2,min(ts) from rolling group by source1,
source2;
QUERY PLAN
`--SCAN TABLE rolling USING COVERING INDEX sources
aaa|X1|0
aaa|X2|0
aaa|X3|0
aaa|X4|0
bbb|X1|0
bbb|X2|0
bbb|X3|0
bbb|X4|0
Run Time: real 10.760 user 10.360000 sys 0.330000

Not much of an improvement indeed. Perhaps ordering?

sqlite> select source1,source2,min(ts) from rolling group by source1,
source2 order by source1, source2;
QUERY PLAN
`--SCAN TABLE rolling USING COVERING INDEX sources
aaa|X1|0
aaa|X2|0
aaa|X3|0
aaa|X4|0
bbb|X1|0
bbb|X2|0
bbb|X3|0
bbb|X4|0
Run Time: real 10.699 user 10.410000 sys 0.260000

Nope. What if I try some seek?

sqlite> select source1,source2,min(ts) from rolling where ts>=999999 group
by source1, source2;
QUERY PLAN
`--SEARCH TABLE rolling USING COVERING INDEX sources (ANY(source1) AND
ANY(source2) AND ts>?)
aaa|X1|999999
aaa|X2|999999
aaa|X3|999999
aaa|X4|999999
bbb|X1|999999
bbb|X2|999999
bbb|X3|999999
bbb|X4|999999
Run Time: real 0.001 user 0.000000 sys 0.000000

YES! That's skip-scan kicking in, and it's finally telling me with the
ANY() clauses.
However, I still can't seem to get the min(ts) query to be optimized.
Perhaps I need some sub-query?
Any ideas?

Thank you!
Gerlando

On Thu, Jan 24, 2019 at 4:47 PM Richard Hipp <d...@sqlite.org> wrote:

> On 1/24/19, Gerlando Falauto <gerlando.fala...@gmail.com> wrote:
> > What is the purpose of ANALYZE sqlite_master; ?
>
> Causes the content of sqlite_stat1 to be reloaded into the query
> planner after making out-of-band changes using UPDATE.
> --
> D. Richard Hipp
> d...@sqlite.org
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to