Re: [sqlite] Min/Max and skip-scan optimizations

Gerlando Falauto Sat, 02 Feb 2019 14:20:23 -0800

Hi,
it's me again, struggling with indices once again.
What I'm now trying to do is filter on source1 and by range of timestamp.
Results should be naturally ordered by source1, source2,ts.


1)

SELECT source1, source2, ts, value
FROM rolling
WHERE source1 = 'aaa'
AND ts > 1 AND ts < 10;

QUERY PLAN
`--SEARCH TABLE rolling USING INDEX sources (ANY(source1) AND ANY(source2)
AND ts>? AND ts<?)

This looks pretty OK on the sample dataset.
For some reason though this doesn't really work like that with the real
dataset (which has an extra index on ts only),
and that seems to kick in instead of the real index.
So the query plan comes out completely different, and rows are not
naturally sorted by source1, source2, as I'd like.
[I know I can use "+ts" and/or drop the "ts" index altogether, but I'm
trying to make a point here...]
So I add an extra ORDER BY clause:

2)

SELECT source1, source2, ts, value
FROM rolling
WHERE source1 = 'aaa'
  AND ts > 1 AND ts < 100000000
ORDER BY source1, source2, ts;

QUERY PLAN
|--SEARCH TABLE rolling USING INDEX sources (ANY(source1) AND ANY(source2)
AND ts>? AND ts<?)
`--USE TEMP B-TREE FOR RIGHT PART OF ORDER BY

Here I don't really understand why a TEMP B-TREE is required at all.
Apparently, this adds extra processing at the end so I don't start pulling
any rows until much later.
If the index is used, data would be *naturally* sorted by its key, so I
don't really understand.
So I recur to a subquery:

3)

SELECT source1, source2, ts, value
FROM rolling
WHERE ((source1, source2) in
        (SELECT DISTINCT source1, source2 from rolling where source1='aaa')
  AND ts > 1 AND ts < 10)

QUERY PLAN
|--SEARCH TABLE rolling USING INDEX sources (source1=? AND source2=? AND
ts>? AND ts<?)
`--LIST SUBQUERY
   `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=?)

This also seems to yield the right results, in the right order.
But the order looks pretty much like a coincidence to me.
So I'd rather explicitly state the sorting I'd like:

4)

SELECT source1, source2, ts, value
FROM rolling
WHERE ((source1, source2) in
        (SELECT DISTINCT source1, source2 from rolling where source1='aaa')
  AND ts > 1 AND ts < 10)
ORDER BY source1, source2, ts

QUERY PLAN
|--SEARCH TABLE rolling USING INDEX sources (source1=? AND source2=? AND
ts>? AND ts<?)
|--LIST SUBQUERY
|  `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=?)
`--USE TEMP B-TREE FOR ORDER BY

Here the b-tree is *NOT* for "RIGHT PART OF" ORDER BY, it seems like some
global sorting
which looks even worse than case 2.

What am I doing wrong? Is this expected?
I just don't seem to be able to get what would have been a pretty trivial
task with
last-century technologies (thinking of Paradox, dBase, CA-Clipper)...

Thanks in advance!
Gerlando


On Mon, Jan 28, 2019 at 9:11 AM Gerlando Falauto <[email protected]>
wrote:

> YES! Thank you!
> Many thanks for the ".eqp full" tip also, that really explains a lot
> (though I don't really understand any of it yet).
>
> Have a great day!
> Gerlando
>
>
> On Mon, Jan 28, 2019 at 6:50 AM Keith Medcalf <[email protected]> wrote:
>
>>
>> Do you perhaps want this:
>>
>> select source1,
>>        source2,
>>        (
>>         select min(ts)
>>           from rolling
>>          where source1 = x.source1
>>            and source2 = x.source2
>>        )
>>   from (
>>         select distinct source1,
>>                         source2
>>           from rolling
>>        ) as x;
>>
>> SQLite version 3.27.0 2019-01-28 00:42:06
>> Enter ".help" for usage hints.
>> Connected to a transient in-memory database.
>> Use ".open FILENAME" to reopen on a persistent database.
>> sqlite> .timer on
>> sqlite> CREATE TABLE `rolling` (
>>    ...>     `source1`    TEXT NOT NULL,
>>    ...>     `source2`    TEXT NOT NULL,
>>    ...>     `ts`    INTEGER NOT NULL,
>>    ...>     `value`    TEXT
>>    ...> );
>> Run Time: real 0.002 user 0.000000 sys 0.000000
>> sqlite>
>> sqlite> CREATE INDEX `sources` ON `rolling` (
>>    ...>     `source1`,
>>    ...>     `source2`,
>>    ...>     `ts`
>>    ...> );
>> Run Time: real 0.001 user 0.000000 sys 0.000000
>> sqlite>
>> sqlite> INSERT INTO rolling
>>    ...>     WITH RECURSIVE
>>    ...>       src1( source1 ) AS ( VALUES("aaa") UNION ALL VALUES("bbb")
>> ),
>>    ...>       src2( source2 ) AS ( VALUES("X1") UNION ALL VALUES("X2")
>> UNION ALL
>>    ...> VALUES("X3") UNION ALL VALUES("X4") ),
>>    ...>       cnt( ts, value) AS (
>>    ...>       VALUES( 0, "ZZZZ")
>>    ...>         UNION ALL
>>    ...>       SELECT ts+1, value FROM cnt LIMIT 1000000)
>>    ...>
>>    ...>     select src1.source1, src2.source2, cnt.* from src1, src2, cnt;
>> Run Time: real 8.920 user 8.843750 sys 0.078125
>> sqlite>
>> sqlite> analyze;
>> Run Time: real 1.285 user 1.281250 sys 0.000000
>> sqlite> .eqp full
>> sqlite>
>> sqlite> select source1,
>>    ...>        source2,
>>    ...>        (
>>    ...>         select min(ts)
>>    ...>           from rolling
>>    ...>          where source1 = x.source1
>>    ...>            and source2 = x.source2
>>    ...>        )
>>    ...>   from (
>>    ...>         select distinct source1,
>>    ...>                         source2
>>    ...>           from rolling
>>    ...>        ) as x;
>> QUERY PLAN
>> |--CO-ROUTINE 2
>> |  `--SCAN TABLE rolling USING COVERING INDEX sources (~7864320 rows)
>> |--SCAN SUBQUERY 2 AS x (~7864320 rows)
>> `--CORRELATED SCALAR SUBQUERY 1
>>    `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=? AND
>> source2=?) (~983040 rows)
>> addr  opcode         p1    p2    p3    p4             p5  comment
>> ----  -------------  ----  ----  ----  -------------  --  -------------
>> 0     Init           0     64    0                    00  Start at 64
>> 1     InitCoroutine  1     23    2                    00  x
>> 2     Null           1     4     0                    08  r[4]=NULL
>> 3     OpenRead       4     3     0     k(4,,,,)       00  root=3 iDb=0;
>> sources
>> 4     ColumnsUsed    4     0     0     3              00
>> 5     Explain        5     0     0     SCAN TABLE rolling USING COVERING
>> INDEX sources (~7864320 rows)  00
>> 6     Noop           0     0     0                    00  Begin
>> WHERE-loop0: rolling
>> 7     Rewind         4     21    2     0              00
>> 8         Noop           0     0     0                    00  Begin
>> WHERE-core
>> 9         Column         4     0     2                    00
>> r[2]=rolling.source1
>> 10        Column         4     1     3                    00
>> r[3]=rolling.source2
>> 11        Ne             2     13    4     (BINARY)       80  if
>> r[4]!=r[2] goto 13
>> 12        Eq             3     20    5     (BINARY)       80  if
>> r[5]==r[3] goto 20
>> 13        Copy           2     4     1                    00
>> r[4..5]=r[2..3]
>> 14        Yield          1     0     0                    00
>> 15        Noop           0     0     0                    00  End
>> WHERE-core
>> 16        Column         4     0     6                    00  r[6]=
>> 17        Column         4     1     7                    00  r[7]=
>> 18        SeekGT         4     21    6     2              00  key=r[6..7]
>> 19      Goto           1     8     0                    00
>> 20    Next           4     8     0                    01
>> 21    Noop           0     0     0                    00  End
>> WHERE-loop0: rolling
>> 22    EndCoroutine   1     0     0                    00
>> 23    Explain        23    0     0     SCAN SUBQUERY 2 AS x (~7864320
>> rows)  00
>> 24    Noop           0     0     0                    00  Begin
>> WHERE-loop0: x
>> 25    InitCoroutine  1     0     2                    00
>> 26      Yield          1     62    0                    00  next row of x
>> 27      Noop           0     0     0                    00  Begin
>> WHERE-core
>> 28      Copy           2     9     0                    00  r[9]=r[2];
>> x.source1
>> 29      Copy           3     10    0                    00  r[10]=r[3];
>> x.source2
>> 30      Null           0     12    12                   00
>> r[12..12]=NULL; Init subquery result
>> 31      Integer        1     13    0                    00  r[13]=1;
>> LIMIT counter
>> 32      Null           0     14    15                   00  r[14..15]=NULL
>> 33      OpenRead       5     3     0     k(4,,,,)       00  root=3 iDb=0;
>> sources
>> 34      ColumnsUsed    5     0     0     7              00
>> 35      Explain        35    0     0     SEARCH TABLE rolling USING
>> COVERING INDEX sources (source1=? AND source2=?) (~983040 rows)  00
>> 36      Noop           0     0     0                    00  Begin
>> WHERE-loop0: rolling
>> 37      Copy           2     16    0                    00  r[16]=r[2];
>> x.source1
>> 38      Copy           3     17    0                    00  r[17]=r[3];
>> x.source2
>> 39      CursorHint     5     0     0     AND(EQ(c0,r[16]),EQ(c1,r[17]))
>> 00
>> 40      Copy           2     18    0                    00  r[18]=r[2];
>> x.source1
>> 41      IsNull         18    54    0                    00  if
>> r[18]==NULL goto 54
>> 42      Copy           3     19    0                    00  r[19]=r[3];
>> x.source2
>> 43      IsNull         19    54    0                    00  if
>> r[19]==NULL goto 54
>> 44      Null           0     20    0                    00  r[20]=NULL
>> 45      SeekGT         5     54    18    3              00  key=r[18..20]
>> 46        IdxGT          5     54    18    2              00
>> key=r[18..19]
>> 47        Noop           0     0     0                    00  Begin
>> WHERE-core
>> 48        Column         5     2     21                   00
>> r[21]=rolling.ts
>> 49        CollSeq        0     0     0     (BINARY)       00
>> 50        AggStep        0     21    14    min(1)         01  accum=r[14]
>> step(r[21])
>> 51        Goto           0     55    0                    00  min() by
>> index
>> 52        Noop           0     0     0                    00  End
>> WHERE-core
>> 53      Next           5     46    0                    00
>> 54      Noop           0     0     0                    00  End
>> WHERE-loop0: rolling
>> 55      AggFinal       14    1     0     min(1)         00  accum=r[14]
>> N=1
>> 56      Copy           14    12    0                    00  r[12]=r[14]
>> 57      DecrJumpZero   13    58    0                    00  if
>> (--r[13])==0 goto 58
>> 58      Copy           12    11    0                    00  r[11]=r[12]
>> 59      ResultRow      9     3     0                    00
>> output=r[9..11]
>> 60      Noop           0     0     0                    00  End WHERE-core
>> 61    Goto           0     26    0                    00
>> 62    Noop           0     0     0                    00  End
>> WHERE-loop0: x
>> 63    Halt           0     0     0                    00
>> 64    Transaction    0     0     3     0              01
>> usesStmtJournal=0
>> 65    Goto           0     1     0                    00
>> aaa|X1|0
>> aaa|X2|0
>> aaa|X3|0
>> aaa|X4|0
>> bbb|X1|0
>> bbb|X2|0
>> bbb|X3|0
>> bbb|X4|0
>> Run Time: real 0.134 user 0.000000 sys 0.000000
>> sqlite>
>>
>> ---
>> The fact that there's a Highway to Hell but only a Stairway to Heaven
>> says a lot about anticipated traffic volume.
>>
>>
>>
>>
>> _______________________________________________
>> sqlite-users mailing list
>> [email protected]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>
_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Min/Max and skip-scan optimizations

Reply via email to