Re: [sqlite] Min/Max and skip-scan optimizations

Keith Medcalf Sat, 02 Feb 2019 15:29:22 -0800

Like this?

SELECT rolling.source1,
       rolling.source2,
       ts,
       value
  FROM (
        select distinct source1,
                        source2
          from rolling
         where source1 = 'aaa'
       ) as x
  JOIN rolling
    ON rolling.source1 = x.source1
   AND rolling.source2 = x.source2
 WHERE ts > 1
   AND ts < 10
ORDER BY 1,2,3;



---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.

>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>[email protected]] On Behalf Of Gerlando Falauto
>Sent: Saturday, 2 February, 2019 15:20
>To: SQLite mailing list
>Subject: Re: [sqlite] Min/Max and skip-scan optimizations
>
>Hi,
>it's me again, struggling with indices once again.
>What I'm now trying to do is filter on source1 and by range of
>timestamp.
>Results should be naturally ordered by source1, source2,ts.
>
>1)
>
>SELECT source1, source2, ts, value
>FROM rolling
>WHERE source1 = 'aaa'
>AND ts > 1 AND ts < 10;
>
>QUERY PLAN
>`--SEARCH TABLE rolling USING INDEX sources (ANY(source1) AND
>ANY(source2)
>AND ts>? AND ts<?)
>
>This looks pretty OK on the sample dataset.
>For some reason though this doesn't really work like that with the
>real
>dataset (which has an extra index on ts only),
>and that seems to kick in instead of the real index.
>So the query plan comes out completely different, and rows are not
>naturally sorted by source1, source2, as I'd like.
>[I know I can use "+ts" and/or drop the "ts" index altogether, but
>I'm
>trying to make a point here...]
>So I add an extra ORDER BY clause:
>
>2)
>
>SELECT source1, source2, ts, value
>FROM rolling
>WHERE source1 = 'aaa'
>  AND ts > 1 AND ts < 100000000
>ORDER BY source1, source2, ts;
>
>QUERY PLAN
>|--SEARCH TABLE rolling USING INDEX sources (ANY(source1) AND
>ANY(source2)
>AND ts>? AND ts<?)
>`--USE TEMP B-TREE FOR RIGHT PART OF ORDER BY
>
>Here I don't really understand why a TEMP B-TREE is required at all.
>Apparently, this adds extra processing at the end so I don't start
>pulling
>any rows until much later.
>If the index is used, data would be *naturally* sorted by its key, so
>I
>don't really understand.
>So I recur to a subquery:
>
>3)
>
>SELECT source1, source2, ts, value
>FROM rolling
>WHERE ((source1, source2) in
>        (SELECT DISTINCT source1, source2 from rolling where
>source1='aaa')
>  AND ts > 1 AND ts < 10)
>
>QUERY PLAN
>|--SEARCH TABLE rolling USING INDEX sources (source1=? AND source2=?
>AND
>ts>? AND ts<?)
>`--LIST SUBQUERY
>   `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=?)
>
>This also seems to yield the right results, in the right order.
>But the order looks pretty much like a coincidence to me.
>So I'd rather explicitly state the sorting I'd like:
>
>4)
>
>SELECT source1, source2, ts, value
>FROM rolling
>WHERE ((source1, source2) in
>        (SELECT DISTINCT source1, source2 from rolling where
>source1='aaa')
>  AND ts > 1 AND ts < 10)
>ORDER BY source1, source2, ts
>
>QUERY PLAN
>|--SEARCH TABLE rolling USING INDEX sources (source1=? AND source2=?
>AND
>ts>? AND ts<?)
>|--LIST SUBQUERY
>|  `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=?)
>`--USE TEMP B-TREE FOR ORDER BY
>
>Here the b-tree is *NOT* for "RIGHT PART OF" ORDER BY, it seems like
>some
>global sorting
>which looks even worse than case 2.
>
>What am I doing wrong? Is this expected?
>I just don't seem to be able to get what would have been a pretty
>trivial
>task with
>last-century technologies (thinking of Paradox, dBase, CA-Clipper)...
>
>Thanks in advance!
>Gerlando
>
>
>On Mon, Jan 28, 2019 at 9:11 AM Gerlando Falauto
><[email protected]>
>wrote:
>
>> YES! Thank you!
>> Many thanks for the ".eqp full" tip also, that really explains a
>lot
>> (though I don't really understand any of it yet).
>>
>> Have a great day!
>> Gerlando
>>
>>
>> On Mon, Jan 28, 2019 at 6:50 AM Keith Medcalf <[email protected]>
>wrote:
>>
>>>
>>> Do you perhaps want this:
>>>
>>> select source1,
>>>        source2,
>>>        (
>>>         select min(ts)
>>>           from rolling
>>>          where source1 = x.source1
>>>            and source2 = x.source2
>>>        )
>>>   from (
>>>         select distinct source1,
>>>                         source2
>>>           from rolling
>>>        ) as x;
>>>
>>> SQLite version 3.27.0 2019-01-28 00:42:06
>>> Enter ".help" for usage hints.
>>> Connected to a transient in-memory database.
>>> Use ".open FILENAME" to reopen on a persistent database.
>>> sqlite> .timer on
>>> sqlite> CREATE TABLE `rolling` (
>>>    ...>     `source1`    TEXT NOT NULL,
>>>    ...>     `source2`    TEXT NOT NULL,
>>>    ...>     `ts`    INTEGER NOT NULL,
>>>    ...>     `value`    TEXT
>>>    ...> );
>>> Run Time: real 0.002 user 0.000000 sys 0.000000
>>> sqlite>
>>> sqlite> CREATE INDEX `sources` ON `rolling` (
>>>    ...>     `source1`,
>>>    ...>     `source2`,
>>>    ...>     `ts`
>>>    ...> );
>>> Run Time: real 0.001 user 0.000000 sys 0.000000
>>> sqlite>
>>> sqlite> INSERT INTO rolling
>>>    ...>     WITH RECURSIVE
>>>    ...>       src1( source1 ) AS ( VALUES("aaa") UNION ALL
>VALUES("bbb")
>>> ),
>>>    ...>       src2( source2 ) AS ( VALUES("X1") UNION ALL
>VALUES("X2")
>>> UNION ALL
>>>    ...> VALUES("X3") UNION ALL VALUES("X4") ),
>>>    ...>       cnt( ts, value) AS (
>>>    ...>       VALUES( 0, "ZZZZ")
>>>    ...>         UNION ALL
>>>    ...>       SELECT ts+1, value FROM cnt LIMIT 1000000)
>>>    ...>
>>>    ...>     select src1.source1, src2.source2, cnt.* from src1,
>src2, cnt;
>>> Run Time: real 8.920 user 8.843750 sys 0.078125
>>> sqlite>
>>> sqlite> analyze;
>>> Run Time: real 1.285 user 1.281250 sys 0.000000
>>> sqlite> .eqp full
>>> sqlite>
>>> sqlite> select source1,
>>>    ...>        source2,
>>>    ...>        (
>>>    ...>         select min(ts)
>>>    ...>           from rolling
>>>    ...>          where source1 = x.source1
>>>    ...>            and source2 = x.source2
>>>    ...>        )
>>>    ...>   from (
>>>    ...>         select distinct source1,
>>>    ...>                         source2
>>>    ...>           from rolling
>>>    ...>        ) as x;
>>> QUERY PLAN
>>> |--CO-ROUTINE 2
>>> |  `--SCAN TABLE rolling USING COVERING INDEX sources (~7864320
>rows)
>>> |--SCAN SUBQUERY 2 AS x (~7864320 rows)
>>> `--CORRELATED SCALAR SUBQUERY 1
>>>    `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=?
>AND
>>> source2=?) (~983040 rows)
>>> addr  opcode         p1    p2    p3    p4             p5  comment
>>> ----  -------------  ----  ----  ----  -------------  --  --------
>-----
>>> 0     Init           0     64    0                    00  Start at
>64
>>> 1     InitCoroutine  1     23    2                    00  x
>>> 2     Null           1     4     0                    08
>r[4]=NULL
>>> 3     OpenRead       4     3     0     k(4,,,,)       00  root=3
>iDb=0;
>>> sources
>>> 4     ColumnsUsed    4     0     0     3              00
>>> 5     Explain        5     0     0     SCAN TABLE rolling USING
>COVERING
>>> INDEX sources (~7864320 rows)  00
>>> 6     Noop           0     0     0                    00  Begin
>>> WHERE-loop0: rolling
>>> 7     Rewind         4     21    2     0              00
>>> 8         Noop           0     0     0                    00
>Begin
>>> WHERE-core
>>> 9         Column         4     0     2                    00
>>> r[2]=rolling.source1
>>> 10        Column         4     1     3                    00
>>> r[3]=rolling.source2
>>> 11        Ne             2     13    4     (BINARY)       80  if
>>> r[4]!=r[2] goto 13
>>> 12        Eq             3     20    5     (BINARY)       80  if
>>> r[5]==r[3] goto 20
>>> 13        Copy           2     4     1                    00
>>> r[4..5]=r[2..3]
>>> 14        Yield          1     0     0                    00
>>> 15        Noop           0     0     0                    00  End
>>> WHERE-core
>>> 16        Column         4     0     6                    00
>r[6]=
>>> 17        Column         4     1     7                    00
>r[7]=
>>> 18        SeekGT         4     21    6     2              00
>key=r[6..7]
>>> 19      Goto           1     8     0                    00
>>> 20    Next           4     8     0                    01
>>> 21    Noop           0     0     0                    00  End
>>> WHERE-loop0: rolling
>>> 22    EndCoroutine   1     0     0                    00
>>> 23    Explain        23    0     0     SCAN SUBQUERY 2 AS x
>(~7864320
>>> rows)  00
>>> 24    Noop           0     0     0                    00  Begin
>>> WHERE-loop0: x
>>> 25    InitCoroutine  1     0     2                    00
>>> 26      Yield          1     62    0                    00  next
>row of x
>>> 27      Noop           0     0     0                    00  Begin
>>> WHERE-core
>>> 28      Copy           2     9     0                    00
>r[9]=r[2];
>>> x.source1
>>> 29      Copy           3     10    0                    00
>r[10]=r[3];
>>> x.source2
>>> 30      Null           0     12    12                   00
>>> r[12..12]=NULL; Init subquery result
>>> 31      Integer        1     13    0                    00
>r[13]=1;
>>> LIMIT counter
>>> 32      Null           0     14    15                   00
>r[14..15]=NULL
>>> 33      OpenRead       5     3     0     k(4,,,,)       00  root=3
>iDb=0;
>>> sources
>>> 34      ColumnsUsed    5     0     0     7              00
>>> 35      Explain        35    0     0     SEARCH TABLE rolling
>USING
>>> COVERING INDEX sources (source1=? AND source2=?) (~983040 rows)
>00
>>> 36      Noop           0     0     0                    00  Begin
>>> WHERE-loop0: rolling
>>> 37      Copy           2     16    0                    00
>r[16]=r[2];
>>> x.source1
>>> 38      Copy           3     17    0                    00
>r[17]=r[3];
>>> x.source2
>>> 39      CursorHint     5     0     0
>AND(EQ(c0,r[16]),EQ(c1,r[17]))
>>> 00
>>> 40      Copy           2     18    0                    00
>r[18]=r[2];
>>> x.source1
>>> 41      IsNull         18    54    0                    00  if
>>> r[18]==NULL goto 54
>>> 42      Copy           3     19    0                    00
>r[19]=r[3];
>>> x.source2
>>> 43      IsNull         19    54    0                    00  if
>>> r[19]==NULL goto 54
>>> 44      Null           0     20    0                    00
>r[20]=NULL
>>> 45      SeekGT         5     54    18    3              00
>key=r[18..20]
>>> 46        IdxGT          5     54    18    2              00
>>> key=r[18..19]
>>> 47        Noop           0     0     0                    00
>Begin
>>> WHERE-core
>>> 48        Column         5     2     21                   00
>>> r[21]=rolling.ts
>>> 49        CollSeq        0     0     0     (BINARY)       00
>>> 50        AggStep        0     21    14    min(1)         01
>accum=r[14]
>>> step(r[21])
>>> 51        Goto           0     55    0                    00
>min() by
>>> index
>>> 52        Noop           0     0     0                    00  End
>>> WHERE-core
>>> 53      Next           5     46    0                    00
>>> 54      Noop           0     0     0                    00  End
>>> WHERE-loop0: rolling
>>> 55      AggFinal       14    1     0     min(1)         00
>accum=r[14]
>>> N=1
>>> 56      Copy           14    12    0                    00
>r[12]=r[14]
>>> 57      DecrJumpZero   13    58    0                    00  if
>>> (--r[13])==0 goto 58
>>> 58      Copy           12    11    0                    00
>r[11]=r[12]
>>> 59      ResultRow      9     3     0                    00
>>> output=r[9..11]
>>> 60      Noop           0     0     0                    00  End
>WHERE-core
>>> 61    Goto           0     26    0                    00
>>> 62    Noop           0     0     0                    00  End
>>> WHERE-loop0: x
>>> 63    Halt           0     0     0                    00
>>> 64    Transaction    0     0     3     0              01
>>> usesStmtJournal=0
>>> 65    Goto           0     1     0                    00
>>> aaa|X1|0
>>> aaa|X2|0
>>> aaa|X3|0
>>> aaa|X4|0
>>> bbb|X1|0
>>> bbb|X2|0
>>> bbb|X3|0
>>> bbb|X4|0
>>> Run Time: real 0.134 user 0.000000 sys 0.000000
>>> sqlite>
>>>
>>> ---
>>> The fact that there's a Highway to Hell but only a Stairway to
>Heaven
>>> says a lot about anticipated traffic volume.
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> sqlite-users mailing list
>>> [email protected]
>>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-
>users
>>>
>>
>_______________________________________________
>sqlite-users mailing list
>[email protected]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Min/Max and skip-scan optimizations

Reply via email to