Re: [sqlite] Min/Max and skip-scan optimizations

Gerlando Falauto Mon, 04 Feb 2019 03:47:34 -0800

Thanks Keith, I'll give it a go and let you know!
I still don't get how that differs from
2) or 4) below, though.


Thanks again!
Gerlando

Il dom 3 feb 2019, 00:27 Keith Medcalf <[email protected]> ha scritto:

>
> Like this?
>
> SELECT rolling.source1,
>        rolling.source2,
>        ts,
>        value
>   FROM (
>         select distinct source1,
>                         source2
>           from rolling
>          where source1 = 'aaa'
>        ) as x
>   JOIN rolling
>     ON rolling.source1 = x.source1
>    AND rolling.source2 = x.source2
>  WHERE ts > 1
>    AND ts < 10
> ORDER BY 1,2,3;
>
>
> ---
> The fact that there's a Highway to Hell but only a Stairway to Heaven says
> a lot about anticipated traffic volume.
>
> >-----Original Message-----
> >From: sqlite-users [mailto:sqlite-users-
> >[email protected]] On Behalf Of Gerlando Falauto
> >Sent: Saturday, 2 February, 2019 15:20
> >To: SQLite mailing list
> >Subject: Re: [sqlite] Min/Max and skip-scan optimizations
> >
> >Hi,
> >it's me again, struggling with indices once again.
> >What I'm now trying to do is filter on source1 and by range of
> >timestamp.
> >Results should be naturally ordered by source1, source2,ts.
> >
> >1)
> >
> >SELECT source1, source2, ts, value
> >FROM rolling
> >WHERE source1 = 'aaa'
> >AND ts > 1 AND ts < 10;
> >
> >QUERY PLAN
> >`--SEARCH TABLE rolling USING INDEX sources (ANY(source1) AND
> >ANY(source2)
> >AND ts>? AND ts<?)
> >
> >This looks pretty OK on the sample dataset.
> >For some reason though this doesn't really work like that with the
> >real
> >dataset (which has an extra index on ts only),
> >and that seems to kick in instead of the real index.
> >So the query plan comes out completely different, and rows are not
> >naturally sorted by source1, source2, as I'd like.
> >[I know I can use "+ts" and/or drop the "ts" index altogether, but
> >I'm
> >trying to make a point here...]
> >So I add an extra ORDER BY clause:
> >
> >2)
> >
> >SELECT source1, source2, ts, value
> >FROM rolling
> >WHERE source1 = 'aaa'
> >  AND ts > 1 AND ts < 100000000
> >ORDER BY source1, source2, ts;
> >
> >QUERY PLAN
> >|--SEARCH TABLE rolling USING INDEX sources (ANY(source1) AND
> >ANY(source2)
> >AND ts>? AND ts<?)
> >`--USE TEMP B-TREE FOR RIGHT PART OF ORDER BY
> >
> >Here I don't really understand why a TEMP B-TREE is required at all.
> >Apparently, this adds extra processing at the end so I don't start
> >pulling
> >any rows until much later.
> >If the index is used, data would be *naturally* sorted by its key, so
> >I
> >don't really understand.
> >So I recur to a subquery:
> >
> >3)
> >
> >SELECT source1, source2, ts, value
> >FROM rolling
> >WHERE ((source1, source2) in
> >        (SELECT DISTINCT source1, source2 from rolling where
> >source1='aaa')
> >  AND ts > 1 AND ts < 10)
> >
> >QUERY PLAN
> >|--SEARCH TABLE rolling USING INDEX sources (source1=? AND source2=?
> >AND
> >ts>? AND ts<?)
> >`--LIST SUBQUERY
> >   `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=?)
> >
> >This also seems to yield the right results, in the right order.
> >But the order looks pretty much like a coincidence to me.
> >So I'd rather explicitly state the sorting I'd like:
> >
> >4)
> >
> >SELECT source1, source2, ts, value
> >FROM rolling
> >WHERE ((source1, source2) in
> >        (SELECT DISTINCT source1, source2 from rolling where
> >source1='aaa')
> >  AND ts > 1 AND ts < 10)
> >ORDER BY source1, source2, ts
> >
> >QUERY PLAN
> >|--SEARCH TABLE rolling USING INDEX sources (source1=? AND source2=?
> >AND
> >ts>? AND ts<?)
> >|--LIST SUBQUERY
> >|  `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=?)
> >`--USE TEMP B-TREE FOR ORDER BY
> >
> >Here the b-tree is *NOT* for "RIGHT PART OF" ORDER BY, it seems like
> >some
> >global sorting
> >which looks even worse than case 2.
> >
> >What am I doing wrong? Is this expected?
> >I just don't seem to be able to get what would have been a pretty
> >trivial
> >task with
> >last-century technologies (thinking of Paradox, dBase, CA-Clipper)...
> >
> >Thanks in advance!
> >Gerlando
> >
> >
> >On Mon, Jan 28, 2019 at 9:11 AM Gerlando Falauto
> ><[email protected]>
> >wrote:
> >
> >> YES! Thank you!
> >> Many thanks for the ".eqp full" tip also, that really explains a
> >lot
> >> (though I don't really understand any of it yet).
> >>
> >> Have a great day!
> >> Gerlando
> >>
> >>
> >> On Mon, Jan 28, 2019 at 6:50 AM Keith Medcalf <[email protected]>
> >wrote:
> >>
> >>>
> >>> Do you perhaps want this:
> >>>
> >>> select source1,
> >>>        source2,
> >>>        (
> >>>         select min(ts)
> >>>           from rolling
> >>>          where source1 = x.source1
> >>>            and source2 = x.source2
> >>>        )
> >>>   from (
> >>>         select distinct source1,
> >>>                         source2
> >>>           from rolling
> >>>        ) as x;
> >>>
> >>> SQLite version 3.27.0 2019-01-28 00:42:06
> >>> Enter ".help" for usage hints.
> >>> Connected to a transient in-memory database.
> >>> Use ".open FILENAME" to reopen on a persistent database.
> >>> sqlite> .timer on
> >>> sqlite> CREATE TABLE `rolling` (
> >>>    ...>     `source1`    TEXT NOT NULL,
> >>>    ...>     `source2`    TEXT NOT NULL,
> >>>    ...>     `ts`    INTEGER NOT NULL,
> >>>    ...>     `value`    TEXT
> >>>    ...> );
> >>> Run Time: real 0.002 user 0.000000 sys 0.000000
> >>> sqlite>
> >>> sqlite> CREATE INDEX `sources` ON `rolling` (
> >>>    ...>     `source1`,
> >>>    ...>     `source2`,
> >>>    ...>     `ts`
> >>>    ...> );
> >>> Run Time: real 0.001 user 0.000000 sys 0.000000
> >>> sqlite>
> >>> sqlite> INSERT INTO rolling
> >>>    ...>     WITH RECURSIVE
> >>>    ...>       src1( source1 ) AS ( VALUES("aaa") UNION ALL
> >VALUES("bbb")
> >>> ),
> >>>    ...>       src2( source2 ) AS ( VALUES("X1") UNION ALL
> >VALUES("X2")
> >>> UNION ALL
> >>>    ...> VALUES("X3") UNION ALL VALUES("X4") ),
> >>>    ...>       cnt( ts, value) AS (
> >>>    ...>       VALUES( 0, "ZZZZ")
> >>>    ...>         UNION ALL
> >>>    ...>       SELECT ts+1, value FROM cnt LIMIT 1000000)
> >>>    ...>
> >>>    ...>     select src1.source1, src2.source2, cnt.* from src1,
> >src2, cnt;
> >>> Run Time: real 8.920 user 8.843750 sys 0.078125
> >>> sqlite>
> >>> sqlite> analyze;
> >>> Run Time: real 1.285 user 1.281250 sys 0.000000
> >>> sqlite> .eqp full
> >>> sqlite>
> >>> sqlite> select source1,
> >>>    ...>        source2,
> >>>    ...>        (
> >>>    ...>         select min(ts)
> >>>    ...>           from rolling
> >>>    ...>          where source1 = x.source1
> >>>    ...>            and source2 = x.source2
> >>>    ...>        )
> >>>    ...>   from (
> >>>    ...>         select distinct source1,
> >>>    ...>                         source2
> >>>    ...>           from rolling
> >>>    ...>        ) as x;
> >>> QUERY PLAN
> >>> |--CO-ROUTINE 2
> >>> |  `--SCAN TABLE rolling USING COVERING INDEX sources (~7864320
> >rows)
> >>> |--SCAN SUBQUERY 2 AS x (~7864320 rows)
> >>> `--CORRELATED SCALAR SUBQUERY 1
> >>>    `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=?
> >AND
> >>> source2=?) (~983040 rows)
> >>> addr  opcode         p1    p2    p3    p4             p5  comment
> >>> ----  -------------  ----  ----  ----  -------------  --  --------
> >-----
> >>> 0     Init           0     64    0                    00  Start at
> >64
> >>> 1     InitCoroutine  1     23    2                    00  x
> >>> 2     Null           1     4     0                    08
> >r[4]=NULL
> >>> 3     OpenRead       4     3     0     k(4,,,,)       00  root=3
> >iDb=0;
> >>> sources
> >>> 4     ColumnsUsed    4     0     0     3              00
> >>> 5     Explain        5     0     0     SCAN TABLE rolling USING
> >COVERING
> >>> INDEX sources (~7864320 rows)  00
> >>> 6     Noop           0     0     0                    00  Begin
> >>> WHERE-loop0: rolling
> >>> 7     Rewind         4     21    2     0              00
> >>> 8         Noop           0     0     0                    00
> >Begin
> >>> WHERE-core
> >>> 9         Column         4     0     2                    00
> >>> r[2]=rolling.source1
> >>> 10        Column         4     1     3                    00
> >>> r[3]=rolling.source2
> >>> 11        Ne             2     13    4     (BINARY)       80  if
> >>> r[4]!=r[2] goto 13
> >>> 12        Eq             3     20    5     (BINARY)       80  if
> >>> r[5]==r[3] goto 20
> >>> 13        Copy           2     4     1                    00
> >>> r[4..5]=r[2..3]
> >>> 14        Yield          1     0     0                    00
> >>> 15        Noop           0     0     0                    00  End
> >>> WHERE-core
> >>> 16        Column         4     0     6                    00
> >r[6]=
> >>> 17        Column         4     1     7                    00
> >r[7]=
> >>> 18        SeekGT         4     21    6     2              00
> >key=r[6..7]
> >>> 19      Goto           1     8     0                    00
> >>> 20    Next           4     8     0                    01
> >>> 21    Noop           0     0     0                    00  End
> >>> WHERE-loop0: rolling
> >>> 22    EndCoroutine   1     0     0                    00
> >>> 23    Explain        23    0     0     SCAN SUBQUERY 2 AS x
> >(~7864320
> >>> rows)  00
> >>> 24    Noop           0     0     0                    00  Begin
> >>> WHERE-loop0: x
> >>> 25    InitCoroutine  1     0     2                    00
> >>> 26      Yield          1     62    0                    00  next
> >row of x
> >>> 27      Noop           0     0     0                    00  Begin
> >>> WHERE-core
> >>> 28      Copy           2     9     0                    00
> >r[9]=r[2];
> >>> x.source1
> >>> 29      Copy           3     10    0                    00
> >r[10]=r[3];
> >>> x.source2
> >>> 30      Null           0     12    12                   00
> >>> r[12..12]=NULL; Init subquery result
> >>> 31      Integer        1     13    0                    00
> >r[13]=1;
> >>> LIMIT counter
> >>> 32      Null           0     14    15                   00
> >r[14..15]=NULL
> >>> 33      OpenRead       5     3     0     k(4,,,,)       00  root=3
> >iDb=0;
> >>> sources
> >>> 34      ColumnsUsed    5     0     0     7              00
> >>> 35      Explain        35    0     0     SEARCH TABLE rolling
> >USING
> >>> COVERING INDEX sources (source1=? AND source2=?) (~983040 rows)
> >00
> >>> 36      Noop           0     0     0                    00  Begin
> >>> WHERE-loop0: rolling
> >>> 37      Copy           2     16    0                    00
> >r[16]=r[2];
> >>> x.source1
> >>> 38      Copy           3     17    0                    00
> >r[17]=r[3];
> >>> x.source2
> >>> 39      CursorHint     5     0     0
> >AND(EQ(c0,r[16]),EQ(c1,r[17]))
> >>> 00
> >>> 40      Copy           2     18    0                    00
> >r[18]=r[2];
> >>> x.source1
> >>> 41      IsNull         18    54    0                    00  if
> >>> r[18]==NULL goto 54
> >>> 42      Copy           3     19    0                    00
> >r[19]=r[3];
> >>> x.source2
> >>> 43      IsNull         19    54    0                    00  if
> >>> r[19]==NULL goto 54
> >>> 44      Null           0     20    0                    00
> >r[20]=NULL
> >>> 45      SeekGT         5     54    18    3              00
> >key=r[18..20]
> >>> 46        IdxGT          5     54    18    2              00
> >>> key=r[18..19]
> >>> 47        Noop           0     0     0                    00
> >Begin
> >>> WHERE-core
> >>> 48        Column         5     2     21                   00
> >>> r[21]=rolling.ts
> >>> 49        CollSeq        0     0     0     (BINARY)       00
> >>> 50        AggStep        0     21    14    min(1)         01
> >accum=r[14]
> >>> step(r[21])
> >>> 51        Goto           0     55    0                    00
> >min() by
> >>> index
> >>> 52        Noop           0     0     0                    00  End
> >>> WHERE-core
> >>> 53      Next           5     46    0                    00
> >>> 54      Noop           0     0     0                    00  End
> >>> WHERE-loop0: rolling
> >>> 55      AggFinal       14    1     0     min(1)         00
> >accum=r[14]
> >>> N=1
> >>> 56      Copy           14    12    0                    00
> >r[12]=r[14]
> >>> 57      DecrJumpZero   13    58    0                    00  if
> >>> (--r[13])==0 goto 58
> >>> 58      Copy           12    11    0                    00
> >r[11]=r[12]
> >>> 59      ResultRow      9     3     0                    00
> >>> output=r[9..11]
> >>> 60      Noop           0     0     0                    00  End
> >WHERE-core
> >>> 61    Goto           0     26    0                    00
> >>> 62    Noop           0     0     0                    00  End
> >>> WHERE-loop0: x
> >>> 63    Halt           0     0     0                    00
> >>> 64    Transaction    0     0     3     0              01
> >>> usesStmtJournal=0
> >>> 65    Goto           0     1     0                    00
> >>> aaa|X1|0
> >>> aaa|X2|0
> >>> aaa|X3|0
> >>> aaa|X4|0
> >>> bbb|X1|0
> >>> bbb|X2|0
> >>> bbb|X3|0
> >>> bbb|X4|0
> >>> Run Time: real 0.134 user 0.000000 sys 0.000000
> >>> sqlite>
> >>>
> >>> ---
> >>> The fact that there's a Highway to Hell but only a Stairway to
> >Heaven
> >>> says a lot about anticipated traffic volume.
> >>>
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> sqlite-users mailing list
> >>> [email protected]
> >>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-
> >users
> >>>
> >>
> >_______________________________________________
> >sqlite-users mailing list
> >[email protected]
> >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
>
> _______________________________________________
> sqlite-users mailing list
> [email protected]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Min/Max and skip-scan optimizations

Reply via email to