Re: [sqlite] Min/Max and skip-scan optimizations

Gerlando Falauto Mon, 04 Feb 2019 03:47:33 -0800

Hi Keith,

here's what I get as a query plan for your query:


5)
SELECT rolling.source1,
       rolling.source2,
       ts,
       value
  FROM (
        select distinct source1,
                        source2
          from rolling
         where source1 = 'aaa'
       ) as x
  JOIN rolling
    ON rolling.source1 = x.source1
   AND rolling.source2 = x.source2
 WHERE ts > 1
   AND ts < 100000000
ORDER BY 1,2,3;

QUERY PLAN
|--MATERIALIZE 1
|  `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=?)
|--SEARCH TABLE rolling USING INDEX sources (ANY(source1) AND ANY(source2)
AND ts>? AND ts<?)
`--SEARCH SUBQUERY 1 AS x USING AUTOMATIC COVERING INDEX (source2=? AND
source1=?)
Run Time: real 40.343 user 39.380000 sys 0.710000

This looks only "slightly" slower than the optimum:

1)
SELECT source1, source2, ts, value
FROM rolling
WHERE source1 = 'aaa'
AND ts > 1 AND ts < 100000000;

QUERY PLAN
`--SEARCH TABLE rolling USING INDEX sources (ANY(source1) AND ANY(source2)
AND ts>? AND ts<?)
Run Time: real 29.435 user 28.030000 sys 0.880000

What bothers me most is the MATERIALIZE step, which I believe should be
related to the SELECT DISTINCT step.
If that's really the case, I know for a fact this will only yield a very
limited number of results, so that may not be that much of an issue.
Is my understanding correct?
Yet I don't understand why this materializing step does apply to 5) and not
3), while the documentation seems to suggest quite the opposite.

IMHO, adding the ORDER BY clause to query 1) above (i.e. query 2) should
ideally yield the exact same query plan.
In the end adding an ORDER BY clause on the exact same columns of the index
used to traverse the table, should be easily recognizable.
Knowing absolutely nothing about the internals though, I have no idea
whether this particular use case has been overlooked, or it would just be
unfeasible to handle it.

Thank you again everyone!
Gerlando

On Sun, Feb 3, 2019 at 12:27 AM Keith Medcalf <kmedc...@dessus.com> wrote:

>
> Like this?
>
> SELECT rolling.source1,
>        rolling.source2,
>        ts,
>        value
>   FROM (
>         select distinct source1,
>                         source2
>           from rolling
>          where source1 = 'aaa'
>        ) as x
>   JOIN rolling
>     ON rolling.source1 = x.source1
>    AND rolling.source2 = x.source2
>  WHERE ts > 1
>    AND ts < 10
> ORDER BY 1,2,3;
>
>
> ---
> The fact that there's a Highway to Hell but only a Stairway to Heaven says
> a lot about anticipated traffic volume.
>
> >-----Original Message-----
> >From: sqlite-users [mailto:sqlite-users-
> >boun...@mailinglists.sqlite.org] On Behalf Of Gerlando Falauto
> >Sent: Saturday, 2 February, 2019 15:20
> >To: SQLite mailing list
> >Subject: Re: [sqlite] Min/Max and skip-scan optimizations
> >
> >Hi,
> >it's me again, struggling with indices once again.
> >What I'm now trying to do is filter on source1 and by range of
> >timestamp.
> >Results should be naturally ordered by source1, source2,ts.
> >
> >1)
> >
> >SELECT source1, source2, ts, value
> >FROM rolling
> >WHERE source1 = 'aaa'
> >AND ts > 1 AND ts < 10;
> >
> >QUERY PLAN
> >`--SEARCH TABLE rolling USING INDEX sources (ANY(source1) AND
> >ANY(source2)
> >AND ts>? AND ts<?)
> >
> >This looks pretty OK on the sample dataset.
> >For some reason though this doesn't really work like that with the
> >real
> >dataset (which has an extra index on ts only),
> >and that seems to kick in instead of the real index.
> >So the query plan comes out completely different, and rows are not
> >naturally sorted by source1, source2, as I'd like.
> >[I know I can use "+ts" and/or drop the "ts" index altogether, but
> >I'm
> >trying to make a point here...]
> >So I add an extra ORDER BY clause:
> >
> >2)
> >
> >SELECT source1, source2, ts, value
> >FROM rolling
> >WHERE source1 = 'aaa'
> >  AND ts > 1 AND ts < 100000000
> >ORDER BY source1, source2, ts;
> >
> >QUERY PLAN
> >|--SEARCH TABLE rolling USING INDEX sources (ANY(source1) AND
> >ANY(source2)
> >AND ts>? AND ts<?)
> >`--USE TEMP B-TREE FOR RIGHT PART OF ORDER BY
> >
> >Here I don't really understand why a TEMP B-TREE is required at all.
> >Apparently, this adds extra processing at the end so I don't start
> >pulling
> >any rows until much later.
> >If the index is used, data would be *naturally* sorted by its key, so
> >I
> >don't really understand.
> >So I recur to a subquery:
> >
> >3)
> >
> >SELECT source1, source2, ts, value
> >FROM rolling
> >WHERE ((source1, source2) in
> >        (SELECT DISTINCT source1, source2 from rolling where
> >source1='aaa')
> >  AND ts > 1 AND ts < 10)
> >
> >QUERY PLAN
> >|--SEARCH TABLE rolling USING INDEX sources (source1=? AND source2=?
> >AND
> >ts>? AND ts<?)
> >`--LIST SUBQUERY
> >   `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=?)
> >
> >This also seems to yield the right results, in the right order.
> >But the order looks pretty much like a coincidence to me.
> >So I'd rather explicitly state the sorting I'd like:
> >
> >4)
> >
> >SELECT source1, source2, ts, value
> >FROM rolling
> >WHERE ((source1, source2) in
> >        (SELECT DISTINCT source1, source2 from rolling where
> >source1='aaa')
> >  AND ts > 1 AND ts < 10)
> >ORDER BY source1, source2, ts
> >
> >QUERY PLAN
> >|--SEARCH TABLE rolling USING INDEX sources (source1=? AND source2=?
> >AND
> >ts>? AND ts<?)
> >|--LIST SUBQUERY
> >|  `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=?)
> >`--USE TEMP B-TREE FOR ORDER BY
> >
> >Here the b-tree is *NOT* for "RIGHT PART OF" ORDER BY, it seems like
> >some
> >global sorting
> >which looks even worse than case 2.
> >
> >What am I doing wrong? Is this expected?
> >I just don't seem to be able to get what would have been a pretty
> >trivial
> >task with
> >last-century technologies (thinking of Paradox, dBase, CA-Clipper)...
> >
> >Thanks in advance!
> >Gerlando
> >
> >
> >On Mon, Jan 28, 2019 at 9:11 AM Gerlando Falauto
> ><gerlando.fala...@gmail.com>
> >wrote:
> >
> >> YES! Thank you!
> >> Many thanks for the ".eqp full" tip also, that really explains a
> >lot
> >> (though I don't really understand any of it yet).
> >>
> >> Have a great day!
> >> Gerlando
> >>
> >>
> >> On Mon, Jan 28, 2019 at 6:50 AM Keith Medcalf <kmedc...@dessus.com>
> >wrote:
> >>
> >>>
> >>> Do you perhaps want this:
> >>>
> >>> select source1,
> >>>        source2,
> >>>        (
> >>>         select min(ts)
> >>>           from rolling
> >>>          where source1 = x.source1
> >>>            and source2 = x.source2
> >>>        )
> >>>   from (
> >>>         select distinct source1,
> >>>                         source2
> >>>           from rolling
> >>>        ) as x;
> >>>
> >>> SQLite version 3.27.0 2019-01-28 00:42:06
> >>> Enter ".help" for usage hints.
> >>> Connected to a transient in-memory database.
> >>> Use ".open FILENAME" to reopen on a persistent database.
> >>> sqlite> .timer on
> >>> sqlite> CREATE TABLE `rolling` (
> >>>    ...>     `source1`    TEXT NOT NULL,
> >>>    ...>     `source2`    TEXT NOT NULL,
> >>>    ...>     `ts`    INTEGER NOT NULL,
> >>>    ...>     `value`    TEXT
> >>>    ...> );
> >>> Run Time: real 0.002 user 0.000000 sys 0.000000
> >>> sqlite>
> >>> sqlite> CREATE INDEX `sources` ON `rolling` (
> >>>    ...>     `source1`,
> >>>    ...>     `source2`,
> >>>    ...>     `ts`
> >>>    ...> );
> >>> Run Time: real 0.001 user 0.000000 sys 0.000000
> >>> sqlite>
> >>> sqlite> INSERT INTO rolling
> >>>    ...>     WITH RECURSIVE
> >>>    ...>       src1( source1 ) AS ( VALUES("aaa") UNION ALL
> >VALUES("bbb")
> >>> ),
> >>>    ...>       src2( source2 ) AS ( VALUES("X1") UNION ALL
> >VALUES("X2")
> >>> UNION ALL
> >>>    ...> VALUES("X3") UNION ALL VALUES("X4") ),
> >>>    ...>       cnt( ts, value) AS (
> >>>    ...>       VALUES( 0, "ZZZZ")
> >>>    ...>         UNION ALL
> >>>    ...>       SELECT ts+1, value FROM cnt LIMIT 1000000)
> >>>    ...>
> >>>    ...>     select src1.source1, src2.source2, cnt.* from src1,
> >src2, cnt;
> >>> Run Time: real 8.920 user 8.843750 sys 0.078125
> >>> sqlite>
> >>> sqlite> analyze;
> >>> Run Time: real 1.285 user 1.281250 sys 0.000000
> >>> sqlite> .eqp full
> >>> sqlite>
> >>> sqlite> select source1,
> >>>    ...>        source2,
> >>>    ...>        (
> >>>    ...>         select min(ts)
> >>>    ...>           from rolling
> >>>    ...>          where source1 = x.source1
> >>>    ...>            and source2 = x.source2
> >>>    ...>        )
> >>>    ...>   from (
> >>>    ...>         select distinct source1,
> >>>    ...>                         source2
> >>>    ...>           from rolling
> >>>    ...>        ) as x;
> >>> QUERY PLAN
> >>> |--CO-ROUTINE 2
> >>> |  `--SCAN TABLE rolling USING COVERING INDEX sources (~7864320
> >rows)
> >>> |--SCAN SUBQUERY 2 AS x (~7864320 rows)
> >>> `--CORRELATED SCALAR SUBQUERY 1
> >>>    `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=?
> >AND
> >>> source2=?) (~983040 rows)
> >>> addr  opcode         p1    p2    p3    p4             p5  comment
> >>> ----  -------------  ----  ----  ----  -------------  --  --------
> >-----
> >>> 0     Init           0     64    0                    00  Start at
> >64
> >>> 1     InitCoroutine  1     23    2                    00  x
> >>> 2     Null           1     4     0                    08
> >r[4]=NULL
> >>> 3     OpenRead       4     3     0     k(4,,,,)       00  root=3
> >iDb=0;
> >>> sources
> >>> 4     ColumnsUsed    4     0     0     3              00
> >>> 5     Explain        5     0     0     SCAN TABLE rolling USING
> >COVERING
> >>> INDEX sources (~7864320 rows)  00
> >>> 6     Noop           0     0     0                    00  Begin
> >>> WHERE-loop0: rolling
> >>> 7     Rewind         4     21    2     0              00
> >>> 8         Noop           0     0     0                    00
> >Begin
> >>> WHERE-core
> >>> 9         Column         4     0     2                    00
> >>> r[2]=rolling.source1
> >>> 10        Column         4     1     3                    00
> >>> r[3]=rolling.source2
> >>> 11        Ne             2     13    4     (BINARY)       80  if
> >>> r[4]!=r[2] goto 13
> >>> 12        Eq             3     20    5     (BINARY)       80  if
> >>> r[5]==r[3] goto 20
> >>> 13        Copy           2     4     1                    00
> >>> r[4..5]=r[2..3]
> >>> 14        Yield          1     0     0                    00
> >>> 15        Noop           0     0     0                    00  End
> >>> WHERE-core
> >>> 16        Column         4     0     6                    00
> >r[6]=
> >>> 17        Column         4     1     7                    00
> >r[7]=
> >>> 18        SeekGT         4     21    6     2              00
> >key=r[6..7]
> >>> 19      Goto           1     8     0                    00
> >>> 20    Next           4     8     0                    01
> >>> 21    Noop           0     0     0                    00  End
> >>> WHERE-loop0: rolling
> >>> 22    EndCoroutine   1     0     0                    00
> >>> 23    Explain        23    0     0     SCAN SUBQUERY 2 AS x
> >(~7864320
> >>> rows)  00
> >>> 24    Noop           0     0     0                    00  Begin
> >>> WHERE-loop0: x
> >>> 25    InitCoroutine  1     0     2                    00
> >>> 26      Yield          1     62    0                    00  next
> >row of x
> >>> 27      Noop           0     0     0                    00  Begin
> >>> WHERE-core
> >>> 28      Copy           2     9     0                    00
> >r[9]=r[2];
> >>> x.source1
> >>> 29      Copy           3     10    0                    00
> >r[10]=r[3];
> >>> x.source2
> >>> 30      Null           0     12    12                   00
> >>> r[12..12]=NULL; Init subquery result
> >>> 31      Integer        1     13    0                    00
> >r[13]=1;
> >>> LIMIT counter
> >>> 32      Null           0     14    15                   00
> >r[14..15]=NULL
> >>> 33      OpenRead       5     3     0     k(4,,,,)       00  root=3
> >iDb=0;
> >>> sources
> >>> 34      ColumnsUsed    5     0     0     7              00
> >>> 35      Explain        35    0     0     SEARCH TABLE rolling
> >USING
> >>> COVERING INDEX sources (source1=? AND source2=?) (~983040 rows)
> >00
> >>> 36      Noop           0     0     0                    00  Begin
> >>> WHERE-loop0: rolling
> >>> 37      Copy           2     16    0                    00
> >r[16]=r[2];
> >>> x.source1
> >>> 38      Copy           3     17    0                    00
> >r[17]=r[3];
> >>> x.source2
> >>> 39      CursorHint     5     0     0
> >AND(EQ(c0,r[16]),EQ(c1,r[17]))
> >>> 00
> >>> 40      Copy           2     18    0                    00
> >r[18]=r[2];
> >>> x.source1
> >>> 41      IsNull         18    54    0                    00  if
> >>> r[18]==NULL goto 54
> >>> 42      Copy           3     19    0                    00
> >r[19]=r[3];
> >>> x.source2
> >>> 43      IsNull         19    54    0                    00  if
> >>> r[19]==NULL goto 54
> >>> 44      Null           0     20    0                    00
> >r[20]=NULL
> >>> 45      SeekGT         5     54    18    3              00
> >key=r[18..20]
> >>> 46        IdxGT          5     54    18    2              00
> >>> key=r[18..19]
> >>> 47        Noop           0     0     0                    00
> >Begin
> >>> WHERE-core
> >>> 48        Column         5     2     21                   00
> >>> r[21]=rolling.ts
> >>> 49        CollSeq        0     0     0     (BINARY)       00
> >>> 50        AggStep        0     21    14    min(1)         01
> >accum=r[14]
> >>> step(r[21])
> >>> 51        Goto           0     55    0                    00
> >min() by
> >>> index
> >>> 52        Noop           0     0     0                    00  End
> >>> WHERE-core
> >>> 53      Next           5     46    0                    00
> >>> 54      Noop           0     0     0                    00  End
> >>> WHERE-loop0: rolling
> >>> 55      AggFinal       14    1     0     min(1)         00
> >accum=r[14]
> >>> N=1
> >>> 56      Copy           14    12    0                    00
> >r[12]=r[14]
> >>> 57      DecrJumpZero   13    58    0                    00  if
> >>> (--r[13])==0 goto 58
> >>> 58      Copy           12    11    0                    00
> >r[11]=r[12]
> >>> 59      ResultRow      9     3     0                    00
> >>> output=r[9..11]
> >>> 60      Noop           0     0     0                    00  End
> >WHERE-core
> >>> 61    Goto           0     26    0                    00
> >>> 62    Noop           0     0     0                    00  End
> >>> WHERE-loop0: x
> >>> 63    Halt           0     0     0                    00
> >>> 64    Transaction    0     0     3     0              01
> >>> usesStmtJournal=0
> >>> 65    Goto           0     1     0                    00
> >>> aaa|X1|0
> >>> aaa|X2|0
> >>> aaa|X3|0
> >>> aaa|X4|0
> >>> bbb|X1|0
> >>> bbb|X2|0
> >>> bbb|X3|0
> >>> bbb|X4|0
> >>> Run Time: real 0.134 user 0.000000 sys 0.000000
> >>> sqlite>
> >>>
> >>> ---
> >>> The fact that there's a Highway to Hell but only a Stairway to
> >Heaven
> >>> says a lot about anticipated traffic volume.
> >>>
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> sqlite-users mailing list
> >>> sqlite-users@mailinglists.sqlite.org
> >>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-
> >users
> >>>
> >>
> >_______________________________________________
> >sqlite-users mailing list
> >sqlite-users@mailinglists.sqlite.org
> >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Min/Max and skip-scan optimizations

Reply via email to