Hi Keith, here's what I get as a query plan for your query:
5) SELECT rolling.source1, rolling.source2, ts, value FROM ( select distinct source1, source2 from rolling where source1 = 'aaa' ) as x JOIN rolling ON rolling.source1 = x.source1 AND rolling.source2 = x.source2 WHERE ts > 1 AND ts < 100000000 ORDER BY 1,2,3; QUERY PLAN |--MATERIALIZE 1 | `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=?) |--SEARCH TABLE rolling USING INDEX sources (ANY(source1) AND ANY(source2) AND ts>? AND ts<?) `--SEARCH SUBQUERY 1 AS x USING AUTOMATIC COVERING INDEX (source2=? AND source1=?) Run Time: real 40.343 user 39.380000 sys 0.710000 This looks only "slightly" slower than the optimum: 1) SELECT source1, source2, ts, value FROM rolling WHERE source1 = 'aaa' AND ts > 1 AND ts < 100000000; QUERY PLAN `--SEARCH TABLE rolling USING INDEX sources (ANY(source1) AND ANY(source2) AND ts>? AND ts<?) Run Time: real 29.435 user 28.030000 sys 0.880000 What bothers me most is the MATERIALIZE step, which I believe should be related to the SELECT DISTINCT step. If that's really the case, I know for a fact this will only yield a very limited number of results, so that may not be that much of an issue. Is my understanding correct? Yet I don't understand why this materializing step does apply to 5) and not 3), while the documentation seems to suggest quite the opposite. IMHO, adding the ORDER BY clause to query 1) above (i.e. query 2) should ideally yield the exact same query plan. In the end adding an ORDER BY clause on the exact same columns of the index used to traverse the table, should be easily recognizable. Knowing absolutely nothing about the internals though, I have no idea whether this particular use case has been overlooked, or it would just be unfeasible to handle it. Thank you again everyone! Gerlando On Sun, Feb 3, 2019 at 12:27 AM Keith Medcalf <kmedc...@dessus.com> wrote: > > Like this? > > SELECT rolling.source1, > rolling.source2, > ts, > value > FROM ( > select distinct source1, > source2 > from rolling > where source1 = 'aaa' > ) as x > JOIN rolling > ON rolling.source1 = x.source1 > AND rolling.source2 = x.source2 > WHERE ts > 1 > AND ts < 10 > ORDER BY 1,2,3; > > > --- > The fact that there's a Highway to Hell but only a Stairway to Heaven says > a lot about anticipated traffic volume. > > >-----Original Message----- > >From: sqlite-users [mailto:sqlite-users- > >boun...@mailinglists.sqlite.org] On Behalf Of Gerlando Falauto > >Sent: Saturday, 2 February, 2019 15:20 > >To: SQLite mailing list > >Subject: Re: [sqlite] Min/Max and skip-scan optimizations > > > >Hi, > >it's me again, struggling with indices once again. > >What I'm now trying to do is filter on source1 and by range of > >timestamp. > >Results should be naturally ordered by source1, source2,ts. > > > >1) > > > >SELECT source1, source2, ts, value > >FROM rolling > >WHERE source1 = 'aaa' > >AND ts > 1 AND ts < 10; > > > >QUERY PLAN > >`--SEARCH TABLE rolling USING INDEX sources (ANY(source1) AND > >ANY(source2) > >AND ts>? AND ts<?) > > > >This looks pretty OK on the sample dataset. > >For some reason though this doesn't really work like that with the > >real > >dataset (which has an extra index on ts only), > >and that seems to kick in instead of the real index. > >So the query plan comes out completely different, and rows are not > >naturally sorted by source1, source2, as I'd like. > >[I know I can use "+ts" and/or drop the "ts" index altogether, but > >I'm > >trying to make a point here...] > >So I add an extra ORDER BY clause: > > > >2) > > > >SELECT source1, source2, ts, value > >FROM rolling > >WHERE source1 = 'aaa' > > AND ts > 1 AND ts < 100000000 > >ORDER BY source1, source2, ts; > > > >QUERY PLAN > >|--SEARCH TABLE rolling USING INDEX sources (ANY(source1) AND > >ANY(source2) > >AND ts>? AND ts<?) > >`--USE TEMP B-TREE FOR RIGHT PART OF ORDER BY > > > >Here I don't really understand why a TEMP B-TREE is required at all. > >Apparently, this adds extra processing at the end so I don't start > >pulling > >any rows until much later. > >If the index is used, data would be *naturally* sorted by its key, so > >I > >don't really understand. > >So I recur to a subquery: > > > >3) > > > >SELECT source1, source2, ts, value > >FROM rolling > >WHERE ((source1, source2) in > > (SELECT DISTINCT source1, source2 from rolling where > >source1='aaa') > > AND ts > 1 AND ts < 10) > > > >QUERY PLAN > >|--SEARCH TABLE rolling USING INDEX sources (source1=? AND source2=? > >AND > >ts>? AND ts<?) > >`--LIST SUBQUERY > > `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=?) > > > >This also seems to yield the right results, in the right order. > >But the order looks pretty much like a coincidence to me. > >So I'd rather explicitly state the sorting I'd like: > > > >4) > > > >SELECT source1, source2, ts, value > >FROM rolling > >WHERE ((source1, source2) in > > (SELECT DISTINCT source1, source2 from rolling where > >source1='aaa') > > AND ts > 1 AND ts < 10) > >ORDER BY source1, source2, ts > > > >QUERY PLAN > >|--SEARCH TABLE rolling USING INDEX sources (source1=? AND source2=? > >AND > >ts>? AND ts<?) > >|--LIST SUBQUERY > >| `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=?) > >`--USE TEMP B-TREE FOR ORDER BY > > > >Here the b-tree is *NOT* for "RIGHT PART OF" ORDER BY, it seems like > >some > >global sorting > >which looks even worse than case 2. > > > >What am I doing wrong? Is this expected? > >I just don't seem to be able to get what would have been a pretty > >trivial > >task with > >last-century technologies (thinking of Paradox, dBase, CA-Clipper)... > > > >Thanks in advance! > >Gerlando > > > > > >On Mon, Jan 28, 2019 at 9:11 AM Gerlando Falauto > ><gerlando.fala...@gmail.com> > >wrote: > > > >> YES! Thank you! > >> Many thanks for the ".eqp full" tip also, that really explains a > >lot > >> (though I don't really understand any of it yet). > >> > >> Have a great day! > >> Gerlando > >> > >> > >> On Mon, Jan 28, 2019 at 6:50 AM Keith Medcalf <kmedc...@dessus.com> > >wrote: > >> > >>> > >>> Do you perhaps want this: > >>> > >>> select source1, > >>> source2, > >>> ( > >>> select min(ts) > >>> from rolling > >>> where source1 = x.source1 > >>> and source2 = x.source2 > >>> ) > >>> from ( > >>> select distinct source1, > >>> source2 > >>> from rolling > >>> ) as x; > >>> > >>> SQLite version 3.27.0 2019-01-28 00:42:06 > >>> Enter ".help" for usage hints. > >>> Connected to a transient in-memory database. > >>> Use ".open FILENAME" to reopen on a persistent database. > >>> sqlite> .timer on > >>> sqlite> CREATE TABLE `rolling` ( > >>> ...> `source1` TEXT NOT NULL, > >>> ...> `source2` TEXT NOT NULL, > >>> ...> `ts` INTEGER NOT NULL, > >>> ...> `value` TEXT > >>> ...> ); > >>> Run Time: real 0.002 user 0.000000 sys 0.000000 > >>> sqlite> > >>> sqlite> CREATE INDEX `sources` ON `rolling` ( > >>> ...> `source1`, > >>> ...> `source2`, > >>> ...> `ts` > >>> ...> ); > >>> Run Time: real 0.001 user 0.000000 sys 0.000000 > >>> sqlite> > >>> sqlite> INSERT INTO rolling > >>> ...> WITH RECURSIVE > >>> ...> src1( source1 ) AS ( VALUES("aaa") UNION ALL > >VALUES("bbb") > >>> ), > >>> ...> src2( source2 ) AS ( VALUES("X1") UNION ALL > >VALUES("X2") > >>> UNION ALL > >>> ...> VALUES("X3") UNION ALL VALUES("X4") ), > >>> ...> cnt( ts, value) AS ( > >>> ...> VALUES( 0, "ZZZZ") > >>> ...> UNION ALL > >>> ...> SELECT ts+1, value FROM cnt LIMIT 1000000) > >>> ...> > >>> ...> select src1.source1, src2.source2, cnt.* from src1, > >src2, cnt; > >>> Run Time: real 8.920 user 8.843750 sys 0.078125 > >>> sqlite> > >>> sqlite> analyze; > >>> Run Time: real 1.285 user 1.281250 sys 0.000000 > >>> sqlite> .eqp full > >>> sqlite> > >>> sqlite> select source1, > >>> ...> source2, > >>> ...> ( > >>> ...> select min(ts) > >>> ...> from rolling > >>> ...> where source1 = x.source1 > >>> ...> and source2 = x.source2 > >>> ...> ) > >>> ...> from ( > >>> ...> select distinct source1, > >>> ...> source2 > >>> ...> from rolling > >>> ...> ) as x; > >>> QUERY PLAN > >>> |--CO-ROUTINE 2 > >>> | `--SCAN TABLE rolling USING COVERING INDEX sources (~7864320 > >rows) > >>> |--SCAN SUBQUERY 2 AS x (~7864320 rows) > >>> `--CORRELATED SCALAR SUBQUERY 1 > >>> `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=? > >AND > >>> source2=?) (~983040 rows) > >>> addr opcode p1 p2 p3 p4 p5 comment > >>> ---- ------------- ---- ---- ---- ------------- -- -------- > >----- > >>> 0 Init 0 64 0 00 Start at > >64 > >>> 1 InitCoroutine 1 23 2 00 x > >>> 2 Null 1 4 0 08 > >r[4]=NULL > >>> 3 OpenRead 4 3 0 k(4,,,,) 00 root=3 > >iDb=0; > >>> sources > >>> 4 ColumnsUsed 4 0 0 3 00 > >>> 5 Explain 5 0 0 SCAN TABLE rolling USING > >COVERING > >>> INDEX sources (~7864320 rows) 00 > >>> 6 Noop 0 0 0 00 Begin > >>> WHERE-loop0: rolling > >>> 7 Rewind 4 21 2 0 00 > >>> 8 Noop 0 0 0 00 > >Begin > >>> WHERE-core > >>> 9 Column 4 0 2 00 > >>> r[2]=rolling.source1 > >>> 10 Column 4 1 3 00 > >>> r[3]=rolling.source2 > >>> 11 Ne 2 13 4 (BINARY) 80 if > >>> r[4]!=r[2] goto 13 > >>> 12 Eq 3 20 5 (BINARY) 80 if > >>> r[5]==r[3] goto 20 > >>> 13 Copy 2 4 1 00 > >>> r[4..5]=r[2..3] > >>> 14 Yield 1 0 0 00 > >>> 15 Noop 0 0 0 00 End > >>> WHERE-core > >>> 16 Column 4 0 6 00 > >r[6]= > >>> 17 Column 4 1 7 00 > >r[7]= > >>> 18 SeekGT 4 21 6 2 00 > >key=r[6..7] > >>> 19 Goto 1 8 0 00 > >>> 20 Next 4 8 0 01 > >>> 21 Noop 0 0 0 00 End > >>> WHERE-loop0: rolling > >>> 22 EndCoroutine 1 0 0 00 > >>> 23 Explain 23 0 0 SCAN SUBQUERY 2 AS x > >(~7864320 > >>> rows) 00 > >>> 24 Noop 0 0 0 00 Begin > >>> WHERE-loop0: x > >>> 25 InitCoroutine 1 0 2 00 > >>> 26 Yield 1 62 0 00 next > >row of x > >>> 27 Noop 0 0 0 00 Begin > >>> WHERE-core > >>> 28 Copy 2 9 0 00 > >r[9]=r[2]; > >>> x.source1 > >>> 29 Copy 3 10 0 00 > >r[10]=r[3]; > >>> x.source2 > >>> 30 Null 0 12 12 00 > >>> r[12..12]=NULL; Init subquery result > >>> 31 Integer 1 13 0 00 > >r[13]=1; > >>> LIMIT counter > >>> 32 Null 0 14 15 00 > >r[14..15]=NULL > >>> 33 OpenRead 5 3 0 k(4,,,,) 00 root=3 > >iDb=0; > >>> sources > >>> 34 ColumnsUsed 5 0 0 7 00 > >>> 35 Explain 35 0 0 SEARCH TABLE rolling > >USING > >>> COVERING INDEX sources (source1=? AND source2=?) (~983040 rows) > >00 > >>> 36 Noop 0 0 0 00 Begin > >>> WHERE-loop0: rolling > >>> 37 Copy 2 16 0 00 > >r[16]=r[2]; > >>> x.source1 > >>> 38 Copy 3 17 0 00 > >r[17]=r[3]; > >>> x.source2 > >>> 39 CursorHint 5 0 0 > >AND(EQ(c0,r[16]),EQ(c1,r[17])) > >>> 00 > >>> 40 Copy 2 18 0 00 > >r[18]=r[2]; > >>> x.source1 > >>> 41 IsNull 18 54 0 00 if > >>> r[18]==NULL goto 54 > >>> 42 Copy 3 19 0 00 > >r[19]=r[3]; > >>> x.source2 > >>> 43 IsNull 19 54 0 00 if > >>> r[19]==NULL goto 54 > >>> 44 Null 0 20 0 00 > >r[20]=NULL > >>> 45 SeekGT 5 54 18 3 00 > >key=r[18..20] > >>> 46 IdxGT 5 54 18 2 00 > >>> key=r[18..19] > >>> 47 Noop 0 0 0 00 > >Begin > >>> WHERE-core > >>> 48 Column 5 2 21 00 > >>> r[21]=rolling.ts > >>> 49 CollSeq 0 0 0 (BINARY) 00 > >>> 50 AggStep 0 21 14 min(1) 01 > >accum=r[14] > >>> step(r[21]) > >>> 51 Goto 0 55 0 00 > >min() by > >>> index > >>> 52 Noop 0 0 0 00 End > >>> WHERE-core > >>> 53 Next 5 46 0 00 > >>> 54 Noop 0 0 0 00 End > >>> WHERE-loop0: rolling > >>> 55 AggFinal 14 1 0 min(1) 00 > >accum=r[14] > >>> N=1 > >>> 56 Copy 14 12 0 00 > >r[12]=r[14] > >>> 57 DecrJumpZero 13 58 0 00 if > >>> (--r[13])==0 goto 58 > >>> 58 Copy 12 11 0 00 > >r[11]=r[12] > >>> 59 ResultRow 9 3 0 00 > >>> output=r[9..11] > >>> 60 Noop 0 0 0 00 End > >WHERE-core > >>> 61 Goto 0 26 0 00 > >>> 62 Noop 0 0 0 00 End > >>> WHERE-loop0: x > >>> 63 Halt 0 0 0 00 > >>> 64 Transaction 0 0 3 0 01 > >>> usesStmtJournal=0 > >>> 65 Goto 0 1 0 00 > >>> aaa|X1|0 > >>> aaa|X2|0 > >>> aaa|X3|0 > >>> aaa|X4|0 > >>> bbb|X1|0 > >>> bbb|X2|0 > >>> bbb|X3|0 > >>> bbb|X4|0 > >>> Run Time: real 0.134 user 0.000000 sys 0.000000 > >>> sqlite> > >>> > >>> --- > >>> The fact that there's a Highway to Hell but only a Stairway to > >Heaven > >>> says a lot about anticipated traffic volume. > >>> > >>> > >>> > >>> > >>> _______________________________________________ > >>> sqlite-users mailing list > >>> sqlite-users@mailinglists.sqlite.org > >>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite- > >users > >>> > >> > >_______________________________________________ > >sqlite-users mailing list > >sqlite-users@mailinglists.sqlite.org > >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > > > > _______________________________________________ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users