Re: [sqlite] Query Regression IN and Virtual Tables - followup
On 5/4/62 16:44, Hick Gunter wrote: I patched my SQlite 3.24 code to include the fix from the ticket < if( rc==SQLITE_OK && (mBest = (pNew->prereq & ~mPrereq))!=0 ){ --- // from SQLite bugfix if( rc==SQLITE_OK && ((mBest = (pNew->prereq & ~mPrereq))!=0 || bIn) ){ and changed the xBestIndex return value to be lower if the equality constraint from IN is not usable The generated code as reported is invalid (instruction 16 with the init of R6 is not shown) So, after applying the patch to 3.24 you executed the EXPLAIN statement shown below in the shell tool and it mysteriously omitted instruction 16 from the output? Are there any other problems? Does the SQL statement return the correct results if you execute it without the EXPLAIN? Dan. explain select lsn from atx_txlog where period_no between 7300 and 7313 and event_Type in (140001,180001); addr opcode p1p2p3p4 p5 comment - - -- - 0 Init 0 16000 Start at 16 1 VOpen 0 0 0 vtab:B90B5000 2 Explain2 0 0 SCAN TABLE atx_txlog VIRTUAL TABLE INDEX 1: 00 3 Integer7300 3 000 r[3]=7300 4 Integer7313 4 000 r[4]=7313 5 Integer1 1 000 r[1]=1 6 Integer2 2 000 r[2]=2 7 VFilter0 151 00 iplan=r[1] zplan=' ' 8 Noop 0 0 000 begin IN expr 9 VColumn0 15500 r[5]=vcolumn(15); atx_txlog.event_type 10 Eq 5 126 (BINARY) 43 if r[6]==r[5] goto 12 11 Ne 5 147 (BINARY) 53 if r[7]!=r[5] goto 14; end IN expr 12 VColumn0 21800 r[8]=vcolumn(21); atx_txlog.lsn 13 ResultRow 8 1 000 output=r[8] 14VNext 0 8 000 15Halt 0 0 000 17Integer180001 7 000 r[7]=180001 18Goto 0 1 000 -Ursprüngliche Nachricht- Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im Auftrag von Dan Kennedy Gesendet: Freitag, 29. März 2019 14:30 An: sqlite-users@mailinglists.sqlite.org Betreff: [EXTERNAL] Re: [sqlite] Query Regression IN and Virtual Tables On 29/3/62 14:32, Hick Gunter wrote: When upgrading from 3.7.14.1 to 3.24 I noticed the following problem Given a virtual table like CREATE VIRTUAL TABLE vt ( key1 INTEGER, key2 INTEGER, key3 INTEGER, attr1 INTEGER,...); whose xBestIndex function simulates (in unsupported syntax) CREATE VIRTUAL INDEX vt_key ON vt (key1, key2, key3); but also handles simple comparisons internally, the query SELECT * FROM vt WHERE key1 = AND key2 BETWEEN AND AND attr1 IN (); Thanks for reporting this. What is supposed to happen in this scenario is that xBestIndex() is invoked once with all 4 constraints marked as usable. The IN(...) is represented as an SQLITE_INDEX_CONSTRAINT_EQ constraint. If the xBestIndex() implementation elects to use the IN(...) operator, then xBestIndex() is invoked a second time, this time with the IN(...) marked as not usable. SQLite evaluates both plans, considering the cost estimates provided by the virtual table implementation and its own estimate of the cardinality of the IN(...) operator. And chooses the most efficient plan overall. There was a bug preventing the second call to xBestIndex() from being made in some circumstances - including for your query. Now fixed here: https://sqlite.org/src/info/f5752517f590b37b So if you upgrade to trunk, or else apply the patch linked above to 3.27.2, and the virtual table implementation provides relatively accurate cost estimates, SQLite should make an intelligent decision about which plan to use. Dan. SQLite 3.7.14.1 asks for (key1,'='), (key2,'>='), (key2,'<=') and xBestIndex accepts all 3 constraints yielding query plan - materialize IN as anonymous ephemeral table - search vt using index 1 (key1 = ? AND key2 >= ? AND key2 <= ?) - retrieve column attr1 - search anonymous ephemeral table i.e. perform a single partial table scan on vt and check attr1 SQLite3.24 asks for ...,(attr1,'=') and xBestIndex accepts all 4 constraints yielding - materialize IN () as anonymous ephemeral table - scan anonymous ephemeral table - search vt using index 1 (key1 = ? AND key2 >= ? AND key2 <= ? AND attr1 = ?) i.e. perform a partial table scan of vt FOR EACH attr1, which is slower by the cardinality of the IN list Fortunately, CTEs come to
[sqlite] Query Regression IN and Virtual Tables - followup
I patched my SQlite 3.24 code to include the fix from the ticket < if( rc==SQLITE_OK && (mBest = (pNew->prereq & ~mPrereq))!=0 ){ --- > // from SQLite bugfix > if( rc==SQLITE_OK && ((mBest = (pNew->prereq & ~mPrereq))!=0 || bIn) ){ and changed the xBestIndex return value to be lower if the equality constraint from IN is not usable The generated code as reported is invalid (instruction 16 with the init of R6 is not shown) explain select lsn from atx_txlog where period_no between 7300 and 7313 and event_Type in (140001,180001); addr opcode p1p2p3p4 p5 comment - - -- - 0 Init 0 16000 Start at 16 1 VOpen 0 0 0 vtab:B90B5000 2 Explain2 0 0 SCAN TABLE atx_txlog VIRTUAL TABLE INDEX 1: 00 3 Integer7300 3 000 r[3]=7300 4 Integer7313 4 000 r[4]=7313 5 Integer1 1 000 r[1]=1 6 Integer2 2 000 r[2]=2 7 VFilter0 151 00 iplan=r[1] zplan=' ' 8 Noop 0 0 000 begin IN expr 9 VColumn0 15500 r[5]=vcolumn(15); atx_txlog.event_type 10 Eq 5 126 (BINARY) 43 if r[6]==r[5] goto 12 11 Ne 5 147 (BINARY) 53 if r[7]!=r[5] goto 14; end IN expr 12 VColumn0 21800 r[8]=vcolumn(21); atx_txlog.lsn 13 ResultRow 8 1 000 output=r[8] 14VNext 0 8 000 15Halt 0 0 000 17Integer180001 7 000 r[7]=180001 18Goto 0 1 000 -Ursprüngliche Nachricht- Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im Auftrag von Dan Kennedy Gesendet: Freitag, 29. März 2019 14:30 An: sqlite-users@mailinglists.sqlite.org Betreff: [EXTERNAL] Re: [sqlite] Query Regression IN and Virtual Tables On 29/3/62 14:32, Hick Gunter wrote: > When upgrading from 3.7.14.1 to 3.24 I noticed the following problem > > Given a virtual table like > > CREATE VIRTUAL TABLE vt ( key1 INTEGER, key2 INTEGER, key3 INTEGER, > attr1 INTEGER,...); > > whose xBestIndex function simulates (in unsupported syntax) > > CREATE VIRTUAL INDEX vt_key ON vt (key1, key2, key3); > > but also handles simple comparisons internally, the query > > SELECT * FROM vt WHERE key1 = AND key2 BETWEEN AND > AND attr1 IN (); Thanks for reporting this. What is supposed to happen in this scenario is that xBestIndex() is invoked once with all 4 constraints marked as usable. The IN(...) is represented as an SQLITE_INDEX_CONSTRAINT_EQ constraint. If the xBestIndex() implementation elects to use the IN(...) operator, then xBestIndex() is invoked a second time, this time with the IN(...) marked as not usable. SQLite evaluates both plans, considering the cost estimates provided by the virtual table implementation and its own estimate of the cardinality of the IN(...) operator. And chooses the most efficient plan overall. There was a bug preventing the second call to xBestIndex() from being made in some circumstances - including for your query. Now fixed here: https://sqlite.org/src/info/f5752517f590b37b So if you upgrade to trunk, or else apply the patch linked above to 3.27.2, and the virtual table implementation provides relatively accurate cost estimates, SQLite should make an intelligent decision about which plan to use. Dan. > > SQLite 3.7.14.1 asks for (key1,'='), (key2,'>='), (key2,'<=') and > xBestIndex accepts all 3 constraints yielding query plan > > - materialize IN as anonymous ephemeral table > - search vt using index 1 (key1 = ? AND key2 >= ? AND key2 <= ?) >- retrieve column attr1 >- search anonymous ephemeral table > > i.e. perform a single partial table scan on vt and check attr1 > > > SQLite3.24 asks for ...,(attr1,'=') and xBestIndex accepts all 4 > constraints yielding > > - materialize IN () as anonymous ephemeral table > - scan anonymous ephemeral table >- search vt using index 1 (key1 = ? AND key2 >= ? AND key2 <= ? AND > attr1 = ?) > > i.e. perform a partial table scan of vt FOR EACH attr1, which is > slower by the cardinality of the IN list > > Fortunately, CTEs come to the rescue: > > WITH attrs (attr1) AS (VALUES <(list)>) SELECT * FROM vt CROSS JOIN > attrs a ON (a.attr1 = vt.attr1) WHERE key1 = AND key2 BE
Re: [sqlite] Query Regression IN and Virtual Tables
On 29/3/62 14:32, Hick Gunter wrote: When upgrading from 3.7.14.1 to 3.24 I noticed the following problem Given a virtual table like CREATE VIRTUAL TABLE vt ( key1 INTEGER, key2 INTEGER, key3 INTEGER, attr1 INTEGER,...); whose xBestIndex function simulates (in unsupported syntax) CREATE VIRTUAL INDEX vt_key ON vt (key1, key2, key3); but also handles simple comparisons internally, the query SELECT * FROM vt WHERE key1 = AND key2 BETWEEN AND AND attr1 IN (); Thanks for reporting this. What is supposed to happen in this scenario is that xBestIndex() is invoked once with all 4 constraints marked as usable. The IN(...) is represented as an SQLITE_INDEX_CONSTRAINT_EQ constraint. If the xBestIndex() implementation elects to use the IN(...) operator, then xBestIndex() is invoked a second time, this time with the IN(...) marked as not usable. SQLite evaluates both plans, considering the cost estimates provided by the virtual table implementation and its own estimate of the cardinality of the IN(...) operator. And chooses the most efficient plan overall. There was a bug preventing the second call to xBestIndex() from being made in some circumstances - including for your query. Now fixed here: https://sqlite.org/src/info/f5752517f590b37b So if you upgrade to trunk, or else apply the patch linked above to 3.27.2, and the virtual table implementation provides relatively accurate cost estimates, SQLite should make an intelligent decision about which plan to use. Dan. SQLite 3.7.14.1 asks for (key1,'='), (key2,'>='), (key2,'<=') and xBestIndex accepts all 3 constraints yielding query plan - materialize IN as anonymous ephemeral table - search vt using index 1 (key1 = ? AND key2 >= ? AND key2 <= ?) - retrieve column attr1 - search anonymous ephemeral table i.e. perform a single partial table scan on vt and check attr1 SQLite3.24 asks for ...,(attr1,'=') and xBestIndex accepts all 4 constraints yielding - materialize IN () as anonymous ephemeral table - scan anonymous ephemeral table - search vt using index 1 (key1 = ? AND key2 >= ? AND key2 <= ? AND attr1 = ?) i.e. perform a partial table scan of vt FOR EACH attr1, which is slower by the cardinality of the IN list Fortunately, CTEs come to the rescue: WITH attrs (attr1) AS (VALUES <(list)>) SELECT * FROM vt CROSS JOIN attrs a ON (a.attr1 = vt.attr1) WHERE key1 = AND key2 BETWEEN AND This prevents SQLite 3.24 from adding the last constraint, yielding - materialize IN (<(list)>) as epehemeral table attrs - search vt using index 1 (key1 = ? AND key2 >= ? AND key2 <= ?) - retrieve column attr1 - scan ephemeral table attrs The only issue is that the previously generated ephemeral table was implemented as a covering index (allowing the sequence IfNull, Affinity, NotFound) whereas the named ephemeral table is implemented as a table (requiring a full table scan of the ephemeral table, even though at most 1 row can match) Optimisation opportunity: 32 Rewind 1 40000 33Column 1 0 10 00 r[10]=events.event_type 34VColumn0 1511 00 r[11]=vcolumn(15); atx_txlog.event_type 35Ne 113810(BINARY) 53 if r[10]!=r[11] goto 38 36VColumn0 6 12 00 r[12]=vcolumn(6); atx_txlog.sync_offset 37ResultRow 121 000 output=r[12] 38 Next 1 33001 Could IMHO be rewritten as 32 VColumn0 1511 00 r[11]=vcolumn(15); atx_txlog.event_type 33 Rewind 1 40000 34Column 1 0 10 00 r[10]=events.event_type 35Ne 113810(BINARY) 53 if r[10]!=r[11] goto 38 36VColumn0 6 12 00 r[12]=vcolumn(6); atx_txlog.sync_offset 37ResultRow 121 000 output=r[12] 38 Next 1 33001 ___ Gunter Hick | Software Engineer | Scientific Games International GmbH | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0 May be privileged. May be confidential. Please delete if not the addressee. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Query Regression IN and Virtual Tables
When upgrading from 3.7.14.1 to 3.24 I noticed the following problem Given a virtual table like CREATE VIRTUAL TABLE vt ( key1 INTEGER, key2 INTEGER, key3 INTEGER, attr1 INTEGER,...); whose xBestIndex function simulates (in unsupported syntax) CREATE VIRTUAL INDEX vt_key ON vt (key1, key2, key3); but also handles simple comparisons internally, the query SELECT * FROM vt WHERE key1 = AND key2 BETWEEN AND AND attr1 IN (); SQLite 3.7.14.1 asks for (key1,'='), (key2,'>='), (key2,'<=') and xBestIndex accepts all 3 constraints yielding query plan - materialize IN as anonymous ephemeral table - search vt using index 1 (key1 = ? AND key2 >= ? AND key2 <= ?) - retrieve column attr1 - search anonymous ephemeral table i.e. perform a single partial table scan on vt and check attr1 SQLite3.24 asks for ...,(attr1,'=') and xBestIndex accepts all 4 constraints yielding - materialize IN () as anonymous ephemeral table - scan anonymous ephemeral table - search vt using index 1 (key1 = ? AND key2 >= ? AND key2 <= ? AND attr1 = ?) i.e. perform a partial table scan of vt FOR EACH attr1, which is slower by the cardinality of the IN list Fortunately, CTEs come to the rescue: WITH attrs (attr1) AS (VALUES <(list)>) SELECT * FROM vt CROSS JOIN attrs a ON (a.attr1 = vt.attr1) WHERE key1 = AND key2 BETWEEN AND This prevents SQLite 3.24 from adding the last constraint, yielding - materialize IN (<(list)>) as epehemeral table attrs - search vt using index 1 (key1 = ? AND key2 >= ? AND key2 <= ?) - retrieve column attr1 - scan ephemeral table attrs The only issue is that the previously generated ephemeral table was implemented as a covering index (allowing the sequence IfNull, Affinity, NotFound) whereas the named ephemeral table is implemented as a table (requiring a full table scan of the ephemeral table, even though at most 1 row can match) Optimisation opportunity: 32 Rewind 1 40000 33Column 1 0 10 00 r[10]=events.event_type 34VColumn0 1511 00 r[11]=vcolumn(15); atx_txlog.event_type 35Ne 113810(BINARY) 53 if r[10]!=r[11] goto 38 36VColumn0 6 12 00 r[12]=vcolumn(6); atx_txlog.sync_offset 37ResultRow 121 000 output=r[12] 38 Next 1 33001 Could IMHO be rewritten as 32 VColumn0 1511 00 r[11]=vcolumn(15); atx_txlog.event_type 33 Rewind 1 40000 34Column 1 0 10 00 r[10]=events.event_type 35Ne 113810(BINARY) 53 if r[10]!=r[11] goto 38 36VColumn0 6 12 00 r[12]=vcolumn(6); atx_txlog.sync_offset 37ResultRow 121 000 output=r[12] 38 Next 1 33001 ___ Gunter Hick | Software Engineer | Scientific Games International GmbH | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0 May be privileged. May be confidential. Please delete if not the addressee. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users