Re: [sqlite] Query Regression IN and Virtual Tables - followup

2019-04-05 Thread Dan Kennedy


On 5/4/62 16:44, Hick Gunter wrote:

I patched my SQlite 3.24 code to include the fix from the ticket

<   if( rc==SQLITE_OK && (mBest = (pNew->prereq & ~mPrereq))!=0 ){
---

// from SQLite bugfix
  if( rc==SQLITE_OK && ((mBest = (pNew->prereq & ~mPrereq))!=0 || bIn) ){

and changed the xBestIndex return value to be lower if the equality constraint 
from IN is not usable

The generated code as reported is invalid (instruction 16 with the init of R6 
is not shown)



So, after applying the patch to 3.24 you executed the EXPLAIN statement 
shown below in the shell tool and it mysteriously omitted instruction 16 
from the output?


Are there any other problems? Does the SQL statement return the correct 
results if you execute it without the EXPLAIN?


Dan.






explain select lsn from atx_txlog where period_no between 7300 and 7313 and 
event_Type in (140001,180001);

addr  opcode p1p2p3p4 p5  comment
  -        -  --  -
0 Init   0 16000  Start at 16
1 VOpen  0 0 0 vtab:B90B5000
2 Explain2 0 0 SCAN TABLE atx_txlog VIRTUAL TABLE INDEX 
1:
   00
3 Integer7300  3 000  r[3]=7300
4 Integer7313  4 000  r[4]=7313
5 Integer1 1 000  r[1]=1
6 Integer2 2 000  r[2]=2
7 VFilter0 151
   00  iplan=r[1] zplan='
'
8   Noop   0 0 000  begin IN expr
9   VColumn0 15500  r[5]=vcolumn(15); 
atx_txlog.event_type
10  Eq 5 126 (BINARY)   43  if r[6]==r[5] goto 
12
11  Ne 5 147 (BINARY)   53  if r[7]!=r[5] goto 
14; end IN expr
12  VColumn0 21800  r[8]=vcolumn(21); 
atx_txlog.lsn
13  ResultRow  8 1 000  output=r[8]
14VNext  0 8 000
15Halt   0 0 000
17Integer180001  7 000  r[7]=180001
18Goto   0 1 000

-Ursprüngliche Nachricht-
Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im 
Auftrag von Dan Kennedy
Gesendet: Freitag, 29. März 2019 14:30
An: sqlite-users@mailinglists.sqlite.org
Betreff: [EXTERNAL] Re: [sqlite] Query Regression IN and Virtual Tables


On 29/3/62 14:32, Hick Gunter wrote:

When upgrading from 3.7.14.1 to 3.24 I noticed the following problem

Given a virtual table like

CREATE VIRTUAL TABLE vt ( key1 INTEGER, key2 INTEGER, key3 INTEGER,
attr1 INTEGER,...);

whose xBestIndex function simulates (in unsupported syntax)

CREATE VIRTUAL INDEX vt_key ON vt (key1, key2, key3);

but also handles simple comparisons internally, the query

SELECT * FROM vt WHERE key1 =  AND key2 BETWEEN  AND
 AND attr1 IN ();


Thanks for reporting this.

What is supposed to happen in this scenario is that xBestIndex() is invoked 
once with all 4 constraints marked as usable. The IN(...) is represented as an 
SQLITE_INDEX_CONSTRAINT_EQ constraint. If the
xBestIndex() implementation elects to use the IN(...) operator, then
xBestIndex() is invoked a second time, this time with the IN(...) marked as not 
usable. SQLite evaluates both plans, considering the cost estimates provided by 
the virtual table implementation and its own estimate of the cardinality of the 
IN(...) operator. And chooses the most efficient plan overall.

There was a bug preventing the second call to xBestIndex() from being made in 
some circumstances - including for your query. Now fixed here:

https://sqlite.org/src/info/f5752517f590b37b

So if you upgrade to trunk, or else apply the patch linked above to 3.27.2, and 
the virtual table implementation provides relatively accurate cost estimates, 
SQLite should make an intelligent decision about which plan to use.

Dan.



SQLite 3.7.14.1 asks for (key1,'='), (key2,'>='), (key2,'<=') and
xBestIndex accepts all 3 constraints yielding query plan

- materialize IN  as anonymous ephemeral table
- search vt using index 1 (key1 = ? AND key2 >= ? AND key2 <= ?)
- retrieve column attr1
- search anonymous ephemeral table

i.e. perform a single partial table scan on vt and check attr1


SQLite3.24 asks for ...,(attr1,'=') and xBestIndex accepts all 4
constraints yielding

- materialize IN () as anonymous ephemeral table
- scan anonymous ephemeral table
- search vt using index 1 (key1 = ? AND key2 >= ? AND key2 <= ? AND
attr1 = ?)

i.e. perform a partial table scan of vt FOR EACH attr1, which is
slower by the cardinality of the IN list

Fortunately, CTEs come to

[sqlite] Query Regression IN and Virtual Tables - followup

2019-04-05 Thread Hick Gunter
I patched my SQlite 3.24 code to include the fix from the ticket

<   if( rc==SQLITE_OK && (mBest = (pNew->prereq & ~mPrereq))!=0 ){
---
> // from SQLite bugfix
>  if( rc==SQLITE_OK && ((mBest = (pNew->prereq & ~mPrereq))!=0 || bIn) ){

and changed the xBestIndex return value to be lower if the equality constraint 
from IN is not usable

The generated code as reported is invalid (instruction 16 with the init of R6 
is not shown)

explain select lsn from atx_txlog where period_no between 7300 and 7313 and 
event_Type in (140001,180001);

addr  opcode p1p2p3p4 p5  comment
  -        -  --  -
0 Init   0 16000  Start at 16
1 VOpen  0 0 0 vtab:B90B5000
2 Explain2 0 0 SCAN TABLE atx_txlog VIRTUAL TABLE INDEX 
1:
  00
3 Integer7300  3 000  r[3]=7300
4 Integer7313  4 000  r[4]=7313
5 Integer1 1 000  r[1]=1
6 Integer2 2 000  r[2]=2
7 VFilter0 151
  00  iplan=r[1] zplan='
'
8   Noop   0 0 000  begin IN expr
9   VColumn0 15500  r[5]=vcolumn(15); 
atx_txlog.event_type
10  Eq 5 126 (BINARY)   43  if r[6]==r[5] goto 
12
11  Ne 5 147 (BINARY)   53  if r[7]!=r[5] goto 
14; end IN expr
12  VColumn0 21800  r[8]=vcolumn(21); 
atx_txlog.lsn
13  ResultRow  8 1 000  output=r[8]
14VNext  0 8 000
15Halt   0 0 000
17Integer180001  7 000  r[7]=180001
18Goto   0 1 000

-Ursprüngliche Nachricht-
Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im 
Auftrag von Dan Kennedy
Gesendet: Freitag, 29. März 2019 14:30
An: sqlite-users@mailinglists.sqlite.org
Betreff: [EXTERNAL] Re: [sqlite] Query Regression IN and Virtual Tables


On 29/3/62 14:32, Hick Gunter wrote:
> When upgrading from 3.7.14.1 to 3.24 I noticed the following problem
>
> Given a virtual table like
>
> CREATE VIRTUAL TABLE vt ( key1 INTEGER, key2 INTEGER, key3 INTEGER,
> attr1 INTEGER,...);
>
> whose xBestIndex function simulates (in unsupported syntax)
>
> CREATE VIRTUAL INDEX vt_key ON vt (key1, key2, key3);
>
> but also handles simple comparisons internally, the query
>
> SELECT * FROM vt WHERE key1 =  AND key2 BETWEEN  AND
>  AND attr1 IN ();


Thanks for reporting this.

What is supposed to happen in this scenario is that xBestIndex() is invoked 
once with all 4 constraints marked as usable. The IN(...) is represented as an 
SQLITE_INDEX_CONSTRAINT_EQ constraint. If the
xBestIndex() implementation elects to use the IN(...) operator, then
xBestIndex() is invoked a second time, this time with the IN(...) marked as not 
usable. SQLite evaluates both plans, considering the cost estimates provided by 
the virtual table implementation and its own estimate of the cardinality of the 
IN(...) operator. And chooses the most efficient plan overall.

There was a bug preventing the second call to xBestIndex() from being made in 
some circumstances - including for your query. Now fixed here:

   https://sqlite.org/src/info/f5752517f590b37b

So if you upgrade to trunk, or else apply the patch linked above to 3.27.2, and 
the virtual table implementation provides relatively accurate cost estimates, 
SQLite should make an intelligent decision about which plan to use.

Dan.


>
> SQLite 3.7.14.1 asks for (key1,'='), (key2,'>='), (key2,'<=') and
> xBestIndex accepts all 3 constraints yielding query plan
>
> - materialize IN  as anonymous ephemeral table
> - search vt using index 1 (key1 = ? AND key2 >= ? AND key2 <= ?)
>- retrieve column attr1
>- search anonymous ephemeral table
>
> i.e. perform a single partial table scan on vt and check attr1
>
>
> SQLite3.24 asks for ...,(attr1,'=') and xBestIndex accepts all 4
> constraints yielding
>
> - materialize IN () as anonymous ephemeral table
> - scan anonymous ephemeral table
>- search vt using index 1 (key1 = ? AND key2 >= ? AND key2 <= ? AND
> attr1 = ?)
>
> i.e. perform a partial table scan of vt FOR EACH attr1, which is
> slower by the cardinality of the IN list
>
> Fortunately, CTEs come to the rescue:
>
> WITH attrs (attr1) AS (VALUES <(list)>) SELECT * FROM vt CROSS JOIN
> attrs a ON (a.attr1 = vt.attr1) WHERE key1 =  AND key2 BE

Re: [sqlite] Query Regression IN and Virtual Tables

2019-03-29 Thread Dan Kennedy


On 29/3/62 14:32, Hick Gunter wrote:

When upgrading from 3.7.14.1 to 3.24 I noticed the following problem

Given a virtual table like

CREATE VIRTUAL TABLE vt ( key1 INTEGER, key2 INTEGER, key3 INTEGER, attr1 
INTEGER,...);

whose xBestIndex function simulates (in unsupported syntax)

CREATE VIRTUAL INDEX vt_key ON vt (key1, key2, key3);

but also handles simple comparisons internally, the query

SELECT * FROM vt WHERE key1 =  AND key2 BETWEEN  AND  AND attr1 IN 
();



Thanks for reporting this.

What is supposed to happen in this scenario is that xBestIndex() is 
invoked once with all 4 constraints marked as usable. The IN(...) is 
represented as an SQLITE_INDEX_CONSTRAINT_EQ constraint. If the 
xBestIndex() implementation elects to use the IN(...) operator, then 
xBestIndex() is invoked a second time, this time with the IN(...) marked 
as not usable. SQLite evaluates both plans, considering the cost 
estimates provided by the virtual table implementation and its own 
estimate of the cardinality of the IN(...) operator. And chooses the 
most efficient plan overall.


There was a bug preventing the second call to xBestIndex() from being 
made in some circumstances - including for your query. Now fixed here:


  https://sqlite.org/src/info/f5752517f590b37b

So if you upgrade to trunk, or else apply the patch linked above to 
3.27.2, and the virtual table implementation provides relatively 
accurate cost estimates, SQLite should make an intelligent decision 
about which plan to use.


Dan.




SQLite 3.7.14.1 asks for (key1,'='), (key2,'>='), (key2,'<=') and xBestIndex 
accepts all 3 constraints yielding query plan

- materialize IN  as anonymous ephemeral table
- search vt using index 1 (key1 = ? AND key2 >= ? AND key2 <= ?)
   - retrieve column attr1
   - search anonymous ephemeral table

i.e. perform a single partial table scan on vt and check attr1


SQLite3.24 asks for ...,(attr1,'=') and xBestIndex accepts all 4 constraints 
yielding

- materialize IN () as anonymous ephemeral table
- scan anonymous ephemeral table
   - search vt using index 1 (key1 = ? AND key2 >= ? AND key2 <= ? AND attr1 = 
?)

i.e. perform a partial table scan of vt FOR EACH attr1, which is slower by the 
cardinality of the IN list

Fortunately, CTEs come to the rescue:

WITH attrs (attr1) AS (VALUES <(list)>) SELECT * FROM vt CROSS JOIN attrs a ON (a.attr1 = 
vt.attr1) WHERE key1 =  AND key2 BETWEEN  AND 

This prevents SQLite 3.24 from adding the last constraint, yielding

- materialize IN (<(list)>) as epehemeral table attrs
- search vt using index 1 (key1 = ? AND key2 >= ? AND key2 <= ?)
   - retrieve column attr1
   - scan ephemeral table attrs

The only issue is that the previously generated ephemeral table was implemented 
as a covering index (allowing the sequence IfNull, Affinity, NotFound) whereas 
the named ephemeral table is implemented as a table (requiring a full table 
scan of the ephemeral table, even though at most 1 row can match)

Optimisation opportunity:

32  Rewind 1 40000
33Column 1 0 10   00  
r[10]=events.event_type
34VColumn0 1511   00  
r[11]=vcolumn(15); atx_txlog.event_type
35Ne 113810(BINARY)   53  if r[10]!=r[11] 
goto 38
36VColumn0 6 12   00  r[12]=vcolumn(6); 
atx_txlog.sync_offset
37ResultRow  121 000  output=r[12]
38  Next   1 33001

Could IMHO be rewritten as

32  VColumn0 1511   00  r[11]=vcolumn(15); 
atx_txlog.event_type
33  Rewind 1 40000
34Column 1 0 10   00  
r[10]=events.event_type
35Ne 113810(BINARY)   53  if r[10]!=r[11] 
goto 38
36VColumn0 6 12   00  r[12]=vcolumn(6); 
atx_txlog.sync_offset
37ResultRow  121 000  output=r[12]
38  Next   1 33001


___
  Gunter Hick | Software Engineer | Scientific Games International GmbH | 
Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 
1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Query Regression IN and Virtual Tables

2019-03-29 Thread Hick Gunter
When upgrading from 3.7.14.1 to 3.24 I noticed the following problem

Given a virtual table like

CREATE VIRTUAL TABLE vt ( key1 INTEGER, key2 INTEGER, key3 INTEGER, attr1 
INTEGER,...);

whose xBestIndex function simulates (in unsupported syntax)

CREATE VIRTUAL INDEX vt_key ON vt (key1, key2, key3);

but also handles simple comparisons internally, the query

SELECT * FROM vt WHERE key1 =  AND key2 BETWEEN  AND  AND 
attr1 IN ();

SQLite 3.7.14.1 asks for (key1,'='), (key2,'>='), (key2,'<=') and xBestIndex 
accepts all 3 constraints yielding query plan

- materialize IN  as anonymous ephemeral table
- search vt using index 1 (key1 = ? AND key2 >= ? AND key2 <= ?)
  - retrieve column attr1
  - search anonymous ephemeral table

i.e. perform a single partial table scan on vt and check attr1


SQLite3.24 asks for ...,(attr1,'=') and xBestIndex accepts all 4 constraints 
yielding

- materialize IN () as anonymous ephemeral table
- scan anonymous ephemeral table
  - search vt using index 1 (key1 = ? AND key2 >= ? AND key2 <= ? AND attr1 = ?)

i.e. perform a partial table scan of vt FOR EACH attr1, which is slower by the 
cardinality of the IN list

Fortunately, CTEs come to the rescue:

WITH attrs (attr1) AS (VALUES <(list)>) SELECT * FROM vt CROSS JOIN attrs a ON 
(a.attr1 = vt.attr1) WHERE key1 =  AND key2 BETWEEN  AND 

This prevents SQLite 3.24 from adding the last constraint, yielding

- materialize IN (<(list)>) as epehemeral table attrs
- search vt using index 1 (key1 = ? AND key2 >= ? AND key2 <= ?)
  - retrieve column attr1
  - scan ephemeral table attrs

The only issue is that the previously generated ephemeral table was implemented 
as a covering index (allowing the sequence IfNull, Affinity, NotFound) whereas 
the named ephemeral table is implemented as a table (requiring a full table 
scan of the ephemeral table, even though at most 1 row can match)

Optimisation opportunity:

32  Rewind 1 40000
33Column 1 0 10   00  
r[10]=events.event_type
34VColumn0 1511   00  
r[11]=vcolumn(15); atx_txlog.event_type
35Ne 113810(BINARY)   53  if r[10]!=r[11] 
goto 38
36VColumn0 6 12   00  r[12]=vcolumn(6); 
atx_txlog.sync_offset
37ResultRow  121 000  output=r[12]
38  Next   1 33001

Could IMHO be rewritten as

32  VColumn0 1511   00  r[11]=vcolumn(15); 
atx_txlog.event_type
33  Rewind 1 40000
34Column 1 0 10   00  
r[10]=events.event_type
35Ne 113810(BINARY)   53  if r[10]!=r[11] 
goto 38
36VColumn0 6 12   00  r[12]=vcolumn(6); 
atx_txlog.sync_offset
37ResultRow  121 000  output=r[12]
38  Next   1 33001


___
 Gunter Hick | Software Engineer | Scientific Games International GmbH | 
Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 
1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users