Re: [sqlite] Query Planner for Virtual Tables: link table evaluation & transitive property of constraints not used

2015-02-04 Thread Josef Kucera

Hello,
I went forward and implemented the suggestion I made. It works quite nice, 
so I decided to share a patch 
(https://www.dropbox.com/sh/buim6b4hevg7db3/AAD75j1NkXSakx-1_iGu1QW4a?dl=0).


Statement:
I (the author of this code patch) dedicate any and all copyright interest in 
this code patch to the public domain. I make this dedication for the benefit 
of the public at large and to the detriment of my heirs and successors. I 
intend this dedication to be an overt act of relinquishment in perpetuity of 
all present and future rights to this code under copyright law.


Regards,
Joe

- Original Message - 
From: "Josef Kucera" 

To: "General Discussion of SQLite Database" 
Sent: Monday, December 15, 2014 5:37 PM
Subject: Re: [sqlite] Query Planner for Virtual Tables: link table 
evaluation & transitive property of constraints not used



- Original Message - 
From: "Nico Williams" 

To: "General Discussion of SQLite Database" 
Sent: Monday, December 15, 2014 5:16 PM
Subject: Re: [sqlite] Query Planner for Virtual Tables: link table 
evaluation & transitive property of constraints not used




On Mon, Dec 15, 2014 at 06:23:31PM +0700, Dan Kennedy wrote:

It's tricky. As you say, xBestIndex() will currently be invoked
twice - once with no constraints usable and once with both "b.id=?"
and "b.linkid=?" usable. I guess the reason it is not invoked in the
other ways you suggest is that that strategy might conceivably
require a huge number of xBestIndex() calls if there were more than
a few other tables in the join.


Perhaps there should be a method by which distinct indices could be
expressed by the virtual table to the SQLite3 query optimizer.

One possible compromise would be to call xBestIndex() with each
constraint, then once more with all of them.

Nico


Unfortunately this will not help much, as the implementation of the 
virtual table module will be complicated if multi-column join is involved.


As I follow the discussion, the best compromise can be:
- currently xBestIndex() is called four times at most - phase 0 to 3 (only 
const, without in, with "variables"/joins);
- phases 2 and 3 can call xBestInfo() once for all constraints (current 
behavior) and consecutively for constraints separated by tables used on 
the right (at most N-1 added calls, where N is the number of tables in the 
join);


Joe


--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users




___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query Planner for Virtual Tables: link table evaluation & transitive property of constraints not used

2014-12-15 Thread Josef Kucera
- Original Message - 
From: "Nico Williams" 

To: "General Discussion of SQLite Database" 
Sent: Monday, December 15, 2014 5:16 PM
Subject: Re: [sqlite] Query Planner for Virtual Tables: link table 
evaluation & transitive property of constraints not used




On Mon, Dec 15, 2014 at 06:23:31PM +0700, Dan Kennedy wrote:

It's tricky. As you say, xBestIndex() will currently be invoked
twice - once with no constraints usable and once with both "b.id=?"
and "b.linkid=?" usable. I guess the reason it is not invoked in the
other ways you suggest is that that strategy might conceivably
require a huge number of xBestIndex() calls if there were more than
a few other tables in the join.


Perhaps there should be a method by which distinct indices could be
expressed by the virtual table to the SQLite3 query optimizer.

One possible compromise would be to call xBestIndex() with each
constraint, then once more with all of them.

Nico


Unfortunately this will not help much, as the implementation of the virtual 
table module will be complicated if multi-column join is involved.


As I follow the discussion, the best compromise can be:
- currently xBestIndex() is called four times at most - phase 0 to 3 (only 
const, without in, with "variables"/joins);
- phases 2 and 3 can call xBestInfo() once for all constraints (current 
behavior) and consecutively for constraints separated by tables used on the 
right (at most N-1 added calls, where N is the number of tables in the 
join);


Joe


--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users 


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query Planner for Virtual Tables: link table evaluation & transitive property of constraints not used

2014-12-15 Thread Nico Williams
On Mon, Dec 15, 2014 at 06:23:31PM +0700, Dan Kennedy wrote:
> It's tricky. As you say, xBestIndex() will currently be invoked
> twice - once with no constraints usable and once with both "b.id=?"
> and "b.linkid=?" usable. I guess the reason it is not invoked in the
> other ways you suggest is that that strategy might conceivably
> require a huge number of xBestIndex() calls if there were more than
> a few other tables in the join.

Perhaps there should be a method by which distinct indices could be
expressed by the virtual table to the SQLite3 query optimizer.

One possible compromise would be to call xBestIndex() with each
constraint, then once more with all of them.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query Planner for Virtual Tables: link table evaluation & transitive property of constraints not used

2014-12-15 Thread Josef Kučera
- Original Message - 
From: "Hick Gunter" 

To: "'General Discussion of SQLite Database'" 
Sent: Monday, December 15, 2014 2:40 PM
Subject: Re: [sqlite] Query Planner for Virtual Tables: link table 
evaluation & transitive property of constraints not used



I would concur in that SQLite is asking "which subset of the given 
constraints yields the most efficient access".


The possible query plans are

1) A() -> B(ID) -> C(LINKID)

2) C() -> B(LINKID) -> A(ID)

3) B() -> A(ID) + C(LINKID) or B() -> C(LINKID) + A(ID)

4) A() -> C() -> B(ID,LINKID) or C() -> A() -> B(ID,LINKID)

Assuming unique keys in A and C and cardinalities of a, b and c we have 
estimated costs (in # of records retrieved):


1) a + a*b/a + a*b/a*1 = a + 2b

2) c + c*b/c + c*b/c*1 = c + 2b

3) b + b*1 + b*1 = 3b

4) a + a*c + a*c*b/a/c = a + a*c + b (resp. c + a*c + b)

So which is the smallest cost?

We know that b <= a*c, which makes query plan 4 at least as expensive as 
plans 1 or 2 respectively.


Choosing between plans 1 and 2 means starting with the smaller of the two 
tables (assume a < c).


So how do plans 1 and 3 compare? Plan 3 is better only for very sparse 
link tables where b < a < c is true.




The reasoning is absolutely correct.

I forgot to mention, that most often the query has a where constraint 
(generated at runtime) limiting A or C to a small subset. In that scenario 
the smallest cost would be 1) or 2) respectively.


There is one more problem with plan 4): Assuming A and/or B and/or C has 
many records (100.000s) the query B(ID,LINKID) is executed for each and 
every record in A*C - if there is an constant cost for each B(ID,LINKID) 
execution then 4) is the most expensive option.


Joe 


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query Planner for Virtual Tables: link table evaluation & transitive property of constraints not used

2014-12-15 Thread Hick Gunter
I would concur in that SQLite is asking "which subset of the given constraints 
yields the most efficient access".

The possible query plans are

1) A() -> B(ID) -> C(LINKID)

2) C() -> B(LINKID) -> A(ID)

3) B() -> A(ID) + C(LINKID) or B() -> C(LINKID) + A(ID)

4) A() -> C() -> B(ID,LINKID) or C() -> A() -> B(ID,LINKID)

Assuming unique keys in A and C and cardinalities of a, b and c we have 
estimated costs (in # of records retrieved):

1) a + a*b/a + a*b/a*1 = a + 2b

2) c + c*b/c + c*b/c*1 = c + 2b

3) b + b*1 + b*1 = 3b

4) a + a*c + a*c*b/a/c = a + a*c + b (resp. c + a*c + b)

So which is the smallest cost?

We know that b <= a*c, which makes query plan 4 at least as expensive as plans 
1 or 2 respectively.

Choosing between plans 1 and 2 means starting with the smaller of the two 
tables (assume a < c).

So how do plans 1 and 3 compare? Plan 3 is better only for very sparse link 
tables where b < a < c is true.



-Ursprüngliche Nachricht-
Von: Dan Kennedy [mailto:danielk1...@gmail.com]
Gesendet: Montag, 15. Dezember 2014 12:24
An: General Discussion of SQLite Database
Betreff: Re: [sqlite] Query Planner for Virtual Tables: link table evaluation & 
transitive property of constraints not used

On 12/12/2014 09:22 PM, Josef Kučera wrote:
> Hello,
> I am trying to use SQLite's marvellous Virtual Table mechanism as a
> SQL layer for querying an in memory storage. This works good, but I
> have a problem with more complex queries. When querying a real SQLite
> database it correctly moves the constant conditions across joined
> tables to optimize the execution plan (I think this was implemented in the 
> 3.7.17 release).
> Unfortunately for virtual tables this does not seem to be supported. I
> can overcome this limitation by manually tuning the SQL, but it will
> help if the query planner can do this automatically.
>
> The major problem I have is with link table evaluation. Imagine a SQL
> like "select * from A join B on A.ID=B.ID join C on C.ID=B.LINKID".
> The current implementation evaluates cost of B only as B (ID, LINKID)
> causing the execution to perform a full scan on either A or C. This
> seems to be caused by the implementation of whereLoopAddVirtual()
> function. I think it should evaluate cost for terms separated by tables in 
> the right term as well, e.g.
> for the mentioned SQL, table B, it should try B(), B(ID), B(LINKID),
> B(ID,
> LINKID) instead of only B() and B(ID, LINKID).
>
> What should I do?

You want this (or the same thing with the roles of "A" and "C" reversed):

   * a full-scan on A,
   * a lookup on B by (b.id=?)
   * a lookup on C by (c.id=?)

correct?

It's tricky. As you say, xBestIndex() will currently be invoked twice - once 
with no constraints usable and once with both "b.id=?" and "b.linkid=?" usable. 
I guess the reason it is not invoked in the other ways you suggest is that that 
strategy might conceivably require a huge number of xBestIndex() calls if there 
were more than a few other tables in the join.

You could change the query so that only one of the constraints is visible to 
the virtual table implementation. Say:

   select * from A join B on A.ID=B.ID join C on C.ID=+B.LINKID

Or rework the virtual table code so that it knows only to use one of "b.id=?" 
or "b.linkid=?" at a time. If the xBestIndex only uses one of the constraints, 
the planner should do the right thing.

Dan.


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


___
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: h...@scigames.at

This communication (including any attachments) is intended for the use of the 
intended recipient(s) only and may contain information that is confidential, 
privileged or legally protected. Any unauthorized use or dissemination of this 
communication is strictly prohibited. If you have received this communication 
in error, please immediately notify the sender by return e-mail message and 
delete all copies of the original communication. Thank you for your cooperation.


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query Planner for Virtual Tables: link table evaluation & transitive property of constraints not used

2014-12-15 Thread Josef Kučera
On 12/15/2014 13:23 PM, Dan Kennedy wrote:
> On 12/12/2014 09:22 PM, Josef Kučera wrote:
> > Hello,
> > I am trying to use SQLite's marvellous Virtual Table mechanism as a SQL
> > layer for querying an in memory storage. This works good, but I have a
> > problem with more complex queries. When querying a real SQLite database it
> > correctly moves the constant conditions across joined tables to optimize
> > the execution plan (I think this was implemented in the 3.7.17 release).
> > Unfortunately for virtual tables this does not seem to be supported. I can
> > overcome this limitation by manually tuning the SQL, but it will help if
> > the query planner can do this automatically.
> >
> > The major problem I have is with link table evaluation. Imagine a SQL like
> > "select * from A join B on A.ID=B.ID join C on C.ID=B.LINKID". The current
> > implementation evaluates cost of B only as B (ID, LINKID) causing the
> > execution to perform a full scan on either A or C. This seems to be caused
> > by the implementation of whereLoopAddVirtual() function. I think it should
> > evaluate cost for terms separated by tables in the right term as well, e.g.
> > for the mentioned SQL, table B, it should try B(), B(ID), B(LINKID), B(ID,
> > LINKID) instead of only B() and B(ID, LINKID).
> >
> > What should I do?
>
> You want this (or the same thing with the roles of "A" and "C" reversed):
>
>* a full-scan on A,
>* a lookup on B by (b.id=?)
>* a lookup on C by (c.id=?)
>
> correct?
>
Yes, this is exactly what I want. It makes even more sense when there
is a WHERE
condition on table A.

> It's tricky. As you say, xBestIndex() will currently be invoked twice -
> once with no constraints usable and once with both "b.id=?" and
> "b.linkid=?" usable. I guess the reason it is not invoked in the other
> ways you suggest is that that strategy might conceivably require a huge
> number of xBestIndex() calls if there were more than a few other tables
> in the join.
>
You are absolutely correct. I do not think calling xBestIndex() for every
possible table combination is possible (too much xBestIndex calls and too much
WhereLoop variants to evaluate). I thought about adding a single call for each
table in the join, that could keep the amount of xBestIndex() calls reasonable,
and really help for this type of joined queries.

> You could change the query so that only one of the constraints is
> visible to the virtual table implementation. Say:
>
>select * from A join B on A.ID=B.ID join C on C.ID=+B.LINKID
>
This is great, it gives the possibility to choose the plan by the command.
Many thanks for the tip.

> Or rework the virtual table code so that it knows only to use one of
> "b.id=?" or "b.linkid=?" at a time. If the xBestIndex only uses one of
> the constraints, the planner should do the right thing.
>
Unfortunately this would be hard to implement, currently I use a generic
virtual table mechanism to make cross-database queries. I hoped the decision
which condition to use could be made by the planner, without hard-coding it in
the virtual table implementation.

> Dan.

Joe
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query Planner for Virtual Tables: link table evaluation & transitive property of constraints not used

2014-12-15 Thread Dan Kennedy

On 12/12/2014 09:22 PM, Josef Kučera wrote:

Hello,
I am trying to use SQLite's marvellous Virtual Table mechanism as a SQL
layer for querying an in memory storage. This works good, but I have a
problem with more complex queries. When querying a real SQLite database it
correctly moves the constant conditions across joined tables to optimize
the execution plan (I think this was implemented in the 3.7.17 release).
Unfortunately for virtual tables this does not seem to be supported. I can
overcome this limitation by manually tuning the SQL, but it will help if
the query planner can do this automatically.

The major problem I have is with link table evaluation. Imagine a SQL like
"select * from A join B on A.ID=B.ID join C on C.ID=B.LINKID". The current
implementation evaluates cost of B only as B (ID, LINKID) causing the
execution to perform a full scan on either A or C. This seems to be caused
by the implementation of whereLoopAddVirtual() function. I think it should
evaluate cost for terms separated by tables in the right term as well, e.g.
for the mentioned SQL, table B, it should try B(), B(ID), B(LINKID), B(ID,
LINKID) instead of only B() and B(ID, LINKID).

What should I do?


You want this (or the same thing with the roles of "A" and "C" reversed):

  * a full-scan on A,
  * a lookup on B by (b.id=?)
  * a lookup on C by (c.id=?)

correct?

It's tricky. As you say, xBestIndex() will currently be invoked twice - 
once with no constraints usable and once with both "b.id=?" and 
"b.linkid=?" usable. I guess the reason it is not invoked in the other 
ways you suggest is that that strategy might conceivably require a huge 
number of xBestIndex() calls if there were more than a few other tables 
in the join.


You could change the query so that only one of the constraints is 
visible to the virtual table implementation. Say:


  select * from A join B on A.ID=B.ID join C on C.ID=+B.LINKID

Or rework the virtual table code so that it knows only to use one of 
"b.id=?" or "b.linkid=?" at a time. If the xBestIndex only uses one of 
the constraints, the planner should do the right thing.


Dan.


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Query Planner for Virtual Tables: link table evaluation & transitive property of constraints not used

2014-12-12 Thread Josef Kučera
Hello,
I am trying to use SQLite's marvellous Virtual Table mechanism as a SQL
layer for querying an in memory storage. This works good, but I have a
problem with more complex queries. When querying a real SQLite database it
correctly moves the constant conditions across joined tables to optimize
the execution plan (I think this was implemented in the 3.7.17 release).
Unfortunately for virtual tables this does not seem to be supported. I can
overcome this limitation by manually tuning the SQL, but it will help if
the query planner can do this automatically.

The major problem I have is with link table evaluation. Imagine a SQL like
"select * from A join B on A.ID=B.ID join C on C.ID=B.LINKID". The current
implementation evaluates cost of B only as B (ID, LINKID) causing the
execution to perform a full scan on either A or C. This seems to be caused
by the implementation of whereLoopAddVirtual() function. I think it should
evaluate cost for terms separated by tables in the right term as well, e.g.
for the mentioned SQL, table B, it should try B(), B(ID), B(LINKID), B(ID,
LINKID) instead of only B() and B(ID, LINKID).

What should I do?

Best regards,
Joe
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users