Re: [sqlite] Query Planner for Virtual Tables: link table evaluation & transitive property of constraints not used
Hello, I went forward and implemented the suggestion I made. It works quite nice, so I decided to share a patch (https://www.dropbox.com/sh/buim6b4hevg7db3/AAD75j1NkXSakx-1_iGu1QW4a?dl=0). Statement: I (the author of this code patch) dedicate any and all copyright interest in this code patch to the public domain. I make this dedication for the benefit of the public at large and to the detriment of my heirs and successors. I intend this dedication to be an overt act of relinquishment in perpetuity of all present and future rights to this code under copyright law. Regards, Joe - Original Message - From: "Josef Kucera" To: "General Discussion of SQLite Database" Sent: Monday, December 15, 2014 5:37 PM Subject: Re: [sqlite] Query Planner for Virtual Tables: link table evaluation & transitive property of constraints not used - Original Message - From: "Nico Williams" To: "General Discussion of SQLite Database" Sent: Monday, December 15, 2014 5:16 PM Subject: Re: [sqlite] Query Planner for Virtual Tables: link table evaluation & transitive property of constraints not used On Mon, Dec 15, 2014 at 06:23:31PM +0700, Dan Kennedy wrote: It's tricky. As you say, xBestIndex() will currently be invoked twice - once with no constraints usable and once with both "b.id=?" and "b.linkid=?" usable. I guess the reason it is not invoked in the other ways you suggest is that that strategy might conceivably require a huge number of xBestIndex() calls if there were more than a few other tables in the join. Perhaps there should be a method by which distinct indices could be expressed by the virtual table to the SQLite3 query optimizer. One possible compromise would be to call xBestIndex() with each constraint, then once more with all of them. Nico Unfortunately this will not help much, as the implementation of the virtual table module will be complicated if multi-column join is involved. As I follow the discussion, the best compromise can be: - currently xBestIndex() is called four times at most - phase 0 to 3 (only const, without in, with "variables"/joins); - phases 2 and 3 can call xBestInfo() once for all constraints (current behavior) and consecutively for constraints separated by tables used on the right (at most N-1 added calls, where N is the number of tables in the join); Joe -- ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Query Planner for Virtual Tables: link table evaluation & transitive property of constraints not used
- Original Message - From: "Nico Williams" To: "General Discussion of SQLite Database" Sent: Monday, December 15, 2014 5:16 PM Subject: Re: [sqlite] Query Planner for Virtual Tables: link table evaluation & transitive property of constraints not used On Mon, Dec 15, 2014 at 06:23:31PM +0700, Dan Kennedy wrote: It's tricky. As you say, xBestIndex() will currently be invoked twice - once with no constraints usable and once with both "b.id=?" and "b.linkid=?" usable. I guess the reason it is not invoked in the other ways you suggest is that that strategy might conceivably require a huge number of xBestIndex() calls if there were more than a few other tables in the join. Perhaps there should be a method by which distinct indices could be expressed by the virtual table to the SQLite3 query optimizer. One possible compromise would be to call xBestIndex() with each constraint, then once more with all of them. Nico Unfortunately this will not help much, as the implementation of the virtual table module will be complicated if multi-column join is involved. As I follow the discussion, the best compromise can be: - currently xBestIndex() is called four times at most - phase 0 to 3 (only const, without in, with "variables"/joins); - phases 2 and 3 can call xBestInfo() once for all constraints (current behavior) and consecutively for constraints separated by tables used on the right (at most N-1 added calls, where N is the number of tables in the join); Joe -- ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Query Planner for Virtual Tables: link table evaluation & transitive property of constraints not used
On Mon, Dec 15, 2014 at 06:23:31PM +0700, Dan Kennedy wrote: > It's tricky. As you say, xBestIndex() will currently be invoked > twice - once with no constraints usable and once with both "b.id=?" > and "b.linkid=?" usable. I guess the reason it is not invoked in the > other ways you suggest is that that strategy might conceivably > require a huge number of xBestIndex() calls if there were more than > a few other tables in the join. Perhaps there should be a method by which distinct indices could be expressed by the virtual table to the SQLite3 query optimizer. One possible compromise would be to call xBestIndex() with each constraint, then once more with all of them. Nico -- ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Query Planner for Virtual Tables: link table evaluation & transitive property of constraints not used
- Original Message - From: "Hick Gunter" To: "'General Discussion of SQLite Database'" Sent: Monday, December 15, 2014 2:40 PM Subject: Re: [sqlite] Query Planner for Virtual Tables: link table evaluation & transitive property of constraints not used I would concur in that SQLite is asking "which subset of the given constraints yields the most efficient access". The possible query plans are 1) A() -> B(ID) -> C(LINKID) 2) C() -> B(LINKID) -> A(ID) 3) B() -> A(ID) + C(LINKID) or B() -> C(LINKID) + A(ID) 4) A() -> C() -> B(ID,LINKID) or C() -> A() -> B(ID,LINKID) Assuming unique keys in A and C and cardinalities of a, b and c we have estimated costs (in # of records retrieved): 1) a + a*b/a + a*b/a*1 = a + 2b 2) c + c*b/c + c*b/c*1 = c + 2b 3) b + b*1 + b*1 = 3b 4) a + a*c + a*c*b/a/c = a + a*c + b (resp. c + a*c + b) So which is the smallest cost? We know that b <= a*c, which makes query plan 4 at least as expensive as plans 1 or 2 respectively. Choosing between plans 1 and 2 means starting with the smaller of the two tables (assume a < c). So how do plans 1 and 3 compare? Plan 3 is better only for very sparse link tables where b < a < c is true. The reasoning is absolutely correct. I forgot to mention, that most often the query has a where constraint (generated at runtime) limiting A or C to a small subset. In that scenario the smallest cost would be 1) or 2) respectively. There is one more problem with plan 4): Assuming A and/or B and/or C has many records (100.000s) the query B(ID,LINKID) is executed for each and every record in A*C - if there is an constant cost for each B(ID,LINKID) execution then 4) is the most expensive option. Joe ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Query Planner for Virtual Tables: link table evaluation & transitive property of constraints not used
I would concur in that SQLite is asking "which subset of the given constraints yields the most efficient access". The possible query plans are 1) A() -> B(ID) -> C(LINKID) 2) C() -> B(LINKID) -> A(ID) 3) B() -> A(ID) + C(LINKID) or B() -> C(LINKID) + A(ID) 4) A() -> C() -> B(ID,LINKID) or C() -> A() -> B(ID,LINKID) Assuming unique keys in A and C and cardinalities of a, b and c we have estimated costs (in # of records retrieved): 1) a + a*b/a + a*b/a*1 = a + 2b 2) c + c*b/c + c*b/c*1 = c + 2b 3) b + b*1 + b*1 = 3b 4) a + a*c + a*c*b/a/c = a + a*c + b (resp. c + a*c + b) So which is the smallest cost? We know that b <= a*c, which makes query plan 4 at least as expensive as plans 1 or 2 respectively. Choosing between plans 1 and 2 means starting with the smaller of the two tables (assume a < c). So how do plans 1 and 3 compare? Plan 3 is better only for very sparse link tables where b < a < c is true. -Ursprüngliche Nachricht- Von: Dan Kennedy [mailto:danielk1...@gmail.com] Gesendet: Montag, 15. Dezember 2014 12:24 An: General Discussion of SQLite Database Betreff: Re: [sqlite] Query Planner for Virtual Tables: link table evaluation & transitive property of constraints not used On 12/12/2014 09:22 PM, Josef Kučera wrote: > Hello, > I am trying to use SQLite's marvellous Virtual Table mechanism as a > SQL layer for querying an in memory storage. This works good, but I > have a problem with more complex queries. When querying a real SQLite > database it correctly moves the constant conditions across joined > tables to optimize the execution plan (I think this was implemented in the > 3.7.17 release). > Unfortunately for virtual tables this does not seem to be supported. I > can overcome this limitation by manually tuning the SQL, but it will > help if the query planner can do this automatically. > > The major problem I have is with link table evaluation. Imagine a SQL > like "select * from A join B on A.ID=B.ID join C on C.ID=B.LINKID". > The current implementation evaluates cost of B only as B (ID, LINKID) > causing the execution to perform a full scan on either A or C. This > seems to be caused by the implementation of whereLoopAddVirtual() > function. I think it should evaluate cost for terms separated by tables in > the right term as well, e.g. > for the mentioned SQL, table B, it should try B(), B(ID), B(LINKID), > B(ID, > LINKID) instead of only B() and B(ID, LINKID). > > What should I do? You want this (or the same thing with the roles of "A" and "C" reversed): * a full-scan on A, * a lookup on B by (b.id=?) * a lookup on C by (c.id=?) correct? It's tricky. As you say, xBestIndex() will currently be invoked twice - once with no constraints usable and once with both "b.id=?" and "b.linkid=?" usable. I guess the reason it is not invoked in the other ways you suggest is that that strategy might conceivably require a huge number of xBestIndex() calls if there were more than a few other tables in the join. You could change the query so that only one of the constraints is visible to the virtual table implementation. Say: select * from A join B on A.ID=B.ID join C on C.ID=+B.LINKID Or rework the virtual table code so that it knows only to use one of "b.id=?" or "b.linkid=?" at a time. If the xBestIndex only uses one of the constraints, the planner should do the right thing. Dan. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ Gunter Hick Software Engineer Scientific Games International GmbH FN 157284 a, HG Wien Klitschgasse 2-4, A-1130 Vienna, Austria Tel: +43 1 80100 0 E-Mail: h...@scigames.at This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Query Planner for Virtual Tables: link table evaluation & transitive property of constraints not used
On 12/15/2014 13:23 PM, Dan Kennedy wrote: > On 12/12/2014 09:22 PM, Josef Kučera wrote: > > Hello, > > I am trying to use SQLite's marvellous Virtual Table mechanism as a SQL > > layer for querying an in memory storage. This works good, but I have a > > problem with more complex queries. When querying a real SQLite database it > > correctly moves the constant conditions across joined tables to optimize > > the execution plan (I think this was implemented in the 3.7.17 release). > > Unfortunately for virtual tables this does not seem to be supported. I can > > overcome this limitation by manually tuning the SQL, but it will help if > > the query planner can do this automatically. > > > > The major problem I have is with link table evaluation. Imagine a SQL like > > "select * from A join B on A.ID=B.ID join C on C.ID=B.LINKID". The current > > implementation evaluates cost of B only as B (ID, LINKID) causing the > > execution to perform a full scan on either A or C. This seems to be caused > > by the implementation of whereLoopAddVirtual() function. I think it should > > evaluate cost for terms separated by tables in the right term as well, e.g. > > for the mentioned SQL, table B, it should try B(), B(ID), B(LINKID), B(ID, > > LINKID) instead of only B() and B(ID, LINKID). > > > > What should I do? > > You want this (or the same thing with the roles of "A" and "C" reversed): > >* a full-scan on A, >* a lookup on B by (b.id=?) >* a lookup on C by (c.id=?) > > correct? > Yes, this is exactly what I want. It makes even more sense when there is a WHERE condition on table A. > It's tricky. As you say, xBestIndex() will currently be invoked twice - > once with no constraints usable and once with both "b.id=?" and > "b.linkid=?" usable. I guess the reason it is not invoked in the other > ways you suggest is that that strategy might conceivably require a huge > number of xBestIndex() calls if there were more than a few other tables > in the join. > You are absolutely correct. I do not think calling xBestIndex() for every possible table combination is possible (too much xBestIndex calls and too much WhereLoop variants to evaluate). I thought about adding a single call for each table in the join, that could keep the amount of xBestIndex() calls reasonable, and really help for this type of joined queries. > You could change the query so that only one of the constraints is > visible to the virtual table implementation. Say: > >select * from A join B on A.ID=B.ID join C on C.ID=+B.LINKID > This is great, it gives the possibility to choose the plan by the command. Many thanks for the tip. > Or rework the virtual table code so that it knows only to use one of > "b.id=?" or "b.linkid=?" at a time. If the xBestIndex only uses one of > the constraints, the planner should do the right thing. > Unfortunately this would be hard to implement, currently I use a generic virtual table mechanism to make cross-database queries. I hoped the decision which condition to use could be made by the planner, without hard-coding it in the virtual table implementation. > Dan. Joe ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Query Planner for Virtual Tables: link table evaluation & transitive property of constraints not used
On 12/12/2014 09:22 PM, Josef Kučera wrote: Hello, I am trying to use SQLite's marvellous Virtual Table mechanism as a SQL layer for querying an in memory storage. This works good, but I have a problem with more complex queries. When querying a real SQLite database it correctly moves the constant conditions across joined tables to optimize the execution plan (I think this was implemented in the 3.7.17 release). Unfortunately for virtual tables this does not seem to be supported. I can overcome this limitation by manually tuning the SQL, but it will help if the query planner can do this automatically. The major problem I have is with link table evaluation. Imagine a SQL like "select * from A join B on A.ID=B.ID join C on C.ID=B.LINKID". The current implementation evaluates cost of B only as B (ID, LINKID) causing the execution to perform a full scan on either A or C. This seems to be caused by the implementation of whereLoopAddVirtual() function. I think it should evaluate cost for terms separated by tables in the right term as well, e.g. for the mentioned SQL, table B, it should try B(), B(ID), B(LINKID), B(ID, LINKID) instead of only B() and B(ID, LINKID). What should I do? You want this (or the same thing with the roles of "A" and "C" reversed): * a full-scan on A, * a lookup on B by (b.id=?) * a lookup on C by (c.id=?) correct? It's tricky. As you say, xBestIndex() will currently be invoked twice - once with no constraints usable and once with both "b.id=?" and "b.linkid=?" usable. I guess the reason it is not invoked in the other ways you suggest is that that strategy might conceivably require a huge number of xBestIndex() calls if there were more than a few other tables in the join. You could change the query so that only one of the constraints is visible to the virtual table implementation. Say: select * from A join B on A.ID=B.ID join C on C.ID=+B.LINKID Or rework the virtual table code so that it knows only to use one of "b.id=?" or "b.linkid=?" at a time. If the xBestIndex only uses one of the constraints, the planner should do the right thing. Dan. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Query Planner for Virtual Tables: link table evaluation & transitive property of constraints not used
Hello, I am trying to use SQLite's marvellous Virtual Table mechanism as a SQL layer for querying an in memory storage. This works good, but I have a problem with more complex queries. When querying a real SQLite database it correctly moves the constant conditions across joined tables to optimize the execution plan (I think this was implemented in the 3.7.17 release). Unfortunately for virtual tables this does not seem to be supported. I can overcome this limitation by manually tuning the SQL, but it will help if the query planner can do this automatically. The major problem I have is with link table evaluation. Imagine a SQL like "select * from A join B on A.ID=B.ID join C on C.ID=B.LINKID". The current implementation evaluates cost of B only as B (ID, LINKID) causing the execution to perform a full scan on either A or C. This seems to be caused by the implementation of whereLoopAddVirtual() function. I think it should evaluate cost for terms separated by tables in the right term as well, e.g. for the mentioned SQL, table B, it should try B(), B(ID), B(LINKID), B(ID, LINKID) instead of only B() and B(ID, LINKID). What should I do? Best regards, Joe ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users