[sqlite] BestIndex problem
Thank you so much for your answers, we managed to do what we wanted. Best, Manos Stamatis On 21 May 2015 at 21:42, Dan Kennedy wrote: > On 05/21/2015 10:20 PM, Emmanouil Karvounis wrote: > >> Greetings, >> >> We are having an issue with BestIndex in our Virtual Table implementation. >> Allow us to illustrate the problem with a simple example. >> >> We create a virtual table named 'vt' that conceptually has 3 columns >> "value1, value2, value3" and then we want to execute the following >> queries: >> >> 1) select value1 from vt where value1 = 7; >> >> In this case, BestIndex passes the equal constraint on the first column >> and >> by setting the output as pdxInfo->aConstraintUsage[0].argvIndex = 1, >> we indicate that we accept this specific constraint. So, at Filter we get >> the value 7 as argument0. Everything behaves normally so far. >> >> However, in case we run any of the following equivalent queries (that >> should pass to Filter more than one value), we get an error message >> "xBestIndex returned an invalid plan": >> >> 2) select value1 from vt where value1 = 7 or value1 = 8; >> 3) select value1 from vt where value1 in (select * from tableA);// >> suppose tableA contains an integer 'id' column and records (7, 8) >> 4) select value1 from vt, tableA where value1 = tableA.id; >> >> Again, in each case we set pdxInfo->aConstraintUsage[0].argvIndex = 1 but >> we get the above error message. >> >> This behavior seems rather weird, so we'd like some expert help on what we >> might be doing wrong. >> > > That happens if you set the argvIndex variable on a constraint for which > the "usable" flag is not set. For each constraint in the aConstraint[] > array, you need to check that the "usable" flag is set - and ignore the > constraint if it is not. > > Search for "usable" in this section of the docs: > > https://www.sqlite.org/vtab.html#xbestindex > > Dan. > > > > ___ > sqlite-users mailing list > sqlite-users at mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users >
[sqlite] BestIndex problem
Greetings, We are having an issue with BestIndex in our Virtual Table implementation. Allow us to illustrate the problem with a simple example. We create a virtual table named 'vt' that conceptually has 3 columns "value1, value2, value3" and then we want to execute the following queries: 1) select value1 from vt where value1 = 7; In this case, BestIndex passes the equal constraint on the first column and by setting the output as pdxInfo->aConstraintUsage[0].argvIndex = 1, we indicate that we accept this specific constraint. So, at Filter we get the value 7 as argument0. Everything behaves normally so far. However, in case we run any of the following equivalent queries (that should pass to Filter more than one value), we get an error message "xBestIndex returned an invalid plan": 2) select value1 from vt where value1 = 7 or value1 = 8; 3) select value1 from vt where value1 in (select * from tableA);// suppose tableA contains an integer 'id' column and records (7, 8) 4) select value1 from vt, tableA where value1 = tableA.id; Again, in each case we set pdxInfo->aConstraintUsage[0].argvIndex = 1 but we get the above error message. This behavior seems rather weird, so we'd like some expert help on what we might be doing wrong. Thank you for your time, Manos Karvounis Stamatis Christoforidis
Re: [sqlite] Streaming group by in union of already sorted tables
Thank you very much, Richard. I'm sure this enhancement, modelled in a more general way, e.g., like an isSorted flag on the subquery to be used by the outer query, can be a great enhancement for many other types of queries employing nesting. I think it will help both in time and in space complexity, something very important when aiming at the embedded systems enviroment. Let me know if you need us to provide any further details or other assistance. Regards, Manos On 23 January 2015 at 21:30, Richard Hipp <d...@sqlite.org> wrote: > On 1/23/15, Emmanouil Karvounis <man...@di.uoa.gr> wrote: > > We have two tables that are already sorted on a combination of > > two fields (which are their primary keys) and we want to union them and > > apply group by on both the aforementioned fields, like so: > > > > select c1, c2, myagg(*) from ( > > select * from tableA > > union all > > select * from tableB > > ) > > group by c1, c2; > > > > where tableA, tableB have primary key (c1, c2) and their schema > comprises 3 > > integers: c1, c2, and prop. > > > > The sqlite query plan creates a temporary B-tree to hold all the records > of > > both tables to execute the group by. > > You are correct. Here is the test case I am using: > > CREATE TABLE tableA(c1 INT, c2 INT, payload INT, PRIMARY KEY(c1,c2)); > CREATE TABLE tableB(c1 INT, c2 INT, payload INT, PRIMARY KEY(c1,c2)); > > explain query plan > SELECT * FROM tableA > UNION ALL > SELECT * FROM tableB > ORDER BY 1, 2; > > .print - > > explain query plan > SELECT c1, c2, sum(payload) FROM ( > SELECT * FROM tableA > UNION ALL > SELECT * FROM tableB > ) > GROUP BY c1, c2; > > The first query - the UNION ALL with the ORDER BY does not use a > separate B-tree for sorting. So it seems logical that the second > query that does a GROUP BY over the same UNION ALL should be able to > get by without a separate sorting B-Tree too. > > For various technical reasons, this is complex change in SQLite. But > it seems like a worthwhile enhancement, so we'll take your suggestion > under advisement and attempt to do a better job of handling your query > in future releases. > > Thanks for posting. > > -- > D. Richard Hipp > d...@sqlite.org > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Streaming group by in union of already sorted tables
> > Sorry, but SQLite does not understand how the subquery (inside the > brackets) is going to be used by the main query. It hqs to complete the > subquery first and only then can it inspect the main query to find out how > to optimize it. This is not a bug, there just isn't enough flexibility to > do this the way you want. This is unfortunate, especially for an RDBMS aiming to minimize memory consumption for embedded systems. I believe there are many other queries where optimizing the subqueries with regard to the external query will result in optimal memory usage. > One of the following may or may not be useful: > > You may be able to use > > select c1, c2, myagg(*) from ( > select c1, c2 from tableA group by c1,c2 > union all > select c1, c2 from tableB group by c1,c2 > ) group by c1,c2 > > Thank you for your suggestion. But the query plan is the same, how would that be any faster? We actually run an experiment and it is a bit slower, I suppose because of the additional overhead of creating each group in the subquery, which doesn't contribute anything given the way the external query is handled. Alternatively, is there a good reason for tableA and tableB not to be > merged with, perhaps, an extra column indicating 'A' or 'B' ? This would > allow you to create an index and get your answer almost instantly. When > you see two tables with the same columns it's often an indication that > there should really be one table. > This is an alternative we wish to explore, but in our usecase it would be considered a hack, and might cause side-effects in other parts of the code. If we are forced to do it, we might be able to pull it off though. Thank you for your suggestion. Best, Manos > > Simon. > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Streaming group by in union of already sorted tables
To explain that with an example: tableA (1,2) (2,3) (2,4) (3,5) tableB (1,2) (2,3) (4,6) Get a pointer on tableA and one on tableB. (1,2) and (1,2) form a group, run count(*) and output 2. Advance both pointers. (2,3) and (2,3) form a group, run count(*) and output 2. Advance both pointers. (2,4) is less than (4,6) (because 2 < 4). (2,4) forms a group, run count(*) and output 1. Advance pointer in tableA. (3,5) is less than (4,6) (because 3 < 5). (3,5) forms a group, run count(*) and output 1. pointer on table A is depleted. Continue with tableB only. (4,6) forms a group. run count(*) and output 1. The sqlite plan calls for merging both tables in a B-tree and scanning it, which incurs unneeded time and space complecity compared to the above. Manos On 23 January 2015 at 18:59, Emmanouil Karvounis <man...@di.uoa.gr> wrote: > Dear Simon, > > Thank you for your answer and I'm sorry if I have used inappropriate > wording and confused you. > > The issue is actually very simple: > > tableA and tableB have both primary key on (c1, c2) > > explain query plan > select c1, c2, count(*) from ( > select c1, c2 from tableA > union all > select c1, c2 from tableB > ) > group by c1,c2 > > 2|0|0|SCAN TABLE tableA USING COVERING INDEX sqlite_autoindex_tableA_1 > 3|0|0|SCAN TABLE tableB USING COVERING INDEX sqlite_autoindex_tableB_1 > 1|0|0|COMPOUND SUBQUERIES 2 AND 3 (UNION ALL) > 0|0|0|SCAN SUBQUERY 1 > 0|0|0|USE TEMP B-TREE FOR GROUP BY > > There is no reason to create a new temp B-tree when you can sequentially > and in-synch scan the B-tree of tableA and of tableB and get the groups in > one pass. Think of how you would execute the sort step of mergesort on two > already sorted subarrays. > > > Manos > > On 23 January 2015 at 18:42, Simon Slavin <slav...@bigfraud.org> wrote: > >> >> On 23 Jan 2015, at 4:15pm, Emmanouil Karvounis <man...@di.uoa.gr> wrote: >> >> > In short, we have two tables that are already sorted on a combination of >> > two fields >> >> There is no such thing as a 'sorted table' in SQL. Each table is a set >> of rows and the rows have no order. >> >> If you want to make it easy for SQL to access a table's rows in a >> particular order, create an index or make that order the table's primary >> key (which is another way of making an index). >> >> > select c1, c2, myagg(*) from ( >> > select * from tableA >> > union all >> > select * from tableB >> > ) >> > group by c1, c2; >> >> This command tells SQL that you want to construct a list of every row of >> tableA and every row of tableB. In other words, if you have 300 rows in >> tableA and 500 rows in tableB, you are telling SQL to construct a new table >> of 800 rows. And because this table doesn't yet exist, it doesn't have any >> indexes so it can't be searched quickly. Is that what you wanted ? >> >> Is there a good reason for needing this data in two separate tables >> rather than one for which you can create an index on (c1, c2) ? >> >> Do the groups occur entirely within one table or do you have to add the >> tables together before SQL can figure out the groups. >> >> > where tableA, tableB have primary key (c1, c2) and their schema >> comprises 3 >> > integers: c1, c2, and prop. >> >> It might be worth testing with something like 'total(*)' just to make >> sure it isn't your own function which is causing the problems. >> >> Simon. >> ___ >> sqlite-users mailing list >> sqlite-users@sqlite.org >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >> > > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Streaming group by in union of already sorted tables
Dear Simon, Thank you for your answer and I'm sorry if I have used inappropriate wording and confused you. The issue is actually very simple: tableA and tableB have both primary key on (c1, c2) explain query plan select c1, c2, count(*) from ( select c1, c2 from tableA union all select c1, c2 from tableB ) group by c1,c2 2|0|0|SCAN TABLE tableA USING COVERING INDEX sqlite_autoindex_tableA_1 3|0|0|SCAN TABLE tableB USING COVERING INDEX sqlite_autoindex_tableB_1 1|0|0|COMPOUND SUBQUERIES 2 AND 3 (UNION ALL) 0|0|0|SCAN SUBQUERY 1 0|0|0|USE TEMP B-TREE FOR GROUP BY There is no reason to create a new temp B-tree when you can sequentially and in-synch scan the B-tree of tableA and of tableB and get the groups in one pass. Think of how you would execute the sort step of mergesort on two already sorted subarrays. Manos On 23 January 2015 at 18:42, Simon Slavin <slav...@bigfraud.org> wrote: > > On 23 Jan 2015, at 4:15pm, Emmanouil Karvounis <man...@di.uoa.gr> wrote: > > > In short, we have two tables that are already sorted on a combination of > > two fields > > There is no such thing as a 'sorted table' in SQL. Each table is a set of > rows and the rows have no order. > > If you want to make it easy for SQL to access a table's rows in a > particular order, create an index or make that order the table's primary > key (which is another way of making an index). > > > select c1, c2, myagg(*) from ( > > select * from tableA > > union all > > select * from tableB > > ) > > group by c1, c2; > > This command tells SQL that you want to construct a list of every row of > tableA and every row of tableB. In other words, if you have 300 rows in > tableA and 500 rows in tableB, you are telling SQL to construct a new table > of 800 rows. And because this table doesn't yet exist, it doesn't have any > indexes so it can't be searched quickly. Is that what you wanted ? > > Is there a good reason for needing this data in two separate tables rather > than one for which you can create an index on (c1, c2) ? > > Do the groups occur entirely within one table or do you have to add the > tables together before SQL can figure out the groups. > > > where tableA, tableB have primary key (c1, c2) and their schema > comprises 3 > > integers: c1, c2, and prop. > > It might be worth testing with something like 'total(*)' just to make sure > it isn't your own function which is causing the problems. > > Simon. > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Streaming group by in union of already sorted tables
Thank you Clemens. In the general case, myagg() doesn't have the appropriate property to distribute over tableA and tableB seperately. Let me be clear, we are not looking for alternatives to get our query running more efficiently, we sincerely believe that we have found a defect in the sqlite query planner. This is a bug report, not a request for assistance on formulating our query. I do appreciate the effort and your good will, though. Thank you again. Manos On 23 January 2015 at 18:48, Clemens Ladisch <clem...@ladisch.de> wrote: > Emmanouil Karvounis wrote: > > In short, we have two tables that are already sorted on a combination of > > two fields (which are their primary keys) and we want to union them and > > apply group by on both the aforementioned fields, like so: > > > > select c1, c2, myagg(*) from ( > > select * from tableA > > union all > > select * from tableB > > ) > > group by c1, c2; > > > > The sqlite query plan creates a temporary B-tree to hold all the records > of > > both tables to execute the group by. This incurs too much time and space > > complexity > > If the number of result rows is small compared to the number of table > rows, and if myagg() works correctly when called on its own results, > then you can reduce the size of the temporary B-tree by doing another > GROUP BY in the subqueries: > > select c1, c2, myagg(prop) from ( > select c1, c2, myagg(prop) as prop from tableA group by c1, c2 > union all > select c1, c2, myagg(prop) from tableB group by c2, c2 > ) > group by c1, c2; > > > Regards, > Clemens > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Streaming group by in union of already sorted tables
Greetings, We are having an issue with the sqlite query plan for one of our queries, which seems to be sub-optimal both in time and in space complexity. In short, we have two tables that are already sorted on a combination of two fields (which are their primary keys) and we want to union them and apply group by on both the aforementioned fields, like so: select c1, c2, myagg(*) from ( select * from tableA union all select * from tableB ) group by c1, c2; where tableA, tableB have primary key (c1, c2) and their schema comprises 3 integers: c1, c2, and prop. The sqlite query plan creates a temporary B-tree to hold all the records of both tables to execute the group by. This incurs too much time and space complexity compared to the better plan of sequentially iterating over both tables using their B-tree indices on the primary key (since they contain the records already sorted in the group by fields). Like what one would do on the sort step of mergesort to sort two already sorted lists. When we manually implemented the idea of the sequential scan of the two tables and manually gathering the groups and applying the aggregate function, we had a 2X speedup compared to the sqlite plan of creating a temp B-tree. Of course, this speedup would be even greater if we could push the agg function inside the sqlite engine and having it do the sequential scanning plan on the indices directly. Note: In the general case, tableA and tableB have a non empty intersection and neither one is a subset of the other. At the end of the email we provide extensive details for the tables we use and the query we want to run, plus a rundown on the complexity of the sqlite plan and our proposed solution. We are at your disposal for any further clarifications and we are looking forward to your reaction on this. Kind Regards, Manos Karvounis -- SQLite version: 3.8.8.1 (latest) pragma table_info(tableA); 0|c1|int(11)|0||1 1|c2|int(11)|0||2 2|prop|int(11)|0||0 pragma table_info(tableB); 0|c1|int(11)|0||1 1|c2|int(11)|0||2 2|prop|int(11)|0||0 select count(*) from tableA; 11095298 select count(*) from tableB; 100 pragma index_list(tableA); 0|sqlite_autoindex_tableA_1|1 pragma index_info(sqlite_autoindex_tableA_1); 0|0|c1 1|1|c2 pragma index_list(tableB); 0|sqlite_autoindex_tableB_1|1 pragma index_info(sqlite_autoindex_tableB_1); 0|0|c1 1|1|c2 explain query plan select c1, c2, myagg(*) from ( select * from tableA union all select * from tableB ) group by c1, c2; 2|0|0|SCAN TABLE tableA 3|0|0|SCAN TABLE tableB 1|0|0|COMPOUND SUBQUERIES 2 AND 3 (UNION ALL) 0|0|0|SCAN SUBQUERY 1 0|0|0|USE TEMP B-TREE FOR GROUP BY Note that we get the same query plan even if we explicitly ask for ordering tableA, tableB on c1, c2 prior to group by. explain query plan select c1, c2, myagg(*) from ( select * from tableA union all select * from tableB order by c1, c2 ) group by c1, c2; 2|0|0|SCAN TABLE tableA USING INDEX sqlite_autoindex_tableA_1 3|0|0|SCAN TABLE tableB USING INDEX sqlite_autoindex_tableB_1 1|0|0|COMPOUND SUBQUERIES 2 AND 3 (UNION ALL) 0|0|0|SCAN SUBQUERY 1 0|0|0|USE TEMP B-TREE FOR GROUP BY SQLite Query Plan: (1) Add every record of tableA and tableB on a new B-tree Time complexity: O((|tableA| + |tableB|)log(|tableA|+|tableB|)). Space overhead: O(|tableA|+|tableB|). (2) Run through the B-tree to get groups Time complexity: O(|tableA| + |tableB|). Alternative Query Plan: (1) Iterate over both B-trees in synch since they are already sorted (i.e., the way we would merge two already sorted lists using two running pointers, or the behavior you indicate in 3.2 here https://www.sqlite.org/queryplanner.html). Time complexity: O(|tableA|+|tableB|) Space overhead: O(1) Yet in the following queries the plan (seems to) correctly use the primary key index to speed up the order by. explain query plan select c1, c2, prop from ( select * from tableA union all select * from tableB ) order by c1, c2; 1|0|0|SCAN TABLE tableA USING INDEX sqlite_autoindex_tableA_1 2|0|0|SCAN TABLE tableB USING INDEX sqlite_autoindex_tableB_1 0|0|0|COMPOUND SUBQUERIES 1 AND 2 (UNION ALL) explain query plan select c1, c2, prop from table A order by c1, c2; 0|0|0|SCAN TABLE tableA USING INDEX sqlite_autoindex_tableA_1 Even if one of the tables did not have an index, again the better plan would be to create a temp B-tree for that table and then iterate in synch. Yet sqlite again creates a temp B-tree as shown below. pragma table_info(tableC); 0|c1|int(11)|0||0 1|c2|int(11)|0||0 2|prop|int(11)|0||0 select count(*) from tableC; 100 pragma index_list(tableC); explain query plan select c1, c2, myagg(*) from ( select * from tableA union all select * from tableC ) group by c1, c2; 2|0|0|SCAN TABLE tableA 3|0|0|SCAN TABLE tableC 1|0|0|COMPOUND SUBQUERIES 2 AND 3 (UNION ALL) 0|0|0|SCAN SUBQUERY 1 0|0|0|USE TEMP B-TREE FOR GROUP BY