Re: [sqlite] Optimisation opportunity on subquery?
On Wednesday, 2 January, 2019 16:58, Jonathan Moules wrote: >Gah, sorry. Another typo. I really should be more awake when I post >to this list. The non-simplified code does have the item_id on the >subquery (otherwise it simply wouldn't execute at all of course). So: >SELECT * > FROM item_info > JOIN ( > select > count(1) as num, > item_id > from users > group by item_id) > USING (item_id) > where item_id = ?; Now perhaps we are getting somewhere. So now what exactly is your complaint? You have asked for the following join to occur: JOIN this table, which you can see by the following command: select * from item_info with the table that you can see by the following command: select count(1) as num, item_id from users group by item_id using the common column item_id in both tables and then to return only the row where the item_id is the specific one provided. So, the ONLY way to solve this is to generate the table on the RHS of the join first, and then join it to the item_info table, and then return only the row you have asked for. This produces the following plan: CREATE TABLE users ( item_id TEXT REFERENCES item_info (item_id) NOT NULL COLLATE NOCASE, some_data TEXT ); CREATE INDEX users__item_id__idx ON users ( item_id ); CREATE TABLE item_info ( item_id TEXT PRIMARY KEY ON CONFLICT IGNORE NOT NULL COLLATE NOCASE, more_data TEXT ); .eqp full SELECT * FROM item_info JOIN ( select count(1) as num, item_id from users group by item_id) USING (item_id) where item_id = ?; QUERY PLAN |--MATERIALIZE 1 | `--SCAN TABLE users USING COVERING INDEX users__item_id__idx (~1048576 rows) |--SEARCH TABLE item_info USING INDEX sqlite_autoindex_item_info_1 (item_id=?) (~1 row) `--SCAN SUBQUERY 1 (~96 rows) addr opcode p1p2p3p4 p5 comment - - -- - 0 Init 0 77000 Start at 77 1 Integer451 000 r[1]=45 2 Once 0 45000 materialize "subquery_1" 3 OpenEphemeral 1 2 000 nColumn=2 4 Noop 3 1 000 5 Integer0 5 000 r[5]=0; clear abort flag 6 Null 0 8 800 r[8..8]=NULL 7 Gosub 7 42000 8 OpenRead 4 3 0 k(2,NOCASE,) 00 root=3 iDb=0; users__item_id__idx 9 ColumnsUsed4 0 0 1 00 10Explain100 0 SCAN TABLE users USING COVERING INDEX users__item_id__idx (~1048576 rows) 00 11Noop 0 0 000 Begin WHERE-loop0: users 12Rewind 4 28100 00 13 Noop 0 0 000 Begin WHERE-core 14 Column 4 0 900 r[9]=users.item_id 15 Compare8 9 1 k(1,NOCASE)00 r[8] <-> r[9] 16 Jump 172117 00 17 Move 9 8 100 r[8]=r[9] 18 Gosub 6 33000 output one row 19 IfPos 5 45000 if r[5]>0 then r[5]-=0, goto 45; check abort flag 20 Gosub 7 42000 reset accumulator 21 Integer1 10000 r[10]=1 22 AggStep0 102 count(1) 01 accum=r[2] step(r[10]) 23 If 4 25000 24 Column 4 0 300 r[3]=users.item_id 25 Integer1 4 000 r[4]=1; indicate data in accumulator 26 Noop 0 0 000 End WHERE-core 27Next 4 13001 28Noop 0 0 000 End WHERE-loop0: users 29Gosub 6 33000 output final row 30Goto 0 45000 31Integer1 5 000 r[5]=1; set abort flag 32Return 6 0 000 33IfPos 4 35000 if r[4]>0 then r[4]-=0, goto 35; Groupby result generator entry point 34Return 6 0 000 35AggFinal 2 1 0 count(1) 00 accum=r[2] N=1 36SCopy 2 11000 r[11]=r[2] 37SCopy 3 12000 r[12]=r[3] 38MakeRecord 112 10 00 r[10]=mkr
Re: [sqlite] Optimisation opportunity on subquery?
Gah, sorry. Another typo. I really should be more awake when I post to this list. The non-simplified code does have the item_id on the subquery (otherwise it simply wouldn't execute at all of course). So: SELECT * FROM item_info JOIN ( select count(1) as num, item_id from users group by item_id) USING (item_id) where item_id = ?; On 2019-01-02 22:56, Keith Medcalf wrote: When you are executing the query: SELECT * FROM item_info JOIN (select count(1) from users group by item_id) USING (item_id) where item_id = ?; You are telling the SQL Database Engine (whatever it may happen to be, in this case SQLite3) that you want to take the table produced by: select * from item_info; and join it (an equijoin) against the table produced by running the query: select count(1) from users group by item_id; based on the equality of the column item_id in both tables. If you run the two queries you will see that the second table DOES NOT produce a column called item_id. Therefore you cannot join those tables and instead you get an error message telling you that the column item_id does not exist in both tables. FROM tables are siblings not correlates. --- The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume. -Original Message- From: sqlite-users [mailto:sqlite-users- boun...@mailinglists.sqlite.org] On Behalf Of Jonathan Moules Sent: Wednesday, 2 January, 2019 15:21 To: sqlite-users@mailinglists.sqlite.org Subject: Re: [sqlite] Optimisation opportunity on subquery? Hi Simon, Thanks for that - I'm always trying to improve my SQL. I think I see what you mean now. Assuming my original query: SELECT * FROM item_info JOIN (select count(1) from users group by item_id) USING (item_id) where item_id = ?; There are three uses of "item_id". * The first is the "group by", which unless my SQL is even worse than I imagine, can only be referencing the users table. * The second is "USING" - which is referencing both. * The third is the one I guess you mean is ambiguous? My thinking was that because item_id is going through the USING it wasn't ambiguous as they're the same thing; though that's apparently not how the query planner sees it, and hence your reference to ambiguity. That right? So I tried using aliases (I'm assuming that removes the ambiguity), but the query times remained at about 0.5s for both versions (whether i.item_id or u.item_id): SELECT * FROM item_info i JOIN (select count(1) from users group by item_id) u USING (item_id) where u.item_id = ?; Thanks again for clarifying, but after checking, it doesn't seem like it was an ambiguity thing. Cheers, Jonathan On 2019-01-02 22:04, Simon Slavin wrote: On 2 Jan 2019, at 9:50pm, Jonathan Moules li...@lightpear.com> wrote: Sorry, but which column is ambiguous? The users.item_id is a foreign key to the item_info.item_id - that's why it's a "REFERENCES" - why would I want to change it to be something else? Isn't the convention for FK's to have the same name across tables? That's what "USING" is for right? (or NATURAL, but I prefer to be explicit.) Happy to be corrected. It may be that our careers developed with different ideas about how to use SQL. You had a JOIN, both tables had a column "item_id", and a reference inside the JOIN to "item_id" would be ambiguous. Since the SQL standard does not make it clear which table would be used, it would be possible for different implementations of SQL to think you meant different tables. I understand that, in your example, the values would be the same. But that doesn't explain to you what the optimizer thinks you're trying to do. The simplest way to tell the optimizer what you need would be to rename one of the columns. You could try both tables, see which solution was faster, and use that one. However, I see other posters have gained better clarity for your problem. Simon. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite- users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Optimisation opportunity on subquery?
When you are executing the query: SELECT * FROM item_info JOIN (select count(1) from users group by item_id) USING (item_id) where item_id = ?; You are telling the SQL Database Engine (whatever it may happen to be, in this case SQLite3) that you want to take the table produced by: select * from item_info; and join it (an equijoin) against the table produced by running the query: select count(1) from users group by item_id; based on the equality of the column item_id in both tables. If you run the two queries you will see that the second table DOES NOT produce a column called item_id. Therefore you cannot join those tables and instead you get an error message telling you that the column item_id does not exist in both tables. FROM tables are siblings not correlates. --- The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume. >-Original Message- >From: sqlite-users [mailto:sqlite-users- >boun...@mailinglists.sqlite.org] On Behalf Of Jonathan Moules >Sent: Wednesday, 2 January, 2019 15:21 >To: sqlite-users@mailinglists.sqlite.org >Subject: Re: [sqlite] Optimisation opportunity on subquery? > >Hi Simon, > >Thanks for that - I'm always trying to improve my SQL. I think I see >what you mean now. > >Assuming my original query: > > SELECT > * > FROM > item_info > JOIN (select count(1) from users group by item_id) > USING (item_id) > where item_id = ?; > >There are three uses of "item_id". > >* The first is the "group by", which unless my SQL is even worse than >I >imagine, can only be referencing the users table. > >* The second is "USING" - which is referencing both. > >* The third is the one I guess you mean is ambiguous? My thinking was >that because item_id is going through the USING it wasn't ambiguous >as >they're the same thing; though that's apparently not how the query >planner sees it, and hence your reference to ambiguity. That right? > >So I tried using aliases (I'm assuming that removes the ambiguity), >but >the query times remained at about 0.5s for both versions (whether >i.item_id or u.item_id): > > SELECT > * > FROM > item_info i > JOIN (select count(1) from users group by item_id) u > USING (item_id) > where u.item_id = ?; > >Thanks again for clarifying, but after checking, it doesn't seem like >it >was an ambiguity thing. >Cheers, >Jonathan > >On 2019-01-02 22:04, Simon Slavin wrote: >> On 2 Jan 2019, at 9:50pm, Jonathan Moules li...@lightpear.com> wrote: >> >>> Sorry, but which column is ambiguous? The users.item_id is a >foreign key to the item_info.item_id - that's why it's a "REFERENCES" >- why would I want to change it to be something else? Isn't the >convention for FK's to have the same name across tables? That's what >"USING" is for right? (or NATURAL, but I prefer to be explicit.) >Happy to be corrected. >> It may be that our careers developed with different ideas about how >to use SQL. You had a JOIN, both tables had a column "item_id", and >a reference inside the JOIN to "item_id" would be ambiguous. Since >the SQL standard does not make it clear which table would be used, it >would be possible for different implementations of SQL to think you >meant different tables. >> >> I understand that, in your example, the values would be the same. >But that doesn't explain to you what the optimizer thinks you're >trying to do. The simplest way to tell the optimizer what you need >would be to rename one of the columns. You could try both tables, >see which solution was faster, and use that one. >> >> However, I see other posters have gained better clarity for your >problem. >> >> Simon. >> ___ >> sqlite-users mailing list >> sqlite-users@mailinglists.sqlite.org >> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite- >users > > > >___ >sqlite-users mailing list >sqlite-users@mailinglists.sqlite.org >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Optimisation opportunity on subquery?
However, if you want to do that then you want to use a correlated subquery as it is designed for retrieving correlated data by running a subquery per result row: select *, (select count(*) from users where item_id = item_info.item_id) as count from item_info where item_id = ?; and not a subselect in the FROM clause. Tables in the "FROM" clause are siblings and exist independently of each other. They are not generators ... (except in some special cases where they are, such as a table valued function). --- The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume. >-Original Message- >From: sqlite-users [mailto:sqlite-users- >boun...@mailinglists.sqlite.org] On Behalf Of David Raymond >Sent: Wednesday, 2 January, 2019 13:56 >To: SQLite mailing list >Subject: Re: [sqlite] Optimisation opportunity on subquery? > >Yup, there was a typo, but you guys know what he meant. He's saying >there WHERE could be pushed through the join to the subquery. > > >sqlite> explain query plan select * from item_info join (select >item_id, count(1) from users group by item_id) using (item_id) where >item_id = ?; >QUERY PLAN >|--MATERIALIZE 1 >| `--SCAN TABLE users USING COVERING INDEX user__item_id__idx >|--SEARCH TABLE item_info USING INDEX sqlite_autoindex_item_info_1 >(item_id=?) >`--SCAN SUBQUERY 1 > >sqlite> explain query plan select * from item_info join (select >item_id, count(1) from users where item_id = ?) using (item_id) where >item_id = ?; >QUERY PLAN >|--MATERIALIZE 1 >| `--SEARCH TABLE users USING COVERING INDEX user__item_id__idx >(item_id=?) >|--SEARCH TABLE item_info USING INDEX sqlite_autoindex_item_info_1 >(item_id=?) >`--SCAN SUBQUERY 1 > > >It basically comes down to not doing algebra, and proving you can use >it for each case can be easy for a human, but error prone or long to >do for a computer, especially for non-trivial cases. Here the WHERE >clause is on one field, that field is used in an INNER join to the >subquery, the sub query isn't used anywhere else in the larger query, >the field that matches in the sub query is the group by target and >not an aggregate field, etc. All those had to be true, and probably >more things I'm not thinking of at the moment. So it becomes a not so >trivial thing to identify and use safely. > > >-Original Message- >From: sqlite-users [mailto:sqlite-users- >boun...@mailinglists.sqlite.org] On Behalf Of Simon Slavin >Sent: Wednesday, January 02, 2019 3:34 PM >To: SQLite mailing list >Subject: Re: [sqlite] Optimisation opportunity on subquery? > >On 2 Jan 2019, at 4:44pm, Jonathan Moules li...@lightpear.com> wrote: > >> SELECT >> * >> FROM >> item_info >> JOIN (select count(1) from users group by item_id) >> USING (item_id) >> where item_id = ?; > >You have an ambiguous column name, and I don't think SQLite is doing >what you think it's doing. > >Please change the name of the column users.item_id to something else, >then try your SELECT again. > >Simon. >___ >sqlite-users mailing list >sqlite-users@mailinglists.sqlite.org >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users >___ >sqlite-users mailing list >sqlite-users@mailinglists.sqlite.org >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Optimisation opportunity on subquery?
Hi Simon, Thanks for that - I'm always trying to improve my SQL. I think I see what you mean now. Assuming my original query: SELECT * FROM item_info JOIN (select count(1) from users group by item_id) USING (item_id) where item_id = ?; There are three uses of "item_id". * The first is the "group by", which unless my SQL is even worse than I imagine, can only be referencing the users table. * The second is "USING" - which is referencing both. * The third is the one I guess you mean is ambiguous? My thinking was that because item_id is going through the USING it wasn't ambiguous as they're the same thing; though that's apparently not how the query planner sees it, and hence your reference to ambiguity. That right? So I tried using aliases (I'm assuming that removes the ambiguity), but the query times remained at about 0.5s for both versions (whether i.item_id or u.item_id): SELECT * FROM item_info i JOIN (select count(1) from users group by item_id) u USING (item_id) where u.item_id = ?; Thanks again for clarifying, but after checking, it doesn't seem like it was an ambiguity thing. Cheers, Jonathan On 2019-01-02 22:04, Simon Slavin wrote: On 2 Jan 2019, at 9:50pm, Jonathan Moules wrote: Sorry, but which column is ambiguous? The users.item_id is a foreign key to the item_info.item_id - that's why it's a "REFERENCES" - why would I want to change it to be something else? Isn't the convention for FK's to have the same name across tables? That's what "USING" is for right? (or NATURAL, but I prefer to be explicit.) Happy to be corrected. It may be that our careers developed with different ideas about how to use SQL. You had a JOIN, both tables had a column "item_id", and a reference inside the JOIN to "item_id" would be ambiguous. Since the SQL standard does not make it clear which table would be used, it would be possible for different implementations of SQL to think you meant different tables. I understand that, in your example, the values would be the same. But that doesn't explain to you what the optimizer thinks you're trying to do. The simplest way to tell the optimizer what you need would be to rename one of the columns. You could try both tables, see which solution was faster, and use that one. However, I see other posters have gained better clarity for your problem. Simon. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Optimisation opportunity on subquery?
On 2 Jan 2019, at 9:50pm, Jonathan Moules wrote: > Sorry, but which column is ambiguous? The users.item_id is a foreign key to > the item_info.item_id - that's why it's a "REFERENCES" - why would I want to > change it to be something else? Isn't the convention for FK's to have the > same name across tables? That's what "USING" is for right? (or NATURAL, but I > prefer to be explicit.) Happy to be corrected. It may be that our careers developed with different ideas about how to use SQL. You had a JOIN, both tables had a column "item_id", and a reference inside the JOIN to "item_id" would be ambiguous. Since the SQL standard does not make it clear which table would be used, it would be possible for different implementations of SQL to think you meant different tables. I understand that, in your example, the values would be the same. But that doesn't explain to you what the optimizer thinks you're trying to do. The simplest way to tell the optimizer what you need would be to rename one of the columns. You could try both tables, see which solution was faster, and use that one. However, I see other posters have gained better clarity for your problem. Simon. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Optimisation opportunity on subquery?
Sorry, but which column is ambiguous? The users.item_id is a foreign key to the item_info.item_id - that's why it's a "REFERENCES" - why would I want to change it to be something else? Isn't the convention for FK's to have the same name across tables? That's what "USING" is for right? (or NATURAL, but I prefer to be explicit.) Happy to be corrected. I fixed the typo (a stray comma!), and of course, the REFERENCES table has to be second, not first. Sorry, should have checked CREATE TABLE item_info ( item_id TEXT PRIMARY KEY ON CONFLICT IGNORE NOT NULL COLLATE NOCASE, more_data TEXT ); CREATE TABLE users ( item_id TEXT REFERENCES item_info (item_id) NOT NULL COLLATE NOCASE, some_data TEXT ); CREATE INDEX users__item_id__idx ON users ( item_id ); --- In the end to get the desired result I had to invert the query by keeping the WHERE clause inside - this one gets the full speed without needing the WHERE clause twice: SELECT * FROM item_info JOIN (select count(1) from users where item_id = ?) USING (item_id); Anyway, just an observation. Thanks, Jonathan On 2019-01-02 20:33, Simon Slavin wrote: On 2 Jan 2019, at 4:44pm, Jonathan Moules wrote: SELECT * FROM item_info JOIN (select count(1) from users group by item_id) USING (item_id) where item_id = ?; You have an ambiguous column name, and I don't think SQLite is doing what you think it's doing. Please change the name of the column users.item_id to something else, then try your SELECT again. Simon. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Optimisation opportunity on subquery?
Yup, there was a typo, but you guys know what he meant. He's saying there WHERE could be pushed through the join to the subquery. sqlite> explain query plan select * from item_info join (select item_id, count(1) from users group by item_id) using (item_id) where item_id = ?; QUERY PLAN |--MATERIALIZE 1 | `--SCAN TABLE users USING COVERING INDEX user__item_id__idx |--SEARCH TABLE item_info USING INDEX sqlite_autoindex_item_info_1 (item_id=?) `--SCAN SUBQUERY 1 sqlite> explain query plan select * from item_info join (select item_id, count(1) from users where item_id = ?) using (item_id) where item_id = ?; QUERY PLAN |--MATERIALIZE 1 | `--SEARCH TABLE users USING COVERING INDEX user__item_id__idx (item_id=?) |--SEARCH TABLE item_info USING INDEX sqlite_autoindex_item_info_1 (item_id=?) `--SCAN SUBQUERY 1 It basically comes down to not doing algebra, and proving you can use it for each case can be easy for a human, but error prone or long to do for a computer, especially for non-trivial cases. Here the WHERE clause is on one field, that field is used in an INNER join to the subquery, the sub query isn't used anywhere else in the larger query, the field that matches in the sub query is the group by target and not an aggregate field, etc. All those had to be true, and probably more things I'm not thinking of at the moment. So it becomes a not so trivial thing to identify and use safely. -Original Message- From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Simon Slavin Sent: Wednesday, January 02, 2019 3:34 PM To: SQLite mailing list Subject: Re: [sqlite] Optimisation opportunity on subquery? On 2 Jan 2019, at 4:44pm, Jonathan Moules wrote: > SELECT > * > FROM > item_info > JOIN (select count(1) from users group by item_id) > USING (item_id) > where item_id = ?; You have an ambiguous column name, and I don't think SQLite is doing what you think it's doing. Please change the name of the column users.item_id to something else, then try your SELECT again. Simon. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Optimisation opportunity on subquery?
On 2 Jan 2019, at 4:44pm, Jonathan Moules wrote: > SELECT > * > FROM > item_info > JOIN (select count(1) from users group by item_id) > USING (item_id) > where item_id = ?; You have an ambiguous column name, and I don't think SQLite is doing what you think it's doing. Please change the name of the column users.item_id to something else, then try your SELECT again. Simon. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Optimisation opportunity on subquery?
UNABLE TO REPRODUCE. Your CREATE TABLEs do not work (they contain syntax errors). The query you complain about taking a long time does not and cannot work because it is attempting to join two tables using a common column name, that column name NOT being contained in one of the tables (that is, the statement is in error and the error is thrown as soon as you try and prepare the statement). --- The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume. >-Original Message- >From: sqlite-users [mailto:sqlite-users- >boun...@mailinglists.sqlite.org] On Behalf Of Jonathan Moules >Sent: Wednesday, 2 January, 2019 09:44 >To: SQLite mailing list >Subject: [sqlite] Optimisation opportunity on subquery? > >Hi List, > >The below seems to my very-non-expert mind like there's scope for >query-plan optimisation. > >I have two tables (simplified below): > >CREATE TABLE users ( > item_id TEXT REFERENCES item_info (item_id) > NOT NULL > COLLATE NOCASE, > some_data TEXT, >); > >CREATE INDEX users__item_id__idx ON users ( > item_id >); > >CREATE TABLE item_info ( > item_id TEXT PRIMARY KEY ON CONFLICT IGNORE > NOT NULL > COLLATE NOCASE, > more_data TEXT >); > >There are about 1 million records in users and 100,000 records in >item_info. > >These queries are all fast, taking about 0.002s: > select * from item_info where item_id = ?; > > select count(1) from users group by item_id; > > select count(1) from users where item_id = ?; > >But when I try and join them together, they're much slower at about >0.5s. > SELECT > * > FROM > item_info > JOIN (select count(1) from users group by item_id) > USING (item_id) > where item_id = ?; > >I kind of expected SQLite would figure out that the outer WHERE >clause >also applies to the subquery given the combination of USING and GROUP >BY >means it has to apply anyway. > >If I explicitly include the WHERE inside the subquery, the EXPLAIN >QUERY >PLAN is identical, but it's back to the expected fast speed (0.002s): > SELECT > * > FROM > item_info > JOIN (select count(1) from users WHERE item_id = ?) > USING (item_id) > where item_id = ?; > >sqlite 3.24.0 > >Cheers, > >Jonathan > > > >___ >sqlite-users mailing list >sqlite-users@mailinglists.sqlite.org >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Optimisation opportunity on subquery?
Hi List, The below seems to my very-non-expert mind like there's scope for query-plan optimisation. I have two tables (simplified below): CREATE TABLE users ( item_id TEXT REFERENCES item_info (item_id) NOT NULL COLLATE NOCASE, some_data TEXT, ); CREATE INDEX users__item_id__idx ON users ( item_id ); CREATE TABLE item_info ( item_id TEXT PRIMARY KEY ON CONFLICT IGNORE NOT NULL COLLATE NOCASE, more_data TEXT ); There are about 1 million records in users and 100,000 records in item_info. These queries are all fast, taking about 0.002s: select * from item_info where item_id = ?; select count(1) from users group by item_id; select count(1) from users where item_id = ?; But when I try and join them together, they're much slower at about 0.5s. SELECT * FROM item_info JOIN (select count(1) from users group by item_id) USING (item_id) where item_id = ?; I kind of expected SQLite would figure out that the outer WHERE clause also applies to the subquery given the combination of USING and GROUP BY means it has to apply anyway. If I explicitly include the WHERE inside the subquery, the EXPLAIN QUERY PLAN is identical, but it's back to the expected fast speed (0.002s): SELECT * FROM item_info JOIN (select count(1) from users WHERE item_id = ?) USING (item_id) where item_id = ?; sqlite 3.24.0 Cheers, Jonathan ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users