Re: [sqlite] Optimisation opportunity on subquery?

2019-01-03 Thread Keith Medcalf

On Wednesday, 2 January, 2019 16:58, Jonathan Moules 
 wrote:

>Gah, sorry. Another typo. I really should be more awake when I post
>to this list. The non-simplified code does have the item_id on the
>subquery (otherwise it simply wouldn't execute at all of course). So:

>SELECT *
>   FROM item_info
>   JOIN (
> select
>   count(1) as num,
>   item_id
> from users
> group by item_id)
>  USING (item_id)
>  where item_id = ?;

Now perhaps we are getting somewhere.  So now what exactly is your complaint?  
You have asked for the following join to occur:

JOIN this table, which you can see by the following command:
  select * from item_info
with the table that you can see by the following command:
  select count(1) as num, item_id from users group by item_id
using the common column item_id in both tables
and then to return only the row where the item_id is the specific one provided.

So, the ONLY way to solve this is to generate the table on the RHS of the join 
first, and then join it to the item_info table, and then return only the row 
you have asked for.  This produces the following plan:

CREATE TABLE users (
 item_id   TEXT REFERENCES item_info (item_id)
   NOT NULL
   COLLATE NOCASE,
 some_data TEXT
);

CREATE INDEX users__item_id__idx ON users (
 item_id
);

CREATE TABLE item_info (
 item_id   TEXT PRIMARY KEY ON CONFLICT IGNORE
   NOT NULL
   COLLATE NOCASE,
 more_data TEXT
);
.eqp full
SELECT *
   FROM item_info
   JOIN (
 select
   count(1) as num,
   item_id
 from users
 group by item_id)
  USING (item_id)
  where item_id = ?;
QUERY PLAN
|--MATERIALIZE 1
|  `--SCAN TABLE users USING COVERING INDEX users__item_id__idx (~1048576 rows)
|--SEARCH TABLE item_info USING INDEX sqlite_autoindex_item_info_1 (item_id=?) 
(~1 row)
`--SCAN SUBQUERY 1 (~96 rows)
addr  opcode p1p2p3p4 p5  comment
  -        -  --  -
0 Init   0 77000  Start at 77
1 Integer451 000  r[1]=45
2 Once   0 45000  materialize 
"subquery_1"
3 OpenEphemeral  1 2 000  nColumn=2
4 Noop   3 1 000
5 Integer0 5 000  r[5]=0; clear abort 
flag
6 Null   0 8 800  r[8..8]=NULL
7 Gosub  7 42000
8 OpenRead   4 3 0 k(2,NOCASE,)   00  root=3 iDb=0; 
users__item_id__idx
9 ColumnsUsed4 0 0 1  00
10Explain100 0 SCAN TABLE users USING COVERING INDEX 
users__item_id__idx (~1048576 rows)  00
11Noop   0 0 000  Begin WHERE-loop0: 
users
12Rewind 4 28100  00
13  Noop   0 0 000  Begin WHERE-core
14  Column 4 0 900  r[9]=users.item_id
15  Compare8 9 1 k(1,NOCASE)00  r[8] <-> r[9]
16  Jump   172117   00
17  Move   9 8 100  r[8]=r[9]
18  Gosub  6 33000  output one row
19  IfPos  5 45000  if r[5]>0 then 
r[5]-=0, goto 45; check abort flag
20  Gosub  7 42000  reset accumulator
21  Integer1 10000  r[10]=1
22  AggStep0 102 count(1)   01  accum=r[2] 
step(r[10])
23  If 4 25000
24  Column 4 0 300  r[3]=users.item_id
25  Integer1 4 000  r[4]=1; indicate 
data in accumulator
26  Noop   0 0 000  End WHERE-core
27Next   4 13001
28Noop   0 0 000  End WHERE-loop0: users
29Gosub  6 33000  output final row
30Goto   0 45000
31Integer1 5 000  r[5]=1; set abort flag
32Return 6 0 000
33IfPos  4 35000  if r[4]>0 then 
r[4]-=0, goto 35; Groupby result generator entry point
34Return 6 0 000
35AggFinal   2 1 0 count(1)   00  accum=r[2] N=1
36SCopy  2 11000  r[11]=r[2]
37SCopy  3 12000  r[12]=r[3]
38MakeRecord 112 10   00  r[10]=mkr

Re: [sqlite] Optimisation opportunity on subquery?

2019-01-02 Thread Jonathan Moules
Gah, sorry. Another typo. I really should be more awake when I post to 
this list. The non-simplified code does have the item_id on the subquery 
(otherwise it simply wouldn't execute at all of course). So:


SELECT *
  FROM item_info
  JOIN (
select
  count(1) as num,
  item_id
from users
group by item_id)
 USING (item_id)
 where item_id = ?;



On 2019-01-02 22:56, Keith Medcalf wrote:

When you are executing the query:

SELECT *
   FROM item_info
   JOIN (select count(1)
   from users
   group by item_id)
  USING (item_id)
  where item_id = ?;

You are telling the SQL Database Engine (whatever it may happen to be, in this 
case SQLite3) that you want to take the table produced by:

select * from item_info;

and join it (an equijoin) against the table produced by running the query:

select count(1) from users group by item_id;

based on the equality of the column item_id in both tables.  If you run the two 
queries you will see that the second table DOES NOT produce a column called 
item_id.  Therefore you cannot join those tables and instead you get an error 
message telling you that the column item_id does not exist in both tables.

FROM tables are siblings not correlates.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.



-Original Message-
From: sqlite-users [mailto:sqlite-users-
boun...@mailinglists.sqlite.org] On Behalf Of Jonathan Moules
Sent: Wednesday, 2 January, 2019 15:21
To: sqlite-users@mailinglists.sqlite.org
Subject: Re: [sqlite] Optimisation opportunity on subquery?

Hi Simon,

Thanks for that - I'm always trying to improve my SQL. I think I see
what you mean now.

Assuming my original query:

     SELECT
         *
     FROM
         item_info
     JOIN (select count(1) from users group by item_id)
     USING (item_id)
     where item_id = ?;

There are three uses of "item_id".

* The first is the "group by", which unless my SQL is even worse than
I
imagine, can only be referencing the users table.

* The second is "USING" - which is referencing both.

* The third is the one I guess you mean is ambiguous? My thinking was
that because item_id is going through the USING it wasn't ambiguous
as
they're the same thing; though that's apparently not how the query
planner sees it, and hence your reference to ambiguity. That right?

So I tried using aliases (I'm assuming that removes the ambiguity),
but
the query times remained at about 0.5s for both versions (whether
i.item_id or u.item_id):

     SELECT
         *
     FROM
         item_info i
     JOIN (select count(1) from users group by item_id) u
     USING (item_id)
     where u.item_id = ?;

Thanks again for clarifying, but after checking, it doesn't seem like
it
was an ambiguity thing.
Cheers,
Jonathan

On 2019-01-02 22:04, Simon Slavin wrote:

On 2 Jan 2019, at 9:50pm, Jonathan Moules 
li...@lightpear.com> wrote:

Sorry, but which column is ambiguous? The users.item_id is a

foreign key to the item_info.item_id - that's why it's a "REFERENCES"
- why would I want to change it to be something else? Isn't the
convention for FK's to have the same name across tables? That's what
"USING" is for right? (or NATURAL, but I prefer to be explicit.)
Happy to be corrected.

It may be that our careers developed with different ideas about how

to use SQL.  You had a JOIN, both tables had a column "item_id", and
a reference inside the JOIN to "item_id" would be ambiguous.  Since
the SQL standard does not make it clear which table would be used, it
would be possible for different implementations of SQL to think you
meant different tables.

I understand that, in your example, the values would be the same.

But that doesn't explain to you what the optimizer thinks you're
trying to do.  The simplest way to tell the optimizer what you need
would be to rename one of the columns.  You could try both tables,
see which solution was faster, and use that one.

However, I see other posters have gained better clarity for your

problem.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-

users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users




___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optimisation opportunity on subquery?

2019-01-02 Thread Keith Medcalf

When you are executing the query:

SELECT *
  FROM item_info
  JOIN (select count(1) 
  from users 
  group by item_id)
 USING (item_id)
 where item_id = ?;

You are telling the SQL Database Engine (whatever it may happen to be, in this 
case SQLite3) that you want to take the table produced by:

select * from item_info;

and join it (an equijoin) against the table produced by running the query:

select count(1) from users group by item_id;

based on the equality of the column item_id in both tables.  If you run the two 
queries you will see that the second table DOES NOT produce a column called 
item_id.  Therefore you cannot join those tables and instead you get an error 
message telling you that the column item_id does not exist in both tables.

FROM tables are siblings not correlates.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.


>-Original Message-
>From: sqlite-users [mailto:sqlite-users-
>boun...@mailinglists.sqlite.org] On Behalf Of Jonathan Moules
>Sent: Wednesday, 2 January, 2019 15:21
>To: sqlite-users@mailinglists.sqlite.org
>Subject: Re: [sqlite] Optimisation opportunity on subquery?
>
>Hi Simon,
>
>Thanks for that - I'm always trying to improve my SQL. I think I see
>what you mean now.
>
>Assuming my original query:
>
>     SELECT
>         *
>     FROM
>         item_info
>     JOIN (select count(1) from users group by item_id)
>     USING (item_id)
>     where item_id = ?;
>
>There are three uses of "item_id".
>
>* The first is the "group by", which unless my SQL is even worse than
>I
>imagine, can only be referencing the users table.
>
>* The second is "USING" - which is referencing both.
>
>* The third is the one I guess you mean is ambiguous? My thinking was
>that because item_id is going through the USING it wasn't ambiguous
>as
>they're the same thing; though that's apparently not how the query
>planner sees it, and hence your reference to ambiguity. That right?
>
>So I tried using aliases (I'm assuming that removes the ambiguity),
>but
>the query times remained at about 0.5s for both versions (whether
>i.item_id or u.item_id):
>
>     SELECT
>         *
>     FROM
>         item_info i
>     JOIN (select count(1) from users group by item_id) u
>     USING (item_id)
>     where u.item_id = ?;
>
>Thanks again for clarifying, but after checking, it doesn't seem like
>it
>was an ambiguity thing.
>Cheers,
>Jonathan
>
>On 2019-01-02 22:04, Simon Slavin wrote:
>> On 2 Jan 2019, at 9:50pm, Jonathan Moules li...@lightpear.com> wrote:
>>
>>> Sorry, but which column is ambiguous? The users.item_id is a
>foreign key to the item_info.item_id - that's why it's a "REFERENCES"
>- why would I want to change it to be something else? Isn't the
>convention for FK's to have the same name across tables? That's what
>"USING" is for right? (or NATURAL, but I prefer to be explicit.)
>Happy to be corrected.
>> It may be that our careers developed with different ideas about how
>to use SQL.  You had a JOIN, both tables had a column "item_id", and
>a reference inside the JOIN to "item_id" would be ambiguous.  Since
>the SQL standard does not make it clear which table would be used, it
>would be possible for different implementations of SQL to think you
>meant different tables.
>>
>> I understand that, in your example, the values would be the same.
>But that doesn't explain to you what the optimizer thinks you're
>trying to do.  The simplest way to tell the optimizer what you need
>would be to rename one of the columns.  You could try both tables,
>see which solution was faster, and use that one.
>>
>> However, I see other posters have gained better clarity for your
>problem.
>>
>> Simon.
>> ___
>> sqlite-users mailing list
>> sqlite-users@mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-
>users
>
>
>
>___
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optimisation opportunity on subquery?

2019-01-02 Thread Keith Medcalf

However, if you want to do that then you want to use a correlated subquery as 
it is designed for retrieving correlated data by running a subquery per result 
row:

select *,
   (select count(*) from users where item_id = item_info.item_id) as count
  from item_info
 where item_id = ?;

and not a subselect in the FROM clause.  Tables in the "FROM" clause are 
siblings and exist independently of each other.  They are not generators ... 
(except in some special cases where they are, such as a table valued function).


---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.

>-Original Message-
>From: sqlite-users [mailto:sqlite-users-
>boun...@mailinglists.sqlite.org] On Behalf Of David Raymond
>Sent: Wednesday, 2 January, 2019 13:56
>To: SQLite mailing list
>Subject: Re: [sqlite] Optimisation opportunity on subquery?
>
>Yup, there was a typo, but you guys know what he meant. He's saying
>there WHERE could be pushed through the join to the subquery.
>
>
>sqlite> explain query plan select * from item_info join (select
>item_id, count(1) from users group by item_id) using (item_id) where
>item_id = ?;
>QUERY PLAN
>|--MATERIALIZE 1
>|  `--SCAN TABLE users USING COVERING INDEX user__item_id__idx
>|--SEARCH TABLE item_info USING INDEX sqlite_autoindex_item_info_1
>(item_id=?)
>`--SCAN SUBQUERY 1
>
>sqlite> explain query plan select * from item_info join (select
>item_id, count(1) from users where item_id = ?) using (item_id) where
>item_id = ?;
>QUERY PLAN
>|--MATERIALIZE 1
>|  `--SEARCH TABLE users USING COVERING INDEX user__item_id__idx
>(item_id=?)
>|--SEARCH TABLE item_info USING INDEX sqlite_autoindex_item_info_1
>(item_id=?)
>`--SCAN SUBQUERY 1
>
>
>It basically comes down to not doing algebra, and proving you can use
>it for each case can be easy for a human, but error prone or long to
>do for a computer, especially for non-trivial cases. Here the WHERE
>clause is on one field, that field is used in an INNER join to the
>subquery, the sub query isn't used anywhere else in the larger query,
>the field that matches in the sub query is the group by target and
>not an aggregate field, etc. All those had to be true, and probably
>more things I'm not thinking of at the moment. So it becomes a not so
>trivial thing to identify and use safely.
>
>
>-Original Message-
>From: sqlite-users [mailto:sqlite-users-
>boun...@mailinglists.sqlite.org] On Behalf Of Simon Slavin
>Sent: Wednesday, January 02, 2019 3:34 PM
>To: SQLite mailing list
>Subject: Re: [sqlite] Optimisation opportunity on subquery?
>
>On 2 Jan 2019, at 4:44pm, Jonathan Moules li...@lightpear.com> wrote:
>
>> SELECT
>> *
>> FROM
>> item_info
>> JOIN (select count(1) from users group by item_id)
>> USING (item_id)
>> where item_id = ?;
>
>You have an ambiguous column name, and I don't think SQLite is doing
>what you think it's doing.
>
>Please change the name of the column users.item_id to something else,
>then try your SELECT again.
>
>Simon.
>___
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>___
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optimisation opportunity on subquery?

2019-01-02 Thread Jonathan Moules

Hi Simon,

Thanks for that - I'm always trying to improve my SQL. I think I see 
what you mean now.


Assuming my original query:

    SELECT
        *
    FROM
        item_info
    JOIN (select count(1) from users group by item_id)
    USING (item_id)
    where item_id = ?;

There are three uses of "item_id".

* The first is the "group by", which unless my SQL is even worse than I 
imagine, can only be referencing the users table.


* The second is "USING" - which is referencing both.

* The third is the one I guess you mean is ambiguous? My thinking was 
that because item_id is going through the USING it wasn't ambiguous as 
they're the same thing; though that's apparently not how the query 
planner sees it, and hence your reference to ambiguity. That right?


So I tried using aliases (I'm assuming that removes the ambiguity), but 
the query times remained at about 0.5s for both versions (whether 
i.item_id or u.item_id):


    SELECT
        *
    FROM
        item_info i
    JOIN (select count(1) from users group by item_id) u
    USING (item_id)
    where u.item_id = ?;

Thanks again for clarifying, but after checking, it doesn't seem like it 
was an ambiguity thing.

Cheers,
Jonathan

On 2019-01-02 22:04, Simon Slavin wrote:

On 2 Jan 2019, at 9:50pm, Jonathan Moules  wrote:


Sorry, but which column is ambiguous? The users.item_id is a foreign key to the item_info.item_id - 
that's why it's a "REFERENCES" - why would I want to change it to be something else? 
Isn't the convention for FK's to have the same name across tables? That's what "USING" is 
for right? (or NATURAL, but I prefer to be explicit.) Happy to be corrected.

It may be that our careers developed with different ideas about how to use SQL.  You had a JOIN, 
both tables had a column "item_id", and a reference inside the JOIN to 
"item_id" would be ambiguous.  Since the SQL standard does not make it clear which table 
would be used, it would be possible for different implementations of SQL to think you meant 
different tables.

I understand that, in your example, the values would be the same.  But that 
doesn't explain to you what the optimizer thinks you're trying to do.  The 
simplest way to tell the optimizer what you need would be to rename one of the 
columns.  You could try both tables, see which solution was faster, and use 
that one.

However, I see other posters have gained better clarity for your problem.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users




___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optimisation opportunity on subquery?

2019-01-02 Thread Simon Slavin
On 2 Jan 2019, at 9:50pm, Jonathan Moules  wrote:

> Sorry, but which column is ambiguous? The users.item_id is a foreign key to 
> the item_info.item_id - that's why it's a "REFERENCES" - why would I want to 
> change it to be something else? Isn't the convention for FK's to have the 
> same name across tables? That's what "USING" is for right? (or NATURAL, but I 
> prefer to be explicit.) Happy to be corrected.

It may be that our careers developed with different ideas about how to use SQL. 
 You had a JOIN, both tables had a column "item_id", and a reference inside the 
JOIN to "item_id" would be ambiguous.  Since the SQL standard does not make it 
clear which table would be used, it would be possible for different 
implementations of SQL to think you meant different tables.

I understand that, in your example, the values would be the same.  But that 
doesn't explain to you what the optimizer thinks you're trying to do.  The 
simplest way to tell the optimizer what you need would be to rename one of the 
columns.  You could try both tables, see which solution was faster, and use 
that one.

However, I see other posters have gained better clarity for your problem.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optimisation opportunity on subquery?

2019-01-02 Thread Jonathan Moules
Sorry, but which column is ambiguous? The users.item_id is a foreign key 
to the item_info.item_id - that's why it's a "REFERENCES" - why would I 
want to change it to be something else? Isn't the convention for FK's to 
have the same name across tables? That's what "USING" is for right? (or 
NATURAL, but I prefer to be explicit.) Happy to be corrected.



I fixed the typo (a stray comma!), and of course, the REFERENCES table 
has to be second, not first. Sorry, should have checked


CREATE TABLE item_info (

item_id TEXT PRIMARY KEY ON CONFLICT IGNORE

NOT NULL

COLLATE NOCASE,

more_data TEXT

);


CREATE TABLE users (

item_id TEXT REFERENCES item_info (item_id)

NOT NULL

COLLATE NOCASE,

some_data TEXT

);


CREATE INDEX users__item_id__idx ON users (

item_id

);


---


In the end to get the desired result I had to invert the query by 
keeping the WHERE clause inside - this one gets the full speed without 
needing the WHERE clause twice:


SELECT
*
FROM
item_info
JOIN (select count(1) from users where item_id = ?)
USING (item_id);


Anyway, just an observation.


Thanks,
Jonathan


On 2019-01-02 20:33, Simon Slavin wrote:

On 2 Jan 2019, at 4:44pm, Jonathan Moules  wrote:


 SELECT
 *
 FROM
 item_info
 JOIN (select count(1) from users group by item_id)
 USING (item_id)
 where item_id = ?;

You have an ambiguous column name, and I don't think SQLite is doing what you 
think it's doing.

Please change the name of the column users.item_id to something else, then try 
your SELECT again.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optimisation opportunity on subquery?

2019-01-02 Thread David Raymond
Yup, there was a typo, but you guys know what he meant. He's saying there WHERE 
could be pushed through the join to the subquery.


sqlite> explain query plan select * from item_info join (select item_id, 
count(1) from users group by item_id) using (item_id) where item_id = ?;
QUERY PLAN
|--MATERIALIZE 1
|  `--SCAN TABLE users USING COVERING INDEX user__item_id__idx
|--SEARCH TABLE item_info USING INDEX sqlite_autoindex_item_info_1 (item_id=?)
`--SCAN SUBQUERY 1

sqlite> explain query plan select * from item_info join (select item_id, 
count(1) from users where item_id = ?) using (item_id) where item_id = ?;
QUERY PLAN
|--MATERIALIZE 1
|  `--SEARCH TABLE users USING COVERING INDEX user__item_id__idx (item_id=?)
|--SEARCH TABLE item_info USING INDEX sqlite_autoindex_item_info_1 (item_id=?)
`--SCAN SUBQUERY 1


It basically comes down to not doing algebra, and proving you can use it for 
each case can be easy for a human, but error prone or long to do for a 
computer, especially for non-trivial cases. Here the WHERE clause is on one 
field, that field is used in an INNER join to the subquery, the sub query isn't 
used anywhere else in the larger query, the field that matches in the sub query 
is the group by target and not an aggregate field, etc. All those had to be 
true, and probably more things I'm not thinking of at the moment. So it becomes 
a not so trivial thing to identify and use safely.


-Original Message-
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On 
Behalf Of Simon Slavin
Sent: Wednesday, January 02, 2019 3:34 PM
To: SQLite mailing list
Subject: Re: [sqlite] Optimisation opportunity on subquery?

On 2 Jan 2019, at 4:44pm, Jonathan Moules  wrote:

> SELECT
> *
> FROM
> item_info
> JOIN (select count(1) from users group by item_id)
> USING (item_id)
> where item_id = ?;

You have an ambiguous column name, and I don't think SQLite is doing what you 
think it's doing.

Please change the name of the column users.item_id to something else, then try 
your SELECT again.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optimisation opportunity on subquery?

2019-01-02 Thread Simon Slavin
On 2 Jan 2019, at 4:44pm, Jonathan Moules  wrote:

> SELECT
> *
> FROM
> item_info
> JOIN (select count(1) from users group by item_id)
> USING (item_id)
> where item_id = ?;

You have an ambiguous column name, and I don't think SQLite is doing what you 
think it's doing.

Please change the name of the column users.item_id to something else, then try 
your SELECT again.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optimisation opportunity on subquery?

2019-01-02 Thread Keith Medcalf

UNABLE TO REPRODUCE.  

Your CREATE TABLEs do not work (they contain syntax errors).  

The query you complain about taking a long time does not and cannot work 
because it is attempting to join two tables using a common column name, that 
column name NOT being contained in one of the tables (that is, the statement is 
in error and the error is thrown as soon as you try and prepare the statement).

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.

>-Original Message-
>From: sqlite-users [mailto:sqlite-users-
>boun...@mailinglists.sqlite.org] On Behalf Of Jonathan Moules
>Sent: Wednesday, 2 January, 2019 09:44
>To: SQLite mailing list
>Subject: [sqlite] Optimisation opportunity on subquery?
>
>Hi List,
>
>The below seems to my very-non-expert mind like there's scope for
>query-plan optimisation.
>
>I have two tables (simplified below):
>
>CREATE TABLE users (
>     item_id   TEXT REFERENCES item_info (item_id)
>   NOT NULL
>   COLLATE NOCASE,
>     some_data TEXT,
>);
>
>CREATE INDEX users__item_id__idx ON users (
>     item_id
>);
>
>CREATE TABLE item_info (
>     item_id   TEXT PRIMARY KEY ON CONFLICT IGNORE
>   NOT NULL
>   COLLATE NOCASE,
>     more_data     TEXT
>);
>
>There are about 1 million records in users and 100,000 records in
>item_info.
>
>These queries are all fast, taking about 0.002s:
>     select * from item_info where item_id = ?;
>
>     select count(1) from users group by item_id;
>
>     select count(1) from users where item_id = ?;
>
>But when I try and join them together, they're much slower at about
>0.5s.
>     SELECT
>         *
>     FROM
>         item_info
>     JOIN (select count(1) from users group by item_id)
>     USING (item_id)
>     where item_id = ?;
>
>I kind of expected SQLite would figure out that the outer WHERE
>clause
>also applies to the subquery given the combination of USING and GROUP
>BY
>means it has to apply anyway.
>
>If I explicitly include the WHERE inside the subquery, the EXPLAIN
>QUERY
>PLAN is identical, but it's back to the expected fast speed (0.002s):
>     SELECT
>         *
>     FROM
>         item_info
>     JOIN (select count(1) from users WHERE item_id = ?)
>     USING (item_id)
>     where item_id = ?;
>
>sqlite 3.24.0
>
>Cheers,
>
>Jonathan
>
>
>
>___
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Optimisation opportunity on subquery?

2019-01-02 Thread Jonathan Moules

Hi List,

The below seems to my very-non-expert mind like there's scope for 
query-plan optimisation.


I have two tables (simplified below):

CREATE TABLE users (
    item_id   TEXT REFERENCES item_info (item_id)
  NOT NULL
  COLLATE NOCASE,
    some_data TEXT,
);

CREATE INDEX users__item_id__idx ON users (
    item_id
);

CREATE TABLE item_info (
    item_id   TEXT PRIMARY KEY ON CONFLICT IGNORE
  NOT NULL
  COLLATE NOCASE,
    more_data     TEXT
);

There are about 1 million records in users and 100,000 records in item_info.

These queries are all fast, taking about 0.002s:
    select * from item_info where item_id = ?;

    select count(1) from users group by item_id;

    select count(1) from users where item_id = ?;

But when I try and join them together, they're much slower at about 0.5s.
    SELECT
        *
    FROM
        item_info
    JOIN (select count(1) from users group by item_id)
    USING (item_id)
    where item_id = ?;

I kind of expected SQLite would figure out that the outer WHERE clause 
also applies to the subquery given the combination of USING and GROUP BY 
means it has to apply anyway.


If I explicitly include the WHERE inside the subquery, the EXPLAIN QUERY 
PLAN is identical, but it's back to the expected fast speed (0.002s):

    SELECT
        *
    FROM
        item_info
    JOIN (select count(1) from users WHERE item_id = ?)
    USING (item_id)
    where item_id = ?;

sqlite 3.24.0

Cheers,

Jonathan



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users