Re: [sqlite] query performance question

2010-10-05 Thread Simon Slavin

On 5 Oct 2010, at 1:14pm, Igor Tandetnik wrote:

> Simon Slavin  wrote:
>> On 5 Oct 2010, at 10:43am, Mail wrote:
>> 
>>> SELECT COUNT(DISTINCT t0.Z_PK) FROM ZFFFOOD t0 WHERE (t0.ZFOODGROUP = ? AND 
>>> (SELECT COUNT(*) FROM ZFFWEIGHT t1 WHERE (t0.Z_PK =
>>> t1.ZFOOD) ) > ?) 
>> 
>> Did you miss out 'AS' from your FROM clause between ZFFFOOD and t0 ?  And 
>> again in the sub-SELECT ?
> 
> AS keyword is always optional.

I just noticed that on the syntax diagram.  Here's me trying to make my code 
understandable but SQL doesn't care.  Bah.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] query performance question

2010-10-05 Thread Igor Tandetnik
Simon Slavin  wrote:
> On 5 Oct 2010, at 10:43am, Mail wrote:
> 
>> SELECT COUNT(DISTINCT t0.Z_PK) FROM ZFFFOOD t0 WHERE (t0.ZFOODGROUP = ? AND 
>> (SELECT COUNT(*) FROM ZFFWEIGHT t1 WHERE (t0.Z_PK =
>> t1.ZFOOD) ) > ?) 
> 
> Did you miss out 'AS' from your FROM clause between ZFFFOOD and t0 ?  And 
> again in the sub-SELECT ?

AS keyword is always optional.
-- 
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] query performance question

2010-10-05 Thread Simon Slavin

On 5 Oct 2010, at 10:43am, Mail wrote:

> SELECT COUNT(DISTINCT t0.Z_PK) FROM ZFFFOOD t0 WHERE (t0.ZFOODGROUP = ? AND 
> (SELECT COUNT(*) FROM ZFFWEIGHT t1 WHERE (t0.Z_PK = t1.ZFOOD) ) > ?)

Did you miss out 'AS' from your FROM clause between ZFFFOOD and t0 ?  And again 
in the sub-SELECT ?

Looking at your WHERE clause you have

't0.ZFOODGROUP = ?'

and I can see an index on the ZFOODGROUP column, so that should be okay.  So I 
suspect that the time-suck is the sub-SELECT.  Can you rephrase the main SELECT 
so it has the appropriate kind of JOIN instead of a sub-SELECT ?

But I think your biggest problem is that COUNT(*) is extremely slow because it 
can't use indexes.  Could you put a field in ZFFFOOD which tells you how many 
groups that ZFFFOOD has ?  I know it raises problems of database integrity but 
if you need to do this operation a lot it's going to speed it up tremendously.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] query performance question

2010-10-05 Thread Sven
Thank you very much! Your query is exactly what I was looking for.

If I understand you correctly, the main bottleneck is the count in the inner 
select - is that correct? (disregarding the overall loop for all groups)


Am 05.10.2010 um 12:24 schrieb Drake Wilson:

> Quoth Mail , on 2010-10-05 11:43:29 +0200:
>> SELECT COUNT(DISTINCT t0.Z_PK) FROM ZFFFOOD t0 WHERE (t0.ZFOODGROUP
>> = ? AND (SELECT COUNT(*) FROM ZFFWEIGHT t1 WHERE (t0.Z_PK =
>> t1.ZFOOD) ) > ?)
> 
> There should not need to be a DISTINCT when talking about a primary
> key.  They will already be distinct by definition, no?  (I'm assuming
> your primary keys will not be null, even though you don't have a NOT
> NULL constraint on the column.)
> 
> Aside from that, if you want "at least one", EXISTS is the obvious
> operator to use rather than first requesting the count (and having to
> iterate all the rows even though only the first one is needed to
> demonstrate existence.)
> 
>> My current task is to get the number of foods that belong to each
>> group and have at least one weight data related to them.
> 
> That suggests something like:
> 
>  SELECT g.Z_PK AS "group", COUNT(f.Z_PK) AS "count"
>FROM ZFFFOODGROUP g
> INNER JOIN ZFFFOOD f ON f.ZFOODGROUP = g.Z_PK
>   WHERE EXISTS (SELECT w.Z_PK FROM ZFFWEIGHT w WHERE w.ZFOOD = f.Z_PK)
>   GROUP BY g.Z_PK;
> 
> (I think the WHERE EXISTS could be done with another join, but it
> would probably be more convoluted.)
> 
>> When inserting 12 for ZFOODGROUP and 0 for count (I never check for
>> another count value here, it's always 0), I get a result of 761
>> which takes 0.0591 seconds. As I have to fetch the counts of all
>> available groups, this query is sent in a loop which easily takes
>> several seconds to finish.
> 
> Do the loop in the query itself using GROUP BY instead, if you can,
> rather than issuing several queries.
> 
> My local, unrigorous measurements suggest that on my machine, with the
> provided example database, your original query takes 40 ms of CPU per
> run and the above takes 16 ms total; I imagine the use of GROUP BY and
> EXISTS and the lack of the extra DISTINCT are the primary factors, but
> I haven't checked thoroughly enough to say so confidently.  I'm using
> SQLite 3.7.2 on Debian GNU/Linux sid AMD64.
> 
>   ---> Drake Wilson
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] query performance question

2010-10-05 Thread Igor Tandetnik
Mail  wrote:
> My current task is to get the number of foods that belong to each group and 
> have at least one weight data related to them.
> 
> The query I am using for this is:
> SELECT COUNT(DISTINCT t0.Z_PK) FROM ZFFFOOD t0 WHERE (t0.ZFOODGROUP = ? AND 
> (SELECT COUNT(*) FROM ZFFWEIGHT t1 WHERE (t0.Z_PK =
> t1.ZFOOD) ) > ?) 
> 
> When inserting 12 for ZFOODGROUP and 0 for count (I never check for another 
> count value here, it's always 0), I get a result of
> 761 which takes 0.0591 seconds. As I have to fetch the counts of all 
> available groups, this query is sent in a loop which easily
> takes several seconds to finish.  

select ZFOODGROUP, count(Z_PK) from ZFFFOOD
where Z_PK in (select ZFOOD from ZFFWEIGHT)
group by ZFOODGROUP;

This gives you counts for all groups at once.

Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] query performance question

2010-10-05 Thread Drake Wilson
Quoth Drake Wilson , on 2010-10-05 03:24:01 -0700:
> > My current task is to get the number of foods that belong to each
> > group and have at least one weight data related to them.
> 
> That suggests something like:
> 
>   SELECT g.Z_PK AS "group", COUNT(f.Z_PK) AS "count"
> FROM ZFFFOODGROUP g
>  INNER JOIN ZFFFOOD f ON f.ZFOODGROUP = g.Z_PK
>WHERE EXISTS (SELECT w.Z_PK FROM ZFFWEIGHT w WHERE w.ZFOOD = f.Z_PK)
>GROUP BY g.Z_PK;

Oh---actually I may have misinterpreted your semantics slightly.  This
will only find groups that have at least one such row.  If you want to
find groups with no such foods and get an explicit zero as a result,
you would have to change that to a LEFT JOIN and then move the EXISTS
somewhere else to ensure at least one row for the zero-result groups.
I'd probably drop the WHERE, then replace the COUNT with SUM(EXISTS
(SELECT ...)) because the EXISTS will return 1 for rows where the
subselect returns non-empty, though there might be a better way to do
that.

   ---> Drake Wilson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] query performance question

2010-10-05 Thread Drake Wilson
Quoth Mail , on 2010-10-05 11:43:29 +0200:
> SELECT COUNT(DISTINCT t0.Z_PK) FROM ZFFFOOD t0 WHERE (t0.ZFOODGROUP
> = ? AND (SELECT COUNT(*) FROM ZFFWEIGHT t1 WHERE (t0.Z_PK =
> t1.ZFOOD) ) > ?)

There should not need to be a DISTINCT when talking about a primary
key.  They will already be distinct by definition, no?  (I'm assuming
your primary keys will not be null, even though you don't have a NOT
NULL constraint on the column.)

Aside from that, if you want "at least one", EXISTS is the obvious
operator to use rather than first requesting the count (and having to
iterate all the rows even though only the first one is needed to
demonstrate existence.)

> My current task is to get the number of foods that belong to each
> group and have at least one weight data related to them.

That suggests something like:

  SELECT g.Z_PK AS "group", COUNT(f.Z_PK) AS "count"
FROM ZFFFOODGROUP g
 INNER JOIN ZFFFOOD f ON f.ZFOODGROUP = g.Z_PK
   WHERE EXISTS (SELECT w.Z_PK FROM ZFFWEIGHT w WHERE w.ZFOOD = f.Z_PK)
   GROUP BY g.Z_PK;

(I think the WHERE EXISTS could be done with another join, but it
would probably be more convoluted.)

> When inserting 12 for ZFOODGROUP and 0 for count (I never check for
> another count value here, it's always 0), I get a result of 761
> which takes 0.0591 seconds. As I have to fetch the counts of all
> available groups, this query is sent in a loop which easily takes
> several seconds to finish.

Do the loop in the query itself using GROUP BY instead, if you can,
rather than issuing several queries.

My local, unrigorous measurements suggest that on my machine, with the
provided example database, your original query takes 40 ms of CPU per
run and the above takes 16 ms total; I imagine the use of GROUP BY and
EXISTS and the lack of the extra DISTINCT are the primary factors, but
I haven't checked thoroughly enough to say so confidently.  I'm using
SQLite 3.7.2 on Debian GNU/Linux sid AMD64.

   ---> Drake Wilson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] query performance question

2010-10-05 Thread Mail
Hi everybody,

after reading some parts of the documentation and numerous Google searches, I 
still have no solution for fixing my slow query. I would really appreciate if 
you could point me in the right direction.

The basis is a food database (~38 megabyte total size) that contains 
groups/categories, foods themselves and nutrition information.

The database schema:
CREATE TABLE ZFFFOOD ( Z_PK INTEGER PRIMARY KEY, Z_ENT INTEGER, Z_OPT INTEGER, 
ZFOODGROUP INTEGER, ZSHORTDESC VARCHAR, ZLONGDESC VARCHAR );
CREATE TABLE ZFFFOODGROUP ( Z_PK INTEGER PRIMARY KEY, Z_ENT INTEGER, Z_OPT 
INTEGER, ZLONGDESC VARCHAR, ZICON VARCHAR );
CREATE TABLE ZFFFOODNUTRIENT ( Z_PK INTEGER PRIMARY KEY, Z_ENT INTEGER, Z_OPT 
INTEGER, ZFOOD INTEGER, ZNUTRIENT INTEGER, ZAMOUNTINHUNDREDGRAM FLOAT );
CREATE TABLE ZFFNUTRIENT ( Z_PK INTEGER PRIMARY KEY, Z_ENT INTEGER, Z_OPT 
INTEGER, ZTAGNAME VARCHAR, ZUNITS VARCHAR, ZLONGDESC VARCHAR );
CREATE TABLE ZFFWEIGHT ( Z_PK INTEGER PRIMARY KEY, Z_ENT INTEGER, Z_OPT 
INTEGER, ZFOOD INTEGER, ZGRAMWEIGHT FLOAT, ZAMOUNT FLOAT, ZLONGDESC VARCHAR );
CREATE TABLE Z_METADATA (Z_VERSION INTEGER PRIMARY KEY, Z_UUID VARCHAR(255), 
Z_PLIST BLOB);
CREATE TABLE Z_PRIMARYKEY (Z_ENT INTEGER PRIMARY KEY, Z_NAME VARCHAR, Z_SUPER 
INTEGER, Z_MAX INTEGER);
CREATE INDEX ZFFFOODNUTRIENT_ZAMOUNTINHUNDREDGRAM_INDEX ON ZFFFOODNUTRIENT 
(ZAMOUNTINHUNDREDGRAM);
CREATE INDEX ZFFFOODNUTRIENT_ZFOOD_INDEX ON ZFFFOODNUTRIENT (ZFOOD);
CREATE INDEX ZFFFOODNUTRIENT_ZNUTRIENT_INDEX ON ZFFFOODNUTRIENT (ZNUTRIENT);
CREATE INDEX ZFFFOOD_ZFOODGROUP_INDEX ON ZFFFOOD (ZFOODGROUP);
CREATE INDEX ZFFFOOD_ZLONGDESC_INDEX ON ZFFFOOD (ZLONGDESC);
CREATE INDEX ZFFNUTRIENT_ZTAGNAME_INDEX ON ZFFNUTRIENT (ZTAGNAME);
CREATE INDEX ZFFWEIGHT_ZFOOD_INDEX ON ZFFWEIGHT (ZFOOD);
CREATE INDEX ZFFWEIGHT_ZLONGDESC_INDEX ON ZFFWEIGHT (ZLONGDESC);

My current task is to get the number of foods that belong to each group and 
have at least one weight data related to them.

The query I am using for this is:
SELECT COUNT(DISTINCT t0.Z_PK) FROM ZFFFOOD t0 WHERE (t0.ZFOODGROUP = ? AND 
(SELECT COUNT(*) FROM ZFFWEIGHT t1 WHERE (t0.Z_PK = t1.ZFOOD) ) > ?)

When inserting 12 for ZFOODGROUP and 0 for count (I never check for another 
count value here, it's always 0), I get a result of 761 which takes 0.0591 
seconds. As I have to fetch the counts of all available groups, this query is 
sent in a loop which easily takes several seconds to finish.

I dropped the inner select - disregarding the wrong count as this is just a 
speed test - and the query is a bit faster, but still too slow (0.0314 seconds):
SELECT COUNT(DISTINCT t0.Z_PK) FROM ZFFFOOD t0 WHERE t0.ZFOODGROUP = ?

Due to the size of the database, I put it online for download rather than 
attaching it. ;-)
Please download a copy here: http://dl.dropbox.com/u/8867517/food.sqlite.zip

Any hints are welcome. I'm keen to learn. :-)

Cheers,
Sven
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users