Re: [sqlite] query performance question
On 5 Oct 2010, at 1:14pm, Igor Tandetnik wrote: > Simon Slavinwrote: >> On 5 Oct 2010, at 10:43am, Mail wrote: >> >>> SELECT COUNT(DISTINCT t0.Z_PK) FROM ZFFFOOD t0 WHERE (t0.ZFOODGROUP = ? AND >>> (SELECT COUNT(*) FROM ZFFWEIGHT t1 WHERE (t0.Z_PK = >>> t1.ZFOOD) ) > ?) >> >> Did you miss out 'AS' from your FROM clause between ZFFFOOD and t0 ? And >> again in the sub-SELECT ? > > AS keyword is always optional. I just noticed that on the syntax diagram. Here's me trying to make my code understandable but SQL doesn't care. Bah. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] query performance question
Simon Slavinwrote: > On 5 Oct 2010, at 10:43am, Mail wrote: > >> SELECT COUNT(DISTINCT t0.Z_PK) FROM ZFFFOOD t0 WHERE (t0.ZFOODGROUP = ? AND >> (SELECT COUNT(*) FROM ZFFWEIGHT t1 WHERE (t0.Z_PK = >> t1.ZFOOD) ) > ?) > > Did you miss out 'AS' from your FROM clause between ZFFFOOD and t0 ? And > again in the sub-SELECT ? AS keyword is always optional. -- Igor Tandetnik ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] query performance question
On 5 Oct 2010, at 10:43am, Mail wrote: > SELECT COUNT(DISTINCT t0.Z_PK) FROM ZFFFOOD t0 WHERE (t0.ZFOODGROUP = ? AND > (SELECT COUNT(*) FROM ZFFWEIGHT t1 WHERE (t0.Z_PK = t1.ZFOOD) ) > ?) Did you miss out 'AS' from your FROM clause between ZFFFOOD and t0 ? And again in the sub-SELECT ? Looking at your WHERE clause you have 't0.ZFOODGROUP = ?' and I can see an index on the ZFOODGROUP column, so that should be okay. So I suspect that the time-suck is the sub-SELECT. Can you rephrase the main SELECT so it has the appropriate kind of JOIN instead of a sub-SELECT ? But I think your biggest problem is that COUNT(*) is extremely slow because it can't use indexes. Could you put a field in ZFFFOOD which tells you how many groups that ZFFFOOD has ? I know it raises problems of database integrity but if you need to do this operation a lot it's going to speed it up tremendously. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] query performance question
Thank you very much! Your query is exactly what I was looking for. If I understand you correctly, the main bottleneck is the count in the inner select - is that correct? (disregarding the overall loop for all groups) Am 05.10.2010 um 12:24 schrieb Drake Wilson: > Quoth Mail, on 2010-10-05 11:43:29 +0200: >> SELECT COUNT(DISTINCT t0.Z_PK) FROM ZFFFOOD t0 WHERE (t0.ZFOODGROUP >> = ? AND (SELECT COUNT(*) FROM ZFFWEIGHT t1 WHERE (t0.Z_PK = >> t1.ZFOOD) ) > ?) > > There should not need to be a DISTINCT when talking about a primary > key. They will already be distinct by definition, no? (I'm assuming > your primary keys will not be null, even though you don't have a NOT > NULL constraint on the column.) > > Aside from that, if you want "at least one", EXISTS is the obvious > operator to use rather than first requesting the count (and having to > iterate all the rows even though only the first one is needed to > demonstrate existence.) > >> My current task is to get the number of foods that belong to each >> group and have at least one weight data related to them. > > That suggests something like: > > SELECT g.Z_PK AS "group", COUNT(f.Z_PK) AS "count" >FROM ZFFFOODGROUP g > INNER JOIN ZFFFOOD f ON f.ZFOODGROUP = g.Z_PK > WHERE EXISTS (SELECT w.Z_PK FROM ZFFWEIGHT w WHERE w.ZFOOD = f.Z_PK) > GROUP BY g.Z_PK; > > (I think the WHERE EXISTS could be done with another join, but it > would probably be more convoluted.) > >> When inserting 12 for ZFOODGROUP and 0 for count (I never check for >> another count value here, it's always 0), I get a result of 761 >> which takes 0.0591 seconds. As I have to fetch the counts of all >> available groups, this query is sent in a loop which easily takes >> several seconds to finish. > > Do the loop in the query itself using GROUP BY instead, if you can, > rather than issuing several queries. > > My local, unrigorous measurements suggest that on my machine, with the > provided example database, your original query takes 40 ms of CPU per > run and the above takes 16 ms total; I imagine the use of GROUP BY and > EXISTS and the lack of the extra DISTINCT are the primary factors, but > I haven't checked thoroughly enough to say so confidently. I'm using > SQLite 3.7.2 on Debian GNU/Linux sid AMD64. > > ---> Drake Wilson > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] query performance question
wrote: > My current task is to get the number of foods that belong to each group and > have at least one weight data related to them. > > The query I am using for this is: > SELECT COUNT(DISTINCT t0.Z_PK) FROM ZFFFOOD t0 WHERE (t0.ZFOODGROUP = ? AND > (SELECT COUNT(*) FROM ZFFWEIGHT t1 WHERE (t0.Z_PK = > t1.ZFOOD) ) > ?) > > When inserting 12 for ZFOODGROUP and 0 for count (I never check for another > count value here, it's always 0), I get a result of > 761 which takes 0.0591 seconds. As I have to fetch the counts of all > available groups, this query is sent in a loop which easily > takes several seconds to finish. select ZFOODGROUP, count(Z_PK) from ZFFFOOD where Z_PK in (select ZFOOD from ZFFWEIGHT) group by ZFOODGROUP; This gives you counts for all groups at once. Igor Tandetnik ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] query performance question
Quoth Drake Wilson, on 2010-10-05 03:24:01 -0700: > > My current task is to get the number of foods that belong to each > > group and have at least one weight data related to them. > > That suggests something like: > > SELECT g.Z_PK AS "group", COUNT(f.Z_PK) AS "count" > FROM ZFFFOODGROUP g > INNER JOIN ZFFFOOD f ON f.ZFOODGROUP = g.Z_PK >WHERE EXISTS (SELECT w.Z_PK FROM ZFFWEIGHT w WHERE w.ZFOOD = f.Z_PK) >GROUP BY g.Z_PK; Oh---actually I may have misinterpreted your semantics slightly. This will only find groups that have at least one such row. If you want to find groups with no such foods and get an explicit zero as a result, you would have to change that to a LEFT JOIN and then move the EXISTS somewhere else to ensure at least one row for the zero-result groups. I'd probably drop the WHERE, then replace the COUNT with SUM(EXISTS (SELECT ...)) because the EXISTS will return 1 for rows where the subselect returns non-empty, though there might be a better way to do that. ---> Drake Wilson ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] query performance question
Quoth Mail, on 2010-10-05 11:43:29 +0200: > SELECT COUNT(DISTINCT t0.Z_PK) FROM ZFFFOOD t0 WHERE (t0.ZFOODGROUP > = ? AND (SELECT COUNT(*) FROM ZFFWEIGHT t1 WHERE (t0.Z_PK = > t1.ZFOOD) ) > ?) There should not need to be a DISTINCT when talking about a primary key. They will already be distinct by definition, no? (I'm assuming your primary keys will not be null, even though you don't have a NOT NULL constraint on the column.) Aside from that, if you want "at least one", EXISTS is the obvious operator to use rather than first requesting the count (and having to iterate all the rows even though only the first one is needed to demonstrate existence.) > My current task is to get the number of foods that belong to each > group and have at least one weight data related to them. That suggests something like: SELECT g.Z_PK AS "group", COUNT(f.Z_PK) AS "count" FROM ZFFFOODGROUP g INNER JOIN ZFFFOOD f ON f.ZFOODGROUP = g.Z_PK WHERE EXISTS (SELECT w.Z_PK FROM ZFFWEIGHT w WHERE w.ZFOOD = f.Z_PK) GROUP BY g.Z_PK; (I think the WHERE EXISTS could be done with another join, but it would probably be more convoluted.) > When inserting 12 for ZFOODGROUP and 0 for count (I never check for > another count value here, it's always 0), I get a result of 761 > which takes 0.0591 seconds. As I have to fetch the counts of all > available groups, this query is sent in a loop which easily takes > several seconds to finish. Do the loop in the query itself using GROUP BY instead, if you can, rather than issuing several queries. My local, unrigorous measurements suggest that on my machine, with the provided example database, your original query takes 40 ms of CPU per run and the above takes 16 ms total; I imagine the use of GROUP BY and EXISTS and the lack of the extra DISTINCT are the primary factors, but I haven't checked thoroughly enough to say so confidently. I'm using SQLite 3.7.2 on Debian GNU/Linux sid AMD64. ---> Drake Wilson ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] query performance question
Hi everybody, after reading some parts of the documentation and numerous Google searches, I still have no solution for fixing my slow query. I would really appreciate if you could point me in the right direction. The basis is a food database (~38 megabyte total size) that contains groups/categories, foods themselves and nutrition information. The database schema: CREATE TABLE ZFFFOOD ( Z_PK INTEGER PRIMARY KEY, Z_ENT INTEGER, Z_OPT INTEGER, ZFOODGROUP INTEGER, ZSHORTDESC VARCHAR, ZLONGDESC VARCHAR ); CREATE TABLE ZFFFOODGROUP ( Z_PK INTEGER PRIMARY KEY, Z_ENT INTEGER, Z_OPT INTEGER, ZLONGDESC VARCHAR, ZICON VARCHAR ); CREATE TABLE ZFFFOODNUTRIENT ( Z_PK INTEGER PRIMARY KEY, Z_ENT INTEGER, Z_OPT INTEGER, ZFOOD INTEGER, ZNUTRIENT INTEGER, ZAMOUNTINHUNDREDGRAM FLOAT ); CREATE TABLE ZFFNUTRIENT ( Z_PK INTEGER PRIMARY KEY, Z_ENT INTEGER, Z_OPT INTEGER, ZTAGNAME VARCHAR, ZUNITS VARCHAR, ZLONGDESC VARCHAR ); CREATE TABLE ZFFWEIGHT ( Z_PK INTEGER PRIMARY KEY, Z_ENT INTEGER, Z_OPT INTEGER, ZFOOD INTEGER, ZGRAMWEIGHT FLOAT, ZAMOUNT FLOAT, ZLONGDESC VARCHAR ); CREATE TABLE Z_METADATA (Z_VERSION INTEGER PRIMARY KEY, Z_UUID VARCHAR(255), Z_PLIST BLOB); CREATE TABLE Z_PRIMARYKEY (Z_ENT INTEGER PRIMARY KEY, Z_NAME VARCHAR, Z_SUPER INTEGER, Z_MAX INTEGER); CREATE INDEX ZFFFOODNUTRIENT_ZAMOUNTINHUNDREDGRAM_INDEX ON ZFFFOODNUTRIENT (ZAMOUNTINHUNDREDGRAM); CREATE INDEX ZFFFOODNUTRIENT_ZFOOD_INDEX ON ZFFFOODNUTRIENT (ZFOOD); CREATE INDEX ZFFFOODNUTRIENT_ZNUTRIENT_INDEX ON ZFFFOODNUTRIENT (ZNUTRIENT); CREATE INDEX ZFFFOOD_ZFOODGROUP_INDEX ON ZFFFOOD (ZFOODGROUP); CREATE INDEX ZFFFOOD_ZLONGDESC_INDEX ON ZFFFOOD (ZLONGDESC); CREATE INDEX ZFFNUTRIENT_ZTAGNAME_INDEX ON ZFFNUTRIENT (ZTAGNAME); CREATE INDEX ZFFWEIGHT_ZFOOD_INDEX ON ZFFWEIGHT (ZFOOD); CREATE INDEX ZFFWEIGHT_ZLONGDESC_INDEX ON ZFFWEIGHT (ZLONGDESC); My current task is to get the number of foods that belong to each group and have at least one weight data related to them. The query I am using for this is: SELECT COUNT(DISTINCT t0.Z_PK) FROM ZFFFOOD t0 WHERE (t0.ZFOODGROUP = ? AND (SELECT COUNT(*) FROM ZFFWEIGHT t1 WHERE (t0.Z_PK = t1.ZFOOD) ) > ?) When inserting 12 for ZFOODGROUP and 0 for count (I never check for another count value here, it's always 0), I get a result of 761 which takes 0.0591 seconds. As I have to fetch the counts of all available groups, this query is sent in a loop which easily takes several seconds to finish. I dropped the inner select - disregarding the wrong count as this is just a speed test - and the query is a bit faster, but still too slow (0.0314 seconds): SELECT COUNT(DISTINCT t0.Z_PK) FROM ZFFFOOD t0 WHERE t0.ZFOODGROUP = ? Due to the size of the database, I put it online for download rather than attaching it. ;-) Please download a copy here: http://dl.dropbox.com/u/8867517/food.sqlite.zip Any hints are welcome. I'm keen to learn. :-) Cheers, Sven ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users