Re: [sqlite] Group by optimizer
> The documentation ought to say somewhere that the value of any column in an > aggregate query that is not part of the GROUP BY clause and which is not an > argument to an aggregate function is undefined. Does it not say so > already? Looks like it doesn't. Searching on the site only this vague sentence (from http://www.sqlite.org/lang_select.html): The GROUP BY clause causes one or more rows of the result to be combined into a single row of output. This is especially useful when the result contains aggregate functions. And this ticket http://www.sqlite.org/cvstrac/tktview?tn=3629,39 which was marked by you as "documentation bug". Pavel On Fri, May 7, 2010 at 4:38 PM, Richard Hippwrote: > On Fri, May 7, 2010 at 3:42 PM, Pavel Ivanov wrote: > >> > Absent the min statement, and no other constraints, SQLite should take >> > the first item and call it quits. >> >> No, it shouldn't. As I said without min() it's invalid SQL and SQLite >> has the right to do whatever it wants to. But of course I'd better see >> it returning error and not executing such SQL at all. >> >> > The documentation ought to say somewhere that the value of any column in an > aggregate query that is not part of the GROUP BY clause and which is not an > argument to an aggregate function is undefined. Does it not say so > already? If not, please remind me to add it. > > >> >> Pavel >> >> On Fri, May 7, 2010 at 3:38 PM, Matt Young wrote: >> > Yes, I have added the min and things work fine. >> > >> > select 'Start2';select min(c1),c2 from T0 group by c2 limit 1; >> > >> > Absent the min statement, and no other constraints, SQLite should take >> > the first item and call it quits. >> > select 'Start1';select * from T0 group by c2 limit 1; >> > >> > The second example without the min picks the last of the c2, not the >> first. >> > >> > >> > On 5/7/10, Matt Young wrote: >> >> On 5/7/10, Pavel Ivanov wrote: >> >>> Sorry, I can hardly understand what you are trying to say but want to >> >>> point out one error: >> >>> >> sqlite> select 'Start';select * from T0 group by c2; >> >>> >> >>> This is incorrect SQL. SQLite silently accepts it but its behavior is >> >>> undefined in such situation. Any other database wouldn't execute such >> >>> SQL. If you want to make it correct you should write something like >> >>> this: >> >>> >> >>> select max(c1), c2 from T0 group by c2; >> >>> or >> >>> select min(c1), c2 from T0 group by c2; >> >>> or >> >>> select sum(c1), c2 from T0 group by c2; >> >>> >> >>> Other aggregate variants are possible too. In general if you use >> >>> "group by" clause then only columns from "group by" clause can be >> >>> mentioned in select list, all other columns MUST be inside some sort >> >>> of aggregate function (http://www.sqlite.org/lang_aggfunc.html). >> >>> >> >>> >> >>> Pavel >> >>> >> >>> On Fri, May 7, 2010 at 2:02 PM, Matt Young >> wrote: >> In the following code I try select by group on one column. I am >> assuming the query optimizer will figure to stop gathering column 1s >> right away because there are no other selection constraints. Yet in >> the example, sql still looks through the entire table for additional >> column ones as one can see by looking as the column key it produces, >> the column key at the end of my group in column1. >> >> >> C:\R\SQLite>sqlite3 >> SQLite version 3.6.23.1 >> Enter ".help" for instructions >> Enter SQL statements terminated with a ";" >> sqlite> drop table T0; >> Error: no such table: T0 >> sqlite> create table T0( >> ...> c1 INTEGER PRIMARY KEY autoincrement, >> ...> c2 text); >> sqlite> >> sqlite> >> sqlite> insert into T0 (c2) values ('a'); >> sqlite> insert into T0 (c2) values ('a'); >> sqlite> insert into T0 (c2) values ('a'); >> sqlite> insert into T0 (c2) values ('b'); >> sqlite> insert into T0 (c2) values ('b'); >> sqlite> insert into T0 (c2) values ('c'); >> sqlite> insert into T0 (c2) values ('a'); >> sqlite> insert into T0 (c2) values ('d'); >> sqlite> >> sqlite> select 'Start';select * from T0; >> Start >> 1|a >> 2|a >> 3|a >> 4|b >> 5|b >> 6|c >> 7|a >> 8|d >> sqlite> select 'Start';select * from T0 group by c2; >> Start >> 7|a >> 5|b >> 6|c >> 8|d >> sqlite> >> >> >> In other words, is group by searching rows unnecessarily. I did not >> ask for the last one in the group? >> ___ >> sqlite-users mailing list >> sqlite-users@sqlite.org >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >> >> >>> ___ >> >>> sqlite-users mailing list >> >>>
Re: [sqlite] Group by optimizer
On Fri, May 7, 2010 at 3:42 PM, Pavel Ivanovwrote: > > Absent the min statement, and no other constraints, SQLite should take > > the first item and call it quits. > > No, it shouldn't. As I said without min() it's invalid SQL and SQLite > has the right to do whatever it wants to. But of course I'd better see > it returning error and not executing such SQL at all. > > The documentation ought to say somewhere that the value of any column in an aggregate query that is not part of the GROUP BY clause and which is not an argument to an aggregate function is undefined. Does it not say so already? If not, please remind me to add it. > > Pavel > > On Fri, May 7, 2010 at 3:38 PM, Matt Young wrote: > > Yes, I have added the min and things work fine. > > > > select 'Start2';select min(c1),c2 from T0 group by c2 limit 1; > > > > Absent the min statement, and no other constraints, SQLite should take > > the first item and call it quits. > > select 'Start1';select * from T0 group by c2 limit 1; > > > > The second example without the min picks the last of the c2, not the > first. > > > > > > On 5/7/10, Matt Young wrote: > >> On 5/7/10, Pavel Ivanov wrote: > >>> Sorry, I can hardly understand what you are trying to say but want to > >>> point out one error: > >>> > sqlite> select 'Start';select * from T0 group by c2; > >>> > >>> This is incorrect SQL. SQLite silently accepts it but its behavior is > >>> undefined in such situation. Any other database wouldn't execute such > >>> SQL. If you want to make it correct you should write something like > >>> this: > >>> > >>> select max(c1), c2 from T0 group by c2; > >>> or > >>> select min(c1), c2 from T0 group by c2; > >>> or > >>> select sum(c1), c2 from T0 group by c2; > >>> > >>> Other aggregate variants are possible too. In general if you use > >>> "group by" clause then only columns from "group by" clause can be > >>> mentioned in select list, all other columns MUST be inside some sort > >>> of aggregate function (http://www.sqlite.org/lang_aggfunc.html). > >>> > >>> > >>> Pavel > >>> > >>> On Fri, May 7, 2010 at 2:02 PM, Matt Young > wrote: > In the following code I try select by group on one column. I am > assuming the query optimizer will figure to stop gathering column 1s > right away because there are no other selection constraints. Yet in > the example, sql still looks through the entire table for additional > column ones as one can see by looking as the column key it produces, > the column key at the end of my group in column1. > > > C:\R\SQLite>sqlite3 > SQLite version 3.6.23.1 > Enter ".help" for instructions > Enter SQL statements terminated with a ";" > sqlite> drop table T0; > Error: no such table: T0 > sqlite> create table T0( > ...> c1 INTEGER PRIMARY KEY autoincrement, > ...> c2 text); > sqlite> > sqlite> > sqlite> insert into T0 (c2) values ('a'); > sqlite> insert into T0 (c2) values ('a'); > sqlite> insert into T0 (c2) values ('a'); > sqlite> insert into T0 (c2) values ('b'); > sqlite> insert into T0 (c2) values ('b'); > sqlite> insert into T0 (c2) values ('c'); > sqlite> insert into T0 (c2) values ('a'); > sqlite> insert into T0 (c2) values ('d'); > sqlite> > sqlite> select 'Start';select * from T0; > Start > 1|a > 2|a > 3|a > 4|b > 5|b > 6|c > 7|a > 8|d > sqlite> select 'Start';select * from T0 group by c2; > Start > 7|a > 5|b > 6|c > 8|d > sqlite> > > > In other words, is group by searching rows unnecessarily. I did not > ask for the last one in the group? > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > > >>> ___ > >>> sqlite-users mailing list > >>> sqlite-users@sqlite.org > >>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > >>> > >> > > ___ > > sqlite-users mailing list > > sqlite-users@sqlite.org > > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > -- - D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Group by optimizer
> Absent the min statement, and no other constraints, SQLite should take > the first item and call it quits. No, it shouldn't. As I said without min() it's invalid SQL and SQLite has the right to do whatever it wants to. But of course I'd better see it returning error and not executing such SQL at all. Pavel On Fri, May 7, 2010 at 3:38 PM, Matt Youngwrote: > Yes, I have added the min and things work fine. > > select 'Start2';select min(c1),c2 from T0 group by c2 limit 1; > > Absent the min statement, and no other constraints, SQLite should take > the first item and call it quits. > select 'Start1';select * from T0 group by c2 limit 1; > > The second example without the min picks the last of the c2, not the first. > > > On 5/7/10, Matt Young wrote: >> On 5/7/10, Pavel Ivanov wrote: >>> Sorry, I can hardly understand what you are trying to say but want to >>> point out one error: >>> sqlite> select 'Start';select * from T0 group by c2; >>> >>> This is incorrect SQL. SQLite silently accepts it but its behavior is >>> undefined in such situation. Any other database wouldn't execute such >>> SQL. If you want to make it correct you should write something like >>> this: >>> >>> select max(c1), c2 from T0 group by c2; >>> or >>> select min(c1), c2 from T0 group by c2; >>> or >>> select sum(c1), c2 from T0 group by c2; >>> >>> Other aggregate variants are possible too. In general if you use >>> "group by" clause then only columns from "group by" clause can be >>> mentioned in select list, all other columns MUST be inside some sort >>> of aggregate function (http://www.sqlite.org/lang_aggfunc.html). >>> >>> >>> Pavel >>> >>> On Fri, May 7, 2010 at 2:02 PM, Matt Young wrote: In the following code I try select by group on one column. I am assuming the query optimizer will figure to stop gathering column 1s right away because there are no other selection constraints. Yet in the example, sql still looks through the entire table for additional column ones as one can see by looking as the column key it produces, the column key at the end of my group in column1. C:\R\SQLite>sqlite3 SQLite version 3.6.23.1 Enter ".help" for instructions Enter SQL statements terminated with a ";" sqlite> drop table T0; Error: no such table: T0 sqlite> create table T0( ...> c1 INTEGER PRIMARY KEY autoincrement, ...> c2 text); sqlite> sqlite> sqlite> insert into T0 (c2) values ('a'); sqlite> insert into T0 (c2) values ('a'); sqlite> insert into T0 (c2) values ('a'); sqlite> insert into T0 (c2) values ('b'); sqlite> insert into T0 (c2) values ('b'); sqlite> insert into T0 (c2) values ('c'); sqlite> insert into T0 (c2) values ('a'); sqlite> insert into T0 (c2) values ('d'); sqlite> sqlite> select 'Start';select * from T0; Start 1|a 2|a 3|a 4|b 5|b 6|c 7|a 8|d sqlite> select 'Start';select * from T0 group by c2; Start 7|a 5|b 6|c 8|d sqlite> In other words, is group by searching rows unnecessarily. I did not ask for the last one in the group? ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >>> ___ >>> sqlite-users mailing list >>> sqlite-users@sqlite.org >>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >>> >> > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Group by optimizer
Yes, I have added the min and things work fine. select 'Start2';select min(c1),c2 from T0 group by c2 limit 1; Absent the min statement, and no other constraints, SQLite should take the first item and call it quits. select 'Start1';select * from T0 group by c2 limit 1; The second example without the min picks the last of the c2, not the first. On 5/7/10, Matt Youngwrote: > On 5/7/10, Pavel Ivanov wrote: >> Sorry, I can hardly understand what you are trying to say but want to >> point out one error: >> >>> sqlite> select 'Start';select * from T0 group by c2; >> >> This is incorrect SQL. SQLite silently accepts it but its behavior is >> undefined in such situation. Any other database wouldn't execute such >> SQL. If you want to make it correct you should write something like >> this: >> >> select max(c1), c2 from T0 group by c2; >> or >> select min(c1), c2 from T0 group by c2; >> or >> select sum(c1), c2 from T0 group by c2; >> >> Other aggregate variants are possible too. In general if you use >> "group by" clause then only columns from "group by" clause can be >> mentioned in select list, all other columns MUST be inside some sort >> of aggregate function (http://www.sqlite.org/lang_aggfunc.html). >> >> >> Pavel >> >> On Fri, May 7, 2010 at 2:02 PM, Matt Young wrote: >>> In the following code I try select by group on one column. I am >>> assuming the query optimizer will figure to stop gathering column 1s >>> right away because there are no other selection constraints. Yet in >>> the example, sql still looks through the entire table for additional >>> column ones as one can see by looking as the column key it produces, >>> the column key at the end of my group in column1. >>> >>> >>> C:\R\SQLite>sqlite3 >>> SQLite version 3.6.23.1 >>> Enter ".help" for instructions >>> Enter SQL statements terminated with a ";" >>> sqlite> drop table T0; >>> Error: no such table: T0 >>> sqlite> create table T0( >>> ...> c1 INTEGER PRIMARY KEY autoincrement, >>> ...> c2 text); >>> sqlite> >>> sqlite> >>> sqlite> insert into T0 (c2) values ('a'); >>> sqlite> insert into T0 (c2) values ('a'); >>> sqlite> insert into T0 (c2) values ('a'); >>> sqlite> insert into T0 (c2) values ('b'); >>> sqlite> insert into T0 (c2) values ('b'); >>> sqlite> insert into T0 (c2) values ('c'); >>> sqlite> insert into T0 (c2) values ('a'); >>> sqlite> insert into T0 (c2) values ('d'); >>> sqlite> >>> sqlite> select 'Start';select * from T0; >>> Start >>> 1|a >>> 2|a >>> 3|a >>> 4|b >>> 5|b >>> 6|c >>> 7|a >>> 8|d >>> sqlite> select 'Start';select * from T0 group by c2; >>> Start >>> 7|a >>> 5|b >>> 6|c >>> 8|d >>> sqlite> >>> >>> >>> In other words, is group by searching rows unnecessarily. I did not >>> ask for the last one in the group? >>> ___ >>> sqlite-users mailing list >>> sqlite-users@sqlite.org >>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >>> >> ___ >> sqlite-users mailing list >> sqlite-users@sqlite.org >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >> > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Group by optimizer
On 5/7/10, Pavel Ivanovwrote: > Sorry, I can hardly understand what you are trying to say but want to > point out one error: > >> sqlite> select 'Start';select * from T0 group by c2; > > This is incorrect SQL. SQLite silently accepts it but its behavior is > undefined in such situation. Any other database wouldn't execute such > SQL. If you want to make it correct you should write something like > this: > > select max(c1), c2 from T0 group by c2; > or > select min(c1), c2 from T0 group by c2; > or > select sum(c1), c2 from T0 group by c2; > > Other aggregate variants are possible too. In general if you use > "group by" clause then only columns from "group by" clause can be > mentioned in select list, all other columns MUST be inside some sort > of aggregate function (http://www.sqlite.org/lang_aggfunc.html). > > > Pavel > > On Fri, May 7, 2010 at 2:02 PM, Matt Young wrote: >> In the following code I try select by group on one column. I am >> assuming the query optimizer will figure to stop gathering column 1s >> right away because there are no other selection constraints. Yet in >> the example, sql still looks through the entire table for additional >> column ones as one can see by looking as the column key it produces, >> the column key at the end of my group in column1. >> >> >> C:\R\SQLite>sqlite3 >> SQLite version 3.6.23.1 >> Enter ".help" for instructions >> Enter SQL statements terminated with a ";" >> sqlite> drop table T0; >> Error: no such table: T0 >> sqlite> create table T0( >> ...> c1 INTEGER PRIMARY KEY autoincrement, >> ...> c2 text); >> sqlite> >> sqlite> >> sqlite> insert into T0 (c2) values ('a'); >> sqlite> insert into T0 (c2) values ('a'); >> sqlite> insert into T0 (c2) values ('a'); >> sqlite> insert into T0 (c2) values ('b'); >> sqlite> insert into T0 (c2) values ('b'); >> sqlite> insert into T0 (c2) values ('c'); >> sqlite> insert into T0 (c2) values ('a'); >> sqlite> insert into T0 (c2) values ('d'); >> sqlite> >> sqlite> select 'Start';select * from T0; >> Start >> 1|a >> 2|a >> 3|a >> 4|b >> 5|b >> 6|c >> 7|a >> 8|d >> sqlite> select 'Start';select * from T0 group by c2; >> Start >> 7|a >> 5|b >> 6|c >> 8|d >> sqlite> >> >> >> In other words, is group by searching rows unnecessarily. I did not >> ask for the last one in the group? >> ___ >> sqlite-users mailing list >> sqlite-users@sqlite.org >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >> > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Group by optimizer
Sorry, I can hardly understand what you are trying to say but want to point out one error: > sqlite> select 'Start';select * from T0 group by c2; This is incorrect SQL. SQLite silently accepts it but its behavior is undefined in such situation. Any other database wouldn't execute such SQL. If you want to make it correct you should write something like this: select max(c1), c2 from T0 group by c2; or select min(c1), c2 from T0 group by c2; or select sum(c1), c2 from T0 group by c2; Other aggregate variants are possible too. In general if you use "group by" clause then only columns from "group by" clause can be mentioned in select list, all other columns MUST be inside some sort of aggregate function (http://www.sqlite.org/lang_aggfunc.html). Pavel On Fri, May 7, 2010 at 2:02 PM, Matt Youngwrote: > In the following code I try select by group on one column. I am > assuming the query optimizer will figure to stop gathering column 1s > right away because there are no other selection constraints. Yet in > the example, sql still looks through the entire table for additional > column ones as one can see by looking as the column key it produces, > the column key at the end of my group in column1. > > > C:\R\SQLite>sqlite3 > SQLite version 3.6.23.1 > Enter ".help" for instructions > Enter SQL statements terminated with a ";" > sqlite> drop table T0; > Error: no such table: T0 > sqlite> create table T0( > ...> c1 INTEGER PRIMARY KEY autoincrement, > ...> c2 text); > sqlite> > sqlite> > sqlite> insert into T0 (c2) values ('a'); > sqlite> insert into T0 (c2) values ('a'); > sqlite> insert into T0 (c2) values ('a'); > sqlite> insert into T0 (c2) values ('b'); > sqlite> insert into T0 (c2) values ('b'); > sqlite> insert into T0 (c2) values ('c'); > sqlite> insert into T0 (c2) values ('a'); > sqlite> insert into T0 (c2) values ('d'); > sqlite> > sqlite> select 'Start';select * from T0; > Start > 1|a > 2|a > 3|a > 4|b > 5|b > 6|c > 7|a > 8|d > sqlite> select 'Start';select * from T0 group by c2; > Start > 7|a > 5|b > 6|c > 8|d > sqlite> > > > In other words, is group by searching rows unnecessarily. I did not > ask for the last one in the group? > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Group by optimizer
In the following code I try select by group on one column. I am assuming the query optimizer will figure to stop gathering column 1s right away because there are no other selection constraints. Yet in the example, sql still looks through the entire table for additional column ones as one can see by looking as the column key it produces, the column key at the end of my group in column1. C:\R\SQLite>sqlite3 SQLite version 3.6.23.1 Enter ".help" for instructions Enter SQL statements terminated with a ";" sqlite> drop table T0; Error: no such table: T0 sqlite> create table T0( ...> c1 INTEGER PRIMARY KEY autoincrement, ...> c2 text); sqlite> sqlite> sqlite> insert into T0 (c2) values ('a'); sqlite> insert into T0 (c2) values ('a'); sqlite> insert into T0 (c2) values ('a'); sqlite> insert into T0 (c2) values ('b'); sqlite> insert into T0 (c2) values ('b'); sqlite> insert into T0 (c2) values ('c'); sqlite> insert into T0 (c2) values ('a'); sqlite> insert into T0 (c2) values ('d'); sqlite> sqlite> select 'Start';select * from T0; Start 1|a 2|a 3|a 4|b 5|b 6|c 7|a 8|d sqlite> select 'Start';select * from T0 group by c2; Start 7|a 5|b 6|c 8|d sqlite> In other words, is group by searching rows unnecessarily. I did not ask for the last one in the group? ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] GROUP BY optimizer cost problem with index
> create table stuff(a,b,c,d); > insert into stuff values(1,2,3,4); > create temp view v1 as select random()%100, > random()%100, random()%1000, random()%1 > from stuff x, stuff y; > insert into stuff select * from v1; > insert into stuff select * from v1; > insert into stuff select * from v1; > insert into stuff select * from v1; > insert into stuff select * from v1; > create index stuff_b on stuff(b); > create index stuff_c on stuff(c); > create index stuff_d on stuff(d); > analyze; A better example of this issue would be: -- 47 seconds select c from stuff where a=23 group by c; vs. -- 2 seconds select c from stuff where a=23 group by +c; since there is no index on a. Never miss a thing. Make Yahoo your home page. http://www.yahoo.com/r/hs - To unsubscribe, send email to [EMAIL PROTECTED] -
[sqlite] GROUP BY optimizer cost problem with index
There seems to be an issue with the sqlite cost heuristic with an INDEX present on GROUP BY with certain types of WHERE clauses. Given the database formed by running these statements: create table stuff(a,b,c,d); insert into stuff values(1,2,3,4); create temp view v1 as select random()%100, random()%100, random()%1000, random()%1 from stuff x, stuff y; insert into stuff select * from v1; insert into stuff select * from v1; insert into stuff select * from v1; insert into stuff select * from v1; insert into stuff select * from v1; create index stuff_b on stuff(b); create index stuff_c on stuff(c); create index stuff_d on stuff(d); analyze; Using sqlite.org's sqlite3-3.5.4.bin, this query takes 47 seconds: select c from stuff where +b=23 group by c; while this query takes just 2 seconds: select c from stuff where +b=23 group by +c; It is more efficient in this case to do a full table scan instead of using the INDEX on column c. Never miss a thing. Make Yahoo your home page. http://www.yahoo.com/r/hs - To unsubscribe, send email to [EMAIL PROTECTED] -