Re: [sqlite] Faulty acceptance of non-aggregate value that is notin group by part of the SELECT statement
> -Original Message- > From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users- > boun...@sqlite.org] On Behalf Of Petite Abeille > Sent: 12 October 2011 00:15 > > I wonder how much of that feature is intentional vs. accidental. To me it > looks more like an implementation detail leak being post-rationalized as a > feature. Just my 2¢ though. I thought so too, but the behaviour is documented under the select statement: http://www.sqlite.org/lang_select.html#resultset Under " 3. Generation of the set of result rows." Third bullet, third paragraph: "If the SELECT statement is an aggregate query with a GROUP BY clause, then each of the expressions specified as part of the GROUP BY clause is evaluated for each row of the dataset. Each row is then assigned to a "group" based on the results; rows for which the results of evaluating the GROUP BY expressions are the same are assigned to the same group. For the purposes of grouping rows, NULL values are considered equal. The usual rules for selecting a collation sequence with which to compare text values apply when evaluating expressions in a GROUP BY clause. The expressions in the GROUP BY clause do not have to be expressions that appear in the result. The expressions in a GROUP BY clause may not be aggregate expressions. If a HAVING clause is specified, it is evaluated once for each group of rows as a boolean expression. If the result of evaluating the HAVING clause is false, the group is discarded. If the HAVING clause is an aggregate expression, it is evaluated across all rows in the group. If a HAVING clause is a non-aggregate expression, it is evaluated with respect to an arbitrarily selected row from the group. The HAVING expression may refer to values, even aggregate functions, that are not in the result. Each expression in the result-set is then evaluated once for each group of rows. If the expression is an aggregate expression, it is evaluated across all rows in the group. Otherwise, it is evaluated against a single arbitrarily chosen row from within the group. If there is more than one non-aggregate expression in the result-set, then all such expressions are evaluated for the same row. Each group of input dataset rows contributes a single row to the set of result rows. Subject to filtering associated with the DISTINCT keyword, the number of rows returned by an aggregate query with a GROUP BY clause is the same as the number of groups of rows produced by applying the GROUP BY and HAVING clauses to the filtered input dataset. " But then again, it could of course still be something that was documented after the fact. In any case, although I can see a certain advantage as explained earlier by Igor (less SQL code, very small performance gain) when the a column contains the same value for all rows in the result set, I think that the danger of getting a wrong result (as from what you would expect) far outweighs the advantage. The best would be a choice now that the feature is there and expected by some. Does anyone from SQLite Development have an opinion on this? /Frank Missel ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Faulty acceptance of non-aggregate value that is notin group by part of the SELECT statement
On Oct 11, 2011, at 6:00 PM, Frank Missel wrote: > Yeah, a pragma strict would be great. I can't be the only one who would > rather make the queries more error proof. I wonder how much of that feature is intentional vs. accidental. To me it looks more like an implementation detail leak being post-rationalized as a feature. Just my 2¢ though. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Faulty acceptance of non-aggregate value that is notin group by part of the SELECT statement
> Sometimes, you know that the value of a particular column is in fact unique > across the group (in which case it doesn't matter which row it's taken from). > This knowledge could come from invariants being maintained that are not > perhaps formally captured in the database schema, or else flow from the > particular join and WHERE conditions. > > In such cases (which come up surprisingly often, in my experience), it's > convenient to be able to just use the column name. I also work with MySQL a > bit, which doesn't allow that, so you have to wrap the column name in min() > or max() (doesn't matter which, as all values are the same). Personally, I find > it annoying. It makes the database engine do unnecessary comparisons, thus > hurting performance (though I admit that the difference is likely to be > immeasurably small), and more importantly, it makes the statement more > verbose and difficult to read and understand. > > Now, if there were some kind of a PRAGMA that would turn this behavior off > and enforce stricter syntax rules, I wouldn't be against it. I'd likely just never > use it. Please feel free to try and convince SQLite developers (of which I'm > not) to add such a pragma (but don't expect me to pitch in for the cause). > -- > Igor Tandetnik Ok, I see that. So its weighing the extra typing (and small performance gain) against the prevention of an error by a mistakenly left out column. Yeah, a pragma strict would be great. I can't be the only one who would rather make the queries more error proof. /Frank Missel ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Faulty acceptance of non-aggregate value that is notin group by part of the SELECT statement
Frank Misselwrote: > As for the advantages, I just don't see how it could be practical to have an > arbitrary group value together with the total number of records in an > application. Sometimes, you know that the value of a particular column is in fact unique across the group (in which case it doesn't matter which row it's taken from). This knowledge could come from invariants being maintained that are not perhaps formally captured in the database schema, or else flow from the particular join and WHERE conditions. In such cases (which come up surprisingly often, in my experience), it's convenient to be able to just use the column name. I also work with MySQL a bit, which doesn't allow that, so you have to wrap the column name in min() or max() (doesn't matter which, as all values are the same). Personally, I find it annoying. It makes the database engine do unnecessary comparisons, thus hurting performance (though I admit that the difference is likely to be immeasurably small), and more importantly, it makes the statement more verbose and difficult to read and understand. Now, if there were some kind of a PRAGMA that would turn this behavior off and enforce stricter syntax rules, I wouldn't be against it. I'd likely just never use it. Please feel free to try and convince SQLite developers (of which I'm not) to add such a pragma (but don't expect me to pitch in for the cause). -- Igor Tandetnik ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users