Re: [sqlite] Faulty acceptance of non-aggregate value that is notin group by part of the SELECT statement

2011-10-11 Thread Frank Missel
> -Original Message-
> From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-
> boun...@sqlite.org] On Behalf Of Petite Abeille
> Sent: 12 October 2011 00:15
> 
> I wonder how much of that feature is intentional vs. accidental. To me it
> looks more like an implementation detail leak being post-rationalized as a
> feature. Just my 2¢ though.

I thought so  too, but the behaviour is documented under the select
statement:

http://www.sqlite.org/lang_select.html#resultset

Under " 3. Generation of the set of result rows."

Third bullet, third paragraph:

"If the SELECT statement is an aggregate query with a GROUP BY clause, then
each of the expressions specified as part of the GROUP BY clause is
evaluated for each row of the dataset. Each row is then assigned to a
"group" based on the results; rows for which the results of evaluating the
GROUP BY expressions are the same are assigned to the same group. For the
purposes of grouping rows, NULL values are considered equal. The usual rules
for selecting a collation sequence with which to compare text values apply
when evaluating expressions in a GROUP BY clause. The expressions in the
GROUP BY clause do not have to be expressions that appear in the result. The
expressions in a GROUP BY clause may not be aggregate expressions.

If a HAVING clause is specified, it is evaluated once for each group of rows
as a boolean expression. If the result of evaluating the HAVING clause is
false, the group is discarded. If the HAVING clause is an aggregate
expression, it is evaluated across all rows in the group. If a HAVING clause
is a non-aggregate expression, it is evaluated with respect to an
arbitrarily selected row from the group. The HAVING expression may refer to
values, even aggregate functions, that are not in the result.

Each expression in the result-set is then evaluated once for each group of
rows. If the expression is an aggregate expression, it is evaluated across
all rows in the group. Otherwise, it is evaluated against a single
arbitrarily chosen row from within the group. If there is more than one
non-aggregate expression in the result-set, then all such expressions are
evaluated for the same row.

Each group of input dataset rows contributes a single row to the set of
result rows. Subject to filtering associated with the DISTINCT keyword, the
number of rows returned by an aggregate query with a GROUP BY clause is the
same as the number of groups of rows produced by applying the GROUP BY and
HAVING clauses to the filtered input dataset.
"

But then again, it could of course still be something that was documented
after the fact.

In any case,  although I can see a certain advantage as explained earlier by
Igor (less SQL code, very small performance gain) when the a column contains
the same value for all rows in the result set, I think that the danger of
getting a wrong result (as from what you would expect) far outweighs the
advantage. The best would be a choice now that the feature is there and
expected by some.

Does anyone from SQLite Development have an opinion on this?


/Frank Missel

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Faulty acceptance of non-aggregate value that is notin group by part of the SELECT statement

2011-10-11 Thread Petite Abeille

On Oct 11, 2011, at 6:00 PM, Frank Missel wrote:

> Yeah, a pragma strict would be great.  I can't be the only one who would
> rather make the queries more error proof.

I wonder how much of that feature is intentional vs. accidental. To me it looks 
more like an implementation detail leak being post-rationalized as a feature. 
Just my 2¢ though.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Faulty acceptance of non-aggregate value that is notin group by part of the SELECT statement

2011-10-11 Thread Frank Missel
> Sometimes, you know that the value of a particular column is in fact
unique
> across the group (in which case it doesn't matter which row it's taken
from).
> This knowledge could come from invariants being maintained that are not
> perhaps formally captured in the database schema, or else flow from the
> particular join and WHERE conditions.
> 
> In such cases (which come up surprisingly often, in my experience), it's
> convenient to be able to just use the column name. I also work with MySQL
a
> bit, which doesn't allow that, so you have to wrap the column name in
min()
> or max() (doesn't matter which, as all values are the same). Personally, I
find
> it annoying. It makes the database engine do unnecessary comparisons, thus
> hurting performance (though I admit that the difference is likely to be
> immeasurably small), and more importantly, it makes the statement more
> verbose and difficult to read and understand.
> 
> Now, if there were some kind of a PRAGMA that would turn this behavior off
> and enforce stricter syntax rules, I wouldn't be against it. I'd likely
just never
> use it. Please feel free to try and convince SQLite developers (of which
I'm
> not) to add such a pragma (but don't expect me to pitch in for the cause).
> --
> Igor Tandetnik

Ok, I see that.

So its weighing the extra typing (and small performance gain) against the
prevention of an error by a mistakenly left out column.
Yeah, a pragma strict would be great.  I can't be the only one who would
rather make the queries more error proof.


/Frank Missel

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Faulty acceptance of non-aggregate value that is notin group by part of the SELECT statement

2011-10-11 Thread Igor Tandetnik
Frank Missel  wrote:
> As for the advantages, I just don't see how it could be practical to have an
> arbitrary group value together with the total number of records in an
> application.

Sometimes, you know that the value of a particular column is in fact unique 
across the group (in which case it doesn't matter which row it's taken from). 
This knowledge could come from invariants being maintained that are not perhaps 
formally captured in the database schema, or else flow from the particular join 
and WHERE conditions.

In such cases (which come up surprisingly often, in my experience), it's 
convenient to be able to just use the column name. I also work with MySQL a 
bit, which doesn't allow that, so you have to wrap the column name in min() or 
max() (doesn't matter which, as all values are the same). Personally, I find it 
annoying. It makes the database engine do unnecessary comparisons, thus hurting 
performance (though I admit that the difference is likely to be immeasurably 
small), and more importantly, it makes the statement more verbose and difficult 
to read and understand.

Now, if there were some kind of a PRAGMA that would turn this behavior off and 
enforce stricter syntax rules, I wouldn't be against it. I'd likely just never 
use it. Please feel free to try and convince SQLite developers (of which I'm 
not) to add such a pragma (but don't expect me to pitch in for the cause).
-- 
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users