Re: Documentation of EXCEPT ALL may have a bug
Alvaro Herrera writes: > PG Doc comments form wrote: >> create table t(x int); >> create table u(x int); >> >> insert into t values (1), (2), (2), (3), (3); >> insert into u values (1), (2); >> >> select * from t except all select * from u; >> x >> --- >> 3 >> 3 >> 2 >> (3 rows) > I find this pretty odd behavior. Is this not an outright bug? It's exactly what the standard says to do: if there are M occurrences of a row value in the LHS, and N occurrences in the RHS, emit max(M-N, 0) copies of the row. To my mind that's a reasonable definition of EXCEPT if you suppose that nominally-identical rows are matched and discarded one by one, rather than with de-duplication occurring beforehand. > [*] I didn't try terribly hard, but couldn't actually find where the > behavior is defined. In SQL:2011, it's 7.13 general rule 3) b) iii) 3) B), on page 420 in the draft version I have. regards, tom lane
Re: Documentation of EXCEPT ALL may have a bug
On Sat, Feb 10, 2018 at 1:02 PM, Pantelis Theodosiou wrote: > > > On Sat, Feb 10, 2018 at 11:59 AM, Alvaro Herrera > wrote: > >> PG Doc comments form wrote: >> >> > create table t(x int); >> > create table u(x int); >> > >> > insert into t values (1), (2), (2), (3), (3); >> > insert into u values (1), (2); >> > >> > select * from t except all select * from u; >> >> > x >> > --- >> > 3 >> > 3 >> > 2 >> > (3 rows) >> >> I find this pretty odd behavior. Is this not an outright bug? I >> haven't read the SQL standard on this matter[*], but if they define >> EXCEPT ALL to work like this, then it seems pretty useless. (Maybe they >> just didn't intend EXCEPT ALL to be useful?) If this is indeed the >> case, maybe we should amend the docs not only to explain this behavior >> but also to warn against the construct. >> > > I think that's the way it is defined (but I agree, I don't remember ever > seeing a use for it) > > >> >> [*] I didn't try terribly hard, but couldn't actually find where the >> behavior is defined. What I have on hand is a draft of SQL:2011 where >> this appears to be defined in 7.13 but I was unable >> to find the rules for set operations. It refers to 9.12 Grouping >> operations but that defines conformance rules only. >> >> -- >> > > I may have a different version but I see: > That is in: 7.13 General Rules 3, Case b: > > iii) > T contains the following rows: > 1) Let R be a row that is a duplicate of some row in ET1 or of some row in > ET2 or both. Let > m be the number of duplicates of R in ET1 and let n be the number of > duplicates of R in > ET2, where m ≥ 0 and n ≥ 0. > 2) If DISTINCT is specified or implicit, then > ... > > 3) If ALL is specified, then > Case: > A) If UNION is specified, then the number of duplicates of R that T > contains is (m + n). > B) If EXCEPT is specified, then the number of duplicates of R that T > contains is the > maximum of (m – n) and 0 (zero). > C) If INTERSECT is specified, then the number of duplicates of R that T > contains is the > minimum of m and n. > >
Re: Documentation of EXCEPT ALL may have a bug
On Sat, Feb 10, 2018 at 11:59 AM, Alvaro Herrera wrote: > PG Doc comments form wrote: > > > create table t(x int); > > create table u(x int); > > > > insert into t values (1), (2), (2), (3), (3); > > insert into u values (1), (2); > > > > select * from t except all select * from u; > > > x > > --- > > 3 > > 3 > > 2 > > (3 rows) > > I find this pretty odd behavior. Is this not an outright bug? I > haven't read the SQL standard on this matter[*], but if they define > EXCEPT ALL to work like this, then it seems pretty useless. (Maybe they > just didn't intend EXCEPT ALL to be useful?) If this is indeed the > case, maybe we should amend the docs not only to explain this behavior > but also to warn against the construct. > I think that's the way it is defined (but I agree, I don't remember ever seeing a use for it) > > [*] I didn't try terribly hard, but couldn't actually find where the > behavior is defined. What I have on hand is a draft of SQL:2011 where > this appears to be defined in 7.13 but I was unable > to find the rules for set operations. It refers to 9.12 Grouping > operations but that defines conformance rules only. > > -- > I may have a different version but I see: iii) T contains the following rows: 1) Let R be a row that is a duplicate of some row in ET1 or of some row in ET2 or both. Let m be the number of duplicates of R in ET1 and let n be the number of duplicates of R in ET2, where m ≥ 0 and n ≥ 0. 2) If DISTINCT is specified or implicit, then ... 3) If ALL is specified, then Case: A) If UNION is specified, then the number of duplicates of R that T contains is (m + n). B) If EXCEPT is specified, then the number of duplicates of R that T contains is the maximum of (m – n) and 0 (zero). C) If INTERSECT is specified, then the number of duplicates of R that T contains is the minimum of m and n.
Re: Documentation of EXCEPT ALL may have a bug
PG Doc comments form wrote: > create table t(x int); > create table u(x int); > > insert into t values (1), (2), (2), (3), (3); > insert into u values (1), (2); > > select * from t except all select * from u; > x > --- > 3 > 3 > 2 > (3 rows) I find this pretty odd behavior. Is this not an outright bug? I haven't read the SQL standard on this matter[*], but if they define EXCEPT ALL to work like this, then it seems pretty useless. (Maybe they just didn't intend EXCEPT ALL to be useful?) If this is indeed the case, maybe we should amend the docs not only to explain this behavior but also to warn against the construct. [*] I didn't try terribly hard, but couldn't actually find where the behavior is defined. What I have on hand is a draft of SQL:2011 where this appears to be defined in 7.13 but I was unable to find the rules for set operations. It refers to 9.12 Grouping operations but that defines conformance rules only. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Documentation of EXCEPT ALL may have a bug
The following documentation comment has been logged on the website: Page: https://www.postgresql.org/docs/10/static/queries-union.html Description: I believe that the documented behavior of EXCEPT is not in agreement with Postgres behavior (I'm using Postgres 9.5). The documents say: EXCEPT returns all rows that are in the result of query1 but not in the result of query2. (This is sometimes called the difference between two queries.) Again, duplicates are eliminated unless EXCEPT ALL is used. Here is a test script: drop table if exists t; drop table if exists u; create table t(x int); create table u(x int); insert into t values (1), (2), (2), (3), (3); insert into u values (1), (2); select * from t except select * from u; select * from t except all select * from u; And here is the output: DROP TABLE DROP TABLE CREATE TABLE CREATE TABLE INSERT 0 5 INSERT 0 2 x --- 3 (1 row) x --- 3 3 2 (3 rows) The output from EXCEPT matches the documented behavior. The output from EXCEPT ALL makes sense to me, but I think it is at odds with the documentation: "EXCEPT returns all rows that are in the result of query1 but not in the result of query2." This general statement is then modified by the discussion of EXCEPT ALL. That first sentence does not deal with duplicates in the input tables, and just discusses set membership. Each occurrence of (3) in query1 is therefore kept. For EXCEPT, the duplicates are eliminated (yielding the output [3]), and EXCEPT ALL should therefore yield [3, 3]. In the EXCEPT ALL case, both occurrences of (2) in query1 should be eliminated by the one occurrence in query2. I think this is a fair interpretation based on the wording. To match the observed behavior, I think that the description of EXCEPT ALL needs to be modified to something like this: EXCEPT ALL returns those rows of query1 in excess of matching rows in query2, as well as rows of query1 that have no match in query2.