Re: Documentation of EXCEPT ALL may have a bug

2018-02-10 Thread Tom Lane
Alvaro Herrera  writes:
> PG Doc comments form wrote:
>> create table t(x int);
>> create table u(x int);
>> 
>> insert into t values (1), (2), (2), (3), (3);
>> insert into u values (1), (2);
>> 
>> select * from t except all select * from u;

>> x 
>> ---
>> 3
>> 3
>> 2
>> (3 rows)

> I find this pretty odd behavior.  Is this not an outright bug?

It's exactly what the standard says to do: if there are M occurrences
of a row value in the LHS, and N occurrences in the RHS, emit
max(M-N, 0) copies of the row.  To my mind that's a reasonable definition
of EXCEPT if you suppose that nominally-identical rows are matched and
discarded one by one, rather than with de-duplication occurring
beforehand.

> [*] I didn't try terribly hard, but couldn't actually find where the
> behavior is defined.

In SQL:2011, it's 7.13  general rule 3) b) iii) 3) B),
on page 420 in the draft version I have.

regards, tom lane



Re: Documentation of EXCEPT ALL may have a bug

2018-02-10 Thread Pantelis Theodosiou
On Sat, Feb 10, 2018 at 1:02 PM, Pantelis Theodosiou 
wrote:

>
>
> On Sat, Feb 10, 2018 at 11:59 AM, Alvaro Herrera 
> wrote:
>
>> PG Doc comments form wrote:
>>
>> > create table t(x int);
>> > create table u(x int);
>> >
>> > insert into t values (1), (2), (2), (3), (3);
>> > insert into u values (1), (2);
>> >
>> > select * from t except all select * from u;
>>
>> >  x
>> > ---
>> >  3
>> >  3
>> >  2
>> > (3 rows)
>>
>> I find this pretty odd behavior.  Is this not an outright bug?  I
>> haven't read the SQL standard on this matter[*], but if they define
>> EXCEPT ALL to work like this, then it seems pretty useless.  (Maybe they
>> just didn't intend EXCEPT ALL to be useful?)  If this is indeed the
>> case, maybe we should amend the docs not only to explain this behavior
>> but also to warn against the construct.
>>
>
> I think that's the way it is defined (but I agree, I don't remember ever
> seeing a use for it)
>
>
>>
>> [*] I didn't try terribly hard, but couldn't actually find where the
>> behavior is defined.  What I have on hand is a draft of SQL:2011 where
>> this appears to be defined in 7.13  but I was unable
>> to find the rules for set operations.  It refers to 9.12 Grouping
>> operations but that defines conformance rules only.
>>
>> --
>>
>
> I may have a different version but I see:
>

That is in:
7.13 
General Rules 3, Case b:

>
> iii)
> T contains the following rows:
> 1) Let R be a row that is a duplicate of some row in ET1 or of some row in
> ET2 or both. Let
> m be the number of duplicates of R in ET1 and let n be the number of
> duplicates of R in
> ET2, where m ≥ 0 and n ≥ 0.
> 2) If DISTINCT is specified or implicit, then 
> ...
>
> 3) If ALL is specified, then
> Case:
> A) If UNION is specified, then the number of duplicates of R that T
> contains is (m + n).
> B) If EXCEPT is specified, then the number of duplicates of R that T
> contains is the
> maximum of (m – n) and 0 (zero).
> C) If INTERSECT is specified, then the number of duplicates of R that T
> contains is the
> minimum of m and n.
>
>


Re: Documentation of EXCEPT ALL may have a bug

2018-02-10 Thread Pantelis Theodosiou
On Sat, Feb 10, 2018 at 11:59 AM, Alvaro Herrera 
wrote:

> PG Doc comments form wrote:
>
> > create table t(x int);
> > create table u(x int);
> >
> > insert into t values (1), (2), (2), (3), (3);
> > insert into u values (1), (2);
> >
> > select * from t except all select * from u;
>
> >  x
> > ---
> >  3
> >  3
> >  2
> > (3 rows)
>
> I find this pretty odd behavior.  Is this not an outright bug?  I
> haven't read the SQL standard on this matter[*], but if they define
> EXCEPT ALL to work like this, then it seems pretty useless.  (Maybe they
> just didn't intend EXCEPT ALL to be useful?)  If this is indeed the
> case, maybe we should amend the docs not only to explain this behavior
> but also to warn against the construct.
>

I think that's the way it is defined (but I agree, I don't remember ever
seeing a use for it)


>
> [*] I didn't try terribly hard, but couldn't actually find where the
> behavior is defined.  What I have on hand is a draft of SQL:2011 where
> this appears to be defined in 7.13  but I was unable
> to find the rules for set operations.  It refers to 9.12 Grouping
> operations but that defines conformance rules only.
>
> --
>

I may have a different version but I see:

iii)
T contains the following rows:
1) Let R be a row that is a duplicate of some row in ET1 or of some row in
ET2 or both. Let
m be the number of duplicates of R in ET1 and let n be the number of
duplicates of R in
ET2, where m ≥ 0 and n ≥ 0.
2) If DISTINCT is specified or implicit, then 
...

3) If ALL is specified, then
Case:
A) If UNION is specified, then the number of duplicates of R that T
contains is (m + n).
B) If EXCEPT is specified, then the number of duplicates of R that T
contains is the
maximum of (m – n) and 0 (zero).
C) If INTERSECT is specified, then the number of duplicates of R that T
contains is the
minimum of m and n.


Re: Documentation of EXCEPT ALL may have a bug

2018-02-10 Thread Alvaro Herrera
PG Doc comments form wrote:

> create table t(x int);
> create table u(x int);
> 
> insert into t values (1), (2), (2), (3), (3);
> insert into u values (1), (2);
> 
> select * from t except all select * from u;

>  x 
> ---
>  3
>  3
>  2
> (3 rows)

I find this pretty odd behavior.  Is this not an outright bug?  I
haven't read the SQL standard on this matter[*], but if they define
EXCEPT ALL to work like this, then it seems pretty useless.  (Maybe they
just didn't intend EXCEPT ALL to be useful?)  If this is indeed the
case, maybe we should amend the docs not only to explain this behavior
but also to warn against the construct.

[*] I didn't try terribly hard, but couldn't actually find where the
behavior is defined.  What I have on hand is a draft of SQL:2011 where
this appears to be defined in 7.13  but I was unable
to find the rules for set operations.  It refers to 9.12 Grouping
operations but that defines conformance rules only.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Documentation of EXCEPT ALL may have a bug

2018-02-09 Thread PG Doc comments form
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/10/static/queries-union.html
Description:

I believe that the documented behavior of EXCEPT is not in agreement
with Postgres behavior (I'm using Postgres 9.5). The documents say:

EXCEPT returns all rows that are in the result of query1 but not
in the result of query2. (This is sometimes called the difference
between two queries.) Again, duplicates are eliminated unless
EXCEPT ALL is used.

Here is a test script:

drop table if exists t;
drop table if exists u;

create table t(x int);
create table u(x int);

insert into t values (1), (2), (2), (3), (3);
insert into u values (1), (2);

select * from t except select * from u;
select * from t except all select * from u;

And here is the output:

DROP TABLE
DROP TABLE
CREATE TABLE
CREATE TABLE
INSERT 0 5
INSERT 0 2

 x 
---
 3
(1 row)

 x 
---
 3
 3
 2
(3 rows)

The output from EXCEPT matches the documented behavior.

The output from EXCEPT ALL makes sense to me, but I think it is at
odds with the documentation: "EXCEPT returns all rows that are in the
result of query1 but not in the result of query2." This general
statement is then modified by the discussion of EXCEPT ALL. That first
sentence does not deal with duplicates in the input tables, and just
discusses set membership. Each occurrence of (3) in query1 is
therefore kept.  For EXCEPT, the duplicates are eliminated (yielding
the output [3]), and EXCEPT ALL should therefore yield [3, 3].  In the
EXCEPT ALL case, both occurrences of (2) in query1 should be
eliminated by the one occurrence in query2. I think this is a fair
interpretation based on the wording.

To match the observed behavior, I think that the description of EXCEPT
ALL needs to be modified to something like this:

EXCEPT ALL returns those rows of query1 in excess of matching rows in
query2, as well as rows of query1 that have no match in query2.