[HACKERS] Nulls, arrays, records, IS NULL, IS DISTINCT FROM

Tom Lane Fri, 29 Sep 2006 09:53:41 -0700

Following up yesterday's discussion, I've been studying the SQL spec for
<null predicate> and <distinct predicate>, and it seems a bit
inconsistent.


The rules for <distinct predicate> make it clear that you are supposed
to "drill down" into row and array values to determine distinctness.
SQL99 has

            a) If the declared type of X or Y is an array type, then "X IS
              DISTINCT FROM Y" is effectively computed as follows:

              i) Let NX be the number of elements in X; let NY be the number
                 of elements in Y.

             ii) Let EX(i) be the i-th element of X; let EY(i) be the i-th
                 element of Y.

            iii) Case:

                 1) If NX is not equal to NY, then "X IS DISTINCT FROM Y" is
                   true.

                 2) If NX equals zero and NY equals zero, then "X IS
                   DISTINCT FROM Y" is false.

                 3) If "EX(i) IS DISTINCT FROM EY(i)" is false for all i
                   between 1 (one) and NX, then "X IS DISTINCT FROM Y" is
                   false.

                 4) Otherwise, "X IS DISTINCT FROM Y" is true.

SQL2003 has completely rewritten the text but the meaning seems the
same.  I suppose we want to generalize the NX/NY business to say
"if the array bounds are not identical then the arrays are distinct".
We are clearly getting this wrong since the introduction of nulls in
arrays, but I'll go fix that.

Similarly, given two row expressions, distinctness is determined
field-wise: X and Y are distinct if any two corresponding fields
are distinct.  We are currently getting this correct only for
the case of parse-time ROW expressions, ie
        ROW(x,y,z) IS [NOT] DISTINCT FROM ROW(xx,yy,zz)
This is pretty much analogous to the case Teodor noted yesterday
for IS NULL: it's not being done in gram.y but it's still being
done much too early.  We need to be able to do it in the executor
to handle situations where a row value is coming from a function
or some other source that's not disassemblable at parse time.

What's bothering me is that for "foo IS [NOT] NULL", the spec clearly
prescribes drilling down into a rowtype value to examine the individual
fields, but I can't find any language that prescribes the same for
arrays.  Is this intentional, or an oversight?  In particular, the
spec says
        ROW(1,2,NULL) IS NOT NULL
is false, because the row fields must be *all* not null to make it true.
But it's very unclear whether
        ARRAY[1,2,NULL] IS NOT NULL
should be false on the same reasoning.  Right now, we respond "true" on
the grounds that the array object as-a-whole isn't null, without
examining its contents.

Comments?  Does anyone see any guidance in the spec?  If there is none,
which behavior do we think is most useful/consistent?

                        regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match

[HACKERS] Nulls, arrays, records, IS NULL, IS DISTINCT FROM

Reply via email to