On Sat, Sep 14, 2013 at 08:58:32PM +0200, Andres Freund wrote:
> On 2013-09-14 11:25:52 -0700, Kevin Grittner wrote:
> > Andres Freund <and...@2ndquadrant.com> wrote:
> > > But both arrays don't have the same binary representation since
> > > the former has a null bitmap, the latter not. So, if you had a
> > > composite type like (int4[]) and would compare that without
> > > invoking operators you'd return something false in some cases
> > > because of the null bitmaps.
> > 
> > Not for the = operator.  The new "identical" operator would find
> > them to not be identical, though.
> 
> Yep. And I think that's a problem if exposed to SQL. People won't
> understand the hazards and end up using it because it's faster or
> somesuch.

The important question is whether to document the new operator and/or provide
it under a guessable name.  If we give the operator a weird name, don't
document it, and put an "internal use only" comment in the catalogs, that is
essentially as good as hiding this feature at the SQL level.

I'm of two minds on that question.  On the one hand, MV maintenance is hardly
the first use case for an identity operator.  Any replication system or
user-space materialized view implementation might want this.  On the other hand,
offering it for the record type exclusively is surprising.  It's also
surprising how records with different numbers of dropped columns can be found
identical, even though a record column within the top-level record is not
permitted to vary that way.

Supposing a decision to document the operator, a second question is whether
"===" is the right name:

On Thu, Sep 12, 2013 at 03:27:27PM -0700, Kevin Grittner wrote:
> The identical (===) and not identical (!==) operator names were
> chosen because of a vague similarity to the "exactly equals"
> concepts in JavaScript and PHP, which use that name.  The semantics
> aren't quite the same, but it seemed close enough not to be too
> surprising.

Maybe.  If we were mimicking the JavaScript/PHP operator of the same name,
'1.0'::numeric === '1.00'::numeric would return true, and '1'::int4 ===
'1'::int2 would return false.  The patch submitted returns false for the first
example and raises a "cannot compare dissimilar column types" error when
comparing an int4 record column to an int2 record column.

> I think, introducing a noticeable amount of infrastructure for this just
> because of citext is a bad idea.
> At some point we need to replace citext with proper case-insensitive
> collation support - then it really might become necessary.

citext is just one example.  Others include '1.0'::numeric = '1.00'::numeric
and '30 day'::interval = '1 mon'::interval.

> > (2)  Require every data type which can be used in a matview to
> > implement some new operator or function for "identical".  Perhaps
> > that could be mitigated to only implement it if equal values can
> > have user-visible differences.
> 
> That basically would require adding a new member to btree opclasses that
> btrees don't need themselves... Hm.

That's the wrong way to model it.  Identity is a particular kind of equality,
which is to say a particular hash opclass.  It so happens that memcmp() is the
logical way to implement most/all identity operators, so we can get a btree
opclass as well.
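
To make the distinction concrete, here is a minimal sketch in C.  It assumes a
made-up "ToyDecimal" layout (one scale byte plus a big-endian unscaled value);
this is not the numeric on-disk format, and none of these names exist in the
tree.  toy_equal() plays the role of the type's = operator, while
toy_identity_cmp() is plain memcmp(), whose sign also yields a total order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical toy decimal, NOT PostgreSQL's numeric layout:
 * bytes[0] = display scale, bytes[1..4] = unscaled value, big-endian.
 * The represented value is unscaled / 10^scale.
 */
typedef struct
{
    uint8_t bytes[5];
} ToyDecimal;

ToyDecimal toy_make(uint8_t scale, uint32_t unscaled)
{
    ToyDecimal d;

    assert(scale <= 9);     /* keeps the rescaling in toy_equal() within 64 bits */
    d.bytes[0] = scale;
    d.bytes[1] = (uint8_t) (unscaled >> 24);
    d.bytes[2] = (uint8_t) (unscaled >> 16);
    d.bytes[3] = (uint8_t) (unscaled >> 8);
    d.bytes[4] = (uint8_t) unscaled;
    return d;
}

uint32_t toy_unscaled(const ToyDecimal *d)
{
    return ((uint32_t) d->bytes[1] << 24) | ((uint32_t) d->bytes[2] << 16) |
           ((uint32_t) d->bytes[3] << 8) | (uint32_t) d->bytes[4];
}

/* Type-aware equality: bring both values to a common scale, then compare. */
int toy_equal(const ToyDecimal *a, const ToyDecimal *b)
{
    uint64_t ua = toy_unscaled(a);
    uint64_t ub = toy_unscaled(b);
    uint8_t  sa = a->bytes[0];
    uint8_t  sb = b->bytes[0];

    while (sa < sb) { ua *= 10; sa++; }
    while (sb < sa) { ub *= 10; sb++; }
    return ua == ub;
}

/*
 * Identity: bytewise comparison.  Zero means identical; the sign gives a
 * consistent (if semantically meaningless) ordering.
 */
int toy_identity_cmp(const ToyDecimal *a, const ToyDecimal *b)
{
    return memcmp(a->bytes, b->bytes, sizeof a->bytes);
}
```

With toy_make(1, 10) standing in for 1.0 and toy_make(2, 100) for 1.00,
toy_equal() reports them equal while toy_identity_cmp() returns nonzero.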

Type-specific identity operators seem like overkill, anyway.  If we find that
meaningless variations in a particular data type are causing too many false
non-matches for the generic identity operator, the answer is to make the
functions generating datums of that type settle on a canonical form.  That
would be the solution for your example involving array null bitmaps.
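
As a rough illustration of the canonical-form idea, here is a sketch in C.  It
assumes an invented "ToyArray" layout (flag byte, element count, optional
one-byte null bitmap, one-byte elements); it is not ArrayType, and the function
names are hypothetical.  The canonicalizer drops a null bitmap that marks no
element as null, after which bytewise identity agrees with equality.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_ELEMS 8

/*
 * Hypothetical toy layout, NOT PostgreSQL's ArrayType:
 *   buf[0] = 1 if a null bitmap byte follows the header, else 0
 *   buf[1] = element count n (at most TOY_MAX_ELEMS)
 *   buf[2] = null bitmap, present only when buf[0] == 1;
 *            bit i set means element i is NOT null
 *   then n one-byte elements
 */
typedef struct
{
    uint8_t buf[3 + TOY_MAX_ELEMS];
    size_t  len;                    /* number of meaningful bytes in buf */
} ToyArray;

ToyArray toy_array_build(const uint8_t *elems, uint8_t n,
                         int with_bitmap, uint8_t bitmap)
{
    ToyArray a;
    size_t   pos = 0;

    assert(n <= TOY_MAX_ELEMS);
    memset(&a, 0, sizeof a);
    a.buf[pos++] = with_bitmap ? 1 : 0;
    a.buf[pos++] = n;
    if (with_bitmap)
        a.buf[pos++] = bitmap;
    memcpy(a.buf + pos, elems, n);
    a.len = pos + n;
    return a;
}

/* Settle on one form: remove a bitmap that records no nulls at all. */
void toy_array_canonicalize(ToyArray *a)
{
    uint8_t n = a->buf[1];
    uint8_t all_present = (n == 8) ? 0xFF : (uint8_t) ((1u << n) - 1);

    if (a->buf[0] == 1 && (a->buf[2] & all_present) == all_present)
    {
        memmove(a->buf + 2, a->buf + 3, n);     /* shift elements over the bitmap */
        a->buf[0] = 0;
        a->len -= 1;
    }
}

/* Generic identity: same length and same bytes. */
int toy_array_identical(const ToyArray *a, const ToyArray *b)
{
    return a->len == b->len && memcmp(a->buf, b->buf, a->len) == 0;
}
```

Two arrays holding the same three non-null elements, one built with a bitmap of
0x07 and one built without, differ bytewise until the first is canonicalized.
An array whose bitmap does mark a null keeps its bitmap and stays distinct.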

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com

