On 12/18/2014 01:02 AM, Peter Geoghegan wrote:
On Wed, Dec 17, 2014 at 1:12 PM, Heikki Linnakangas
<hlinnakan...@vmware.com> wrote:
Now, let's imagine a table like this:

CREATE TABLE persons (
   username text unique,
   real_name text unique,
   data text
);

Is there any way to specify both of those constraints, so that the insertion
is IGNOREd if it violates either one of them? If you try to do:

INSERT INTO persons(username, real_name, data)
VALUES('foobar', 'foo bar')
ON CONFLICT (username, real_name) IGNORE;

It will fail because there is no unique index on (username, real_name). In
this particular case, you could leave out the specification, but if there
was a third constraint that you're not expecting to conflict with, you would
want violations of that constraint to still throw an error. And you can't
leave out the specification with ON CONFLICT UPDATE anyway.

Good point.

For the IGNORE case: I guess the syntax just isn't that flexible. I
agree that that isn't ideal.

It should be simple to allow multiple key specifications:

INSERT INTO persons (username, real_name, data)
VALUES('foobar', 'foo bar')
ON CONFLICT (username), (real_name) IGNORE;

It's a rather niche use case, but might as well support it for the sake of completeness.

For the UPDATE case: Suppose your example was an UPDATE where we
simply assigned the excluded.data value to the data column in the
auxiliary UPDATE's targetlist. What would the user really be asking
for with that command, at a really high level? It seems like they
might actually want to run two UPSERT commands (one for username, the
other for real_name), or rethink their indexing strategy - in
particular, whether it's appropriate that there isn't a composite
unique constraint on (username, real_name).

Now, suppose that by accident or by convention it will always be
possible for a composite unique index to be built on (username,
real_name) - no dup violations would be raised if it was attempted,
but it just hasn't been and won't be. In other words, it's generally
safe to actually pretend that there is one. Then, surely it doesn't
matter if the user picks one or the other unique index. It'll all work
out when the user assigns to both in the UPDATE targetlist, because of
the assumed convention that I think is implied by the example. If the
convention is violated, at least you get a dup violation letting you
know (iff you bothered to assign). But I wouldn't like to encourage
that pattern.

I think that the long and the short of it is that you really ought to
have one unique index as an arbiter in mind when writing a DML
statement for the UPDATE variant. Relying on this type of convention
is possible, I suppose, but ill-advised.

Another thought is that you might want to specify a different action depending on which constraint is violated:

INSERT INTO persons (username, real_name, data)
VALUES('foobar', 'foo bar')
ON CONFLICT (username) IGNORE
ON CONFLICT (real_name) UPDATE ...;

Although that leaves the question of what to do if both are violated. Perhaps:

INSERT INTO persons (username, real_name, data)
VALUES('foobar', 'foo bar')
ON CONFLICT (username, real_name) IGNORE
ON CONFLICT (real_name) UPDATE username = excluded.username;
ON CONFLICT (username) UPDATE real_name = excluded.real_name;

5. What if there are multiple unique indexes with the same columns, but
different operator classes?

I thought about that. I am reusing a little bit of the CREATE INDEX
infrastructure for raw parsing, and for a small amount of parse
analysis (conveniently, this makes the command reject things like
aggregate functions with no additional code - the error messages only
mention "index expressions", so I believe that's fine). This could
include an opclass specification, but right now non-default opclasses
are rejected during extra steps in parse analysis, for no particular
reason.

I could easily have the unique index inference specification accept a
named opclass, if you thought that was important, and you thought
naming a non-default opclass by name was a good SQL interface. It
would take only a little effort to support non-default opclasses.

It's a little weird to mention an opclass by name. It's similar to naming an index by name, really. How about naming the operator? For an exclusion constraint, that would be natural, as the syntax to create an exclusion constraint in the first place is "EXCLUDE USING gist (c WITH &&)"

Naming the index by columns makes sense in most cases, and I don't like specifying the index's name, but how about allowing naming a constraint? Indexes are just an implementation detail, but constraints are not. Unique and exclusion constraints are always backed by an index, so there is little difference in practice, but I would feel much more comfortable mentioning constraints by name than indexes.

Most people would list the columns, but if there is a really bizarre constraint, with non-default opclasses, or an exclusion constraint, it's probably been given a name that you could use.


In theory, with the promise tuple approach to locking, you don't necessarily even need an index to back up the constraint. You could just do a sequential scan of the whole table to see if there are any conflicting rows, then insert the row, and perform another scan to see if any conflicting rows appeared in the meantime. Performance would suck, and there is no guarantee that another backend doesn't do a regular INSERT into to the table that violates the imaginary constraint, so this is pretty useless in practice. So probably better to not allow it.

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to