Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

Heikki Linnakangas Thu, 18 Dec 2014 07:01:04 -0800

On 12/18/2014 01:02 AM, Peter Geoghegan wrote:

On Wed, Dec 17, 2014 at 1:12 PM, Heikki Linnakangas
<[email protected]> wrote:

Now, let's imagine a table like this:


CREATE TABLE persons (
   username text unique,
   real_name text unique,
   data text
);

Is there any way to specify both of those constraints, so that the insertion
is IGNOREd if it violates either one of them? If you try to do:

INSERT INTO persons(username, real_name, data)
VALUES('foobar', 'foo bar')
ON CONFLICT (username, real_name) IGNORE;

It will fail because there is no unique index on (username, real_name). In
this particular case, you could leave out the specification, but if there
was a third constraint that you're not expecting to conflict with, you would
want violations of that constraint to still throw an error. And you can't
leave out the specification with ON CONFLICT UPDATE anyway.


Good point.

For the IGNORE case: I guess the syntax just isn't that flexible. I
agree that that isn't ideal.


It should be simple to allow multiple key specifications:

INSERT INTO persons (username, real_name, data)
VALUES('foobar', 'foo bar')
ON CONFLICT (username), (real_name) IGNORE;

It's a rather niche use case, but might as well support it for the sakeof completeness.

For the UPDATE case: Suppose your example was an UPDATE where we
simply assigned the excluded.data value to the data column in the
auxiliary UPDATE's targetlist. What would the user really be asking
for with that command, at a really high level? It seems like they
might actually want to run two UPSERT commands (one for username, the
other for real_name), or rethink their indexing strategy - in
particular, whether it's appropriate that there isn't a composite
unique constraint on (username, real_name).

Now, suppose that by accident or by convention it will always be
possible for a composite unique index to be built on (username,
real_name) - no dup violations would be raised if it was attempted,
but it just hasn't been and won't be. In other words, it's generally
safe to actually pretend that there is one. Then, surely it doesn't
matter if the user picks one or the other unique index. It'll all work
out when the user assigns to both in the UPDATE targetlist, because of
the assumed convention that I think is implied by the example. If the
convention is violated, at least you get a dup violation letting you
know (iff you bothered to assign). But I wouldn't like to encourage
that pattern.

I think that the long and the short of it is that you really ought to
have one unique index as an arbiter in mind when writing a DML
statement for the UPDATE variant. Relying on this type of convention
is possible, I suppose, but ill-advised.

Another thought is that you might want to specify a different actiondepending on which constraint is violated:


INSERT INTO persons (username, real_name, data)
VALUES('foobar', 'foo bar')
ON CONFLICT (username) IGNORE
ON CONFLICT (real_name) UPDATE ...;

Although that leaves the question of what to do if both are violated.Perhaps:


INSERT INTO persons (username, real_name, data)
VALUES('foobar', 'foo bar')
ON CONFLICT (username, real_name) IGNORE
ON CONFLICT (real_name) UPDATE username = excluded.username;
ON CONFLICT (username) UPDATE real_name = excluded.real_name;

5. What if there are multiple unique indexes with the same columns, but
different operator classes?


I thought about that. I am reusing a little bit of the CREATE INDEX
infrastructure for raw parsing, and for a small amount of parse
analysis (conveniently, this makes the command reject things like
aggregate functions with no additional code - the error messages only
mention "index expressions", so I believe that's fine). This could
include an opclass specification, but right now non-default opclasses
are rejected during extra steps in parse analysis, for no particular
reason.

I could easily have the unique index inference specification accept a
named opclass, if you thought that was important, and you thought
naming a non-default opclass by name was a good SQL interface. It
would take only a little effort to support non-default opclasses.

It's a little weird to mention an opclass by name. It's similar tonaming an index by name, really. How about naming the operator? For anexclusion constraint, that would be natural, as the syntax to create anexclusion constraint in the first place is "EXCLUDE USING gist (c WITH &&)"

Naming the index by columns makes sense in most cases, and I don't likespecifying the index's name, but how about allowing naming a constraint?Indexes are just an implementation detail, but constraints are not.Unique and exclusion constraints are always backed by an index, so thereis little difference in practice, but I would feel much more comfortablementioning constraints by name than indexes.

Most people would list the columns, but if there is a really bizarreconstraint, with non-default opclasses, or an exclusion constraint, it'sprobably been given a name that you could use.

In theory, with the promise tuple approach to locking, you don'tnecessarily even need an index to back up the constraint. You could justdo a sequential scan of the whole table to see if there are anyconflicting rows, then insert the row, and perform another scan to seeif any conflicting rows appeared in the meantime. Performance wouldsuck, and there is no guarantee that another backend doesn't do aregular INSERT into to the table that violates the imaginary constraint,so this is pretty useless in practice. So probably better to not allow it.


- Heikki



--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

Reply via email to