Re: [HACKERS] INSERT ... ON CONFLICT () SELECT

2017-06-19 Thread Matt Pulver
On Sun, Jun 18, 2017 at 9:21 PM, Peter Geoghegan  wrote:

> Returning rows with duplicate values seems rather unorthodox.
>

Ok, then option 2 it is.

In summary, this is what I am going to (attempt to) implement for the new
syntax:

INSERT ...
ON CONFLICT (...) DO SELECT
RETURNING ...


   1. Rows that are in conflict are made available to the RETURNING clause.
   In other words, it is like an idempotent "ON CONFLICT DO UPDATE".
   2. Similarly, insertion sets that would cause the error "ON CONFLICT DO
   UPDATE command cannot affect row a second time" if it were an "ON CONFLICT
   DO UPDATE" statement will also cause a similar error for "ON CONFLICT DO
   SELECT". This will prevent duplicate rows from being returned.
   3. Like an "ON CONFLICT DO UPDATE", the returned rows cannot be changed
   by another part of the wCTE, even if no actual insertions occurred.

Unless I have missed anything, I think all other issues have been
adequately addressed. Since there are no red lights, I shall proceed. :)

Best regards,
Matt


Re: [HACKERS] INSERT ... ON CONFLICT () SELECT

2017-06-18 Thread Matt Pulver
On Sat, Jun 17, 2017 at 9:55 PM, Peter Geoghegan <p...@bowt.ie> wrote:

> On Sat, Jun 17, 2017 at 7:49 AM, Matt Pulver <mpul...@unitytechgroup.com>
> wrote:
> > With the proposed "INSERT ... ON CONFLICT () SELECT" feature, the
> > get_or_create_id() function is simplified to:
>
> Are you locking the existing rows? Because otherwise, the
> determination that they're conflicting can become obsolete immediately
> afterwards. (I guess you would be.)
>

If that is required in order to return the rows in their conflicted state,
then yes.


> The problem with this design and similar designs is that presumably
> the user is sometimes projecting the conflicting rows with the
> intention of separately updating them in a wCTE. That might not work,
> because only ON CONFLICT doesn't use the MVCC snapshot, in order to
> ensure that an UPDATE is guaranteed when an INSERT cannot go ahead.
> That isn't what you're doing in the example you gave, but surely some
> users would try to do things like that, and get very confused.
>

Ultimately the proposed "INSERT ... ON CONFLICT () DO SELECT" syntax is
still an INSERT statement, not a SELECT, so a user should not expect rows
returned from it to be available for UPDATE/DELETE in another part of a
wCTE. Anyone who understands this behavior for an INSERT statement, let
alone the current "INSERT ... ON CONFLICT DO UPDATE" should not be too
surprised if the same thing applies to the new "INSERT ... ON CONFLICT DO
SELECT".


> I think that what you propose to do here would likely create a lot of
> confusion by mixing MVCC semantics with special UPSERT visibility
> semantics ("show me the latest row version visible to any possible
> snapshot for the special update") even without a separate UPDATE, in
> fact. Would you be okay if "id" appeared duplicated in the rows you
> project in your new syntax, even when there is a separate unique
> constraint on that column? I suppose that there is some risk of things
> like that today, but this would make the "sleight of hand" used by ON
> CONFLICT DO UPDATE more likely to cause problems.
>

 Good point. Here is an example using the example table from my previous
email:

INSERT INTO example (name) VALUES ('foo'), ('foo')
ON CONFLICT (name) DO SELECT
RETURNING *


Here are a couple options of how to handle this:

1) Return two identical rows (with the same id).
2) Produce an error, with error message:
"ERROR:  ON CONFLICT DO SELECT command cannot reference row a second time
HINT:  Ensure that no rows proposed for insertion within the same command
have duplicate constrained values."

This would be nearly identical to the existing error message that is
produced when running:

INSERT INTO example (name) VALUES ('foo'), ('foo')
ON CONFLICT (name) DO UPDATE SET value=1
RETURNING *


which gives the error message:
"ERROR:  ON CONFLICT DO UPDATE command cannot affect row a second time
HINT:  Ensure that no rows proposed for insertion within the same command
have duplicate constrained values."

Technically, an error doesn't *need* to be produced for the above "ON
CONFLICT DO SELECT" statement - I think it is still perfectly well-defined
to return duplicate rows as in option 1. Option 2 is more for the benefit
of the user who is probably doing something wrong by attempting to INSERT a
set of rows that violate a constraint. What do you think would be best?

Thank you for the discussion.

Best regards,
Matt


[HACKERS] INSERT ... ON CONFLICT () SELECT

2017-06-17 Thread Matt Pulver
Hello,

I am looking to add a new language feature that returns the rows that
conflict on an INSERT, and would appreciate feedback and guidance on this.

Here is an example.

To implement a get_or_create_id() function, this is how it must currently
be done:

CREATE TABLE example (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
value FLOAT);
CREATE UNIQUE INDEX ON example (name);

CREATE FUNCTION get_or_create_id(_name TEXT) RETURNS INT AS
$$
WITH get AS (
SELECT id FROM example WHERE name=_name
), new AS (
INSERT INTO example (name) VALUES (_name)
ON CONFLICT (name) DO NOTHING
RETURNING id
)
SELECT id FROM get
UNION ALL
SELECT id FROM new
$$
LANGUAGE sql;

SELECT get_or_create_id('foo'); -- 1
SELECT get_or_create_id('bar'); -- 2
SELECT get_or_create_id('foo'); -- 1


With the proposed "INSERT ... ON CONFLICT () SELECT" feature, the
get_or_create_id() function is simplified to:

CREATE FUNCTION get_or_create_id(_name TEXT) RETURNS INT AS
$$
INSERT INTO example (name) VALUES (_name)
ON CONFLICT (name) DO SELECT
RETURNING id
$$
LANGUAGE sql;


In the case of a CONFLICT, the selected rows are exactly those same rows
that would be operated on by an ON CONFLICT () DO UPDATE clause. These rows
are then made available to the RETURNING clause in the same manner. Just
like "DO NOTHING", the "DO SELECT" clause takes no arguments. It only makes
the conflicting rows available to the RETURNING clause.

Tom Lane has previously responded
 to a
similar request which was ill-defined, especially in the context of
exclusion constraints. I believe that by SELECTing exactly those same rows
that an UPDATE clause would on a CONFLICT, this becomes well-defined, even
with exclusion constraints.

Feedback/guidance is most welcome.

Best regards,
Matt