Re: [HACKERS] Feature suggestion: ON CONFLICT DO NOTHING RETURNING which returns existing row data

2016-10-06 Thread Tom Dunstan

> On 5 Oct 2016, at 8:11 PM, Pantelis Theodosiou  wrote:
> 
> This can be solved by chaining modifying CTEs. 
> 
> Something like this (not tested)  that can work with multiple rows inserted:

Thanks for the suggestion, but it was actually slower than our current 
implementation, I believe due to always looking up t1’s id in that join rather 
than only doing it when we didn’t get an id back from the insert. My hope with 
this feature suggestion / request was that we wouldn’t have to do that 
subsequent lookup at all, as pg would just give it back to us.

Maybe it would be a win if we were inserting multiple rows, but this code is 
actually in a trigger on a dummy table that we COPY data in to - thus it can’t 
be rewritten as a rule or a multi-row insert like that.

Thanks

Tom



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Feature suggestion: ON CONFLICT DO NOTHING RETURNING which returns existing row data

2016-10-05 Thread Pantelis Theodosiou
This can be solved by chaining modifying CTEs.

Something like this (not tested)  that can work with multiple rows inserted:


WITH
  vals (bk1, bk2, other_t1_columns, other_t2_columns) AS
( VALUES (bk1val, bk2val, other_t1_values, other_t2_values),
 (bk1val, bk2val, other_t1_values, other_t2_values)
),
  ins_t1 AS
( INSERT INTO t1 (bk1, bk2, other columns)
  SELECT bk1, bk2, other_t1_columns
  FROM vals
  ON CONFLICT (bk1val, bk2val) DO NOTHING
  RETURNING id, bk1, bk2
)
INSERT INTO t2 (t1_id, other_t2_columns)
SELECT
COALESCE(t1.id, ins_t1,id),
val.bk1, val.bk2, val.other_t2_columns
FROM vals
  LEFT JOIN ins_t1 ON (vals.bk1, vals.bk2) = (ins_t1.bk1, ins_t1.bk2)
  LEFT JOIN t1 ON (vals.bk1, vals.bk2) = (t1.bk1, t1.bk2)
 ;

On Wed, Oct 5, 2016 at 1:53 AM, Tom Dunstan  wrote:

> Hi all
>
> We recently moved to using 9.5 and were hoping to use the new upsert
> functionality, but unfortunately it doesn’t quite do what we need.
>
> Our setup is something like this:
>
> CREATE TABLE t1 (
>   id BIGSERIAL NOT NULL PRIMARY KEY,
>   bk1 INT,
>   bk2 UUID
>   — other columns
> );
> CREATE UNIQUE INDEX t1_bk ON t1 (bk1, bk2);
>
> CREATE TABLE t2 (
>   t1_id BIGINT NOT NULL REFERENCES t1
>  — other stuff
> );
>
> Data comes in as inserts of one tuple each of t1 and t2. We expect inserts
> to t1 to be heavily duplicated. That is, for stuff coming in we expect a
> large number of rows to have duplicate (bk1, bk2), and we wish to discard
> those, but not discard the t2 tuple - those should always be inserted and
> reference the correct t1 record.
>
> So we currently have an insertion function that does this:
>
> BEGIN
>   INSERT INTO t1 (bk1, bk2, other columns)
>   VALUES (bk1val, bk2val, other values)
>   RETURNING id
>   INTO t1_id;
> EXCEPTION WHEN unique_violation THEN
>   SELECT id
>   FROM t1
>   WHERE bk1 = bk1val AND bk2 = bk2val
>   INTO t1_id;
> END;
>
> INSERT INTO t2(t1_id, other columns) VALUES(t1_id, other values);
>
> We were hoping that we’d be able to do something like this:
>
> INSERT INTO t1 (bk1, bk2, other columns)
>   VALUES (bk1val, bk2val, other values)
>   ON CONFLICT (bk1val, bk2val) DO NOTHING
>   RETURNING id
>   INTO t1_id;
> INSERT INTO t2(t1_id, other columns) VALUES(t1_id, other values);
>
> But unfortunately it seems that the RETURNING clause returns null when
> there’s a conflict, rather than the existing row’s value.
>
> I understand that there is ambiguity if there were multiple rows that were
> in conflict. I think this sort of functionality really only makes sense
> where the conflict target is a unique constraint, so IMO it would make
> sense to only support returning columns in that case.
>
> I imagine that this would be possible to do more efficiently than the
> subsequent query that we are currently doing given that postgres has
> already found the rows in question, in the index at least. I have no idea
> how hard it would actually be to implement though. FWIW my use-case would
> be supported even if this only worked for indexes where the to-be-returned
> columns were stored in the index using Anastasia’s covering + unique index
> patch, when that lands.
>
> Thoughts?
>
> Tom
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


[HACKERS] Feature suggestion: ON CONFLICT DO NOTHING RETURNING which returns existing row data

2016-10-04 Thread Tom Dunstan
Hi all

We recently moved to using 9.5 and were hoping to use the new upsert 
functionality, but unfortunately it doesn’t quite do what we need.

Our setup is something like this:

CREATE TABLE t1 (
  id BIGSERIAL NOT NULL PRIMARY KEY,
  bk1 INT,
  bk2 UUID
  — other columns
);
CREATE UNIQUE INDEX t1_bk ON t1 (bk1, bk2);

CREATE TABLE t2 (
  t1_id BIGINT NOT NULL REFERENCES t1
 — other stuff
);

Data comes in as inserts of one tuple each of t1 and t2. We expect inserts to 
t1 to be heavily duplicated. That is, for stuff coming in we expect a large 
number of rows to have duplicate (bk1, bk2), and we wish to discard those, but 
not discard the t2 tuple - those should always be inserted and reference the 
correct t1 record.

So we currently have an insertion function that does this:

BEGIN
  INSERT INTO t1 (bk1, bk2, other columns)
  VALUES (bk1val, bk2val, other values)
  RETURNING id
  INTO t1_id;
EXCEPTION WHEN unique_violation THEN
  SELECT id
  FROM t1
  WHERE bk1 = bk1val AND bk2 = bk2val
  INTO t1_id;
END;

INSERT INTO t2(t1_id, other columns) VALUES(t1_id, other values);

We were hoping that we’d be able to do something like this:

INSERT INTO t1 (bk1, bk2, other columns)
  VALUES (bk1val, bk2val, other values)
  ON CONFLICT (bk1val, bk2val) DO NOTHING
  RETURNING id
  INTO t1_id;
INSERT INTO t2(t1_id, other columns) VALUES(t1_id, other values);

But unfortunately it seems that the RETURNING clause returns null when there’s 
a conflict, rather than the existing row’s value.

I understand that there is ambiguity if there were multiple rows that were in 
conflict. I think this sort of functionality really only makes sense where the 
conflict target is a unique constraint, so IMO it would make sense to only 
support returning columns in that case.

I imagine that this would be possible to do more efficiently than the 
subsequent query that we are currently doing given that postgres has already 
found the rows in question, in the index at least. I have no idea how hard it 
would actually be to implement though. FWIW my use-case would be supported even 
if this only worked for indexes where the to-be-returned columns were stored in 
the index using Anastasia’s covering + unique index patch, when that lands.

Thoughts?

Tom



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers