Re: [GENERAL] Why is unique constraint needed for upsert? (treat atomicity as optional)

2014-07-23 Thread Seamus Abshere

On 7/23/14 3:40 PM, Tom Lane wrote:

John R Pierce pie...@hogranch.com writes:

On 7/23/2014 10:21 AM, Seamus Abshere wrote:

Upsert is usually defined [1] in reference to a violating a unique key:
Is this theoretically preferable to just looking for a row that
matches certain criteria, updating it if found or inserting otherwise?



what happens when two connections do this more or less concurrently, in
transactions?


For the OP's benefit --- the subtext John left unstated is that the
unique-key mechanism has already solved the problem of preventing
concurrent updates from creating duplicate keys.  If we build a version of
UPSERT that doesn't rely on a unique index then it'll need some entirely
new mechanism to prevent concurrent key insertion.  (And if you don't care
about concurrent cases, you don't really need UPSERT ...)


hi all,

What if we treat atomicity as optional? You could have extremely 
readable syntax like:



-- no guarantees, no index required
UPSERT age = 5 INTO dogs WHERE name = 'Jerry';



-- optionally tell us how you want to deal with collision
UPSERT age = 3 INTO dogs WHERE name = 'Jerry' ON DUPLICATE KEY KEEP NEWEST;
UPSERT age = 5 INTO dogs WHERE name = 'Jerry' ON DUPLICATE KEY KEEP NEWEST;



-- only **require** (by throwing an error) a unique index or a locked table for 
queries like
UPSERT age = age+1 INTO dogs WHERE name = 'Jerry';


Obviously this flies in the face of what most people say the 
fundamental Upsert property is [1]



At READ COMMITTED isolation level, you should always get an atomic insert or 
update [1]


I just think there are a lot of non-concurrent bulk loading and 
processing workflows that could benefit from the performance advantages 
of upsert (one trip to database).


Best, thanks,
Seamus

[1] http://www.pgcon.org/2014/schedule/events/661.en.html

--
Seamus Abshere, SCEA
https://github.com/seamusabshere


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why is unique constraint needed for upsert? (treat atomicity as optional)

2014-07-23 Thread Tom Lane
Seamus Abshere sea...@abshere.net writes:
 On 7/23/14 3:40 PM, Tom Lane wrote:
 For the OP's benefit --- the subtext John left unstated is that the
 unique-key mechanism has already solved the problem of preventing
 concurrent updates from creating duplicate keys.

 What if we treat atomicity as optional?

You'll get a *much* warmer response to that kind of suggestion from
MongoDB or MySQL, no doubt.  PG is not in the business of optional
data integrity.

 I just think there are a lot of non-concurrent bulk loading and 
 processing workflows that could benefit from the performance advantages 
 of upsert (one trip to database).

What exactly is your argument for supposing that an UPSERT without an
underlying index would perform so well?  It seems much more likely
that it'd suck, because of having to do full-table scans to look
for existing rows.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why is unique constraint needed for upsert? (treat atomicity as optional)

2014-07-23 Thread John R Pierce

On 7/23/2014 1:45 PM, Seamus Abshere wrote:


What if we treat atomicity as optional? You could have extremely 
readable syntax like:


atomicity is not and never will be optional in PostgreSQL.


-- no guarantees, no index required
UPSERT age = 5 INTO dogs WHERE name = 'Jerry'; 


and if there's several rows with name='Jerry', you'd want to update them 
ALL ?  if name isn't indexed, this will, as Tom suggests, require a FULL 
table scan, and it still will have issues with concurrency (connection 
scans table, finds nothing, starts to insert, user 2 scans table, finds 
nothing, starts to insert, poof, now we have two records?!?).   If name 
*is* indexed and unique, this collision will cause a error at commit for 
at least one of those connections.


--
john r pierce  37N 122W
somewhere on the middle of the left coast



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why is unique constraint needed for upsert? (treat atomicity as optional)

2014-07-23 Thread Seamus Abshere

On 7/23/14 6:03 PM, John R Pierce wrote:

On 7/23/2014 1:45 PM, Seamus Abshere wrote:

What if we treat atomicity as optional?


atomicity is not and never will be optional in PostgreSQL.


I'm wondering what a minimal definition of upsert could be - possibly 
separating concurrency handling out as a (rigorously defined) option for 
those who need it.



-- no guarantees, no index required
UPSERT age = 5 INTO dogs WHERE name = 'Jerry';


and if there's several rows with name='Jerry', you'd want to update them
ALL ?  if name isn't indexed, this will, as Tom suggests, require a FULL
table scan, and it still will have issues with concurrency


Ah, I was just saying, in terms of correctness, it seems to me that 
upsert shouldn't NEED a index to work, just like you don't need an index 
on name when you say WHERE name = 'Jerry' in SELECTs or INSERTS or 
UPDATES.


Appreciate the defense of data integrity in any case!!

Best,
Seamus

--
Seamus Abshere, SCEA
https://github.com/seamusabshere


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general