On 6 October 2014 15:04, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:
> On 10/06/2014 04:44 PM, Simon Riggs wrote:
>>
>> On 6 October 2014 13:21, Heikki Linnakangas <hlinnakan...@vmware.com>
>> wrote:
>>
>>>> My understanding of what you're saying is that if
>>>>
>>>> * we have a table with >1 unique index
>>>> * and we update the values of the uniquely indexed columns (e.g. PK
>>>> update)
>>>> * on both of the uniquely indexed column sets
>>>>
>>>> then we get occasional deadlocks, just as we would do using current
>>>> UPDATE/INSERT.
>>>
>>> Right. To be precise: you don't need to update both of the columns in the
>>> same transaction; it's enough that some of the concurrent transactions
>>> update one column, while other transactions update the other column.
>>
>> CREATE TABLE foo
>> (id1 integer not null primary key
>> ,id2 integer not null unique
>> ,val integer);
>>
>> Given the table above, which one do we mean?
>>
>> 1. When we mix UPDATE foo SET id2 = X WHERE id1 = Y; and UPDATE foo
>> SET id1 = Y WHERE id2 = X; we can deadlock
>>
>> 2. When we mix UPDATE foo SET val = Z WHERE id1 = Y; and UPDATE foo
>> SET val = W WHERE id2 = X; we can deadlock
>>
>> (2) is a common use case; (1) is a very rare use case and most likely
>> a poor design.
>
> Well, at least one of the statements has to be an UPSERT, and at least one
> of them has to update a column with a unique constraint on it. This pair of
> transactions could deadlock, for example:
>
> Transaction 1:
> INSERT INTO foo VALUES (Y, X, Z) ON CONFLICT IGNORE;
> Transaction 2:
> UPDATE foo SET id2 = X WHERE id1 = Y;
>
> That's made-up syntax, but the idea is that the first transaction attempts
> to insert a row with values id1=Y, id2=X, val=Z. If that fails because a
> row with id1=Y or id2=X already exists, then it's supposed to do nothing.
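To spell out one plausible interleaving (my sketch; syntax still made
up, and the exact locking behaviour depends on how value locking on
the unique indexes is implemented):

  -- Session A (the UPSERT):
  INSERT INTO foo VALUES (1, 2, 0) ON CONFLICT IGNORE;
  -- must check/lock entries in BOTH unique indexes (id1 and id2)

  -- Session B, concurrently:
  UPDATE foo SET id2 = 2 WHERE id1 = 1;
  -- locks the heap row, then inserts a new entry into the unique
  -- index on id2

If A has already placed a speculative entry for id2=2 when it starts
waiting on B's uncommitted update to decide the id1=1 conflict, while
B's index insertion of id2=2 is itself waiting behind A's speculative
entry, neither session can proceed until the deadlock detector aborts
one of them.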
Let's look at a real-world example:

CREATE TABLE citizen
(ssn integer not null primary key
,email text not null unique
,tax_amount decimal);

Transaction 1:
INSERT INTO citizen VALUES (555123456, 'si...@2ndquadrant.com', 1000.00)
ON CONFLICT IGNORE;
Transaction 2:
UPDATE citizen SET email = 'si...@2ndquadrant.com', tax_amount = 1000.00
WHERE ssn = 555123456;

OK, now I understand how a deadlock is possible. Thanks for your help.

Again I note that there is no isolation test that refers to this
situation, nor any documentation, internal or user-facing, that
describes the situation or its workaround.

My feeling is that this is an unlikely situation: two actors
concurrently updating the same data, AND in different ways, from two
different angles. How likely is it that we would issue those two
transactions concurrently AND be concerned that this caused an error?
If the tax_amount was the same, it wouldn't matter that one failed. If
the tax_amount differed, we would want to know about the error, not
accept it in silence.

Are any of those things substantially worse than the current situation
using INSERT/UPDATE loops?

It might be nice if the above never deadlocked. What is the price of
ensuring that, in terms of code maintainability and performance? What
would this do to COPY performance?

>> If the user wishes to protect against such deadlocks they retain the
>> option to use row locking. Yes?
>
> Sorry, I didn't understand that. Row locking?

I think that thought doesn't apply here.

> In general, this is of course a lot easier to implement if we restrict it so
> that it only works in some limited cases. That may be fine, but then we have
> to be able to document clearly what the limitations are, and throw an error
> if you violate those limitations.

Seems reasonable.

My point here is to establish that...
a) there are multiple ways to implement the UPSERT feature, and none
should be thrown away too quickly

b) the current patch does not implement something we all agree on yet

c) not all requirements have been properly documented, understood or
agreed by hackers

If we want to move forwards we need to agree things based upon clarity
and real-world usage.

It may be that people, on reading this, now believe Peter's HW locking
approach is the best. I'm happy to go with consensus.

My feeling is that substantially more work is required on explaining
the details around multiple unique index constraints, trigger
behaviour and various other corner cases.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers