>>> Jeff Davis <pg...@j-davis.com> wrote:
> On Mon, 2009-01-26 at 15:46 -0600, Kevin Grittner wrote:
>> After the COMMIT succeeds, the locks from Session1 are released.
>> Session2 acquires its update lock and reads row 2, finds that it
>> doesn't match its update criteria, downgrades the lock to shared,
>> acquires an update lock on row 3, finds that it does match the
>> selection criteria, upgrades the lock to exclusive, updates it,
>> acquires an update lock on row 4, finds that it doesn't match the
>> update criteria, downgrades the lock to shared, hits the end of
>> table, releases the shared locks.
>
> This is the part I'm having a problem with. This depends on row 3
> being read after row 2. If that weren't the case (say, with a more
> complex update and a more complex search criteria), then the index
> scan would have already passed by the value and would never know
> that it was updated to a value that does match the search criteria.

I think you're missing a fundamental point -- in Sybase, before a row
or range is read it is blocked against update by other transactions
until after the reading statement completes; before a row or range is
updated it is blocked against even being read by another transaction.
(For serializable transactions the locks are all held until commit or
rollback.) So if an index scan for S2 had already passed a certain
point and blocked on an update by S1, and S1 then tried to update any
part of what S2 had read, there would be a deadlock and one of the
two transactions would be rolled back with a serialization error.

> Data:
> i  j
> --------
> 1  20
> 2  40
> 3  50
> 4  80
>
> S1:
> BEGIN;
> UPDATE a SET j = (j - 10) WHERE i = 2 OR i = 3;

S1 holds locks that prevent any other transaction from reading
anything about these two rows, including index entries.
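To make the locking discipline concrete, here is a toy sketch of the
rules described above: shared locks are compatible only with shared
locks, exclusive locks conflict with everything, and under
SERIALIZABLE every lock is held until commit or rollback. This is my
own illustration, not Sybase's actual lock manager; the class and the
row identifiers are invented for the example.

```python
class LockConflict(Exception):
    """Raised where a real lock manager would block the requester."""

class LockTable:
    def __init__(self):
        self.locks = {}  # row_id -> {txn_name: mode}, mode "S" or "X"

    def acquire(self, txn, row, mode):
        held = self.locks.setdefault(row, {})
        for other, other_mode in held.items():
            if other == txn:
                continue
            # S/S is the only compatible pair; S/X, X/S, X/X conflict.
            if mode == "X" or other_mode == "X":
                raise LockConflict(f"{txn} blocks on {other} for row {row}")
        held[txn] = mode  # an upgrade (S -> X) just replaces the mode

    def release_all(self, txn):
        # Under SERIALIZABLE this happens only at commit/rollback.
        for held in self.locks.values():
            held.pop(txn, None)

lt = LockTable()
lt.acquire("S1", 2, "X")      # S1 updates row 2, takes an exclusive lock
lt.acquire("S2", 1, "S")      # S2 reads row 1: no conflict
try:
    lt.acquire("S2", 2, "S")  # S2 tries to read row 2: must wait for S1
except LockConflict as e:
    print(e)                  # -> S2 blocks on S1 for row 2
lt.release_all("S1")          # S1 commits, releasing its locks
lt.acquire("S2", 2, "S")      # now S2 can read row 2
```

If S1 had then needed a lock on something S2 already held, neither
side could proceed -- that is the deadlock-and-rollback case.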
> S2:
> BEGIN;
> UPDATE a SET j = j + 100 WHERE j = 10 OR j = 40;
> -- Here, the index scan is already past j=10 by the time
> -- it blocks on a concurrently-updated tuple

Assuming an index on j, S2 will block when it tries to read a
modified range, so yeah, it might read j=10 and find nothing, then
attempt to read j=40 and block. (Of course, in reality, with a
four-row table and current statistics, these would all be done with
table scans, and the locks would effectively be the same as table
locks. That being fairly obvious behavior, I'll stick with the
supposition that it's operating with row locks.)

> S1:
> COMMIT;

The index entries and data rows are updated in place, and the locks
are then released. S2 now reads the j=40 range and finds i=3 as the
only match. (No "snapshot" exists; there is no old version of the
tuple. S2 just sees whatever is there when the locks are released.)

> S2:
> COMMIT;
>
> In PostgreSQL this sequence results in:
>  i | j
> ---+----
>  1 | 20
>  4 | 80
>  2 | 30
>  3 | 40
>
> The second update matched no tuples at all.

In Sybase (and similar databases), the result would be:

 i |  j
---+-----
 1 |  20
 2 |  30
 3 | 140
 4 |  80

>> Let me restate -- I don't propose that PostgreSQL implement this
>> locking scheme. I think it can and should do better in approaching
>> compliance with the standard, and with ACID properties, without
>> compromising concurrency and performance to the degree required by
>> this sort of locking and blocking.
>
> I think Greg has it right: without predicate locking we can't
> really achieve the behavior you're expecting. So how would we
> better approach the semantics you want without it?

Well, this thread was talking about dealing with situations where
queries using FOR UPDATE/FOR SHARE return something other than what
is requested, or results based on viewing only part of what was
committed by another transaction.
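Incidentally, the two result tables above can be reproduced by
replaying the schedules in a toy script. This models only the
value-visibility rules (S2's scan seeing the old snapshot and
re-checking a blocked tuple against its new version, versus S2 simply
waiting and then reading the committed data); it makes no attempt to
model the locking mechanics, and the helper names are invented.

```python
def s1(rows):
    """S1: UPDATE a SET j = j - 10 WHERE i = 2 OR i = 3."""
    return {i: (j - 10 if i in (2, 3) else j) for i, j in rows.items()}

def s2_matches(j):
    """S2's WHERE clause: j = 10 OR j = 40."""
    return j in (10, 40)

data = {1: 20, 2: 40, 3: 50, 4: 80}

# --- PostgreSQL-style outcome for this interleaving ---
snapshot = dict(data)   # S2's scan snapshot, taken before S1 commits
current = s1(data)      # S1 commits while S2 is blocked
pg_result = dict(current)
for i, j_snap in snapshot.items():
    if s2_matches(j_snap):             # matched under the old snapshot...
        if s2_matches(current[i]):     # ...re-checked against the new version
            pg_result[i] = current[i] + 100
# Row 2 matched at j=40 but re-checks at j=30, so it is skipped; row 3
# now has j=40, but its snapshot value was 50, so the scan never
# considers it. The second update therefore changes nothing.
print(sorted(pg_result.items()))      # [(1, 20), (2, 30), (3, 40), (4, 80)]

# --- Lock-based (Sybase-style) outcome: S2 blocks until S1 commits,
# --- then reads the committed data, i.e. the serial order S1; S2.
after_s1 = s1(data)
sybase_result = {i: (j + 100 if s2_matches(j) else j)
                 for i, j in after_s1.items()}
print(sorted(sybase_result.items()))  # [(1, 20), (2, 30), (3, 140), (4, 80)]
```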
My feeling is that we should be looking harder at recognizing these
cases and rolling back a transaction with a serialization failure
before returning bad data. When you are using these clauses you are
already vulnerable to deadlocks. This doesn't seem to me to be that
different from other situations where people have said "It's easy to
return results quickly if you don't care whether they're accurate."

Regarding the broader issues -- during discussion of documentation
for the anomalies in snapshot isolation I was made aware of recent
work, published by the ACM last year, which provides techniques for a
more general and comprehensive solution. This has already been
implemented in at least two other MVCC databases, although these
changes haven't made it into a production release of anything yet.
I've been trying to wait until 8.4 hits beta testing to open a
discussion of this. Basically, though, this work outlines a way to
provide real serializable behavior in an MVCC database without any
more blocking than PostgreSQL already has.

Stay tuned for a discussion of this once 8.4 is in beta.

-Kevin
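As a rough illustration of how conflict tracking can replace blocking
(this is my own simplified sketch, not the published algorithm):
reads register non-blocking markers instead of taking shared locks,
writes proceed immediately but note which transactions read the
overwritten data, and a transaction that accumulates read-write
conflicts both in and out is rolled back with a serialization failure
rather than ever made to wait. All names here are invented for the
example.

```python
class SerializationFailure(Exception):
    pass

class Txn:
    def __init__(self, name):
        self.name = name
        self.in_conflict = False   # some txn's read was overwritten by us? no:
                                   # set when another txn's read precedes our write
        self.out_conflict = False  # set when our read is overwritten by a write

readers = {}  # row -> set of transactions that have read it (never blocks)

def read(txn, row):
    readers.setdefault(row, set()).add(txn)   # marker only; no waiting

def write(txn, row):
    for r in readers.get(row, set()):
        if r is not txn:
            # rw-conflict edge r -> txn: r read data that txn overwrites
            r.out_conflict = True
            txn.in_conflict = True
            for t in (r, txn):
                # a txn with rw edges both in and out may be part of a
                # non-serializable cycle, so roll it back
                if t.in_conflict and t.out_conflict:
                    raise SerializationFailure(t.name)

t1, t2, t3 = Txn("T1"), Txn("T2"), Txn("T3")
read(t1, "x"); write(t2, "x")   # edge T1 -> T2: harmless on its own
read(t3, "y")
try:
    write(t1, "y")              # edge T3 -> T1: T1 now has edges in and out
except SerializationFailure as e:
    print("rolled back:", e)    # -> rolled back: T1
```

Nothing above ever waits; the only cost is the bookkeeping and some
rollbacks of transactions that might, in fact, have been safe.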
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers