Re: [HACKERS] determine snapshot after obtaining locks for first statement

2009-12-17 Thread Markus Wanner

Hi,

Quoting Tom Lane t...@sss.pgh.pa.us:

Not at the table level.  If you could lock only at the tuple level
maybe you'd have something


AFAIUI this is about the tuple level lock, yes.


but it seems like you can't find the
target tuples without having acquired a snapshot.


Maybe not *the* target tuple, but we could certainly find candidate  
target tuples. Of course it's impossible to determine visibility  
without a snapshot (and thus find *the* one).


But it seems to me it might suffice to optimistically pick a plausible  
tuple (i.e. determined by a candidate snapshot) and try to lock that.  
Only after we hold the lock, we get a real snapshot and re-check  
visibility of the tuple we've locked.


If it's the visible target tuple we want to update we are all fine, if  
not another transaction updated the tuple and we have to look for the  
new version of that tuple. As we've just taken a new snapshot *after*  
that other transaction updating the tuple of interest, we now see the  
*new* tuple and can continue with our transaction normally (instead of  
having to abort the transaction with a serialization failure).


So yes, to me this looks like a theoretically possible performance  
gain. It certainly only helps the very first tuple write. And it seems  
to apply to SERIALIZABLE transactions exclusively.


Another minor gotcha exists, though. There's another possible cause  
for the visibility check to fail: our initial pick with the candidate  
snapshot might have been wrong. In that unfortunate case we can  
continue as described, but it's worth mentioning that we were waiting  
for the wrong lock (i.e. a tuple that's not visible according to the  
real snapshot might have been one from an aborted transaction, for  
example). The candidate snapshot should thus be rather good, but  
that's not much of an issue, I think.


If we want to completely get rid of serialization failures in the  
first (writing) operation within a transaction, we'd have to repeat  
these steps after a visibility check fails. Meaning having to degrade  
the real snapshot acquired after the first lock to a candidate  
snapshot for the second tuple lock we try.


Maybe candidate snapshots is a good short name for such a feature?

Another line of thought: isn't this like READ COMMITTED for just the  
first operation in a SERIALIZABLE transaction?


Regards

Markus Wanner


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] determine snapshot after obtaining locks for first statement

2009-12-17 Thread Kevin Grittner
Tom Lane  wrote:
 Kevin Grittner  writes:
 Tom Lane  wrote:
 (Besides which the lock acquired by UPDATE isn't exclusive and
 wouldn't block anyway...)
 
 It blocks other UPDATEs.
 
 Not at the table level.
 
The question was about whether we could change the timing of when
we get the current locks, not about getting different locks than we
currently do.  Clearly, adding a table lock would make no sense.
 
 If you could lock only at the tuple level maybe you'd have
 something, but it seems like you can't find the target tuples
 without having acquired a snapshot.
 
That would be the fatal difference then.  InnoDB doesn't actually
keep old versions of a row; it generates them from undo records in
the log when needed, which might be why it was feasible to get the
lock before the snapshot there.  If we need the snapshot before we
can get the lock, it's not an optimization which is available to us.
 
Thanks,
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] determine snapshot after obtaining locks for first statement

2009-12-17 Thread Kevin Grittner
Markus Wanner  wrote:
 Another line of thought: isn't this like READ COMMITTED for just
 the first operation in a SERIALIZABLE transaction?
 
Hmmm...  You have a point.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] determine snapshot after obtaining locks for first statement

2009-12-17 Thread Kevin Grittner
Markus Wanner mar...@bluegap.ch wrote:
 
 Another line of thought: isn't this like READ COMMITTED for just
 the first operation in a SERIALIZABLE transaction?
 
I've mulled it over and I have two different logical proofs that
this is safe; if anyone is dubious I'd be happy to share.
 
This seems likely to be of significant benefit in some workloads,
and I can't see anywhere that it is likely to cost much.  Any
objections to adding this to the TODO list as a performance item?
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] determine snapshot after obtaining locks for first statement

2009-12-17 Thread Robert Haas
On Thu, Dec 17, 2009 at 10:05 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Markus Wanner mar...@bluegap.ch wrote:

 Another line of thought: isn't this like READ COMMITTED for just
 the first operation in a SERIALIZABLE transaction?

 I've mulled it over and I have two different logical proofs that
 this is safe; if anyone is dubious I'd be happy to share.

 This seems likely to be of significant benefit in some workloads,
 and I can't see anywhere that it is likely to cost much.  Any
 objections to adding this to the TODO list as a performance item?

I thought you concluded two emails ago that it wouldn't work for PG?
It's certainly not clear to me what exactly the TODO would be.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] determine snapshot after obtaining locks for first statement

2009-12-17 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 Markus Wanner mar...@bluegap.ch wrote:

 Another line of thought: isn't this like READ COMMITTED for just
 the first operation in a SERIALIZABLE transaction?

 I've mulled it over and I have two different logical proofs that
 this is safe; if anyone is dubious I'd be happy to share.

 This seems likely to be of significant benefit in some workloads,
 and I can't see anywhere that it is likely to cost much.  Any
 objections to adding this to the TODO list as a performance item?
 
 I thought you concluded two emails ago that it wouldn't work for
 PG?  It's certainly not clear to me what exactly the TODO would
 be.
 
Tom's emails had me pretty convinced that this technique wouldn't
work in PostgreSQL, but Markus put a fresh perspective on it which
makes it seem relatively painless.  (Although, as is often the case,
my perspective may be naive.)
 
Basically, in a SERIALIZABLE transaction, if the first statement
which would require a snapshot would currently fail with ERROR: 
could not serialize access due to concurrent update we would
instead get a fresh snapshot and retry -- which is what we do in a
READ COMMITTED transaction.
 
One way of looking at this is that any transaction which fails with
a serialization error can be retried with a reasonable chance of
success.  There is no evidence of anything wrong with the
transaction itself, just that its actions conflicted with those of a
concurrent transaction.  For the case we're discussing, that other
transaction has now committed.  (We blocked waiting to see whether
it would commit or roll back.)  If this is the first statement which
needed a snapshot, retrying it with a new snapshot can't create any
conflicting views of the data.  We *could* view this sort of as an
automatic transaction retry in the limited situations where the
database engine can determine what to do.  (If there had been prior
statements, you can't really know that the current statement would
have been issued by the client had the prior statements been run
against a different snapshot.)  Where this view of things is a
little off is that explicit locks obtained earlier in the
transaction would still be held; we're not really starting the
*whole* transaction over.  While this doesn't seem a fatal flaw, it
does mean the other way of looking at it is a more technically
correct.
 
The other way of looking at it is that until a statement succeeds
with a given snapshot, you have not fixed your snapshot for the
serializable transaction.  A retry similar to what we do for READ
COMMITTED would just be part of obtaining the one snapshot used for
the SERIALIZABLE transaction -- it isn't fixed until that first
statement succeeds.
 
I'm assuming that this could be a fairly small change because we
already have code to do exactly the right thing for READ COMMITTED
transactions.  The logic to choose which way to handle the commit of
a transaction which held a competing lock would need to be modified
to use the READ COMMITTED lock on the first statement which obtains
a snapshot in a SERIALIZABLE transaction, and the snapshot for a
SERIALIZABLE transaction would not be fixed until the completion
of the first statement needing a snapshot.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] determine snapshot after obtaining locks for first statement

2009-12-17 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Basically, in a SERIALIZABLE transaction, if the first statement
 which would require a snapshot would currently fail with ERROR: 
 could not serialize access due to concurrent update we would
 instead get a fresh snapshot and retry -- which is what we do in a
 READ COMMITTED transaction.

This sounds like a pretty horrid kluge.  For one thing, the statement
might already have done a great deal of work before you hit the failure.
(Admittedly, that work will be lost anyway if we abort, but it's not
a localized change to make it all happen all over again.)  Also,
aborting that statement without also losing any previously-acquired
locks would require establishing a hidden subtransaction, with ensuing
extra costs to be paid even when there isn't a failure.

I think you misunderstand how READ COMMITTED works; it does not change
the snapshot for the entire statement, it only follows the update chain
for a particular tuple that's been chosen for update or delete.
 
 I'm assuming that this could be a fairly small change 

It would not be.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] determine snapshot after obtaining locks for first statement

2009-12-17 Thread Greg Stark
On Thu, Dec 17, 2009 at 3:39 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Basically, in a SERIALIZABLE transaction, if the first statement
 which would require a snapshot would currently fail with ERROR:
 could not serialize access due to concurrent update we would
 instead get a fresh snapshot and retry -- which is what we do in a
 READ COMMITTED transaction.

So I for multi-statement transactions I don't see what this buys you.
You'll still have to write the code to retry, and postgres retrying in
the cases where it can isn't really going to be a whole lot better.

Moreover I think it would kick in less often than you might expect and
sometimes surprise people by not kicking in when they expect it to.
Any internal queries could count (though i think you get away with
catalog operations in snapshot_now), any volatile functions, etc. So
people might write a single-statement SQL transaction and not bother
writing retry logic and then be surprised by errors.

I'm unclear why serialization failures would be rare. It depends
entirely on the application. If you're querying records which are busy
from concurrent updates you could get a continuous stream of
serialization failures. It seems better to report the situation to the
user all the time since they have to handle it already and might want
to know about the problem and implement some kind of backoff rather
than hide it from them but only sometimes so they still have to write
code to handle it but aren't allows to handle it consistently.

This isn't the first time that we've seen advantages that could be had
from packaging up a whole transaction so the database can see
everything the transaction needs to do. Perhaps we should have an
interface for saying you're going to feed a series of commands which
you want the database to repeat for you verbatim automatically on
serialization failures. Since you can't construct the queries based on
the results of previous queries the database would be free to buffer
them all up and run them together at the end of the transaction which
would allow the other tricky optimizations we've pondered in the past
as well.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] determine snapshot after obtaining locks for first statement

2009-12-17 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov writes:
 we would instead get a fresh snapshot and retry -- which is what
 we do in a READ COMMITTED transaction.
 
 I think you misunderstand how READ COMMITTED works; it does not
 change the snapshot for the entire statement, it only follows the
 update chain for a particular tuple that's been chosen for update
 or delete.
 
Thanks for the clarification.  That does not work for SERIALIZABLE
at all, because other tables or rows referenced in that first
statement would be using the original snapshot.  Indeed, the
behavior under READ COMMITTED could be astonishing in certain
circumstances as it breaks atomicity:
 
atomicity: all of the results of a transaction should be visible in
the database, or none of them should be. It should never be possible
to see the results of some operations in a transaction without the
others.
 
connection1:
 
test=# create table t (c1 int not null primary key, c2 int);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
t_pkey for table t
CREATE TABLE
test=# insert into t values (1, 101), (2, 7);
INSERT 0 2
test=# start TRANSACTION ISOLATION LEVEL READ COMMITTED ;
START TRANSACTION
test=# update t set c2 = c2 + (select c2 from t where c1 = 2) where
c1 = 1;
UPDATE 1
test=# update t set c2 = 11 where c1 = 2;
UPDATE 1
 
connection2:

test=# START TRANSACTION ISOLATION LEVEL READ COMMITTED ;
START TRANSACTION
test=# update t set c2 = c2 + (select c2 from t where c1 = 2) where
c1 = 1;
[blocks]
 
connection1:

test=# commit;
COMMIT
 
connection2:

UPDATE 1
test=# commit;
COMMIT
test=# select * from t;
 c1 | c2
+-
  2 |  11
  1 | 115
(2 rows)
 
The update on connection2 added the modified value of the first
update from connection1 to the unmodified value from the second
update on connection1.  In other words, the atomicity of the update
on connection1 is broken in this case.
 
I'm not sure why this is considered OK.  At a minimum it should be
mentioned in our documentation of our implementation of the READ
COMMITTED isolation level.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] determine snapshot after obtaining locks for first statement

2009-12-17 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Thanks for the clarification.  That does not work for SERIALIZABLE
 at all, because other tables or rows referenced in that first
 statement would be using the original snapshot.  Indeed, the
 behavior under READ COMMITTED could be astonishing in certain
 circumstances as it breaks atomicity:

Yup.  That is stated fairly clearly already in the description of
READ COMMITTED mode, no?
http://developer.postgresql.org/pgdocs/postgres/transaction-iso.html#XACT-READ-COMMITTED

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] determine snapshot after obtaining locks for first statement

2009-12-17 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov writes:
 the behavior under READ COMMITTED could be astonishing in certain
 circumstances as it breaks atomicity:
 
 Yup.  That is stated fairly clearly already in the description of
 READ COMMITTED mode, no?

http://developer.postgresql.org/pgdocs/postgres/transaction-iso.html#XACT-READ-COMMITTED
 
: it is possible for an updating command to see an inconsistent
: snapshot: it can see the effects of concurrent updating commands
: on the same rows it is trying to update, but it does not see
: effects of those commands on other rows in the database. This
: behavior makes Read Committed mode unsuitable for commands that
: involve complex search conditions
 
I don't know how many times I've read that page (many), yet I never
properly comprehended the impact of that part.  I think the last bit
I quoted above is somewhat misleading, in that it implies that the
issue is limited to complex search conditions.  In the failing case
I showed in this thread, the search conditions involved are
comparisons for equality of an integer literal to the one-column
integer primary key.  It seems like any join or subquery which
references a table is vulnerable, yes?
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] determine snapshot after obtaining locks for first statement

2009-12-17 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Tom Lane t...@sss.pgh.pa.us wrote:
 http://developer.postgresql.org/pgdocs/postgres/transaction-iso.html#XACT-READ-COMMITTED

 I don't know how many times I've read that page (many), yet I never
 properly comprehended the impact of that part.  I think the last bit
 I quoted above is somewhat misleading, in that it implies that the
 issue is limited to complex search conditions.  In the failing case
 I showed in this thread, the search conditions involved are
 comparisons for equality of an integer literal to the one-column
 integer primary key.  It seems like any join or subquery which
 references a table is vulnerable, yes?

Well, it would all depend on what you're trying to do.  Typical
single-row UPDATE commands aren't really affected by this problem,
and in fact the behavior is pretty much exactly what they want as
long as the WHERE conditions don't involve columns that are being
changed.

Maybe we should replace the bit about complex search conditions
with something about referencing multiple rows to perform one update.
I'm not very sure what a clearer explanation would look like though.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] determine snapshot after obtaining locks for first statement

2009-12-17 Thread Greg Stark
On Thu, Dec 17, 2009 at 5:39 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Well, it would all depend on what you're trying to do.  Typical
 single-row UPDATE commands aren't really affected by this problem,
 and in fact the behavior is pretty much exactly what they want as
 long as the WHERE conditions don't involve columns that are being
 changed.

 Maybe we should replace the bit about complex search conditions
 with something about referencing multiple rows to perform one update.
 I'm not very sure what a clearer explanation would look like though.

I wonder if RETURNING hasn't created a whole new set of cases where
our READ COMMITTED behaviour is bogus. I guess it's equivalent to
having used SELECT FOR UPDATE.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] determine snapshot after obtaining locks for first statement

2009-12-17 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 
 I'm not very sure what a clearer explanation would look like
 
As a stab at it, how about?:
 
This behavior makes Read Committed mode unsuitable for many UPDATE
or DELETE commands with joins or subqueries
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] determine snapshot after obtaining locks for first statement

2009-12-17 Thread Tom Lane
Greg Stark gsst...@mit.edu writes:
 I wonder if RETURNING hasn't created a whole new set of cases where
 our READ COMMITTED behaviour is bogus.

I don't see how.  It just gives you access to the same values that were
actually used by the UPDATE.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] determine snapshot after obtaining locks for first statement

2009-12-17 Thread Robert Haas
On Thu, Dec 17, 2009 at 12:51 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Tom Lane t...@sss.pgh.pa.us wrote:

 I'm not very sure what a clearer explanation would look like

 As a stab at it, how about?:

 This behavior makes Read Committed mode unsuitable for many UPDATE
 or DELETE commands with joins or subqueries

I don't think that's any clearer, though it is more disparaging.  :-)

Note we also say: The partial transaction isolation provided by Read
Committed mode is adequate for many applications, and this mode is
fast and simple to use; however, it is not sufficient for all cases.
Applications that do complex queries and updates might require a more
rigorously consistent view of the database than Read Committed mode
provides.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] determine snapshot after obtaining locks for first statement

2009-12-17 Thread Joshua D. Drake
On Thu, 2009-12-17 at 12:58 -0500, Robert Haas wrote:
 On Thu, Dec 17, 2009 at 12:51 PM, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:
  Tom Lane t...@sss.pgh.pa.us wrote:
 
  I'm not very sure what a clearer explanation would look like
 
  As a stab at it, how about?:
 
  This behavior makes Read Committed mode unsuitable for many UPDATE
  or DELETE commands with joins or subqueries
 
 I don't think that's any clearer, though it is more disparaging.  :-)
 
 Note we also say: The partial transaction isolation provided by Read
 Committed mode is adequate for many applications, and this mode is
 fast and simple to use; however, it is not sufficient for all cases.
 Applications that do complex queries and updates might require a more
 rigorously consistent view of the database than Read Committed mode
 provides.

What is needed here is a layman's context of what isolation modes are
good for what type of operation. Neither your explanation or Tom's is
particularly useful except to say, Crap, I might be screwed but I don't
know if I am... how do I find out?

Joshua D. Drake



 
 ...Robert
 


-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
If the world pushes look it in the eye and GRR. Then push back harder. - 
Salamander


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] determine snapshot after obtaining locks for first statement

2009-12-17 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Tom Lane t...@sss.pgh.pa.us wrote:
 I'm not very sure what a clearer explanation would look like
 
 As a stab at it, how about?:
 
 This behavior makes Read Committed mode unsuitable for many UPDATE
 or DELETE commands with joins or subqueries

After thinking a bit, I'd be inclined to add a new paragraph.
In particular, now that FOR UPDATE actually works in subqueries,
it'd be worth pointing out that you can add that to guard against
this type of issue.  Perhaps, after the DELETE FROM website
example, we could add something like

UPDATEs and DELETEs involving joins or subqueries are particularly
at risk, since they may perform an update based on a combination of
old rows from other tables with an up-to-date target row.  This risk
can be mitigated by adding FOR UPDATE or FOR SHARE to subqueries, so
that all rows directly involved in an update are guaranteed current.
However that will also increase the risk of deadlock failures.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] determine snapshot after obtaining locks for first statement

2009-12-17 Thread Robert Haas
On Thu, Dec 17, 2009 at 1:05 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Tom Lane t...@sss.pgh.pa.us wrote:
 I'm not very sure what a clearer explanation would look like

 As a stab at it, how about?:

 This behavior makes Read Committed mode unsuitable for many UPDATE
 or DELETE commands with joins or subqueries

 After thinking a bit, I'd be inclined to add a new paragraph.
 In particular, now that FOR UPDATE actually works in subqueries,
 it'd be worth pointing out that you can add that to guard against
 this type of issue.  Perhaps, after the DELETE FROM website
 example, we could add something like

 UPDATEs and DELETEs involving joins or subqueries are particularly
 at risk, since they may perform an update based on a combination of
 old rows from other tables with an up-to-date target row.  This risk
 can be mitigated by adding FOR UPDATE or FOR SHARE to subqueries, so
 that all rows directly involved in an update are guaranteed current.
 However that will also increase the risk of deadlock failures.

I like that.  It might also be worth trying to explain that if you
select some data out of the database, do a computation with it, and
then use the results to drive an update, you're going to want to make
the initial select be FOR SHARE.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] determine snapshot after obtaining locks for first statement

2009-12-17 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
 I don't think that's any clearer, though it is more disparaging. 
 :-)
 
It's certainly not my goal to knock PostgreSQL.  The precise
conditions in which an UPDATE or DELETE can view an inconsistent
database state (and therefore potentially persist something based on
that inconsistent state) are that it has a FROM clause and/or
subqueries which reference data changed by a concurrent database
transaction which also affects rows which are targets of the UPDATE
or DELETE.  Precise descriptions of problem circumstances seem more
useful to developers than vague statements like it's usually good
enough, except when it isn't.
 
If an accurate description of the behavior is considered
disparaging, perhaps it's the behavior which should change, not just
the description of it.  Since I never use READ COMMITTED for
updates, I'm not going to weigh in on whether this is a big enough
problem to merit the effort and overhead of a different
implementation; I'm just suggesting we should put the information
out there more explicitly.  My wording was still a little on the
vague side, in an attempt to keep it short; perhaps that was a
mistake.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] determine snapshot after obtaining locks for first statement

2009-12-17 Thread Tom Lane
Joshua D. Drake j...@commandprompt.com writes:
 What is needed here is a layman's context of what isolation modes are
 good for what type of operation. Neither your explanation or Tom's is
 particularly useful except to say, Crap, I might be screwed but I don't
 know if I am... how do I find out?

If we had a simple way to characterize that, we'd not be having this
discussion :-(

One possibility is to try to list the risky cases.  So far I can think
of:

* updates using a WHERE clause that tests columns being changed by other
transactions

* updates using subqueries/joins so that the result depends on other rows
besides the one directly updated/deleted, and those other rows are
subject to concurrent changes

But I'm not sure this is a complete list, and an incomplete one might do
more harm than good ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] determine snapshot after obtaining locks for first statement

2009-12-17 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 
 After thinking a bit, I'd be inclined to add a new paragraph.
 In particular, now that FOR UPDATE actually works in subqueries,
 it'd be worth pointing out that you can add that to guard against
 this type of issue.  Perhaps, after the DELETE FROM website
 example, we could add something like
 
 UPDATEs and DELETEs involving joins or subqueries are particularly
 at risk, since they may perform an update based on a combination
 of old rows from other tables with an up-to-date target row.  This
 risk can be mitigated by adding FOR UPDATE or FOR SHARE to
 subqueries, so that all rows directly involved in an update are
 guaranteed current.  However that will also increase the risk of
 deadlock failures.
 
Much better than my suggestion.  Including both the problem
conditions and the solution is ideal.
 
I'd missed that we now allow FOR UPDATE and FOR SHARE on subqueries.
Nice enhancement.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] determine snapshot after obtaining locks for first statement

2009-12-17 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 ...  The precise
 conditions in which an UPDATE or DELETE can view an inconsistent
 database state (and therefore potentially persist something based on
 that inconsistent state) are that it has a FROM clause and/or
 subqueries which reference data changed by a concurrent database
 transaction which also affects rows which are targets of the UPDATE
 or DELETE.

Are we sure that's a precise and complete description?  I don't have
a problem with putting a description just like that in the docs, but
I'm not yet convinced it's right.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] determine snapshot after obtaining locks for first statement

2009-12-17 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 
 Are we sure that's a precise and complete description?  I don't
 have a problem with putting a description just like that in the
 docs, but I'm not yet convinced it's right.
 
Well, I thought it was when I typed it.  You mentioned referencing
other columns in the updated rows; I'll test to see how that
behaves.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] determine snapshot after obtaining locks for first statement

2009-12-17 Thread Robert Haas
On Thu, Dec 17, 2009 at 1:12 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Robert Haas robertmh...@gmail.com wrote:

 I don't think that's any clearer, though it is more disparaging.
 :-)

 It's certainly not my goal to knock PostgreSQL.  The precise
 conditions in which an UPDATE or DELETE can view an inconsistent
 database state (and therefore potentially persist something based on
 that inconsistent state) are that it has a FROM clause and/or
 subqueries which reference data changed by a concurrent database
 transaction which also affects rows which are targets of the UPDATE
 or DELETE.  Precise descriptions of problem circumstances seem more
 useful to developers than vague statements like it's usually good
 enough, except when it isn't.

 If an accurate description of the behavior is considered
 disparaging, perhaps it's the behavior which should change, not just
 the description of it.  Since I never use READ COMMITTED for
 updates, I'm not going to weigh in on whether this is a big enough
 problem to merit the effort and overhead of a different
 implementation; I'm just suggesting we should put the information
 out there more explicitly.  My wording was still a little on the
 vague side, in an attempt to keep it short; perhaps that was a
 mistake.

Don't get me wrong, I don't love the current behavior.  (I don't have
a competing proposal either.)  But I think we want to describe it with
precision, because there are also many cases where _it works fine_.
Telling people when it works and when it doesn't work is a lot more
useful than attempting to qualitatively estimate how good or bad it
is.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] determine snapshot after obtaining locks for first statement

2009-12-17 Thread Kevin Grittner
Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 Tom Lane t...@sss.pgh.pa.us wrote:
  
 Are we sure that's a precise and complete description?  I don't
 have a problem with putting a description just like that in the
 docs, but I'm not yet convinced it's right.
  
 Well, I thought it was when I typed it.  You mentioned referencing
 other columns in the updated rows; I'll test to see how that
 behaves.
 
Some quick testing seems to show that for the rows on which we were
blocking, all columns reflect all updates from the concurrent
transaction on which we were waiting, including columns used in the
WHERE clause.  I'm not sure exactly what other tests might be
necessary.  I'm having trouble coming up with anything which doesn't
involve a join or subquery, but that could be a failure of
imagination.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] determine snapshot after obtaining locks for first statement

2009-12-17 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Some quick testing seems to show that for the rows on which we were
 blocking, all columns reflect all updates from the concurrent
 transaction on which we were waiting, including columns used in the
 WHERE clause.  I'm not sure exactly what other tests might be
 necessary.  I'm having trouble coming up with anything which doesn't
 involve a join or subquery, but that could be a failure of
 imagination.

The issue that I was thinking about is that there are actually two
rounds of WHERE testing involved in a READ COMMITTED update: first
we fetch a row that matches the WHERE clause *in the query snapshot*,
and then we fetch its most up-to-date version and recheck the WHERE
condition for that.  If the updated version no longer satisfies WHERE
we ignore it.  The trouble with this is that the same transaction that
changed that row to not satisfy WHERE might have also changed some
other row so that it now *does* satisfy WHERE, but we won't ever find
that other row because in the query snapshot it doesn't pass the WHERE.
The website example in the docs is meant to illustrate this hazard.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] determine snapshot after obtaining locks for first statement

2009-12-17 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
 Don't get me wrong, I don't love the current behavior.  (I don't
 have a competing proposal either.)  But I think we want to
 describe it with precision, because there are also many cases
 where _it works fine_. Telling people when it works and when it
 doesn't work is a lot more useful than attempting to qualitatively
 estimate how good or bad it is.
 
It sounds like we're in violent agreement.  I'm not sure what I said
which might have led you to believe I felt otherwise.
 
[reviews thread]
 
The suggestion you felt was more disparaging was:
 
: This behavior makes Read Committed mode unsuitable for
: many UPDATE or DELETE commands with joins or subqueries
 
You do realize that what is already in the documentation, for which
this was a suggested replacement, was?:
 
: This behavior makes Read Committed mode unsuitable for
: commands that involve complex search conditions
 
I'm not seeing where I made it more disparaging; I was trying to
clarify under what circumstances it was a problem.  If you have a
suggestion for a better way to phrase the part I left alone, feel
free to suggest something.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] determine snapshot after obtaining locks for first statement

2009-12-17 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 
 [a transaction] might have also changed some other row so that it
 now *does* satisfy WHERE, but we won't ever find that other row
 because in the query snapshot it doesn't pass the WHERE.
 
OK; got it.  No way to fix that, really, without getting a fresh
snapshot and re-starting the command, is there?  I take it from your
earlier posts that wouldn't be pretty. On the bright side, to be
taken as showing an inconsistent state, the transaction on which we
block has to both move one or more rows into the matching set as
well as moving one or more rows out.
 
Another example of the phenomenon:
 
connection1:

test=# create table t (name text not null primary key, is_it boolean
not null);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
t_pkey for table t
CREATE TABLE
test=# insert into t values ('huey', true), ('dewey', false),
('louie', false);
INSERT 0 3
test=# start transaction isolation level read committed;
START TRANSACTION
test=# update t set is_it = not is_it where name in ('huey',
'dewey');
UPDATE 2
 
connection2:

test=# start transaction isolation level read committed;
START TRANSACTION
test=# select * from t where is_it for update;
[blocks]
 
connection1:

test=# commit;
COMMIT
 
connection2:

 name | is_it
--+---
(0 rows)
 
test=# select * from t where is_it for update;
 name  | is_it
---+---
 dewey | t
(1 row)
 
So this particular issue means that rows affected will be the
intersection of rows matching the WHERE clause before and after the
conflicting concurrent transaction(s) commit.  The join/subquery
issue means that all values used would be based on the snapshot at
the start of the statement except that values from rows updated by
concurrent transactions on which we blocked would be based on the
updated rows.  Any other issues?
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] determine snapshot after obtaining locks for first statement

2009-12-17 Thread Joshua D. Drake
On Thu, 2009-12-17 at 13:13 -0500, Tom Lane wrote:
 Joshua D. Drake j...@commandprompt.com writes:
  What is needed here is a layman's context of what isolation modes are
  good for what type of operation. Neither your explanation or Tom's is
  particularly useful except to say, Crap, I might be screwed but I don't
  know if I am... how do I find out?
 
 If we had a simple way to characterize that, we'd not be having this
 discussion :-(

Certainly true. Sorry if I came off harsh my intent was to illustrate
the more verbose yet less detailed information isn't going to help.

Joshua D. Drake



-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
Respect is earned, not gained through arbitrary and repetitive use or Mr. or 
Sir.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] determine snapshot after obtaining locks for first statement

2009-12-17 Thread Joshua D. Drake
On Thu, 2009-12-17 at 12:16 -0600, Kevin Grittner wrote:
 Tom Lane t...@sss.pgh.pa.us wrote:
  
  After thinking a bit, I'd be inclined to add a new paragraph.
  In particular, now that FOR UPDATE actually works in subqueries,
  it'd be worth pointing out that you can add that to guard against
  this type of issue.  Perhaps, after the DELETE FROM website
  example, we could add something like
  
  UPDATEs and DELETEs involving joins or subqueries are particularly
  at risk, since they may perform an update based on a combination
  of old rows from other tables with an up-to-date target row.  This
  risk can be mitigated by adding FOR UPDATE or FOR SHARE to
  subqueries, so that all rows directly involved in an update are
  guaranteed current.  However that will also increase the risk of
  deadlock failures.
  
 Much better than my suggestion.  Including both the problem
 conditions and the solution is ideal.
  
 I'd missed that we now allow FOR UPDATE and FOR SHARE on subqueries.
 Nice enhancement.

+1

Joshua D. Drake


  
 -Kevin
 


-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
Respect is earned, not gained through arbitrary and repetitive use or Mr. or 
Sir.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] determine snapshot after obtaining locks for first statement

2009-12-16 Thread Kevin Grittner
The Cahill thesis mentions an interesting optimization -- they defer
determination of the snapshot until after any locks required for the
first statement have been acquired.  Where the first statement was,
for example, an UPDATE, this reduced re-reads or rollbacks in the
face of concurrent modifications.
 
Does PostgreSQL currently do this?  If not, would it make sense?
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] determine snapshot after obtaining locks for first statement

2009-12-16 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 The Cahill thesis mentions an interesting optimization -- they defer
 determination of the snapshot until after any locks required for the
 first statement have been acquired.  Where the first statement was,
 for example, an UPDATE, this reduced re-reads or rollbacks in the
 face of concurrent modifications.
 
 Does PostgreSQL currently do this?

Yes --- it's not an optimization, it's necessary for basic
functionality to work correctly.  See for example the last para at
http://www.postgresql.org/docs/8.4/static/applevel-consistency.html

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] determine snapshot after obtaining locks for first statement

2009-12-16 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov writes:
 The Cahill thesis mentions an interesting optimization -- they
 defer determination of the snapshot until after any locks
 required for the first statement have been acquired.  Where the
 first statement was, for example, an UPDATE, this reduced
 re-reads or rollbacks in the face of concurrent modifications.
  
 Does PostgreSQL currently do this?
 
 Yes --- it's not an optimization, it's necessary for basic
 functionality to work correctly.
 
Hmmm...  Testing seems to indicate that this doesn't work per the
described optimization:
 
T1: start transaction isolation level serializable;
START TRANSACTION
T2: start transaction isolation level serializable;
START TRANSACTION
T1: update t2a set c2 = c2 - 1 where c1 = 1;
UPDATE 1
T2: update t2a set c2 = c2 - 1 where c1 = 1;
[blocks]
T1: commit;
COMMIT
T2: [unblocks]
ERROR:  could not serialize access due to concurrent update
 
The optimization Cahill describes is that for the first statement in
a transaction, the lock for the UPDATE is acquired before obtaining
the snapshot, so T2 succeeds after T1 commits.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] determine snapshot after obtaining locks for first statement

2009-12-16 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Tom Lane t...@sss.pgh.pa.us wrote:
 Yes --- it's not an optimization, it's necessary for basic
 functionality to work correctly.
 
 Hmmm...  Testing seems to indicate that this doesn't work per the
 described optimization:

You'd need an explicit LOCK TABLE t2a after starting the transaction.
With the code you give, the snapshot is acquired at the beginning of
processing the UPDATE command, before it finds out that the target
is t2a and acquires a lock on it.  (Besides which the lock acquired
by UPDATE isn't exclusive and wouldn't block anyway...)
 
 The optimization Cahill describes is that for the first statement in
 a transaction, the lock for the UPDATE is acquired before obtaining
 the snapshot, so T2 succeeds after T1 commits.

If he's talking about automatically taking an exclusive lock, I doubt
very many of our users would find that an improvement.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] determine snapshot after obtaining locks for first statement

2009-12-16 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 
 With the code you give, the snapshot is acquired at the beginning
 of processing the UPDATE command, before it finds out that the
 target is t2a and acquires a lock on it.
 
Right.  The optimization would be to wait to determine the snapshot
for the UPDATE statement until we acquire the lock.  (This trick
only worked for the first statement in a database transaction.) 
Perhaps that's not feasible in PostgreSQL.  That's what I'm asking.
 
 (Besides which the lock acquired by UPDATE isn't exclusive and
 wouldn't block anyway...)
 
It blocks other UPDATEs.
 
 The optimization Cahill describes is that for the first statement
 in a transaction, the lock for the UPDATE is acquired before
 obtaining the snapshot, so T2 succeeds after T1 commits.
 
 If he's talking about automatically taking an exclusive lock, I
 doubt very many of our users would find that an improvement.
 
I don't believe he's talking about a lock which excludes SELECTs on
the data.  He's talking about reducing transaction aborts based on
the First Committer Wins rule.  Apparently it helped his
high-contention benchmarks considerably in all three isolation
levels he was testing -- snapshot, serializable snapshot, and
serializable S2PL.
 
If there's something fundamentally different about how PostgreSQL
does things, such that we *can't* get the lock before the snapshot,
that's fine -- I just thought maybe this was a trick we could use.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] determine snapshot after obtaining locks for first statement

2009-12-16 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Tom Lane t...@sss.pgh.pa.us wrote:
 (Besides which the lock acquired by UPDATE isn't exclusive and
 wouldn't block anyway...)
 
 It blocks other UPDATEs.

Not at the table level.  If you could lock only at the tuple level
maybe you'd have something, but it seems like you can't find the
target tuples without having acquired a snapshot.

 If he's talking about automatically taking an exclusive lock, I
 doubt very many of our users would find that an improvement.
 
 I don't believe he's talking about a lock which excludes SELECTs on
 the data.

Well, you could take such a lock (one that blocks other UPDATEs but
not SELECTs) but it would be a clear loss of concurrency compared to
what we have now.  Unless I misunderstand what you're talking about,
it'd serialize all updates on a given table whether they conflict or
not.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers