Re: [HACKERS] Low hanging fruit in lazy-XID-assignment patch?

2008-03-12 Thread Bruce Momjian

Added to TODO:

* Expire published xmin for read-only and idle transactions

  http://archives.postgresql.org/pgsql-hackers/2007-09/msg00343.php


---

Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  As a fallout of this work that I haven't seen made explicit, a session
  opening a transaction and then sitting around doing nothing will not
  cause as many problems as it used to -- for example it won't cause
  VACUUM to be unable to clean up dead rows.  Is this correct?
 
 Yeah, if you just issue BEGIN and then sit, you won't have acquired
 either an xid or an xmin, so you don't create a VACUUM problem anymore.
 
 If you issue BEGIN, then SELECT, then sit, you'll be publishing an xmin
 but not an xid, so at that point you become a problem for VACUUM.
 However, internally you don't have any live snapshots (if you're in READ
 COMMITTED mode), so eventually we could have you stop publishing an xmin
 too.  That's something for 8.4 though.
 
   regards, tom lane
 
 ---(end of broadcast)---
 TIP 4: Have you searched our list archives?
 
http://archives.postgresql.org

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Low hanging fruit in lazy-XID-assignment patch?

2007-09-08 Thread Alvaro Herrera

As a fallout of this work that I haven't seen made explicit, a session
opening a transaction and then sitting around doing nothing will not
cause as many problems as it used to -- for example it won't cause
VACUUM to be unable to clean up dead rows.  Is this correct?

Nowadays this is no longer much of a problem, but it used to be, back
when drivers were more broken than they are now (when they had the
commit method send COMMIT; BEGIN); however, it still seems to me like
a big step forwards.

-- 
Alvaro Herrera  http://www.amazon.com/gp/registry/5ZYLFMCVHXC
Siempre hay que alimentar a los dioses, aunque la tierra esté seca (Orual)

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Low hanging fruit in lazy-XID-assignment patch?

2007-09-08 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 As a fallout of this work that I haven't seen made explicit, a session
 opening a transaction and then sitting around doing nothing will not
 cause as many problems as it used to -- for example it won't cause
 VACUUM to be unable to clean up dead rows.  Is this correct?

Yeah, if you just issue BEGIN and then sit, you won't have acquired
either an xid or an xmin, so you don't create a VACUUM problem anymore.

If you issue BEGIN, then SELECT, then sit, you'll be publishing an xmin
but not an xid, so at that point you become a problem for VACUUM.
However, internally you don't have any live snapshots (if you're in READ
COMMITTED mode), so eventually we could have you stop publishing an xmin
too.  That's something for 8.4 though.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Low hanging fruit in lazy-XID-assignment patch?

2007-09-08 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

 If you issue BEGIN, then SELECT, then sit, you'll be publishing an xmin
 but not an xid, so at that point you become a problem for VACUUM.
 However, internally you don't have any live snapshots (if you're in READ
 COMMITTED mode), so eventually we could have you stop publishing an xmin
 too.  That's something for 8.4 though.

Aren't there some things that depend on the idea that even READ COMMITTED
transactions still have a serializable snapshot lying around for them to use?

This is actually one of the rough spots in HOT that I'm afraid you'll have
problems with when you review it. If there were any HOT chains which are
broken according to a new index definition then a any transaction considering
using that index needs to know whether there's any possibility the plan will
be used with a snapshot which can see those old tuples. It currently does this
by checking if the transaction which created the index is in its serializable
snapshot.

It seems weird to have the planner using snapshots in any way, but I don't see
any direct problems which arise. Essentially it's using the serializable
snapshot as a proxy for better live snapshot bookkeeping. If there's no
serializable snapshot then any future snapshot taken will surely not be able
to see the old tuples, and if there is then it should be fine (though not as
good as possible if we did more bookkeeping) to use it to determine whether
the old tuples could possibly be visible when the plan is executed.

I'm a bit afraid the plan will stay cached longer than would be ideal. If the
plan doesn't use the new index then ideally we would want to invalidate it
from the cache at the end of this transaction I suppose.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Low hanging fruit in lazy-XID-assignment patch?

2007-09-08 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 If you issue BEGIN, then SELECT, then sit, you'll be publishing an xmin
 but not an xid, so at that point you become a problem for VACUUM.
 However, internally you don't have any live snapshots (if you're in READ
 COMMITTED mode), so eventually we could have you stop publishing an xmin
 too.  That's something for 8.4 though.

 Aren't there some things that depend on the idea that even READ COMMITTED
 transactions still have a serializable snapshot lying around for them to use?

I don't see why.  A READ COMMITTED transaction that is between
statements does still have a serializable snapshot sitting around, but
it won't ever reference it again.  (AFAIR you can't switch an active
transaction into serializable mode.)  So with a bit of work on snapshot
management we could recognize that we have no live snapshots and clear
the published xmin.  This has been discussed before, eg this thread:
http://archives.postgresql.org/pgsql-patches/2007-03/msg00381.php

 This is actually one of the rough spots in HOT that I'm afraid you'll have
 problems with when you review it. If there were any HOT chains which are
 broken according to a new index definition then a any transaction considering
 using that index needs to know whether there's any possibility the plan will
 be used with a snapshot which can see those old tuples. It currently does this
 by checking if the transaction which created the index is in its serializable
 snapshot.

This seems to be a proxy for what's the oldest live snapshot in this
backend, which is exactly what we'd have to track to adjust xmin
anyway.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Low hanging fruit in lazy-XID-assignment patch?

2007-09-08 Thread Josh Berkus
Greg,

 Aren't there some things that depend on the idea that even READ COMMITTED
 transactions still have a serializable snapshot lying around for them to
 use?

No, that would be REPEATABLE READ.  That's one of the areas where we need to 
test HOT; does RR and SERIALIZABLE still work correctly with HOT?

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Low hanging fruit in lazy-XID-assignment patch?

2007-09-08 Thread Gregory Stark

Josh Berkus [EMAIL PROTECTED] writes:

 No, that would be REPEATABLE READ.  That's one of the areas where we need to 
 test HOT; does RR and SERIALIZABLE still work correctly with HOT?

I'm a little confused by your question. REPEATABLE READ is an isolation level
we don't directly support in Postgres. If you set the isolation level to
REPEATABLE READ you get SERIALIZABLE which is a higher level.

HOT works fine with SERIALIZABLE, It uses the same criteria as vacuum for
determining when a tuple is dead, namely it compares against RecentGlobalXmin.

The check for whether a transaction can see any old tuples in broken chains
uses the serializable snapshot as a conservative proxy for the oldest snapshot
which might be in use. That will work for both serializable and read
committed.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Low hanging fruit in lazy-XID-assignment patch?

2007-09-07 Thread Florian G. Pflug

Simon Riggs wrote:

On Fri, 2007-09-07 at 06:36 +0200, Florian G. Pflug wrote:

Tom Lane wrote:

Florian G. Pflug [EMAIL PROTECTED] writes:
- I actually think with just a little bit of more work, we

can go even further, and get rid of the ReadNewTransactionId() call
completely during snapshotting.

[ squint... ]  This goes a bit far for me.  In particular, I think this
will fail in the edge case when there are no live XIDs visible in
ProcArray.  You cannot go back and do ReadNewTransactionId afterward,
at least not without re-scanning the ProcArray a second time, which
makes it at best a questionable win.

Why would it?


I think the additional suggestion goes a bit too far. You may be right,
but I don't want to change the transaction system in advanced ways this
close to the next release. We may have difficulty spotting bugs in that
thinking during beta.


Ok, those were two clear votes against doing this, so I'll stop
arguing ;-). I do think that we should have another look at this when
8.4 opens, though.

greetings, Florian Pflug


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Low hanging fruit in lazy-XID-assignment patch?

2007-09-07 Thread Simon Riggs
On Fri, 2007-09-07 at 06:36 +0200, Florian G. Pflug wrote:
 Tom Lane wrote:
  Florian G. Pflug [EMAIL PROTECTED] writes:
  So I believe you're right, and we can skip taking the lock in the no
  xid case 

Sounds good.

 - I actually think with just a little bit of more work, we
  can go even further, and get rid of the ReadNewTransactionId() call
  completely during snapshotting.
  
  [ squint... ]  This goes a bit far for me.  In particular, I think this
  will fail in the edge case when there are no live XIDs visible in
  ProcArray.  You cannot go back and do ReadNewTransactionId afterward,
  at least not without re-scanning the ProcArray a second time, which
  makes it at best a questionable win.
 
 Why would it?

I think the additional suggestion goes a bit too far. You may be right,
but I don't want to change the transaction system in advanced ways this
close to the next release. We may have difficulty spotting bugs in that
thinking during beta.

lazy XID assignment will reduce the number of times
GetNextTransactionId() is called and if we also avoid taking the
ProcArrayLock for CommitTransaction() then we will have significantly
reduced contention for mixed workloads (i.e. most real ones).

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Low hanging fruit in lazy-XID-assignment patch?

2007-09-07 Thread Tom Lane
Florian G. Pflug [EMAIL PROTECTED] writes:
 Why would it? The idea was to remember the largest committed xid, and that
 won't go away just because the proc array is rather empty xid-wise.

I hadn't fully absorbed this idea last night, but now that I have, I'm
starting to think it's a good one.

 (That slightly lagging of largestCommittedXid might cause some tuples not to
 be VACUUMED though, so we might want to update largestCommittedXid for
 ABORTS too, and probably rename it to largestNonRunningXid or whatever ;-) ).

LargestCompletedXid, perhaps?

 This would rid us of the rather complicated entanglement of XidGenLock and
 the ProcArrayLock, lessen the lock contention, and reduce the average snapshot
 size a bit.

In view of Simon's earlier comments at
http://archives.postgresql.org/pgsql-hackers/2007-07/msg00948.php
it seems like not having to take the XidGenLock during GetSnapshotData
ought to be a pretty serious win performance-wise, and it might open up
some other possibilities that are now foreclosed by the risk of deadlock
between the two locks.

I've spent the past hour or so trying to consolidate the comments in
GetSnapshotData and related places into a single chunk of text to be
added to src/backend/access/transam/README.  Attached is what I have so
far --- this incorporates the idea of not taking ProcArrayLock to exit
an XID-less transaction, but not yet Florian's idea.  I think it'd get
simpler if we changed to that, but am posting this for comments.

regards, tom lane


Interlocking transaction begin, transaction end, and snapshots
--

We try hard to minimize the amount of overhead and lock contention involved
in the frequent activities of beginning/ending a transaction and taking a
snapshot.  Unfortunately, we must have some interlocking for this, because
it is critical that all backends agree on the commit order of transactions.
For example, suppose an UPDATE in xact A is blocked by xact B's prior
update of the same row, and xact B is doing commit while xact C gets a
snapshot.  Xact A can complete and commit as soon as B releases its locks.
If xact C's GetSnapshotData sees xact B as still running, then it had
better see xact A as still running as well, or it will be able to see two
tuple versions - one deleted by xact B and one inserted by xact A.  That
is, if A thinks it committed after B, C had better think the same.  We
enforce this by not allowing any transaction to exit the set of running
transactions while a snapshot is being taken.  (This rule is probably
stronger than necessary, but see the next problem.)  The implementation
of this is that GetSnapshotData takes the ProcArrayLock in shared mode
(thereby allowing multiple backends to take snapshots in parallel), but
xact.c must take the ProcArrayLock in exclusive mode while clearing
MyProc-xid at transaction end (either commit or abort).

Here is another variant of the risk scenario:

1. Xact A is running (in Read Committed mode).
2. Xact C's GetSnapshotData reads next transaction ID into xmax, then is
   swapped out before it can acquire ProcArrayLock.
3. Xact B gets new XID (= C's xmax), makes changes and commits.
4. Xact A changes some row R changed by xact B and commits.
5. Xact C finishes getting its snapshot data.  It sees xact A as done,
   but sees xact B as still running (since B = xmax).

Now C will see R changed by xact B and then xact A, *but* does not see
other changes made by xact B.  If C is supposed to be in Serializable mode,
this is wrong.

To prevent this it is necessary that GetSnapshotData acquire ProcArrayLock
before it calls ReadNewTransactionId.  This prevents xact A from exiting
the set of running transactions seen by xact C.  Therefore both A and B
will be seen as still running = no inconsistency.

In short, then, the rule is that no transactions may exit the set of
currently-running transactions between the time we fetch xmax and the time
we finish building our snapshot.  However, this restriction only applies
to transactions that have an XID --- read-only transactions can end without
acquiring ProcArrayLock, since they don't affect anyone else's snapshot.

We must also require GetNewTransactionId to store the new XID into the
shared ProcArray before releasing XidGenLock.  This ensures that when
GetSnapshotData calls ReadNewTransactionId (which also takes XidGenLock),
all active XIDs before the returned value of nextXid are already present in
the ProcArray and can't be missed by GetSnapshotData.  Unfortunately, we
can't have GetNewTransactionId take ProcArrayLock to do this, else it could
deadlock against GetSnapshotData.  Therefore, we simply let
GetNewTransactionId store into MyProc-xid without any lock.  We are
thereby relying on fetch/store of an XID to be atomic, else other backends
might see a partially-set XID.  (NOTE: for multiprocessors that need
explicit memory access fence instructions, this means that

Re: [HACKERS] Low hanging fruit in lazy-XID-assignment patch?

2007-09-07 Thread Florian G. Pflug

Tom Lane wrote:

I've spent the past hour or so trying to consolidate the comments in
GetSnapshotData and related places into a single chunk of text to be
added to src/backend/access/transam/README.  Attached is what I have so
far --- this incorporates the idea of not taking ProcArrayLock to exit
an XID-less transaction, but not yet Florian's idea.  I think it'd get
simpler if we changed to that, but am posting this for comments.



Interlocking transaction begin, transaction end, and snapshots
--

We try hard to minimize the amount of overhead and lock contention involved
in the frequent activities of beginning/ending a transaction and taking a
snapshot.  Unfortunately, we must have some interlocking for this, because
it is critical that all backends agree on the commit order of transactions.

This is actually still a slightly stronger requirement than what we really
need I think.

To simplify the following discussion, A - B shall mean that transaction A saw
B as committed. Conversely, A ! B shall mean that A treats B as in-progress.
If A was in read-committed mode, the visibility refers to the latest snapshot
that A used.

Now assume A and B commit at nearly the same time, and for two other
transactions C and D the following holds:
C - A, C ! B but
D ! A, D - B.

This would violate the requirement that the commit order is globally
agreed upon, yet as long as both A ! B and B ! A holds, there is no
conflict. (Note that if A and B are serializable, A ! B  B ! A
implies that A and B cannot have touched the same record and have
both committed - one would have been aborted due to a
SerializationError).

I must admit, though, that this is a quite academic case, since the
prerequisite A ! B and B ! A is something we have no control over
for read-committed transactions - who knows when they might have taken
their last snapshot...

Still, I wanted to mention this because I believe that the minimal
requirement that we actually *need* to enforce is
A - B and B - C imply A - C.  (T1)

The actual implementation will probably always have to enforce
something slightly stronger, but it's still nice to know the
minimal guarantee needed to be able to judge correctness.


For example, suppose an UPDATE in xact A is blocked by xact B's prior
update of the same row, and xact B is doing commit while xact C gets a
snapshot.  Xact A can complete and commit as soon as B releases its locks.
If xact C's GetSnapshotData sees xact B as still running, then it had
better see xact A as still running as well, or it will be able to see two
tuple versions - one deleted by xact B and one inserted by xact A. 

In my notation this becomes: A - B and C ! B implies C ! A.

This then follows from (T1) - Assume that A - B, C ! B but C - A,
then with (A) C - B follows from C - A and A - B, which contradicts
C ! B.


We
enforce this by not allowing any transaction to exit the set of running
transactions while a snapshot is being taken.  (This rule is probably
stronger than necessary, but see the next problem.)  The implementation
of this is that GetSnapshotData takes the ProcArrayLock in shared mode
(thereby allowing multiple backends to take snapshots in parallel), but
xact.c must take the ProcArrayLock in exclusive mode while clearing
MyProc-xid at transaction end (either commit or abort).

Agreed. We *do* enforce a strict global ordering of committs and snapshots.
This then guarantees (T1) because if A - B and B - C, than A *must*
have taken it's snapshot after B committed, and B in turn *must* have
taken it's snapshot after C committed, so surely A - C will hold too.


Here is another variant of the risk scenario:

1. Xact A is running (in Read Committed mode).
2. Xact C's GetSnapshotData reads next transaction ID into xmax, then is
   swapped out before it can acquire ProcArrayLock.
3. Xact B gets new XID (= C's xmax), makes changes and commits.
4. Xact A changes some row R changed by xact B and commits.
5. Xact C finishes getting its snapshot data.  It sees xact A as done,
   but sees xact B as still running (since B = xmax).

Now C will see R changed by xact B and then xact A, *but* does not see
other changes made by xact B.  If C is supposed to be in Serializable mode,
this is wrong.

I never really grasped why we need to assume serializable here - this seems
wrong if C is read-committed too. Seeing only half of a transaction's changes
can never be right, can it?


To prevent this it is necessary that GetSnapshotData acquire ProcArrayLock
before it calls ReadNewTransactionId.  This prevents xact A from exiting
the set of running transactions seen by xact C.  Therefore both A and B
will be seen as still running = no inconsistency.

Another point of view is that determining the xmax of a snapshot really *is*
part of taking the snapshot. Since we already obtained that we need to
serialize snapshotting and committing, it follows that we must not allow
committs to happen between the 

Re: [HACKERS] Low hanging fruit in lazy-XID-assignment patch?

2007-09-07 Thread Tom Lane
Florian G. Pflug [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Here is another variant of the risk scenario:
 
 1. Xact A is running (in Read Committed mode).
 2. Xact C's GetSnapshotData reads next transaction ID into xmax, then is
 swapped out before it can acquire ProcArrayLock.
 3. Xact B gets new XID (= C's xmax), makes changes and commits.
 4. Xact A changes some row R changed by xact B and commits.
 5. Xact C finishes getting its snapshot data.  It sees xact A as done,
 but sees xact B as still running (since B = xmax).
 
 Now C will see R changed by xact B and then xact A, *but* does not see
 other changes made by xact B.  If C is supposed to be in Serializable mode,
 this is wrong.

 I never really grasped why we need to assume serializable here - this seems
 wrong if C is read-committed too. Seeing only half of a transaction's changes
 can never be right, can it?

You'd be surprised ;-) ... Read Committed is not totally consistent,
even considering a single statement using a single snapshot, because of
the rules about UPDATEs starting from the latest committed version.
But let's not get into that today.

Anyway, one thing I realized while making this text is that the example
cases mentioned in CommitTransaction and GetSnapshotData really were
basically the same case.  The difference is that in this case, the
failure to see B as committed is closely related to the way that xmax is
computed.  We could get rid of this example if we switched over to your
LatestCompletedXid proposal.

 I had a rather hard time understanding these things initially - at least
 for me, the enlightenment came when I realized that most of the locking
 in there actually ensures that (T1) holds - or in other words, that
 the relation A sees B as committed *has* to be a transitive one.
 So I'd like to see this reasoning in that README - though maybe I'm the
 only one to whom this is the clearest wording...

I'll put some of this into the README.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Low hanging fruit in lazy-XID-assignment patch?

2007-09-07 Thread Tom Lane
Here's some revised text for the README file, based on using Florian's
idea of a global latestCompletedXid variable.  As I worked through it
I realized that in this design, XidGenLock gates entry of new XIDs into
the ProcArray while ProcArrayLock gates their removal.  Which is an
interesting sort of symmetry property.  It also turns out that the
reason we need to gate entry with XidGenLock is to keep from breaking
GetOldestXmin, rather than to ensure correctness of snapshots per se.

(Note: I refer in the text to ProcArrayEndTransaction(), which is a
function I'm thinking of putting into procarray.c to replace the current
inline-in-xact.c code that clears xid and related fields.)

Comments?

regards, tom lane


Interlocking transaction begin, transaction end, and snapshots
--

We try hard to minimize the amount of overhead and lock contention involved
in the frequent activities of beginning/ending a transaction and taking a
snapshot.  Unfortunately, we must have some interlocking for this, because
we must ensure consistency about the commit order of transactions.
For example, suppose an UPDATE in xact A is blocked by xact B's prior
update of the same row, and xact B is doing commit while xact C gets a
snapshot.  Xact A can complete and commit as soon as B releases its locks.
If xact C's GetSnapshotData sees xact B as still running, then it had
better see xact A as still running as well, or it will be able to see two
tuple versions - one deleted by xact B and one inserted by xact A.  Another
reason why this would be bad is that C would see (in the row inserted by A)
earlier changes by B, and it would be inconsistent for C not to see any
of B's changes elsewhere in the database.

Formally, the correctness requirement is if a snapshot A considers
transaction X as committed, and any of transaction X's snapshots considered
transaction Y as committed, then snapshot A must consider transaction Y as
committed.

What we actually enforce is strict serialization of commits and rollbacks
with snapshot-taking: we do not allow any transaction to exit the set of
running transactions while a snapshot is being taken.  (This rule is
stronger than necessary for consistency, but is relatively simple to
enforce, and it assists with some other issues as explained below.)  The
implementation of this is that GetSnapshotData takes the ProcArrayLock in
shared mode (so that multiple backends can take snapshots in parallel),
but ProcArrayEndTransaction must take the ProcArrayLock in exclusive mode
while clearing MyProc-xid at transaction end (either commit or abort).

ProcArrayEndTransaction also holds the lock while advancing the shared
latestCompletedXid variable.  This allows GetSnapshotData to use
latestCompletedXid + 1 as xmax for its snapshot: there can be no
transaction = this xid value that the snapshot needs to consider as
completed.

In short, then, the rule is that no transaction may exit the set of
currently-running transactions between the time we fetch latestCompletedXid
and the time we finish building our snapshot.  However, this restriction
only applies to transactions that have an XID --- read-only transactions
can end without acquiring ProcArrayLock, since they don't affect anyone
else's snapshot nor latestCompletedXid.

Transaction start, per se, doesn't have any interlocking with these
considerations, since we no longer assign an XID immediately at transaction
start.  But when we do decide to allocate an XID, GetNewTransactionId must
store the new XID into the shared ProcArray before releasing XidGenLock.
This ensures that all top-level XIDs = latestCompletedXid are either
present in the ProcArray, or not running anymore.  (This guarantee doesn't
apply to subtransaction XIDs, because of the possibility that there's not
room for them in the subxid array; instead we guarantee that they are
present or the overflow flag is set.)  If a backend released XidGenLock
before storing its XID into MyProc, then it would be possible for another
backend to allocate and commit a later XID, causing latestCompletedXid to
pass the first backend's XID, before that value became visible in the
ProcArray.  That would break GetOldestXmin, as discussed below.

We allow GetNewTransactionId to store the XID into MyProc-xid (or the
subxid array) without taking ProcArrayLock.  This was once necessary to
avoid deadlock; while that is no longer the case, it's still beneficial for
performance.  We are thereby relying on fetch/store of an XID to be atomic,
else other backends might see a partially-set XID.  This also means that
readers of the ProcArray xid fields must be careful to fetch a value only
once, rather than assume they can read it multiple times and get the same
answer each time.

Another important activity that uses the shared ProcArray is GetOldestXmin,
which must determine a lower bound for the oldest xmin of any active MVCC
snapshot, system-wide.  

Re: [HACKERS] Low hanging fruit in lazy-XID-assignment patch?

2007-09-07 Thread Florian G. Pflug

Tom Lane wrote:
Here's some revised text for the README file, based on using Florian's idea 
of a global latestCompletedXid variable.  As I worked through it I realized 
that in this design, XidGenLock gates entry of new XIDs into the ProcArray 
while ProcArrayLock gates their removal.  Which is an interesting sort of 
symmetry property.  It also turns out that the reason we need to gate entry 
with XidGenLock is to keep from breaking GetOldestXmin, rather than to ensure

correctness of snapshots per se.

I believe it would break both, no? If an xid = latestCompletedXid is
not included in the snapshot, but later used for updates, the snapshot
will see those changes as committed when they really are not.

But other than that, it really sounds fine. It certainly explains things much
better than the comments in the existing code.

I noticed two rather cosmetic issues
.) latestCompletedXid sounds as it might refer to the *last* completed xid,
but it actually refers to the largest / highest completed xid. So maybe we
should call it highestCompletedXid or largestCompletedXid.

.) Since you mention that we assume reading and writing int4s are atomic
operations, maybe we should mention that for safety's sake we mark the
corresponding pointers with volatile?

greetings, Florian Pflug



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Low hanging fruit in lazy-XID-assignment patch?

2007-09-07 Thread Tom Lane
Florian G. Pflug [EMAIL PROTECTED] writes:
 I noticed two rather cosmetic issues
 .) latestCompletedXid sounds as it might refer to the *last* completed xid,
 but it actually refers to the largest / highest completed xid. So maybe we
 should call it highestCompletedXid or largestCompletedXid.

Actually that was an intentional choice: because of the wraparound
behavior of XIDs, the latest value is not necessarily numerically
largest.  I'm not wedded to it though.

 .) Since you mention that we assume reading and writing int4s are atomic
 operations, maybe we should mention that for safety's sake we mark the
 corresponding pointers with volatile?

Couldn't hurt.

I have a draft patch that I'll post shortly.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Low hanging fruit in lazy-XID-assignment patch?

2007-09-06 Thread Tom Lane
Florian G. Pflug [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 The lazy-XID patch, as committed, doesn't help that situation at all,

 I think the comment is correct in principle - If we remove the oldest
 xmin without locking, then two concurrent OldestXmin calculations
 will get two different results. The question is if that has any
 negative effects, though.

My point is that there's a difference between what you compute (and
publish) as your own xmin, and what you compute as the RecentGlobalXmin.
I don't think there's any need for a guarantee that two concurrent
processes get the same estimate of RecentGlobalXmin, as long as
they do not get an estimate less than reality, ie, that someone cannot
later compute and publish a smaller xmin.

There are reasons why we want two concurrent GetSnapshotDatas to compute
the same xmin, but I think in the end it just comes down to being a
prerequisite for the above constraint --- without that you're not sure
that someone might not be about to publish an xmin less than what you
obtained as RecentGlobalXmin.

Dropping a live xid is a whole different issue.  There, you have the
problem that you need everyone to see a consistent commit order, which
is what the example in GetSnapshotData is about.  But I don't think that
xmin enters into that.  xmin is only about is it safe to drop this
tuple because no one can see it?.  There, we don't have to be exactly
correct, we only have to err in the conservative direction.

 It was this comment in GetSnapshotData that made me keep the locking
 in the first place:

   * It is sufficient to get shared lock on ProcArrayLock, even if we are
   * computing a serializable snapshot and therefore will be setting
   * MyProc-xmin. This is because any two backends that have overlapping
   * shared holds on ProcArrayLock will certainly compute the same xmin

If I recall correctly, that text was written to justify downgrading
GetSnapshotData's hold on ProcArrayLock from exclusive to shared --- it
was merely arguing that the results wouldn't change if we did that.
I don't see an argument there that this condition is really *necessary*.
We do have to think carefully about whether GetOldestXmin can compute a
value that's too large, but right at the moment I see no problem there.

 So I believe you're right, and we can skip taking the lock in the no
 xid case - I actually think with just a little bit of more work, we
 can go even further, and get rid of the ReadNewTransactionId() call
 completely during snapshotting.

[ squint... ]  This goes a bit far for me.  In particular, I think this
will fail in the edge case when there are no live XIDs visible in
ProcArray.  You cannot go back and do ReadNewTransactionId afterward,
at least not without re-scanning the ProcArray a second time, which
makes it at best a questionable win.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Low hanging fruit in lazy-XID-assignment patch?

2007-09-06 Thread Florian G. Pflug

Tom Lane wrote:

Florian G. Pflug [EMAIL PROTECTED] writes:

So I believe you're right, and we can skip taking the lock in the no
xid case - I actually think with just a little bit of more work, we
can go even further, and get rid of the ReadNewTransactionId() call
completely during snapshotting.


[ squint... ]  This goes a bit far for me.  In particular, I think this
will fail in the edge case when there are no live XIDs visible in
ProcArray.  You cannot go back and do ReadNewTransactionId afterward,
at least not without re-scanning the ProcArray a second time, which
makes it at best a questionable win.


Why would it? The idea was to remember the largest committed xid, and that
won't go away just because the proc array is rather empty xid-wise. Actually,
in that case the largest comitted xid+1 will (nearly) be what
ReadNewTransactionId() returns. (Nearly because the transaction with the xid
ReadNewTransactionId()-1 might have aborted, so largestCommittedXid might be
a bit further behind ReadNewTransactionId().)

(That slightly lagging of largestCommittedXid might cause some tuples not to
be VACUUMED though, so we might want to update largestCommittedXid for
ABORTS too, and probably rename it to largestNonRunningXid or whatever ;-) ).

I would go as far as saying that largestCommittedXid+1 is the natural choice
for xmax - after all, xmax is the cutoff point after which a xid *cannot*
be seen as committed, and largestCommittedXid+1 is the smallest xmax that
guarantees that we see xacts committed before the snapshot as committed.

The xmin computation won't change - apart from using some other initial value.

This would rid us of the rather complicated entanglement of XidGenLock and
the ProcArrayLock, lessen the lock contention, and reduce the average snapshot
size a bit.

greetings, Florian Pflug

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] Low hanging fruit in lazy-XID-assignment patch?

2007-09-06 Thread Florian G. Pflug

Tom Lane wrote:

Simon was complaining a bit ago that we still have problems with
excessive contention for the ProcArrayLock, and that much of this stems
from the need for transaction exit to take that lock exclusively.
The lazy-XID patch, as committed, doesn't help that situation at all,
saying

/*
 * Lock ProcArrayLock because that's what GetSnapshotData uses.
 * You might assume that we can skip this step if we had no
 * transaction id assigned, because the failure case outlined
 * in GetSnapshotData cannot happen in that case. This is true,
 * but we *still* need the lock guarantee that two concurrent
 * computations of the *oldest* xmin will get the same result.
 */


I think the comment is correct in principle - If we remove the oldest
xmin without locking, then two concurrent OldestXmin calculations
will get two different results. The question is if that has any
negative effects, though.


That leaves xmin, which AFAICS is
only interesting for the computations of GetOldestXmin() and
RecentGlobalXmin.  And I assert it doesn't matter if those numbers
advance asynchronously, so long as they never go backward.

Yes, the xmin is surely the only field that might need need the locking.

It was this comment in GetSnapshotData that made me keep the locking
in the first place:

 * It is sufficient to get shared lock on ProcArrayLock, even if we are
 * computing a serializable snapshot and therefore will be setting
 * MyProc-xmin. This is because any two backends that have overlapping
 * shared holds on ProcArrayLock will certainly compute the same xmin
 * (since no xact, in particular not the oldest, can exit the set of
 * running transactions while we hold ProcArrayLock --- see further
 * discussion just below). So it doesn't matter whether another backend
 * concurrently doing GetSnapshotData or GetOldestXmin sees our xmin as
 * set or not; he'd compute the same xmin for himself either way.
 * (We are assuming here that xmin can be set and read atomically,
 * just like xid.)

But now that I read this again, I think that comment is just missleading -
especially the part So it doesn't matter whether another backend concurrently
doing GetSnapshotData or GetOldestXmin sees our xmin as set or not; he'd compute
the same xmin for himself either way.
This sounds as if the Proc-xmin that *one* backend announces had
influence over the Proc-xmin that *another* backend might compute.
Which isn't true - it only influences the GlobalXmin that another backend might
compute.

So I believe you're right, and we can skip taking the lock in the no xid case -
I actually think with just a little bit of more work, we can go even further,
and get rid of the ReadNewTransactionId() call completely during snapshotting.

There are two things we must ensure when I comes to snapshots, commits and
xid assignment.

1) A transaction must either be not in progress, be in our snapshot, or
   have an xid = xmax.
2) If transaction A sees B as committed, and B sees C as committed, then
   A must see C as committed.

ad 1): We guarantee that by storing the xid in the proc array before releasing
   the XidGenLock. Therefore, when we later obtain our xmax value,
   we can be sure that we see all xacts in the proc array that have an
   xid  xmax and are in progress.

ad 2): We guarantee that by serializing snapshotting against committing. Since
   we use ReadNewTransactionId() as the snapshot's xmax this implies that
   we take the ProcArrayLock *before* reading our xmax value.

Now, ReadNewTransactionId() is actually larger than necessary as a xmax.
The minimal xmax that we can set is largest committed xid+1. We can
easily track that value during commit when we hold the ProcArrayLock
(If we have no xid, and therefor don't have to hold the lock, we also
don't need to update that value).

If we used this LatestCommittedXid as xmax, we'd still guarantee (2),
but without having to hold the XidGenLock during GetSnapshotData().

I wouldn't have dared to suggest this for 8.3, but since you came up with
locking improvements in the first place... ;-)

greetings, Florian Pflug


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate