"Florian G. Pflug" <[EMAIL PROTECTED]> writes:
> Why would it? The idea was to remember the largest committed xid, and that
> won't go away just because the proc array is rather empty xid-wise.

I hadn't fully absorbed this idea last night, but now that I have, I'm
starting to think it's a good one.

> (That slightly lagging of largestCommittedXid might cause some tuples not to
> be VACUUMED though, so we might want to update largestCommittedXid for
> ABORTS too, and probably rename it to largestNonRunningXid or whatever ;-) ).

LargestCompletedXid, perhaps?

> This would rid us of the rather complicated entanglement of XidGenLock and
> the ProcArrayLock, lessen the lock contention, and reduce the average snapshot
> size a bit.

In view of Simon's earlier comments at
it seems like not having to take the XidGenLock during GetSnapshotData
ought to be a pretty serious win performance-wise, and it might open up
some other possibilities that are now foreclosed by the risk of deadlock
between the two locks.

I've spent the past hour or so trying to consolidate the comments in
GetSnapshotData and related places into a single chunk of text to be
added to src/backend/access/transam/README.  Attached is what I have so
far --- this incorporates the idea of not taking ProcArrayLock to exit
an XID-less transaction, but not yet Florian's idea.  I think it'd get
simpler if we changed to that, but am posting this for comments.

                        regards, tom lane

Interlocking transaction begin, transaction end, and snapshots

We try hard to minimize the amount of overhead and lock contention involved
in the frequent activities of beginning/ending a transaction and taking a
snapshot.  Unfortunately, we must have some interlocking for this, because
it is critical that all backends agree on the commit order of transactions.
For example, suppose an UPDATE in xact A is blocked by xact B's prior
update of the same row, and xact B is doing commit while xact C gets a
snapshot.  Xact A can complete and commit as soon as B releases its locks.
If xact C's GetSnapshotData sees xact B as still running, then it had
better see xact A as still running as well, or it will be able to see two
tuple versions - one deleted by xact B and one inserted by xact A.  That
is, if A thinks it committed after B, C had better think the same.  We
enforce this by not allowing any transaction to exit the set of running
transactions while a snapshot is being taken.  (This rule is probably
stronger than necessary, but see the next problem.)  The implementation
of this is that GetSnapshotData takes the ProcArrayLock in shared mode
(thereby allowing multiple backends to take snapshots in parallel), but
xact.c must take the ProcArrayLock in exclusive mode while clearing
MyProc->xid at transaction end (either commit or abort).

Here is another variant of the risk scenario:

1. Xact A is running (in Read Committed mode).
2. Xact C's GetSnapshotData reads next transaction ID into xmax, then is
   swapped out before it can acquire ProcArrayLock.
3. Xact B gets new XID (>= C's xmax), makes changes and commits.
4. Xact A changes some row R changed by xact B and commits.
5. Xact C finishes getting its snapshot data.  It sees xact A as done,
   but sees xact B as still running (since B >= xmax).

Now C will see R changed by xact B and then xact A, *but* does not see
other changes made by xact B.  If C is supposed to be in Serializable mode,
this is wrong.

To prevent this it is necessary that GetSnapshotData acquire ProcArrayLock
before it calls ReadNewTransactionId.  This prevents xact A from exiting
the set of running transactions seen by xact C.  Therefore both A and B
will be seen as still running => no inconsistency.

In short, then, the rule is that no transactions may exit the set of
currently-running transactions between the time we fetch xmax and the time
we finish building our snapshot.  However, this restriction only applies
to transactions that have an XID --- read-only transactions can end without
acquiring ProcArrayLock, since they don't affect anyone else's snapshot.

We must also require GetNewTransactionId to store the new XID into the
shared ProcArray before releasing XidGenLock.  This ensures that when
GetSnapshotData calls ReadNewTransactionId (which also takes XidGenLock),
all active XIDs before the returned value of nextXid are already present in
the ProcArray and can't be missed by GetSnapshotData.  Unfortunately, we
can't have GetNewTransactionId take ProcArrayLock to do this, else it could
deadlock against GetSnapshotData.  Therefore, we simply let
GetNewTransactionId store into MyProc->xid without any lock.  We are
thereby relying on fetch/store of an XID to be atomic, else other backends
might see a partially-set XID.  (NOTE: for multiprocessors that need
explicit memory access fence instructions, this means that
acquiring/releasing XidGenLock is just as necessary as acquiring/releasing
ProcArrayLock for GetSnapshotData to ensure it sees up-to-date xid fields.)
This also means that readers of the ProcArray xid fields must be careful to
fetch a value only once, rather than assume they can read it multiple times
and get the same answer each time.

Another important activity that uses the shared ProcArray is GetOldestXmin,
which must determine a lower bound for the oldest xmin of any active MVCC
snapshot, system-wide.  Each individual backend advertises the smallest
xmin of its own snapshots in MyProc->xmin, or zero if it currently has no
live snapshots (eg, if it's between transactions or hasn't yet set a
snapshot for a new transaction).  GetOldestXmin takes the MIN() of the
valid xmin fields.  It does this with only shared lock on ProcArrayLock,
which means there is a potential race condition against other backends
doing GetSnapshotData concurrently: we must be certain that a concurrent
backend that is about to set its xmin does not compute an xmin less than
what GetOldestXmin returns.  We ensure that by including all the active
XIDs into the MIN() calculation, along with the valid xmins.  The rule that
transactions can't exit without taking exclusive ProcArrayLock ensures that
concurrent holders of shared ProcArrayLock will compute the same minimum of
currently-active XIDs: no xact, in particular not the oldest, can exit
while we hold shared ProcArrayLock.  So GetOldestXmin's view of the minimum
active XID will be the same as that of any concurrent GetSnapshotData, and
so it can't produce an overestimate.  If there is no active transaction at
all, GetOldestXmin returns the result of ReadNewTransactionId.  Note that
two concurrent executions of GetOldestXmin might not see the same result
from ReadNewTransactionId --- but if there is a difference, the intervening
execution(s) of GetNewTransactionId must have stored their XIDs into the
ProcArray, so the later execution of GetOldestXmin will see them and
compute the same global xmin anyway.

GetSnapshotData also performs an oldest-xmin calculation (which had better
match GetOldestXmin's) and stores that into RecentGlobalXmin, which is used
for some tuple age cutoff checks where a fresh call of GetOldestXmin seems
too expensive.  Note that while it is certain that two concurrent
executions of GetSnapshotData will compute the same xmin for their own
snapshots, as argued above, it is not certain that they will arrive at the
same estimate of RecentGlobalXmin.  This is because we allow XID-less
transactions to clear their MyProc->xmin asynchronously (without taking
ProcArrayLock), so one execution might see what had been the oldest xmin,
and another not.  This is OK since RecentGlobalXmin need only be a valid
lower bound.  As noted above, we are already assuming that fetch/store
of the xid fields is atomic, so assuming it for xmin as well is no extra

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at


Reply via email to