Hi I've already posted this idea, but I feel that I did explain it rather badly. So here comes a new try.

## Advertising

Currently, we do not assume that either the childXids array, nor the xid cache in the proc array are sorted by ascending xid order. I believe that we could simplify the code, further reduce the locking requirements, and enabled a transaction to de-overflow it's xid cache if we assume that those arrays are in ascending xid order. ** Sortedness ** Presumably, the existing code *already* guarantees that both the childXids and the xid cache *are* in fact sorted by ascending xid order - due to the following reasons: A transaction (including subtransactions) form a tree. Each xact (or node) may have any number of children - assume that they are added "left to right" in the order of their creation. For ever (sub)xact, all subtrees but the rightmost one is "closed" (meaning that no new nodes will ever be created inside them). That last assertion follows from the fact that each non-rightmost subtree is either aborted, or subcommitted.

`Since adding a new sibling to a (sub) xact is only possible if all the other`

`subtrees are subcommitted or aborted, it will surely get a higher xid than any`

`node in any sibling subtree. Since the xids in the subcommitted siblings where`

`already added to the original (sub) xact upon subcommit, the childXids of that`

xact will have the following structure: <xids of 1. subcommited child> <xids of 2. subcommited child> ... Each of the sublists are guaranteed to include only xids larger than those inside their predecessors - which leads to a completely sorted array in the whole. The proc-array cache is surely sorted by ascending xid order, because we add entries while we generate them (until we overflow). ** Benefits *** Since we know that both the childXids and the xid cache array are sorted, removing entries from the cache becomes much easier. Since we can only abort a rightmost subtree (All others are either already aborted, or subcommitted), we know that the to-be-aborted xids are the N largest xids in our whole transactions. To remove them from the proc array we overwrite their xids with InvalidTransactionId from right-to-left, and afterwards set the new array length. This allows as to replace the ExclusiveLock with a ShareLock. Since we overwrite the xids right-to-left, a snapshot take concurrently will miss some of the right-most xids. This is exactly the same situation as if we had aborted those xids one-by-one, instead of all in one go. (Xmin calculation are completely unaffected - the toplevel xid *not* removed this way, and that one is surely smaller than any subxact xid) The rather easy xid-cache remove algorithms will also detect if a previously overflowed cache will be de-overflowed by the removal. (This needs a little bit more checking - de-overflowing the xid cache is a bit trickey because we must not race with TransactionIdIsInProgress. But that seems doable, it'll just need the right ordering of things - maybe updating the CLOG *before* updating the proc array is already sufficient even). If there is interest in doing this, I can come up with a patch. greetings, Florian Pflug ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org