On 10 March 2017 at 02:55, Robert Haas <robertmh...@gmail.com> wrote:

> Well, that's why I tried to advocate that there should be two separate
> XID limits in shared memory: leave what's there now the way it is, and
> then add a new limit for "don't try to look up XIDs before this point:
> splat".  I still think that'd be less risky.

I'm coming back to this "cold" after an extended break, so I hope I
get the details right.

TL;DR: doing it that way duplicates a bunch of stuff and is ugly
without offering significant benefit over just fixing the original.

I started out with the approach you suggested, but it turns out to be
less helpful than you'd think. Simply advancing a new lower limit
field before beginning truncation is no help; there's then a race
where the lower-limit field can be advanced after we test it but
before we actually do the SLRU read from clog. To guard against this
it's necessary for SLRU truncation to take an exclusive lock during
which it advances the lower bound. That way a concurrent reader can
take the lock in shared mode before reading the lower bound and hold
it until it finishes the clog read, knowing that vacuum cannot then
truncate the data out from under it because it'll block trying to
advance the lower limit.

A spinlock isn't suitable for this. While we can take the lock only
briefly to update the limit field in vac_truncate_clog, in
txid_status() we have to hold it from when we test the boundary
through to when we finish the SLRU clog lookup, and that lookup does
I/O and might sleep. If we release it after testing the lower bound
but before the SLRU lookup, the race comes back, since vacuum can jump
in and truncate the data out from under us. So we need a new LWLock,
used only by vac_truncate_clog, taken before it advances the clog
truncation point.

I implemented just that - a new ClogTruncateLock in the lwlocks array
and a new field in ShmemVariableCache for the lower xid bound, IIRC.
Other than requiring an extra lwlock acquisition for vac_truncate_clog
it works fine ... for the master.

But it doesn't fix the much bigger race on the standby. We only emit
WAL for xid limits after we truncate clog, and the clog truncation
record doesn't record the new limit.

So now we need a new, somewhat redundant, xlog record and redo
function for this lower clog bound pointer - which, really, just
tracks a slightly more up-to-date copy of oldestXid.

At that point I was just papering over a race that should be fixed at
its source instead: advance oldestXid before truncating clog, and
write a clog truncation record that includes the new oldestXid.

So... I can go back to the old approach and just add the new xlog
record and redo method, new lwlock, new shmemvariablecache field, etc,
if you're really concerned this approach is risky. But I'd rather fix
the original problem instead.

It might be helpful if I separate out the patch that touches oldestXid
from the rest for separate review, so I'll update here with a 2-patch
series instead of a squashed single patch soon.

 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)