On 11 March 2017 at 05:09, Robert Haas <robertmh...@gmail.com> wrote:

> On the other
> hand, there really are two separate notions of the "oldest" XID.
> There's the oldest XID that we can safely look up, and then there's
> the oldest XID that we can't reuse.  These two are the same when no
> truncation is in progress, but when a truncation is in progress then
> they're different: the oldest XID that's safe to look up is the first
> one after whatever we're truncating away, but the oldest XID that we
> can't reuse is the newest one preceding the stuff that we're busy
> truncating.


My view here is that the oldest xid we cannot reuse is already guarded
by xidWrapLimit, which we advance after clog truncation. Whether this
advances at the same time as or after we advance oldestXid and
truncate clog doesn't actually matter; we must just ensure that it
never advances _before_.
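To make the invariant concrete, here is a minimal C sketch. It assumes a simplified model in which xids are 32-bit counters compared modulo 2^32 (as PostgreSQL's TransactionIdPrecedes() does) and the wrap limit sits roughly 2^31 xids past oldestXid (modeled on the calculation in SetTransactionIdLimit()). XidLimits, xid_precedes(), wrap_limit_for(), wrap_limit_is_safe() and assert_limits_safe() are hypothetical names for illustration, not backend APIs. The published wrap limit may lag the one implied by oldestXid, but must never run ahead of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;

#define FIRST_NORMAL_XID ((Xid) 3)
#define MAX_XID          ((Xid) 0xFFFFFFFF)

/* Modulo-2^32 "a is older than b", modeled on TransactionIdPrecedes(). */
bool xid_precedes(Xid a, Xid b)
{
    if (a < FIRST_NORMAL_XID || b < FIRST_NORMAL_XID)
        return a < b;
    return (int32_t) (a - b) < 0;
}

/* Wrap limit implied by a given oldestXid: about 2^31 xids ahead,
 * skipping the special xids below FIRST_NORMAL_XID. */
Xid wrap_limit_for(Xid oldest)
{
    Xid limit = oldest + (MAX_XID >> 1);

    if (limit < FIRST_NORMAL_XID)
        limit += FIRST_NORMAL_XID;
    return limit;
}

/* Hypothetical shared limits. */
typedef struct
{
    Xid oldest_xid;
    Xid xid_wrap_limit;
} XidLimits;

/* The invariant: xidWrapLimit may lag behind the limit implied by
 * oldestXid, but must never be ahead of it. */
bool wrap_limit_is_safe(const XidLimits *lim)
{
    return !xid_precedes(wrap_limit_for(lim->oldest_xid), lim->xid_wrap_limit);
}

/* What a caller might check after each step of a truncation. */
void assert_limits_safe(const XidLimits *lim)
{
    assert(wrap_limit_is_safe(lim));
}
```

Advancing oldest_xid first and the wrap limit afterwards keeps wrap_limit_is_safe() true at every step; setting the wrap limit from a not-yet-published oldestXid is the one ordering that breaks it.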

So tracking a second copy of oldestXid whose only job is to
recalculate xidWrapLimit gains us nothing. It's redundant except
during vac_truncate_clog, and during that window local state is
sufficient *if* we add oldestXid to the clog truncation xlog record,
which we must do anyway because:

No amount of locking hoop-jumping solves the problem of outdated
oldestXid information on standbys. Right now we truncate clog and
xlog the truncation before we write the new oldestXid limit to xlog.
In fact, we don't write the new xid limit to xlog until the next
checkpoint. So the standby has a long window in which its idea of
oldestXid is completely wrong, and unless we at least add the new
oldestXid to the clog truncation xlog record we can't fix that.
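A rough model of that standby-side window, assuming 8 kB clog pages holding 32768 xids each and ignoring wraparound. ClogTruncateRecord, StandbyView and the redo_* functions are hypothetical; new_oldest_xid stands in for the field proposed here, not an existing record member. Replaying only the page cutoff leaves the standby's oldestXid pointing into removed clog; carrying the xid in the same record closes the gap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;

/* 2 status bits per xid on an 8 kB page, as in clog.c with default BLCKSZ. */
#define XIDS_PER_CLOG_PAGE 32768u

/* Hypothetical clog truncation record payload. */
typedef struct
{
    uint32_t cutoff_page;    /* clog pages before this one are removed */
    Xid      new_oldest_xid; /* proposed: the oldestXid matching the cutoff */
} ClogTruncateRecord;

/* Hypothetical view of xid limits on a standby. */
typedef struct
{
    uint32_t oldest_clog_page; /* first clog page still present */
    Xid      oldest_xid;       /* the standby's idea of oldestXid */
} StandbyView;

/* True if oldestXid points into clog that has been truncated away.
 * Ignores xid wraparound for simplicity. */
bool oldest_xid_is_stale(const StandbyView *v)
{
    return v->oldest_xid / XIDS_PER_CLOG_PAGE < v->oldest_clog_page;
}

/* Replay of a record carrying only the page cutoff: clog shrinks, but
 * oldestXid is not corrected until a later checkpoint record. */
void redo_truncate_cutoff_only(StandbyView *v, const ClogTruncateRecord *rec)
{
    v->oldest_clog_page = rec->cutoff_page;
}

/* Replay with the new oldestXid in the record: adopt it, then truncate. */
void redo_truncate_with_oldest(StandbyView *v, const ClogTruncateRecord *rec)
{
    /* the record's oldestXid must lie on a page that survives */
    assert(rec->new_oldest_xid / XIDS_PER_CLOG_PAGE >= rec->cutoff_page);
    v->oldest_xid = rec->new_oldest_xid;
    v->oldest_clog_page = rec->cutoff_page;
}
```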

We only get away with this now because there's no way to look up an
arbitrary xid's status.
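Once an arbitrary xid's status can be looked up, the lookup has to be bounded by the oldest xid still in clog. A minimal sketch of such a guard, with hypothetical names throughout: xid_precedes() mirrors the modulo-2^32 comparison of TransactionIdPrecedes(), and clog_lookup_is_safe() is an illustration, not an existing function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;

#define FIRST_NORMAL_XID ((Xid) 3)

/* Modulo-2^32 "a is older than b", modeled on TransactionIdPrecedes(). */
bool xid_precedes(Xid a, Xid b)
{
    if (a < FIRST_NORMAL_XID || b < FIRST_NORMAL_XID)
        return a < b;
    return (int32_t) (a - b) < 0;
}

/* Hypothetical guard for an "is xid X committed?" lookup: clog may only
 * be consulted for xids in [oldest_clog_xid, next_xid). */
bool clog_lookup_is_safe(Xid xid, Xid oldest_clog_xid, Xid next_xid)
{
    /* the oldest xid still in clog can never be ahead of the next xid */
    assert(!xid_precedes(next_xid, oldest_clog_xid));

    if (xid_precedes(xid, oldest_clog_xid))
        return false;  /* its clog page may already be truncated away */
    if (!xid_precedes(xid, next_xid))
        return false;  /* not assigned yet */
    return true;
}
```

The guard is only as good as oldest_clog_xid: if a standby's copy lags behind the clog that has actually been removed, the first check passes for xids whose pages are gone.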

No locking scheme on the master can solve this, because the locks on
the master do not affect the standby or vice versa.

Therefore, we _must_ advance oldestXid (or a copy of it used only for
"oldest xid still in clog") before truncating clog.

If we're going to do that, we might as well make sure the standby's
xid limits are updated correctly when we truncate clog rather than
lazily at checkpoints. Advance oldestXid before truncating clog away,
and record the new xid in the clog truncation xlog record. On redo
after a master crash, and on standbys, we're then guaranteed to redo
the whole clog truncation operation (advance oldestXid, truncate clog,
advance xidWrapLimit, etc.), and everything stays consistent.
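That ordering, sketched in C under heavy simplifying assumptions: a small in-memory array stands in for WAL, clog truncation is reduced to moving a page cutoff (no wraparound), and all type and function names are hypothetical. The point is that master and redo run the same apply step from the same record, so a server replaying the log ends up with identical limits.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Xid;

#define FIRST_NORMAL_XID   ((Xid) 3)
#define MAX_XID            ((Xid) 0xFFFFFFFF)
#define XIDS_PER_CLOG_PAGE 32768u
#define WAL_CAPACITY       16

/* Hypothetical xid limits kept by each server. */
typedef struct
{
    Xid      oldest_xid;
    uint32_t oldest_clog_page;
    Xid      xid_wrap_limit;
} XidState;

/* Hypothetical clog truncation record carrying the new oldestXid. */
typedef struct
{
    Xid new_oldest_xid;
} ClogTruncateRecord;

/* A toy in-memory "WAL". */
static ClogTruncateRecord wal[WAL_CAPACITY];
static int wal_len;

Xid wrap_limit_for(Xid oldest)
{
    Xid limit = oldest + (MAX_XID >> 1);

    if (limit < FIRST_NORMAL_XID)
        limit += FIRST_NORMAL_XID;
    return limit;
}

/* Phases shared by master and redo, in the order argued for above. */
void apply_clog_truncation(XidState *s, Xid new_oldest)
{
    s->oldest_xid = new_oldest;                            /* 1. advance oldestXid */
    s->oldest_clog_page = new_oldest / XIDS_PER_CLOG_PAGE; /* 2. truncate clog */
    s->xid_wrap_limit = wrap_limit_for(new_oldest);        /* 3. advance xidWrapLimit */
}

/* Master: xlog the record first, then perform the phases. */
void master_truncate_clog(XidState *s, Xid new_oldest)
{
    assert(wal_len < WAL_CAPACITY);
    wal[wal_len++] = (ClogTruncateRecord) { new_oldest };
    apply_clog_truncation(s, new_oldest);
}

/* Crash recovery or standby: replay every record with the same phases. */
void redo_all(XidState *s)
{
    for (int i = 0; i < wal_len; i++)
        apply_clog_truncation(s, wal[i].new_oldest_xid);
}
```

Replaying the log into a copy of the initial state should reproduce the master's oldestXid, clog cutoff and xidWrapLimit exactly, with no dependence on a later checkpoint.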

I'll extract this part of the patch so it can be looked at separately;
it'll be clearer that way.

I think of it as slightly contracting, then slightly expanding, the xid
range window during clog truncation. Advance the oldest xid slightly
before the xidWrapLimit, so the range of xids is temporarily narrower
than 2^31. Xlog it first so that it's all redone on crash and on
standby. Because no lock is held throughout vac_truncate_clog, make
sure the ordering of the different phases between concurrent
vac_truncate_clog runs doesn't matter.
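A sketch of the contract-then-expand behaviour, and of one way to make phase ordering between concurrent runs irrelevant: if every phase only ever moves its limit forward, a stale value from a slower run is simply ignored. This assumes each individual step runs under its own short lock, which this single-threaded sketch does not model. The names are hypothetical, and the xid comparison is the simplified modulo-2^32 model used by TransactionIdPrecedes().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;

#define FIRST_NORMAL_XID ((Xid) 3)
#define MAX_XID          ((Xid) 0xFFFFFFFF)

bool xid_precedes(Xid a, Xid b)
{
    if (a < FIRST_NORMAL_XID || b < FIRST_NORMAL_XID)
        return a < b;
    return (int32_t) (a - b) < 0;
}

Xid wrap_limit_for(Xid oldest)
{
    Xid limit = oldest + (MAX_XID >> 1);

    if (limit < FIRST_NORMAL_XID)
        limit += FIRST_NORMAL_XID;
    return limit;
}

typedef struct
{
    Xid oldest_xid;
    Xid xid_wrap_limit;
} XidLimits;

/* Number of xids between oldestXid and xidWrapLimit (modulo 2^32). */
uint32_t window_width(const XidLimits *l)
{
    return l->xid_wrap_limit - l->oldest_xid;
}

/* First phase: only ever moves forward, so a slower concurrent run that
 * arrives with an older candidate changes nothing. */
void advance_oldest(XidLimits *l, Xid candidate)
{
    if (xid_precedes(l->oldest_xid, candidate))
        l->oldest_xid = candidate;
}

/* Final phase: recompute the wrap limit from the published oldestXid,
 * again moving forward only. */
void advance_wrap_limit(XidLimits *l)
{
    Xid candidate = wrap_limit_for(l->oldest_xid);

    if (xid_precedes(l->xid_wrap_limit, candidate))
        l->xid_wrap_limit = candidate;
    /* never ahead of the limit implied by oldestXid */
    assert(!xid_precedes(wrap_limit_for(l->oldest_xid), l->xid_wrap_limit));
}
```

Calling advance_oldest(l, 300) and then advance_oldest(l, 200) leaves oldest_xid at 300; the window is narrower until advance_wrap_limit() runs, after which it is back to its usual width (2^31 - 1 in this formula), whichever run's phases land first.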

 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)