On 11 March 2017 at 05:09, Robert Haas <robertmh...@gmail.com> wrote:
> On the other
> hand, there really are two separate notions of the "oldest" XID.
> There's the oldest XID that we can safely look up, and then there's
> the oldest XID that we can't reuse. These two are the same when no
> truncation is in progress, but when a truncation is in progress then
> they're different: the oldest XID that's safe to look up is the first
> one after whatever we're truncating away, but the oldest XID that we
> can't reuse is the newest one preceding the stuff that we're busy
> truncating.

Right.

My view here is that the oldest xid we cannot reuse is already guarded
by xidWrapLimit, which we advance after clog truncation. Whether
xidWrapLimit advances at the same time as, or after, we advance
oldestXid and truncate clog doesn't actually matter; we must just
ensure that it never advances _before_.

So tracking a second copy of oldestXid whose only purpose is to
recalculate xidWrapLimit serves no real purpose. It's redundant except
during vac_truncate_clog, during which time local state is sufficient
*if* we add oldestXid to the clog truncation xlog record, which we must
do anyway because:

Any number of locking hoop-jumping schemes fail to solve the problem of
outdated oldestXid information on standbys. Right now we truncate clog
and xlog the truncation before we write the new oldestXid limit to
xlog. In fact, we don't write the new xid limit to xlog until the next
checkpoint. So the standby has a huge window where its idea of
oldestXid is completely wrong, and unless we at least add the new
oldestXid to the clog truncation xlog record we can't fix that. We only
get away with this now because there's no way to look up an arbitrary
xid's status.

No locking scheme on the master can solve this, because the locks on
the master do not affect the standby or vice versa. Therefore, we
_must_ advance oldestXid (or a copy of it used only for "oldest xid
still in clog") before truncating clog.
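
To make the ordering argument concrete, here is a toy model in Python.
It is only a sketch under stated assumptions, not PostgreSQL code:
XidState, apply_truncation, truncate_clog and redo are hypothetical
names, xids are plain integers that never wrap, and the "wal" list
merely stands in for the xlog stream. It shows that advancing the
oldest xid first, truncating second and advancing the wrap limit last
keeps both limits safe at every step, and that a standby replaying a
truncation record that carries the new oldest xid ends up with the same
limits as the master.

```python
from dataclasses import dataclass, field

XID_WINDOW = 2**31  # xids usable at once; this toy model ignores wraparound


@dataclass
class XidState:
    oldest_xid: int = 0            # oldest xid that is safe to look up
    clog_start: int = 0            # oldest xid still physically present in clog
    wrap_limit: int = XID_WINDOW   # xids at or beyond this must not be assigned
    wal: list = field(default_factory=list)  # stand-in for the xlog stream

    def invariants_hold(self) -> bool:
        # Never advertise an xid as safe to look up once its clog is gone,
        # and never allow reuse of an xid whose clog entry still exists.
        return (self.clog_start <= self.oldest_xid
                and self.wrap_limit <= self.clog_start + XID_WINDOW)


def apply_truncation(state: XidState, new_oldest: int) -> None:
    """Contract the xid window, truncate, then expand the window again."""
    # max() keeps every limit from moving backwards (an assumption of this
    # sketch, so that replaying records in any order is harmless).
    state.oldest_xid = max(state.oldest_xid, new_oldest)    # 1. contract
    assert state.invariants_hold()
    state.clog_start = max(state.clog_start, new_oldest)    # 2. truncate
    assert state.invariants_hold()
    state.wrap_limit = max(state.wrap_limit,
                           state.clog_start + XID_WINDOW)   # 3. expand
    assert state.invariants_hold()


def truncate_clog(master: XidState, new_oldest: int) -> None:
    # The record carries the new oldest xid, so crash recovery and standbys
    # can repeat every step instead of waiting for the next checkpoint.
    master.wal.append(("clog_truncate", new_oldest))
    apply_truncation(master, new_oldest)


def redo(standby: XidState, wal: list) -> None:
    # Replay truncation records exactly as the master applied them.
    for kind, new_oldest in wal:
        if kind == "clog_truncate":
            apply_truncation(standby, new_oldest)
```

Calling truncate_clog(master, 1000) on a fresh XidState and then
redo(standby, master.wal) on another leaves both with the same
oldest_xid and wrap_limit. Moving wrap_limit forward before the
truncation step makes invariants_hold() return False, which is the
"never advances before" rule in miniature.
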
If we're going to do that we might as well just make sure the standby's
xid limits are updated correctly when we truncate clog rather than
doing it lazily at checkpoints.

Advance oldestXid before truncating clog away, and record the new xid
in the clog truncation xlog record. On redo after master crash, and on
standbys, we're guaranteed to re-do the whole clog truncation operation
- advance oldestXid, truncate clog, advance xidWrapLimit etc - and
everything stays consistent.

I'll extract this part of the patch so it can be looked at separately;
it'll be clearer that way.

I think of it as slightly contracting then slightly expanding the xid
range window during clog truncation. Advance the oldest xid slightly
before the xidWrapLimit, so temporarily the range of xids is narrower
than 2^31. xlog it first so we ensure it's all redone on crash and on
standby. Because no lock is held throughout all of vac_truncate_clog,
make sure the ordering of the different phases between concurrent
vac_truncate_clog runs doesn't matter.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services