Hi, On 2020-04-01 11:15:14 -0400, Robert Haas wrote: > On Wed, Apr 1, 2020 at 2:40 AM Andres Freund <and...@anarazel.de> wrote: > > I added some debug output to print the mapping before/after changes by > > MaintainOldSnapshotTimeMapping() (note that I used timestamps relative > > to the server start in minutes/seconds to make it easier to interpret). > > > > And the output turns out to be something like: > > > > WARNING: old snapshot mapping at "before update" with head ts: 7, current > > entries: 8 max entries: 15, offset: 0 > > entry 0 (ring 0): min 7: xid 582921233 > > entry 1 (ring 1): min 8: xid 654154155 > > entry 2 (ring 2): min 9: xid 661972949 > > entry 3 (ring 3): min 10: xid 666899382 > > entry 4 (ring 4): min 11: xid 644169619 > > entry 5 (ring 5): min 12: xid 644169619 > > entry 6 (ring 6): min 13: xid 644169619 > > entry 7 (ring 7): min 14: xid 644169619 > > > > WARNING: head 420 s: updating existing bucket 4 for sec 660 with xmin > > 666899382 > > > > WARNING: old snapshot mapping at "after update" with head ts: 7, current > > entries: 8 max entries: 15, offset: 0 > > entry 0 (ring 0): min 7: xid 582921233 > > entry 1 (ring 1): min 8: xid 654154155 > > entry 2 (ring 2): min 9: xid 661972949 > > entry 3 (ring 3): min 10: xid 666899382 > > entry 4 (ring 4): min 11: xid 666899382 > > entry 5 (ring 5): min 12: xid 644169619 > > entry 6 (ring 6): min 13: xid 644169619 > > entry 7 (ring 7): min 14: xid 644169619 > > > > It's pretty obvious that the xids don't make a ton of sense, I think: > > They're not monotonically ordered. The same values exist multiple times, > > despite xids being constantly used. Also, despite the ringbuffer > > supposedly having 15 entries (that's snapshot_too_old = 5min + the 10 we > > always add), and the workload having run for 14min, we only have 8 > > entries. > > The function header comment for MaintainOldSnapshotTimeMapping could > hardly be more vague, as it's little more than a restatement of the > function name. However, it looks to me like the idea is that this > function might get called multiple times for the same or similar > values of whenTaken. I suppose that's the point of this code:
Right. We enforce whenTaken to be monotonic (cf. GetSnapshotCurrentTimestamp()), but since GetSnapshotCurrentTimestamp() reduces the granularity of the timestamp to one-minute (the AlignTimestampToMinuteBoundary() call), it's obviously possible to end up in the same bin as a previous > What I interpret this to be doing is saying - if we got a new call to > this function with a rounded-to-the-minute timestamp that we've seen > previously and for which we still have an entry, and if the XID passed > to this function is newer than the one passed by the previous call, > then advance the xid_by_minute[] bucket to the newer value. Now that > begs the question - what does this XID actually represent? The > comments don't seem to answer that question, not even the comments for > OldSnapshotControlData, which say that we should "Keep one xid per > minute for old snapshot error handling." but don't say which XIDs we > should keep or how they'll be used. However, the only call to > MaintainOldSnapshotTimeMapping() is in GetSnapshotData(). It appears > that we call this function each time a new snapshot is taken and pass > the current time (modulo some fiddling) and snapshot xmin. Given that, > one would expect that any updates to the map would be tight races, > i.e. a bunch of processes that all took their snapshots right around > the same time would all update the same map entry in quick succession, > with the newest value winning. Right. > And that make the debugging output which I quoted from your message > above really confusing. At this point, the "head timestamp" is 7 > minutes after this facility started up. The first we entry we have is > for minute 7, and the last is for minute 14. But the one we're > updating is for minute 11. How the heck can that happen? If I undestand what your reference correctly, I think that is because, due to the bug, the "need a new bucket" branch doesn't just extend by one bucket, it extends it by many in common cases. Basically filling buckets "into the future". the advance = ... variable in the branch will not always be 1, even when we continually call Maintain*. Here's some debug output showing that (slightly modified from the patch I previously sent): WARNING: old snapshot mapping at "before update" with head ts: 1, current entries: 2 max entries: 15, offset: 0 entry 0 (ring 0): min 1: xid 1089371384 entry 1 (ring 1): min 2: xid 1099553206 WARNING: head 1 min: filling 2 buckets starting at 0 for whenTaken 3 min, with xmin 1109840204 WARNING: old snapshot mapping at "after update" with head ts: 3, current entries: 4 max entries: 15, offset: 0 entry 0 (ring 0): min 3: xid 1089371384 entry 1 (ring 1): min 4: xid 1099553206 entry 2 (ring 2): min 5: xid 1109840204 entry 3 (ring 3): min 6: xid 1109840204 Note how the two new buckets have the same xid, and how we're inserting for "whenTaken 3 min", but we've filled the mapping up to minute 6. I don't think the calculation of the 'advance' variable is correct as is, even if we ignore the wrong setting of the head_timestamp variable. Greetings, Andres Freund