On 13/11/2018 16:34, Andrew Gierth wrote:
> So while investigating a case of this warning (in
> UpdateMinRecoveryPoint):
>
> "xlog min recovery request %X/%X is past current point %X/%X"
>
> I noticed that it is issued even in cases where we know that
> minRecoveryPoint is not yet valid, for example because we're waiting to
> see XLOG_BACKUP_END before declaring consistency.
>
> But, you'd think, you shouldn't get this error because any page we
> modify during recovery should have been restored from an FPI with a
> suitably early LSN? For data pages that is correct, but not for VM or
> (iff wal_log_hints or checksums are enabled) FSM pages.
>
> When we replay an operation that, for example, clears a bit in the VM,
> the redo code will read in that VM page from disk, and because we're not
> yet consistent and because _clearing_ a VM bit is not in itself
> wal-logged and doesn't result in any FPI being generated for the VM
> page, it could well read a VM page that has a far-future LSN from the
> point of view of replay, and dirty it, causing a later eviction to try
> and do UpdateMinRecoveryPoint with that future LSN.
>
> (I haven't investigated this aspect, but there also appears to be no
> protection against torn pages in the VM when checksums are enabled? am I
> missing something somewhere?)
>
> I'm less clear on the exact mechanisms, but when wal_log_hints (or
> checksums) is on, FSM pages also get LSNs, sometimes, thanks to
> MarkBufferDirtyHint, and at least some code paths can also do
> MarkBufferDirty on FSM pages during recovery, which would cause their
> eviction with possible future LSNs as with VM pages.
>
> This means that if you simply do an old-style base backup using
> pg_start_backup/rsync/pg_stop_backup (on a sufficiently active system
> and taking long enough) and then recover from it, you're likely to get a
> log spammed with these errors for no very good reason.
>
> So it seems to me that issuing this error is a bug if the conditions
> described are actually harmless, while if they're not harmless, then
> obviously that is a bug. So _something_ needs fixing here, but I'm not
> yet sufficiently confident of my analysis to say what.
>
> Opinions?
>
> (as a further point, it seems to me that backupEndRequired is a rather
> misleadingly named variable, since what _actually_ determines whether an
> XLOG_BACKUP_END record is expected is whether backupStartPoint is set.
> backupEndRequired seems to change one error message and, questionably,
> one decision about whether to do crash recovery before entering archive
> recovery, but nothing else.)
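
For anyone skimming, the scenario quoted above can be sketched as a toy C
model. This is not PostgreSQL source and makes no claim about how the real
function is written: ToyLsn, ToyRecovery, toy_evict_page and toy_scenario
are made-up names, the LSN values are arbitrary, and the only thing modelled
is the condition described in the quote, namely that the warning fires
whenever an evicted page's LSN is ahead of the replay position, whether or
not consistency has been reached yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Toy model only: none of these names are PostgreSQL's own. */
typedef uint64_t ToyLsn;            /* stand-in for a WAL position */

typedef struct ToyRecovery
{
    ToyLsn replay_point;            /* how far WAL replay has progressed */
    bool   reached_consistency;     /* false until backup end is replayed */
} ToyRecovery;

/*
 * Model of evicting a dirty page during recovery.  Returns true if the
 * "past current point" warning would be emitted.  reached_consistency is
 * deliberately not consulted: that is the behaviour being questioned.
 */
static bool
toy_evict_page(const ToyRecovery *rec, ToyLsn page_lsn)
{
    if (page_lsn > rec->replay_point)
    {
        fprintf(stderr,
                "WARNING: xlog min recovery request %X/%X "
                "is past current point %X/%X\n",
                (unsigned) (page_lsn >> 32), (unsigned) page_lsn,
                (unsigned) (rec->replay_point >> 32),
                (unsigned) rec->replay_point);
        return true;
    }
    return false;
}

/*
 * A VM page copied late in the base backup carries an LSN from after the
 * point replay has reached.  Clearing a bit dirties it without any FPI
 * having reset that LSN, so eviction sees a "future" LSN.
 */
static bool
toy_scenario(void)
{
    ToyRecovery rec = { .replay_point = 0x1000, .reached_consistency = false };
    ToyLsn      vm_page_lsn = 0x5000;   /* far-future from replay's view */

    assert(vm_page_lsn > rec.replay_point);
    return toy_evict_page(&rec, vm_page_lsn);
}
```

Calling toy_scenario() from a main() returns true and prints the warning to
stderr even though reached_consistency is still false, which is the log spam
being complained about. A page whose LSN is at or behind replay_point
produces no warning.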

Bump.

I was the originator of this report. I work with Postgres every single
day and I was spooked by these warnings. People with much less
involvement would probably be terrified.

--
Vik Fearing                                          +33 6 46 75 15 36
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support