On Tue, Apr 21, 2026 at 9:49 AM Alexander Korotkov <[email protected]> wrote: > > On Tue, Apr 14, 2026 at 7:22 PM Melanie Plageman > > > Yea, I agree that this seems like simply an oversight in 96ef3b8. And > > it seems safe to use MarkBufferDirty() here instead. > > I also think that usage of MarkBufferDirty() here is safe. If I > understood correctly. > 1) When wal_log_hints = on, should be completely safe. Even if we > have torn page after the crash, during recovery FPI from the primary > should come first.
I think FPIs from primary don't really matter here, since we are only talking about MarkBufferDirty() in XLogRecordPageWithFreespace(). If we change it to MarkBufferDirty() on the standby and the machine crashes mid-write leading to a checksum error, we'll just zero it out -- which is really your point 2. While FPIs from the primary will overwrite the standby's FSM page, they don't provide torn-page protection for modifications made by the standby as you could read the page between the torn write and replaying any FPI from the primary. > 2) When wal_log_hints = off, we can end up with torn pages not covered > by FPI. Without checksums, FSM can tolerate torn pages. With > checksums, that would result in zeroed pages. FSM can tolerate that > as well. And the last shouldn't happen too frequently. So, we should > finally get way better FSM state than it is now. Yes, I think the bottom line is that we can't get checksum errors reading FSM pages because of ZERO_ON_ERROR, so there is no reason to do MarkBufferDirtyHint() in recovery for FSM. It only leads to losing changes to the page. > Should we push it to all supported branches? I haven't looked at the code paths in previous versions, but as long as they are reading FSM pages with RBM_ZERO_ON_ERROR, I think it is safe to do so. It is a bug that is causing overly optimistic FSM numbers, but it's not a correctness issue like wrong results/data corruption etc. So, I think you could make an argument either way about fixing it. I don't know how much of Alexey's reported issue was this vs PageGetFreeSpace() in heap_xlog_visible(). The MarkBufferDirty() change is easy to fix, so it probably makes sense to do so. I haven't investigated more about the PageGetFreeSpace() issue. - Melanie
