Tomas Vondra <tomas.von...@2ndquadrant.com> wrote: > On Wed, Oct 30, 2019 at 02:44:18PM +0100, Antonin Houska wrote: > >Please consider this scenario (race conditions): > > > >1. FlushBuffer() has written the buffer but hasn't yet managed to clear the > >BM_DIRTY flag (however BM_JUST_DIRTIED could be cleared by now). > > > >2. Another backend modified a hint bit and called MarkBufferDirtyHint(). > > > >3. In MarkBufferDirtyHint(), if XLogHintBitIsNeeded() evaluates to true > >(e.g. due to checksums enabled), new LSN is computed, however it's not > >assigned to the page because the buffer is still dirty: > > > > if (!(buf_state & BM_DIRTY)) > > { > > ... > > > > if (!XLogRecPtrIsInvalid(lsn)) > > PageSetLSN(page, lsn); > > } > > > >4. MarkBufferDirtyHint() completes. > > > >5. In the first session, FlushBuffer()->TerminateBufferIO() will not clear > >BM_DIRTY because MarkBufferDirtyHint() has eventually set > >BM_JUST_DIRTIED. Thus the hint bit change itself will be written by the next > >call of FlushBuffer(). However page LSN is hasn't been updated so the > >requirement that WAL must be flushed first is not met. > > > >I think that PageSetLSN() should be called regardless BM_DIRTY. Do I miss any > >subtle detail? > > > > Isn't this prevented by locking of the buffer header? Both FlushBuffer > and MarkBufferDirtyHint do obtain that lock. I see MarkBufferDirtyHint > does a bit of work before, but that's related to BM_PERMANENT. > > If there really is a race condition, it shouldn't be that difficult to > trigger it by adding a sleep or a breakpoint, I guess.
Yes, I had verified it using gdb: inserted a row into a table, then attached gdb to checkpointer and stopped it when FlushBuffer() was about to call TerminateBufferIO(). Then, when scanning the table, I saw that MarkBufferDirtyHint() skipped the call of PageSetLSN(). Finally, before checkpointer unlocked the buffer header in TerminateBufferIO(), buf_state was 3553624066 ~ 0b11010011110100000000000000000010. With BM_DIRTY defined as #define BM_DIRTY (1U << 23) the flag appeared to be set. -- Antonin Houska Web: https://www.cybertec-postgresql.com