Michael Paquier <mich...@paquier.xyz> wrote:

> On Mon, Nov 11, 2019 at 10:03:14AM +0100, Antonin Houska wrote:
> > This looks good to me.
> 
> Actually, no, this is not good.  I have been studying more the patch,
> and after stressing more this code path with a cluster having
> checksums enabled and shared_buffers at 1MB, I have been able to make
> a couple of page's LSNs go backwards with pgbench -s 100.  The cause
> was simply that the page got flushed with a newer LSN than what was
> returned by XLogSaveBufferForHint() before taking the buffer header
> lock, so updating only the LSN for a non-dirty page was simply
> guarding against that.

Interesting. Now that I know about the problem, I was able to reproduce it
using gdb: MarkBufferDirtyHint() was called by two backends concurrently in
such a way that the first backend generates its LSN, but before it manages to
assign it to the page, another backend generates a newer LSN and sets it on
the page. When the first backend then applies its older LSN, the page LSN
goes backwards.

Can't we just apply the attached diff on top of your patch?

Also, I wonder how checksums helped you discover the problem. Although I
could simulate the setting of a lower LSN, I did not see any broken checksum,
and I would not even expect one, since FlushBuffer() copies the buffer into
backend-local memory before it calculates the checksum.
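To illustrate: as far as I can see, PageSetChecksumCopy() (called from
FlushBuffer()) does roughly the following (a simplified sketch, not the exact
code):

	/* Copy the shared page into backend-local memory ... */
	memcpy(pageCopy, (char *) page, BLCKSZ);
	/* ... compute the checksum on that private copy ... */
	((PageHeader) pageCopy)->pd_checksum = pg_checksum_page(pageCopy, blkno);
	/* ... and it is the copy that gets written out. */
	return pageCopy;

So a concurrent hint bit update of the shared buffer cannot break the checksum
of the image being written.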

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index eba9a4716d..2bbc5a92c7 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -3540,11 +3540,14 @@ MarkBufferDirtyHint(Buffer buffer, bool buffer_std)
 		 * before using PageGetLSN(), which is enforced in
 		 * BufferGetLSNAtomic().
 		 *
+		 * Do not let the page LSN go backwards.  That could happen if a
+		 * concurrent backend generated a newer LSN and set it before us.
+		 *
 		 * If checksums are enabled, you might think we should reset the
 		 * checksum here. That will happen when the page is written sometime
 		 * later in this checkpoint cycle.
 		 */
-		if (!XLogRecPtrIsInvalid(lsn))
+		if (!XLogRecPtrIsInvalid(lsn) && lsn > PageGetLSN(page))
 			PageSetLSN(page, lsn);
 
 		buf_state |= BM_DIRTY | BM_JUST_DIRTIED;
