On Mon, Mar 26, 2018 at 02:32:22PM +0530, Dilip Kumar wrote:
> On Fri, Mar 23, 2018 at 1:19 PM, Michael Paquier <mich...@paquier.xyz>
> wrote:
> In StartupXLOG, just before the CreateCheckPoint() call,  we are calling
> LocalSetXLogInsertAllowed().  So, I am thinking can we just remove
> InitXLogInsert from CreateCheckpoint. And, we don't need to move it to
> bootstrap.c.  Or am I missing something?

I have finally been able to spend more time on this issue, and checked
for a couple of hours all the calls to RecoveryInProgress() that could
be triggered within a critical section to see if the move I suggested
previously is worth it ot not as this would cost some memory for
standbys all the time, which would suck for many read-only sessions.

There are a couple of calls potentially happening in critical sections,
however except for the one in UpdateFullPageWrites() those are used for
sanity  checks in code paths that should never trigger it, take
XLogInsertBegin() for example.  So with assertions this would trigger
a failure before the real elog(ERROR) message shows up.

Hence, I am changing opinion still I think that instead of the patch you
proposed first we could just do a call to InitXLogInsert() before
entering the critical section.  This is also more consistent with what
CreateCheckpoint() does.  That's also less risky for a backpatch as your
patch increases the window between the beginning of the critical section
and the real moment where the check for RecoveryInProgress is needed.  A
comment explaining why the initialization needs to happen is also
essential.

All in all, this would give the simple patch attached.

Thoughts?
--
Michael
From 0fc3878375890aef5c0a6dc57d73c30aa97c88df Mon Sep 17 00:00:00 2001
From: Michael Paquier <mich...@paquier.xyz>
Date: Tue, 27 Mar 2018 16:38:37 +0900
Subject: [PATCH] Fix WAL insert when updating full_page_writes for
 checkpointer

At startup or when receiving a SIGHUP signal to update its configuration,
the checkpointer may need to update the shared memory for
full_page_writes and needs to write a WAL record about it.  However,
just after a promotion, there is a small window where it is possible
that the memory context in charge of building WAL records has not been
correctly initialized, leading to a crash of the checkpointer.  With
assertions enabled, this causes a failure when allocating memory in a
critical section.  And this fails when trying to build the FPW record
without assertions.

Report and patch by Dilip Kumar, reviewed by Michael Paquier.

Discussion:
    https://postgr.es/m/CAFiTN-u4BA8KXcQUWDPNgaKAjDXC%3DC2whnzBM8TAcv%3DstckYUw%40mail.gmail.com
---
 src/backend/access/transam/xlog.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index cb9c2a29cb..aae1131977 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9543,6 +9543,16 @@ UpdateFullPageWrites(void)
 	if (fullPageWrites == Insert->fullPageWrites)
 		return;
 
+	/*
+	 * Initialize InitXLogInsert working areas before entering the critical
+	 * section.  Normally, this is done by the first call to
+	 * RecoveryInProgress() or LocalSetXLogInsertAllowed(), but it can
+	 * be possible that this has not been initialized yet for a checkpointer
+	 * which updates the value of full_page_writes for a received SIGHUP or
+	 * at process startup.
+	 */
+	InitXLogInsert();
+
 	START_CRIT_SECTION();
 
 	/*
-- 
2.16.3

Attachment: signature.asc
Description: PGP signature

Reply via email to