Hello.

While looking a patch, I found that a standby with archive_mode=always
fails to archive segments under certain conditions.

A. Walreceiver is gracefully terminated just after a segment is
   finished.

B. Walreceiver is gracefully terminated while receiving filling chunks
  for a segment switch.

The two above are reprodusible (without distinction between the two)
using a helper patch. See below.

There's one more issue here.

C. Standby doesn't archive a segment until walreceiver receives any
  data for the next segment.

I'm not sure wehther we assume C as an issue.

The first attached patch fixes A and B. A side-effect of that is that
standby archives the previous segment of the streaming start
location. Concretely 00..0100..2 gets to be archived in the above case
(recovery starts at 0/3000000).  That behavior doesn't seem to be a
proble since the segment is a part of the standby's data anyway.

The second attached patch fixes all of A to C, but seems somewhat
redundant.

Any opnions and/or suggestions are welcome.


The attached files are:

1. v1-0001-Make-sure-standby-archives-all-segments.patch:
  Fix for A and B.

2. v1-0001-Make-sure-standby-archives-all-segments-immediate.patch:
  Fix for A, B and C.

3. repro.sh
  The reproducer shell script used below.

4. repro_helper.patch
  Helper patch for repro.sh for master and patch 1 above.

5. repro_helper2.patch
  Helper patch for repro.sh for patch 2 above.

=====
** REPRODUCER

The failure is reproducible with some code tweak.

1. Create a primary server with archive_mode=always then start it.
2. Create and start a standby.
3. touch /tmp/hoge

4. psql -c "create table t(); drop table t; select pg_switch_wal(); select 
pg_sleep(1); create table t(); drop table t; select pg_switch_wal();"

5. look into the archive directory of the standby.
   If no missing segments found in archive, repeat from 3.

The third attached shell script is a reproducer for the problem,
needing the aid of the fourth patch attached.

$ mkdir testdir
$ cd testdir
$ bash ..../repro.sh
....
After test 2:
Primary location: 0/8000310
Standby location: 0/8000310
# primary archive
000000010000000000000003
000000010000000000000004
000000010000000000000005
000000010000000000000006
000000010000000000000007
000000010000000000000008
# standby archive
000000010000000000000003
000000010000000000000005
000000010000000000000006
000000010000000000000008

The segment 4 is skipped by the issue A and 7 is skipped by the issue
B.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

>From afa907bca7db8ea6335d47bd02761f567591d553 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyoga....@gmail.com>
Date: Tue, 30 Jun 2020 14:21:30 +0900
Subject: [PATCH v1] Make sure standby archives all segments

Standby fails to archive a segment if standby is stopped just after a
segment is finished or stopped just after a segment swtich. Make sure
that walreceiver archives all segments by rechecking at start.
---
 src/backend/replication/walreceiver.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index d1ad75da87..680154365d 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -938,6 +938,23 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
 				else
 					XLogArchiveNotify(xlogfname);
 			}
+			else if (XLogArchiveMode == ARCHIVE_MODE_ALWAYS)
+			{
+				/*
+				 * If we are starting streaming at the beginning of a segment,
+				 * there may be casees where the previous segment have not been
+				 * archived yet.  Make sure it is archived.
+				 */
+				char		xlogfname[MAXFNAMELEN];
+				XLogSegNo	prevseg;
+
+				XLByteToPrevSeg(recptr, prevseg, wal_segment_size);
+				XLogFileName(xlogfname, ThisTimeLineID, prevseg,
+							 wal_segment_size);
+
+				/* Mark as ".ready" of not yet */
+				XLogArchiveCheckDone(xlogfname);
+			}
 			recvFile = -1;
 
 			/* Create/use new log file */
-- 
2.18.4

>From 7af716134dceb3bafce421dfeaffebf1e1e3e17d Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyoga....@gmail.com>
Date: Mon, 29 Jun 2020 16:12:01 +0900
Subject: [PATCH v1] Make sure standby archives all segments immediately

Standby may get a bit being late in archive, since walsender doesn't
archive a segment until it receives any data for the next segment, Fix
that by archiving just after a segment is finished.

Also, standby fails to archive a segment if standby is stopped just
after a segment is finished or stopped just after a segment
swtich. Make sure that walreceiver archives all segments by rechecking
at start.
---
 src/backend/replication/walreceiver.c | 82 ++++++++++++++++++---------
 1 file changed, 54 insertions(+), 28 deletions(-)

diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index d1ad75da87..831718c859 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -902,49 +902,34 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
 	{
 		int			segbytes;
 
-		if (recvFile < 0 || !XLByteInSeg(recptr, recvSegNo, wal_segment_size))
+		/* Open the segment if not yet */
+		if (recvFile < 0)
 		{
 			bool		use_existent;
 
+			recvFileTLI = ThisTimeLineID;
+
 			/*
-			 * fsync() and close current file before we switch to next one. We
-			 * would otherwise have to reopen this file to fsync it later
+			 * If we are starting streaming at the beginning of a segment,
+			 * there may be the case where the previous segment have not been
+			 * archived yet.  Make sure it is archived.
 			 */
-			if (recvFile >= 0)
+			if (XLogArchiveMode == ARCHIVE_MODE_ALWAYS && recvSegNo == 0)
 			{
 				char		xlogfname[MAXFNAMELEN];
+				XLogSegNo	prevseg;
 
-				XLogWalRcvFlush(false);
+				XLByteToPrevSeg(recptr, prevseg, wal_segment_size);
+				XLogFileName(xlogfname, recvFileTLI, prevseg, wal_segment_size);
 
-				XLogFileName(xlogfname, recvFileTLI, recvSegNo, wal_segment_size);
-
-				/*
-				 * XLOG segment files will be re-read by recovery in startup
-				 * process soon, so we don't advise the OS to release cache
-				 * pages associated with the file like XLogFileClose() does.
-				 */
-				if (close(recvFile) != 0)
-					ereport(PANIC,
-							(errcode_for_file_access(),
-							 errmsg("could not close log segment %s: %m",
-									xlogfname)));
-
-				/*
-				 * Create .done file forcibly to prevent the streamed segment
-				 * from being archived later.
-				 */
-				if (XLogArchiveMode != ARCHIVE_MODE_ALWAYS)
-					XLogArchiveForceDone(xlogfname);
-				else
-					XLogArchiveNotify(xlogfname);
+				/* Mark as ".ready" of not yet */
+				XLogArchiveCheckDone(xlogfname);
 			}
-			recvFile = -1;
 
 			/* Create/use new log file */
 			XLByteToSeg(recptr, recvSegNo, wal_segment_size);
 			use_existent = true;
 			recvFile = XLogFileInit(recvSegNo, &use_existent, true);
-			recvFileTLI = ThisTimeLineID;
 		}
 
 		/* Calculate the start offset of the received logs */
@@ -985,6 +970,47 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
 		buf += byteswritten;
 
 		LogstreamResult.Write = recptr;
+
+		/*
+		 * Close the current WAL segment if it is completed then let the file
+		 * be archived if needed.
+		 */
+		if (!XLByteInSeg(recptr, recvSegNo, wal_segment_size))
+		{
+			char		xlogfname[MAXFNAMELEN];
+
+			Assert (recvFile >= 0);
+
+			/*
+			 * fsync() and close current file before we switch to next one. We
+			 * would otherwise have to reopen this file to fsync it later
+			 */
+			XLogWalRcvFlush(false);
+
+			XLogFileName(xlogfname, recvFileTLI, recvSegNo, wal_segment_size);
+
+			/*
+			 * XLOG segment files will be re-read by recovery in startup
+			 * process soon, so we don't advise the OS to release cache
+			 * pages associated with the file like XLogFileClose() does.
+			 */
+			if (close(recvFile) != 0)
+				ereport(PANIC,
+						(errcode_for_file_access(),
+						 errmsg("could not close log segment %s: %m",
+								xlogfname)));
+
+			/*
+			 * Create .done file forcibly to prevent the streamed segment
+			 * from being archived later.
+			 */
+			if (XLogArchiveMode != ARCHIVE_MODE_ALWAYS)
+				XLogArchiveForceDone(xlogfname);
+			else
+				XLogArchiveNotify(xlogfname);
+
+			recvFile = -1;
+		}
 	}
 
 	/* Update shared-memory status */
-- 
2.18.4

#! /bin/bash

ROOT=`pwd`
LOGFILE="repro.log"
PGPORT1=15432
PGPORT2=15433

PGDATA1=$ROOT/reprodata1
ARCHDIR1=$ROOT/reproarc1
PGDATA2=$ROOT/reprodata2
ARCHDIR2=$ROOT/reproarc2


function cleanup {
        echo -n "Killing servers..."
        pg_ctl -D $PGDATA1 -m i stop
        pg_ctl -D $PGDATA2 -m i stop
        echo "done."
        exit 1
}

rm -r $PGDATA1 $PGDATA2 $ARCHDIR1 $ARCHDIR2
mkdir $ARCHDIR1 $ARCHDIR2

# Create primary
echo "# Creating primary"
initdb -D $PGDATA1 &>$LOGFILE
cat >> $PGDATA1/postgresql.conf <<EOF
wal_keep_segments=10
archive_mode=always
archive_command='cp %p $ARCHDIR1/%f'
EOF

# Start primary
echo "# Starting primary"
pg_ctl -D $PGDATA1 -o"-p $PGPORT1" start &>>$LOGFILE

# Create standby
echo "# Creating standby"
pg_basebackup -D $PGDATA2 -h /tmp -p $PGPORT1 &>>$LOGFILE
cat >> $PGDATA2/postgresql.conf <<EOF
archive_command='cp %p $ARCHDIR2/%f'
primary_conninfo='host=/tmp port=$PGPORT1'
EOF
touch $PGDATA2/standby.signal

trap cleanup ERR 2 3 15

# Start primary
echo "# Starting standby"
pg_ctl -D $PGDATA2 -o"-p $PGPORT2" start &>>$LOGFILE
sleep 3

echo "Start:"
echo -n "Primary location: "
psql -tAp $PGPORT1 -c "select pg_current_wal_lsn()"
echo -n "Standby location: "
psql -tAp $PGPORT2 -c "select pg_last_wal_receive_lsn()"

# Delocate from boundary..
psql -p $PGPORT1 -c "create table t(); drop table t" &>>$LOGFILE
sleep 1

# TEST 1: walreceiver stops just after a segment is completed
echo "# test 1" >> $LOGFILE
touch /tmp/hoge1
psql -p $PGPORT1 -c "create table t(a int); insert into t (select a from 
generate_series(0, 260000) a); drop table t;" &>>$LOGFILE
echo "# test 1 end" >> $LOGFILE

psql -p $PGPORT1 -c "create table t(); drop table t; select pg_switch_wal()" 
&>>$LOGFILE
sleep 2

echo "After test 1:"
echo -n "Primary location: "
psql -tAp $PGPORT1 -c "select pg_current_wal_lsn()"
echo -n "Standby location: "
psql -tAp $PGPORT2 -c "select pg_last_wal_receive_lsn()"

psql -p $PGPORT1 -c "create table t(); drop table t; select pg_switch_wal()" 
&>>$LOGFILE
psql -p $PGPORT1 -c "create table t(); drop table t; select pg_switch_wal()" 
&>>$LOGFILE
sleep 2

# TEST 2: walreceiver stops while receiving filling chunks after a wal switch.
echo "# test 2" >> $LOGFILE
touch /tmp/hoge2
psql -p $PGPORT1 -c "create table t(); drop table t; select pg_switch_wal()" 
&>>$LOGFILE
echo "# test 2 end" >> $LOGFILE

sleep 2

echo "After test 2:"
echo -n "Primary location: "
psql -tAp $PGPORT1 -c "select pg_current_wal_lsn()"
echo -n "Standby location: "
psql -tAp $PGPORT2 -c "select pg_last_wal_receive_lsn()"

# stop servers
pg_ctl -D $PGDATA1 stop &>>$LOGFILE
pg_ctl -D $PGDATA2 stop &>>$LOGFILE

#show last three archived segments
echo "# primary archive"
ls $ARCHDIR1 | egrep '[3-9]$'
echo "# standby archive"
ls $ARCHDIR2 | egrep '[3-9]$'
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index d1ad75da87..dcccef151d 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -985,6 +985,29 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
 		buf += byteswritten;
 
 		LogstreamResult.Write = recptr;
+
+		{
+			/* fake oneshot SIGTERM just at segment end */
+			struct stat b;
+			char *sigfile1 = "/tmp/hoge1";
+			char *sigfile2 = "/tmp/hoge2";
+
+			if (LogstreamResult.Write % wal_segment_size == 0 &&
+				stat(sigfile1, &b) == 0)
+			{
+				unlink(sigfile1);
+				got_SIGTERM = true;
+				ereport(LOG,(errmsg("STOP BY trig1@%lX", LogstreamResult.Write)));
+			}
+
+			if (LogstreamResult.Write % wal_segment_size == 0x500000 &&
+				stat(sigfile2, &b) == 0)
+			{
+				unlink(sigfile2);
+				got_SIGTERM = true;
+				ereport(LOG,(errmsg("STOP BY trig2@%lX", LogstreamResult.Write)));
+			}
+		}
 	}
 
 	/* Update shared-memory status */
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 831718c859..b9a7c73ed7 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1011,6 +1011,29 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
 
 			recvFile = -1;
 		}
+
+		{
+			/* fake oneshot SIGTERM just at segment end */
+			struct stat b;
+			char *sigfile1 = "/tmp/hoge1";
+			char *sigfile2 = "/tmp/hoge2";
+
+			if (LogstreamResult.Write % wal_segment_size == 0 &&
+				stat(sigfile1, &b) == 0)
+			{
+				unlink(sigfile1);
+				got_SIGTERM = true;
+				ereport(LOG,(errmsg("STOP BY trig1@%lX", LogstreamResult.Write)));
+			}
+
+			if (LogstreamResult.Write % wal_segment_size == 0x500000 &&
+				stat(sigfile2, &b) == 0)
+			{
+				unlink(sigfile2);
+				got_SIGTERM = true;
+				ereport(LOG,(errmsg("STOP BY trig2@%lX", LogstreamResult.Write)));
+			}
+		}
 	}
 
 	/* Update shared-memory status */

Reply via email to