On 2021/01/14 14:23, Tom Lane wrote:
> Fujii Masao <[email protected]> writes:
>> On 2021/01/14 13:59, Michael Paquier wrote:
>>> florican is telling us that this test has some stability problems:
>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican&dt=2021-01-14%2003%3A55%3A45
>> My guess is that the requested WAL file was unfortunately removed by a
>> checkpoint because no replication slot is used and wal_keep_size is not set.
>> So an easy fix is to set wal_keep_size to 512MB or so in that test. Thoughts?
> florican did pass this test on the v13 branch, so I agree it's probably
> a timing issue, not any deeper bug. Your theory seems plausible.
Thanks for the check!
So, barring any objection, I will push the attached patch that sets
wal_keep_size in the test.
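As a rough check of how much WAL that setting retains, here is a minimal Python sketch. It assumes the default 16MB WAL segment size; the helper name segments_kept is hypothetical and exists only for this illustration.

```python
# Assumption: the default WAL segment size of 16MB. A cluster initialized
# with a different segment size would give a different count.
WAL_SEGMENT_SIZE_MB = 16


def segments_kept(wal_keep_size_mb: int,
                  segment_size_mb: int = WAL_SEGMENT_SIZE_MB) -> int:
    """Return how many whole past WAL segments a wal_keep_size value covers."""
    return wal_keep_size_mb // segment_size_mb


if __name__ == "__main__":
    # wal_keep_size = 512MB, as set in the attached patch.
    print(segments_kept(512))
```

Under that assumption, 512MB covers 32 past segments, which should be far more than this short test generates.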
BTW, I included the URL to Michael's report [1] in the commit log. But this
URL doesn't seem to work, maybe because the <message-id> part includes
a slash character.
[1]
https://postgr.es/m/X//[email protected]
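
If the slashes are indeed the cause, one thing to try is percent-encoding the message-id before building the URL. The following is a minimal Python sketch of that idea only: the function names and the sample message-id are hypothetical, and whether postgr.es resolves the encoded form is an assumption that has not been verified here.

```python
from urllib.parse import quote


def encode_message_id(message_id: str) -> str:
    """Percent-encode a message-id for use as a single URL path segment.

    "/" becomes "%2F" so it is not read as a path separator; "@" is left
    as-is because it is allowed in a path segment.
    """
    return quote(message_id, safe="@")


def archive_url(message_id: str, base: str = "https://postgr.es/m/") -> str:
    # Assumption: the redirector accepts the encoded form; not verified.
    return base + encode_message_id(message_id)


if __name__ == "__main__":
    # Hypothetical message-id, not the one from this thread.
    print(archive_url("X//abc123@example.org"))
```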
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
From 2ed9eed25bfd7b6e22eade23e2c602af7123c731 Mon Sep 17 00:00:00 2001
From: Fujii Masao <[email protected]>
Date: Thu, 14 Jan 2021 14:37:01 +0900
Subject: [PATCH] Stabilize timeline switch regression test.
Commit fef5b47f6b added the regression test to check whether a standby is
able to follow a primary on a newer timeline when WAL archiving is enabled.
But the buildfarm member florican reported that this test failed because
the requested WAL segment was removed and replication failed. This is a
timing issue. Since the test neither uses a replication slot nor sets
wal_keep_size, a checkpoint could remove the WAL segment that's still
necessary for replication.
This commit stabilizes the test by setting wal_keep_size.
Back-patch to v13 where the regression test that this commit stabilizes
was added.
Author: Fujii Masao
Discussion: https://postgr.es/m/X//[email protected]
---
src/test/recovery/t/004_timeline_switch.pl | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/src/test/recovery/t/004_timeline_switch.pl b/src/test/recovery/t/004_timeline_switch.pl
index edadab790f..91a63f4e58 100644
--- a/src/test/recovery/t/004_timeline_switch.pl
+++ b/src/test/recovery/t/004_timeline_switch.pl
@@ -75,6 +75,10 @@ is($result, qq(2000), 'check content of standby 2');
# Initialize master node
my $node_master_2 = get_new_node('master_2');
$node_master_2->init(allows_streaming => 1, has_archiving => 1);
+$node_master_2->append_conf(
+ 'postgresql.conf', qq(
+wal_keep_size = 512MB
+));
$node_master_2->start;
# Take backup
--
2.27.0
From c7680a8a59218b78a71b5dc1a91d935576369735 Mon Sep 17 00:00:00 2001
From: Fujii Masao <[email protected]>
Date: Thu, 14 Jan 2021 14:37:01 +0900
Subject: [PATCH] Stabilize timeline switch regression test.
Commit fef5b47f6b added the regression test to check whether a standby is
able to follow a primary on a newer timeline when WAL archiving is enabled.
But the buildfarm member florican reported that this test failed because
the requested WAL segment was removed and replication failed. This is a
timing issue. Since the test neither uses a replication slot nor sets
wal_keep_size, a checkpoint could remove the WAL segment that's still
necessary for replication.
This commit stabilizes the test by setting wal_keep_size.
Back-patch to v13 where the regression test that this commit stabilizes
was added.
Author: Fujii Masao
Discussion: https://postgr.es/m/X//[email protected]
---
src/test/recovery/t/004_timeline_switch.pl | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/src/test/recovery/t/004_timeline_switch.pl b/src/test/recovery/t/004_timeline_switch.pl
index 8dad044db4..c8dbd8f9df 100644
--- a/src/test/recovery/t/004_timeline_switch.pl
+++ b/src/test/recovery/t/004_timeline_switch.pl
@@ -75,6 +75,10 @@ is($result, qq(2000), 'check content of standby 2');
# Initialize primary node
my $node_primary_2 = get_new_node('primary_2');
$node_primary_2->init(allows_streaming => 1, has_archiving => 1);
+$node_primary_2->append_conf(
+ 'postgresql.conf', qq(
+wal_keep_size = 512MB
+));
$node_primary_2->start;
# Take backup
--
2.27.0