On 2019-11-07 05:16, Michael Paquier wrote:
The current defaults of pg_basebackup have been thought so as the
backups taken have a good stability and so as monitoring is eased
thanks to --wal-method=stream and the use of replication slots.
Shouldn't the use of a least a temporary replication slot be mandatory
for the stability of the copy? It seems to me that there is a good
argument for having a second process which streams WAL on top of the
main backup process, and just use a WAL receiver for that.
Is this something that the walreceiver should be doing independent of this
patch?
There could be an argument for switching a WAL receiver to use a
temporary replication slot by default. Still, it seems to me that
this backup solution suffers from the same set of problems we have
spent years to fix with pg_basebackup with missing WAL files caused by
concurrent checkpoints removing things needed while the copy of the
main data folder and other tablespaces happens.
I looked into this. It seems trivial to make walsender create and use a
temporary replication slot by default if no permanent replication slot
is specified. This is basically the logic that pg_basebackup has but
done server-side. See attached patch for a demonstration. Any reason
not to do that?
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 6ae76011ece6a5900cc06e2350b0ccb930eb41a0 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <pe...@eisentraut.org>
Date: Sat, 9 Nov 2019 22:10:19 +0100
Subject: [PATCH] walsender uses a temporary replication slot by default
---
src/backend/replication/walsender.c | 9 +++++++--
src/bin/pg_basebackup/pg_basebackup.c | 8 +++++++-
2 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/src/backend/replication/walsender.c
b/src/backend/replication/walsender.c
index 7f5671504f..c16455f3f0 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -563,6 +563,12 @@ StartReplication(StartReplicationCmd *cmd)
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
(errmsg("cannot use a logical
replication slot for physical replication"))));
}
+ else
+ {
+ char *slotname = psprintf("pg_walsender_%d", MyProcPid);
+
+ ReplicationSlotCreate(slotname, false, RS_TEMPORARY);
+ }
/*
* Select the timeline. If it was given explicitly by the client, use
@@ -703,8 +709,7 @@ StartReplication(StartReplicationCmd *cmd)
Assert(streamingDoneSending && streamingDoneReceiving);
}
- if (cmd->slotname)
- ReplicationSlotRelease();
+ ReplicationSlotRelease();
/*
* Copy is finished now. Send a single-row result set indicating the
next
diff --git a/src/bin/pg_basebackup/pg_basebackup.c
b/src/bin/pg_basebackup/pg_basebackup.c
index a9d162a7da..687bd331dd 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -68,6 +68,11 @@ typedef struct TablespaceList
*/
#define MINIMUM_VERSION_FOR_TEMP_SLOTS 100000
+/*
+ * Version 13 creates temporary slots automatically.
+ */
+#define MINIMUM_VERSION_FOR_AUTO_TEMP_SLOTS 130000
+
/*
* Different ways to include WAL
*/
@@ -569,7 +574,8 @@ StartLogStreamer(char *startpos, uint32 timeline, char
*sysidentifier)
"pg_xlog" : "pg_wal");
/* Temporary replication slots are only supported in 10 and newer */
- if (PQserverVersion(conn) < MINIMUM_VERSION_FOR_TEMP_SLOTS)
+ if (PQserverVersion(conn) < MINIMUM_VERSION_FOR_TEMP_SLOTS ||
+ PQserverVersion(conn) >= MINIMUM_VERSION_FOR_AUTO_TEMP_SLOTS)
temp_replication_slot = false;
/*
--
2.24.0