Updated patch attached. On 2019-09-18 10:31, Michael Paquier wrote:
- * Verify XLOG status looks valid. + * Check that contents look valid. */ - if (ControlFile->state < DB_SHUTDOWNED || - ControlFile->state > DB_IN_PRODUCTION || - !XRecOffIsValid(ControlFile->checkPoint)) + if (!XRecOffIsValid(ControlFile->checkPoint)) ereport(FATAL, Doesn't seem like a good idea to me to remove this sanity check for normal deployments, but actually you moved that down in StartupXLOG(). It seems to me tha this is unrelated and could be a separate patch so as the errors produced are more verbose. I think that we should also change that code to use a switch/case on ControlFile->state.
Done. Yes, this was really a change made to get more precise error messaged during debugging. It could be committed separately.
The current defaults of pg_basebackup have been thought so as the backups taken have a good stability and so as monitoring is eased thanks to --wal-method=stream and the use of replication slots. Shouldn't the use of a least a temporary replication slot be mandatory for the stability of the copy? It seems to me that there is a good argument for having a second process which streams WAL on top of the main backup process, and just use a WAL receiver for that.
Is this something that the walreceiver should be doing independent of this patch?
One problem which is not tackled here is what to do for the tablespace map. pg_basebackup has its own specific trick for that, and with that new feature we may want something equivalent? Not something to consider as a first stage of course.
The updated has support for tablespaces without mapping. I'm thinking about putting the mapping specification into a GUC list somehow. Shouldn't be too hard.
*/ -static void +void WriteControlFile(void) [...] -static void +void ReadControlFile(void) [...] If you begin to publish those routines, it seems to me that there could be more consolidation with controldata_utils.c which includes now a routine to update a control file.
Hmm, maybe long-term, but it seems too much dangerous surgery for this patch.
+#ifndef FRONTEND +extern void InitControlFile(uint64 sysidentifier); +extern void WriteControlFile(void); +extern void ReadControlFile(void); +#endif It would be nice to avoid that.
Fixed by renaming a function in pg_resetwal.c.
+ /* + * Wait until done. Start WAL receiver in the meantime, once base + * backup has received the starting position. + */ + while (BaseBackupPID != 0) + { + PG_SETMASK(&UnBlockSig); + pg_usleep(1000000L); + PG_SETMASK(&BlockSig); + primary_sysid = strtoull(walrcv_identify_system(wrconn, &primaryTLI), NULL, 10); No more strtol with base 10 stuff please :)
Hmm, why not? What's the replacement? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From ac34ece7665b62d542653cf12238973a1a45a18b Mon Sep 17 00:00:00 2001 From: Peter Eisentraut <pe...@eisentraut.org> Date: Mon, 28 Oct 2019 09:23:43 +0100 Subject: [PATCH v3] Base backup client as auxiliary backend process Discussion: https://www.postgresql.org/message-id/flat/61b8d18d-c922-ac99-b990-a31ba63cd...@2ndquadrant.com --- doc/src/sgml/protocol.sgml | 12 +- doc/src/sgml/ref/initdb.sgml | 17 + src/backend/access/transam/xlog.c | 184 ++++---- src/backend/bootstrap/bootstrap.c | 9 + src/backend/postmaster/pgstat.c | 6 + src/backend/postmaster/postmaster.c | 114 ++++- src/backend/replication/basebackup.c | 70 +++ .../libpqwalreceiver/libpqwalreceiver.c | 400 ++++++++++++++++++ src/backend/replication/repl_gram.y | 9 +- src/backend/replication/repl_scanner.l | 1 + src/backend/storage/file/fd.c | 36 +- src/bin/initdb/initdb.c | 39 +- src/bin/pg_resetwal/pg_resetwal.c | 6 +- src/include/access/xlog.h | 8 +- src/include/miscadmin.h | 2 + src/include/pgstat.h | 1 + src/include/replication/basebackup.h | 2 + src/include/replication/walreceiver.h | 4 + src/include/storage/fd.h | 2 +- src/include/utils/guc.h | 2 +- src/test/recovery/t/018_basebackup.pl | 29 ++ 21 files changed, 831 insertions(+), 122 deletions(-) create mode 100644 src/test/recovery/t/018_basebackup.pl diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index 80275215e0..f54b820edf 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -2466,7 +2466,7 @@ <title>Streaming Replication Protocol</title> </varlistentry> <varlistentry> - <term><literal>BASE_BACKUP</literal> [ <literal>LABEL</literal> <replaceable>'label'</replaceable> ] [ <literal>PROGRESS</literal> ] [ <literal>FAST</literal> ] [ <literal>WAL</literal> ] [ <literal>NOWAIT</literal> ] [ <literal>MAX_RATE</literal> <replaceable>rate</replaceable> ] [ <literal>TABLESPACE_MAP</literal> ] [ <literal>NOVERIFY_CHECKSUMS</literal> ] + <term><literal>BASE_BACKUP</literal> [ <literal>LABEL</literal> <replaceable>'label'</replaceable> ] [ <literal>PROGRESS</literal> ] [ <literal>FAST</literal> ] [ <literal>WAL</literal> ] [ <literal>NOWAIT</literal> ] [ <literal>MAX_RATE</literal> <replaceable>rate</replaceable> ] [ <literal>TABLESPACE_MAP</literal> ] [ <literal>NOVERIFY_CHECKSUMS</literal> ] [ <literal>EXCLUDE_CONF</literal> ] <indexterm><primary>BASE_BACKUP</primary></indexterm> </term> <listitem> @@ -2576,6 +2576,16 @@ <title>Streaming Replication Protocol</title> </para> </listitem> </varlistentry> + + <varlistentry> + <term><literal>EXCLUDE_CONF</literal></term> + <listitem> + <para> + Do not copy configuration files, that is, files that end in + <filename>.conf</filename>. + </para> + </listitem> + </varlistentry> </variablelist> </para> <para> diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml index da5c8f5307..1261e02d59 100644 --- a/doc/src/sgml/ref/initdb.sgml +++ b/doc/src/sgml/ref/initdb.sgml @@ -286,6 +286,23 @@ <title>Options</title> </listitem> </varlistentry> + <varlistentry> + <term><option>-r</option></term> + <term><option>--replica</option></term> + <listitem> + <para> + Initialize a data directory for a physical replication replica. The + data directory will not be initialized with a full database system, + but will instead only contain a minimal set of files. A server that + is started on this data directory will first fetch a base backup and + then switch to standby mode. The connection information for the base + backup has to be configured by setting <xref + linkend="guc-primary-conninfo"/>, and other parameters as desired, + before the server is started. + </para> + </listitem> + </varlistentry> + <varlistentry> <term><option>-S</option></term> <term><option>--sync-only</option></term> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 2e3cc51006..6f49ccdada 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -904,8 +904,6 @@ static void CheckRecoveryConsistency(void); static XLogRecord *ReadCheckpointRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int whichChkpt, bool report); static bool rescanLatestTimeLine(void); -static void WriteControlFile(void); -static void ReadControlFile(void); static char *str_time(pg_time_t tnow); static bool CheckForStandbyTrigger(void); @@ -4481,7 +4479,7 @@ rescanLatestTimeLine(void) * ReadControlFile() verifies they are correct. We could split out the * I/O and compatibility-check functions, but there seems no need currently. */ -static void +void WriteControlFile(void) { int fd; @@ -4573,7 +4571,7 @@ WriteControlFile(void) XLOG_CONTROL_FILE))); } -static void +void ReadControlFile(void) { pg_crc32c crc; @@ -5079,6 +5077,41 @@ XLOGShmemInit(void) InitSharedLatch(&XLogCtl->recoveryWakeupLatch); } +void +InitControlFile(uint64 sysidentifier) +{ + char mock_auth_nonce[MOCK_AUTH_NONCE_LEN]; + + /* + * Generate a random nonce. This is used for authentication requests that + * will fail because the user does not exist. The nonce is used to create + * a genuine-looking password challenge for the non-existent user, in lieu + * of an actual stored password. + */ + if (!pg_strong_random(mock_auth_nonce, MOCK_AUTH_NONCE_LEN)) + ereport(PANIC, + (errcode(ERRCODE_INTERNAL_ERROR), + errmsg("could not generate secret authorization token"))); + + memset(ControlFile, 0, sizeof(ControlFileData)); + /* Initialize pg_control status fields */ + ControlFile->system_identifier = sysidentifier; + memcpy(ControlFile->mock_authentication_nonce, mock_auth_nonce, MOCK_AUTH_NONCE_LEN); + ControlFile->state = DB_SHUTDOWNED; + ControlFile->unloggedLSN = FirstNormalUnloggedLSN; + + /* Set important parameter values for use when replaying WAL */ + ControlFile->MaxConnections = MaxConnections; + ControlFile->max_worker_processes = max_worker_processes; + ControlFile->max_wal_senders = max_wal_senders; + ControlFile->max_prepared_xacts = max_prepared_xacts; + ControlFile->max_locks_per_xact = max_locks_per_xact; + ControlFile->wal_level = wal_level; + ControlFile->wal_log_hints = wal_log_hints; + ControlFile->track_commit_timestamp = track_commit_timestamp; + ControlFile->data_checksum_version = bootstrap_data_checksum_version; +} + /* * This func must be called ONCE on system install. It creates pg_control * and the initial XLOG segment. @@ -5094,7 +5127,6 @@ BootStrapXLOG(void) char *recptr; bool use_existent; uint64 sysidentifier; - char mock_auth_nonce[MOCK_AUTH_NONCE_LEN]; struct timeval tv; pg_crc32c crc; @@ -5115,17 +5147,6 @@ BootStrapXLOG(void) sysidentifier |= ((uint64) tv.tv_usec) << 12; sysidentifier |= getpid() & 0xFFF; - /* - * Generate a random nonce. This is used for authentication requests that - * will fail because the user does not exist. The nonce is used to create - * a genuine-looking password challenge for the non-existent user, in lieu - * of an actual stored password. - */ - if (!pg_strong_random(mock_auth_nonce, MOCK_AUTH_NONCE_LEN)) - ereport(PANIC, - (errcode(ERRCODE_INTERNAL_ERROR), - errmsg("could not generate secret authorization token"))); - /* First timeline ID is always 1 */ ThisTimeLineID = 1; @@ -5233,30 +5254,12 @@ BootStrapXLOG(void) openLogFile = -1; /* Now create pg_control */ - - memset(ControlFile, 0, sizeof(ControlFileData)); - /* Initialize pg_control status fields */ - ControlFile->system_identifier = sysidentifier; - memcpy(ControlFile->mock_authentication_nonce, mock_auth_nonce, MOCK_AUTH_NONCE_LEN); - ControlFile->state = DB_SHUTDOWNED; + InitControlFile(sysidentifier); ControlFile->time = checkPoint.time; ControlFile->checkPoint = checkPoint.redo; ControlFile->checkPointCopy = checkPoint; - ControlFile->unloggedLSN = FirstNormalUnloggedLSN; - - /* Set important parameter values for use when replaying WAL */ - ControlFile->MaxConnections = MaxConnections; - ControlFile->max_worker_processes = max_worker_processes; - ControlFile->max_wal_senders = max_wal_senders; - ControlFile->max_prepared_xacts = max_prepared_xacts; - ControlFile->max_locks_per_xact = max_locks_per_xact; - ControlFile->wal_level = wal_level; - ControlFile->wal_log_hints = wal_log_hints; - ControlFile->track_commit_timestamp = track_commit_timestamp; - ControlFile->data_checksum_version = bootstrap_data_checksum_version; /* some additional ControlFile fields are set in WriteControlFile() */ - WriteControlFile(); /* Bootstrap the commit log, too */ @@ -6231,45 +6234,59 @@ StartupXLOG(void) CurrentResourceOwner = AuxProcessResourceOwner; /* - * Verify XLOG status looks valid. + * Check that contents look valid. */ - if (ControlFile->state < DB_SHUTDOWNED || - ControlFile->state > DB_IN_PRODUCTION || - !XRecOffIsValid(ControlFile->checkPoint)) + if (!XRecOffIsValid(ControlFile->checkPoint)) ereport(FATAL, - (errmsg("control file contains invalid data"))); + (errmsg("control file contains invalid checkpoint location"))); - if (ControlFile->state == DB_SHUTDOWNED) + switch (ControlFile->state) { - /* This is the expected case, so don't be chatty in standalone mode */ - ereport(IsPostmasterEnvironment ? LOG : NOTICE, - (errmsg("database system was shut down at %s", - str_time(ControlFile->time)))); + case DB_SHUTDOWNED: + /* This is the expected case, so don't be chatty in standalone mode */ + ereport(IsPostmasterEnvironment ? LOG : NOTICE, + (errmsg("database system was shut down at %s", + str_time(ControlFile->time)))); + break; + + case DB_SHUTDOWNED_IN_RECOVERY: + ereport(LOG, + (errmsg("database system was shut down in recovery at %s", + str_time(ControlFile->time)))); + break; + + case DB_SHUTDOWNING: + ereport(LOG, + (errmsg("database system shutdown was interrupted; last known up at %s", + str_time(ControlFile->time)))); + break; + + case DB_IN_CRASH_RECOVERY: + ereport(LOG, + (errmsg("database system was interrupted while in recovery at %s", + str_time(ControlFile->time)), + errhint("This probably means that some data is corrupted and" + " you will have to use the last backup for recovery."))); + break; + + case DB_IN_ARCHIVE_RECOVERY: + ereport(LOG, + (errmsg("database system was interrupted while in recovery at log time %s", + str_time(ControlFile->checkPointCopy.time)), + errhint("If this has occurred more than once some data might be corrupted" + " and you might need to choose an earlier recovery target."))); + break; + + case DB_IN_PRODUCTION: + ereport(LOG, + (errmsg("database system was interrupted; last known up at %s", + str_time(ControlFile->time)))); + break; + + default: + ereport(FATAL, + (errmsg("control file contains invalid database cluster state"))); } - else if (ControlFile->state == DB_SHUTDOWNED_IN_RECOVERY) - ereport(LOG, - (errmsg("database system was shut down in recovery at %s", - str_time(ControlFile->time)))); - else if (ControlFile->state == DB_SHUTDOWNING) - ereport(LOG, - (errmsg("database system shutdown was interrupted; last known up at %s", - str_time(ControlFile->time)))); - else if (ControlFile->state == DB_IN_CRASH_RECOVERY) - ereport(LOG, - (errmsg("database system was interrupted while in recovery at %s", - str_time(ControlFile->time)), - errhint("This probably means that some data is corrupted and" - " you will have to use the last backup for recovery."))); - else if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY) - ereport(LOG, - (errmsg("database system was interrupted while in recovery at log time %s", - str_time(ControlFile->checkPointCopy.time)), - errhint("If this has occurred more than once some data might be corrupted" - " and you might need to choose an earlier recovery target."))); - else if (ControlFile->state == DB_IN_PRODUCTION) - ereport(LOG, - (errmsg("database system was interrupted; last known up at %s", - str_time(ControlFile->time)))); /* This is just to allow attaching to startup process with a debugger */ #ifdef XLOG_REPLAY_DELAY @@ -6284,24 +6301,31 @@ StartupXLOG(void) */ ValidateXLOGDirectoryStructure(); - /*---------- + /* * If we previously crashed, perform a couple of actions: + * * - The pg_wal directory may still include some temporary WAL segments - * used when creating a new segment, so perform some clean up to not - * bloat this path. This is done first as there is no point to sync this - * temporary data. - * - There might be data which we had written, intending to fsync it, - * but which we had not actually fsync'd yet. Therefore, a power failure - * in the near future might cause earlier unflushed writes to be lost, - * even though more recent data written to disk from here on would be - * persisted. To avoid that, fsync the entire data directory. - *--------- + * used when creating a new segment, so perform some clean up to not + * bloat this path. This is done first as there is no point to sync + * this temporary data. + * + * - There might be data which we had written, intending to fsync it, but + * which we had not actually fsync'd yet. Therefore, a power failure + * in the near future might cause earlier unflushed writes to be lost, + * even though more recent data written to disk from here on would be + * persisted. To avoid that, fsync the entire data directory. Errors + * are logged but not considered fatal. Aborting on error would result + * in failure to start for harmless cases such as read-only files in + * the data directory, and that's not good either. + * + * Note that if we previously crashed due to a PANIC on fsync(), we'll + * be rewriting all changes again during recovery. */ if (ControlFile->state != DB_SHUTDOWNED && ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY) { RemoveTempXlogFiles(); - SyncDataDirectory(); + SyncDataDirectory(true, LOG); } /* diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c index 9238fbe98d..a8b1ffd08a 100644 --- a/src/backend/bootstrap/bootstrap.c +++ b/src/backend/bootstrap/bootstrap.c @@ -36,6 +36,7 @@ #include "postmaster/bgwriter.h" #include "postmaster/startup.h" #include "postmaster/walwriter.h" +#include "replication/basebackup.h" #include "replication/walreceiver.h" #include "storage/bufmgr.h" #include "storage/bufpage.h" @@ -326,6 +327,9 @@ AuxiliaryProcessMain(int argc, char *argv[]) case StartupProcess: statmsg = pgstat_get_backend_desc(B_STARTUP); break; + case BaseBackupProcess: + statmsg = pgstat_get_backend_desc(B_BASE_BACKUP); + break; case BgWriterProcess: statmsg = pgstat_get_backend_desc(B_BG_WRITER); break; @@ -451,6 +455,11 @@ AuxiliaryProcessMain(int argc, char *argv[]) StartupProcessMain(); proc_exit(1); /* should never return */ + case BaseBackupProcess: + /* don't set signals, basebackup has its own agenda */ + BaseBackupMain(); + proc_exit(1); /* should never return */ + case BgWriterProcess: /* don't set signals, bgwriter has its own agenda */ BackgroundWriterMain(); diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index 011076c3e3..5e7907f258 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -2934,6 +2934,9 @@ pgstat_bestart(void) case StartupProcess: lbeentry.st_backendType = B_STARTUP; break; + case BaseBackupProcess: + lbeentry.st_backendType = B_BASE_BACKUP; + break; case BgWriterProcess: lbeentry.st_backendType = B_BG_WRITER; break; @@ -4289,6 +4292,9 @@ pgstat_get_backend_desc(BackendType backendType) case B_BG_WORKER: backendDesc = "background worker"; break; + case B_BASE_BACKUP: + backendDesc = "base backup"; + break; case B_BG_WRITER: backendDesc = "background writer"; break; diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index 5f30359165..51912bf718 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -116,6 +116,7 @@ #include "postmaster/postmaster.h" #include "postmaster/syslogger.h" #include "replication/logicallauncher.h" +#include "replication/walreceiver.h" #include "replication/walsender.h" #include "storage/fd.h" #include "storage/ipc.h" @@ -248,6 +249,7 @@ bool restart_after_crash = true; /* PIDs of special child processes; 0 when not running */ static pid_t StartupPID = 0, + BaseBackupPID = 0, BgWriterPID = 0, CheckpointerPID = 0, WalWriterPID = 0, @@ -539,6 +541,7 @@ static void ShmemBackendArrayRemove(Backend *bn); #endif /* EXEC_BACKEND */ #define StartupDataBase() StartChildProcess(StartupProcess) +#define StartBaseBackup() StartChildProcess(BaseBackupProcess) #define StartBackgroundWriter() StartChildProcess(BgWriterProcess) #define StartCheckpointer() StartChildProcess(CheckpointerProcess) #define StartWalWriter() StartChildProcess(WalWriterProcess) @@ -572,6 +575,8 @@ PostmasterMain(int argc, char *argv[]) bool listen_addr_saved = false; int i; char *output_config_variable = NULL; + struct stat stat_buf; + bool basebackup_signal_file_found = false; InitProcessGlobals(); @@ -886,12 +891,27 @@ PostmasterMain(int argc, char *argv[]) /* Verify that DataDir looks reasonable */ checkDataDir(); - /* Check that pg_control exists */ - checkControlFile(); - /* And switch working directory into it */ ChangeToDataDir(); + if (stat(BASEBACKUP_SIGNAL_FILE, &stat_buf) == 0) + { + int fd; + + fd = BasicOpenFilePerm(STANDBY_SIGNAL_FILE, O_RDWR | PG_BINARY, + S_IRUSR | S_IWUSR); + if (fd >= 0) + { + (void) pg_fsync(fd); + close(fd); + } + basebackup_signal_file_found = true; + } + + /* Check that pg_control exists */ + if (!basebackup_signal_file_found) + checkControlFile(); + /* * Check for invalid combinations of GUC settings. */ @@ -970,7 +990,8 @@ PostmasterMain(int argc, char *argv[]) * processes will inherit the correct function pointer and not need to * repeat the test. */ - LocalProcessControlFile(false); + if (!basebackup_signal_file_found) + LocalProcessControlFile(false); /* * Initialize SSL library, if specified. @@ -1386,6 +1407,39 @@ PostmasterMain(int argc, char *argv[]) */ AddToDataDirLockFile(LOCK_FILE_LINE_PM_STATUS, PM_STATUS_STARTING); + if (basebackup_signal_file_found) + { + BaseBackupPID = StartBaseBackup(); + + /* + * Wait until done. Start WAL receiver in the meantime, once base + * backup has received the starting position. + */ + while (BaseBackupPID != 0) + { + PG_SETMASK(&UnBlockSig); + pg_usleep(1000000L); + PG_SETMASK(&BlockSig); + MaybeStartWalReceiver(); + } + + /* + * XXX Shut down WAL receiver. It will be restarted later in xlog.c, + * and that will complain if it's already running. + */ + ShutdownWalRcv(); + + /* + * Base backup done, now signal standby mode. + */ + durable_rename(BASEBACKUP_SIGNAL_FILE, STANDBY_SIGNAL_FILE, FATAL); + + /* + * Reread the control file that came in with the base backup. + */ + ReadControlFile(); + } + /* * We're ready to rock and roll... */ @@ -2665,6 +2719,8 @@ SIGHUP_handler(SIGNAL_ARGS) SignalChildren(SIGHUP); if (StartupPID != 0) signal_child(StartupPID, SIGHUP); + if (BaseBackupPID != 0) + signal_child(BaseBackupPID, SIGHUP); if (BgWriterPID != 0) signal_child(BgWriterPID, SIGHUP); if (CheckpointerPID != 0) @@ -2824,6 +2880,8 @@ pmdie(SIGNAL_ARGS) if (StartupPID != 0) signal_child(StartupPID, SIGTERM); + if (BaseBackupPID != 0) + signal_child(BaseBackupPID, SIGTERM); if (BgWriterPID != 0) signal_child(BgWriterPID, SIGTERM); if (WalReceiverPID != 0) @@ -3062,6 +3120,23 @@ reaper(SIGNAL_ARGS) continue; } + /* + * Was it the base backup process? + */ + if (pid == BaseBackupPID) + { + BaseBackupPID = 0; + if (EXIT_STATUS_0(exitstatus)) + ; + else if (EXIT_STATUS_1(exitstatus)) + ereport(FATAL, + (errmsg("base backup failed"))); + else + HandleChildCrash(pid, exitstatus, + _("base backup process")); + continue; + } + /* * Was it the bgwriter? Normal exit can be ignored; we'll start a new * one at the next iteration of the postmaster's main loop, if @@ -3583,6 +3658,18 @@ HandleChildCrash(int pid, int exitstatus, const char *procname) StartupStatus = STARTUP_SIGNALED; } + /* Take care of the base backup process too */ + if (pid == BaseBackupPID) + BaseBackupPID = 0; + else if (BaseBackupPID != 0 && take_action) + { + ereport(DEBUG2, + (errmsg_internal("sending %s to process %d", + (SendStop ? "SIGSTOP" : "SIGQUIT"), + (int) BaseBackupPID))); + signal_child(BaseBackupPID, (SendStop ? SIGSTOP : SIGQUIT)); + } + /* Take care of the bgwriter too */ if (pid == BgWriterPID) BgWriterPID = 0; @@ -3817,6 +3904,7 @@ PostmasterStateMachine(void) if (CountChildren(BACKEND_TYPE_NORMAL | BACKEND_TYPE_WORKER) == 0 && StartupPID == 0 && WalReceiverPID == 0 && + BaseBackupPID == 0 && BgWriterPID == 0 && (CheckpointerPID == 0 || (!FatalError && Shutdown < ImmediateShutdown)) && @@ -3911,6 +3999,7 @@ PostmasterStateMachine(void) /* These other guys should be dead already */ Assert(StartupPID == 0); Assert(WalReceiverPID == 0); + Assert(BaseBackupPID == 0); Assert(BgWriterPID == 0); Assert(CheckpointerPID == 0); Assert(WalWriterPID == 0); @@ -4094,6 +4183,8 @@ TerminateChildren(int signal) if (signal == SIGQUIT || signal == SIGKILL) StartupStatus = STARTUP_SIGNALED; } + if (BaseBackupPID != 0) + signal_child(BgWriterPID, signal); if (BgWriterPID != 0) signal_child(BgWriterPID, signal); if (CheckpointerPID != 0) @@ -4919,6 +5010,7 @@ SubPostmasterMain(int argc, char *argv[]) strcmp(argv[1], "--forkavlauncher") == 0 || strcmp(argv[1], "--forkavworker") == 0 || strcmp(argv[1], "--forkboot") == 0 || + strcmp(argv[1], "--forkbasebackup") == 0 || strncmp(argv[1], "--forkbgworker=", 15) == 0) PGSharedMemoryReAttach(); else @@ -4958,7 +5050,8 @@ SubPostmasterMain(int argc, char *argv[]) * (re-)read control file, as it contains config. The postmaster will * already have read this, but this process doesn't know about that. */ - LocalProcessControlFile(false); + if (strcmp(argv[1], "--forkbasebackup") != 0) + LocalProcessControlFile(false); /* * Reload any libraries that were preloaded by the postmaster. Since we @@ -5019,7 +5112,8 @@ SubPostmasterMain(int argc, char *argv[]) /* And run the backend */ BackendRun(&port); /* does not return */ } - if (strcmp(argv[1], "--forkboot") == 0) + if (strcmp(argv[1], "--forkboot") == 0 || + strcmp(argv[1], "--forkbasebackup") == 0) { /* Restore basic shared memory pointers */ InitShmemAccess(UsedShmemSegAddr); @@ -5431,7 +5525,7 @@ StartChildProcess(AuxProcType type) av[ac++] = "postgres"; #ifdef EXEC_BACKEND - av[ac++] = "--forkboot"; + av[ac++] = (type == BaseBackupProcess) ? "--forkbasebackup" : "--forkboot"; av[ac++] = NULL; /* filled in by postmaster_forkexec */ #endif @@ -5475,6 +5569,10 @@ StartChildProcess(AuxProcType type) ereport(LOG, (errmsg("could not fork startup process: %m"))); break; + case BaseBackupProcess: + ereport(LOG, + (errmsg("could not fork base backup process: %m"))); + break; case BgWriterProcess: ereport(LOG, (errmsg("could not fork background writer process: %m"))); @@ -5616,7 +5714,7 @@ static void MaybeStartWalReceiver(void) { if (WalReceiverPID == 0 && - (pmState == PM_STARTUP || pmState == PM_RECOVERY || + (pmState == PM_INIT || pmState == PM_STARTUP || pmState == PM_RECOVERY || pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY) && Shutdown == NoShutdown) { diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c index d0f210de8c..388525500e 100644 --- a/src/backend/replication/basebackup.c +++ b/src/backend/replication/basebackup.c @@ -29,6 +29,7 @@ #include "port.h" #include "postmaster/syslogger.h" #include "replication/basebackup.h" +#include "replication/walreceiver.h" #include "replication/walsender.h" #include "replication/walsender_private.h" #include "storage/bufpage.h" @@ -38,6 +39,7 @@ #include "storage/ipc.h" #include "storage/reinit.h" #include "utils/builtins.h" +#include "utils/guc.h" #include "utils/ps_status.h" #include "utils/relcache.h" #include "utils/timestamp.h" @@ -123,6 +125,9 @@ static long long int total_checksum_failures; /* Do not verify checksums. */ static bool noverify_checksums = false; +/* Do not copy config files. */ +static bool exclude_conf = false; + /* * The contents of these directories are removed or recreated during server * start so they are not included in backups. The directories themselves are @@ -652,6 +657,7 @@ parse_basebackup_options(List *options, basebackup_options *opt) bool o_maxrate = false; bool o_tablespace_map = false; bool o_noverify_checksums = false; + bool o_exclude_conf = false; MemSet(opt, 0, sizeof(*opt)); foreach(lopt, options) @@ -740,6 +746,15 @@ parse_basebackup_options(List *options, basebackup_options *opt) noverify_checksums = true; o_noverify_checksums = true; } + else if (strcmp(defel->defname, "exclude_conf") == 0) + { + if (o_exclude_conf) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("duplicate option \"%s\"", defel->defname))); + exclude_conf = true; + o_exclude_conf = true; + } else elog(ERROR, "option \"%s\" not recognized", defel->defname); @@ -1149,6 +1164,18 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces, continue; } + if (exclude_conf) + { + char *dot = strrchr(de->d_name, '.'); + if (dot && strcmp(dot, ".conf") == 0) + { + elog(DEBUG2, + "configuration file \"%s\" excluded from backup", + de->d_name); + continue; + } + } + snprintf(pathbuf, sizeof(pathbuf), "%s/%s", path, de->d_name); /* Skip pg_control here to back up it last */ @@ -1743,3 +1770,46 @@ throttle(size_t increment) */ throttled_last = GetCurrentTimestamp(); } + + +/* + * base backup worker process (client) main function + */ +void +BaseBackupMain(void) +{ + WalReceiverConn *wrconn = NULL; + char *err; + TimeLineID primaryTLI; + uint64 primary_sysid; + + /* Load the libpq-specific functions */ + load_file("libpqwalreceiver", false); + if (WalReceiverFunctions == NULL) + elog(ERROR, "libpqwalreceiver didn't initialize correctly"); + + /* Establish the connection to the primary */ + wrconn = walrcv_connect(PrimaryConnInfo, false, cluster_name[0] ? cluster_name : "basebackup", &err); + if (!wrconn) + ereport(ERROR, + (errmsg("could not connect to the primary server: %s", err))); + + /* + * Get the remote sysid and stick it into the local control file, so that + * the walreceiver is happy. The control file will later be overwritten + * by the base backup. + */ + primary_sysid = strtoull(walrcv_identify_system(wrconn, &primaryTLI), NULL, 10); + InitControlFile(primary_sysid); + WriteControlFile(); + + walrcv_base_backup(wrconn); + + walrcv_disconnect(wrconn); + + SyncDataDirectory(false, ERROR); + + ereport(LOG, + (errmsg("base backup completed"))); + proc_exit(0); +} diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c index 6eba08a920..e45bce830f 100644 --- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c +++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c @@ -17,8 +17,14 @@ #include "postgres.h" #include <unistd.h> +#include <sys/stat.h> #include <sys/time.h> +#ifdef USE_SYSTEMD +#include <systemd/sd-daemon.h> +#endif + +#include "common/string.h" #include "libpq-fe.h" #include "pqexpbuffer.h" #include "access/xlog.h" @@ -27,10 +33,13 @@ #include "mb/pg_wchar.h" #include "miscadmin.h" #include "pgstat.h" +#include "pgtar.h" #include "replication/walreceiver.h" #include "utils/builtins.h" +#include "utils/guc.h" #include "utils/memutils.h" #include "utils/pg_lsn.h" +#include "utils/ps_status.h" #include "utils/tuplestore.h" PG_MODULE_MAGIC; @@ -61,6 +70,7 @@ static int libpqrcv_server_version(WalReceiverConn *conn); static void libpqrcv_readtimelinehistoryfile(WalReceiverConn *conn, TimeLineID tli, char **filename, char **content, int *len); +static void libpqrcv_base_backup(WalReceiverConn *conn); static bool libpqrcv_startstreaming(WalReceiverConn *conn, const WalRcvStreamOptions *options); static void libpqrcv_endstreaming(WalReceiverConn *conn, @@ -88,6 +98,7 @@ static WalReceiverFunctionsType PQWalReceiverFunctions = { libpqrcv_identify_system, libpqrcv_server_version, libpqrcv_readtimelinehistoryfile, + libpqrcv_base_backup, libpqrcv_startstreaming, libpqrcv_endstreaming, libpqrcv_receive, @@ -356,6 +367,395 @@ libpqrcv_server_version(WalReceiverConn *conn) return PQserverVersion(conn->streamConn); } +/* + * XXX copied from pg_basebackup.c + */ + +unsigned long long totaldone; +unsigned long long totalsize_kb; +int tablespacenum; +int tablespacecount; + +static void +base_backup_report_progress(void) +{ + int percent; + char *progress; + + percent = totalsize_kb ? (int) ((totaldone / 1024) * 100 / totalsize_kb) : 0; + + /* + * Avoid overflowing past 100% or the full size. This may make the total + * size number change as we approach the end of the backup (the estimate + * will always be wrong if WAL is included), but that's better than having + * the done column be bigger than the total. + */ + if (percent > 100) + percent = 100; + if (totaldone / 1024 > totalsize_kb) + totalsize_kb = totaldone / 1024; + + /* Note: no translation of ps status */ + progress = psprintf((tablespacecount == 1 ? + "%llu/%llu kB (%d%%), %d/%d tablespace" : + "%llu/%llu kB (%d%%), %d/%d tablespaces"), + totaldone / 1024, + totalsize_kb, + percent, + tablespacenum, + tablespacecount); + + set_ps_display(progress, false); +#ifdef USE_SYSTEMD + sd_pid_notifyf(PostmasterPid, 0, "STATUS=base backup %s", progress); +#endif + + pfree(progress); +} + +static void +ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res) +{ + char current_path[MAXPGPATH]; + char filename[MAXPGPATH]; + pgoff_t current_len_left = 0; + int current_padding = 0; + char *copybuf = NULL; + FILE *file = NULL; + off_t flush_offset; + + strlcpy(current_path, DataDir, sizeof(current_path)); + + /* + * Get the COPY data + */ + res = PQgetResult(conn); + if (PQresultStatus(res) != PGRES_COPY_OUT) + ereport(ERROR, + (errmsg("could not get COPY data stream: %s", + PQerrorMessage(conn)))); + + while (1) + { + int r; + + if (copybuf != NULL) + { + PQfreemem(copybuf); + copybuf = NULL; + } + + r = PQgetCopyData(conn, ©buf, 0); + + if (r == -1) + { + /* + * End of chunk + */ + if (file) + fclose(file); + + break; + } + else if (r == -2) + { + ereport(ERROR, + (errmsg("could not read COPY data: %s", + PQerrorMessage(conn)))); + } + + if (file == NULL) + { + int filemode; + + /* + * No current file, so this must be the header for a new file + */ + if (r != 512) + ereport(ERROR, + (errmsg("invalid tar block header size: %d", r))); + + current_len_left = read_tar_number(©buf[124], 12); + + /* Set permissions on the file */ + filemode = read_tar_number(©buf[100], 8); + + /* + * All files are padded up to 512 bytes + */ + current_padding = + ((current_len_left + 511) & ~511) - current_len_left; + + /* + * First part of header is zero terminated filename + */ + snprintf(filename, sizeof(filename), "%s/%s", current_path, + copybuf); + if (filename[strlen(filename) - 1] == '/') + { + /* + * Ends in a slash means directory or symlink to directory + */ + if (copybuf[156] == '5') + { + /* + * Directory + */ + filename[strlen(filename) - 1] = '\0'; /* Remove trailing slash */ + if (MakePGDirectory(filename) != 0) + { + if (errno != EEXIST) + ereport(ERROR, + (errcode_for_file_access(), + errmsg("could not create directory \"%s\": %m", + filename))); + } +#ifndef WIN32 + if (chmod(filename, (mode_t) filemode)) + ereport(ERROR, + (errcode_for_file_access(), + errmsg("could not set permissions on directory \"%s\": %m", + filename))); +#endif + } + /* + * Symbolic link + */ + else if (copybuf[156] == '2') + { + /* TODO: tablespace mapping */ + const char *tblspc_path = ©buf[157]; + + filename[strlen(filename) - 1] = '\0'; /* Remove trailing slash */ + + if (symlink(tblspc_path, filename) != 0) + ereport(ERROR, + (errcode_for_file_access(), + errmsg("could not create symbolic link from \"%s\" to \"%s\": %m", + filename, tblspc_path))); + } + else + { + ereport(ERROR, + (errmsg("unrecognized link indicator \"%c\"", + copybuf[156]))); + } + continue; /* directory or link handled */ + } + + /* + * regular file + */ + file = fopen(filename, "wb"); + if (!file) + ereport(ERROR, + (errcode_for_file_access(), + (errmsg("could not create file \"%s\": %m", filename)))); + + flush_offset = 0; + +#ifndef WIN32 + if (chmod(filename, (mode_t) filemode)) + ereport(ERROR, + (errcode_for_file_access(), + (errmsg("could not set permissions on file \"%s\": %m", + filename)))); +#endif + + if (current_len_left == 0) + { + /* + * Done with this file, next one will be a new tar header + */ + fclose(file); + file = NULL; + continue; + } + } /* new file */ + else + { + /* + * Continuing blocks in existing file + */ + if (current_len_left == 0 && r == current_padding) + { + /* + * Received the padding block for this file, ignore it and + * close the file, then move on to the next tar header. + */ + fclose(file); + file = NULL; + continue; + } + + if (fwrite(copybuf, r, 1, file) != 1) + ereport(ERROR, + (errcode_for_file_access(), + errmsg("could not write to file \"%s\": %m", filename))); + + pg_flush_data(fileno(file), flush_offset, r); + flush_offset += r; + totaldone += r; + base_backup_report_progress(); + + current_len_left -= r; + if (current_len_left == 0 && current_padding == 0) + { + /* + * Received the last block, and there is no padding to be + * expected. Close the file and move on to the next tar + * header. + */ + fclose(file); + file = NULL; + continue; + } + } /* continuing data in existing file */ + } /* loop over all data blocks */ + base_backup_report_progress(); + + if (file != NULL) + ereport(ERROR, + (errmsg("COPY stream ended before last file was finished"))); + + if (copybuf != NULL) + PQfreemem(copybuf); +} + +/* + * Make base backup from remote and write to local disk. + */ +static void +libpqrcv_base_backup(WalReceiverConn *conn) +{ + StringInfoData stmt; + PGresult *res; + char xlogstart[64]; + TimeLineID starttli; + XLogRecPtr recptr; + bool error; + + ereport(LOG, + (errmsg("initiating base backup, waiting for remote checkpoint to complete"))); + set_ps_display("waiting for checkpoint", false); + + initStringInfo(&stmt); + appendStringInfo(&stmt, "BASE_BACKUP PROGRESS NOWAIT EXCLUDE_CONF"); + if (cluster_name && cluster_name[0]) + appendStringInfo(&stmt, " LABEL %s", quote_literal_cstr(cluster_name)); + + if (PQsendQuery(conn->streamConn, stmt.data) == 0) + ereport(ERROR, + (errmsg("could not start base backup on remote server: %s", + pchomp(PQerrorMessage(conn->streamConn))))); + + /* + * First result set: WAL start position and timeline ID + */ + res = PQgetResult(conn->streamConn); + if (PQresultStatus(res) != PGRES_TUPLES_OK) + { + PQclear(res); + ereport(ERROR, + (errmsg("could not start base backup on remote server: %s", + pchomp(PQerrorMessage(conn->streamConn))))); + } + if (PQntuples(res) != 1) + { + PQclear(res); + ereport(ERROR, + (errmsg("server returned unexpected response to BASE_BACKUP command; got %d rows and %d fields, expected %d rows and %d fields", + PQntuples(res), PQnfields(res), 1, 2))); + } + + ereport(LOG, + (errmsg("remote checkpoint completed"))); + + strlcpy(xlogstart, PQgetvalue(res, 0, 0), sizeof(xlogstart)); + starttli = atoi(PQgetvalue(res, 0, 1)); + PQclear(res); + elog(DEBUG1, "write-ahead log start point: %s on timeline %u", + xlogstart, starttli); + recptr = pg_lsn_in_internal(xlogstart, &error); + if (error) + elog(ERROR, "invalid LSN received: %s", xlogstart); + + /* + * Second result set: tablespace information + */ + res = PQgetResult(conn->streamConn); + if (PQresultStatus(res) != PGRES_TUPLES_OK) + { + PQclear(res); + ereport(ERROR, + (errmsg("could not get backup header: %s", + pchomp(PQerrorMessage(conn->streamConn))))); + } + if (PQntuples(res) < 1) + { + PQclear(res); + ereport(ERROR, + (errmsg("no data returned from server"))); + } + + totalsize_kb = totaldone = 0; + tablespacecount = PQntuples(res); + for (int i = 0; i < PQntuples(res); i++) + { + totalsize_kb += atol(PQgetvalue(res, i, 2)); + } + + RequestXLogStreaming(starttli, recptr, PrimaryConnInfo, PrimarySlotName); + + /* + * Start receiving chunks + */ + for (int i = 0; i < PQntuples(res); i++) + { + tablespacenum = i; + ReceiveAndUnpackTarFile(conn->streamConn, res); + } + tablespacenum++; + base_backup_report_progress(); + + PQclear(res); + + /* + * Final result set: WAL end position and timeline ID + */ + res = PQgetResult(conn->streamConn); + if (PQresultStatus(res) != PGRES_TUPLES_OK) + { + PQclear(res); + ereport(ERROR, + (errmsg("could not get write-ahead log end position from server: %s", + pchomp(PQerrorMessage(conn->streamConn))))); + } + if (PQntuples(res) != 1) + { + PQclear(res); + ereport(ERROR, + (errmsg("no write-ahead log end position returned from server"))); + } + PQclear(res); + + res = PQgetResult(conn->streamConn); + if (PQresultStatus(res) != PGRES_COMMAND_OK) + { + const char *sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE); + + if (sqlstate && + strcmp(sqlstate, "XX001" /*ERRCODE_DATA_CORRUPTED*/) == 0) + ereport(ERROR, + (errmsg("checksum error occurred"))); + else + ereport(ERROR, + (errmsg("final receive failed: %s", + pchomp(PQerrorMessage(conn->streamConn))))); + } + PQclear(res); +} + /* * Start streaming WAL data from given streaming options. * diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y index c4e11cc4e8..8c962bc711 100644 --- a/src/backend/replication/repl_gram.y +++ b/src/backend/replication/repl_gram.y @@ -78,6 +78,7 @@ static SQLCmd *make_sqlcmd(void); %token K_WAL %token K_TABLESPACE_MAP %token K_NOVERIFY_CHECKSUMS +%token K_EXCLUDE_CONF %token K_TIMELINE %token K_PHYSICAL %token K_LOGICAL @@ -154,8 +155,7 @@ var_name: IDENT { $$ = $1; } ; /* - * BASE_BACKUP [LABEL '<label>'] [PROGRESS] [FAST] [WAL] [NOWAIT] - * [MAX_RATE %d] [TABLESPACE_MAP] [NOVERIFY_CHECKSUMS] + * BASE_BACKUP [option]... */ base_backup: K_BASE_BACKUP base_backup_opt_list @@ -214,6 +214,11 @@ base_backup_opt: $$ = makeDefElem("noverify_checksums", (Node *)makeInteger(true), -1); } + | K_EXCLUDE_CONF + { + $$ = makeDefElem("exclude_conf", + (Node *)makeInteger(true), -1); + } ; create_replication_slot: diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l index 380faeb5f6..6a2d8d142b 100644 --- a/src/backend/replication/repl_scanner.l +++ b/src/backend/replication/repl_scanner.l @@ -93,6 +93,7 @@ MAX_RATE { return K_MAX_RATE; } WAL { return K_WAL; } TABLESPACE_MAP { return K_TABLESPACE_MAP; } NOVERIFY_CHECKSUMS { return K_NOVERIFY_CHECKSUMS; } +EXCLUDE_CONF { return K_EXCLUDE_CONF; } TIMELINE { return K_TIMELINE; } START_REPLICATION { return K_START_REPLICATION; } CREATE_REPLICATION_SLOT { return K_CREATE_REPLICATION_SLOT; } diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c index fe2bb8f859..8d2e971ff7 100644 --- a/src/backend/storage/file/fd.c +++ b/src/backend/storage/file/fd.c @@ -3117,21 +3117,14 @@ looks_like_temp_rel_name(const char *name) * Other symlinks are presumed to point at files we're not responsible * for fsyncing, and might not have privileges to write at all. * - * Errors are logged but not considered fatal; that's because this is used - * only during database startup, to deal with the possibility that there are - * issued-but-unsynced writes pending against the data directory. We want to - * ensure that such writes reach disk before anything that's done in the new - * run. However, aborting on error would result in failure to start for - * harmless cases such as read-only files in the data directory, and that's - * not good either. - * - * Note that if we previously crashed due to a PANIC on fsync(), we'll be - * rewriting all changes again during recovery. + * If pre_sync is true, issue flush requests to the kernel before starting the + * actual fsync calls. This can be skipped if the caller has already done it + * itself. * * Note we assume we're chdir'd into PGDATA to begin with. */ void -SyncDataDirectory(void) +SyncDataDirectory(bool pre_sync, int loglevel) { bool xlog_is_symlink; @@ -3150,7 +3143,7 @@ SyncDataDirectory(void) struct stat st; if (lstat("pg_wal", &st) < 0) - ereport(LOG, + ereport(loglevel, (errcode_for_file_access(), errmsg("could not stat file \"%s\": %m", "pg_wal"))); @@ -3164,15 +3157,18 @@ SyncDataDirectory(void) /* * If possible, hint to the kernel that we're soon going to fsync the data - * directory and its contents. Errors in this step are even less + * directory and its contents. Errors in this step are less * interesting than normal, so log them only at DEBUG1. */ + if (pre_sync) + { #ifdef PG_FLUSH_DATA_WORKS - walkdir(".", pre_sync_fname, false, DEBUG1); - if (xlog_is_symlink) - walkdir("pg_wal", pre_sync_fname, false, DEBUG1); - walkdir("pg_tblspc", pre_sync_fname, true, DEBUG1); + walkdir(".", pre_sync_fname, false, DEBUG1); + if (xlog_is_symlink) + walkdir("pg_wal", pre_sync_fname, false, DEBUG1); + walkdir("pg_tblspc", pre_sync_fname, true, DEBUG1); #endif + } /* * Now we do the fsync()s in the same order. @@ -3183,10 +3179,10 @@ SyncDataDirectory(void) * in pg_tblspc, they'll get fsync'd twice. That's not an expected case * so we don't worry about optimizing it. */ - walkdir(".", datadir_fsync_fname, false, LOG); + walkdir(".", datadir_fsync_fname, false, loglevel); if (xlog_is_symlink) - walkdir("pg_wal", datadir_fsync_fname, false, LOG); - walkdir("pg_tblspc", datadir_fsync_fname, true, LOG); + walkdir("pg_wal", datadir_fsync_fname, false, loglevel); + walkdir("pg_tblspc", datadir_fsync_fname, true, loglevel); } /* diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c index 88a261d9bd..4722ad2107 100644 --- a/src/bin/initdb/initdb.c +++ b/src/bin/initdb/initdb.c @@ -136,6 +136,7 @@ static char *pwfilename = NULL; static char *superuser_password = NULL; static const char *authmethodhost = NULL; static const char *authmethodlocal = NULL; +static bool replica = false; static bool debug = false; static bool noclean = false; static bool do_sync = true; @@ -2938,6 +2939,22 @@ initialize_data_directory(void) /* Now create all the text config files */ setup_config(); + /* + * If data directory for replica requested, write basebackup.signal, and + * then we are done here. + */ + if (replica) + { + char *path; + char *lines[1] = {NULL}; + + path = psprintf("%s/basebackup.signal", pg_data); + writefile(path, lines); + free(path); + + return; + } + /* Bootstrap template1 */ bootstrap_template1(); @@ -3029,6 +3046,7 @@ main(int argc, char *argv[]) {"wal-segsize", required_argument, NULL, 12}, {"data-checksums", no_argument, NULL, 'k'}, {"allow-group-access", no_argument, NULL, 'g'}, + {"replica", no_argument, NULL, 'r'}, {NULL, 0, NULL, 0} }; @@ -3070,7 +3088,7 @@ main(int argc, char *argv[]) /* process command-line options */ - while ((c = getopt_long(argc, argv, "dD:E:kL:nNU:WA:sST:X:g", long_options, &option_index)) != -1) + while ((c = getopt_long(argc, argv, "dD:E:kL:nNrU:WA:sST:X:g", long_options, &option_index)) != -1) { switch (c) { @@ -3116,6 +3134,9 @@ main(int argc, char *argv[]) case 'N': do_sync = false; break; + case 'r': + replica = true; + break; case 'S': sync_only = true; break; @@ -3337,9 +3358,19 @@ main(int argc, char *argv[]) /* translator: This is a placeholder in a shell command. */ appendPQExpBuffer(start_db_cmd, " -l %s start", _("logfile")); - printf(_("\nSuccess. You can now start the database server using:\n\n" - " %s\n\n"), - start_db_cmd->data); + if (!replica) + { + printf(_("\nSuccess. You can now start the database server using:\n\n" + " %s\n\n"), + start_db_cmd->data); + } + else + { + printf(_("\nSo far so good. Now configure the replication connection in\n" + "postgresql.conf, and then start the database server using:\n\n" + " %s\n\n"), + start_db_cmd->data); + } destroyPQExpBuffer(start_db_cmd); diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c index c4ee0168a9..1acf9353d1 100644 --- a/src/bin/pg_resetwal/pg_resetwal.c +++ b/src/bin/pg_resetwal/pg_resetwal.c @@ -76,7 +76,7 @@ static int WalSegSz; static int set_wal_segsize; static void CheckDataVersion(void); -static bool ReadControlFile(void); +static bool read_controlfile(void); static void GuessControlValues(void); static void PrintControlValues(bool guessed); static void PrintNewControlValues(void); @@ -393,7 +393,7 @@ main(int argc, char *argv[]) /* * Attempt to read the existing pg_control file */ - if (!ReadControlFile()) + if (!read_controlfile()) GuessControlValues(); /* @@ -578,7 +578,7 @@ CheckDataVersion(void) * to the current format. (Currently we don't do anything of the sort.) */ static bool -ReadControlFile(void) +read_controlfile(void) { int fd; int len; diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h index d519252aad..d0d5968dcf 100644 --- a/src/include/access/xlog.h +++ b/src/include/access/xlog.h @@ -127,8 +127,8 @@ extern char *archiveCleanupCommand; extern bool recoveryTargetInclusive; extern int recoveryTargetAction; extern int recovery_min_apply_delay; -extern char *PrimaryConnInfo; -extern char *PrimarySlotName; +extern PGDLLIMPORT char *PrimaryConnInfo; +extern PGDLLIMPORT char *PrimarySlotName; /* indirectly set via GUC system */ extern TransactionId recoveryTargetXid; @@ -299,6 +299,9 @@ extern Size XLOGShmemSize(void); extern void XLOGShmemInit(void); extern void BootStrapXLOG(void); extern void LocalProcessControlFile(bool reset); +extern void InitControlFile(uint64 sysidentifier); +extern void WriteControlFile(void); +extern void ReadControlFile(void); extern void StartupXLOG(void); extern void ShutdownXLOG(int code, Datum arg); extern void InitXLOGAccess(void); @@ -354,6 +357,7 @@ extern void do_pg_abort_backup(void); extern SessionBackupState get_backup_status(void); /* File path names (all relative to $PGDATA) */ +#define BASEBACKUP_SIGNAL_FILE "basebackup.signal" #define RECOVERY_SIGNAL_FILE "recovery.signal" #define STANDBY_SIGNAL_FILE "standby.signal" #define BACKUP_LABEL_FILE "backup_label" diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h index bc6e03fbc7..75efc3cf5f 100644 --- a/src/include/miscadmin.h +++ b/src/include/miscadmin.h @@ -398,6 +398,7 @@ typedef enum CheckerProcess = 0, BootstrapProcess, StartupProcess, + BaseBackupProcess, BgWriterProcess, CheckpointerProcess, WalWriterProcess, @@ -410,6 +411,7 @@ extern AuxProcType MyAuxProcType; #define AmBootstrapProcess() (MyAuxProcType == BootstrapProcess) #define AmStartupProcess() (MyAuxProcType == StartupProcess) +#define AmBaseBackupProcess() (MyAuxProcType == BaseBackupProcess) #define AmBackgroundWriterProcess() (MyAuxProcType == BgWriterProcess) #define AmCheckpointerProcess() (MyAuxProcType == CheckpointerProcess) #define AmWalWriterProcess() (MyAuxProcType == WalWriterProcess) diff --git a/src/include/pgstat.h b/src/include/pgstat.h index fe076d823d..6b6a06ced8 100644 --- a/src/include/pgstat.h +++ b/src/include/pgstat.h @@ -721,6 +721,7 @@ typedef enum BackendType B_AUTOVAC_LAUNCHER, B_AUTOVAC_WORKER, B_BACKEND, + B_BASE_BACKUP, B_BG_WORKER, B_BG_WRITER, B_CHECKPOINTER, diff --git a/src/include/replication/basebackup.h b/src/include/replication/basebackup.h index 503a5b9f0b..480165c51c 100644 --- a/src/include/replication/basebackup.h +++ b/src/include/replication/basebackup.h @@ -33,4 +33,6 @@ extern void SendBaseBackup(BaseBackupCmd *cmd); extern int64 sendTablespace(char *path, bool sizeonly); +extern void BaseBackupMain(void); + #endif /* _BASEBACKUP_H */ diff --git a/src/include/replication/walreceiver.h b/src/include/replication/walreceiver.h index e12a934966..835c0b8214 100644 --- a/src/include/replication/walreceiver.h +++ b/src/include/replication/walreceiver.h @@ -214,6 +214,7 @@ typedef void (*walrcv_readtimelinehistoryfile_fn) (WalReceiverConn *conn, TimeLineID tli, char **filename, char **content, int *size); +typedef void (*walrcv_base_backup_fn) (WalReceiverConn *conn); typedef bool (*walrcv_startstreaming_fn) (WalReceiverConn *conn, const WalRcvStreamOptions *options); typedef void (*walrcv_endstreaming_fn) (WalReceiverConn *conn, @@ -241,6 +242,7 @@ typedef struct WalReceiverFunctionsType walrcv_identify_system_fn walrcv_identify_system; walrcv_server_version_fn walrcv_server_version; walrcv_readtimelinehistoryfile_fn walrcv_readtimelinehistoryfile; + walrcv_base_backup_fn walrcv_base_backup; walrcv_startstreaming_fn walrcv_startstreaming; walrcv_endstreaming_fn walrcv_endstreaming; walrcv_receive_fn walrcv_receive; @@ -266,6 +268,8 @@ extern PGDLLIMPORT WalReceiverFunctionsType *WalReceiverFunctions; WalReceiverFunctions->walrcv_server_version(conn) #define walrcv_readtimelinehistoryfile(conn, tli, filename, content, size) \ WalReceiverFunctions->walrcv_readtimelinehistoryfile(conn, tli, filename, content, size) +#define walrcv_base_backup(conn) \ + WalReceiverFunctions->walrcv_base_backup(conn) #define walrcv_startstreaming(conn, options) \ WalReceiverFunctions->walrcv_startstreaming(conn, options) #define walrcv_endstreaming(conn, next_tli) \ diff --git a/src/include/storage/fd.h b/src/include/storage/fd.h index 625fbc386a..1c57ad901f 100644 --- a/src/include/storage/fd.h +++ b/src/include/storage/fd.h @@ -148,7 +148,7 @@ extern void fsync_fname(const char *fname, bool isdir); extern int durable_rename(const char *oldfile, const char *newfile, int loglevel); extern int durable_unlink(const char *fname, int loglevel); extern int durable_link_or_rename(const char *oldfile, const char *newfile, int loglevel); -extern void SyncDataDirectory(void); +extern void SyncDataDirectory(bool pre_sync, int loglevel); extern int data_sync_elevel(int elevel); /* Filename components */ diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h index 6791e0cbc2..2e12330b00 100644 --- a/src/include/utils/guc.h +++ b/src/include/utils/guc.h @@ -259,7 +259,7 @@ extern int temp_file_limit; extern int num_temp_buffers; -extern char *cluster_name; +extern PGDLLIMPORT char *cluster_name; extern PGDLLIMPORT char *ConfigFileName; extern char *HbaFileName; extern char *IdentFileName; diff --git a/src/test/recovery/t/018_basebackup.pl b/src/test/recovery/t/018_basebackup.pl new file mode 100644 index 0000000000..99731fc388 --- /dev/null +++ b/src/test/recovery/t/018_basebackup.pl @@ -0,0 +1,29 @@ +# Test basebackup worker functionality +use strict; +use warnings; +use PostgresNode; +use TestLib; +use Test::More tests => 2; + +my $node1 = get_new_node('node1'); +$node1->init(allows_streaming => 1); +$node1->start; + +$node1->safe_psql('postgres', + "CREATE TABLE tab_int AS SELECT generate_series(1,1000) AS a"); + +my $node2 = get_new_node('node2'); +$node2->init(allows_streaming => 1, extra => [ '--replica' ]); +$node2->append_conf('postgresql.conf', "primary_conninfo = '" . $node1->connstr . "'"); +my $old_mtime = (stat($node2->data_dir . '/postgresql.conf'))[9]; +$node2->start; + +$node1->wait_for_catchup($node2, 'replay', $node1->lsn('insert')); + +is($node2->safe_psql('postgres', "SELECT count(*) FROM tab_int"), + qq(1000), + 'check content of standby'); + +my $new_mtime = (stat($node2->data_dir . '/postgresql.conf'))[9]; +is($new_mtime, $old_mtime, + 'configuration files were not copied'); base-commit: 61ecea45e50bcd3b87d4e905719e63e41d6321ce -- 2.23.0