On Mon, Feb 9, 2015 at 8:29 PM, Fujii Masao <masao.fu...@gmail.com> wrote:
> On Sun, Feb 8, 2015 at 2:54 PM, Michael Paquier
> <michael.paqu...@gmail.com> wrote:
>> On Fri, Feb 6, 2015 at 4:58 PM, Fujii Masao wrote:
>>> -                     * Wait for more WAL to arrive. Time out after 5 seconds,
>>> -                     * like when polling the archive, to react to a trigger
>>> -                     * file promptly.
>>> +                     * Wait for more WAL to arrive. Time out after the amount of
>>> +                     * time specified by wal_retrieve_retry_interval, like
>>> +                     * when polling the archive, to react to a trigger file promptly.
>>>                       */
>>>                      WaitLatch(&XLogCtl->recoveryWakeupLatch,
>>>                                WL_LATCH_SET | WL_TIMEOUT,
>>> -                              5000L);
>>> +                              wal_retrieve_retry_interval * 1000L);
>>>
>>> This change can prevent the startup process from reacting to
>>> a trigger file. Imagine the case where the large interval is set
>>> and the user want to promote the standby by using the trigger file
>>> instead of pg_ctl promote. I think that the sleep time should be 5s
>>> if the interval is set to more than 5s. Thought?
>>
>> I disagree here. It is interesting to accelerate the check of WAL
>> availability from a source in some cases for replication, but the
>> opposite is true as well as mentioned by Alexey at the beginning of
>> the thread to reduce the number of requests when requesting WAL
>> archives from an external storage type AWS. Hence a correct solution
>> would be to check periodically for the trigger file with a maximum
>> one-time wait of 5s to ensure backward-compatible behavior. We could
>> reduce it to 1s or something like that as well.
>
> You seem to have misunderstood the code in question. Or I'm missing something.
> The timeout of the WaitLatch is just the interval to check for the trigger file
> while waiting for more WAL to arrive from streaming replication. Not related to
> the retry time to restore WAL from the archive.

[Re-reading the code...]

Aah.. Yes, you are right. Sorry for the noise. Let's wait for a maximum
of 5s then. I also noticed that in the previous patch the wait was capped
at 5s. To begin with, a loop should have been used if it was a sleep, but
now that the patch uses a latch, this limit does not make much sense...

Updated patch attached.

Regards,
--
Michael

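For illustration only (this is not part of the attached patch): the cap
discussed above amounts to clamping the latch timeout used while waiting
for streamed WAL, so that a trigger file is still noticed within 5s even
when a large interval is configured. A minimal sketch, assuming the
interval is expressed in milliseconds; streaming_wait_ms and
MAX_TRIGGER_CHECK_MS are hypothetical names that do not exist in the
PostgreSQL tree:

```c
#include <assert.h>

/* Hypothetical cap (ms) on a single wait while streaming. */
#define MAX_TRIGGER_CHECK_MS 5000L

/*
 * Return the latch timeout to use while waiting for streamed WAL:
 * the configured retry interval, but never more than the cap, so a
 * trigger file is checked at least every MAX_TRIGGER_CHECK_MS.
 */
static long
streaming_wait_ms(int retry_interval_ms)
{
    /* The GUC in the patch has a minimum of 1 (ms). */
    assert(retry_interval_ms >= 1);

    if (retry_interval_ms > MAX_TRIGGER_CHECK_MS)
        return MAX_TRIGGER_CHECK_MS;
    return (long) retry_interval_ms;
}
```

With this shape, an interval of 1s waits 1s per iteration, while an
interval of 60s still wakes up every 5s to look for the trigger file.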
From 9b7e3bb32c744b328a0d99db3040cadfcba606aa Mon Sep 17 00:00:00 2001
From: Michael Paquier <mich...@otacoo.com>
Date: Mon, 19 Jan 2015 16:08:48 +0900
Subject: [PATCH] Add wal_retrieve_retry_interval

This parameter helps control how often WAL availability is checked when
a node is in recovery, particularly after successive failures to fetch
WAL from the archive or from a streaming source.

Default value is 5s.
---
 doc/src/sgml/config.sgml                      | 17 ++++++++++
 src/backend/access/transam/xlog.c             | 46 +++++++++++++++++++--------
 src/backend/utils/misc/guc.c                  | 12 +++++++
 src/backend/utils/misc/postgresql.conf.sample |  3 ++
 src/include/access/xlog.h                     |  1 +
 5 files changed, 66 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 6bcb106..d82b26a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2985,6 +2985,23 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-wal-retrieve-retry-interval" xreflabel="wal_retrieve_retry_interval">
+      <term><varname>wal_retrieve_retry_interval</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>wal_retrieve_retry_interval</> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specify the amount of time to wait when WAL is not available from
+        any sources (streaming replication, local <filename>pg_xlog</> or
+        WAL archive) before retrying to retrieve WAL. This parameter can
+        only be set in the <filename>postgresql.conf</> file or on the
+        server command line. The default value is 5 seconds.
+       </para>
+      </listitem>
+     </varlistentry>
+
      </variablelist>
     </sect2>
    </sect1>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 629a457..1f9c3c4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -93,6 +93,7 @@ int         sync_method = DEFAULT_SYNC_METHOD;
 int         wal_level = WAL_LEVEL_MINIMAL;
 int         CommitDelay = 0;    /* precommit delay in microseconds */
 int         CommitSiblings = 5; /* # concurrent xacts needed to sleep */
+int         wal_retrieve_retry_interval = 5000;
 
 #ifdef WAL_DEBUG
 bool        XLOG_DEBUG = false;
@@ -10340,8 +10341,8 @@ static bool
 WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
                             bool fetching_ckpt, XLogRecPtr tliRecPtr)
 {
-    static pg_time_t last_fail_time = 0;
-    pg_time_t   now;
+    TimestampTz now = GetCurrentTimestamp();
+    TimestampTz last_fail_time = now;
 
     /*-------
      * Standby mode is implemented by a state machine:
@@ -10351,7 +10352,9 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
      * 2. Check trigger file
      * 3. Read from primary server via walreceiver (XLOG_FROM_STREAM)
      * 4. Rescan timelines
-     * 5. Sleep 5 seconds, and loop back to 1.
+     * 5. Sleep the amount of time defined by wal_retrieve_retry_interval
+     *    while checking periodically for the presence of user-defined
+     *    trigger file and loop back to 1.
      *
      * Failure to read from the current source advances the state machine to
      * the next state.
@@ -10490,14 +10493,25 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
                      * machine, so we've exhausted all the options for
                      * obtaining the requested WAL. We're going to loop back
                      * and retry from the archive, but if it hasn't been long
-                     * since last attempt, sleep 5 seconds to avoid
-                     * busy-waiting.
+                     * since last attempt, sleep the amount of time specified
+                     * by wal_retrieve_retry_interval.
                      */
-                    now = (pg_time_t) time(NULL);
-                    if ((now - last_fail_time) < 5)
+                    now = GetCurrentTimestamp();
+                    if (!TimestampDifferenceExceeds(last_fail_time, now,
+                                                    wal_retrieve_retry_interval))
                     {
-                        pg_usleep(1000000L * (5 - (now - last_fail_time)));
-                        now = (pg_time_t) time(NULL);
+                        long        secs, wait_time;
+                        int         microsecs;
+                        TimestampDifference(last_fail_time, now, &secs, &microsecs);
+
+                        wait_time = wal_retrieve_retry_interval * 1000L -
+                            (1000000L * secs + 1L * microsecs);
+
+                        WaitLatch(&XLogCtl->recoveryWakeupLatch,
+                                  WL_LATCH_SET | WL_TIMEOUT,
+                                  wait_time / 1000);
+                        ResetLatch(&XLogCtl->recoveryWakeupLatch);
+                        now = GetCurrentTimestamp();
                     }
                     last_fail_time = now;
                     currentSource = XLOG_FROM_ARCHIVE;
@@ -10562,6 +10576,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
             case XLOG_FROM_STREAM:
                 {
                     bool        havedata;
+                    long        wait_time = wal_retrieve_retry_interval;
 
                     /*
                      * Check if WAL receiver is still active.
@@ -10653,13 +10668,18 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
                     }
 
                     /*
-                     * Wait for more WAL to arrive. Time out after 5 seconds,
-                     * like when polling the archive, to react to a trigger
-                     * file promptly.
+                     * Wait for more WAL to arrive. Time out after the amount
+                     * of time defined by wal_retrieve_retry_interval like when
+                     * polling the archive for no more than 5s to ensure quick
+                     * responsiveness of system.
                      */
+                    if (wal_retrieve_retry_interval >= 5000)
+                        wait_time = 5000;
+
+                    /* wait a bit */
                     WaitLatch(&XLogCtl->recoveryWakeupLatch,
                               WL_LATCH_SET | WL_TIMEOUT,
-                              5000L);
+                              wait_time * 1L);
                     ResetLatch(&XLogCtl->recoveryWakeupLatch);
                     break;
                 }
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 9572777..c4598ed 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2364,6 +2364,18 @@ static struct config_int ConfigureNamesInt[] =
     },
 
     {
+        {"wal_retrieve_retry_interval", PGC_SIGHUP, WAL_SETTINGS,
+            gettext_noop("Specifies the amount of time to wait when WAL is not "
+                         "available from a source."),
+            NULL,
+            GUC_UNIT_MS
+        },
+        &wal_retrieve_retry_interval,
+        5000, 1, INT_MAX,
+        NULL, NULL, NULL
+    },
+
+    {
         {"wal_segment_size", PGC_INTERNAL, PRESET_OPTIONS,
             gettext_noop("Shows the number of pages per write ahead log segment."),
             NULL,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index b053659..73e2bca 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -260,6 +260,9 @@
 #wal_receiver_timeout = 60s             # time that receiver waits for
                                         # communication from master
                                         # in milliseconds; 0 disables
+#wal_retrieve_retry_interval = 5s       # time to wait before retrying to
+                                        # retrieve WAL from a source (streaming
+                                        # replication, archive or local pg_xlog)
 
 
 #------------------------------------------------------------------------------
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 138deaf..be27a85 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -93,6 +93,7 @@ extern int  CheckPointSegments;
 extern int  wal_keep_segments;
 extern int  XLOGbuffers;
 extern int  XLogArchiveTimeout;
+extern int  wal_retrieve_retry_interval;
 extern bool XLogArchiveMode;
 extern char *XLogArchiveCommand;
 extern bool EnableHotStandby;
--
2.3.0
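
As a standalone illustration of the unit handling in the
XLOG_FROM_ARCHIVE branch of the patch (interval in milliseconds, elapsed
time in microseconds, latch timeout in milliseconds), here is a small
sketch. It is not taken from the patch: remaining_wait_ms is a
hypothetical helper, it receives the elapsed time as a single
microsecond count instead of calling TimestampDifference(), and it
clamps to zero on its own, whereas the patch relies on the preceding
TimestampDifferenceExceeds() check.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: given the configured retry interval (ms) and the
 * time elapsed since the last failure (us), return how many milliseconds
 * are left to wait, or 0 if the interval has already elapsed.
 */
static long
remaining_wait_ms(int retry_interval_ms, int64_t elapsed_us)
{
    /* Convert the interval to microseconds in 64 bits to avoid overflow. */
    int64_t     interval_us = (int64_t) retry_interval_ms * 1000;

    assert(retry_interval_ms >= 1);
    assert(elapsed_us >= 0);

    if (elapsed_us >= interval_us)
        return 0;

    /* Integer division truncates sub-millisecond remainders. */
    return (long) ((interval_us - elapsed_us) / 1000);
}
```

For example, with the default interval of 5000 ms and 1.5 s already
elapsed since the last failure, the helper returns 3500.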