On 2015-01-19 17:16:11 +0900, Michael Paquier wrote:
> On Mon, Jan 19, 2015 at 4:10 PM, Michael Paquier
> <[email protected]> wrote:
> > On Sat, Jan 17, 2015 at 2:44 AM, Andres Freund <[email protected]>
> > wrote:
> >> Not this patch's fault, but I'm getting a bit tired seeing the above open
> >> coded. How about adding a function that does the sleeping based on a
> >> timestamptz and a ms interval?
> > You mean in plugins, right? I don't recall seeing similar patterns in
> > other code paths of backend. But I think that we can use something
> > like that in timestamp.c then because we need to leverage that between
> > two timestamps, the last failure and now():
> > TimestampSleepDifference(start_time, stop_time, internal_ms);
> > Perhaps you have something else in mind?
> >
> > Attached is an updated patch.
> Actually I came with better than last patch by using a boolean flag as
> return value of TimestampSleepDifference and use
> TimestampDifferenceExceeds directly inside it.
> Subject: [PATCH] Add wal_availability_check_interval to control WAL fetching
> on failure
I think that name isn't a very good. And its isn't very accurate
either.
How about wal_retrieve_retry_interval? Not very nice, but I think it's
still better than the above.
> + <varlistentry id="wal-availability-check-interval"
> xreflabel="wal_availability_check_interval">
> + <term><varname>wal_availability_check_interval</varname>
> (<type>integer</type>)
> + <indexterm>
> + <primary><varname>wal_availability_check_interval</> recovery
> parameter</primary>
> + </indexterm>
> + </term>
> + <listitem>
> + <para>
> + This parameter specifies the amount of time to wait when
> + WAL is not available for a node in recovery. Default value is
> + <literal>5s</>.
> + </para>
> + <para>
> + A node in recovery will wait for this amount of time if
> + <varname>restore_command</> returns nonzero exit status code when
> + fetching new WAL segment files from archive or when a WAL receiver
> + is not able to fetch a WAL record when using streaming replication.
> + </para>
> + </listitem>
> + </varlistentry>
> +
> </variablelist>
Walreceiver doesn't wait that amount, but rather how long the connection
is intact. And restore_command may or may not retry.
> /*-------
> * Standby mode is implemented by a state machine:
> @@ -10490,15 +10511,13 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool
> randAccess,
> * machine, so we've exhausted all the
> options for
> * obtaining the requested WAL. We're
> going to loop back
> * and retry from the archive, but if
> it hasn't been long
> - * since last attempt, sleep 5 seconds
> to avoid
> - * busy-waiting.
> + * since last attempt, sleep the amount
> of time specified
> + * by wal_availability_check_interval
> to avoid busy-waiting.
> */
> - now = (pg_time_t) time(NULL);
> - if ((now - last_fail_time) < 5)
> - {
> - pg_usleep(1000000L * (5 - (now
> - last_fail_time)));
> - now = (pg_time_t) time(NULL);
> - }
> + now = GetCurrentTimestamp();
> + if
> (TimestampSleepDifference(last_fail_time, now,
> +
> wal_availability_check_interval))
> + now = GetCurrentTimestamp();
Not bad, that's much easier to read imo.
Greetings,
Andres Freund
--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers