On 26 October 2016 at 11:34, Thomas Munro <thomas.mu...@enterprisedb.com> wrote:

> It works by taking advantage of the { time, end-of-WAL } samples that
> sending servers already include in message headers to standbys.  That
> seems to provide a pretty good proxy for when the WAL was written, if
> you ignore messages where the LSN hasn't advanced.  The patch
> introduces a new GUC replay_lag_sample_interval, defaulting to 1s, to
> control how often the standby should record these timestamped LSNs
> into a small circular buffer.  When its recovery process eventually
> replays a timestamped LSN, the timestamp is sent back to the upstream
> server in a new reply message field.  The value visible in
> pg_stat_replication.replay_lag can then be updated by comparing with
> the current time.

Why not just send back the lag as calculated by max_standby_streaming_delay?
I.e. at the end of replay of each chunk record the current delay in
shmem, then send it back periodically.

If we have two methods of calculation it would be confusing.

Admittedly the approach here is the same one I advocated a some years
back when Robert and I were discussing time delayed standbys.

> Compared to the usual techniques people use to estimate replay lag,
> this approach has the following advantages:
> 1.  The lag is measured in time, not LSN difference.
> 2.  The lag time is computed using two observations of a single
> server's clock, so there is no clock skew.
> 3.  The lag is updated even between commits (during large data loads etc).

Yes, good reasons.

> In the previous version I was effectively showing the ping time
> between the servers during idle times when the standby was fully
> caught up because there was nothing happening.  I decided that was not
> useful information and that it's more newsworthy and interesting to
> see the estimated replay lag for the most recent real replayed
> activity, so I changed that.
> In the last thread[1], Robert Haas wrote:
>> Well, one problem with this is that you can't put a loop inside of a
>> spinlock-protected critical section.
> Fixed.
>> In general, I think this is a pretty reasonable way of attacking this
>> problem, but I'd say it's significantly under-commented.  Where should
>> someone go to get a general overview of this mechanism?  The answer is
>> not "at place XXX within the patch".  (I think it might merit some
>> more extensive documentation, too, although I'm not exactly sure what
>> that should look like.)
> I have added lots of comments.
>> When you overflow the buffer, you could thin in out in a smarter way,
>> like by throwing away every other entry instead of the oldest one.  I
>> guess you'd need to be careful how you coded that, though, because
>> replaying an entry with a timestamp invalidates some of the saved
>> entries without formally throwing them out.
> Done, by overwriting the newest sample rather than the oldest if the
> buffer is full.  That seems to give pretty reasonable degradation,
> effectively lowering the sampling rate, without any complicated buffer
> or rate management code.
>> Conceivably, 0002 could be split into two patches, one of which
>> computes "stupid replay lag" considering only records that naturally
>> carry timestamps, and a second adding the circular buffer to handle
>> the case where much time passes without finding such a record.
> I contemplated this but decided that it'd be best to use ONLY samples
> from walsender headers, and never use the time stamps from commit
> records for this.  If we use times from commit records, then a
> cascading sending server will not be able to compute the difference in
> time without introducing clock skew (not to mention the difficulty of
> combining timestamps from two sources if we try to do both).  I
> figured that it's better to have value that shows a cascading
> sender->standby->cascading sender round trip time that is free of
> clock skew, than a master->cascading sender->standby->cascading sender
> incomplete round trip that includes clock skew.
> By the same reasoning I decided against introducing a new periodic WAL
> record or field from the master to hold extra time stamps in between
> commits to do this, in favour of the buffered transient timestamp
> approach I took in this patch.  That said, I can see there are
> arguments for doing it with extra periodic WAL timestamps, if people
> don't think it'd be too invasive to mess with the WAL for this, and
> don't care about cascading standbys giving skewed readings.  One
> advantage would be that persistent WAL timestamps would still be able
> to provide lag estimates if a standby has been down for a while and
> was catching up, and this approach can't until it's caught up due to
> lack of buffered transient timestamps.  Thoughts?

-1 for adding anything to the WAL for this.

Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to