Re: [HACKERS] Measuring replay lag

Thomas Munro Wed, 22 Feb 2017 22:24:09 -0800

On Tue, Feb 21, 2017 at 6:21 PM, Simon Riggs <[email protected]> wrote:
> I think what we need to show some test results with the graph of lag
> over time for these cases:
> 1. steady state - pgbench on master, so we can see how that responds
> 2. blocked apply on standby - so we can see how the lag increases but
> also how the accuracy goes down as the lag increases and whether the
> reported value changes (depending upon algo)
> 3. burst mode - where we go from not moving to moving at high speed
> and then stop again quickly
> +other use cases you or others add


Here are graphs of the 'burst' example from my previous email, with
LAG_TRACKER_BUFFER_SIZE set to 4 (really small so that it fills up)
and 8192 (the size I'm proposing in this patch).  It looks like the
resampling and interpolation work pretty well to me when the buffer is
full.

The overall graph looks pretty similar, but it is more likely to short
hiccups caused by occasional slow WAL fsyncs in walreceiver.   See the
attached graphs with 'spike' in the name: in the large buffer version
we see a short spike in write/flush lag and that results in apply
falling behind, but in the small buffer version we can only guess that
that might have happened because apply fell behind during the 3rd and
4th write bursts.  We don't know exactly why because we didn't have
sufficient samples to detect a short lived write/flush delay.

The workload just does this in a loop:

  DROP TABLE IF EXISTS foo;
  CREATE TABLE foo AS SELECT generate_series(1, 10000000);
  SELECT pg_sleep(10);

While testing with a small buffer I found a thinko when write_head is
moved back, fixed in the attached.

-- 
Thomas Munro
http://www.enterprisedb.com

replication-lag-v4.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Measuring replay lag

Reply via email to