On 23 January 2017 at 11:56, Ivan Kartyshov <i.kartys...@postgrespro.ru> wrote:
> Thank you for reviews and suggested improvements.
> I rewrote patch to make it more stable.
> I've made a few changes:
> 1) WAITLSN now doesn't depend on snapshot
> 2) Check current replayed LSN rather than in xact_redo_commit
> 3) Add syntax WAITLSN_INFINITE '0/693FF800' - for infinite wait and
> WAITLSN_NO_WAIT '0/693FF800' for check if LSN was replayed as you
> 4) Reduce the count of loops with GUCs (WalRcvForceReply() which in 9.5
> doesn't exist).
> 5) Optimize the loop that sets latches.
> 6) Add two GUCs that help us configure the influence on StartupXLOG:
> count_waitlsn (denominator to check not each LSN)
> interval_waitlsn (Interval in milliseconds to additional LSN check)
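
A minimal sketch of how two such settings could throttle the replayed-LSN check, for anyone trying to picture it. This is an assumption and not the patch's code: the class name, the method name and the way the two limits are combined (check when either one is reached) are hypothetical; only the GUC names come from the list above.

```python
class ReplayCheckThrottle:
    """Decide, per replayed WAL record, whether to check waiting backends."""

    def __init__(self, count_waitlsn, interval_waitlsn_ms):
        self.count = count_waitlsn              # check at most every N records
        self.interval_ms = interval_waitlsn_ms  # ...or after this many ms
        self.records_seen = 0
        self.last_check_ms = 0

    def should_check(self, now_ms):
        """Call once per replayed record; True means 'do the LSN check now'."""
        self.records_seen += 1
        due_by_count = self.records_seen >= self.count
        due_by_time = now_ms - self.last_check_ms >= self.interval_ms
        if due_by_count or due_by_time:
            # Reset both limits so the next check is throttled again.
            self.records_seen = 0
            self.last_check_ms = now_ms
            return True
        return False
```

With count_waitlsn = 3 and interval_waitlsn = 500, the third record triggers a check, and so does any record arriving 500 ms or more after the previous check.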
> On 09/15/2016 05:41 AM, Thomas Munro wrote:
>> You hold a spinlock in one arbitrary slot, but that
>> doesn't seem sufficient: another backend may also read it, compute a
>> new value and then write it, while holding a different spin lock. Or
>> am I missing something?
> We acquire an individual spinlock on each member of the array, so you cannot
> compute a new value and write it concurrently.
> We have tested it on different servers and OSes, in different cases and
> workloads. The new version is nearly as fast as vanilla on the primary and has
> only a tiny influence on standby performance.
> 144 Intel Cores with HT
> 3TB RAM
> all data on ramdisk
> primary + hotstandby on the same node.
> A dataset was created with the "pgbench -i -s 1000" command. For each round of
> testing we pause replay on the standby, run 1000000 transactions on the primary
> with pgbench, start replay on the standby and measure how long the replication
> gap takes to disappear under different standby workloads. The workload was
> "WAITLSN ('Very/FarLSN', 1000ms timeout)" followed by "select abalance from
> pgbench_accounts where aid = random_aid;"
> For vanilla, the 1000ms timeout was enforced on the pgbench side by the -R option.
> The waitlsn GUC parameters were adapted for a 1000ms timeout on the standby with
> a 35000 tps rate on the primary.
> interval_waitlsn = 500 (ms)
> count_waitlsn = 30000
> On 200 clients, slave catches up with master as vanilla without significant
> On 500 clients, slave catches up with master 3% slower than vanilla.
> On 1000 clients, 12% slower.
> On 5000 clients, 3 times slower, because that is far above our hardware's ability.
> How to use it
> WAITLSN 'LSN' [, timeout in ms];
> WAITLSN_INFINITE 'LSN';
> WAITLSN_NO_WAIT 'LSN';
> # Wait until LSN 0/303EC60 is replayed, or 10 seconds have passed.
> WAITLSN '0/303EC60', 10000;
> # Or the same without a timeout.
> WAITLSN '0/303EC60';
> WAITLSN_INFINITE '0/693FF800';
> # To check whether an LSN has been replayed, use:
> WAITLSN_NO_WAIT '0/693FF800';
> Notice: WAITLSN will release on PostmasterDeath or Interruption events
> if they come earlier than the target LSN or the timeout.
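
To make the three variants above concrete, here is a rough client-side emulation of their semantics. It is only a sketch under stated assumptions: the proposed commands wait inside the server, whereas this polls; `wait_lsn`, `get_replayed`, `poll_ms`, `clock` and `sleep` are hypothetical names, and mapping "no timeout" to `None` and "no wait" to `0` is my reading of the syntax, not something the patch states. The only fact relied on is that an LSN is written as two hexadecimal numbers separated by a slash, the high and low 32 bits of a 64-bit position.

```python
import time


def parse_lsn(text):
    """Convert an LSN written as 'hi/lo' (two hex numbers) to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wait_lsn(target, get_replayed, timeout_ms=None, poll_ms=10,
             clock=time.monotonic, sleep=time.sleep):
    """Return True once get_replayed() reports an LSN >= target.

    timeout_ms=None waits forever (like WAITLSN_INFINITE),
    timeout_ms=0 checks once (like WAITLSN_NO_WAIT),
    any other value gives up after that many milliseconds.
    """
    target_pos = parse_lsn(target)
    deadline = None if timeout_ms is None else clock() + timeout_ms / 1000.0
    while True:
        if parse_lsn(get_replayed()) >= target_pos:
            return True
        if deadline is not None and clock() >= deadline:
            return False
        sleep(poll_ms / 1000.0)
```

Here `get_replayed` is any callable returning the standby's current replay position as text, and a call such as `wait_lsn('0/303EC60', get_replayed, timeout_ms=10000)` mirrors the first example above.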
> Thank you for reading, will be glad to get your feedback.
Could you please rebase your patch, as it no longer applies cleanly?
Sent via pgsql-hackers mailing list (email@example.com)