On 23 January 2017 at 11:56, Ivan Kartyshov <i.kartys...@postgrespro.ru> wrote:
> Thank you for the reviews and suggested improvements.
> I rewrote the patch to make it more stable.
>
> Changes
> =======
> I've made a few changes:
> 1) WAITLSN now doesn't depend on a snapshot.
> 2) Check the current replayed LSN rather than in xact_redo_commit.
> 3) Add syntax WAITLSN_INFINITE '0/693FF800' for an infinite wait and
>    WAITLSN_NO_WAIT '0/693FF800' to check whether the LSN was replayed,
>    as you advised.
> 4) Reduce the count of loops with GUCs (WalRcvForceReply(), which
>    doesn't exist in 9.5).
> 5) Optimize the loop that sets latches.
> 6) Add two GUCs that help us configure the influence on StartupXLOG:
>    count_waitlsn (denominator, so that not every LSN is checked)
>    interval_waitlsn (interval in milliseconds for an additional LSN check)
>
> Feedback
> ========
> On 09/15/2016 05:41 AM, Thomas Munro wrote:
>>
>> You hold a spinlock in one arbitrary slot, but that
>> doesn't seem sufficient: another backend may also read it, compute a
>> new value and then write it, while holding a different spin lock.  Or
>> am I missing something?
>
> We acquire an individual spinlock on each member of the array, so you
> cannot compute a new value and write it concurrently.
>
> Tested
> ======
> We have tested it on different servers and OSes, in different cases and
> workloads. The new version is nearly as fast as vanilla on the primary
> and has only a tiny influence on standby performance.
>
> Hardware:
> 144 Intel cores with HT
> 3TB RAM
> all data on ramdisk
> primary + hot standby on the same node.
>
> A dataset was created with the "pgbench -i -s 1000" command. For each
> round of the test we pause replay on the standby, make 1000000
> transactions on the primary with pgbench, start replay on the standby
> and measure the time it takes for the replication gap to disappear
> under different standby workloads. The workload was "WAITLSN
> ('Very/FarLSN', 1000ms timeout)" followed by "select abalance from
> pgbench_accounts where aid = random_aid;"
> For vanilla, the 1000ms timeout was enforced on the pgbench side by the
> -R option.
> The waitlsn GUC parameters were adapted for a 1000ms timeout on the
> standby with a 35000 tps rate on the primary:
> interval_waitlsn = 500 (ms)
> count_waitlsn = 30000
>
> On 200 clients, the slave catches up with the master as fast as
> vanilla, without significant delay.
> On 500 clients, the slave catches up with the master 3% slower than
> vanilla.
> On 1000 clients, 12% slower.
> On 5000 clients, 3 times slower, because that is far above our
> hardware's ability.
>
> How to use it
> =============
> WAITLSN 'LSN' [, timeout in ms];
> WAITLSN_INFINITE 'LSN';
> WAITLSN_NO_WAIT 'LSN';
>
> #Wait until LSN 0/303EC60 is replayed, or 10 seconds have passed.
> WAITLSN '0/303EC60', 10000;
>
> #Or the same without a timeout.
> WAITLSN '0/303EC60';
> or
> WAITLSN_INFINITE '0/693FF800';
>
> #To check whether the LSN has been replayed, use:
> WAITLSN_NO_WAIT '0/693FF800';
>
> Notice: WAITLSN will release on PostmasterDeath or Interruption events
> if they come earlier than the target LSN or the timeout.
>
> Thank you for reading; I will be glad to get your feedback.
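
For anyone testing the commands above from a client script, here is a minimal sketch of the comparison that a "has this LSN been replayed" check comes down to. It is not taken from the patch. It relies only on the textual LSN format, two hexadecimal numbers separated by a slash that give the high and low 32 bits of a 64-bit WAL position. The helper names parse_lsn and lsn_reached are hypothetical.

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN written as 'X/Y' (two hex numbers) into one integer.

    Hypothetical helper for illustration; not part of the WAITLSN patch.
    """
    high, sep, low = text.partition("/")
    if not sep:
        raise ValueError(f"not an LSN: {text!r}")
    hi, lo = int(high, 16), int(low, 16)
    if not (0 <= hi <= 0xFFFFFFFF and 0 <= lo <= 0xFFFFFFFF):
        raise ValueError(f"LSN half out of 32-bit range: {text!r}")
    # The part before the slash is the high 32 bits of the position.
    return (hi << 32) | lo


def lsn_reached(replayed: str, target: str) -> bool:
    """True once the replayed position is at or past the target position."""
    return parse_lsn(replayed) >= parse_lsn(target)


if __name__ == "__main__":
    # LSN values taken from the usage examples in the quoted message.
    print(lsn_reached("0/693FF800", "0/303EC60"))
    print(lsn_reached("0/303EC60", "0/693FF800"))
```

Comparing the parsed integers rather than the strings matters because the two halves are not zero-padded: as text, "0/FF" sorts after "0/100" even though it is the earlier position.
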

Could you please rebase your patch, as it no longer applies cleanly?

Thanks

Thom

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers