On 01/26/2016 07:43 AM, Stas Kelvich wrote:
Thanks for reviews and commit!

   As Simon and Andres already mentioned in this thread, replay of two-phase 
transactions is significantly slower than the same operations in normal mode. 
The major reason is that each state file is fsynced during replay; while this 
is not a problem for recovery, it is a problem for replication. Under high 2PC 
update load, the lag between master and async replica increases constantly (see 
graph below).

   One way to improve things is to move fsyncs to restartpoints, but as we saw 
previously it is a half-measure and just frequent calls to fopen can cause 

   Another option is to use the same scenario for replay that is already used 
in non-recovery mode: read state files to memory during replay of prepare, and 
if a checkpoint/restartpoint occurs between prepare and commit, move the data 
to files. On commit we can read either xlog or files. So here is the patch that 
implements this scenario for replay.
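
The scheme described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class, its methods, and the workload shape are hypothetical and do not mirror the PostgreSQL source; it merely counts how many state-file fsyncs each approach would need when most transactions commit before the next restartpoint.

```python
# Toy model only: all names are hypothetical, not taken from the patch.
class ReplayModel:
    def __init__(self, fsync_on_prepare):
        # True models the old replay behaviour (state file written and
        # fsynced for every prepare); False models the proposed behaviour.
        self.fsync_on_prepare = fsync_on_prepare
        self.in_memory = {}   # xid -> state kept in memory
        self.on_disk = {}     # xid -> state written to a state file
        self.fsyncs = 0

    def replay_prepare(self, xid, state):
        if self.fsync_on_prepare:
            self.on_disk[xid] = state
            self.fsyncs += 1
        else:
            self.in_memory[xid] = state

    def restartpoint(self):
        # Move transactions that are still prepared from memory to files.
        for xid, state in self.in_memory.items():
            self.on_disk[xid] = state
            self.fsyncs += 1
        self.in_memory.clear()

    def replay_commit(self, xid):
        # Use the in-memory state if present, otherwise fall back to the file.
        if xid in self.in_memory:
            return self.in_memory.pop(xid)
        return self.on_disk.pop(xid)


def run(model, n_tx, restartpoint_every):
    # Replay prepare/commit pairs; every restartpoint_every-th transaction
    # has a restartpoint between its prepare and its commit.
    for xid in range(1, n_tx + 1):
        model.replay_prepare(xid, "state-%d" % xid)
        if xid % restartpoint_every == 0:
            model.restartpoint()
        model.replay_commit(xid)
    return model.fsyncs


old_fsyncs = run(ReplayModel(fsync_on_prepare=True), 1000, 100)
new_fsyncs = run(ReplayModel(fsync_on_prepare=False), 1000, 100)
print(old_fsyncs, new_fsyncs)  # 1000 10
```

In this model only the transactions that happen to span a restartpoint ever touch a file, which is the intuition behind the expected speedup.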

   The patch is quite straightforward. During replay of prepare records 
RecoverPreparedFromXLOG() is called to create memory state in GXACT, PROC, 
PGPROC; on commit XlogRedoFinishPrepared() is called to clean up that state. 
There are also several functions (PrescanPreparedTransactions, 
StandbyTransactionIdIsPrepared) that assumed that during replay all 
prepared xacts have files in pg_twophase, so I have extended them to check 
GXACT too.
   A side effect of that behaviour is that we can see prepared xacts in 
the pg_prepared_xacts view on the slave.
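
Why the lookup functions had to be extended can be shown with a small sketch. The sets and function names below are hypothetical stand-ins for the in-memory state and the pg_twophase directory; they are not the actual implementation.

```python
# Hypothetical stand-ins: xids known only in memory vs. xids with state files.
in_memory_xids = {101, 102}
state_file_xids = {90}


def is_prepared_files_only(xid):
    # Old assumption: every prepared xact has a state file during replay.
    return xid in state_file_xids


def is_prepared(xid):
    # Extended check: look at the in-memory state first, then at the files.
    return xid in in_memory_xids or xid in state_file_xids


print(is_prepared_files_only(101), is_prepared(101))  # False True
```

With the files-only check, a transaction prepared since the last restartpoint would be reported as not prepared, which is the case the extended functions need to cover.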

Since this patch touches quite a sensitive part of postgres replay and there 
are some rarely used code paths, I wrote a shell script to set up master/slave 
replication and test different failure scenarios that can happen with 
instances. I am attaching this file to show the scenarios that I have tested 
and, more importantly, to show what I didn’t test. In particular, I failed to 
reproduce a situation where StandbyTransactionIdIsPrepared() is called; maybe 
somebody can suggest a way to force its usage. Also, I’m not too sure about the 
necessity of calling cache invalidation callbacks during 
XlogRedoFinishPrepared(); I’ve marked this place in the patch with 2REVIEWER 

Tests show that this patch increases the speed of 2PC replay to the level where 
the replica can keep pace with the master.

Graph: replica lag under a pgbench run of 200 seconds with 2PC update transactions (80 
connections, one update per 2PC tx, two servers with 12 cores each, 10GbE interconnect) 
on current master and with the suggested patch. Replica lag was measured with "select 
sent_location-replay_location as delay from pg_stat_replication;" each second.
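
For readers who want to compute the same delay outside the database, here is a minimal sketch. It relies on the textual pg_lsn format (two hexadecimal numbers separated by a slash, the high and low 32 bits of the WAL position); the function names are hypothetical.

```python
def lsn_to_bytes(lsn):
    # "hi/lo" in hexadecimal: hi is the upper 32 bits, lo the lower 32 bits.
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def replay_lag_bytes(sent_location, replay_location):
    # Same quantity as sent_location - replay_location in the query above.
    return lsn_to_bytes(sent_location) - lsn_to_bytes(replay_location)


print(replay_lag_bytes("0/3000060", "0/3000000"))  # 96
```

The result is a byte count of WAL that has been sent but not yet replayed, so a steadily growing value corresponds to the rising curve in the graph.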

Some comments:

* The patch needs a rebase against the latest TwoPhaseFileHeader change
* Rework the check.sh script into a TAP test case (src/test/recovery), as suggested by Alvaro and Michael down thread
* Add documentation for RecoverPreparedFromXLOG

+        * that xlog record. We need just to clen up memmory state.

'clean' + 'memory'

+        * This is usually called after end-of-recovery checkpoint, so all 2pc
+        * files moved xlog to files. But if we restart slave when master is
+        * switched off this function will be called before checkpoint ans we 
+        * to check PGXACT array as it can contain prepared transactions that
+        * didn't created any state files yet.


"We need to check the PGXACT array for prepared transactions that don't have any state file in case of a slave restart with the master being off."

+                * prepare xlog resords in shared memory in the same way as it 

'records'

+                * We need such behaviour because speed of 2PC replay on replica should
+                * be at least not slower than 2PC tx speed on master.


"We need this behaviour because the speed of the 2PC replay on the replica should be at least the same as the 2PC transaction speed of the master."

I'll leave the 2REVIEWER section to Simon.

Best regards,

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)