Re: [HACKERS] Logical replication and multimaster

Konstantin Knizhnik Wed, 02 Dec 2015 12:19:49 -0800

Thank you for the reply.

On 12/02/2015 08:30 PM, Robert Haas wrote:


> Logical decoding only begins decoding a transaction once the
> transaction is complete.  So I would guess that the sequence of
> operations here is something like this - correct me if I'm wrong:
>
> 1. Do the transaction.
> 2. PREPARE.
> 3. Replay the transaction.
> 4. PREPARE the replay.
> 5. COMMIT PREPARED on original machine.
> 6. COMMIT PREPARED on replica.


Logical decoding starts after XLogFlush has executed.
So the transaction is actually not yet complete at that moment:
- it is not marked as committed in clog
- it is marked as in-progress in procarray
- its locks are not released

We are not using PostgreSQL two-phase commit here.
Instead, our DTM takes control in TransactionIdCommitTree and sends a request
to the arbiter, which in turn waits for the status of the committing
transaction on the replicas.
The problem is that transactions are delivered to a replica through a single
channel: the logical replication slot.
While such a transaction is waiting for acknowledgement from the arbiter, it
blocks the replication channel, preventing other (parallel) transactions from
being replicated and applied.

I have implemented a pool of background workers. Maybe it will be useful not
only for me.
It consists of a one-producer, multiple-consumers queue implemented using a
buffer in shared memory, a spinlock and two semaphores.
The API is very simple:

typedef void(*BgwPoolExecutor)(int id, void* work, size_t size);
typedef BgwPool*(*BgwPoolConstructor)(void);

extern void BgwPoolStart(int nWorkers, BgwPoolConstructor constructor);
extern void BgwPoolInit(BgwPool* pool, BgwPoolExecutor executor,
                        char const* dbname, size_t queueSize);
extern void BgwPoolExecute(BgwPool* pool, void* work, size_t size);

You just place a chunk of bytes (work, size) in this queue, and the first
available worker dequeues and executes it.
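
To illustrate the queue described above, here is a minimal standalone sketch.
It is not the actual BgwPool code: it uses POSIX threads and process-local
memory instead of PostgreSQL background workers and shared memory, fixed-size
slots instead of a byte buffer, and a zero-length item as a shutdown signal.
All names in it (Pool, pool_execute, worker_main, run_demo, SLOTS, SLOT_BYTES)
are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define SLOTS 8        /* queue capacity (hypothetical value) */
#define SLOT_BYTES 64  /* max payload per work item (hypothetical value) */

typedef void (*Executor)(int id, void *work, size_t size);

typedef struct {
    Executor executor;
    atomic_flag lock;    /* spinlock guarding head and tail */
    sem_t filled;        /* counts items ready to be dequeued */
    sem_t free_slots;    /* counts empty slots */
    size_t head, tail;
    struct { size_t size; char data[SLOT_BYTES]; } slot[SLOTS];
} Pool;

typedef struct { Pool *pool; int id; } WorkerArg;

static void spin_lock(atomic_flag *l)   { while (atomic_flag_test_and_set(l)) ; }
static void spin_unlock(atomic_flag *l) { atomic_flag_clear(l); }

/* Producer side: copy the bytes into the queue, blocking while it is full. */
static void pool_execute(Pool *p, const void *work, size_t size)
{
    assert(size <= SLOT_BYTES);
    sem_wait(&p->free_slots);
    spin_lock(&p->lock);
    p->slot[p->tail].size = size;
    memcpy(p->slot[p->tail].data, work, size);
    p->tail = (p->tail + 1) % SLOTS;
    spin_unlock(&p->lock);
    sem_post(&p->filled);
}

/* Consumer side: the first available worker dequeues an item and runs it. */
static void *worker_main(void *arg)
{
    WorkerArg *w = arg;
    Pool *p = w->pool;
    for (;;) {
        char buf[SLOT_BYTES];
        size_t size;
        sem_wait(&p->filled);
        spin_lock(&p->lock);
        size = p->slot[p->head].size;
        memcpy(buf, p->slot[p->head].data, size);
        p->head = (p->head + 1) % SLOTS;
        spin_unlock(&p->lock);
        sem_post(&p->free_slots);
        if (size == 0)
            break;       /* zero-length item: shutdown (sketch convention) */
        p->executor(w->id, buf, size);
    }
    return NULL;
}

static atomic_int total;

/* Example executor: each work item is one int, added to a shared total. */
static void add_executor(int id, void *work, size_t size)
{
    int v;
    (void) id;
    assert(size == sizeof v);
    memcpy(&v, work, sizeof v);
    atomic_fetch_add(&total, v);
}

/* Start nworkers threads, enqueue 1..nitems, return the sum they computed. */
int run_demo(int nworkers, int nitems)
{
    Pool pool;
    pthread_t tid[16];
    WorkerArg arg[16];
    assert(nworkers > 0 && nworkers <= 16);
    pool.executor = add_executor;
    pool.head = pool.tail = 0;
    atomic_flag_clear(&pool.lock);
    sem_init(&pool.filled, 0, 0);
    sem_init(&pool.free_slots, 0, SLOTS);
    atomic_store(&total, 0);
    for (int i = 0; i < nworkers; i++) {
        arg[i] = (WorkerArg){ &pool, i };
        pthread_create(&tid[i], NULL, worker_main, &arg[i]);
    }
    for (int v = 1; v <= nitems; v++)
        pool_execute(&pool, &v, sizeof v);
    for (int i = 0; i < nworkers; i++)
        pool_execute(&pool, "", 0);   /* one shutdown item per worker */
    for (int i = 0; i < nworkers; i++)
        pthread_join(tid[i], NULL);
    sem_destroy(&pool.filled);
    sem_destroy(&pool.free_slots);
    return atomic_load(&total);
}
```

Calling run_demo(4, 100) enqueues more items than the queue has slots, so the
producer blocks on the free-slot semaphore while the workers drain the queue.
The returned sum is 5050 regardless of which worker ran which item.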

Using this pool and a larger number of accounts (reducing the probability of
conflicts), I get better results.
So now the receiver of logical replication does not execute transactions
directly; instead it places them in the queue, and they are executed
concurrently by the pool of background workers.

On a cluster with three nodes, the results of our debit-credit benchmark are
as follows:


                                      TPS
Multimaster (ACID transactions)     12500
Multimaster (async replication)     34800
Standalone PostgreSQL               44000

We tested two modes: one where the client distributes queries randomly among
the cluster nodes, and one where the client works with a single master node
and the other nodes serve only as replicas. Performance is slightly better in
the second mode, but the difference is not very large (about 11000 TPS in the
first mode).


The number of workers in the pool has a significant impact on performance:
with 8 workers we get about 7800 TPS, and with 16 workers 12500.
Performance also depends greatly on the number of accounts (and so on the
probability of lock conflicts). With 100 accounts the speed is less than
1000 TPS.

> Step 3 introduces latency proportional to the amount of work the
> transaction did, which could be a lot.   If you were doing synchronous
> physical replication, the replay of the COMMIT record would only need
> to wait for the replay of the commit record itself.  But with
> synchronous logical replication, you've got to wait for the replay of
> the entire transaction.  That's a major bummer, especially if replay
> is single-threaded and there are a large number of backends generating
> transactions.  Of course, the 2PC dance itself can also add latency -
> that's most likely to be the issue if the transactions are each very
> short.
>
> What I'd suggest is trying to measure where the latency is coming
> from.  You should be able to measure how much time each transaction
> spends (a) executing, (b) preparing itself, (c) waiting for the replay
> thread to begin replaying it, (d) waiting for the replay thread to
> finish replaying it, and (e) committing.  Separating (c) and (d) might
> be a little bit tricky, but I bet it's worth putting some effort in,
> because the answer is probably important to understanding what sort of
> change will help here.  If (c) is the problem, you might be able to
> get around it by having multiple processes, though that only helps if
> applying is slower than decoding.  But if (d) is the problem, then the
> only solution is probably to begin applying the transaction
> speculatively before it's prepared/committed.  I think.
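
The per-phase breakdown (a) through (e) suggested above could be collected
with a small accumulator along these lines. This is only a sketch: PhaseStats,
phase_begin, phase_end, phase_slowest and now_seconds are hypothetical names,
not PostgreSQL functions, and where the marks would go in the commit and
replay paths is left open. now_seconds uses the C11 wall clock for
portability; a monotonic clock would be preferable for real measurements.

```c
#include <assert.h>
#include <time.h>

/* The five phases (a)-(e), in the order a transaction passes through them. */
enum Phase {
    PH_EXECUTE,      /* (a) executing */
    PH_PREPARE,      /* (b) preparing itself */
    PH_WAIT_REPLAY,  /* (c) waiting for replay to begin */
    PH_REPLAY,       /* (d) waiting for replay to finish */
    PH_COMMIT,       /* (e) committing */
    PH_COUNT
};

typedef struct {
    double last;             /* timestamp of the previous mark, in seconds */
    double spent[PH_COUNT];  /* accumulated seconds per phase */
} PhaseStats;

/* Current wall-clock time in seconds (C11 timespec_get). */
double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double) ts.tv_sec + (double) ts.tv_nsec / 1e9;
}

/* Mark the start of a transaction. */
void phase_begin(PhaseStats *s, double now)
{
    s->last = now;
}

/* Mark the end of phase p: charge the time elapsed since the previous mark. */
void phase_end(PhaseStats *s, enum Phase p, double now)
{
    s->spent[p] += now - s->last;
    s->last = now;
}

/* Return the phase with the largest accumulated time. */
enum Phase phase_slowest(const PhaseStats *s)
{
    enum Phase worst = PH_EXECUTE;
    for (int p = 1; p < PH_COUNT; p++)
        if (s->spent[p] > s->spent[worst])
            worst = (enum Phase) p;
    return worst;
}
```

In use, a zero-initialized PhaseStats would be kept per backend, with
phase_begin(&s, now_seconds()) at transaction start and one phase_end call at
each phase boundary. Timestamps are passed in explicitly so that the logic can
be checked with synthetic values.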
