On Monday, November 16, 2015 2:47 AM, Konstantin Knizhnik
> Some time ago at PgConn.Vienna we proposed the eXtensible
> Transaction Manager API (XTM).
> The idea is to be able to provide custom implementations of
> transaction managers as standard Postgres extensions; the
> primary goal is implementing a distributed transaction manager.
> It should not only support 2PC, but also provide consistent
> snapshots for global transactions executed on different nodes.
> Actually, the current version of the XTM API does not propose any
> particular 2PC model. It can be implemented either on the
> coordinator side (as is done in our pg_tsdtm implementation, which
> is based on timestamps and does not require a centralized arbiter)
> or by an arbiter
I'm not entirely clear on what you're saying here. I admit I've
not kept in close touch with the distributed processing discussions
lately -- is there a write-up and/or diagram to give an overview of
where we're at with this effort?
> In the latter case the 2PC logic is hidden under the XTM
> SetTransactionStatus method:
> bool (*SetTransactionStatus)(TransactionId xid, int nsubxids,
> TransactionId *subxids, XidStatus status, XLogRecPtr lsn);
> which encapsulates TransactionIdSetTreeStatus in clog.c.
> But you may notice that the original TransactionIdSetTreeStatus
> function is void - it is not intended to return anything.
> It is called in RecordTransactionCommit, inside a critical section
> where the commit is not expected to fail.
This issue, though, seems clear enough. At some point a
transaction must cross a hard line between when it is not committed
and when it is, since after commit subsequent transactions can then
see the data and modify it. There has to be some "point of no
return" in order to have any sane semantics. Entering that critical
section is it.
> But in the case of DTM, a transaction may be rejected by the arbiter.
> The XTM API allows us to control access to CLOG, so everybody will
> see that the transaction is aborted. But in any case we have to
> somehow notify the client about the abort of the transaction.
If you are saying that DTM tries to roll back a transaction after
any participating server has entered the RecordTransactionCommit()
critical section, then IMO it is broken. Full stop. That can't
work with any reasonable semantics as far as I can see.
> We cannot just call elog(ERROR,...) in the SetTransactionStatus
> implementation, because inside a critical section it causes Postgres
> to crash with a panic message. So we have to remember that the
> transaction is rejected and report the error later, after exiting
> the critical section:
I don't believe that is a good plan. You should not enter the
critical section for recording that a commit is complete until all
the work for the commit is done except for telling all the
servers that all servers are ready.
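
To make that ordering concrete, here is a minimal sketch in C of a
single participant. Everything in it (TxState, Tx, tx_prepare,
tx_abort, tx_record_commit) is a hypothetical name made up for
illustration; none of it is PostgreSQL or XTM code. The point is only
that every step which can fail runs before the hard line, and the
step that records the commit has no failure path at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states for illustration; not PostgreSQL's TransState. */
typedef enum { TX_ACTIVE, TX_PREPARED, TX_COMMITTED, TX_ABORTED } TxState;

typedef struct {
    TxState state;
} Tx;

/* Phase 1: all work that can fail happens here, before the hard line.
 * local_work_ok stands in for "WAL flushed, constraints checked, etc." */
bool tx_prepare(Tx *tx, bool local_work_ok)
{
    assert(tx->state == TX_ACTIVE);
    if (!local_work_ok) {
        tx->state = TX_ABORTED;   /* still perfectly safe to roll back */
        return false;
    }
    tx->state = TX_PREPARED;
    return true;
}

/* Abort is legal only while the commit has not been recorded. */
bool tx_abort(Tx *tx)
{
    if (tx->state == TX_COMMITTED)
        return false;             /* past the point of no return */
    tx->state = TX_ABORTED;
    return true;
}

/* Phase 2: the "critical section". It returns void on purpose:
 * once we get here there is nothing left that is allowed to fail. */
void tx_record_commit(Tx *tx)
{
    assert(tx->state == TX_PREPARED);
    tx->state = TX_COMMITTED;
}
```

In this model a rejection by an arbiter has to arrive while the
transaction is still TX_PREPARED; after tx_record_commit() returns,
tx_abort() refuses.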
> There is one more problem - at this moment the state of the
> transaction is TRANS_COMMIT.
> If the ERROR handler tries to abort it, then we get yet another
> fatal error: an attempt to roll back a committed transaction.
> So we need to hide the fact that the transaction is actually
> committed in the local XLOG.
That is one of pretty much an infinite variety of problems you have
if you don't have a "hard line" for when the transaction is finally
> This approach works but looks a little bit like a hack. It
> requires not only replacing the direct call of
> TransactionIdSetTreeStatus with an indirect one (through the XTM
> API), but also making some non-obvious changes in
> So what are the alternatives?
> 1. Move RecordTransactionCommit to XTM. In this case we have to copy
> the original RecordTransactionCommit into the DTM implementation and
> patch it there. It is also not nice, because it will complicate
> maintenance of the DTM implementation.
> The primary idea of XTM is to allow development of DTM as a standard
> PostgreSQL extension without creating specific clones of the main
> PostgreSQL source tree. But this idea will be compromised if we have
> to copy and paste some pieces of PostgreSQL code.
> In some sense it is even worse than maintaining a separate branch -
> in the latter case at least we have some way to perform an automatic
> merge.
You can have a call in XTM that says you want to record the
commit on all participating servers, but I don't see where that
would involve moving anything we have now out of each participating
server -- it would just need to function like a real,
professional-quality distributed transaction manager doing the
second phase of a two-phase commit. If any participating server
goes through the first phase and reports that all the heavy lifting
is done, and then is swallowed up in a pyroclastic flow of an
erupting volcano before phase 2 comes around, the DTM must
periodically retry until the administrator cancels the attempt.
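
As a rough sketch of that retry behavior, assuming phase 1 has
already succeeded everywhere: the coordinator below keeps asking each
unacknowledged participant to commit until all have answered or an
administrator-style cancel check says stop. The names (Participant,
dtm_phase2, FlakyNode, CancelBudget) are invented for this example
and are not part of XTM or pg_dtm; a real implementation would also
sleep or back off between rounds and persist its own state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: returns true once the node acknowledges commit. */
typedef bool (*CommitFn)(void *node);
/* Hypothetical callback: returns true when the administrator gives up. */
typedef bool (*CancelFn)(void *ctx);

typedef struct {
    void    *node;
    CommitFn commit_prepared;
    bool     done;
} Participant;

/* Phase 2: retry until every participant acknowledges or we are cancelled.
 * Returns the number of participants still unacknowledged (0 = complete). */
size_t dtm_phase2(Participant *p, size_t n, CancelFn cancelled, void *ctx)
{
    for (;;) {
        size_t pending = 0;
        for (size_t i = 0; i < n; i++) {
            if (!p[i].done)
                p[i].done = p[i].commit_prepared(p[i].node);
            if (!p[i].done)
                pending++;
        }
        if (pending == 0 || cancelled(ctx))
            return pending;
        /* a real coordinator would wait here before the next round */
    }
}

/* Stand-in for a server that is unreachable for a few attempts. */
typedef struct { int rounds_down; } FlakyNode;

bool flaky_commit(void *node)
{
    FlakyNode *f = (FlakyNode *) node;
    if (f->rounds_down > 0) {
        f->rounds_down--;
        return false;
    }
    return true;
}

/* Stand-in for the administrator: allows a fixed number of extra rounds. */
typedef struct { int rounds_left; } CancelBudget;

bool cancel_after(void *ctx)
{
    CancelBudget *b = (CancelBudget *) ctx;
    if (b->rounds_left == 0)
        return true;
    b->rounds_left--;
    return false;
}
```

Note that cancelling does not turn the outcome into an abort; it only
leaves some participants prepared but unacknowledged, which is a
situation someone then has to resolve by hand.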
> 2. Propose some alternative two-phase commit implementation in
> PostgreSQL core. The main motivation for such a "lightweight"
> implementation of 2PC in pg_dtm is that the original mechanism of
> prepared transactions in PostgreSQL adds too much overhead.
> In our benchmarks we have found that a simple credit-debit banking
> test (without any DTM) runs almost 10 times slower with PostgreSQL
> 2PC than without it. This is why we are trying to propose an
> alternative solution (right now pg_dtm is 2 times slower than vanilla
> PostgreSQL, but it not only performs 2PC but also provides consistent
Are you talking about 10x the latency on a commit, or that the
overall throughput under saturation load is one tenth of running
without something to guarantee the transactional integrity of the
whole set of nodes? The former would not be too surprising, while
the latter would be rather amazing.
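
To spell out why the distinction matters, here is a small
back-of-the-envelope model in C. It assumes a closed loop in which
each client waits for its commit before issuing the next one, so the
offered load is clients divided by latency, capped by whatever the
server can sustain. The function name and all the numbers in the note
after it are made up for illustration; they are not measurements from
this thread.

```c
#include <assert.h>

/* Closed-loop model: each client waits for its commit before sending the
 * next one, so offered load is clients / latency, capped at capacity_tps. */
double throughput_tps(int clients, double commit_latency_ms,
                      double capacity_tps)
{
    assert(clients > 0 && commit_latency_ms > 0.0);
    double offered = clients * 1000.0 / commit_latency_ms;
    return offered < capacity_tps ? offered : capacity_tps;
}
```

With a hypothetical capacity of 50000 tps, 10 clients at 1 ms per
commit give 10000 tps, and the same 10 clients at 10 ms give 1000
tps. But 100 clients at 10 ms give 10000 tps again. So a 10x latency
hit turns into a 10x throughput hit only when the number of
concurrent clients is fixed and the server is not yet saturated.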
One interesting coincidence is that I've been told off-list about a
large company which developed their own internal distributed
version of PostgreSQL in which they implemented Serializable
Snapshot Isolation, and reported a 10x performance impact. They
were reportedly fine with that, since the integrity of their data
was worth it to them and it performed better than the alternatives;
but the number matching like that does make me wonder how hard it
would be to break that barrier in terms of latency, if you really
preserve ACID properties across a distributed system.
Of course, latency and throughput are two completely different
> Maybe somebody can suggest some other solution?
Well, it seems like it might be possible for some short cuts to be
taken when the part of the distributed transaction run on a
particular server was read-only. I'm not sure of the details, but
it seems conceptually possible to minimize network latency issues.
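
One way such a shortcut might look, sketched in C under the
assumption that the textbook read-only optimization for two-phase
commit applies: a participant that wrote nothing answers phase 1 with
a read-only vote, releases its resources, and needs no phase-2
message. The Vote and Decision types and the decide() function are
hypothetical names for this sketch, not anything from XTM or pg_dtm.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical phase-1 answers from each participant. */
typedef enum { VOTE_ABORT, VOTE_COMMIT, VOTE_READ_ONLY } Vote;

typedef struct {
    bool   commit;           /* global decision */
    size_t phase2_messages;  /* participants that still need a round trip */
} Decision;

/* Only participants that voted VOTE_COMMIT are left in a prepared state
 * and must be told the outcome. Read-only voters are already finished,
 * and abort voters have rolled back on their own. */
Decision decide(const Vote *votes, size_t n)
{
    Decision d = { true, 0 };
    for (size_t i = 0; i < n; i++) {
        if (votes[i] == VOTE_ABORT)
            d.commit = false;
        if (votes[i] == VOTE_COMMIT)
            d.phase2_messages++;
    }
    return d;
}
```

For a transaction that wrote on one server and only read on two
others, this leaves a single phase-2 round trip instead of three.
Whether the read-only servers can really be released that early
depends on the isolation guarantees you want across nodes, which this
sketch does not address.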
> Or give some comments concerning current approach?
I don't see how the DTM approach described could provide anything
resembling ACID guarantees; but if you want to give up all claim to
that, you might be able to provide something useful with a more lax
set of assurances. (After all, there are some popular products out
there which are pretty "relaxed" about such things.) In that case
maybe the XTM interface could include some way to query what
guarantees an implementation does provide. I don't mean this
sarcastically -- there are use cases for different levels of