On Fri, Jul 28, 2017 at 10:14 AM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
> On Fri, Jul 28, 2017 at 7:28 AM, Masahiko Sawada <sawada.m...@gmail.com> 
> wrote:
>> That also requires sharing the same XID space with all remote nodes.
> You are putting your finger on the main bottleneck with global
> consistency that XC and XL have because of that. And the source feeding
> the XIDs is a SPOF.
>> Perhaps the CSN based snapshot can make this more simple.
> Hm. This needs a closer look.

With or without CSNs, sharing the same XID space across all nodes is
undesirable in a loosely-coupled network.  If only a small fraction of
transactions are distributed, incurring the overhead of synchronizing
XID assignment for every transaction is not good.  Suppose node A
processes many transactions and node B only a few transactions; then,
XID advancement caused by node A forces node B to perform vacuum for
wraparound.  Not fun.  Or, if you have an OLTP workload running on A
and an OLTP workload running on B that are independent of each other, and
occasional reporting queries that scan both, you'll be incurring the
overhead of keeping A and B consistent for a lot of transactions that
don't need it.  Of course, when A and B are tightly coupled and
basically all transactions are scanning both, locking the XID space
together *may* be the best approach, but even then there are notable
disadvantages - e.g. they can't both continue processing write
transactions if the connection between the two is severed.

An alternative approach is to introduce some other kind of identifier,
let's call it a distributed transaction ID (DXID), which each node
maps onto a local XID.

Regardless of whether we share XIDs or DXIDs, we need a more complex
concept of transaction state than we have now.  Right now,
transactions are basically in-progress, committed, or aborted, but in
a distributed setting there's also a state where the transaction's
outcome is known somewhere in the cluster but not locally.  You can
imagine something like: during the
prepare phase, all nodes set the status in clog to "prepared".  Then,
if that succeeds, the leader changes the status to "committed" or
"aborted" and tells all nodes to do the same.  Thereafter, any time
someone inquires about the status of that transaction, we go ask all
of the other nodes in the cluster; if they all think it's prepared,
then it's prepared -- but if any of them think it's committed or
aborted, then we change our local status to match and return that
status.  So once one node changes the status to committed or aborted
it can propagate through the cluster even if connectivity is lost for
a while.

Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)