On Fri, Jul 28, 2017 at 10:14 AM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
> On Fri, Jul 28, 2017 at 7:28 AM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
>> That also requires to share the same XID space with all remote nodes.
>
> You are putting your finger on the main bottleneck with global
> consistency that XC and XL has because of that. And the source feeding
> the XIDs is a SPOF.
>
>> Perhaps the CSN based snapshot can make this more simple.
>
> Hm. This needs a closer look.
With or without CSNs, sharing the same XID space across all nodes is undesirable in a loosely-coupled network. If only a small fraction of transactions are distributed, incurring the overhead of synchronizing XID assignment for every transaction is not good. Suppose node A processes many transactions and node B only a few; then XID advancement caused by node A forces node B to perform anti-wraparound vacuums. Not fun. Or, if you have an OLTP workload running on A and an OLTP workload running on B that are independent of each other, plus occasional reporting queries that scan both, you'll be paying the overhead of keeping A and B consistent for a lot of transactions that don't need it. Of course, when A and B are tightly coupled and essentially every transaction scans both, locking the XID spaces together *may* be the best approach, but even then there are notable disadvantages - e.g. the two nodes can't both continue processing write transactions if the connection between them is severed.

An alternative approach is to have some other kind of identifier, let's call it a distributed transaction ID (DXID), which is mapped by each node onto a local XID.

Regardless of whether we share XIDs or use DXIDs, we need a more complex notion of transaction state than we have now. Right now, a transaction is basically in progress, committed, or aborted, but in a distributed setup there's also a state where the outcome of the transaction is known by some other node but not known locally. You can imagine something like this: during the prepare phase, all nodes set the status in clog to "prepared". Then, if that succeeds, the leader changes the status to "committed" or "aborted" and tells all nodes to do the same. Thereafter, any time someone inquires about the status of that transaction, we go ask all of the other nodes in the cluster: if they all think it's prepared, then it's prepared -- but if any of them think it's committed or aborted, we change our local status to match and return that status. So once one node changes the status to committed or aborted, that decision can propagate through the cluster even if connectivity is lost for a while.
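Very roughly, and purely as an illustration, the resolution rule might look something like the standalone sketch below. This is not actual clog code; the type and function names (DistributedXactStatus, ask_peer_fn, resolve_prepared_status) are invented, and in a real implementation the "prepared" state would live in clog and the peer query would be a network round trip.

/*
 * Standalone sketch of the resolution rule described above.
 */
#include <stdint.h>

typedef enum
{
    DXACT_IN_PROGRESS,
    DXACT_PREPARED,         /* prepare succeeded on this node */
    DXACT_COMMITTED,
    DXACT_ABORTED
} DistributedXactStatus;

/* Hypothetical callback: ask one peer node what it thinks of this DXID. */
typedef DistributedXactStatus (*ask_peer_fn) (int peer, uint64_t dxid);

/*
 * Resolve a transaction we currently know only as "prepared".  If any
 * peer has already seen a commit or abort, adopt that answer and record
 * it locally, so the decision keeps propagating even if connectivity is
 * lost later.  If every reachable peer still says "prepared", it stays
 * prepared for now.
 */
static DistributedXactStatus
resolve_prepared_status(uint64_t dxid,
                        DistributedXactStatus *local_status,
                        ask_peer_fn ask_peer,
                        const int *peers, int npeers)
{
    for (int i = 0; i < npeers; i++)
    {
        DistributedXactStatus s = ask_peer(peers[i], dxid);

        if (s == DXACT_COMMITTED || s == DXACT_ABORTED)
        {
            *local_status = s;      /* would be persisted, then returned */
            return s;
        }
    }
    return DXACT_PREPARED;
}

The important property is just that a commit or abort answer from any single reachable node is authoritative, while "prepared" is only the default when nobody knows better.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company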