Thank you for your feedback.
My comments are inside.
On 11/07/2015 05:11 PM, Amit Kapila wrote:
> Today, while studying your proposal and related material, I noticed
> that in both approaches, DTM and tsDTM, you talk about committing a
> transaction and acquiring a snapshot consistently, but have not touched
> upon how locks will be managed across nodes and how deadlock detection
> across nodes will work. This will also be one of the crucial points in
> selecting one of the approaches.
The lock manager is one of the tasks we are currently working on.
There are still a lot of open questions:
1. Should the distributed lock manager (DLM) do anything else besides
detection of distributed deadlocks?
2. Should the DLM be part of the XTM API, or should it be a separate API?
3. Should the DLM be implemented as a separate process, or should it be
part of
4. How to globally identify resource owners (transactions) in the global
lock graph. In the case of DTM we have global (shared) XIDs, and in
tsDTM - global transaction IDs assigned by the application (which are
not so easy to retrieve). In other cases we may need a local-to-global
transaction ID mapping, so it looks like the DLM should be part of the
DTM...
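To make point 1 concrete: one way a DLM could detect distributed
deadlocks is to merge per-node wait-for edges into one global graph and
search it for cycles. The sketch below is hypothetical (the `GTID` type
and the graph representation are assumptions for illustration, not part
of XTM):

```go
package main

import "fmt"

// GTID is a hypothetical global transaction identifier, e.g. the gtid
// assigned by the application in the tsDTM approach.
type GTID string

// WaitForGraph holds edges "waiter -> lock holders", merged from the
// local wait-for graphs reported by every node.
type WaitForGraph map[GTID][]GTID

// HasCycle reports whether the merged graph contains a cycle, i.e. a
// distributed deadlock, using a three-color depth-first search.
func (g WaitForGraph) HasCycle() bool {
	const (
		white = iota // unvisited
		gray         // on the current DFS path
		black        // fully explored
	)
	color := make(map[GTID]int)
	var dfs func(t GTID) bool
	dfs = func(t GTID) bool {
		color[t] = gray
		for _, h := range g[t] {
			if color[h] == gray {
				return true // back edge: cycle found
			}
			if color[h] == white && dfs(h) {
				return true
			}
		}
		color[t] = black
		return false
	}
	for t := range g {
		if color[t] == white && dfs(t) {
			return true
		}
	}
	return false
}

func main() {
	// T1 waits for T2 on node A, T2 waits for T1 on node B:
	// neither node sees a cycle locally, but the merged graph does.
	g := WaitForGraph{"T1": {"T2"}, "T2": {"T1"}}
	fmt.Println(g.HasCycle()) // true

	// A plain wait chain is not a deadlock.
	g2 := WaitForGraph{"T1": {"T2"}, "T2": {"T3"}}
	fmt.Println(g2.HasCycle()) // false
}
```

This also illustrates why the DLM needs globally unique transaction
identifiers (point 4): the merged edges from different nodes must refer
to the same transaction by the same key.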
> Also I have noticed that there is no discussion of rollback; for
> example, how will rollback happen with the APIs provided in your
> second approach (tsDTM)?
In the tsDTM approach, two-phase commit is performed by the coordinator,
currently using standard PostgreSQL two-phase commit:
Code in Go performing two-phase commit:

    // Phase 1: prepare the transaction on both nodes
    // (standard PostgreSQL two-phase commit)
    exec(conn1, "prepare transaction '" + gtid + "'")
    exec(conn2, "prepare transaction '" + gtid + "'")
    // Agree on a common commit timestamp (CSN): the first dtm_prepare
    // call proposes a CSN, the second returns the agreed value, which
    // is then delivered to both nodes via dtm_end_prepare
    exec(conn1, "select dtm_begin_prepare($1)", gtid)
    exec(conn2, "select dtm_begin_prepare($1)", gtid)
    csn = _execQuery(conn1, "select dtm_prepare($1, 0)", gtid)
    csn = _execQuery(conn2, "select dtm_prepare($1, $2)", gtid, csn)
    exec(conn1, "select dtm_end_prepare($1, $2)", gtid, csn)
    exec(conn2, "select dtm_end_prepare($1, $2)", gtid, csn)
    // Phase 2: commit the prepared transaction on both nodes
    exec(conn1, "commit prepared '" + gtid + "'")
    exec(conn2, "commit prepared '" + gtid + "'")
If the commit fails on any of the nodes, the coordinator should roll
back the prepared transaction on all nodes.
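The abort path can be sketched as follows. The `Conn` type and `exec`
helper here are stand-ins so the coordinator's control flow can be shown
without a live PostgreSQL cluster; they are assumptions for illustration,
not the actual coordinator code:

```go
package main

import (
	"errors"
	"fmt"
)

// Conn is a stand-in for a node connection (hypothetical sketch).
type Conn struct {
	name string
	fail bool     // simulate a node where PREPARE TRANSACTION fails
	log  []string // statements successfully executed on this node
}

// exec mimics the exec helper from the snippet above.
func exec(c *Conn, sql string) error {
	if c.fail {
		return errors.New(c.name + ": " + sql + " failed")
	}
	c.log = append(c.log, sql)
	return nil
}

// prepareAll runs PREPARE TRANSACTION on every node. If any node fails,
// it issues ROLLBACK PREPARED on the nodes that had already prepared,
// so the global transaction gtid aborts everywhere.
func prepareAll(conns []*Conn, gtid string) error {
	for i, c := range conns {
		if err := exec(c, "prepare transaction '"+gtid+"'"); err != nil {
			for _, p := range conns[:i] {
				exec(p, "rollback prepared '"+gtid+"'") // best effort
			}
			return err
		}
	}
	return nil
}

func main() {
	a := &Conn{name: "node1"}
	b := &Conn{name: "node2", fail: true}
	err := prepareAll([]*Conn{a, b}, "gt1")
	fmt.Println(err)   // node2: prepare transaction 'gt1' failed
	fmt.Println(a.log) // [prepare transaction 'gt1' rollback prepared 'gt1']
}
```

Note that once COMMIT PREPARED has succeeded on some node, the only
correct outcome is to commit on the remaining nodes too, so a production
coordinator would also have to persist its decision and retry.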
> Similarly, having some discussion on the parts of recovery that could
> be affected would be great.
We are currently implementing fault tolerance and recovery for the DTM
approach (with a centralized arbiter). There are several replicas of the
arbiter, synchronized using the Raft protocol. But with the tsDTM
approach the recovery model is still unclear... We are thinking about it.
> I think in this patch it is important to see the completeness of all
> the APIs that need to be exposed for the implementation of distributed
> transactions, and that is difficult to visualize without a complete
> picture of all the components that interact with the distributed
> transaction system. On the other hand, we can do it in incremental
> fashion as and when more parts of the design become clear.
That is exactly what we are going to do - we are trying to integrate DTM
with existing systems (pg_shard, postgres_fdw, BDR) to find out what is
missing and should be added. In parallel we are trying to compare the
efficiency and scalability of the different solutions.
For example, we are still considering scalability problems with the
tsDTM approach: to provide acceptable performance, it requires very
precise clock synchronization (we have to use PTP instead of NTP). So it
may be a waste of time to provide fault tolerance for tsDTM if we
finally find out that this approach cannot provide better scalability
than the simpler DTM approach.
> EnterpriseDB: http://www.enterprisedb.com