On 09.11.2015 07:46, Amit Kapila wrote:
On Sat, Nov 7, 2015 at 10:23 PM, Konstantin Knizhnik
<k.knizh...@postgrespro.ru> wrote:
Lock manager is one of the tasks we are currently working on.
There are still a lot of open questions:
1. Should the distributed lock manager (DLM) do anything else besides
detecting distributed deadlocks?
I think so. Basically the DLM should be responsible for maintaining
all the lock information, which in turn means that any backend process
that needs to acquire/release a lock needs to interact with the DLM; without
that, I don't think even global deadlock detection can work (to detect deadlocks
among all the nodes, it needs to know the lock info of all nodes).
I hope that will not be needed, otherwise it will add significant overhead.
Unless I missed something, locks can still be managed locally, but we need
the DLM to detect global deadlocks.
Deadlock detection in PostgreSQL is performed only after the lock-wait
timeout expires. Only then is the lock graph analyzed for the
presence of loops. At this moment we can send our local lock graph to the
DLM. If it is really a distributed deadlock, then sooner or later
all nodes involved in this deadlock will send their local graphs to the DLM,
and the DLM will be able to build the global graph and detect the distributed
deadlock. In this scenario the DLM will be accessed very rarely: only when
a lock cannot be granted within deadlock_timeout.
One possible problem is false detection of a global deadlock, because the
local lock graphs correspond to different moments in time, so we may
find a "false" loop. I am still thinking about the best solution to this problem.
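The merge-and-detect step described above could be sketched roughly as follows. This is a hypothetical illustration only: the class and method names are mine, not from any actual DTM/DLM code, and transactions are assumed to already carry global IDs so that edges from different nodes line up.

```python
# Sketch of the DLM side of the scheme: each node, once its local
# deadlock_timeout expires, ships its local wait-for graph (edges keyed
# by *global* transaction ids) to the DLM, which merges them and looks
# for a cycle spanning the merged graph.

from collections import defaultdict

class DeadlockDetector:
    def __init__(self):
        # global xid -> set of global xids it is waiting for
        self.wait_for = defaultdict(set)

    def add_local_graph(self, edges):
        """Merge one node's local wait-for graph: a list of
        (waiter, holder) pairs of global transaction ids."""
        for waiter, holder in edges:
            self.wait_for[waiter].add(holder)

    def find_cycle(self):
        """Return a list of transactions forming a cycle, or None.
        Iterative DFS; GRAY marks nodes on the current DFS path."""
        WHITE, GRAY, BLACK = 0, 1, 2
        color = defaultdict(int)
        parent = {}
        for start in list(self.wait_for):
            if color[start] != WHITE:
                continue
            color[start] = GRAY
            stack = [(start, iter(self.wait_for[start]))]
            while stack:
                node, it = stack[-1]
                advanced = False
                for nxt in it:
                    if color[nxt] == GRAY:      # back edge: a deadlock
                        cycle = [nxt]
                        cur = node
                        while cur != nxt:
                            cycle.append(cur)
                            cur = parent[cur]
                        return cycle
                    if color[nxt] == WHITE:
                        color[nxt] = GRAY
                        parent[nxt] = node
                        stack.append((nxt, iter(self.wait_for[nxt])))
                        advanced = True
                        break
                if not advanced:
                    color[node] = BLACK
                    stack.pop()
        return None
```

Note that this sketch also exhibits the "false loop" problem mentioned above: if the graphs from two nodes were captured at different moments, an edge may already be stale by the time the cycle is found, so a real implementation would need to re-validate the cycle before cancelling anything.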
This is somewhat in line with what we currently do during lock conflicts:
if a backend incurs a lock conflict and decides to sleep, before
sleeping it tries to detect an early deadlock; so if we make the DLM
responsible for managing locks, the same could be achieved in a distributed system.
2. Should the DLM be part of the XTM API, or should it be a separate API?
We might be able to do it either way (make it part of the XTM API or devise a
separate API). I think the more important point here is to first get the
design for distributed transactions right (which primarily includes consistent
commit/rollback, snapshots, the distributed lock manager (DLM), and recovery).
3. Should the DLM be implemented as a separate process, or should it be
part of the arbiter (dtmd)?
That's an important decision. I think it will depend on which kind of design
we choose for the distributed transaction manager (an arbiter-based solution or
a non-arbiter-based solution, something like tsDTM). I think the DLM should be
separate, else the arbiter will become a hot spot with respect to contention.
There are pros and cons.
Pros for integrating the DLM into the DTM:
1. The DTM arbiter has the local-to-global transaction ID mapping,
which the DLM may need.
2. If my assumptions about the DLM are correct, it will be accessed
relatively rarely and should not have a significant
impact on performance.
Cons:
1. tsDTM doesn't need a centralized arbiter but still needs a DLM.
2. Logically, the DLM and the DTM are independent components.
Can you please explain more about the tsDTM approach: how timestamps
are used, what exactly CSN is (is it Commit Sequence Number), and
how it is used in the prepare phase? Is the CSN used as a timestamp?
Is the coordinator also one of the PostgreSQL instances?
In the tsDTM approach, system time (in microseconds) is used as the CSN (commit
sequence number). We also enforce that assigned CSNs are unique and monotonic within
each node. CSNs are assigned locally and do not require interaction with any other node.
This is why, in theory, the tsDTM approach should provide good scalability.
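A minimal sketch of what local CSN assignment could look like under this scheme (names are illustrative, not from the actual tsDTM code): the CSN is the system time in microseconds, bumped when necessary so that CSNs stay unique and strictly monotonic on this node, with no network round trip.

```python
# Local, arbiter-free CSN assignment: take the current system time in
# microseconds; if the clock has not advanced (or went backwards), fall
# back to previous CSN + 1 so CSNs stay unique and monotonic per node.

import time

class CsnSource:
    def __init__(self):
        self.last_csn = 0

    def next_csn(self):
        now_us = int(time.time() * 1_000_000)   # system time, microseconds
        csn = max(now_us, self.last_csn + 1)
        self.last_csn = csn
        return csn
```

The `max()` fallback is what makes the sequence safe against a non-advancing or stepped-back clock; cross-node ordering, of course, is only as good as the clock synchronization between nodes.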
I think in this patch it is important to see the completeness of the
APIs that need to be exposed for the implementation of distributed
transactions, and that is difficult to visualize without a
picture of all the components that have some interaction with the
transaction system. On the other hand, we can do it incrementally,
as and when more parts of the design are clear.
That is exactly what we are going to do: we are trying to
integrate the DTM with existing systems (pg_shard, postgres_fdw, BDR)
to find out what is missing and should be added. In parallel, we
are trying to compare the efficiency and scalability of the different approaches.
One thing I have noticed is that in the DTM approach you seem to
be considering a centralized (arbiter) transaction manager and a
centralized lock manager, which seems a workable approach,
but I think such an approach won't scale for write-heavy or mixed
read-write workloads. Have you thought about distributing the responsibility
of global transaction and lock management?
From my point of view there are two different scenarios:
1. When most transactions are local and only a few of them are global (for
example, most operations in a branch of a bank are
performed on accounts of clients of that branch, but there are a few
transactions involving accounts from different branches).
2. When most or all transactions are global.
It seems to me that the first scenario is more common in real life, and
good performance of a distributed system can actually be achieved only
when most transactions are local (involve only one node). There are
several approaches that allow local transactions to be optimized, for example
the one used in SAP HANA.
We also have a DTM implementation based on this approach, but it is not yet ready.
If most transactions are global, they affect random subsets of cluster
nodes (so it is not possible to logically split the cluster into groups of
tightly coupled nodes), and if the number of nodes is not very large (<10), then
I do not think there can be a better alternative (from a performance
point of view) than a centralized arbiter.
But this is only my speculation, and it would be really very interesting
for me to know the access patterns of real customers using distributed systems.
I think such a system
might be somewhat difficult to design, but the scalability will be better.
EnterpriseDB: http://www.enterprisedb.com