On 2020-09-18 00:54, Bruce Momjian wrote:
On Tue, Sep  8, 2020 at 01:36:16PM +0300, Alexey Kondratov wrote:
Thank you for the link!

After a quick look at Sawada-san's patch set, I think that there are two
major differences:

1. There is a built-in foreign xacts resolver in [1], which should be much more convenient from the end-user perspective. It involves huge in-core
changes and additional complexity, which is of course worth it.

However, it is still not clear to me that it is possible to resolve all foreign prepared xacts on Postgres' own side with a 100% guarantee. Imagine a situation where the coordinator node is actually an HA cluster group (primary + sync + async replica) and it failed just after the PREPARE stage or
after the local COMMIT. In that case all foreign xacts will be left in the
prepared state. After the failover process completes, the synchronous replica will become the new primary. Would it have all the required info to properly resolve
orphan prepared xacts?

Probably, this situation is handled properly in [1], but I have not yet finished a thorough reading of the patch set, though it has a great doc!

On the other hand, the previous 0003 and my proposed patch rely on either manual resolution of hung prepared xacts or the use of an external monitor/resolver. This approach is much simpler from the in-core perspective, but doesn't look
as complete as [1].

Have we considered how someone would clean up foreign transactions if the coordinating server dies? Could it be done manually? Would an external
resolver, rather than an internal one, make this easier?

Both Sawada-san's patch [1] and the patches in this thread (e.g. mine [2]) use 2PC with a special gid format that includes an xid + server identification info. Thus, one can select from pg_prepared_xacts, get the xid and coordinator info, then use txid_status() on the coordinator (or ex-coordinator) to get the transaction status, and finally either commit or abort these stale prepared xacts. Of course this could be wrapped into some user-level support routines, as is done in [1].
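
To make the procedure above a bit more concrete, here is a minimal sketch in Python of the decision step only. Please treat everything specific in it as an assumption for illustration: the gid layout "fx_<xid>_<server>", the function name resolve_actions and the txid_status callback are hypothetical and are not the format or API used by [1] or [2]. In a real resolver the gids would come from pg_prepared_xacts on the foreign node, and the callback would run txid_status() on the coordinator, which may also need an epoch-extended xid rather than a raw one.

```python
import re
from typing import Callable, Iterable, List, Optional

# Hypothetical gid layout: "fx_<coordinator xid>_<coordinator server id>".
# The real patches use their own formats; this one is only for illustration.
GID_RE = re.compile(r"^fx_(\d+)_(\w+)$")


def resolve_actions(
    gids: Iterable[str],
    txid_status: Callable[[str, int], Optional[str]],
) -> List[str]:
    """Return the SQL statements to run on the foreign node.

    gids        -- gid values read from pg_prepared_xacts on the foreign node
    txid_status -- callback (server, xid) -> 'committed' | 'aborted' |
                   'in progress' | None, mirroring what txid_status()
                   reports on the coordinator
    """
    actions = []
    for gid in gids:
        match = GID_RE.match(gid)
        if match is None:
            continue  # not prepared by our coordinator, leave it alone
        xid, server = int(match.group(1)), match.group(2)
        status = txid_status(server, xid)
        quoted = gid.replace("'", "''")  # escape for a SQL string literal
        if status == "committed":
            actions.append(f"COMMIT PREPARED '{quoted}'")
        elif status == "aborted":
            actions.append(f"ROLLBACK PREPARED '{quoted}'")
        # 'in progress' or unknown (None): do nothing and retry later
    return actions


if __name__ == "__main__":
    # Fake coordinator state standing in for real txid_status() calls.
    known = {("node1", 100): "committed", ("node1", 101): "aborted"}
    for stmt in resolve_actions(
        ["fx_100_node1", "fx_101_node1", "fx_102_node1"],
        lambda server, xid: known.get((server, xid)),
    ):
        print(stmt)
```

Keeping the decision separate from the database access is a deliberate choice in this sketch: the function holds no state of its own, so it can be rerun at any time against whatever the nodes currently report.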

As for the benefits of using an external resolver, I think there are some from the whole-system perspective:

1) If one follows the logic above, then this resolver could be stateless: it takes all the required info from the Postgres nodes themselves.

2) Then you can easily put it into a container, which makes it easier to deploy to all this 'cloud' stuff like Kubernetes.

3) Also you can scale resolvers independently from Postgres nodes.

I do not think that any of these points is a game changer, but we use a very simple external resolver together with [2] in our sharding prototype and it works just fine so far.


[1] https://www.postgresql.org/message-id/CA%2Bfd4k4HOVqqC5QR4H984qvD0Ca9g%3D1oLYdrJT_18zP9t%2BUsJg%40mail.gmail.com

[2] https://www.postgresql.org/message-id/3ef7877bfed0582019eab3d462a43275%40postgrespro.ru

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

