On 2020-09-09 08:35, Masahiko Sawada wrote:
On Wed, 9 Sep 2020 at 02:00, Alexey Kondratov
<a.kondra...@postgrespro.ru> wrote:
On 2020-09-08 14:48, Fujii Masao wrote:
>
> IIUC, yes, the information required for automatic resolution is
> WAL-logged and the standby tries to resolve those orphan transactions
> from WAL after the failover. But Sawada-san's patch provides
> the special function for manual resolution, so there may be some cases
> where manual resolution is necessary.
>
I've found a note about manual resolution in the v25 0002:
+After that we prepare all foreign transactions by calling
+PrepareForeignTransaction() API. If we failed on any of them we
change
to
+rollback, therefore at this time some participants might be prepared
whereas
+some are not prepared. The former foreign transactions need to be
resolved
+using pg_resolve_foreign_xact() manually and the latter ends
transaction
+in one-phase by calling RollbackForeignTransaction() API.
but it's not yet clear for me.
Sorry, the above description in README is out of date. In the v25
patch, it's true that if a backend fails to prepare a transaction on a
foreign server, it’s possible that some foreign transactions are
prepared whereas others are not. But at the end of the transaction
after changing to rollback, the process does rollback (or rollback
prepared) all of them. So the use case of pg_resolve_foreign_xact() is
to resolve orphaned foreign prepared transactions or to resolve a
foreign transaction that is not resolved for some reasons, bugs etc.
OK, thank you for the explanation!
Once the transaction is committed locally any ERROR (or higher level
message) will be escalated to PANIC.
I think this is true only inside the critical section and it's not
necessarily true for all errors happening after the local commit,
right?
It's not actually related to critical section errors escalation. Any
error in the backend after the local commit and
ProcArrayEndTransaction() will try to abort the current transaction and
do RecordTransactionAbort(), but it's too late to do so and PANIC will
be risen:
/*
* Check that we haven't aborted halfway through
RecordTransactionCommit.
*/
if (TransactionIdDidCommit(xid))
elog(PANIC, "cannot abort transaction %u, it was already
committed",
xid);
At least that's how I understand it.
And I do see possible ERROR level
messages in the postgresCommitForeignTransaction() for example:
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ ereport(ERROR, (errmsg("could not commit transaction
on server %s",
+
frstate->server->servername)));
I don't think that it's very convenient to get a PANIC every time we
failed to commit one of the prepared foreign xacts, since it could be
not so rare in the distributed system. That's why I tried to get rid
of
possible ERRORs as far as possible in my proposed patch.
In my patch, the second phase of 2PC is executed only by the resolver
process. Therefore, even if an error would happen during committing a
foreign prepared transaction, we just need to relaunch the resolver
process and trying again. During that, the backend process will be
just waiting. If a backend process raises an error after the local
commit, the client will see transaction failure despite the local
transaction having been committed. An error could happen even by
palloc. So the patch uses a background worker to commit prepared
foreign transactions, not by backend itself.
Yes, if it's a background process, then it seems to be safe.
BTW, it seems that I've chosen a wrong thread for posting my patch and
staring a discussion :) Activity from this thread moved to [1] and you
solution with built-in resolver is discussed [2]. I'll try to take a
look on v25 closely and write to [2] instead.
[1]
https://www.postgresql.org/message-id/2020081009525213277261%40highgo.ca
[2]
https://www.postgresql.org/message-id/CAExHW5uBy9QwjdSO4j82WC4aeW-Q4n2ouoZ1z70o%3D8Vb0skqYQ%40mail.gmail.com
Regards
--
Alexey Kondratov
Postgres Professional https://www.postgrespro.com
Russian Postgres Company