On 2020-07-15 15:06, Masahiko Sawada wrote:
On Tue, 14 Jul 2020 at 09:08, Masahiro Ikeda <ikeda...@oss.nttdata.com>
wrote:
> I've attached the latest version of the patches. I've incorporated the
> review comments I got so far and improved the locking strategy.
Thanks for updating the patch!
I have three questions about the v23 patches.
1. Messages related to user cancellation
In my understanding, there are two messages
that can be output when a user cancels the COMMIT command.
A. When PREPARE fails, the output says that the transaction
committed locally, but an error is also reported.
```
postgres=*# COMMIT;
^CCancel request sent
WARNING: canceling wait for resolving foreign transaction due to user request
DETAIL: The transaction has already committed locally, but might not have been committed on the foreign server.
ERROR: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
CONTEXT: remote SQL command: PREPARE TRANSACTION 'fx_1020791818_519_16399_10'
```
B. When PREPARE succeeds,
the output says that the transaction committed locally.
```
postgres=*# COMMIT;
^CCancel request sent
WARNING: canceling wait for resolving foreign transaction due to user request
DETAIL: The transaction has already committed locally, but might not have been committed on the foreign server.
COMMIT
```
In case A, I think the "committed locally" message can confuse
users, because the message says the transaction committed although
it was actually aborted.
I guess "committed" here means that the ABORT was committed locally,
but isn't there a possibility of misunderstanding?
No, you're right. I'll fix it in the next version of the patch.
I think synchronous replication also has the same problem. It says
"the transaction has already committed" but it's not true when
executing ROLLBACK PREPARED.
Thanks for replying and sharing the synchronous replication problem.
BTW, how did you test case (A)? The message says it is canceling the
wait for foreign transaction resolution, but the remote SQL command is
PREPARE TRANSACTION.
I think the timing of the failure is important for 2PC tests.
Since I don't have a good way to simulate failures flexibly,
I use the GDB debugger.
The message in case (A) is output
after performing the following operations.
1. Attach the debugger to a backend process.
2. Set a breakpoint at PreCommit_FdwXact() in CommitTransaction().
// Before PREPARE.
3. Execute "BEGIN" and insert data into two remote foreign tables.
4. Issue a "COMMIT" command.
5. The backend process stops at the breakpoint.
6. Stop a remote foreign server.
7. Detach the debugger.
// The backend continues and the prepare fails. TR tries to abort all
remote txs.
// It's unnecessary to resolve remote txs whose prepare failed,
isn't it?
8. Send a cancel request.
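The sequence above can be sketched as a toy model. This is only an illustration under simplifying assumptions, not the patch's implementation: `Participant`, `commit()`, and the `before_prepare` hook are hypothetical names, and the hook merely stands in for the breakpoint at which a foreign server is stopped before PREPARE.

```python
from dataclasses import dataclass


class ServerDown(Exception):
    """Raised when a toy participant is unreachable."""


@dataclass
class Participant:
    # Hypothetical stand-in for one foreign server.
    name: str
    up: bool = True
    state: str = "active"  # active -> prepared -> committed / aborted

    def prepare(self):
        if not self.up:
            raise ServerDown(self.name)
        self.state = "prepared"

    def rollback(self):
        # An unreachable server cannot be rolled back now; its state is left as-is.
        if self.up:
            self.state = "aborted"


def commit(participants, before_prepare=None):
    """Toy coordinator: return the local outcome, 'COMMITTED' or 'ABORTED'."""
    if before_prepare is not None:
        before_prepare(participants)  # stands in for the breakpoint in step 2
    try:
        for p in participants:
            p.prepare()
    except ServerDown:
        # Prepare failed on some server: abort everywhere we can reach.
        for p in participants:
            p.rollback()
        return "ABORTED"
    for p in participants:
        p.state = "committed"
    return "COMMITTED"


def stop_second_server(participants):
    # Corresponds to step 6: stop one foreign server while the backend is paused.
    participants[1].up = False


servers = [Participant("remote1"), Participant("remote2")]
outcome = commit(servers, before_prepare=stop_second_server)
print(outcome, [p.state for p in servers])
```

In this model the local outcome is an abort, which is why a "committed locally" message in case (A) is misleading.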
BTW, I'm concerned about how to test the 2PC patches.
There are many failure patterns, such as the failure timing,
which server or network fails (and unexpected recovery), and combinations of those...
Although it would be best to test those failure patterns automatically,
I have no idea how to do that for now, so I check some patterns manually.
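As a starting point for automation, the combinations could at least be enumerated mechanically. The following is a minimal sketch; the concrete values in `TIMINGS`, `TARGETS`, and `RECOVERY` are assumptions chosen for illustration, not an agreed test plan, and `failure_patterns()` is a hypothetical helper.

```python
from itertools import product

# Hypothetical dimensions, loosely based on the failure patterns listed above.
TIMINGS = ["before PREPARE", "after PREPARE", "after local COMMIT"]
TARGETS = ["foreign server", "network"]
RECOVERY = ["stays down", "recovers unexpectedly"]


def failure_patterns(timings=TIMINGS, targets=TARGETS, recovery=RECOVERY):
    """Yield every (timing, target, recovery) combination as one test case."""
    yield from product(timings, targets, recovery)


cases = list(failure_patterns())
for timing, target, rec in cases:
    print(f"{target} fails {timing}, then {rec}")
print(len(cases), "patterns")
```

Each tuple would still need a way to inject the failure at the right moment, which is the part currently done by hand with the debugger.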
I've incorporated your comments above into my local branch. I'll
post the latest version of the patch soon, after incorporating the other comments.
OK, Thanks.
Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION