Le 25/03/2020 à 03:49, Dave Sharpe a écrit : > > Hi pghackers, > > This is my first time posting here ... Gilles Darold and I are > developing a new FDW which is based on the contrib/postgres_fdw. The > postgres_fdw logic uses a RegisterXactCallback function to send local > transactions remote. But I found that a registered XactCallback is not > always called for a successful client ROLLBACK statement. So, a > successful local ROLLBACK does not get sent remote by FDW, and now the > local and remote transaction states are messed up, out of sync. The > local database is "eating" the successful rollback. > > > > I attach a git format-patch file > 0001-Fix-CommitTransactionCommand-to-CallXactCallbacks-in.patch > > The patch fixes the problem and is ready to commit as far as I can > tell. It adds some comment lines and three lines of code to > CommitTransactionCommand() in the TBLOCK_ABORT_END case. Line (1) to > reset the transaction's blockState back to TBLOCK_ABORT, ahead of (2) > a new call to callXactCallbacks(). If the callback returns successful > (which is usually the case), (3) the new code switches back to > TBLOCK_ABORT_END, then continues with old code to CleanupTransaction() > as before. If any callback does error out, the TBLOCK_ABORT was the > correct block state for an error. > > > > I have not added a regression test since my scenario involves a C > module ... I didn't see such a regression test, but somebody can teach > me where to put the C module and Makefile if a regression test should > be added. Heads up that the scenario to hit this is a bit complex (to > me) ... I attach the following unit test files: > > 1) eat_rollback.c, _/PG_init() installs my/_utility_hook() for INFO > log for debugging. > > RegisterSubXactCallback(mySubtransactionCallback) which injects some > logging and an ERROR for savepoints, i.e., negative testing, e.g., > like a remote FDW savepoint failed. > > And RegisterXactTransaction(myTransactionCallback) with info logging. > > 2) Makefile, to make the eat_rollback.so > > 3) eat_rollback2.sql, drives the fail sequence, especially the > SAVEPOINT, which errors out, then later a successful ROLLBACK gets > incorrectly eaten (no CALLBACK info logging, demonstrates the bug), > then another successful ROLLBACK (now there is CALLBACK info logging). > > 4) eat_rollback2.out, output without the fix, the rollback at line 68 > is successful but there is not myTransactionCallback() INFO output > > 5) eat_rollback2.fixed, output after the fix, the rollback at line 68 > is successful, and now there is a myTransactionCallback() INFO log. > Success! > > > > It first failed for me in v11.1, I suspect it failed since before then > too. And it is still failing in current master. > > > > Bye the way, we worked around the bug in our FDW by handling sub/xact > in the utility hook. A transaction callback is still needed for > implicit, internal rollbacks. Another advantage of the workaround is > (I think) it reduces the code complexity and improves performance > because the entire subxact stack is not unwound to drive each and > every subtransaction level to remote. Also sub/transaction statements > are sent remote as they arrive (local and remote are more > "transactionally" synced, not stacked by FDW for later). And of > course, the workaround doesn't hit the above bug, since our utility > hook correctly handles the client ROLLBACK. If it makes sense to the > community, I could try and patch contrib/postgres_fdw to do what we > are doing. But note that I don't need it myself: we have our own new > FDW for remote DB2 z/OS (!) ... at LzLabs we are a building a software > defined mainframe with PostgreSQL (of all things). > > > > Hope it helps! > > > > Dave Sharpe > > /I don't necessarily agree with everything I say./(MM) > > www.lzlabs.com >
Hi, As I'm involved in this thread I have given a review to this bug report and I don't think that there is a problem here but as a disclaimer my knowledge on internal transaction management is probably not enough to have a definitive opinion. Actually the callback function is called when the error is thrown: psql:eat_rollback2.sql:20: INFO: 00000: myTransactionCallback() XactEvent 2 (is abort) level 1 <------------------- LOCATION: myTransactionCallback, eat_rollback.c:52 psql:eat_rollback2.sql:20: ERROR: XX000: no no no LOCATION: mySubtransactionCallback, eat_rollback.c:65 this is probably why the callback is not called on the subsequent ROLLBACK execution because abort processing is already done (src/backend/access/transam/xact.c:3890). With this simple test and the debug extension: BEGIN; DROP INDEX "index2", "index3"; -- will fail indexes doesn't exist ROLLBACK; the callback function is also called at error level, not on the ROLLBACK: ---------------------------------------------------------------------------------BEGIN psql:eat_rollback-2/sql/eat_rollback3.sql:39: INFO: 00000: my_utility_hook() utility transaction kind 0 (is not rollback) LOCATION: my_utility_hook, eat_rollback.c:28 BEGIN ---------------------------------------------------------------------------------DROP INDEX + error psql:eat_rollback-2/sql/eat_rollback3.sql:41: INFO: 00000: myTransactionCallback() XactEvent 2 (is abort) level 1 <--------------- LOCATION: myTransactionCallback, eat_rollback.c:52 psql:eat_rollback-2/sql/eat_rollback3.sql:41: ERROR: 42704: index "concur_index2" does not exist LOCATION: DropErrorMsgNonExistent, tablecmds.c:1209 ---------------------------------------------------------------------------------ROLLBACK psql:eat_rollback-2/sql/eat_rollback3.sql:43: INFO: 00000: my_utility_hook() utility transaction kind 3 (is rollback) LOCATION: my_utility_hook, eat_rollback.c:28 ROLLBACK So this is probably the expected behavior , however the proposed patch will throw the callback function two times which might not be what we want. ---------------------------------------------------------------------------------BEGIN psql:eat_rollback-2/sql/eat_rollback3.sql:39: INFO: 00000: my_utility_hook() utility transaction kind 0 (is not rollback) LOCATION: my_utility_hook, eat_rollback.c:28 BEGIN ---------------------------------------------------------------------------------DROP INDEX + error psql:eat_rollback-2/sql/eat_rollback3.sql:41: INFO: 00000: myTransactionCallback() XactEvent 2 (is abort) level 1 <-------------- first call LOCATION: myTransactionCallback, eat_rollback.c:52 psql:eat_rollback-2/sql/eat_rollback3.sql:41: ERROR: 42704: index "concur_index2" does not exist LOCATION: DropErrorMsgNonExistent, tablecmds.c:1209 ---------------------------------------------------------------------------------ROLLBACK psql:eat_rollback-2/sql/eat_rollback3.sql:43: INFO: 00000: my_utility_hook() utility transaction kind 3 (is rollback)<-------------- second call LOCATION: my_utility_hook, eat_rollback.c:28 psql:eat_rollback-2/sql/eat_rollback3.sql:43: INFO: 00000: myTransactionCallback() XactEvent 2 (is abort) level 1 LOCATION: myTransactionCallback, eat_rollback.c:52 ROLLBACK But again I may be wrong. -- Gilles Darold http://www.darold.net/