Re: [PATCH] Fix CommitTransactionCommand() to CallXactCallbacks() in TBLOCK_ABORT_END

2020-03-28 Thread Dave Sharpe
From Gilles Darold  on 2020-03-26T16:09:04.
> Actually the callback function is called when the error is thrown:

> psql:eat_rollback2.sql:20: INFO:  0: myTransactionCallback() XactEvent 2 
> (is abort) level 1 <-
> LOCATION:  myTransactionCallback, eat_rollback.c:52
> psql:eat_rollback2.sql:20: ERROR:  XX000: no no no
> LOCATION:  mySubtransactionCallback, eat_rollback.c:65

> this is probably why the callback is not called on the subsequent ROLLBACK 
> execution because abort processing is
>  already done (src/backend/access/transam/xact.c:3890).
So I withdraw this patch and fix. The callback during the error will drive the 
ROLLBACK remote, as required in the fdw.
Great catch, thanks Gilles!

Cheers,
Dave



[PATCH] Fix CommitTransactionCommand() to CallXactCallbacks() in TBLOCK_ABORT_END

2020-03-24 Thread Dave Sharpe
Hi pghackers,
This is my first time posting here ...  Gilles Darold and I are developing a 
new FDW which is based on the contrib/postgres_fdw. The postgres_fdw logic uses 
a RegisterXactCallback function to send local transactions remote. But I found 
that a registered XactCallback is not always called for a successful client 
ROLLBACK statement. So, a successful local ROLLBACK does not get sent remote by 
FDW, and now the local and remote transaction states are messed up, out of 
sync. The local database is "eating" the successful rollback.

I attach a git format-patch file 
0001-Fix-CommitTransactionCommand-to-CallXactCallbacks-in.patch
The patch fixes the problem and is ready to commit as far as I can tell. It 
adds some comment lines and three lines of code to CommitTransactionCommand() 
in the TBLOCK_ABORT_END case. Line (1) to reset the transaction's blockState 
back to TBLOCK_ABORT, ahead of (2) a new call to callXactCallbacks(). If the 
callback returns successful (which is usually the case), (3) the new code 
switches back to TBLOCK_ABORT_END, then continues with old code to 
CleanupTransaction() as before. If any callback does error out, the 
TBLOCK_ABORT was the correct block state for an error.

I have not added a regression test since my scenario involves a C module ... I 
didn't see such a regression test, but somebody can teach me where to put the C 
module and Makefile if a regression test should be added. Heads up that the 
scenario to hit this is a bit complex (to me) ... I attach the following unit 
test files:
1) eat_rollback.c, _PG_init() installs my_utility_hook() for INFO log for 
debugging.
RegisterSubXactCallback(mySubtransactionCallback) which injects some logging 
and an ERROR for savepoints, i.e., negative testing, e.g., like a remote FDW 
savepoint failed.
And RegisterXactTransaction(myTransactionCallback) with info logging.
2) Makefile, to make the eat_rollback.so
3) eat_rollback2.sql, drives the fail sequence, especially the SAVEPOINT, which 
errors out, then later a successful ROLLBACK gets incorrectly eaten (no 
CALLBACK info logging, demonstrates the bug), then another successful ROLLBACK 
(now there is CALLBACK info logging).
4) eat_rollback2.out, output without the fix, the rollback at line 68 is 
successful but there is not myTransactionCallback() INFO output
5) eat_rollback2.fixed, output after the fix, the rollback at line 68 is 
successful, and now there is a myTransactionCallback() INFO log. Success!

It first failed for me in v11.1, I suspect it failed since before then too. And 
it is still failing in current master.

Bye the way, we worked around the bug in our FDW by handling sub/xact in the 
utility hook. A transaction callback is still needed for implicit, internal 
rollbacks. Another advantage of the workaround is (I think) it reduces the code 
complexity and improves performance because the entire subxact stack is not 
unwound to drive each and every subtransaction level to remote. Also 
sub/transaction statements are sent remote as they arrive (local and remote are 
more "transactionally" synced, not stacked by FDW for later). And of course, 
the workaround doesn't hit the above bug, since our utility hook correctly 
handles the client ROLLBACK. If it makes sense to the community, I could try 
and patch contrib/postgres_fdw to do what we are doing. But note that I don't 
need it myself: we have our own new FDW for remote DB2 z/OS (!) ... at LzLabs 
we are a building a software defined mainframe with PostgreSQL (of all things).

Hope it helps!

Dave Sharpe
I don't necessarily agree with everything I say. (MM)
www.lzlabs.com


0001-Fix-CommitTransactionCommand-to-CallXactCallbacks-in.patch
Description:  0001-Fix-CommitTransactionCommand-to-CallXactCallbacks-in.patch


eat_rollback.c
Description: eat_rollback.c


Makefile
Description: Makefile


eat_rollback2.sql
Description: eat_rollback2.sql


eat_rollback2.out
Description: eat_rollback2.out


eat_rollback2.fixed
Description: eat_rollback2.fixed