Tom Lane wrote:
At the API level, I like the PREPARE/COMMIT/ROLLBACK statements, but I think you have missed a bet: it needs to be possible to issue "COMMIT PREPARED gid" for the same gid several times without error. Consider a scenario where the transaction monitor crashes during the commit phase. When it recovers, it will know that it had committed to commit, but it won't know which nodes were successfully committed, so it will need to resend the COMMIT commands. It would be bad for the nodes to simply say "yes boss" when told to COMMIT a gid they have no record of. So I think the gids have to stick around after COMMIT PREPARED or ROLLBACK PREPARED, and there needs to be a fourth command (RELEASE PREPARED?) to actually remove the state data once the transaction monitor is satisfied that everything's done. It is okay for RELEASE of an unknown gid to be a no-op.
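The per-node semantics proposed above can be sketched as a small state machine. This is a minimal illustration, not PostgreSQL code: `PreparedTxnRegistry`, `GidState` and the two exception classes are hypothetical names, and RELEASE PREPARED is only the command suggested in the paragraph above. One further assumption, flagged in a comment, is that RELEASE refuses to forget a gid that is still undecided.

```python
from enum import Enum
from typing import Dict, Optional


class GidState(Enum):
    PREPARED = "prepared"
    COMMITTED = "committed"
    ROLLED_BACK = "rolled back"


class UnknownGidError(Exception):
    """COMMIT/ROLLBACK PREPARED named a gid this node has no record of."""


class ConflictingOutcomeError(Exception):
    """A gid was asked for the opposite outcome of the one it already has."""


class PreparedTxnRegistry:
    """Hypothetical per-node bookkeeping for the proposed semantics: a gid
    survives COMMIT/ROLLBACK PREPARED until an explicit RELEASE PREPARED."""

    def __init__(self) -> None:
        self._gids: Dict[str, GidState] = {}

    def state(self, gid: str) -> Optional[GidState]:
        return self._gids.get(gid)

    def prepare(self, gid: str) -> None:
        if gid in self._gids:
            raise ValueError(f"gid {gid!r} is already in use")
        self._gids[gid] = GidState.PREPARED

    def _finish(self, gid: str, outcome: GidState) -> None:
        current = self._gids.get(gid)
        if current is None:
            # Never answer "yes boss" for a gid with no record.
            raise UnknownGidError(gid)
        if current is GidState.PREPARED:
            self._gids[gid] = outcome
        elif current is not outcome:
            raise ConflictingOutcomeError(f"gid {gid!r} is already {current.value}")
        # current is outcome: a resent command is a harmless repeat.

    def commit_prepared(self, gid: str) -> None:
        self._finish(gid, GidState.COMMITTED)

    def rollback_prepared(self, gid: str) -> None:
        self._finish(gid, GidState.ROLLED_BACK)

    def release_prepared(self, gid: str) -> None:
        # Assumption: refuse to forget a transaction that is still undecided.
        if self._gids.get(gid) is GidState.PREPARED:
            raise ValueError(f"gid {gid!r} has not been committed or rolled back")
        # Releasing an unknown gid is a no-op, as proposed.
        self._gids.pop(gid, None)
```

A recovering monitor can then call `commit_prepared("tx1")` on every node again: nodes that already committed accept the repeat, nodes that never heard of `tx1` raise an error, and `release_prepared("tx1")` cleans up once every node has acknowledged.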
Isn't this usually where the GTM would issue "recover" requests to determine the state of the individual resources involved in the global transaction, and then commit or abort only the resources that need it? (I think the equivalent in Heikki's work is a SELECT from the pg_prepared_xact view.)
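The recover-then-resolve approach can be sketched as a pure function over the GTM's decision log and each node's list of still-prepared gids. This is an illustration under assumptions: `recovery_plan`, `quote_literal` and the `list_prepared` hook are hypothetical (the hook stands in for a SELECT on the node's prepared-transactions view), and the commands assume the gid is passed as a quoted SQL string literal. Nodes that already finished a gid no longer report it, so nothing is resent to them.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple


def quote_literal(value: str) -> str:
    # SQL string literal: double any embedded single quotes.
    return "'" + value.replace("'", "''") + "'"


def recovery_plan(
    decisions: Dict[str, str],
    nodes: Iterable[str],
    list_prepared: Callable[[str], Set[str]],
) -> List[Tuple[str, str]]:
    """Work out which (node, command) pairs the GTM still has to send.

    decisions: gid -> "commit" or "rollback", as recorded in the GTM's own log.
    list_prepared: hypothetical hook returning the gids a node still holds
    in the prepared state.
    """
    plan: List[Tuple[str, str]] = []
    for node in nodes:
        for gid in sorted(list_prepared(node)):
            decision = decisions.get(gid)
            if decision is None:
                # Not in this GTM's log: leave it for whoever owns it.
                continue
            verb = "COMMIT" if decision == "commit" else "ROLLBACK"
            plan.append((node, f"{verb} PREPARED {quote_literal(gid)}"))
    return plan
```

With this design the nodes need no memory of finished gids at all; the GTM's durable decision log plus the nodes' in-doubt lists are enough to decide what to resend.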
I found the Berkeley DB distributed transaction docs quite useful for working out how two-phase commit fits together:
http://pybsddb.sourceforge.net/ref/xa/intro.html
I would be inclined to require GIDs to be numbers (probably int8s) instead of strings, so that we don't have any problems with funny characters in the file names. That's negotiable, though, as we could certainly uuencode the strings or something similar to avoid that trap.
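One way to keep string GIDs while avoiding funny characters in file names is a reversible encoding into a safe alphabet. The sketch below swaps in hex instead of uuencode, because uuencode's alphabet (ASCII 32 to 95) itself includes characters such as '/' and space. The function names are hypothetical, and hex doubles the length, so a typical 255-byte file name limit would cap GIDs at 127 bytes.

```python
def gid_to_filename(gid: str) -> str:
    """Map an arbitrary gid string to a file name made only of [0-9a-f]."""
    return gid.encode("utf-8").hex()


def filename_to_gid(name: str) -> str:
    """Inverse of gid_to_filename: recover the original gid string."""
    return bytes.fromhex(name).decode("utf-8")
```

For example, the gid `a/b` becomes the file name `612f62`.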
Aren't the GIDs generated externally by the GTM? We need more than an int8 there. See, for example, Heikki's JDBC driver patch: it is given a javax.transaction.xa.Xid by the TM in prepare/commit/etc. The Xid is basically just a couple of raw byte arrays (plus an integer format ID). The driver base64-encodes that into a string GID to give to the backend.
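To show why an int8 is too small, here is a sketch of turning an Xid's three components into a string GID. The `Xid` dataclass is a Python stand-in for `javax.transaction.xa.Xid` (format ID, global transaction ID, branch qualifier; XA caps each byte array at 64 bytes), and the `formatId_gtrid_bqual` layout is a hypothetical choice, not necessarily the one the driver patch uses. Splitting on '_' is unambiguous because standard base64 never emits that character.

```python
import base64
from dataclasses import dataclass

MAX_PART = 64  # XA limit on each of the two byte arrays


@dataclass(frozen=True)
class Xid:
    """Stand-in for javax.transaction.xa.Xid's three components."""
    format_id: int
    global_transaction_id: bytes
    branch_qualifier: bytes


def xid_to_gid(xid: Xid) -> str:
    """Hypothetical layout: <formatId>_<base64 gtrid>_<base64 bqual>."""
    if len(xid.global_transaction_id) > MAX_PART or len(xid.branch_qualifier) > MAX_PART:
        raise ValueError("XA limits each byte array to 64 bytes")
    gtrid = base64.b64encode(xid.global_transaction_id).decode("ascii")
    bqual = base64.b64encode(xid.branch_qualifier).decode("ascii")
    return f"{xid.format_id}_{gtrid}_{bqual}"


def gid_to_xid(gid: str) -> Xid:
    """Inverse of xid_to_gid."""
    fmt, gtrid, bqual = gid.split("_")
    return Xid(int(fmt), base64.b64decode(gtrid), base64.b64decode(bqual))
```

A maximal Xid with format ID 1 yields a 179-character GID (two 88-character base64 strings plus separators), far beyond the 19 digits of an int8. Note that the standard base64 alphabet includes '/' and '+', so such GIDs would still need the file name treatment discussed earlier.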
-O