On 1/8/15, 12:00 PM, Kevin Grittner wrote:
> Robert Haas <robertmh...@gmail.com> wrote:
>> On Thu, Jan 8, 2015 at 10:19 AM, Kevin Grittner <kgri...@ymail.com> wrote:
>>> Robert Haas <robertmh...@gmail.com> wrote:
>>>> Andres is talking in my other ear suggesting that we ought to
>>>> reuse the 2PC infrastructure to do all this.
>>> If you mean that the primary transaction and all FDWs in the
>>> transaction must use 2PC, that is what I was saying, although
>>> apparently not clearly enough. All nodes *including the local one*
>>> must be prepared and committed with data about the nodes saved
>>> safely off somewhere that it can be read in the event of a failure
>>> of any of the nodes *including the local one*. Without that, I see
>>> this whole approach as a train wreck just waiting to happen.
>> Clearly, all the nodes other than the local one need to use 2PC. I am
>> unconvinced that the local node must write a 2PC state file only to
>> turn around and remove it again almost immediately thereafter.
> The key point is that the distributed transaction data must be
> flagged as needing to commit rather than roll back between the
> prepare phase and the final commit. If you try to avoid the
> PREPARE, flagging, COMMIT PREPARED sequence by building the
> flagging of the distributed transaction metadata into the COMMIT
> process, you still have the problem of what to do on crash
> recovery. You really need to use 2PC to keep that clean, I think.
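
(For concreteness, the full sequence Kevin describes would look something
like this; the GIDs are made-up names:

    -- Phase 1: prepare on each remote node participating via FDW...
    PREPARE TRANSACTION 'dtx42_remote1';
    -- ...and on the local node as well:
    PREPARE TRANSACTION 'dtx42_local';
    -- Durably flag dtx42 as "commit" in the saved metadata, then:
    COMMIT PREPARED 'dtx42_local';
    -- Phase 2: commit on each remote node:
    COMMIT PREPARED 'dtx42_remote1';

Crash recovery can then consult the flag to finish, or roll back, whatever
was interrupted.)
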
If we had an independent transaction coordinator then I would agree with
you, Kevin. But I think Robert is proposing that if we control one of the
nodes that's participating as well as coordinating the overall transaction,
we can take some shortcuts. AIUI a PREPARE means you are completely ready
to commit; in essence you're just waiting to write and fsync the commit
record. That is in fact the state a coordinating PG node would be in by the
time everyone else has done their prepare. So from that standpoint we're OK.
Now, as soon as ANY of the other nodes commits, our coordinating node MUST
be able to commit as well! That would normally require it to have a real
prepared transaction of its own. However, as long as there is zero chance
of any other prepared transaction committing before our local transaction,
that step isn't actually needed: our local transaction will either commit
or abort, and that will determine what needs to happen on all other nodes.
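
(A sketch of that shortcut, again with made-up GIDs:

    -- Phase 1: prepare only the remote nodes:
    PREPARE TRANSACTION 'dtx42_remote1';  -- on each remote server
    -- The coordinating node commits normally, with no local prepare:
    COMMIT;
    -- If the local COMMIT made it to disk, then on each remote server:
    COMMIT PREPARED 'dtx42_remote1';
    -- If the local transaction aborted instead:
    ROLLBACK PREPARED 'dtx42_remote1';

The local commit record is what durably decides the outcome for all the
other nodes.)
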
I'm ignoring the question of how the local node needs to store info about the
other nodes in case of a crash, but AFAICT you could reliably recover manually
from what I just described.
I think the question is: are we OK with "going under the skirt" in this
fashion? Presumably it would provide better performance, whereas forcing
ourselves to eat our own 2PC dogfood would make it easier for someone to
plug in an external coordinator instead of using our own. I think there's
also a lot to be said for getting a partial implementation of this
available today (requiring manual recovery), so long as it's not in core.
BTW, I found
https://www.cs.rutgers.edu/~pxk/417/notes/content/transactions.html a useful
read, specifically the 2PC portion.
>>> I'm not really clear on the mechanism that is being proposed for
>>> doing this, but one way would be to have the PREPARE of the local
>>> transaction be requested explicitly and to have that cause all FDWs
>>> participating in the transaction to also be prepared. (That might
>>> be what Andres meant; I don't know.)
>> We want this to be client-transparent, so that the client just says
>> COMMIT and everything Just Works.
> What about the case where one or more nodes don't support 2PC?
> Do we silently make the choice, without the client really knowing?
We abort. (Unless we want to have a running_with_scissors GUC...)
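
(Purely for illustration -- no such GUC exists today:

    SET running_with_scissors = on;  -- hypothetical: permit participants
                                     -- that lack 2PC support

With it off by default, the safe behavior is what you get unless you
explicitly ask for trouble.)
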
>>> That doesn't strike me as the
>>> only possible mechanism to drive this, but it might well be the
>>> simplest and cleanest. The trickiest bit might be to find a good
>>> way to persist the distributed transaction information in a way
>>> that survives the failure of the main transaction -- or even the
>>> abrupt loss of the machine it's running on.
>> I'd be willing to punt on surviving a loss of the entire machine. But
>> I'd like to be able to survive an abrupt reboot.
> As long as people are aware that there is an urgent need to find
> and fix all data stores to which clusters on the failed machine
> were connected via FDW when there is a hard machine failure, I
> guess it is OK. In essence we just document it and declare it to
> be somebody else's problem. In general I would expect a
> distributed transaction manager to behave well in the face of any
> single-machine failure, but if there is one aspect of a
> full-featured distributed transaction manager we could give up, I
> guess that would be it.
ISTM that one option here would be to "simply" write and sync WAL record(s)
listing all externally prepared transactions. That would be enough for a
hot standby to find all the other servers and tell them to either commit or
abort, based on whether our local transaction committed or aborted. If you
wanted, you could even have the standby be responsible for telling all the
other participants to commit...
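
(A sketch of that recovery step, using the existing dblink extension; the
connection strings and GIDs are made up:

    CREATE EXTENSION IF NOT EXISTS dblink;
    -- Our local transaction committed, so finish each remote participant:
    SELECT dblink_exec('host=remote1 dbname=app',
                       'COMMIT PREPARED ''dtx42_remote1''');
    SELECT dblink_exec('host=remote2 dbname=app',
                       'COMMIT PREPARED ''dtx42_remote2''');
    -- Had the local commit record been absent, these would issue
    -- ROLLBACK PREPARED instead.

The WAL record above is what tells the standby which servers and GIDs to
contact.)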
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com