On Fri, Apr 7, 2017 at 10:56 AM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> Vinayak, why did you mark this patch as "Move to next CF"? AFAIU
> there has been no discussion yet.

I'd like to discuss this patch.  Clearly, a lot of work has been done
here, but I am not sure about the approach.

If we were to commit this patch set, then you could optionally enable
two_phase_commit for a postgres_fdw foreign server.  If you did, then,
modulo bugs and administrator shenanigans, and given proper
configuration, you would be guaranteed that a transaction which touched
postgres_fdw foreign tables and committed successfully would eventually
end up either committed on all of the nodes or rolled back on all of
them, rather than committed on some and rolled back on others.  However,
you would not
be guaranteed that all of those commits or rollbacks happen at
anything like the same time.  So, you would have a sort of eventual
consistency.  Any given snapshot might not be consistent, but if you
waited long enough with all nodes online, eventually all
distributed transactions would be resolved in a consistent manner.
That's kinda cool, but I think what people really want is a stronger
guarantee, namely, that they will get consistent snapshots.  It's not
clear to me that this patch gets us any closer to that goal.  Does
anyone have a plan for how we'd get from here to that stronger goal?
If not, is the patch useful enough to justify committing it for what
it can already do?  It would be particularly good to hear some
end-user views on this functionality and whether or not they would use
it and find it valuable.
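
To make the gap between "eventually resolved" and "consistent snapshot"
concrete, here is a rough toy model in Python.  It is only a sketch: the
Node class, its methods, and the "tx1" identifier are made up for
illustration and say nothing about how the patch or postgres_fdw is
implemented.  It assumes a transfer that is prepared on two nodes and
then resolved on each at a different moment, with a reader that simply
looks at both nodes in between.

```python
# Toy model only: two "nodes" holding balances, and a transfer that is
# prepared on both and then resolved on each node at a different time.
# All names here are hypothetical; none come from PostgreSQL.
from dataclasses import dataclass, field


@dataclass
class Node:
    balance: int
    pending: dict = field(default_factory=dict)  # xid -> balance delta

    def prepare(self, xid, delta):
        # Phase 1: remember the change, but do not expose it to readers.
        self.pending[xid] = delta

    def commit_prepared(self, xid):
        # Phase 2: make the prepared change visible to readers.
        self.balance += self.pending.pop(xid)


def snapshot_total(nodes):
    # A reader that looks at every node "now", with no global snapshot.
    return sum(n.balance for n in nodes)


a, b = Node(balance=100), Node(balance=100)
totals = [snapshot_total([a, b])]      # before the transfer

a.prepare("tx1", -30)
b.prepare("tx1", +30)
a.commit_prepared("tx1")
totals.append(snapshot_total([a, b]))  # resolved on a, still pending on b

b.commit_prepared("tx1")
totals.append(snapshot_total([a, b]))  # resolved on both nodes

print(totals)  # [200, 170, 200]
```

The middle reading is the problem case: the transfer is atomic in the
long run, but a reader in the window between the two resolutions sees
30 units missing.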

On a technical level, I am pretty sure that it is not OK to call
AtEOXact_FDWXacts() from the sections of CommitTransaction,
AbortTransaction, and PrepareTransaction that are described as
"non-critical resource releasing".  At that point, it's too late to
throw an error, and it is very difficult to imagine something that
involves a TCP connection to another machine not being subject to
error.  You might say "well, we can just make sure that any problems
are reported as a WARNING rather than an ERROR", but that's pretty
hard to guarantee; most backend code assumes it can ERROR, so anything
you call is a potential hazard.  There is a second problem, too: any
code that runs from here is not interruptible.  The user can hit ^C
all day and nothing will happen.  That's a bad situation when you're
busy doing network I/O.  I'm not exactly sure what the best thing to
do about this problem would be.
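
As a rough illustration of the hazard, here is a toy model in Python.
It is a sketch under stated assumptions, not PostgreSQL code:
commit_transaction, resolve_remote, and RemoteError are invented names,
and the "log" list merely stands in for what the backend would report.
Once the local commit is recorded, a remote failure can no longer
become a rollback, so the most the code can do is report it.  Note that
the toy catches one known exception type; real backend code can raise
an error from almost anywhere, which is why the downgrade-to-WARNING
approach is hard to guarantee.

```python
# Toy model only; none of these names are PostgreSQL functions.
class RemoteError(Exception):
    pass


def resolve_remote(fail):
    # Stand-in for network I/O to a foreign server, which can always fail.
    if fail:
        raise RemoteError("connection lost")


def commit_transaction(log, remote_fails):
    log.append("local commit record written")  # point of no return
    try:
        # Stand-in for the late cleanup phase: the local commit already
        # happened, so a failure here cannot be turned into a rollback.
        resolve_remote(remote_fails)
        log.append("remote resolved")
    except RemoteError as exc:
        # All that is left is to report the problem and resolve it later.
        log.append(f"WARNING: {exc}; remote left unresolved")
    return "committed"


ok_log, bad_log = [], []
ok_result = commit_transaction(ok_log, remote_fails=False)
bad_result = commit_transaction(bad_log, remote_fails=True)
print(ok_result, ok_log)
print(bad_result, bad_log)
```

In both runs the caller is told "committed"; in the failing run the
remote side is simply left unresolved.  The toy also does nothing about
the second problem, since nothing in that late phase checks for a
cancel request.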

Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
