On Sat, Feb 10, 2018 at 4:08 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Thu, Feb 8, 2018 at 3:58 AM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
>>> Overall, what's the status of this patch?  Are we hung up on this
>>> issue only, or are there other things?
>>
>> AFAIK there is no more technical issue in this patch so far other than
>> this issue. The patch has tests and docs, and includes all stuff to
>> support atomic commit to distributed transactions: the introducing
>> both the atomic commit ability to distributed transactions and some
>> corresponding FDW APIs, and having postgres_fdw support 2pc. I think
>> this patch needs to be reviewed, especially the functionality of
>> foreign transaction resolution which is re-designed before.
>
> OK.  I'm going to give 0002 a read-through now, but I think it would
> be a good thing if others also contributed to the review effort.
> There is a lot of code here, and there are a lot of other patches
> competing for attention.  That said, here we go:

I appreciate your reviewing. I'll thoroughly update the whole patch
based on your comments and suggestions. Here is answer, question and
my understanding.

>
> The fdw-transactions section of the documentation seems to imply that
> henceforth every FDW must call FdwXactRegisterForeignServer, which I
> think is an unacceptable API break.
>
> It doesn't seem advisable to make this behavior depend on
> max_prepared_foreign_transactions.  I think that it should be an
> server-level option: use 2PC for this server, or not?  FDWs that don't
> have 2PC default to "no"; but others can be set to "yes" if the user
> wishes.  But we shouldn't force that decision to be on a cluster-wide
> basis.

Since I've added a new option two_phase_commit to postgres_fdw we need
to ask FDW whether the foreign server is 2PC-capable server or not in
order to register the foreign server information. That's why the patch
asked FDW to call  FdwXactRegisterForeignServer. However we can
register a foreign server information automatically by executor (e.g.
at  BeginDirectModify and at BeginForeignModify) if a foreign server
itself has that information. We can add two_phase_commit_enabled
column to pg_foreign_server system catalog and that column is set to
true if the foriegn server is 2PC-capable (i.g. has enough functions)
and user want to use it.

>
> But that brings up another issue: why is MyFdwConnections named that
> way and why does it have those semantics?  That is, why do we really
> need a list of every FDW connection? I think we really only need the
> ones that are 2PC-capable writes.  If a non-2PC-capable foreign server
> is involved in the transaction, then we don't really to keep it in a
> list.  We just need to flag the transaction as not locally prepareable
> i.e. clear TwoPhaseReady.  I think that this could all be figured out
> in a much less onerous way: if we ever perform a modification of a
> foreign table, have nodeModifyTable.c either mark the transaction
> non-preparable by setting XACT_FLAGS_FDWNOPREPARE if the foreign
> server is not 2PC capable, or otherwise add the appropriate
> information to MyFdwConnections, which can then be renamed to indicate
> that it contains only information about preparable stuff.  Then you
> don't need each individual FDW to be responsible about calling some
> new function; the core code just does the right thing automatically.

I could not get this comment. Did you mean that the foreign
transaction on not 2PC-capable foreign server should be end in the
same way as before (i.g. by XactCallback)?

Currently, because there is not FDW API to end foreign transaction,
almost FDWs use XactCallbacks to end the transaction. But after
introduced new FDW APIs, I think it's better to use FDW APIs to end
transactions rather than using XactCallbacks. Otherwise we end up with
having FDW APIs for 2PC (prepare and resolve) and XactCallback for
ending the transaction, which would be hard to understand. So I've
changed the foreign transaction management so that core code
explicitly asks FDW to end/prepare a foreign transaction instead of
ending it by individual FDWs. After introduced new FDW APIs, core code
can have the information of all foreign servers involved with the
transaction and call each APIs at appropriate timing.

>
> +        if (!fdw_conn->end_foreign_xact(fdw_conn->serverid, fdw_conn->userid,
> +                                        fdw_conn->umid, true))
> +            elog(WARNING, "could not commit transaction on server %s",
> +                 fdw_conn->servername);
>
> First, as I noted above, this assumes that the local transaction can't
> fail after this point, which is certainly false -- if nothing else,
> think about a plug pull.  Second, it assumes that the right thing to
> do would be to throw a WARNING, which I think is false: if this is
> running in the pre-commit phase, it's not too late to switch to the
> abort path, and certainly if we make changes on only 1 server and the
> commit on that server fails, we should be rolling back, not committing
> with a warning.  Third, if we did need to restrict ourselves to
> warnings here, that's probably impractical.  This function needs to do
> network I/O, which is not a no-fail operation, and even if it works,
> the remote side can fail in any arbitrary way.
>
>
> +        /*
> +         * Between FdwXactRegisterFdwXact call above till this
> backend hears back
> +         * from foreign server, the backend may abort the local transaction
> +         * (say, because of a signal). During abort processing, it will send
> +         * an ABORT message to the foreign server. If the foreign server has
> +         * not prepared the transaction, the message will succeed. If the
> +         * foreign server has prepared transaction, it will throw an error,
> +         * which we will ignore and the prepared foreign transaction will be
> +         * resolved by a foreign transaction resolver.
> +         */
> +        if (!fdw_conn->prepare_foreign_xact(fdw_conn->serverid,
> fdw_conn->userid,
> +                                            fdw_conn->umid, fdw_xact_id))
> +        {
>
> Again, I think this is an impractical API contract.  It assumes that
> prepare_foreign_xact can't throw errors, which is likely to make for a
> lot of implementation problems on the FDW side -- you can't even
> palloc.  You can't call any existing code you've already got that
> might throw an error for any reason.  You definitely can't do a
> syscache lookup.  You can't accept interrupts, even though you're
> doing network I/O that could hang.  The only reason to structure it
> this way is to avoid having a transaction that we think is prepared in
> our local bookkeeping when, on the remote side, it really isn't.  But
> that is an unavoidable problem, because the whole system could crash
> after the remote prepare has been done and before prepare_foreign_xact
> even returns control.  Or we could fail after XLogInsert() and before
> XLogFlush().
>
> AtEOXact_FdwXacts has the same problem.
>
> I think you can fix make this a lot cleaner if you make it part of the
> API contract that resolver shouldn't fail when asked to roll back a
> transaction that was never prepared.  Then it can work like this: just
> before we prepare the remote transaction, we note the information in
> shared memory and in the WAL.  If that fails, we just abort.  When the
> resolver sees that our xact is no longer running and did not commit,
> it concludes that we must have failed and tries to roll back all the
> remotely-prepared xacts.  If they don't exist, then the PREPARE
> failed; if they do, then we roll 'em back.  On the other hand, if all
> of the prepares succeed, then we commit.  Seeing that our XID is no
> longer running and did commit, the resolver tries to commit those
> remotely-prepared xacts.  In the commit case, we log a complaint if we
> don't find the xact, but in the abort case, it's got to be an expected
> outcome.  If we really wanted to get paranoid about this, we could log
> a WAL record after finishing all the prepares -- or even each prepare
> -- saying "hey, after this point the remote xact should definitely
> exist!".  And then the resolver could complaints a nonexistent remote
> xact when rolling back only if that record hasn't been issued yet.
> But to me that looks like overkill; the window between issuing that
> record and issuing the actual commit record would be very narrow, and
> in most cases they would be flushed together anyway.  We could of
> course force the new record to be separately flushed first, but that's
> just making everything slower for no gain.

In FdwXactResolveForeignTranasction(), resolver concludes the fate of
transaction by seeing the status of fdwxact entry and the state of
local transaction in clog. what I need to do is making that function
log a complaint in commit case if couldn't find the prepared
transaction, and not do that in abort case. Also, postgres_fdw don't
raise an error even if we could not find prepared transaction on
foreign server because it might have been resolved by other process.
But this is now responsible by FDW. I should change it to resolver
side. That is, FDW can raise error in ordinarily way but core code
should catch and process it.

>
> Note that this requires that we not truncate away portions of clog
> that contain commit status information about no-longer-running
> transactions that have unresolved FDW 2PC xacts, or at least that we
> issue a WAL record updating the state of the fdw_xact so that it
> doesn't refer to that portion of clog any more - e.g. by setting the
> XID to either FrozenTransactionId or InvalidTransactionId, though that
> would be a problem since it would break the unique-XID assumption.   I
> don't see the patch doing either of those things right now, although I
> may have missed it.  Note that here again the differences between
> local 2PC and FDW 2PC really make a difference.  Local 2PC has a
> PGPROC+PGXACT, so the regular snaphot-taking code suffices to prevent
> clog truncation, because the relevant XIDs are showing up in
> snapshots.  The PrescanPreparedTransactions stuff only needs to nail
> things down until we reach consistency, and then the regular
> mechanisms take over.  We have no such easy out for this system.
>

You're right. Perhaps we can deal with it by PrescanFdwXacts until
reach consistent point, and then have vac_update_datfrozenxid check
local xids of un-resolved fdwxact to determine the new datfrozenxid.
Since the local xids of un-resolved fdwxacts would not be relevant
with vacuuming, we don't need to include it to snapshot and
GetOldestXmin etc. Also we hint to resolve fdwxact when near
wraparound.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Reply via email to