On Wed, Sep 27, 2017 at 12:11 PM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> On Tue, Sep 26, 2017 at 9:50 PM, Robert Haas <robertmh...@gmail.com> wrote:
>> On Tue, Sep 26, 2017 at 5:06 AM, Masahiko Sawada <sawada.m...@gmail.com> 
>> wrote:
>>> Based on the review comment from Robert, I'm planning to do the big
>>> change to the architecture of this patch so that a backend process
>>> work together with a dedicated background worker that is responsible
>>> for resolving the foreign transactions. For the usage of this feature,
>>> it will be almost the same as what this patch has been doing except
>>> for adding a new GUC parameter that controls the number of resolver
>>> processes to launch. That is, we can have multiple resolver processes
>>> to keep latency down.
>> Multiple resolver processes is useful but gets a bit complicated.  For
>> example, if process 1 has a connection open to foreign server A and
>> process 2 does not, and a request arrives that needs to be handled on
>> foreign server A, what happens?  If process 1 is already busy doing
>> something else, probably we want process 2 to try to open a new
>> connection to foreign server A and handle the request.  But if process
>> 1 and 2 are both idle, ideally we'd like 1 to get that request rather
>> than 2.  That seems a bit difficult to get working though.  Maybe we
>> should just ignore such considerations in the first version.
> Understood. I will keep it simple in the first version.

While a resolver process is useful for resolving transactions later, it
seems more efficient to try to resolve the prepared foreign
transactions, in the post-commit phase, in the same backend that
prepared them, for two reasons: 1. the backend already has a connection
to that foreign server; 2. it has just run some commands to completion
on that foreign server, so it's highly likely that a COMMIT PREPARED
would succeed too. If we let a resolver process do that, we spend time
in 1. signalling the resolver process, 2. setting up a connection to
the foreign server, and 3. by the time the resolver process tries to
resolve the prepared transaction, the foreign server may have become
unavailable, thus delaying the resolution.
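
As a rough, self-contained sketch of that fallback logic (all names
here, such as post_commit_resolve and commit_prepared_on_connection,
are hypothetical illustrations, not from the patch): the backend first
tries COMMIT PREPARED on its already-open connection and only hands the
transaction to a resolver worker if that attempt fails.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of issuing COMMIT PREPARED on an open connection. */
typedef enum { RESOLVE_OK, RESOLVE_FAILED } ResolveResult;

/* Stub standing in for sending COMMIT PREPARED over an already-open
 * connection; 'reachable' simulates whether the foreign server is up. */
static ResolveResult
commit_prepared_on_connection(bool reachable)
{
    return reachable ? RESOLVE_OK : RESOLVE_FAILED;
}

/* Post-commit: try in the owning backend first (it already holds the
 * connection); only hand off to the resolver worker on failure. */
static bool
post_commit_resolve(bool server_reachable, int *handed_to_resolver)
{
    if (commit_prepared_on_connection(server_reachable) == RESOLVE_OK)
        return true;            /* resolved cheaply, no signalling needed */
    (*handed_to_resolver)++;    /* fall back: queue for resolver process */
    return false;
}
```

The point of the sketch is only the ordering: the cheap in-backend
attempt happens before any resolver signalling or new connection setup.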

That said, I agree that the post-commit phase doesn't have a
transaction of its own, so catalog lookups and error reporting are not
possible there. We will need some different approach here, which may
not be straightforward. So we may need to defer this optimization to
v2. I think we have discussed this before, but I can't find the mail
offhand.

>>> * Resolver processes
>>> 1. Fetch PGPROC entry from the shmem queue and get its XID (say, XID-a).
>>> 2. Get the fdw_xact_state entry from shmem hash by XID-a.
>>> 3. Iterate fdw_xact entries using the index, and resolve the foreign
>>> transactions.
>>> 3-a. If even one foreign transaction failed to resolve, raise an error.
>>> 4. Change the waiting backend state to FDWXACT_COMPLETED and release it.
>> Comments:
>> - Note that any error we raise here won't reach the user; this is a
>> background process.  We don't want to get into a loop where we just
>> error out repeatedly forever -- at least not if there's any other
>> reasonable choice.
> Thank you for the comments.
> Agreed.

We should probably log an error message in the server log, so that
DBAs are aware of such failures. Is that something you are considering
doing?
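
To make the "log, don't ERROR-loop" point concrete, here is a minimal
sketch of steps 2-4 of the resolver loop for one dequeued XID (the
types FdwXactEntry, BackendState, and resolve_one_xid are hypothetical
stand-ins, not the patch's actual code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-server foreign-transaction entry for one local XID. */
typedef struct { bool resolvable; bool resolved; } FdwXactEntry;

typedef enum { FDWXACT_WAITING, FDWXACT_COMPLETED, FDWXACT_FAILED } BackendState;

/* Walk the entries for one XID and resolve each; on any failure, write a
 * LOG line (not ERROR) so the background worker does not loop forever
 * raising errors that never reach a user. */
static BackendState
resolve_one_xid(FdwXactEntry *entries, int n)
{
    for (int i = 0; i < n; i++)
    {
        if (!entries[i].resolvable)
        {
            fprintf(stderr,
                    "LOG: could not resolve foreign transaction on server %d\n", i);
            return FDWXACT_FAILED;   /* leave it for a later retry */
        }
        entries[i].resolved = true;
    }
    return FDWXACT_COMPLETED;        /* step 4: release the waiting backend */
}
```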

>> - I suggest that we ought to track the status for each XID separately
>> on each server rather than just track the XID status overall.  That
>> way, if transaction resolution fails on one server, we don't keep
>> trying to reconnect to the others.
> Agreed. In the current patch we manage fdw_xact entries that track the
> status for each XID separately on each server. I'm going to use the
> same mechanism. The resolver process gets a target XID from the shmem
> queue and gets all the fdw_xact entries associated with that XID from
> the fdw_xact array in shmem. But since scanning all the fdw_xact
> entries could be slow because the fdw_xact array could be large (e.g.,
> max_connections * number of foreign servers), I'm considering having a
> linked list of all the fdw_xact entries associated with the same XID,
> and a shmem hash pointing to the first fdw_xact entry of the linked
> list for each XID. That way, we can find the target fdw_xact entries
> from the array in O(1).

If we want to do something like this, would it be useful to use a data
structure similar to what is used for maintaining subtransactions? Just
a thought.
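
The hash-plus-intrusive-list idea quoted above can be sketched roughly
as follows (FdwXact, fdwxact_link, and fdwxact_lookup are illustrative
names, and a static array stands in for the shmem hash; the real patch
would use PostgreSQL's shared-memory hash facilities):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HASH_BUCKETS 64

/* Hypothetical fdw_xact entry carrying an intrusive per-XID list link. */
typedef struct FdwXact
{
    uint32_t xid;
    int server_id;
    struct FdwXact *next;       /* chains entries within a hash bucket */
} FdwXact;

static FdwXact *xid_hash[HASH_BUCKETS]; /* shmem hash, sketched as static */

static unsigned bucket(uint32_t xid) { return xid % HASH_BUCKETS; }

/* O(1) insert: link a new entry at the head of its XID's bucket chain. */
static void
fdwxact_link(FdwXact *fx)
{
    unsigned b = bucket(fx->xid);
    fx->next = xid_hash[b];
    xid_hash[b] = fx;
}

/* Find the first entry for an XID without scanning the whole fdw_xact
 * array; callers following 'next' must still check 'xid', since a
 * bucket chain can hold entries for more than one XID. */
static FdwXact *
fdwxact_lookup(uint32_t xid)
{
    for (FdwXact *fx = xid_hash[bucket(xid)]; fx; fx = fx->next)
        if (fx->xid == xid)
            return fx;
    return NULL;
}
```

This gets the resolver to the first entry for its target XID in O(1)
expected time instead of O(max_connections * number of servers).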

>> - If we go to resolve a remote transaction and find that no such
>> remote transaction exists, what should we do?  I'm inclined to think
>> that we should regard that as if we had succeeded in resolving the
>> transaction.  Certainly, if we've retried the server repeatedly, it
>> might be that we previously succeeded in resolving the transaction but
>> then the network connection was broken before we got the success
>> message back from the remote server.  But even if that's not the
>> scenario, I think we should assume that the DBA or some other system
>> resolved it and therefore we don't need to do anything further.  If we
>> assume anything else, then we just go into an infinite error loop,
>> which isn't useful behavior.  We could log a message, though (for
>> example, LOG: unable to resolve foreign transaction ... because it
>> does not exist).
> Agreed.

Yes. I think the current patch takes care of this, except perhaps for
the error message.
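
The agreed policy reduces to a small decision function, sketched here
with hypothetical names (RemoteStatus, resolution_succeeded): "does not
exist" is treated as already resolved (by a DBA, or by an earlier
attempt whose success reply was lost), with only a LOG entry, so the
resolver never enters an infinite error loop over a vanished
transaction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical reply classes from COMMIT PREPARED on the foreign server. */
typedef enum { REMOTE_OK, REMOTE_NO_SUCH_XACT, REMOTE_CONN_ERROR } RemoteStatus;

/* A missing remote transaction counts as resolved; only genuine
 * failures (e.g. connection errors) are left for a later retry. */
static bool
resolution_succeeded(RemoteStatus status)
{
    switch (status)
    {
        case REMOTE_OK:
            return true;
        case REMOTE_NO_SUCH_XACT:
            fprintf(stderr,
                    "LOG: unable to resolve foreign transaction because it does not exist\n");
            return true;    /* assume someone else already resolved it */
        default:
            return false;   /* keep the entry and retry later */
    }
}
```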

Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
