On Thu, May 4, 2017 at 8:08 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Thu, May 4, 2017 at 7:13 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>>> - For bonus points, give pgfdw_exec_query() an optional timeout
>>> argument, and set it to 30 seconds or so when we're doing abort
>>> cleanup.  If the timeout succeeds, it errors out (which again
>>> re-enters the abort path, dropping the connection and forcing outer
>>> transaction levels to abort as well).
>> Why do we need this 30 seconds timeout for abort cleanup failures?
>> Isn't it sufficient if we propagate the error only on failures?
> Well, the problem is that right now we're waiting for vast amounts of
> time when the remote connection dies.  First, the user notices that
> the command is running for too long and hits ^C.  At that point, the
> connection is probably already dead, and the user's been waiting for a
> while already.  Then, we wait for PQcancel() to time out.  Then, we
> wait for PQexec() to time out.  The result of that, as Suraj mentioned
> in the original email, is that it takes about 16 minutes for the
> session to recover when the connection dies, even with keepalive
> settings and connection timeout set all the way to the minimum.  The
> goal here is to make it take a more reasonable time to recover.
> Without modifying libpq, we can't do anything about the need to wait
> for PQcancel() to time out, but we can improve the rest of it.  If we
> removed that 30-second timeout, we would win in the case where either
> ABORT TRANSACTION or ROLLBACK; RELEASE eventually succeeds but takes
> more than 30 seconds.  In that case, the change you are proposing
> would allow us to reuse the connection instead of dropping it, and
> would prevent a forced toplevel abort when subtransactions are in use.
> However, the cost would be that when one of those commands never
> succeeds, we would wait a lot longer before finishing a transaction
> abort.

As soon as the first command fails due to timeout, we will set
'abort_cleanup_failure' which will make a toplevel transaction to
abort and also won't allow other statements to execute.  The patch is
trying to enforce a 30-second timeout after statement execution, so it
has to anyway wait till the command execution completes (irrespective
of whether the command succeeds or fail).  It is quite possible I am
missing some point, is it possible for you to tell in somewhat more
detail in which exact case 30-second timeout will allow us to abort
the toplevel transaction if we already do that in the case of
statement failure case?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to