Re: [HACKERS] statement_timeout is not working as expected with postgres_fdw

Kyotaro HORIGUCHI Wed, 26 Apr 2017 01:32:10 -0700

At Tue, 25 Apr 2017 15:43:59 +0530, Ashutosh Bapat 
<ashutosh.ba...@enterprisedb.com> wrote in 
<cafjfpre6uf3-nu-orteaj0_ckb3mmipq5ahj_swdguv7sjs...@mail.gmail.com>
> On Tue, Apr 25, 2017 at 1:31 PM, Kyotaro HORIGUCHI
> <horiguchi.kyot...@lab.ntt.co.jp> wrote:
> >>
> >> The logs above show that 34 seconds elapsed between starting to abort
> >> the transaction and knowing that the foreign server is unreachable. It
> >> looks like it took that much time for the local server to realise that
> >> the foreign server is not reachable. Looking at PQcancel code, it
> >> seems to be trying to connect to the foreign server to cancel the
> >> query. But somehow it doesn't seem to honor connect_timeout setting.
> >> Is that expected?
> >
> > Yes, and No. I think PQcancel requires connection timeout, but I
> > think it is not necessariry the same with that of a foreign
> > server.
> 
> Since connect_timeout is property of foreign server, it should be
> honored by any connection made to that server from local server
> including the one by PQcancel().


The two connections have different characteristics. A query
(client) connection should be finally established. On the other
we may give up to establish a cancel connection in certain cases.
For example, when a network stall is anticipated, we can rather
choose to trash the session on it than waiting for the timeout.

> >> Irrespective of what PQcancel does, it looks like postgres_fdw should
> >> just slam the connection if query is being aborted because of
> >> statement_timeout. But then pgfdw_xact_callback() doesn't seem to have
> >> a way to know whether this ABORT is because of user's request to
> >> cancel the query, statement timeout, an abort because of some other
> >> error or a user requested abort. Except statement timeout (may be
> >> user's request to cancel the query?), it should try to keep the
> >> connection around to avoid any future reconnection. But I am not able
> >> to see how can we provide that information to pgfdw_xact_callback().
> >
> > Expiration of statement_timeout doesn't mean a stall of foreign
> > connections. If we are to keep connections by, for example, a
> > cancel request from client, we also should keep them on
> > statememt_timeout because it is not necessariry triggered by a
> > stall of foreign connection.
> 
> When statement_timeout completes, we don't want to spend more time in
> trying to cancel queries: esp when there are many foreign server, each
> consuming some "timeout" time OR even trying to send Abort transaction
> statement. Instead, we should slam those down. I consider this to be
> different from query cancellation since query cancellation doesn't
> have a hard bound on time, although we would like to cancel the
> running query as fast as possible. Rethinking about it, probably we
> should slam down the connection in case of query cancel as well.

Yeah, both are rather unusual, in other words, we don't expect it
to happen twice or more in a short interval. With that
prerequisite, I think slamming all down is optimal. But
defenitely not for the commanded rollback case.

> > I think we can detect a stall of the channel where the foreign
> > connections are on by a cancel request with a very short timeout,
> > although it is a bit incorrect.
> >
> > I reconsider this problem and my proposal for this issue is as
> > the follows.
> >
> > - Foreign servers have a new options 'stall_detection_threshold'
> >   in milliseconds, maybe defaults to connect_timeout of the
> >   foreign server setting. For many foreign servers in a local
> >   network, it could be lowered to several tens of milliseconds.
> 
> A connect_timeout less than 2 seconds is not encouraged.
> https://www.postgresql.org/docs/devel/static/libpq-connect.html. So,
> we can not set stall_detection_threshold to be smaller than 2 seconds.

It seems coming from DNS lookup timeout so users can set more
short timeout. Anyway the several-tens or hundres millisecond is
an extreme example. But 2 seconds for every connection would be
too long. Anyway the resolution is 1 second..

> statement_timeout however is set in milliseconds, so 2 seconds per
> connection would be quite a lot compared to statement_timeout setting.
> Waiting to cancel query for 2 seconds when the statement_timeout
> itself is 2 seconds would mean the query would be cancelled after 4
> seconds, which is kind of funny.

Yes, we can keep connections in the cases but in turn it can
complel a user to wait for incrediblly long time for a cancel
request. We could reduce the time to wait with some complex stuff
but I don't think we can cancel queries to hundreds of foreign
servers in a second without asynchronisity or parallelism. It
seems to me unreasonablly complex.

In short, I'd like to just discard all connections by the two
kind of triggers.

I suppose that we can inform the cause to the transaction
callback as an event, but I haven't realize how
AbortCurrentTransaction can detect the cause. Letting
ProcessInterrupts set an additional global variable such like
requested_immediate_abort could work.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] statement_timeout is not working as expected with postgres_fdw

Reply via email to