At Tue, 25 Apr 2017 15:43:59 +0530, Ashutosh Bapat <ashutosh.ba...@enterprisedb.com> wrote in <cafjfpre6uf3-nu-orteaj0_ckb3mmipq5ahj_swdguv7sjs...@mail.gmail.com> > On Tue, Apr 25, 2017 at 1:31 PM, Kyotaro HORIGUCHI > <horiguchi.kyot...@lab.ntt.co.jp> wrote: > >> > >> The logs above show that 34 seconds elapsed between starting to abort > >> the transaction and knowing that the foreign server is unreachable. It > >> looks like it took that much time for the local server to realise that > >> the foreign server is not reachable. Looking at PQcancel code, it > >> seems to be trying to connect to the foreign server to cancel the > >> query. But somehow it doesn't seem to honor connect_timeout setting. > >> Is that expected? > > > > Yes, and No. I think PQcancel requires connection timeout, but I > > think it is not necessariry the same with that of a foreign > > server. > > Since connect_timeout is property of foreign server, it should be > honored by any connection made to that server from local server > including the one by PQcancel().
The two connections have different characteristics. A query (client) connection should be finally established. On the other we may give up to establish a cancel connection in certain cases. For example, when a network stall is anticipated, we can rather choose to trash the session on it than waiting for the timeout. > >> Irrespective of what PQcancel does, it looks like postgres_fdw should > >> just slam the connection if query is being aborted because of > >> statement_timeout. But then pgfdw_xact_callback() doesn't seem to have > >> a way to know whether this ABORT is because of user's request to > >> cancel the query, statement timeout, an abort because of some other > >> error or a user requested abort. Except statement timeout (may be > >> user's request to cancel the query?), it should try to keep the > >> connection around to avoid any future reconnection. But I am not able > >> to see how can we provide that information to pgfdw_xact_callback(). > > > > Expiration of statement_timeout doesn't mean a stall of foreign > > connections. If we are to keep connections by, for example, a > > cancel request from client, we also should keep them on > > statememt_timeout because it is not necessariry triggered by a > > stall of foreign connection. > > When statement_timeout completes, we don't want to spend more time in > trying to cancel queries: esp when there are many foreign server, each > consuming some "timeout" time OR even trying to send Abort transaction > statement. Instead, we should slam those down. I consider this to be > different from query cancellation since query cancellation doesn't > have a hard bound on time, although we would like to cancel the > running query as fast as possible. Rethinking about it, probably we > should slam down the connection in case of query cancel as well. Yeah, both are rather unusual, in other words, we don't expect it to happen twice or more in a short interval. With that prerequisite, I think slamming all down is optimal. But defenitely not for the commanded rollback case. > > I think we can detect a stall of the channel where the foreign > > connections are on by a cancel request with a very short timeout, > > although it is a bit incorrect. > > > > I reconsider this problem and my proposal for this issue is as > > the follows. > > > > - Foreign servers have a new options 'stall_detection_threshold' > > in milliseconds, maybe defaults to connect_timeout of the > > foreign server setting. For many foreign servers in a local > > network, it could be lowered to several tens of milliseconds. > > A connect_timeout less than 2 seconds is not encouraged. > https://www.postgresql.org/docs/devel/static/libpq-connect.html. So, > we can not set stall_detection_threshold to be smaller than 2 seconds. It seems coming from DNS lookup timeout so users can set more short timeout. Anyway the several-tens or hundres millisecond is an extreme example. But 2 seconds for every connection would be too long. Anyway the resolution is 1 second.. > statement_timeout however is set in milliseconds, so 2 seconds per > connection would be quite a lot compared to statement_timeout setting. > Waiting to cancel query for 2 seconds when the statement_timeout > itself is 2 seconds would mean the query would be cancelled after 4 > seconds, which is kind of funny. Yes, we can keep connections in the cases but in turn it can complel a user to wait for incrediblly long time for a cancel request. We could reduce the time to wait with some complex stuff but I don't think we can cancel queries to hundreds of foreign servers in a second without asynchronisity or parallelism. It seems to me unreasonablly complex. In short, I'd like to just discard all connections by the two kind of triggers. I suppose that we can inform the cause to the transaction callback as an event, but I haven't realize how AbortCurrentTransaction can detect the cause. Letting ProcessInterrupts set an additional global variable such like requested_immediate_abort could work. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers