At Tue, 18 Apr 2017 09:12:07 +0530, Ashutosh Bapat <ashutosh.ba...@enterprisedb.com> wrote in <cafjfprcrjohuzacnon1zkp7c6fo2qd5hnfhct6t_39sgts_...@mail.gmail.com> > On Tue, Apr 18, 2017 at 8:12 AM, Kyotaro HORIGUCHI > <horiguchi.kyot...@lab.ntt.co.jp> wrote: > > At Mon, 17 Apr 2017 17:50:58 +0530, Ashutosh Bapat > > <ashutosh.ba...@enterprisedb.com> wrote in > > <cafjfprdcww4h0a-zrl-eiaekkpj8o0gr2m1fwz1usesrfrm...@mail.gmail.com> > >> On Mon, Apr 17, 2017 at 1:53 PM, Kyotaro HORIGUCHI > >> <horiguchi.kyot...@lab.ntt.co.jp> wrote: > >> > At Thu, 13 Apr 2017 13:04:12 -0400, Robert Haas <robertmh...@gmail.com> > >> > wrote in > >> > <CA+TgmoaxnNmuONgP=bxjojrgbnmpti6ms8oswzbc2yq2ueu...@mail.gmail.com> > >> >> On Thu, Apr 21, 2016 at 10:53 AM, Robert Haas <robertmh...@gmail.com> > >> >> wrote: > >> >> > On Thu, Apr 21, 2016 at 8:44 AM, Michael Paquier > >> >> > <michael.paqu...@gmail.com> wrote: > >> >> >> On Thu, Apr 21, 2016 at 5:22 PM, Etsuro Fujita > >> >> >> <fujita.ets...@lab.ntt.co.jp> wrote: > >> >> >>> Attached is an updated version of the patch, which modified > >> >> >>> Michael's > >> >> >>> version of the patch, as I proposed in [1] (see "Other changes:"). > >> >> >>> I > >> >> >>> modified comments for pgfdw_get_result/pgfdw_exec_query also, > >> >> >>> mainly because > >> >> >>> words like "non-blocking mode" there seems confusing (note that we > >> >> >>> have > >> >> >>> PQsetnonbloking). > >> >> >> > >> >> >> OK, so that is what I sent except that the comments mentioning PG_TRY > >> >> >> are moved to their correct places. That's fine for me. Thanks for > >> >> >> gathering everything in a single patch and correcting it. > >> >> > > >> >> > I have committed this patch. Thanks for working on this. Sorry for > >> >> > the delay. > >> >> > >> >> This 9.6-era patch, as it turns out, has a problem, which is that we > >> >> now respond to an interrupt by sending a cancel request and a > >> >> NON-interruptible ABORT TRANSACTION command to the remote side. If > >> >> the reason that the user is trying to interrupt is that the network > >> >> connection has been cut, they interrupt the original query only to get > >> >> stuck in a non-interruptible wait for ABORT TRANSACTION. That is > >> >> non-optimal. > >> > > >> > Agreed. > >> > > >> >> It is not exactly clear to me how to fix this. Could we get by with > >> >> just slamming the remote connection shut, instead of sending an > >> >> explicit ABORT TRANSACTION? The remote side ought to treat a > >> >> disconnect as equivalent to an ABORT anyway, I think, but maybe our > >> >> local state would get confused. (I have not checked.) > >> >> > >> >> Thoughts? > >> > > >> > Perhaps we will get stuck at query cancellation before ABORT > >> > TRANSACTION in the case. A connection will be shut down when > >> > anything wrong (PQstatus(conn) != CONNECTION_OK and so) on > >> > aborting local transactoin . So I don't think fdw gets confused > >> > or sholdn't be confused by shutting down there. > >> > > >> > The most significant issue I can see is that the same thing > >> > happens in the case of graceful ABORT TRANSACTION. It could be a > >> > performance issue. > >> > > >> > We could set timeout here but maybe we can just slamming the > >> > connection down instead of sending a query cancellation. It is > >> > caused only by timeout or interrupts so I suppose it is not a > >> > problem *with a few connections*. > >> > > >> > > >> > Things are a bit diffent with hundreds of connections. The > >> > penalty of reconnection would be very high in the case. > >> > > >> > If we are not willing to pay such high penalty, maybe we are to > >> > manage busy-idle time of each connection and trying graceful > >> > abort if it is short enough, maybe having a shoft timeout. > >> > > >> > Furthermore, if most or all of the hundreds of connections get > >> > stuck, such timeout will accumulate up like a mountain... > >> > >> Even when the transaction is aborted because a user cancels a query, > >> we do want to preserve the connection, if possible, to avoid > > > > Yes. > > > >> reconnection. If the request to cancel the query itself fails, we > >> should certainly drop the connection. Here's the patch to do that. > > > > A problem I think on this would be that we still try to make > > another connection for canceling and it would stall for several > > minutes per connection on a packet stall, which should be done in > > a second on ordinary circumstances. Perhaps we might want here is > > async canceling with timeout. > > I am not sure what do you mean "make another connection for > cancelling.". Do you mean to say that PQcancel would make another > connection?
Yes. It will take about 3 minutes on standard settings when no ACK returned. I thought that this discussion is based on such situation. > The patch proposed simply improves the condition when PQcancel has > returned a failure. Right now we ignore that failure and try to ABORT > the transaction, which is most probably going to get stalled or fail > to be dispatched. With the patch such a connection will be reset. Ah, I understand that. It is surely an improvement since it avoids useless ABORT TRANSACTION that is known to stall. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers