On Fri, May 5, 2017 at 1:01 AM, Amit Kapila <amit.kapil...@gmail.com> wrote: > I have tried to verify above point and it seems to me in such a > situation the transaction status will be either PQTRANS_INTRANS or > PQTRANS_INERROR. I have tried to mimic "out of memory" failure by > making PQmakeEmptyPGresult return NULL (malloc failure). AFAIU, it > will make the state as PQTRANS_IDLE only when the statement is > executed successfully.
Hrm. OK, you might be right, but still I don't see why having the 30 second timeout is bad. People don't want a transaction rollback to hang for a long period of time waiting for ABORT TRANSACTION to run if we could've just dropped the connection to get the same effect. Sure, the next transaction will then have to reconnect, but that takes a few tens of milliseconds; if you wait for one extra second to avoid that outcome, you come out behind even if you do manage to avoid the reconnect. Now for a subtransaction you could argue that it's worth being more patient, because forcing a toplevel abort sucks. But what are the chances that, after failing to respond to a ROLLBACK; RELEASE SAVEPOINT within 30 seconds, the remote side is going to wake up again and start behaving normally? I'd argue that it's not terribly likely and most people aren't going to want to wait for multiple minutes to see whether it does, but I suppose we could impose the 30 second timeout only at the toplevel and let subtransactions wait however long the operating system takes to time out. >>> Also, does it matter if pgfdw_subxact_callback fails >>> during execution of "ROLLBACK TO SAVEPOINT", because anyway user needs >>> to execute rollback to savepoint command before proceeding and that >>> should be good enough? >> >> I don't understand this. If pgfdw_subxact_callback doesn't roll the >> remote side back to the savepoint, how is it going to get done later? >> There's no code in postgres_fdw other than that code to issue the >> rollback. >> > It is done via toplevel transaction ABORT. I think how it works is > after "ROLLBACK TO SAVEPOINT" failure, any next execution is going to > fail. Let us say if we issue Commit after failure, it will try to > execute RELEASE SAVEPOINT which will fail and lead to toplevel > transaction ABORT. Now if the toplevel transaction ABORT also fails, > it will drop the connection. Mmmph. I guess that would work, but I think it's better to track it explicitly. I tried this (bar is a foreign table): begin; select * from bar; savepoint s1; select * from bar; savepoint s2; select * from bar; savepoint s3; select * from bar; commit; On the remote side, the commit statement produces this: 2017-05-05 09:28:52.597 EDT  LOG: statement: RELEASE SAVEPOINT s4 2017-05-05 09:28:52.597 EDT  LOG: statement: RELEASE SAVEPOINT s3 2017-05-05 09:28:52.597 EDT  LOG: statement: RELEASE SAVEPOINT s2 2017-05-05 09:28:52.597 EDT  LOG: statement: COMMIT TRANSACTION But that's obviously silly. Somebody could easily come along and optimize that by getting rid of the RELEASE SAVEPOINT commands which are completely unnecessary if we're going to commit the toplevel transaction anyway, and then the argument that you're making wouldn't be true any more. It seems smart to me to explicitly track whether the remote side is known to be in a bad state rather than relying on the future sequence of commands to be such that we'll get the right result anyway. > Agreed, in such a situation toplevel transaction abort is the only way > out. However, it seems to me that will anyway happen in current code > even if we don't try to force it via abort_cleanup_failure. I > understand that it might be better to force it as you have done in the > patch, but not sure if it is good to drop the connection where > previously it could have done without it (for ex. if the first > statement Rollback To Savepoint fails, but ABORT succeeds). I feel it > is worth considering whether such a behavior difference is okay as you > have proposed to backpatch it. Well, I think the whole point of this patch is that if one command on a connection fails, the chances of future commands succeeding are pretty poor. It's not impossible, but it's not very likely either. To recap, the problem at present is that if network connectivity is interrupted and the statement is then killed by statement_timeout or a local cancel request, we try to send a cancel request to the remote side, which takes a while to time out. Even if that fails, we then send an ABORT TRANSACTION, hoping against hope that it will succeed despite the failure of the cancel request. The user ends up waiting for the cancel request to time out and then for the ABORT TRANSACTION to time out, which takes 15-16 minutes no matter how you set the keepalive settings and no matter how many times you hit ^C on the local side. Now, if you want to allow for the possibility that some future operation might succeed never mind past failures, then you're basically saying that trying the ABORT TRANSACTION even though the query cancel has failed is the correct behavior. And if you don't want the ABORT TRANSACTION to have a 30 second timeout either, then you're saying we should wait however long it takes the operating system to detect the failure, which again is the current behavior. If we rip those things out of the patch, then there's not much left of the patch. The only thing that would left would be making the wait for ABORT TRANSACTION interruptible, so that if you hit ^C *again* after you've already canceled the query, we then give up on waiting for the ABORT TRANSACTION. I guess that would be an improvement over the status quo, but it's not really a solution. People don't expect to have to send multiple cancel requests to kill the same query in a timely fashion, and if the commands are being sent by an application rather than by a human being it probably won't happen. Do you want to propose a competing patch to fix this problem? I'm happy to listen to suggestions. But I think if you're determined to say "let's rely entirely on the OS timeouts rather than having any of our own" and you're also determined to say "let's not assume that a failure of one operation means that the connection is dead", then you're basically arguing that there's nothing really broken here, and I don't agree with that. Aborting a transaction shouldn't take a quarter of an hour. You could perhaps argue that it's better to make only one of the two changes I'm proposing and not both, but I don't really see it. If you make the abort_cleanup_failure changes but don't apply the 30 second timeout, then suppose the cancel request works but the connection is dead and the ABORT TRANSACTION command takes five minutes to time out. Well, if you'd had the timeout, you would have just closed the connection after 30 seconds and the only thing it would have cost you is a reconnect the next time that foreign server was used. Instead, you waited 5 minutes in the hope of saving a reconnect cycle that probably would have taken tens of milliseconds or less. Blech. On the other hand, if you make the 30 second timeout changes but not the abort_cleanup_failure changes, then suppose network connectivity is entirely interrupted. Then you'll wait however long it takes for the cancel to succeed plus another 30 seconds for the ABORT TRANSACTION, whereas with the patch as proposed you only wait for the first one. Also blech. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers