On Fri, May 5, 2017 at 1:01 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> I have tried to verify above point and it seems to me in such a
> situation the transaction status will be either PQTRANS_INTRANS or
> PQTRANS_INERROR.  I have tried to mimic "out of memory" failure by
> making PQmakeEmptyPGresult return NULL (malloc failure).  AFAIU, it
> will make the state as PQTRANS_IDLE only when the statement is
> executed successfully.

Hrm.  OK, you might be right, but still I don't see why having the 30
second timeout is bad.  People don't want a transaction rollback to
hang for a long period of time waiting for ABORT TRANSACTION to run if
we could've just dropped the connection to get the same effect.  Sure,
the next transaction will then have to reconnect, but that takes a few
tens of milliseconds; if you wait for one extra second to avoid that
outcome, you come out behind even if you do manage to avoid the
reconnect.  Now for a subtransaction you could argue that it's worth
being more patient, because forcing a toplevel abort sucks.  But what
are the chances that, after failing to respond to a ROLLBACK; RELEASE
SAVEPOINT within 30 seconds, the remote side is going to wake up again
and start behaving normally?  I'd argue that it's not terribly likely
and most people aren't going to want to wait for multiple minutes to
see whether it does, but I suppose we could impose the 30 second
timeout only at the toplevel and let subtransactions wait however long
the operating system takes to time out.

>>> Also, does it matter if pgfdw_subxact_callback fails
>>> during execution of "ROLLBACK TO SAVEPOINT", because anyway user needs
>>> to execute rollback to savepoint command before proceeding and that
>>> should be good enough?
>> I don't understand this.  If pgfdw_subxact_callback doesn't roll the
>> remote side back to the savepoint, how is it going to get done later?
>> There's no code in postgres_fdw other than that code to issue the
>> rollback.
> It is done via toplevel transaction ABORT.  I think how it works is
> after "ROLLBACK TO SAVEPOINT" failure, any next execution is going to
> fail.  Let us say if we issue Commit after failure, it will try to
> execute RELEASE SAVEPOINT which will fail and lead to toplevel
> transaction ABORT.  Now if the toplevel transaction ABORT also fails,
> it will drop the connection.

Mmmph.  I guess that would work, but I think it's better to track it
explicitly.  I tried this (bar is a foreign table):

select * from bar;
savepoint s1;
select * from bar;
savepoint s2;
select * from bar;
savepoint s3;
select * from bar;

On the remote side, the commit statement produces this:

2017-05-05 09:28:52.597 EDT [80263] LOG:  statement: RELEASE SAVEPOINT s4
2017-05-05 09:28:52.597 EDT [80263] LOG:  statement: RELEASE SAVEPOINT s3
2017-05-05 09:28:52.597 EDT [80263] LOG:  statement: RELEASE SAVEPOINT s2
2017-05-05 09:28:52.597 EDT [80263] LOG:  statement: COMMIT TRANSACTION

But that's obviously silly.  Somebody could easily come along and
optimize that by getting rid of the RELEASE SAVEPOINT commands which
are completely unnecessary if we're going to commit the toplevel
transaction anyway, and then the argument that you're making wouldn't
be true any more.  It seems smart to me to explicitly track whether
the remote side is known to be in a bad state rather than relying on
the future sequence of commands to be such that we'll get the right
result anyway.

> Agreed, in such a situation toplevel transaction abort is the only way
> out.  However, it seems to me that will anyway happen in current code
> even if we don't try to force it via abort_cleanup_failure.  I
> understand that it might be better to force it as you have done in the
> patch, but not sure if it is good to drop the connection where
> previously it could have done without it (for ex. if the first
> statement Rollback To Savepoint fails, but ABORT succeeds).  I feel it
> is worth considering whether such a behavior difference is okay as you
> have proposed to backpatch it.

Well, I think the whole point of this patch is that if one command on
a connection fails, the chances of future commands succeeding are
pretty poor.  It's not impossible, but it's not very likely either.
To recap, the problem at present is that if network connectivity is
interrupted and the statement is then killed by statement_timeout or a
local cancel request, we try to send a cancel request to the remote
side, which takes a while to time out.  Even if that fails, we then
send an ABORT TRANSACTION, hoping against hope that it will succeed
despite the failure of the cancel request.  The user ends up waiting
for the cancel request to time out and then for the ABORT TRANSACTION
to time out, which takes 15-16 minutes no matter how you set the
keepalive settings and no matter how many times you hit ^C on the
local side.

Now, if you want to allow for the possibility that some future
operation might succeed never mind past failures, then you're
basically saying that trying the ABORT TRANSACTION even though the
query cancel has failed is the correct behavior.  And if you don't
want the ABORT TRANSACTION to have a 30 second timeout either, then
you're saying we should wait however long it takes the operating
system to detect the failure, which again is the current behavior.  If
we rip those things out of the patch, then there's not much left of
the patch.  The only thing that would left would be making the wait
for ABORT TRANSACTION interruptible, so that if you hit ^C *again*
after you've already canceled the query, we then give up on waiting
for the ABORT TRANSACTION.  I guess that would be an improvement over
the status quo, but it's not really a solution.  People don't expect
to have to send multiple cancel requests to kill the same query in a
timely fashion, and if the commands are being sent by an application
rather than by a human being it probably won't happen.

Do you want to propose a competing patch to fix this problem?  I'm
happy to listen to suggestions.  But I think if you're determined to
say "let's rely entirely on the OS timeouts rather than having any of
our own" and you're also determined to say "let's not assume that a
failure of one operation means that the connection is dead", then
you're basically arguing that there's nothing really broken here, and
I don't agree with that.  Aborting a transaction shouldn't take a
quarter of an hour.

You could perhaps argue that it's better to make only one of the two
changes I'm proposing and not both, but I don't really see it.  If you
make the abort_cleanup_failure changes but don't apply the 30 second
timeout, then suppose the cancel request works but the connection is
dead and the ABORT TRANSACTION command takes five minutes to time out.
Well, if you'd had the timeout, you would have just closed the
connection after 30 seconds and the only thing it would have cost you
is a reconnect the next time that foreign server was used.  Instead,
you waited 5 minutes in the hope of saving a reconnect cycle that
probably would have taken tens of milliseconds or less.  Blech.  On
the other hand, if you make the 30 second timeout changes but not the
abort_cleanup_failure changes, then suppose network connectivity is
entirely interrupted.  Then you'll wait however long it takes for the
cancel to succeed plus another 30 seconds for the ABORT TRANSACTION,
whereas with the patch as proposed you only wait for the first one.
Also blech.

Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to