On Fri, May 5, 2017 at 11:15 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> I am not saying to rip those changes.  Your most of the mail stresses
> about the 30-second timeout which you have added in the patch to
> detect timeouts during failures (rollback processing).  I have no
> objection to that part of the patch except that still there is a race
> condition around PQsendQuery and you indicate that it is better to
> deal it as a separate patch to which I agree.


> The point of which I am
> not in total agreement is about the failure other than the time out.
> +pgfdw_exec_cleanup_query(PGconn *conn, const char *query)
> {
> ..
> + /* Get the result of the query. */
> + if (pgfdw_get_cleanup_result(conn, endtime, &result))
> + return false;
> +
> + /* Issue a warning if not successful. */
> + if (PQresultStatus(result) != PGRES_COMMAND_OK)
> + {
> + pgfdw_report_error(WARNING, result, conn, true, query);
> + return false;
> + }
> ..
> }
> In the above code, if there is a failure due to timeout then it will
> return from first statement (pgfdw_get_cleanup_result), however, if
> there is any other failure, then it will issue a warning and return
> false.  I am not sure if there is a need to return false in the second
> case, because even without that it will work fine (and there is a
> chance that it won't drop the connection), but maybe your point is
> better to be consistent for all kind of error handling in this path.

There are three possible queries that can be executed by
pgfdw_exec_cleanup_query; let's consider them individually.

have no alternative but to close the connection.  If we don't, then
the next local connection that tries to use this connection will
probably also fail, because it will find the connection inside an
aborted transaction rather than not in a transaction.  If we do close
the connection, the next transaction will try to reconnect and
everything will be fine.

2. DEALLOCATE ALL.  If DEALLOCATE ALL fails, we do not have to close
the connection, but the reason why we're running that statement in the
first place is because we've potentially lost track of which
statements are prepared on the remote side.  If we don't drop the
connection, we'll still be confused about that.  Reconnecting will fix

local toplevel transaction is doomed.  It would be wrong to try to
commit on the remote side, because there could be remote changes made
by the aborted subtransaction which must not survive, and the ROLLBACK
TO SAVEPOINT %d command is essential in order for us to get rid of
them, and that has already failed.  So we must eventually abort the
remote transaction, which we can do either by closing the connection
or by trying to execute ABORT TRANSACTION.  The former, however, is
quick, and the latter is very possibly going to involve a long hang,
so closing the connection seems better.

In all of these cases, the failure of one of these statements seems to
me to *probably* mean that the remote side is just dead, in which case
dropping the connection is the only option.  But even if the remote
side is still alive and the statement failed for some other reason --
say somebody changed kwlist.h so that ABORT now has to be spelled
ABURT -- returning true rather than false just causes us to end up
with a remote connection whose state is out of step with the local
server state, and such a connection is not going to be usable.  I
think part of your point is that in case 3, we might still get back to
a usable connection if ABORT TRANSACTION is successfully executed
later, but that's not very likely and, even if it would have happened,
reconnecting instead doesn't really leave us any worse off.

> No, I am not saying any of those.  I hope the previous point clarifies
> what I want to say.

Apologies, but I still cannot see quite what you are getting at.

Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to