Re: [HACKERS] statement_timeout is not working as expected with postgres_fdw

Amit Kapila Fri, 05 May 2017 16:12:19 -0700

On Fri, May 5, 2017 at 9:32 PM, Robert Haas <[email protected]> wrote:
> On Fri, May 5, 2017 at 11:15 AM, Amit Kapila <[email protected]> wrote:
>> I am not saying to rip those changes.  Your most of the mail stresses
>> about the 30-second timeout which you have added in the patch to
>> detect timeouts during failures (rollback processing).  I have no
>> objection to that part of the patch except that still there is a race
>> condition around PQsendQuery and you indicate that it is better to
>> deal it as a separate patch to which I agree.
>
> OK.
>
>> The point of which I am
>> not in total agreement is about the failure other than the time out.
>>
>> +pgfdw_exec_cleanup_query(PGconn *conn, const char *query)
>> {
>> ..
>> + /* Get the result of the query. */
>> + if (pgfdw_get_cleanup_result(conn, endtime, &result))
>> + return false;
>> +
>> + /* Issue a warning if not successful. */
>> + if (PQresultStatus(result) != PGRES_COMMAND_OK)
>> + {
>> + pgfdw_report_error(WARNING, result, conn, true, query);
>> + return false;
>> + }
>> ..
>> }
>>
>> In the above code, if there is a failure due to timeout then it will
>> return from first statement (pgfdw_get_cleanup_result), however, if
>> there is any other failure, then it will issue a warning and return
>> false.  I am not sure if there is a need to return false in the second
>> case, because even without that it will work fine (and there is a
>> chance that it won't drop the connection), but maybe your point is
>> better to be consistent for all kind of error handling in this path.
>
> There are three possible queries that can be executed by
> pgfdw_exec_cleanup_query; let's consider them individually.
>
> 1. ABORT TRANSACTION.  If ABORT TRANSACTION fails, I think that we
> have no alternative but to close the connection.  If we don't, then
> the next local connection that tries to use this connection will
> probably also fail, because it will find the connection inside an
> aborted transaction rather than not in a transaction.  If we do close
> the connection, the next transaction will try to reconnect and
> everything will be fine.
>
> 2. DEALLOCATE ALL.  If DEALLOCATE ALL fails, we do not have to close
> the connection, but the reason why we're running that statement in the
> first place is because we've potentially lost track of which
> statements are prepared on the remote side.  If we don't drop the
> connection, we'll still be confused about that.  Reconnecting will fix
> it.
>


There is a comment in the code which explains why currently we ignore
errors from DEALLOCATE ALL.

* DEALLOCATE ALL only exists in 8.3 and later, so this
* constrains how old a server postgres_fdw can
* communicate with.  We intentionally ignore errors in
* the DEALLOCATE, so that we can hobble along to some
* extent with older servers (leaking prepared statements
* as we go; but we don't really support update operations
* pre-8.3 anyway).

It is not entirely clear to me whether it is only for Commit case or
in general.  From the code, it appears that the comment holds true for
both commit and abort.  If it is for both, then after this patch, the
above comment will not be valid and you might want to update it in
case we want to proceed with your proposed changes for DEALLOCATE ALL
statement.

> 3. ROLLBACK TO SAVEPOINT %d; RELEASE SAVEPOINT %d.  If this fails, the
> local toplevel transaction is doomed.  It would be wrong to try to
> commit on the remote side, because there could be remote changes made
> by the aborted subtransaction which must not survive, and the ROLLBACK
> TO SAVEPOINT %d command is essential in order for us to get rid of
> them, and that has already failed.  So we must eventually abort the
> remote transaction, which we can do either by closing the connection
> or by trying to execute ABORT TRANSACTION.  The former, however, is
> quick, and the latter is very possibly going to involve a long hang,
> so closing the connection seems better.
>
> In all of these cases, the failure of one of these statements seems to
> me to *probably* mean that the remote side is just dead, in which case
> dropping the connection is the only option.  But even if the remote
> side is still alive and the statement failed for some other reason --
> say somebody changed kwlist.h so that ABORT now has to be spelled
> ABURT -- returning true rather than false just causes us to end up
> with a remote connection whose state is out of step with the local
> server state, and such a connection is not going to be usable.  I
> think part of your point is that in case 3, we might still get back to
> a usable connection if ABORT TRANSACTION is successfully executed
> later,
>

Yes.

> but that's not very likely and, even if it would have happened,
> reconnecting instead doesn't really leave us any worse off.
>

I see your point.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] statement_timeout is not working as expected with postgres_fdw

Reply via email to