Michael Paquier <mich...@paquier.xyz> writes: > On Fri, Aug 30, 2024 at 08:47:46PM +0000, Tom Lane wrote: >> Make postgres_fdw's query_cancel test less flaky.
> Based on this recent report from the buildfarm, there is still > something happening here: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2025-03-07%2004%3A31%3A37 > In details: > -- This would take very long if not canceled: > SELECT count(*) FROM ft1 a CROSS JOIN ft1 b CROSS JOIN ft1 c CROSS JOIN ft1 > d; > ERROR: canceling statement due to statement timeout > +WARNING: could not send cancel request: cancel request timed out > COMMIT; I only promised to make it less flaky, not zero flaky ;-). But this is an interesting failure, because it's different from the previous symptoms. Before we were getting >> +WARNING: could not get result of cancel request due to timeout but here it's failing to send the cancel request in the first place. Grepping the buildfarm logs finds one prior occurrence of this in the last three months, also on drongo: sysname | branch | snapshot | stage | l ---------+--------+---------------------+-----------------------------------------------------+---------------------------------------------------------------------- drongo | HEAD | 2025-02-01 13:35:14 | postgres_fdwInstallCheck-C | +WARNING: could not send cancel request: cancel request timed out\r drongo | HEAD | 2025-03-07 04:31:37 | postgres_fdwInstallCheck-English_United-States.1252 | +WARNING: could not send cancel request: cancel request timed out\r (2 rows) Looking at the code, we must have gotten past PQcancelStart, and then we're waiting vainly for PQcancelPoll to report success. The timeout for that is 30 seconds, so it's hard to credit that we didn't wait long enough. Something must have gone wrong that we couldn't detect. It would be interesting to know what conn->status was, so that we'd have some clue how far the cancel request got; though I have relatively little hope that that would actually yield a fix. Do you think it'd be reasonable to add PQcancelStatus() to the "timed out" message? It looks like we don't have any infrastructure for printing ConnStatusType, so we'd have to just print it as an integer; but since this ought to be a can't-happen case I think that'd be all right. regards, tom lane