At Tue, 1 Feb 2022 23:51:54 +0900, Fujii Masao <masao.fu...@oss.nttdata.com> wrote in > This logic sounds complicated to me. I'm afraid that FDW developers > may a bit easily misunderstand the logic and make the bug in their > FDW.
> Isn't it simpler to just disable the timeout in core whenever the > transaction ends whether committed or aborted, like statement_timeout > is disabled after each command? For example, call something like > DisableForeignCheckTimeout() in CommitTransaction() etc. > This approach seems to assume that FDW must manage all the > ForeignServer information so that the callback can return it. Is this > assumption valid for all the FDW? FWIW, I'm not sure this feature necessarily requires core support dedicated to FDWs. The core have USER_TIMEOUT feature already and FDWs are not necessarily connection based. It seems better if FDWs can implement health check feature without core support and it seems possible. Or at least the core feature should be more generic and simpler. Why don't we just expose InTransactionHealthCheckCallbacks or something and operating functions on it? > How about making FDW trigger a query cancel interrupt by signaling > SIGINT to the backend, instead? Mmm. AFAICS the running command will stop with "canceling statement due to user request", which is a hoax. We need a more decent message there. I understand that the motive of this patch is "to avoid wasted long local work when fdw-connection dies". In regard to the workload in your first mail, it is easily avoided by ending the transaction as soon as remote access ends. This feature doesn't work for the case "begin; <long local query>; <fdw access>". But the same measure also works in that case. So the only case where this feature is useful is "begin; <fdw-access>; <some long work>; <fdw-access>; end;". But in the first place how frequently do you expecting remote-connection close happens? If that happens so frequently, you might need to recheck the system health before implementing this feature. Since it is correctly detected when something really went wrong, I feel that it is a bit too complex for the usefulness especially for the core part. In conclusion, as my humble opinion I would like to propose to reduce this feature to: - Just periodically check health (in any aspect) of all live connections regardless of the session state. - If an existing connection is found to be dead, just try canceling the query (or sending query cancel). One issue with it is how to show the decent message for the query cancel, but maybe we can have a global variable that suggests the reason for the cancel. regards. -- Kyotaro Horiguchi NTT Open Source Software Center