Michael Paquier <mich...@paquier.xyz> writes: > Based on the way the code is written on HEAD, this would be the > correct assumption. Now, calling PQgetCancel() would return NULL for > a connection that we actually ignore in the code a couple of lines > above when it has PGINVALID_SOCKET. So it seems to me that the > suggestion of using "cancelconn", which would be the first valid > connection, rather than always the first connection, which may be > using an invalid socket, is correct, so as we always have our hands > on a way to cancel a command.
I came across this commit (52144b6fc) while writing release notes, and I have to seriously question whether it's right yet. The thing that needs to be asked is, if we get a SIGINT in a program using this logic, why would we propagate a cancel to just one of the controlled sessions and not all of them? It looks to me like the original concept was that slot zero would be a "master" connection, such that canceling just that one would have a useful effect. Maybe the current users of parallel_slot.c still use it like that, but I bet it's more likely that the connections are all doing independent things and you really gotta cancel them all if you want out. I suppose maybe this commit improved matters: if you are running N jobs then typing control-C N times (not too quickly) might eventually get you out, by successively canceling the lowest-numbered surviving connection. Previously you could have pounded the key all day and not gotten rid of any but the zero'th task. OTOH, if the connections don't exit but just go back to idle, which seems pretty possible, then it's not clear we've moved the needle at all. Anyway I think this needs rewritten, not just tweaked. The cancel.c infrastructure is really next to useless here since it is only designed with one connection in mind. I'm inclined to think we should only expect the signal handler to set CancelRequested, and then manually issue cancels to all live connections when we see that become set. I'm not proposing reverting 52144b6fc, because I doubt it made anything worse; but I'm thinking of leaving it out of the release notes, because I'm unsure it had any user-visible effect at all. It doesn't look to me like we'd ever get to wait_on_slots unless all the connections are known busy. regards, tom lane