We're using SIGQUIT to signal an immediate shutdown request. Upon receiving SIGQUIT, postmaster in turn kills all the child processes with SIGQUIT and exits.

This is a problem when child processes use system(3) to call other programs. We use system(3) in two places: to execute archive_command and restore_command. Fujii Masao identified this with pg_standby back in November:

http://archives.postgresql.org/message-id/3f0b79eb0811280156s78a3730en73aca49b6e95d...@mail.gmail.com
and recently discussed here
http://archives.postgresql.org/message-id/3f0b79eb0902260919l2675aaafq10e5b2d49ebfa...@mail.gmail.com

I'm starting a new thread to bring this to the attention of those who haven't been following the hot standby stuff. pg_standby has a particular problem because it traps SIGQUIT to mean "end recovery, promote standby to master", which it shouldn't do IMHO. But ignoring that for a moment, the problem is generic.

SIGQUIT by default dumps core. That's not what we want to happen on immediate shutdown. All PostgreSQL processes trap SIGQUIT to exit immediately instead, but external commands will dump core. system(3) ignores SIGQUIT, so we can't trap it in the parent process; it is always relayed to the child.

There are a few options on how to fix that:

1. Implement a custom version of system(3) using fork+exec that lets us trap SIGQUIT and send e.g. SIGTERM or SIGINT to the child instead. It might be a bit tricky to get this right in a portable way; Windows would certainly need a completely separate implementation.

2. Use a signal other than SIGQUIT for immediate shutdown of child processes. We can't change the signal sent to postmaster for backwards-compatibility reasons, but the signal sent by postmaster to child processes we could change. We've already used all signals in normal backends, but perhaps we could rearrange them.

3. Use SIGINT instead of SIGQUIT for immediate shutdown of the two child processes that use system(3): the archiver process and the startup process. Neither of them uses SIGINT currently. SIGINT is ignored by system(3), like SIGQUIT, but the default action is to terminate the process rather than core dump. Unfortunately pg_standby traps SIGINT too to mean "promote to master", but we could change it to use SIGUSR1 instead for that purpose. If someone has a script that uses "killall -INT pg_standby" to promote a standby server to master, it would need to be changed. Looking at the manual page of pg_standby, however, it seems that the kill-method of triggering a promotion isn't documented, so with a notice in release notes we could do that.

I'm leaning towards option 3, but I wonder if anyone sees a better solution.

This is all for CVS HEAD. In back-branches, I think we should just remove the signal handler for SIGQUIT from pg_standby and leave it at that. If you perform an immediate shutdown, you can get a core dump from archive_command or restore_command, but that's a minor inconvenience.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
