I have observed the following situation a few times now (weeks or months
apart), most recently with 8.3.7.  Some postgres child process crashes.
The postmaster notices and sends SIGQUIT to all other children.  Once
all other children have exited, it enters recovery.  But for some
reason, some children do not act on the SIGQUIT and simply hang.
That means the whole database system is stuck as well and won't
continue without manual intervention.  If I go in
manually and SIGKILL the offending processes, everything proceeds
normally, recovery finishes, and the system is up again.

I haven't had the chance yet to analyze why those processes fail to
act on SIGQUIT.  Be that as it may, it appears there are no provisions
for this case.  I couldn't find any documentation or previous reports on
this sort of thing.  One might imagine a feature where the postmaster
resorts to throwing SIGKILLs around after a while, similar to how init
scripts are sometimes set up.  But perhaps manual intervention is the
way to go.
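
To make the escalation idea concrete, here is a rough sketch of what I
have in mind.  It is written in Python purely for illustration and is
not how the postmaster works; the function name quit_then_kill and the
grace_seconds and poll_interval parameters are made up for this
example.  It sends SIGQUIT to every child, waits for a grace period,
and only then sends SIGKILL to whatever is still alive.

```python
import signal
import subprocess
import time


def quit_then_kill(children, grace_seconds=5.0, poll_interval=0.05):
    """Send SIGQUIT to each child, then SIGKILL any that outlive the grace period.

    `children` is a list of subprocess.Popen objects (a stand-in for the
    postmaster's child list). Returns a dict mapping each signalled pid to
    the name of the last signal sent to it.
    """
    last_signal = {}

    # Step 1: ask every running child to quit.
    for child in children:
        if child.poll() is None:
            child.send_signal(signal.SIGQUIT)
            last_signal[child.pid] = "SIGQUIT"

    # Step 2: give the children a bounded amount of time to exit.
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        if all(child.poll() is not None for child in children):
            return last_signal
        time.sleep(poll_interval)

    # Step 3: escalate. SIGKILL cannot be caught, blocked, or ignored.
    for child in children:
        if child.poll() is None:
            child.kill()
            last_signal[child.pid] = "SIGKILL"

    # Reap everything so no zombies are left behind.
    for child in children:
        child.wait()
    return last_signal
```

The open question with anything like this is the length of the grace
period: too short and a child that was about to exit cleanly gets
killed anyway, too long and it is no better than waiting for an
administrator.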

Comments?


-- 
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
