I am currently looking at a frozen system: a backend crashed during XLOG write (which I was deliberately provoking by running it out of disk space), and now the postmaster is unable to recover because it is waiting for a checkpoint process that it had launched milliseconds before the crash. The checkpoint process, unfortunately, is not going to quit anytime soon, because it is hung up trying to get a spinlock that the crashing backend left locked.

Eventually the checkpoint process will time out on the spinlock and abort. Please note that this is true only because I insisted: Vadim wanted to have infinite timeouts on the WAL spinlocks. I think this is good evidence that infinite timeouts are a bad idea.

However, while sitting here looking at it, I can't help wondering whether the checkpoint process should have responded to the SIGTERM that the postmaster sent it when the other backend crashed. Is it really such a good idea for the checkpoint process to ignore SIGTERM?

While we're at it: is it really such a good idea to use elog(STOP) all over the place in the WAL stuff? If XLogFileInit had chosen to exit with elog(FATAL), then we would have released the spinlock on the way out of the failing backend, and the checkpointer wouldn't be stuck.

regards, tom lane
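
To make the spinlock-timeout point concrete, here is a minimal sketch in C of a lock acquire that gives up after a bounded number of attempts instead of spinning forever. It is not the actual PostgreSQL spinlock code; the names demo_spinlock, demo_lock_timed and demo_unlock are hypothetical, and a real implementation would sleep or back off between attempts rather than retry immediately.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a spinlock; not PostgreSQL's slock_t. */
typedef atomic_flag demo_spinlock;

/*
 * Try to acquire the lock, giving up after max_tries attempts.
 * Returns true if the lock was taken, false on timeout.  With an
 * infinite timeout, a lock left held by a crashed process would
 * keep the caller here forever.
 */
static bool
demo_lock_timed(demo_spinlock *lock, int max_tries)
{
    for (int i = 0; i < max_tries; i++)
    {
        /* test_and_set returns the previous value: false means we got it */
        if (!atomic_flag_test_and_set(lock))
            return true;
        /* assumption: a real implementation would delay here before retrying */
    }
    return false;               /* caller must report the error and abort */
}

/* Release the lock; an orderly error exit should do this before quitting. */
static void
demo_unlock(demo_spinlock *lock)
{
    atomic_flag_clear(lock);
}
```

If one caller takes the lock and never calls demo_unlock (as the crashed backend did), a second demo_lock_timed call returns false once its attempts run out, so the waiter can abort instead of hanging. Had the failing path released the lock on its way out, the waiter would simply have acquired it.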