On 28.02.2011 14:04, Nikhil Sontakke wrote:
> I believe we have a case where not holding off interrupts while doing a malloc() can cause a deadlock due to system or libc level locking. In this case, a pg_ctl stop in fast mode was used, and that caused a backend to handle the interrupt while it was inside the malloc() call. Then, as part of abort processing, in the subtransaction cleanup code path, this same backend tried to clear memory contexts, leading to an eventual free() call. The free() call tried to take the same lock that was already held by the interrupted malloc(), resulting in a deadlock!
Our signal handlers shouldn't try to do anything that complicated. die(), which handles SIGTERM caused by fast shutdown in backends, doesn't do abort processing itself. It just sets a global variable.
That is, unless ImmediateInterruptOK is set, but it's only set around a few blocking system calls where it is safe to do so. (Checks...) Actually, md5_crypt_verify() looks suspicious: it sets "ImmediateInterruptOK = true", and then calls palloc() and pfree().
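
To illustrate the flag-only handler pattern described above, here is a minimal sketch in plain C. It is not PostgreSQL's actual code: shutdown_requested, handle_sigterm(), check_for_interrupts(), install_handler() and cleanups_run are hypothetical names made up for this example. The handler only records the request; the cleanup, which may call malloc() and free(), runs later from ordinary code at a point where no allocator lock can be held.

```c
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names for illustration; not PostgreSQL's real variables. */
static volatile sig_atomic_t shutdown_requested = 0;
static int cleanups_run = 0;

/* Async-signal-safe handler: it only sets a flag and returns. */
static void handle_sigterm(int signo)
{
    (void) signo;
    shutdown_requested = 1;
}

/*
 * Called from normal (non-signal) code at well-defined safe points.
 * Returns 1 if a pending shutdown request was processed, 0 otherwise.
 */
static int check_for_interrupts(void)
{
    char *scratch;

    if (!shutdown_requested)
        return 0;
    shutdown_requested = 0;

    /* Safe here: we cannot be in the middle of malloc() or free(). */
    scratch = malloc(64);
    if (scratch != NULL)
    {
        memset(scratch, 0, 64);
        free(scratch);
    }
    cleanups_run++;
    return 1;
}

/* Installs the handler; returns 0 on success, -1 on failure. */
static int install_handler(void)
{
    return signal(SIGTERM, handle_sigterm) == SIG_ERR ? -1 : 0;
}
```

A caller would run install_handler() once at startup and then call check_for_interrupts() between units of work. Doing the cleanup directly inside the handler instead would risk exactly the reported hang, because the signal can arrive while the interrupted code still holds the allocator's internal lock.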
> Will try to get the call stack if needed.
Yes, please.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)