Tom Lane wrote:
(2) If a SIGTERM happens to arrive while btbulkdelete is running, the next CHECK_FOR_INTERRUPTS will do elog(FATAL), causing elog.c to do proc_exit(0), leaving the vacuum still recorded as active in the shared-memory array maintained by _bt_start_vacuum/_bt_end_vacuum. The PG_TRY block in btbulkdelete doesn't get a chance to clean up.
I skimmed through all users of PG_TRY/CATCH in the backend to check whether there are other problems like this looming. There's one that looks dangerous in pg_start_backup() in xlog.c: the forcePageWrites flag in shared memory is cleared in a PG_CATCH block. It's not as severe, though, since the flag can also be cleared manually by calling pg_stop_backup(), and a leftover flag only leads to degraded performance.
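In code terms, the vulnerable shape is roughly this (a condensed sketch, not the actual btbulkdelete code; the loop body is elided):

    /*
     * Unsafe pattern: shared-memory state whose only error-path cleanup
     * lives in PG_CATCH.  On elog(ERROR) the catch block runs and the
     * state is cleared, but on elog(FATAL) elog.c calls proc_exit()
     * directly, PG_CATCH is never reached, and the entry is left behind
     * in shared memory.
     */
    cycleid = _bt_start_vacuum(rel);    /* record us in the shared array */
    PG_TRY();
    {
        CHECK_FOR_INTERRUPTS();         /* pending SIGTERM => elog(FATAL),
                                         * which goes straight to proc_exit() */
        /* ... scan the index, deleting tuples and recycling pages ... */
    }
    PG_CATCH();
    {
        _bt_end_vacuum(rel);            /* runs on ERROR, never on FATAL */
        PG_RE_THROW();
    }
    PG_END_TRY();
    _bt_end_vacuum(rel);                /* normal-path cleanup */

The pg_start_backup() case is the same shape, with forcePageWrites in place of the vacuum array entry.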
(3) Eventually, either we try to re-vacuum the same index, or the accumulation of bogus active entries overflows the array. Either way, _bt_start_vacuum throws an error, which btbulkdelete PG_CATCHes, leading to _bt_end_vacuum trying to re-acquire the LWLock already taken by _bt_start_vacuum, meaning that the process hangs. And then so does anything else that needs to take that LWLock...
I also looked for other occurrences of point (3), but couldn't find any, so I guess we're now safe from it.
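To spell out the hang mechanism (a simplified schematic; the real functions live in nbtutils.c, the condition name is a placeholder, and the bodies are abbreviated):

    /* _bt_start_vacuum errors out while still holding the lock: */
    LWLockAcquire(BtreeVacuumLock, LW_EXCLUSIVE);
    if (found_stale_entry)              /* bogus leftover entries */
        elog(ERROR, "multiple active vacuums for index \"%s\"",
             RelationGetRelationName(rel));
    /* ERROR is thrown with BtreeVacuumLock still held... */

    /* ...so the caller's PG_CATCH turns into a self-deadlock: */
    PG_CATCH();
    {
        _bt_end_vacuum(rel);    /* tries LWLockAcquire(BtreeVacuumLock, ...)
                                 * and blocks forever on our own lock */
        PG_RE_THROW();
    }

LWLockReleaseAll() would normally mop up the held lock, but it runs during transaction abort, after the PG_CATCH block has already hung.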
Point (3) is already fixed in CVS, but point (2) is a lot nastier. What it essentially says is that trying to clean up shared-memory state in a PG_CATCH block is unsafe: you can't be certain you'll get to do it. Now this is not a big deal during normal SIGTERM or SIGQUIT database shutdown, because we're going to abandon the shared memory segment anyway. However, if we ever want to support individual session kill via SIGTERM, it's a problem. Even if we were not interested in someday considering that a supported feature, it seems that dealing with random SIGTERMs is needed for robustness in at least some environments.
Agreed. We should do our best to be safe from SIGTERMs, even if we don't consider it supported.
AFAICS, there are basically two ways we might try to approach this:

Plan A: establish the rule that you mustn't try to clean up shared memory state in a PG_CATCH block. Anything you need to do like that has to be handled by an on_shmem_exit hook function, so it will be called during a FATAL exit. (Or maybe you can do it in PG_CATCH for normal ERROR cases, but you need a backing on_shmem_exit hook to clean up for FATAL.)

Plan B: change the handling of FATAL errors so that they are thrown like normal errors, and the proc_exit call happens only when we get out to the outermost control level in postgres.c. This would mean that PG_CATCH blocks get a chance to clean up before the FATAL exit happens. The problem with that is that a non-cooperative PG_CATCH block might think it could "recover" from the error, and then the exit does not happen at all. We'd need a coding rule that PG_CATCH blocks *must* re-throw FATAL errors, which seems at least as ugly as Plan A. In particular, all three of the external-interpreter PLs are willing to return errors into the external interpreter, and AFAICS we'd be entirely at the mercy of the user-written Perl or Python or Tcl code whether it re-throws the error or not. So Plan B seems unacceptably fragile.

Does anyone see a way to fix it, or perhaps a Plan C with a totally different idea? Plan A seems pretty ugly but it's the best I can come up with.
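For concreteness, the Plan A pattern would look something like this minimal sketch (the callback, the MyVacuumSlot pointer, and the inuse flag are invented for illustration; only on_shmem_exit, the lock calls, and the callback signature are real backend interfaces):

    /*
     * Hypothetical cleanup hook: on_shmem_exit callbacks are run by
     * proc_exit(), so this executes even on a FATAL exit that bypassed
     * every PG_CATCH block.
     */
    static void
    btvacuum_shmem_cleanup(int code, Datum arg)
    {
        if (MyVacuumSlot != NULL)           /* did we leave an entry behind? */
        {
            LWLockAcquire(BtreeVacuumLock, LW_EXCLUSIVE);
            MyVacuumSlot->inuse = false;    /* clear our shared-memory entry */
            MyVacuumSlot = NULL;
            LWLockRelease(BtreeVacuumLock);
        }
    }

    /* registered once per backend, before the first _bt_start_vacuum: */
    on_shmem_exit(btvacuum_shmem_cleanup, (Datum) 0);

A PG_CATCH block could still do the same cleanup eagerly for ordinary ERRORs; the hook just needs to be idempotent so the FATAL path is covered either way.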
Yeah, plan A seems like the way to go.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com