Tom Lane wrote:
> 2) if a SIGTERM happens to arrive while btbulkdelete is running,
> the next CHECK_FOR_INTERRUPTS will do elog(FATAL), causing elog.c
> to do proc_exit(0), leaving the vacuum still recorded as active in
> the shared memory array maintained by _bt_start_vacuum/_bt_end_vacuum.
> The PG_TRY block in btbulkdelete doesn't get a chance to clean up.
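
For anyone not looking at the code, the shape of the problem is roughly this
(a simplified sketch of btbulkdelete, not the actual source):

    cycleid = _bt_start_vacuum(rel);    /* records this vacuum in the shared array */

    PG_TRY();
    {
        btvacuumscan(...);              /* hits CHECK_FOR_INTERRUPTS() along the way */
    }
    PG_CATCH();
    {
        _bt_end_vacuum(rel);            /* cleanup for an ordinary ERROR */
        PG_RE_THROW();
    }
    PG_END_TRY();
    _bt_end_vacuum(rel);

    /*
     * A SIGTERM makes the next CHECK_FOR_INTERRUPTS() do elog(FATAL),
     * which goes straight to proc_exit() without unwinding through
     * PG_CATCH, so neither _bt_end_vacuum() call is ever reached.
     */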

I skimmed through all users of PG_TRY/CATCH in the backend to check whether there are other problems like that looming. There's one that looks dangerous in pg_start_backup() in xlog.c: the forcePageWrites flag in shared memory is cleared in a PG_CATCH block. It's not as severe, though, since the flag can be cleared manually by calling pg_stop_backup(), and leaving it set only leads to degraded performance.

> (3) eventually, either we try to re-vacuum the same index or
> accumulation of bogus active entries overflows the array.
> Either way, _bt_start_vacuum throws an error, which btbulkdelete
> PG_CATCHes, leading to _bt_end_vacuum trying to re-acquire the LWLock
> already taken by _bt_start_vacuum, meaning that the process hangs up.
> And then so does anything else that needs to take that LWLock...
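
Spelled out, the hang looks like this (a sketch; the important part is that
the error is thrown with the lock still held):

    /* in _bt_start_vacuum, roughly: */
    LWLockAcquire(BtreeVacuumLock, LW_EXCLUSIVE);
    ...
    elog(ERROR, "...");     /* array full, or the same index already listed;
                             * BtreeVacuumLock is still held at this point */

    /*
     * btbulkdelete's PG_CATCH runs before the normal error cleanup has had
     * a chance to release LWLocks, and it calls _bt_end_vacuum, which
     * starts with:
     */
    LWLockAcquire(BtreeVacuumLock, LW_EXCLUSIVE);   /* waits on itself forever */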

I also looked for other occurrences of point (3), but couldn't find any, so I guess we're now safe from it.

> Point (3) is already fixed in CVS, but point (2) is a lot nastier.
> What it essentially says is that trying to clean up shared-memory
> state in a PG_TRY block is unsafe: you can't be certain you'll
> get to do it.  Now this is not a big deal during normal SIGTERM or
> SIGQUIT database shutdown, because we're going to abandon the shared
> memory segment anyway.  However, if we ever want to support individual
> session kill via SIGTERM, it's a problem.  Even if we were not
> interested in someday considering that a supported feature, it seems
> that dealing with random SIGTERMs is needed for robustness in at least
> some environments.

Agreed. We should do our best to be safe from SIGTERMs, even if we don't consider it supported.

> AFAICS, there are basically two ways we might try to approach this:
>
> Plan A: establish the rule that you mustn't try to clean up shared
> memory state in a PG_CATCH block.  Anything you need to do like that
> has to be handled by an on_shmem_exit hook function, so it will be
> called during a FATAL exit.  (Or maybe you can do it in PG_CATCH for
> normal ERROR cases, but you need a backing on_shmem_exit hook to
> clean up for FATAL.)
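
For the btree case, Plan A could look something like the following (just a
sketch; the callback name and the registration point are made up here):

    #include "storage/ipc.h"        /* on_shmem_exit() */

    /* hypothetical exit hook: runs from proc_exit(), so it covers FATAL too */
    static void
    btvacuum_shmem_exit(int code, Datum arg)
    {
        /* remove whatever entries this backend left in the shared vacuum array */
    }

    /* registered once per backend, e.g. on the first call to _bt_start_vacuum */
    on_shmem_exit(btvacuum_shmem_exit, (Datum) 0);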

> Plan B: change the handling of FATAL errors so that they are thrown
> like normal errors, and the proc_exit call happens only when we get
> out to the outermost control level in postgres.c.  This would mean
> that PG_CATCH blocks get a chance to clean up before the FATAL exit
> happens.  The problem with that is that a non-cooperative PG_CATCH
> block might think it could "recover" from the error, and then the exit
> does not happen at all.  We'd need a coding rule that PG_CATCH blocks
> *must* re-throw FATAL errors, which seems at least as ugly as Plan A.
> In particular, all three of the external-interpreter PLs are willing
> to return errors into the external interpreter, and AFAICS we'd be
> entirely at the mercy of the user-written Perl or Python or Tcl code
> whether it re-throws the error or not.
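
And that coding rule would have to look something like this in every PG_CATCH
block that wants to recover (a sketch; inspecting the level via
CopyErrorData() is just one way it might be done):

    MemoryContext oldcontext = CurrentMemoryContext;

    PG_TRY();
    {
        /* ... work that may throw ... */
    }
    PG_CATCH();
    {
        ErrorData  *edata;

        MemoryContextSwitchTo(oldcontext);
        edata = CopyErrorData();

        /* under Plan B: never swallow anything FATAL or above */
        if (edata->elevel >= FATAL)
            PG_RE_THROW();

        FlushErrorState();
        /* ... recover from an ordinary ERROR ... */
    }
    PG_END_TRY();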

> So Plan B seems unacceptably fragile.  Does anyone see a way to fix it,
> or perhaps a Plan C with a totally different idea?  Plan A seems pretty
> ugly but it's the best I can come up with.

Yeah, Plan A seems like the way to go.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
