Hi Andres,

> Widening the perspective a bit, our current approach for such
> error-handling /
> shutdown functions seems ... not maintainable. In particular we have a
> substantial number of top-level sigsetjmp() handlers that have slightly
> different recovery codepaths.  When adding a new process type, how is one
> supposed to even know what function one needs to call?
>
> PostgresMain() has this comment:
>                 /*
>                  * NOTE: if you are tempted to add more code in this
> if-block,
>                  * consider the high probability that it should be in
>                  * AbortTransaction() instead.  The only stuff done
> directly here
>                  * should be stuff that is guaranteed to apply *only* for
> outer-level
>                  * error recovery, such as adjusting the FE/BE protocol
> status.
>                  */
>
> But a) that clearly hasn't worked, given how long the following block is,
> and
> b) can't really work that well, because we have plenty processes that don't
> use transaction, therefore putting cleanup in AbortTransaction() doesn't go
> that far.
>

I don't quite know how it should look like, but it seems pretty obvious that
> it should be more unified than it is.  My gut feeling is that we should
> have a
> single error recovery function that has a flags argument guiding which
> subsystems should be reset and which shouldn't.
>
>
+1 for the idea. I am interested in writing a patch for the same if you
would like.

As you mentioned, these code blocks under the if
(sigsetjmp(local_sigjmp_buf, 1) != 0)
statement have a very similar set of function calls, depending on the
process type—whether
it's an auxiliary process, background process, or client backend.
There are also similarities across these types; for instance, all of them
register a ProcKill
callback as an on_shmem_exit callback..

Currently, we perform LWLockReleaseAll() at the before_shmem_exit stage,
such as in the ShutdownAuxiliaryProcess() callback for auxiliary processes
and
ShutdownPostgres() for client backends.
However, there are some inconsistencies:
1.  The client backend does not call LWLockReleaseAll() until ProcKill(),
if it is not
in a transaction.
2. The background processes and AutovacuumLauncher do not register a
before_shmem_callback
for calling LWLockReleaseAll() but may invoke it directly within the
sigsetjmp() blocks.
3. Some auxiliary processes also call LWLockReleaseAll() early in the
sigsetjmp() block.


> Seems we should just reorder the sequence in ProcKill() to first call
> LWLockReleaseAll()


It would be too late to call it in ProcKill, since dsm_backend_shutdown()
would
detach segments containing the LWLocks before this. Using a
before_shmem_exit callback or
handling it in the initial steps of sigsetjmp might be more suitable.

Thank you,
Rahila Syed

Reply via email to