Hi Andres,
> Widening the perspective a bit, our current approach for such > error-handling / > shutdown functions seems ... not maintainable. In particular we have a > substantial number of top-level sigsetjmp() handlers that have slightly > different recovery codepaths. When adding a new process type, how is one > supposed to even know what function one needs to call? > > PostgresMain() has this comment: > /* > * NOTE: if you are tempted to add more code in this > if-block, > * consider the high probability that it should be in > * AbortTransaction() instead. The only stuff done > directly here > * should be stuff that is guaranteed to apply *only* for > outer-level > * error recovery, such as adjusting the FE/BE protocol > status. > */ > > But a) that clearly hasn't worked, given how long the following block is, > and > b) can't really work that well, because we have plenty processes that don't > use transaction, therefore putting cleanup in AbortTransaction() doesn't go > that far. > I don't quite know how it should look like, but it seems pretty obvious that > it should be more unified than it is. My gut feeling is that we should > have a > single error recovery function that has a flags argument guiding which > subsystems should be reset and which shouldn't. > > +1 for the idea. I am interested in writing a patch for the same if you would like. As you mentioned, these code blocks under the if (sigsetjmp(local_sigjmp_buf, 1) != 0) statement have a very similar set of function calls, depending on the process type—whether it's an auxiliary process, background process, or client backend. There are also similarities across these types; for instance, all of them register a ProcKill callback as an on_shmem_exit callback.. Currently, we perform LWLockReleaseAll() at the before_shmem_exit stage, such as in the ShutdownAuxiliaryProcess() callback for auxiliary processes and ShutdownPostgres() for client backends. However, there are some inconsistencies: 1. The client backend does not call LWLockReleaseAll() until ProcKill(), if it is not in a transaction. 2. The background processes and AutovacuumLauncher do not register a before_shmem_callback for calling LWLockReleaseAll() but may invoke it directly within the sigsetjmp() blocks. 3. Some auxiliary processes also call LWLockReleaseAll() early in the sigsetjmp() block. > Seems we should just reorder the sequence in ProcKill() to first call > LWLockReleaseAll() It would be too late to call it in ProcKill, since dsm_backend_shutdown() would detach segments containing the LWLocks before this. Using a before_shmem_exit callback or handling it in the initial steps of sigsetjmp might be more suitable. Thank you, Rahila Syed
