Hi,

On 2024-01-23 22:00:01 +0300, Alexander Lakhin wrote:
> 23.01.2024 20:30, Andres Freund wrote:
> > I don't think that's viable and would cause more problems than it solves, it'd
> > make us think that we might have an old postgres process hanging around that
> > needs to be terminated before we can start up. And I simply don't see the point
> > - we already record whether we crashed in the control file, no?
>
> With an Assert injected in walsender.c (as in [1]) and test
> 012_subtransactions.pl modified to finish just after the first
> "$node_primary->stop;", I see:
> pg_controldata -D src/test/recovery/tmp_check/t_012_subtransactions_primary_data/pgdata/
> Database cluster state: shut down
>
> But the assertion undoubtedly failed:
> grep TRAP src/test/recovery/tmp_check/log/*
> src/test/recovery/tmp_check/log/012_subtransactions_primary.log:TRAP: failed Assert("0"), File: "walsender.c", Line: 2688, PID: 142201

Yea, because it's after checkpointer has changed the state to "shutdowned". I
think we could add additional states, to be set by postmaster, instead of
checkpointer, for this purpose.

> As to the need to terminate a process, which is supposedly hanging around,
> I think, this situation doesn't differ in general from what we have after
> kill -9...

So? Making it more likely for postgres to fail to restart successfully,
because the pid has been reused, is bad.

Greetings,

Andres Freund