Re: Reorder shutdown sequence, to flush pgstats later

2025-01-29 Thread Bertrand Drouvot
Hi, On Mon, Jan 27, 2025 at 03:07:01PM +, Bertrand Drouvot wrote: > On Fri, Jan 24, 2025 at 03:06:17PM -0500, Andres Freund wrote: > > but in case of an immediate *restart* (via pg_ctl restart -m immediate) > > we'll > > not have such hint. postmaster.c doesn't know it's an immediate restart,

Re: Reorder shutdown sequence, to flush pgstats later

2025-01-27 Thread Bertrand Drouvot
Hi, On Fri, Jan 24, 2025 at 03:06:17PM -0500, Andres Freund wrote: > On 2025-01-15 08:38:33 +, Bertrand Drouvot wrote: > > Fair enough. I'll look at the remaining pieces once this stuff goes in. > > Cool. I did try writing the change, but it does indeed seem better as a > separate patch. Besi

Re: Reorder shutdown sequence, to flush pgstats later

2025-01-26 Thread Bertrand Drouvot
Hi, On Sat, Jan 25, 2025 at 11:44:54AM -0500, Andres Freund wrote: > Hi, > > On 2025-01-24 15:06:17 -0500, Andres Freund wrote: > > Unless somebody argues against, I'm planning to push all but the last later > > today, wait for the buildfarm to settle, and then push the last. This is a > > depen

Re: Reorder shutdown sequence, to flush pgstats later

2025-01-25 Thread Andres Freund
Hi, On 2025-01-24 15:06:17 -0500, Andres Freund wrote: > Unless somebody argues against, I'm planning to push all but the last later > today, wait for the buildfarm to settle, and then push the last. This is a > dependency of multiple other things, so it'd be good to get it in soon. I did push a

Re: Reorder shutdown sequence, to flush pgstats later

2025-01-24 Thread Andres Freund
Hi, Attached is a new version of this patchset: - HandleFatalError now doesn't expect to be called if already in FatalError or ImmediateShutdown and contains a comment and assertions to that effect - Fatal errors when in pmState > PM_WAIT_BACKENDS go to PM_WAIT_DEAD_END directly. The previou

Re: Reorder shutdown sequence, to flush pgstats later

2025-01-15 Thread Bertrand Drouvot
Hi, On Tue, Jan 14, 2025 at 06:55:03PM -0500, Andres Freund wrote: > Hi, > > On 2025-01-14 13:06:31 +, Bertrand Drouvot wrote: > > On Tue, Jan 14, 2025 at 12:58:44AM -0500, Andres Freund wrote: > > > The current code has the weird behaviour of going through > > > PM_WAIT_BACKENDS. I > > > le

Re: Reorder shutdown sequence, to flush pgstats later

2025-01-14 Thread Andres Freund
Hi, On 2025-01-14 13:06:31 +, Bertrand Drouvot wrote: > On Tue, Jan 14, 2025 at 12:58:44AM -0500, Andres Freund wrote: > > The current code has the weird behaviour of going through PM_WAIT_BACKENDS. > > I > > left it like that for now. In fact more paths do so now, because we did so > > for

Re: Reorder shutdown sequence, to flush pgstats later

2025-01-14 Thread Bertrand Drouvot
Hi, On Tue, Jan 14, 2025 at 12:58:44AM -0500, Andres Freund wrote: > Hi, > > On 2025-01-13 12:20:39 +, Bertrand Drouvot wrote: > > > We have > > > multiple copies of code to go to FatalError, that doesn't seem great. > > > > + * FIXME: This should probably not have a copy fo the code to > >

Re: Reorder shutdown sequence, to flush pgstats later

2025-01-13 Thread Andres Freund
Hi, On 2025-01-13 12:20:39 +, Bertrand Drouvot wrote: > > We have > > multiple copies of code to go to FatalError, that doesn't seem great. > > + * FIXME: This should probably not have a copy fo the code to > + * transition into FatalError state. > + */ > + ereport(LOG, > + (errm

Re: Reorder shutdown sequence, to flush pgstats later

2025-01-13 Thread Andres Freund
Hi, On 2025-01-13 12:20:39 +, Bertrand Drouvot wrote: > On Fri, Jan 10, 2025 at 02:07:25PM -0500, Andres Freund wrote: > > There wasn't > > any code to treat receiving PMSIGNAL_XLOG_IS_SHUTDOWN in the wrong state as > > fatal. I think this needs to be treated the same way we treat not being

Re: Reorder shutdown sequence, to flush pgstats later

2025-01-13 Thread Bertrand Drouvot
Hi, On Fri, Jan 10, 2025 at 02:07:25PM -0500, Andres Freund wrote: > However, I think 0007 needs a bit more work. > > One thing that worries me a bit is that using SIGINT for triggering the > shutdown checkpoint could somehow be unintentionally triggered? Previously > checkpointer would effective

Re: Reorder shutdown sequence, to flush pgstats later

2025-01-10 Thread Andres Freund
Hi, I've pushed 0001-0005. I think 0006 isn't far from ready. However, I think 0007 needs a bit more work. One thing that worries me a bit is that using SIGINT for triggering the shutdown checkpoint could somehow be unintentionally triggered? Previously checkpointer would effectively have ignor

Re: Reorder shutdown sequence, to flush pgstats later

2025-01-10 Thread Andres Freund
Hi, Thanks for the reviews! On 2025-01-09 12:01:05 +, Bertrand Drouvot wrote: > On Wed, Jan 08, 2025 at 02:26:15PM -0500, Andres Freund wrote: > === 2 > > + PM_WAIT_XLOG_ARCHIVAL, /* waiting for archiver and walsenders to > > > I don't love PM_WAIT_XLOG_ARCHIVAL, but I can't think

Re: Reorder shutdown sequence, to flush pgstats later

2025-01-10 Thread Andres Freund
Hi, On 2025-01-09 16:50:45 +0300, Nazir Bilal Yavuz wrote: > On Wed, 8 Jan 2025 at 22:26, Andres Freund wrote: > === 0005 > > > I think this is much better than before. I don't love PM_WAIT_XLOG_ARCHIVAL, > > but I can't think of anything better. > > I liked this, I think it is good enough. >

Re: Reorder shutdown sequence, to flush pgstats later

2025-01-09 Thread Nazir Bilal Yavuz
Hi, Thanks for working on this! On Wed, 8 Jan 2025 at 22:26, Andres Freund wrote: > > I'm currently to plan the four patches relatively soon, unless somebody speaks > up, of course. They seem fairly uncontroversial. The renaming of the phases > doesn't need to wait much longer, I think. > > The

Re: Reorder shutdown sequence, to flush pgstats later

2025-01-09 Thread Bertrand Drouvot
Hi, On Wed, Jan 08, 2025 at 02:26:15PM -0500, Andres Freund wrote: > I'm currently to plan the four patches relatively soon, unless somebody speaks > up, of course. They seem fairly uncontroversial. The renaming of the phases > doesn't need to wait much longer, I think. Thanks for the patches.

Re: Reorder shutdown sequence, to flush pgstats later

2025-01-09 Thread Bertrand Drouvot
Hi, On Wed, Jan 08, 2025 at 02:32:24PM -0500, Andres Freund wrote: > On 2025-01-08 14:48:21 +, Bertrand Drouvot wrote: > > === 2 > > > > + default: > > + /* all signals sent by postmaster should be listed > > here */ > > + Assert(false

Re: Reorder shutdown sequence, to flush pgstats later

2025-01-08 Thread Andres Freund
Hi, On 2025-01-08 14:48:21 +, Bertrand Drouvot wrote: > On Wed, Jan 08, 2025 at 03:10:03PM +0200, Heikki Linnakangas wrote: > > On 08/01/2025 04:10, Andres Freund wrote: > > > 0002: Make logging of postmaster signalling child processes more > > > consistent > > > > > >I found it somew

Re: Reorder shutdown sequence, to flush pgstats later

2025-01-08 Thread Andres Freund
Hi, On 2025-01-08 15:10:03 +0200, Heikki Linnakangas wrote: > On 08/01/2025 04:10, Andres Freund wrote: > > I instead opted to somewhat rearrange the shutdown sequence: > > - I don't love the naming of the various PMState values (existing and new), > >but a larger renaming should probably be d

Re: Reorder shutdown sequence, to flush pgstats later

2025-01-08 Thread Bertrand Drouvot
Hi, On Wed, Jan 08, 2025 at 03:10:03PM +0200, Heikki Linnakangas wrote: > On 08/01/2025 04:10, Andres Freund wrote: > > > elog(DEBUG1, "updating PMState from %d/%s to %d/%s", > > pmState, pmstate_name(pmState), newState, > > pmstate_name(newState)); > > I think the state names

Re: Reorder shutdown sequence, to flush pgstats later

2025-01-08 Thread Heikki Linnakangas
On 08/01/2025 04:10, Andres Freund wrote: I instead opted to somewhat rearrange the shutdown sequence: This commit changes the shutdown sequence so checkpointer is signalled to trigger writing the shutdown checkpoint without terminating it. Once checkpointer wrote the checkpoint i

Reorder shutdown sequence, to flush pgstats later

2025-01-07 Thread Andres Freund
Hi, In the AIO patchset I just hit a corner case in which IO workers can emit stats ([1]). That in turn can trigger an assertion failure, because by the time the IO workers are shut down, checkpointer already has exited and written out the stats. In this case the relevant IO has completed, but not