Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds)

2021-07-12 Thomas Munro
On Fri, Jun 11, 2021 at 1:18 PM Tom Lane wrote:
> Heikki Linnakangas writes:
> > On 09/04/2021 07:01, Thomas Munro wrote:
> >> This seems to work on Linux, macOS, FreeBSD and OpenBSD (and I assume
> >> any other BSD). Can anyone tell me if it works on illumos, AIX or
> >> HPUX, and if not, how

Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds)

2021-06-10 Tom Lane
Heikki Linnakangas writes:
> On 09/04/2021 07:01, Thomas Munro wrote:
>> This seems to work on Linux, macOS, FreeBSD and OpenBSD (and I assume
>> any other BSD). Can anyone tell me if it works on illumos, AIX or
>> HPUX, and if not, how to fix it or disable the feature gracefully?
>> For now the

Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds)

2021-06-10 Heikki Linnakangas
On 09/04/2021 07:01, Thomas Munro wrote:
> On Wed, Mar 31, 2021 at 7:02 PM Thomas Munro wrote:
>> On Fri, Mar 12, 2021 at 7:55 PM Thomas Munro wrote:
>>> On Thu, Mar 11, 2021 at 7:34 PM Michael Paquier wrote:
>>>> Wow. This probably means that we would be able to get rid of
>>>> USE_POSTMASTER_DEATH_SIGNAL?

Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds)

2021-04-08 Thomas Munro
On Wed, Mar 31, 2021 at 7:02 PM Thomas Munro wrote:
> On Fri, Mar 12, 2021 at 7:55 PM Thomas Munro wrote:
> > On Thu, Mar 11, 2021 at 7:34 PM Michael Paquier wrote:
> > > Wow. This probably means that we would be able to get rid of
> > > USE_POSTMASTER_DEATH_SIGNAL?
> >
> > We'll still need

Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds)

2021-03-31 Thomas Munro
On Fri, Mar 12, 2021 at 7:55 PM Thomas Munro wrote:
> On Thu, Mar 11, 2021 at 7:34 PM Michael Paquier wrote:
> > Wow. This probably means that we would be able to get rid of
> > USE_POSTMASTER_DEATH_SIGNAL?
>
> We'll still need it, because there'd still be systems with no signal:
> NetBSD,

Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds)

2021-03-23 Fujii Masao
On 2021/03/23 14:49, Fujii Masao wrote:
> On 2021/03/23 10:52, Thomas Munro wrote:
>> On Tue, Mar 23, 2021 at 2:44 PM Fujii Masao wrote:
>>> I found 0001 patch was committed in de829ddf23, and which added new
>>> wait event WalrcvExit. This name seems not consistent with other wait
>>> events. I'm

Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds)

2021-03-22 Fujii Masao
On 2021/03/23 10:52, Thomas Munro wrote:
> On Tue, Mar 23, 2021 at 2:44 PM Fujii Masao wrote:
>> I found 0001 patch was committed in de829ddf23, and which added new
>> wait event WalrcvExit. This name seems not consistent with other wait
>> events. I'm thinking it's better to rename it to

Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds)

2021-03-22 Thomas Munro
On Tue, Mar 23, 2021 at 2:44 PM Fujii Masao wrote:
> I found 0001 patch was committed in de829ddf23, and which added new
> wait event WalrcvExit. This name seems not consistent with other wait
> events. I'm thinking it's better to rename it to WalReceiverExit. Thought?
> Patch attached.

Agreed,

Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds)

2021-03-22 Fujii Masao
On 2021/03/02 10:10, Thomas Munro wrote:
> On Tue, Mar 2, 2021 at 12:00 AM Thomas Munro wrote:
>> On Mon, Nov 16, 2020 at 8:56 PM Michael Paquier wrote:
>>> No objections with the two changes from pg_usleep() to WaitLatch() so
>>> they could be applied separately first.
>>
>> I thought about committing

Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds)

2021-03-11 Thomas Munro
On Thu, Mar 11, 2021 at 7:34 PM Michael Paquier wrote:
> On Thu, Mar 11, 2021 at 04:37:39PM +1300, Thomas Munro wrote:
> > Michael, when you said "That's pretty hack-ish, still efficient" in
> > reference to this code:
> >
> >> - if (IsUnderPostmaster && !PostmasterIsAlive())
> >> + if

Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds)

2021-03-10 Michael Paquier
On Thu, Mar 11, 2021 at 04:37:39PM +1300, Thomas Munro wrote:
> Michael, when you said "That's pretty hack-ish, still efficient" in
> reference to this code:
>
>> - if (IsUnderPostmaster && !PostmasterIsAlive())
>> + if (IsUnderPostmaster &&
>> +#ifndef USE_POSTMASTER_DEATH_SIGNAL
>> +

Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds)

2021-03-10 Thomas Munro
On Tue, Mar 2, 2021 at 2:10 PM Thomas Munro wrote:
> ... One question I haven't
> got to the bottom of: is it a problem for the startup process that CVs
> use CHECK_FOR_INTERRUPTS()?

This was a red herring. The startup process already reaches CFI() via various paths, as I figured out pretty

Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds)

2021-03-01 Thomas Munro
On Tue, Mar 2, 2021 at 12:00 AM Thomas Munro wrote:
> On Mon, Nov 16, 2020 at 8:56 PM Michael Paquier wrote:
> > No objections with the two changes from pg_usleep() to WaitLatch() so
> > they could be applied separately first.
>
> I thought about committing that first part, and got as far as

Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds)

2021-03-01 Thomas Munro
On Mon, Nov 16, 2020 at 8:56 PM Michael Paquier wrote:
> On Thu, Sep 24, 2020 at 05:55:17PM +1200, Thomas Munro wrote:
> > Right, RestoreArchivedFile() uses system(), so I guess it can hang
> > around for a long time after unexpected postmaster exit on every OS if
> > the command waits. To

Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds)

2020-11-15 Michael Paquier
On Thu, Sep 24, 2020 at 05:55:17PM +1200, Thomas Munro wrote:
> Right, RestoreArchivedFile() uses system(), so I guess it can hang
> around for a long time after unexpected postmaster exit on every OS if
> the command waits. To respond to various kinds of important
> interrupts, I suppose that'd

Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds)

2020-09-23 Thomas Munro
On Thu, Sep 24, 2020 at 2:39 AM Fujii Masao wrote:
> Does this patch work fine with warm-standby case using pg_standby?
> IIUC the startup process doesn't call WaitLatch() in that case, so ISTM that,
> with the patch, it cannot detect the postmaster death immediately.

Right,

Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds)

2020-09-23 Fujii Masao
On 2020/09/23 12:47, Thomas Munro wrote:
> On Wed, Sep 23, 2020 at 2:27 PM David Rowley wrote:
>> I've gone as far as running the recovery tests on the v3-0001 patch
>> using a Windows machine. They pass:
>
> Thanks! I pushed that one, because it was effectively a bug fix (WaitLatch() without a

Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds)

2020-09-22 Thomas Munro
On Wed, Sep 23, 2020 at 2:27 PM David Rowley wrote:
> I've gone as far as running the recovery tests on the v3-0001 patch
> using a Windows machine. They pass:

Thanks! I pushed that one, because it was effectively a bug fix (WaitLatch() without a latch was supposed to work). I'll wait longer

Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds)

2020-09-22 David Rowley
On Sun, 20 Sep 2020 at 09:29, Thomas Munro wrote:
>
> Although I know from CI that this builds and passes "make check" on
> Windows, I'm hoping to attract some review of the 0001 patch from a
> Windows person, and confirmation that it passes "check-world" (or at
> least src/test/recovery check)

Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds)

2020-09-19 Thomas Munro
On Sat, Sep 19, 2020 at 6:07 AM Fujii Masao wrote:
> - pgstat_report_wait_start(WAIT_EVENT_RECOVERY_PAUSE);
> - pg_usleep(1000000L); /* 1000 ms */
> - pgstat_report_wait_end();
> + WaitLatch(NULL, WL_EXIT_ON_PM_DEATH | WL_TIMEOUT, 1000,

Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds)

2020-09-18 Fujii Masao
On 2020/09/18 9:30, Thomas Munro wrote:
> On Thu, Sep 17, 2020 at 10:47 PM Heikki Linnakangas wrote:
>> Hmm, so for speedy response to postmaster death, you're relying on the
>> loops to have other postmaster-death checks besides
>> HandleStartupProcInterrupts(), in the form of WL_EXIT_ON_PM_DEATH.

Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds)

2020-09-17 Thomas Munro
On Thu, Sep 17, 2020 at 10:47 PM Heikki Linnakangas wrote:
> Hmm, so for speedy response to postmaster death, you're relying on the
> loops to have other postmaster-death checks besides
> HandleStartupProcInterrupts(), in the form of WL_EXIT_ON_PM_DEATH. That
> seems a bit fragile, at the very

Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds)

2020-09-17 Heikki Linnakangas
On 17/09/2020 13:31, Thomas Munro wrote:
> On Thu, Sep 17, 2020 at 10:19 PM Heikki Linnakangas wrote:
>> If you put the counter in HandleStartupProcInterrupts(), it could be a long wait if the startup process is e.g. waiting for WAL to arrive in the loop in WaitForWALToBecomeAvailable(), or in

Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds)

2020-09-17 Thomas Munro
On Thu, Sep 17, 2020 at 10:19 PM Heikki Linnakangas wrote:
> On 17/09/2020 12:48, Thomas Munro wrote:
> > So I think we should do
> > something like what Heikki originally proposed to lower the frequency
> > of checks, on systems where we don't have the ability to skip the
> > check completely.

Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds)

2020-09-17 Heikki Linnakangas
On 17/09/2020 12:48, Thomas Munro wrote:
> Hello,
>
> In commits 9f095299 and f98b8476 we improved recovery performance on Linux and FreeBSD but we didn't help other operating systems. David just confirmed for me that commenting out the PostmasterIsAlive() call in the main recovery loop speeds up

PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds)

2020-09-17 Thomas Munro
Hello,

In commits 9f095299 and f98b8476 we improved recovery performance on Linux and FreeBSD but we didn't help other operating systems. David just confirmed for me that commenting out the PostmasterIsAlive() call in the main recovery loop speeds up crash recovery considerably on his Windows