Re: Weird failure with latches in curculio on v15

2023-03-01 Thread Robert Haas
On Tue, Feb 28, 2023 at 11:29 PM Nathan Bossart wrote: > On Sat, Feb 25, 2023 at 11:00:31AM -0800, Andres Freund wrote: > > TBH, I think the current archive and restore module APIs aren't useful. I > > think it was a mistake to add archive modules without having demonstrated > > that > > one can

Re: Weird failure with latches in curculio on v15

2023-03-01 Thread Stephen Frost
Greetings, * Nathan Bossart (nathandboss...@gmail.com) wrote: > On Sat, Feb 25, 2023 at 11:00:31AM -0800, Andres Freund wrote: > > TBH, I think the current archive and restore module APIs aren't useful. I > > think it was a mistake to add archive modules without having demonstrated > > that > >

Re: Weird failure with latches in curculio on v15

2023-02-28 Thread Nathan Bossart
On Sat, Feb 25, 2023 at 11:00:31AM -0800, Andres Freund wrote: > TBH, I think the current archive and restore module APIs aren't useful. I > think it was a mistake to add archive modules without having demonstrated that > one can do something useful with them that the restore_command didn't

Re: Weird failure with latches in curculio on v15

2023-02-25 Thread Andres Freund
Hi, On 2023-02-19 20:06:24 +0530, Robert Haas wrote: > On Sun, Feb 19, 2023 at 2:45 AM Andres Freund wrote: > > To me that seems even simpler? Nothing but the archiver is supposed to > > create > > .done files and nothing is supposed to remove .ready files without archiver > > having created

Re: Weird failure with latches in curculio on v15

2023-02-23 Thread Nathan Bossart
On Wed, Feb 22, 2023 at 09:48:10PM +1300, Thomas Munro wrote: > On Tue, Feb 21, 2023 at 5:50 PM Nathan Bossart > wrote: >> On Tue, Feb 21, 2023 at 09:03:27AM +0900, Michael Paquier wrote: >> > Perhaps beginning a new thread with a patch and a summary would be >> > better at this stage? Another

Re: Weird failure with latches in curculio on v15

2023-02-22 Thread Thomas Munro
On Tue, Feb 21, 2023 at 5:50 PM Nathan Bossart wrote: > On Tue, Feb 21, 2023 at 09:03:27AM +0900, Michael Paquier wrote: > > Perhaps beginning a new thread with a patch and a summary would be > > better at this stage? Another thing I am wondering is if it could be > > possible to test that

Re: Weird failure with latches in curculio on v15

2023-02-20 Thread Nathan Bossart
On Tue, Feb 21, 2023 at 09:03:27AM +0900, Michael Paquier wrote: > On Tue, Feb 14, 2023 at 09:47:55AM -0800, Nathan Bossart wrote: >> Here is a new version of the stopgap/back-branch fix for restore_command. >> This is more or less a rebased version of v4 with an added stderr message >> as Andres

Re: Weird failure with latches in curculio on v15

2023-02-20 Thread Thomas Munro
On Tue, Feb 21, 2023 at 1:03 PM Michael Paquier wrote: > Perhaps beginning a new thread with a patch and a summary would be > better at this stage? Another thing I am wondering is if it could be > possible to test that rather reliably. I have been playing with a few > scenarios like holding the

Re: Weird failure with latches in curculio on v15

2023-02-20 Thread Michael Paquier
On Tue, Feb 14, 2023 at 09:47:55AM -0800, Nathan Bossart wrote: > Here is a new version of the stopgap/back-branch fix for restore_command. > This is more or less a rebased version of v4 with an added stderr message > as Andres suggested upthread. So, this thread has moved around many subjects,

Re: Weird failure with latches in curculio on v15[

2023-02-20 Thread Stephen Frost
Greetings, * Michael Paquier (mich...@paquier.xyz) wrote: > On Sun, Feb 19, 2023 at 08:06:24PM +0530, Robert Haas wrote: > > I mean, my idea was to basically just have one big callback: > > ArchiverModuleMainLoopCB(). Which wouldn't return, or perhaps, would > > only return when archiving was

Re: Weird failure with latches in curculio on v15[

2023-02-20 Thread Michael Paquier
On Sun, Feb 19, 2023 at 08:06:24PM +0530, Robert Haas wrote: > I mean, my idea was to basically just have one big callback: > ArchiverModuleMainLoopCB(). Which wouldn't return, or perhaps, would > only return when archiving was totally caught up and there was nothing > more to do right now. And

Re: Weird failure with latches in curculio on v15

2023-02-19 Thread Robert Haas
On Sun, Feb 19, 2023 at 2:45 AM Andres Freund wrote: > To me that seems even simpler? Nothing but the archiver is supposed to create > .done files and nothing is supposed to remove .ready files without archiver > having created the .done files. So the archiver process can scan > archive_status

Re: Weird failure with latches in curculio on v15

2023-02-18 Thread Andres Freund
Hi, On 2023-02-18 15:51:06 +0530, Robert Haas wrote: > On Thu, Feb 16, 2023 at 10:02 PM Andres Freund wrote: > > But there's nothing inherent in that. We know for certain which files we're > > going to archive. And we don't need to work one-by-one. The archiver could > > just start multiple

Re: Weird failure with latches in curculio on v15

2023-02-18 Thread Robert Haas
On Thu, Feb 16, 2023 at 10:02 PM Andres Freund wrote: > But there's nothing inherent in that. We know for certain which files we're > going to archive. And we don't need to work one-by-one. The archiver could > just start multiple subprocesses at the same time. But what if it doesn't want to

Re: Weird failure with latches in curculio on v15

2023-02-18 Thread Robert Haas
On Thu, Feb 16, 2023 at 10:28 PM Nathan Bossart wrote: > > Hmm. So in this design, the archiver doesn't really do the archiving > > any more, because the interface makes that impossible. It has to use a > > separate background worker process for that, full stop. > > > > I don't think that's a

Re: Weird failure with latches in curculio on v15

2023-02-16 Thread Nathan Bossart
On Thu, Feb 16, 2023 at 03:08:14PM +0530, Robert Haas wrote: > On Thu, Feb 9, 2023 at 10:53 PM Nathan Bossart > wrote: >> I've been thinking about this, actually. I'm wondering if we could provide >> a list of files to the archiving callback (configurable via a variable in >>

Re: Weird failure with latches in curculio on v15

2023-02-16 Thread Andres Freund
Hi, On 2023-02-16 15:18:57 +0530, Robert Haas wrote: > On Fri, Feb 10, 2023 at 12:59 AM Andres Freund wrote: > > I don't think it's that hard to imagine problems. To be reasonably fast, a > > decent restore implementation will have to 'restore ahead'. Which also > > provides ample things to go

Re: Weird failure with latches in curculio on v15

2023-02-16 Thread Robert Haas
On Fri, Feb 10, 2023 at 12:59 AM Andres Freund wrote: > I'm somewhat concerned about that too, but perhaps from a different > angle. First, I think we don't do our users a service by defaulting the > in-core implementation to something that doesn't scale to even a moderately > busy server. +1.

Re: Weird failure with latches in curculio on v15

2023-02-16 Thread Robert Haas
On Thu, Feb 9, 2023 at 10:53 PM Nathan Bossart wrote: > I've been thinking about this, actually. I'm wondering if we could provide > a list of files to the archiving callback (configurable via a variable in > ArchiveModuleState), and then have the callback return a list of files that > are

Re: Weird failure with latches in curculio on v15

2023-02-14 Thread Nathan Bossart
Here is a new version of the stopgap/back-branch fix for restore_command. This is more or less a rebased version of v4 with an added stderr message as Andres suggested upthread. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From 20edca59834c7755bfddb070fb9db3f59dc6ff96 Mon Sep

Re: Weird failure with latches in curculio on v15

2023-02-09 Thread Andres Freund
Hi, On 2023-02-09 11:12:21 -0500, Robert Haas wrote: > On Thu, Feb 9, 2023 at 10:51 AM Tom Lane wrote: > > I'm fairly concerned about the idea of making it common for people > > to write their own main loop for the archiver. That means that, if > > we have a bug fix that requires the archiver

Re: Weird failure with latches in curculio on v15

2023-02-09 Thread Nathan Bossart
On Thu, Feb 09, 2023 at 11:12:21AM -0500, Robert Haas wrote: > On Thu, Feb 9, 2023 at 10:51 AM Tom Lane wrote: >> If we think we need primitives to let the archiver hooks get all >> the pending files, or whatever, by all means add those. But don't >> cede fundamental control of the archiver.

Re: Weird failure with latches in curculio on v15

2023-02-09 Thread Robert Haas
On Thu, Feb 9, 2023 at 10:51 AM Tom Lane wrote: > I'm fairly concerned about the idea of making it common for people > to write their own main loop for the archiver. That means that, if > we have a bug fix that requires the archiver to do X, we will not > just be patching our own code but trying

Re: Weird failure with latches in curculio on v15

2023-02-09 Thread Tom Lane
Robert Haas writes: > I think that we could certainly, as Michael suggests, have people > provide their own background worker rather than having the archiver > invoke the user-supplied code directly. As long as the functions that > you need in order to get the necessary information can be called

Re: Weird failure with latches in curculio on v15

2023-02-09 Thread Robert Haas
On Wed, Feb 8, 2023 at 7:24 PM Nathan Bossart wrote: > On Thu, Feb 09, 2023 at 08:56:24AM +0900, Michael Paquier wrote: > > On Wed, Feb 08, 2023 at 02:25:54PM -0800, Nathan Bossart wrote: > >> These are all good points. Perhaps there could be a base archiver > >> implementation that

Re: Weird failure with latches in curculio on v15

2023-02-08 Thread Nathan Bossart
On Thu, Feb 09, 2023 at 08:56:24AM +0900, Michael Paquier wrote: > On Wed, Feb 08, 2023 at 02:25:54PM -0800, Nathan Bossart wrote: >> These are all good points. Perhaps there could be a base archiver >> implementation that shell_archive uses (and that other modules could use if >> desired, which

Re: Weird failure with latches in curculio on v15

2023-02-08 Thread Michael Paquier
On Wed, Feb 08, 2023 at 02:25:54PM -0800, Nathan Bossart wrote: > These are all good points. Perhaps there could be a base archiver > implementation that shell_archive uses (and that other modules could use if > desired, which might be important for backward compatibility with the > existing

Re: Weird failure with latches in curculio on v15

2023-02-08 Thread Nathan Bossart
On Wed, Feb 08, 2023 at 04:24:15PM -0500, Robert Haas wrote: > On Wed, Feb 8, 2023 at 12:43 PM Nathan Bossart > wrote: >> I think this could be a good approach if we decide not to bake too much >> into PostgreSQL itself (e.g., such as creating multiple archive workers >> that each call out to

Re: Weird failure with latches in curculio on v15

2023-02-08 Thread Robert Haas
On Wed, Feb 8, 2023 at 12:43 PM Nathan Bossart wrote: > I think this could be a good approach if we decide not to bake too much > into PostgreSQL itself (e.g., such as creating multiple archive workers > that each call out to the module). Archive module authors would > effectively need to write

Re: Weird failure with latches in curculio on v15

2023-02-08 Thread Nathan Bossart
On Wed, Feb 08, 2023 at 10:22:24AM -0500, Robert Haas wrote: > I felt like the archive modules work was a step forward when we did, > because basic_archive does some things that you're not likely to get > right if you do it on your own. And a similar approach to > restore_command might be also be

Re: Weird failure with latches in curculio on v15

2023-02-08 Thread Robert Haas
On Sun, Feb 5, 2023 at 7:46 PM Nathan Bossart wrote: > Got it. I suspect we'll want to do something similar for archive modules > eventually, too. +1. I felt like the archive modules work was a step forward when we did, because basic_archive does some things that you're not likely to get right

Re: Weird failure with latches in curculio on v15

2023-02-05 Thread Nathan Bossart
On Sun, Feb 05, 2023 at 04:07:50PM -0800, Andres Freund wrote: > On 2023-02-05 15:57:47 -0800, Nathan Bossart wrote: >> I agree that the shell overhead isn't the main performance issue, >> but it's unclear to me how much of this should be baked into >> PostgreSQL. > > I don't know fully either.

Re: Weird failure with latches in curculio on v15

2023-02-05 Thread Andres Freund
Hi, On 2023-02-05 15:57:47 -0800, Nathan Bossart wrote: > > For the segment files, we'd likely need a parameter to indicate whether > > the restore is random or not. > > Wouldn't this approach still require each module to handle restoring ahead > of time? Yes, to some degree at least. I was

Re: Weird failure with latches in curculio on v15

2023-02-05 Thread Nathan Bossart
On Sun, Feb 05, 2023 at 03:01:57PM -0800, Andres Freund wrote: > I think at the very least you'd want to have a separate callback for > restoring segments than for restoring other files. But more likely a > separate callback for each type of file to be restored. > > For the timeline history case

Re: Weird failure with latches in curculio on v15

2023-02-05 Thread Michael Paquier
On Sun, Feb 05, 2023 at 09:49:57AM +0900, Michael Paquier wrote: > Yes, at this stage a revert of the refactoring with shell_restore.c is > the best path forward. Done that now, as of 2f6e15a. -- Michael signature.asc Description: PGP signature

Re: Weird failure with latches in curculio on v15

2023-02-05 Thread Andres Freund
Hi, On 2023-02-05 14:19:38 -0800, Nathan Bossart wrote: > On Sun, Feb 05, 2023 at 09:49:57AM +0900, Michael Paquier wrote: > > - Should we include archive_cleanup_command into the recovery modules > > at all? We've discussed offloading that from the checkpointer, and it > > makes the failure

Re: Weird failure with latches in curculio on v15

2023-02-05 Thread Nathan Bossart
On Sun, Feb 05, 2023 at 09:49:57AM +0900, Michael Paquier wrote: > - Should we include archive_cleanup_command into the recovery modules > at all? We've discussed offloading that from the checkpointer, and it > makes the failure handling trickier when it comes to unexpected GUC > configurations,

Re: Weird failure with latches in curculio on v15

2023-02-05 Thread Andres Freund
Hi, On 2023-02-04 10:03:54 -0800, Nathan Bossart wrote: > On Sat, Feb 04, 2023 at 03:30:29AM -0800, Andres Freund wrote: > > That's kind of my problem with these changes. They try to introduce new > > abstraction layers, but don't provide real abstraction, because they're > > very tightly bound

Re: Weird failure with latches in curculio on v15

2023-02-04 Thread Michael Paquier
On Sat, Feb 04, 2023 at 10:03:54AM -0800, Nathan Bossart wrote: > Okay. Michael, why don't we revert the shell_restore stuff for now? Once > the archive modules interface changes and the fix for this > SIGTERM-during-system() problem are in, I will work through this feedback > and give recovery

Re: Weird failure with latches in curculio on v15

2023-02-04 Thread Nathan Bossart
On Sat, Feb 04, 2023 at 03:30:29AM -0800, Andres Freund wrote: > That's kind of my problem with these changes. They try to introduce new > abstraction layers, but don't provide real abstraction, because they're > very tightly bound to the way the functions were called before the > refactoring.

Re: Weird failure with latches in curculio on v15

2023-02-04 Thread Andres Freund
Hi, On 2023-02-02 11:06:19 +0900, Michael Paquier wrote: > > - the error message for a failed restore command seems to have gotten > > worse: > > could not restore file \"%s\" from archive: %s" > > -> > > "%s \"%s\": %s", commandName, command > > IMO, we don't lose any context with this

Re: Weird failure with latches in curculio on v15

2023-02-04 Thread Andres Freund
Hi, On 2023-02-03 10:54:17 -0800, Nathan Bossart wrote: > @@ -146,7 +146,25 @@ ExecuteRecoveryCommand(const char *command, const char > *commandName, >*/ > fflush(NULL); > pgstat_report_wait_start(wait_event_info); > + > + /* > + * PreRestoreCommand() is used to tell

Re: Weird failure with latches in curculio on v15

2023-02-03 Thread Michael Paquier
On Fri, Feb 03, 2023 at 10:54:17AM -0800, Nathan Bossart wrote: > 0001 is just v1-0001 from upthread. This moves Pre/PostRestoreCommand to > surround only the call to system(). I think this should get us closer to > pre-v15 behavior. + if (exitOnSigterm) + PreRestoreCommand(); + rc

Re: Weird failure with latches in curculio on v15

2023-02-03 Thread Nathan Bossart
On Thu, Feb 02, 2023 at 11:44:22PM -0800, Andres Freund wrote: > On 2023-02-03 02:24:03 -0500, Tom Lane wrote: >> Andres Freund writes: >> > A workaround for the back branches could be to have a test in >> > StartupProcShutdownHandler() that tests if MyProcPid == getpid(), and >> > not do the

Re: Weird failure with latches in curculio on v15

2023-02-03 Thread Andres Freund
Hi, On February 3, 2023 9:19:23 AM GMT+01:00, Thomas Munro wrote: >On Fri, Feb 3, 2023 at 9:09 PM Andres Freund wrote: >> Thinking about popen() suggests that we have a similar problem with COPY >> FROM PROGRAM as we have in pgarch (i.e. not as bad as the startup >> process issue, but still

Re: Weird failure with latches in curculio on v15

2023-02-03 Thread Thomas Munro
On Fri, Feb 3, 2023 at 9:09 PM Andres Freund wrote: > Thinking about popen() suggests that we have a similar problem with COPY > FROM PROGRAM as we have in pgarch (i.e. not as bad as the startup > process issue, but still not great, due to > procsignal_sigusr1_handler()). A small mercy: while we

Re: Weird failure with latches in curculio on v15

2023-02-03 Thread Andres Freund
Hi, On 2023-02-03 02:50:38 -0500, Tom Lane wrote: > Andres Freund writes: > > On 2023-02-03 02:24:03 -0500, Tom Lane wrote: > >> setsid(2) is required since SUSv2, so I'm not sure which systems > >> are of concern here ... other than Redmond's of course. > > > I was thinking of windows, yes. > >

Re: Weird failure with latches in curculio on v15

2023-02-02 Thread Andres Freund
Hi, On 2023-02-03 20:34:36 +1300, Thomas Munro wrote: > What if we block signals, fork, then in the child, install the default > SIGTERM handler, then unblock, and then exec the shell? Yep. I was momentarily wondering why we'd even need to unblock signals, but while exec (et al) reset the

Re: Weird failure with latches in curculio on v15

2023-02-02 Thread Tom Lane
Andres Freund writes: > On 2023-02-03 02:24:03 -0500, Tom Lane wrote: >> setsid(2) is required since SUSv2, so I'm not sure which systems >> are of concern here ... other than Redmond's of course. > I was thinking of windows, yes. But given the lack of fork(2), Windows requires a completely

Re: Weird failure with latches in curculio on v15

2023-02-02 Thread Tom Lane
Thomas Munro writes: > On Fri, Feb 3, 2023 at 8:35 PM Andres Freund wrote: >> we don't have a hard dependency on setsid() > FTR There are no Unixes without setsid()... Yeah. What I just got done reading in SUSv2 (1997) is "Derived from the POSIX.1-1988 standard". We need not concern

Re: Weird failure with latches in curculio on v15

2023-02-02 Thread Andres Freund
Hi, On 2023-02-03 02:24:03 -0500, Tom Lane wrote: > Andres Freund writes: > > Ugh, I think I might understand what's happening: > > > The signal arrives just after the fork() (within system()). Because we > > have all our processes configure themselves as process group leaders, > > and we

Re: Weird failure with latches in curculio on v15

2023-02-02 Thread Thomas Munro
On Fri, Feb 3, 2023 at 8:35 PM Andres Freund wrote: > we don't have a > hard dependency on setsid() FTR There are no Unixes without setsid()... HAVE_SETSID was only left in the tree because we were discussing whether to replace it with !defined(WIN32) or whether that somehow made things more

Re: Weird failure with latches in curculio on v15

2023-02-02 Thread Andres Freund
Hi, On 2023-02-02 14:39:19 -0800, Nathan Bossart wrote: > Actually, this still doesn't really explain why we need to exit immediately > in the SIGTERM handler for restore_command. We already have handling for > when the command indicates it exited due to SIGTERM, so it should be no > problem if

Re: Weird failure with latches in curculio on v15

2023-02-02 Thread Thomas Munro
On Fri, Feb 3, 2023 at 8:24 PM Tom Lane wrote: > Andres Freund writes: > > Ugh, I think I might understand what's happening: > > > The signal arrives just after the fork() (within system()). Because we > > have all our processes configure themselves as process group leaders, > > and we signal

Re: Weird failure with latches in curculio on v15

2023-02-02 Thread Tom Lane
Andres Freund writes: > Ugh, I think I might understand what's happening: > The signal arrives just after the fork() (within system()). Because we > have all our processes configure themselves as process group leaders, > and we signal the entire process group (c.f. signal_child()), both the >

Re: Weird failure with latches in curculio on v15

2023-02-02 Thread Andres Freund
Hi, On 2023-02-02 10:23:21 +0900, Michael Paquier wrote: > On Wed, Feb 01, 2023 at 10:18:27AM -0800, Andres Freund wrote: > > On 2023-02-01 12:27:19 -0500, Tom Lane wrote: > >> Andres Freund writes: > >> The main thing that system() brings to the table is platform-specific > >> knowledge of

Re: Weird failure with latches in curculio on v15

2023-02-02 Thread Nathan Bossart
On Thu, Feb 02, 2023 at 02:39:19PM -0800, Nathan Bossart wrote: > Maybe we could just > remove this exit-in-SIGTERM-handler business... I've spent some time testing this. It seems to work pretty well, but only if I keep the exit-on-SIGTERM logic in shell_restore(). Without that, I'm seeing

Re: Weird failure with latches in curculio on v15

2023-02-02 Thread Nathan Bossart
On Thu, Feb 02, 2023 at 02:01:13PM -0800, Nathan Bossart wrote: > I've been digging into the history here. This e-mail seems to have the > most context [0]. IIUC this was intended to prevent "fast" shutdowns from > escalating to "immediate" shutdowns because the restore command died >

Re: Weird failure with latches in curculio on v15

2023-02-02 Thread Nathan Bossart
On Thu, Feb 02, 2023 at 04:14:54PM -0500, Robert Haas wrote: > + /* > +* When exitOnSigterm is set and we are in the startup process, use > the > +* special wrapper for system() that enables exiting immediately upon > +* receiving SIGTERM. This ensures we can break

Re: Weird failure with latches in curculio on v15

2023-02-02 Thread Robert Haas
On Thu, Feb 2, 2023 at 3:10 PM Nathan Bossart wrote: > Hm. I don't know if we want to encourage further use of > in_restore_command since it seems to be prone to misuse. Here's a v2 that > demonstrateѕ Tom's idea (bikeshedding on names and comments is welcome). I > personally like this

Re: Weird failure with latches in curculio on v15

2023-02-02 Thread Nathan Bossart
On Thu, Feb 02, 2023 at 01:24:15PM +0900, Michael Paquier wrote: > On Wed, Feb 01, 2023 at 09:34:44PM -0500, Tom Lane wrote: >> I was vaguely wondering about removing both of those functions >> in favor of an integrated function that does a system() call >> with those things before and after it. >

Re: Weird failure with latches in curculio on v15

2023-02-01 Thread Michael Paquier
On Wed, Feb 01, 2023 at 09:34:44PM -0500, Tom Lane wrote: > Michael Paquier writes: >> Hmm. Isn't that something that we should also document in startup.c >> where both routines are defined? If we begin to use >> PreRestoreCommand() and PostRestoreCommand() in more code paths in the >> future,

Re: Weird failure with latches in curculio on v15

2023-02-01 Thread Tom Lane
Michael Paquier writes: > Hmm. Isn't that something that we should also document in startup.c > where both routines are defined? If we begin to use > PreRestoreCommand() and PostRestoreCommand() in more code paths in the > future, that could be again an issue. I was vaguely wondering about

Re: Weird failure with latches in curculio on v15

2023-02-01 Thread Michael Paquier
On Wed, Feb 01, 2023 at 02:35:55PM -0800, Nathan Bossart wrote: > Here is a first draft for the proposed stopgap fix. If we want to proceed > with this, I can provide patches for the back branches. > + /* > + * PreRestoreCommand() is used to tell the SIGTERM handler for the > startup >

Re: Weird failure with latches in curculio on v15

2023-02-01 Thread Michael Paquier
On Wed, Feb 01, 2023 at 10:18:27AM -0800, Andres Freund wrote: > On 2023-02-01 12:27:19 -0500, Tom Lane wrote: >> Andres Freund writes: >> The main thing that system() brings to the table is platform-specific >> knowledge of where the shell is. I'm not very sure that we want to >> wire in

Re: Weird failure with latches in curculio on v15

2023-02-01 Thread Nathan Bossart
On Wed, Feb 01, 2023 at 09:58:06AM -0800, Nathan Bossart wrote: > On Wed, Feb 01, 2023 at 08:58:01AM -0800, Andres Freund wrote: >> On 2023-02-01 10:12:26 -0500, Tom Lane wrote: >>> The fundamental issue is that we have no good way to break out >>> of system(), and I think the original idea was

Re: Weird failure with latches in curculio on v15

2023-02-01 Thread Andres Freund
Hi, On 2023-02-01 12:27:19 -0500, Tom Lane wrote: > Andres Freund writes: > > On 2023-02-01 12:08:24 -0500, Robert Haas wrote: > >> I like the idea of not relying on system(). In most respects, doing > >> fork() + exec() ourselves seems superior. We can control where the > >> output goes, what

Re: Weird failure with latches in curculio on v15

2023-02-01 Thread Nathan Bossart
On Wed, Feb 01, 2023 at 08:58:01AM -0800, Andres Freund wrote: > On 2023-02-01 10:12:26 -0500, Tom Lane wrote: >> The fundamental issue is that we have no good way to break out >> of system(), and I think the original idea was that >> in_restore_command would be set *only* for the duration of the

Re: Weird failure with latches in curculio on v15

2023-02-01 Thread Tom Lane
Andres Freund writes: > On 2023-02-01 12:08:24 -0500, Robert Haas wrote: >> I like the idea of not relying on system(). In most respects, doing >> fork() + exec() ourselves seems superior. We can control where the >> output goes, what we do while waiting, etc. But system() runs the >> command

Re: Weird failure with latches in curculio on v15

2023-02-01 Thread Andres Freund
Hi, On 2023-02-01 12:08:24 -0500, Robert Haas wrote: > On Wed, Feb 1, 2023 at 11:58 AM Andres Freund wrote: > > > 9a740f81e clearly made things a lot worse, but it wasn't great > > > before. Can we see a way forward to removing the problem entirely? > > > > Yea, I think we can - we should stop

Re: Weird failure with latches in curculio on v15

2023-02-01 Thread Robert Haas
On Wed, Feb 1, 2023 at 11:58 AM Andres Freund wrote: > > 9a740f81e clearly made things a lot worse, but it wasn't great > > before. Can we see a way forward to removing the problem entirely? > > Yea, I think we can - we should stop relying on system(). If we instead > run the command properly as

Re: Weird failure with latches in curculio on v15

2023-02-01 Thread Andres Freund
Hi, On 2023-02-01 10:12:26 -0500, Tom Lane wrote: > Andres Freund writes: > > On 2023-02-01 16:21:16 +1300, Thomas Munro wrote: > >> It's always in proc_exit() in StartupProcShutdownHandler(), a SIGTERM > >> handler which is allowed to call that while in_restore_command is > >> true. > > > Ugh,

Re: Weird failure with latches in curculio on v15

2023-02-01 Thread Nathan Bossart
On Wed, Feb 01, 2023 at 10:12:26AM -0500, Tom Lane wrote: > Andres Freund writes: >> On 2023-02-01 16:21:16 +1300, Thomas Munro wrote: >>> It's always in proc_exit() in StartupProcShutdownHandler(), a SIGTERM >>> handler which is allowed to call that while in_restore_command is >>> true. > >>

Re: Weird failure with latches in curculio on v15

2023-02-01 Thread Tom Lane
Andres Freund writes: > On 2023-02-01 16:21:16 +1300, Thomas Munro wrote: >> It's always in proc_exit() in StartupProcShutdownHandler(), a SIGTERM >> handler which is allowed to call that while in_restore_command is >> true. > Ugh, no wonder we're getting crashes. This whole business seems bogus

Re: Weird failure with latches in curculio on v15

2023-02-01 Thread Andres Freund
Hi, On 2023-02-01 16:21:16 +1300, Thomas Munro wrote: > My database off assertion failures has four like that, all 15 and master: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=curculio=2023-02-01%2001:05:17 >

Re: Weird failure with latches in curculio on v15

2023-01-31 Thread Thomas Munro
My database off assertion failures has four like that, all 15 and master: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=curculio=2023-02-01%2001:05:17 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=curculio=2023-01-11%2011:16:40

Re: Weird failure with latches in curculio on v15

2023-01-31 Thread Andres Freund
Hi, On 2023-02-01 10:53:17 +0900, Michael Paquier wrote: > While browsing the buildfarm, I have noticed this failure on curcilio: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=curculio=2023-02-01%2001%3A05%3A17 > > The test that has reported a failure is the check on the archive >

Weird failure with latches in curculio on v15

2023-01-31 Thread Michael Paquier
Hi all, While browsing the buildfarm, I have noticed this failure on curcilio: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=curculio=2023-02-01%2001%3A05%3A17 The test that has reported a failure is the check on the archive module callback: # Failed test 'check shutdown callback of