Andres recently reminded me of some loose ends in archive modules [0], so
I'm starting a dedicated thread to address his feedback.

The first one is the requirement that archive module authors create their
own exception handlers if they want to make use of ERROR.  Ideally, there
would be a handler in pgarch.c so that authors wouldn't need to deal with
this.  I do see some previous dicussion about this [1] in which I expressed
concerns about memory management.  Looking at this again, I may have been
overthinking it.  IIRC I was thinking about creating a memory context that
would be switched into for only the archiving callback (and reset
afterwards), but that might not be necessary.  Instead, we could rely on
module authors to handle this.  One example is basic_archive, which
maintains its own memory context.  Alternatively, authors could simply
pfree() anything that was allocated.

Furthermore, by moving the exception handling to pgarch.c, module authors
can begin using PG_TRY, etc. in their archiving callbacks, which simplifies
things a bit.  I've attached a work-in-progress patch for this change.

On Fri, Feb 17, 2023 at 11:41:32AM -0800, Andres Freund wrote:
> On 2023-02-16 13:58:10 -0800, Nathan Bossart wrote:
>> On Thu, Feb 16, 2023 at 01:17:54PM -0800, Andres Freund wrote:
>> > I'm quite baffled by:
>> >            /* Close any files left open by copy_file() or compare_files() 
>> > */
>> >            AtEOSubXact_Files(false, InvalidSubTransactionId, 
>> > InvalidSubTransactionId);
>> > 
>> > in basic_archive_file(). It seems *really* off to call AtEOSubXact_Files()
>> > completely outside the context of a transaction environment. And it only 
>> > does
>> > the thing you want because you pass parameters that aren't actually valid 
>> > in
>> > the normal use in AtEOSubXact_Files().  I really don't understand how 
>> > that's
>> > supposed to be ok.
>> 
>> Hm.  Should copy_file() and compare_files() have PG_FINALLY blocks that
>> attempt to close the files instead?  What would you recommend?
> 
> I don't fully now, it's not entirely clear to me what the goals here were. I
> think you'd likely need to do a bit of infrastructure work to do this
> sanely. So far we just didn't have the need to handle files being released in
> a way like you want to do there.
> 
> I suspect a good direction would be to use resource owners. Add a separate set
> of functions that release files on resource owner release. Most of the
> infrastructure is there already, for temporary files
> (c.f. OpenTemporaryFile()).
> 
> Then that resource owner could be reset in case of error.
> 
> 
> I'm not even sure that erroring out is a reasonable way to implement
> copy_file(), compare_files(), particularly because you want to return via a
> return code from basic_archive_files().

To initialize this thread, I'll provide a bit more background.
basic_archive makes use of copy_file(), and it introduces a function called
compare_files() that is used to check whether two files have the same
content.  These functions make use of OpenTransientFile() and
CloseTransientFile().  In basic_archive's sigsetjmp() block, there's a call
to AtEOSubXact_Files() to make sure we close any files that are open when
there is an ERROR.  IIRC I was following the example set by other processes
that make use of the AtEOXact* functions in their sigsetjmp() blocks.
Looking again, I think AtEOXact_Files() would also work for basic_archive's
use-case.  That would at least avoid the hack of using
InvalidSubTransactionId for the second and third arguments.

>From the feedback quoted above, it sounds like improving this further will
require a bit of infrastructure work.  I haven't looked too deeply into
this yet.

[0] https://postgr.es/m/20230216192956.mhi6uiakchkolpki%40awork3.anarazel.de
[1] https://postgr.es/m/20220202224433.GA1036711%40nathanxps13

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
diff --git a/contrib/basic_archive/basic_archive.c b/contrib/basic_archive/basic_archive.c
index cd852888ce..2c51d2b100 100644
--- a/contrib/basic_archive/basic_archive.c
+++ b/contrib/basic_archive/basic_archive.c
@@ -172,7 +172,6 @@ basic_archive_configured(ArchiveModuleState *state)
 static bool
 basic_archive_file(ArchiveModuleState *state, const char *file, const char *path)
 {
-	sigjmp_buf	local_sigjmp_buf;
 	MemoryContext oldcontext;
 	BasicArchiveData *data = (BasicArchiveData *) state->private_data;
 	MemoryContext basic_archive_context = data->context;
@@ -184,51 +183,22 @@ basic_archive_file(ArchiveModuleState *state, const char *file, const char *path
 	 */
 	oldcontext = MemoryContextSwitchTo(basic_archive_context);
 
-	/*
-	 * Since the archiver operates at the bottom of the exception stack,
-	 * ERRORs turn into FATALs and cause the archiver process to restart.
-	 * However, using ereport(ERROR, ...) when there are problems is easy to
-	 * code and maintain.  Therefore, we create our own exception handler to
-	 * catch ERRORs and return false instead of restarting the archiver
-	 * whenever there is a failure.
-	 */
-	if (sigsetjmp(local_sigjmp_buf, 1) != 0)
+	PG_TRY();
+	{
+		/* Archive the file! */
+		basic_archive_file_internal(file, path);
+	}
+	PG_CATCH();
 	{
-		/* Since not using PG_TRY, must reset error stack by hand */
-		error_context_stack = NULL;
-
-		/* Prevent interrupts while cleaning up */
-		HOLD_INTERRUPTS();
-
-		/* Report the error and clear ErrorContext for next time */
-		EmitErrorReport();
-		FlushErrorState();
-
 		/* Close any files left open by copy_file() or compare_files() */
-		AtEOSubXact_Files(false, InvalidSubTransactionId, InvalidSubTransactionId);
+		AtEOXact_Files(false);
 
-		/* Reset our memory context and switch back to the original one */
-		MemoryContextSwitchTo(oldcontext);
+		/* Reset our memory context */
 		MemoryContextReset(basic_archive_context);
 
-		/* Remove our exception handler */
-		PG_exception_stack = NULL;
-
-		/* Now we can allow interrupts again */
-		RESUME_INTERRUPTS();
-
-		/* Report failure so that the archiver retries this file */
-		return false;
+		PG_RE_THROW();
 	}
-
-	/* Enable our exception handler */
-	PG_exception_stack = &local_sigjmp_buf;
-
-	/* Archive the file! */
-	basic_archive_file_internal(file, path);
-
-	/* Remove our exception handler */
-	PG_exception_stack = NULL;
+	PG_END_TRY();
 
 	/* Reset our memory context and switch back to the original one */
 	MemoryContextSwitchTo(oldcontext);
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 46af349564..5ca8457b13 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -501,6 +501,8 @@ pgarch_ArchiverCopyLoop(void)
 static bool
 pgarch_archiveXlog(char *xlog)
 {
+	sigjmp_buf	local_sigjmp_buf;
+	MemoryContext oldcontext = CurrentMemoryContext;
 	char		pathname[MAXPGPATH];
 	char		activitymsg[MAXFNAMELEN + 16];
 	bool		ret;
@@ -511,7 +513,49 @@ pgarch_archiveXlog(char *xlog)
 	snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog);
 	set_ps_display(activitymsg);
 
-	ret = ArchiveCallbacks->archive_file_cb(archive_module_state, xlog, pathname);
+	/*
+	 * Since the archiver operates at the bottom of the exception stack,
+	 * ERRORs turn into FATALs and cause the archiver process to restart.
+	 * However, using ereport(ERROR, ...) when there are problems is easy to
+	 * code and maintain.  Therefore, we create our own exception handler to
+	 * catch ERRORs and return false instead of restarting the archiver
+	 * whenever there is a failure.
+	 */
+	if (sigsetjmp(local_sigjmp_buf, 1) != 0)
+	{
+		/* Since not using PG_TRY, must reset error stack by hand */
+		error_context_stack = NULL;
+
+		/* Prevent interrupts while cleaning up */
+		HOLD_INTERRUPTS();
+
+		/* Report the error and clear ErrorContext for next time */
+		EmitErrorReport();
+		MemoryContextSwitchTo(oldcontext);
+		FlushErrorState();
+
+		/* Remove our exception handler */
+		PG_exception_stack = NULL;
+
+		/* Now we can allow interrupts again */
+		RESUME_INTERRUPTS();
+
+		/* Report failure so that the archiver retries this file */
+		ret = false;
+	}
+	else
+	{
+		/* Enable our exception handler */
+		PG_exception_stack = &local_sigjmp_buf;
+
+		/* Archive the file! */
+		ret = ArchiveCallbacks->archive_file_cb(archive_module_state,
+												xlog, pathname);
+
+		/* Remove our exception handler */
+		PG_exception_stack = NULL;
+	}
+
 	if (ret)
 		snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
 	else

Reply via email to