Hi,

On 2023-11-13 16:42:31 -0600, Nathan Bossart wrote:
> There seems to be no interest in this patch, so I plan to withdraw it from
> the commitfest system by the end of the month unless such interest
> materializes.

I think it might just have arrived too shortly before the feature freeze to be
worth looking at at the time, and then it didn't really re-raise attention
until now.  I'm so far behind on keeping up with the list that I rarely end up
looking far back for things I'd like to have answered... Sorry.

I think it's somewhat important to fix this - having a dedicated "recover from
error" implementation in a bunch of extension modules seems quite likely to
cause problems down the line, when another type of resource needs to be dealt
with after errors.  I think many non-toy implementations would e.g. need to
release lwlocks in case of errors (e.g. because they use a shared hashtable to
queue jobs for workers or such).


> On Fri, Feb 17, 2023 at 01:56:24PM -0800, Nathan Bossart wrote:
> > The first one is the requirement that archive module authors create their
> > own exception handlers if they want to make use of ERROR.  Ideally, there
> > would be a handler in pgarch.c so that authors wouldn't need to deal with
> > this.  I do see some previous dicussion about this [1] in which I expressed
> > concerns about memory management.  Looking at this again, I may have been
> > overthinking it.  IIRC I was thinking about creating a memory context that
> > would be switched into for only the archiving callback (and reset
> > afterwards), but that might not be necessary.  Instead, we could rely on
> > module authors to handle this.  One example is basic_archive, which
> > maintains its own memory context.  Alternatively, authors could simply
> > pfree() anything that was allocated.
> >
> > Furthermore, by moving the exception handling to pgarch.c, module authors
> > can begin using PG_TRY, etc. in their archiving callbacks, which simplifies
> > things a bit.  I've attached a work-in-progress patch for this change.
>
> I took another look at this, and I think І remembered what I was worried
> about with memory management.  One example is the built-in shell archiving.
> Presently, whenever there is an ERROR during archiving via shell, it gets
> bumped up to FATAL because the archiver operates at the bottom of the
> exception stack.  Consequently, there's no need to worry about managing
> memory contexts to ensure that palloc'd memory is cleared up after an
> error.  With the attached patch, we no longer call the archiving callback
> while we're at the bottom of the exception stack, so ERRORs no longer get
> bumped up to FATALs, and any palloc'd memory won't be freed.
>
> I see two main options for dealing with this.  One option is to simply have
> shell_archive (and any other archive modules out there) maintain its own
> memory context like basic_archive does.  This ends up requiring a whole lot
> of duplicate code between the two built-in modules, though.  Another option
> is to have the archiver manage a memory context that it resets after every
> invocation of the archiving callback, ERROR or not.

I think passing in a short-lived memory context is a lot nicer to deal with.


> This has the advantage of avoiding code duplication and simplifying things
> for the built-in modules, but any external modules that rely on palloc'd
> state being long-lived would need to be adjusted to manage their own
> long-lived context.  (This would need to be appropriately documented.)

Alternatively we could provide a longer-lived memory context in
ArchiveModuleState, set up by the genric infrastructure. That context would
obviously still need to be explicitly utilized by a module, but no duplicated
setup code would be required.




>  /*
>   * check_archive_directory
>   *
> @@ -172,67 +147,19 @@ basic_archive_configured(ArchiveModuleState *state)
>  static bool
>  basic_archive_file(ArchiveModuleState *state, const char *file, const char 
> *path)
>  {
> ...
> +     PG_TRY();
> +     {
> +             /* Archive the file! */
> +             basic_archive_file_internal(file, path);
> +     }
> +     PG_CATCH();
>       {
> -             /* Since not using PG_TRY, must reset error stack by hand */
> -             error_context_stack = NULL;
> -
> -             /* Prevent interrupts while cleaning up */
> -             HOLD_INTERRUPTS();
> -
> -             /* Report the error and clear ErrorContext for next time */
> -             EmitErrorReport();
> -             FlushErrorState();
> -
>               /* Close any files left open by copy_file() or compare_files() 
> */
> -             AtEOSubXact_Files(false, InvalidSubTransactionId, 
> InvalidSubTransactionId);
> -
> -             /* Reset our memory context and switch back to the original one 
> */
> -             MemoryContextSwitchTo(oldcontext);
> -             MemoryContextReset(basic_archive_context);
> -
> -             /* Remove our exception handler */
> -             PG_exception_stack = NULL;
> +             AtEOXact_Files(false);
>
> -             /* Now we can allow interrupts again */
> -             RESUME_INTERRUPTS();
> -
> -             /* Report failure so that the archiver retries this file */
> -             return false;
> +             PG_RE_THROW();
>       }

I think we should just have the AtEOXact_Files() in pgarch.c, then no
PG_TRY/CATCH is needed here. At the moment I think just about every possible
use of an archive modules would require using files, so there doesn't seem
much of a reason to not handle it in pgarch.c.

I'd probably reset a few other subsystems at the same time (there's probably
more):
- disable_all_timeouts()
- LWLockReleaseAll()
- ConditionVariableCancelSleep()
- pgstat_report_wait_end()
- ReleaseAuxProcessResources()


> @@ -511,7 +519,58 @@ pgarch_archiveXlog(char *xlog)
>       snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog);
>       set_ps_display(activitymsg);
>
> -     ret = ArchiveCallbacks->archive_file_cb(archive_module_state, xlog, 
> pathname);
> +     oldcontext = MemoryContextSwitchTo(archive_context);
> +
> +     /*
> +      * Since the archiver operates at the bottom of the exception stack,
> +      * ERRORs turn into FATALs and cause the archiver process to restart.
> +      * However, using ereport(ERROR, ...) when there are problems is easy to
> +      * code and maintain.  Therefore, we create our own exception handler to
> +      * catch ERRORs and return false instead of restarting the archiver
> +      * whenever there is a failure.
> +      */
> +     if (sigsetjmp(local_sigjmp_buf, 1) != 0)
> +     {
> +             /* Since not using PG_TRY, must reset error stack by hand */
> +             error_context_stack = NULL;
> +
> +             /* Prevent interrupts while cleaning up */
> +             HOLD_INTERRUPTS();
> +
> +             /* Report the error and clear ErrorContext for next time */
> +             EmitErrorReport();
> +             MemoryContextSwitchTo(oldcontext);
> +             FlushErrorState();
> +
> +             /* Flush any leaked data */
> +             MemoryContextReset(archive_context);
> +
> +             /* Remove our exception handler */
> +             PG_exception_stack = NULL;
> +
> +             /* Now we can allow interrupts again */
> +             RESUME_INTERRUPTS();
> +
> +             /* Report failure so that the archiver retries this file */
> +             ret = false;
> +     }
> +     else
> +     {
> +             /* Enable our exception handler */
> +             PG_exception_stack = &local_sigjmp_buf;
> +
> +             /* Archive the file! */
> +             ret = ArchiveCallbacks->archive_file_cb(archive_module_state,
> +                                                                             
>                 xlog, pathname);
> +
> +             /* Remove our exception handler */
> +             PG_exception_stack = NULL;
> +
> +             /* Reset our memory context and switch back to the original one 
> */
> +             MemoryContextSwitchTo(oldcontext);
> +             MemoryContextReset(archive_context);
> +     }

It could be worth setting up an errcontext providing the module and file
that's being processed. I personally find that at least as important as
setting up a ps string detailing the log file... But I guess that could be a
separate patch.


It'd be nice to add a comment explaining why pgarch_archiveXlog() is the right
place to handle errors.

Greetings,

Andres Freund


Reply via email to