Robert Haas wrote:
> On Mon, Aug 14, 2017 at 1:12 PM, Alvaro Herrera
> <alvhe...@2ndquadrant.com> wrote:
> > Yeah, the problem that lwlocks aren't released is because the launcher
> > is not in a transaction at that point, so AbortCurrentTransaction()
> > doesn't release locks like it normally would. The simplest fix (in the
> > attached 0001 patch) is to add a LWLockReleaseAll() call to the jmp
> > block, though I wonder if there should be some other cleanup functions
> > called from there, or whether perhaps it'd be a better strategy to have
> > the launcher run in a transaction at all times.
>
> Well, if you're going to do aborts outside of a transaction, just
> adding an LWLockReleaseAll() isn't really sufficient. You need to
> look at something like CheckpointerMain() and figure out which of
> those push-ups are needed here as well. Probably at least
> ConditionVariableCancelSleep(), pgstat_report_wait_end(),
> AbortBufferIO(), and UnlockBuffers() -- quite possibly some of those
> other AtEOXact calls as well.
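Editorial sketch: a toy C model of the failure described in the quoted text, where the launcher's error-recovery block relies on AbortCurrentTransaction(), which does nothing when no transaction is open. Everything here is hypothetical: the `toy_` functions and the `held_lwlocks` counter are stand-ins, not PostgreSQL APIs, and plain `setjmp`/`longjmp` only approximates the `sigsetjmp` recovery block and `elog(ERROR)`.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

/* Hypothetical toy state; stand-ins for backend-local bookkeeping. */
static int  held_lwlocks = 0;       /* "LWLocks" currently held */
static bool in_transaction = false; /* is a transaction open? */
static jmp_buf error_ctx;           /* stand-in for the sigsetjmp buffer */

static void
toy_lwlock_acquire(void)
{
    held_lwlocks++;
}

static void
toy_lwlock_release_all(void)
{
    assert(held_lwlocks >= 0);
    held_lwlocks = 0;
}

/* Releases locks only if a transaction is open; otherwise a no-op. */
static void
toy_abort_current_transaction(void)
{
    if (!in_transaction)
        return;
    toy_lwlock_release_all();
    in_transaction = false;
}

/*
 * One launcher iteration that takes a lock outside any transaction and then
 * fails. Returns how many locks are still held after error recovery.
 */
static int
toy_launcher_iteration(bool release_all_in_recovery)
{
    held_lwlocks = 0;
    in_transaction = false;

    if (setjmp(error_ctx) != 0)
    {
        /* error recovery, analogous to the launcher's sigsetjmp block */
        toy_abort_current_transaction();
        if (release_all_in_recovery)
            toy_lwlock_release_all();   /* release unconditionally */
        return held_lwlocks;
    }

    toy_lwlock_acquire();       /* lock taken while not in a transaction */
    longjmp(error_ctx, 1);      /* simulate elog(ERROR) */
}
```

With `release_all_in_recovery` false, the lock taken outside a transaction is still counted as held after recovery; in the real launcher, per the commit message below, a later attempt to take that lock would then block. Releasing unconditionally in the recovery block clears it.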
Agreed. I think a saner answer is to create a single function that does it
all, and use it in all places that need it, rather than keep adding more
copies of the same thing. The attached revised version of the patch does
things that way; I think it's good cleanup. (I had to make the
ResourceOwnerRelease call conditional on there being a resowner; that seems
okay to me.)

I put the new cleanup routine in xact.c, which is not exactly the perfect
place (I had to add access/xact.h to a couple of files), but the fact that
the new routine is a cut-down version of another routine in xact.c makes it
clear to me that that's where it belongs. I first put it in bootstrap,
alongside AuxiliaryProcessMain, but after a while that seemed wrong.

> > The other problem is that there's no attempt to handle a failed DSA
> > creation/attachment. The second patch just adds a PG_TRY block that
> > sets a flag not to try the DSA calls again if the first one fails. It
> > throws a single ERROR line, then autovacuum continues without
> > workitem support.
>
> Yeah, and the other question -- which Thomas asked before you
> originally committed, and which I just now asked again is
> "Why in the world are you using DSA for this at all?". There are
> serious problems with that which both he and I have pointed out, and
> you haven't explained why it's a good idea at any point, AFAICT.

The main reason is that I envision the workitem stuff being used for
things other than BRIN summarization in the future, and it seemed a lame
idea to just use a fixed-size area within the standard autovacuum shared
memory. Unbounded growth is of course going to be bad. The current coding
doesn't allow for any growth beyond the initial fixed size, but it's
easier to extend the system from the current point than to wholly change
the shared memory usage pattern while at it.

I thought I *had* responded to Thomas in that thread, BTW.

> Among those problems:
>
> 1. It doesn't work if dynamic_shared_memory_type=none. That's OK for
> an optional subsystem, but autovacuum is not very optional.

Autovacuum as a whole continues to work if there's no dynamic shared
memory; it's just the workitems stuff that stops working if there's no
DSA. (After fixing the bug that makes it crash in the case of
dynamic_shared_memory_type=none, of course.)

> 2. It allows unbounded bloat if there's no limit on the number of work
> items and is pointless if there is since you could then just use the
> main shared memory segment.

Yeah, there are probably better strategies than just growing the memory
area every time another entry is needed. Work-items as a whole need a lot
more development from the current point.

> I really think you should respond to those concerns, not just push a
> minimal fix.

We'll just continue to develop things from the current point.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
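Editorial sketch of the try-once pattern described for the second patch: attempt to set up the work-item area, and on failure set a flag so the setup is never retried and autovacuum carries on without work items. This assumes plain `setjmp`/`longjmp` in place of `PG_TRY`/`PG_CATCH`; all names are hypothetical, area creation is simulated to always fail (loosely modelling the dynamic_shared_memory_type=none case discussed above), and real backend code would also have to reset its error state in the catch branch, which this toy omits.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not PostgreSQL symbols. */
static jmp_buf *toy_exception_stack = NULL; /* loosely like PG_exception_stack */
static bool workitems_disabled = false;     /* "don't try again" flag */
static int  create_attempts = 0;

/* Simulated area creation that always "throws" an error. */
static void
toy_create_area(void)
{
    create_attempts++;
    assert(toy_exception_stack != NULL);
    longjmp(*toy_exception_stack, 1);
}

/* Returns true if work items can be used in this cycle. */
static bool
toy_ensure_workitems(void)
{
    jmp_buf     local;
    jmp_buf    *saved = toy_exception_stack;

    if (workitems_disabled)
        return false;           /* failed before: never retry */

    if (setjmp(local) == 0)
    {
        toy_exception_stack = &local;   /* "PG_TRY" */
        toy_create_area();
        toy_exception_stack = saved;
        return true;
    }

    /* "PG_CATCH": a real backend would report once and reset error state */
    toy_exception_stack = saved;
    workitems_disabled = true;
    return false;
}
```

Calling `toy_ensure_workitems()` twice performs only one creation attempt; the second call returns false immediately because the flag is already set.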
From 7a6c80ffb7c610640d9cede8535ceb79a55ddb8f Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvhe...@alvh.no-ip.org>
Date: Mon, 14 Aug 2017 13:54:57 -0300
Subject: [PATCH v2 1/2] Release lwlocks in autovacuum launcher error handling path

For regular processes, lwlock release is handled via
AbortCurrentTransaction(), which autovacuum already does. However, the
autovacuum launcher sometimes obtains locks outside of any transaction, in
which case AbortCurrentTransaction() is a no-op. Continuing after error
recovery would block if we tried to obtain an lwlock that we failed to
release.

Reported-by: Robert Haas
Discussion: https://postgr.es/m/ca+tgmobqvbz4k_+rsmim9herkpy3vs5xnbkl95gsenwijzp...@mail.gmail.com
---
 src/backend/access/transam/xact.c     | 35 +++++++++++++++++++++++++++++++++++
 src/backend/bootstrap/bootstrap.c     |  4 +---
 src/backend/postmaster/autovacuum.c   |  7 ++++++-
 src/backend/postmaster/bgwriter.c     | 21 +++------------------
 src/backend/postmaster/checkpointer.c | 23 +++--------------------
 src/backend/postmaster/walwriter.c    | 22 +++-------------------
 src/backend/replication/walsender.c   |  7 +------
 src/include/access/xact.h             |  1 +
 8 files changed, 53 insertions(+), 67 deletions(-)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index a8fbd043ae..34d15c75d6 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -3136,6 +3136,41 @@ AbortCurrentTransaction(void)
 }
 
 /*
+ * AbortInAuxiliaryProcess
+ *
+ * Cleanup and release resources during abort processing in auxiliary
+ * processes.
+ */
+void
+AbortInAuxiliaryProcess(void)
+{
+	/*
+	 * These operations are really just a minimal subset of
+	 * AbortTransaction(). We don't have very many resources to worry about
+	 * in auxiliary processes, but we do have LWLocks, buffers, and temp files.
+	 */
+	LWLockReleaseAll();
+	ConditionVariableCancelSleep();
+	pgstat_report_wait_end();
+	AbortBufferIO();
+	UnlockBuffers();
+
+	/* buffer pins are released here: */
+	if (CurrentResourceOwner)
+	{
+		ResourceOwnerRelease(CurrentResourceOwner,
+							 RESOURCE_RELEASE_BEFORE_LOCKS,
+							 false, true);
+		/* we needn't bother with the other ResourceOwnerRelease phases */
+	}
+
+	AtEOXact_Buffers(false);
+	AtEOXact_SMgr();
+	AtEOXact_Files();
+	AtEOXact_HashTables(false);
+}
+
+/*
  * PreventTransactionChain
  *
  * This routine is to be called by statements that must not run inside
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index b3f0b3cc92..1d0b997234 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -543,9 +543,7 @@ bootstrap_signals(void)
 static void
 ShutdownAuxiliaryProcess(int code, Datum arg)
 {
-	LWLockReleaseAll();
-	ConditionVariableCancelSleep();
-	pgstat_report_wait_end();
+	AbortInAuxiliaryProcess();
 }
 
 /* ----------------------------------------------------------------
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 00b1e823af..9a42a766b3 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -521,8 +521,13 @@ AutoVacLauncherMain(int argc, char *argv[])
 		/* Report the error to the server log */
 		EmitErrorReport();
 
-		/* Abort the current transaction in order to recover */
+		/*
+		 * Aborting the current transaction releases process resources; do
+		 * generic abort processing for the cases where the launcher is not in a
+		 * transaction.
+		 */
 		AbortCurrentTransaction();
+		AbortInAuxiliaryProcess();
 
 		/*
 		 * Now return to normal top-level context and clear ErrorContext for
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 9ad74ee977..2f6f9add61 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -38,6 +38,7 @@
 #include <sys/time.h>
 #include <unistd.h>
 
+#include "access/xact.h"
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "libpq/pqsignal.h"
@@ -182,24 +183,8 @@ BackgroundWriterMain(void)
 		/* Report the error to the server log */
 		EmitErrorReport();
 
-		/*
-		 * These operations are really just a minimal subset of
-		 * AbortTransaction(). We don't have very many resources to worry
-		 * about in bgwriter, but we do have LWLocks, buffers, and temp files.
-		 */
-		LWLockReleaseAll();
-		ConditionVariableCancelSleep();
-		AbortBufferIO();
-		UnlockBuffers();
-		/* buffer pins are released here: */
-		ResourceOwnerRelease(CurrentResourceOwner,
-							 RESOURCE_RELEASE_BEFORE_LOCKS,
-							 false, true);
-		/* we needn't bother with the other ResourceOwnerRelease phases */
-		AtEOXact_Buffers(false);
-		AtEOXact_SMgr();
-		AtEOXact_Files();
-		AtEOXact_HashTables(false);
+		/* cleanup and release resources */
+		AbortInAuxiliaryProcess();
 
 		/*
 		 * Now return to normal top-level context and clear ErrorContext for
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index e48ebd557f..e7291c38ef 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -41,6 +41,7 @@
 #include <time.h>
 #include <unistd.h>
 
+#include "access/xact.h"
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "libpq/pqsignal.h"
@@ -264,26 +265,8 @@ CheckpointerMain(void)
 		/* Report the error to the server log */
 		EmitErrorReport();
 
-		/*
-		 * These operations are really just a minimal subset of
-		 * AbortTransaction(). We don't have very many resources to worry
-		 * about in checkpointer, but we do have LWLocks, buffers, and temp
-		 * files.
-		 */
-		LWLockReleaseAll();
-		ConditionVariableCancelSleep();
-		pgstat_report_wait_end();
-		AbortBufferIO();
-		UnlockBuffers();
-		/* buffer pins are released here: */
-		ResourceOwnerRelease(CurrentResourceOwner,
-							 RESOURCE_RELEASE_BEFORE_LOCKS,
-							 false, true);
-		/* we needn't bother with the other ResourceOwnerRelease phases */
-		AtEOXact_Buffers(false);
-		AtEOXact_SMgr();
-		AtEOXact_Files();
-		AtEOXact_HashTables(false);
+		/* cleanup and release resources */
+		AbortInAuxiliaryProcess();
 
 		/* Warn any waiting backends that the checkpoint failed. */
 		if (ckpt_active)
diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c
index 7b89e02428..1498d3a53b 100644
--- a/src/backend/postmaster/walwriter.c
+++ b/src/backend/postmaster/walwriter.c
@@ -44,6 +44,7 @@
 #include <signal.h>
 #include <unistd.h>
 
+#include "access/xact.h"
 #include "access/xlog.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
@@ -162,25 +163,8 @@ WalWriterMain(void)
 		/* Report the error to the server log */
 		EmitErrorReport();
 
-		/*
-		 * These operations are really just a minimal subset of
-		 * AbortTransaction(). We don't have very many resources to worry
-		 * about in walwriter, but we do have LWLocks, and perhaps buffers?
-		 */
-		LWLockReleaseAll();
-		ConditionVariableCancelSleep();
-		pgstat_report_wait_end();
-		AbortBufferIO();
-		UnlockBuffers();
-		/* buffer pins are released here: */
-		ResourceOwnerRelease(CurrentResourceOwner,
-							 RESOURCE_RELEASE_BEFORE_LOCKS,
-							 false, true);
-		/* we needn't bother with the other ResourceOwnerRelease phases */
-		AtEOXact_Buffers(false);
-		AtEOXact_SMgr();
-		AtEOXact_Files();
-		AtEOXact_HashTables(false);
+		/* cleanup and release resources */
+		AbortInAuxiliaryProcess();
 
 		/*
 		 * Now return to normal top-level context and clear ErrorContext for
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 03e1cf44de..1a62a174b0 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -290,9 +290,7 @@ InitWalSender(void)
 void
 WalSndErrorCleanup(void)
 {
-	LWLockReleaseAll();
-	ConditionVariableCancelSleep();
-	pgstat_report_wait_end();
+	AbortInAuxiliaryProcess();
 
 	if (sendFile >= 0)
 	{
@@ -300,9 +298,6 @@ WalSndErrorCleanup(void)
 		sendFile = -1;
 	}
 
-	if (MyReplicationSlot != NULL)
-		ReplicationSlotRelease();
-
 	ReplicationSlotCleanup();
 
 	replication_active = false;
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index ad5aad96df..4f8c6434e1 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -348,6 +348,7 @@ extern void ForceSyncCommit(void);
 extern void StartTransactionCommand(void);
 extern void CommitTransactionCommand(void);
 extern void AbortCurrentTransaction(void);
+extern void AbortInAuxiliaryProcess(void);
 extern void BeginTransactionBlock(void);
 extern bool EndTransactionBlock(void);
 extern bool PrepareTransactionBlock(char *gid);
-- 
2.11.0
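Editorial sketch of the consolidation the patch performs: one cleanup routine, called from every error path, with the resource-owner step guarded because an owner may not exist. The names, and the counters standing in for LWLocks, condition-variable sleeps and buffer pins, are hypothetical; the `cf.` comments only point at the real calls each line loosely corresponds to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for per-process resources; not PostgreSQL types. */
typedef struct ToyResourceOwner
{
    int         pinned_buffers;
} ToyResourceOwner;

static int  held_lwlocks = 0;
static bool cv_sleep_pending = false;
static ToyResourceOwner *current_owner = NULL;

/*
 * Single shared cleanup routine, in the spirit of AbortInAuxiliaryProcess():
 * every error path calls it instead of repeating the sequence inline.
 */
static void
toy_abort_in_aux_process(void)
{
    assert(held_lwlocks >= 0);
    held_lwlocks = 0;               /* cf. LWLockReleaseAll() */
    cv_sleep_pending = false;       /* cf. ConditionVariableCancelSleep() */

    /* Release owner-tracked resources only if there is an owner at all. */
    if (current_owner != NULL)
        current_owner->pinned_buffers = 0;  /* cf. ResourceOwnerRelease() */
}

/* Two hypothetical error paths sharing the routine. */
static void
toy_bgwriter_error_cleanup(void)
{
    toy_abort_in_aux_process();
}

static void
toy_walsender_error_cleanup(void)
{
    toy_abort_in_aux_process();
    /* process-specific steps (closing files, etc.) would follow here */
}
```

A step added to the shared routine later reaches every caller at once. By contrast, the bgwriter copy removed in the patch above lacked the pgstat_report_wait_end() call that the checkpointer and walwriter copies had.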
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers