Robert Haas wrote:

> Actually, now that you mention it, I think it *is* broken already, or
> more to the point, if you configure that value, the autovacuum
> launcher hangs up, because of the AutovacuumWorkItem stuff that Alvaro
> added.  When I just tested it, the AV launcher somehow ended up
> waiting for AutovacuumLock and I had to SIGQUIT the server to shut it
> down.  That's actually not really entirely the fault of
> dynamic_shared_memory_type = none, though, because the code in
> autovacuum.c does this:
> 
>                 AutoVacuumDSA = dsa_create(AutovacuumLock->tranche);
>                 /* make sure it doesn't go away even if we do */
>                 dsa_pin(AutoVacuumDSA);
>                 dsa_pin_mapping(AutoVacuumDSA);
> 
> Now, that's actually really broken because if dsa_create() throws an
> error of any kind, you're going to have already assigned the value to
> AutoVacuumDSA, but you will not have pinned the DSA or the DSA
> mapping.  There's evidently some additional bug here because I'd sorta
> expect this code to just go into an infinite loop in this case,
> failing over and over trying to reattach the segment, but evidently
> something even worse happening - perhaps the ERROR isn't releasing
> AutovacuumLock.

Yeah, the problem that lwlocks aren't released is because the launcher
is not in a transaction at that point, so AbortCurrentTransaction()
doesn't release locks like it normally would.  The simplest fix (in the
attached 0001 patch) is to add a LWLockReleaseAll() call to the jmp
block, though I wonder if there should be some other cleanup functions
called from there, or whether perhaps it'd be a better strategy to have
the launcher run in a transaction at all times.

The other problem is that there's no attempt to handle a failed DSA
creation/attachment.  The second patch just adds a PG_TRY block that
sets a flag not to try the DSA calls again if the first one fails.  It
throws a single ERROR line, then autovacuum continues without workitem
support.

I intend to give these patches further thought before pushing anything,
will update this thread no later than tomorrow 19:00 UTC-0300.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From d761ef2c2fef336fae908d802237715ab38cdf2c Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvhe...@alvh.no-ip.org>
Date: Mon, 14 Aug 2017 13:54:57 -0300
Subject: [PATCH 1/2] Release lwlocks in autovacuum launcher error handling
 path

For regular processes, lwlock release is handling via
AbortCurrentTransaction(), which autovacuum already does.  However,
autovacuum launcher sometimes obtains lock outside of any transaction,
in which case AbortCurrentTransaction is a no-op.  Continuing after
error recovery would block if we tried to obtain an lwlock that we
failed to release.

Reported-by: Robert Haas
Discussion: 
https://postgr.es/m/ca+tgmobqvbz4k_+rsmim9herkpy3vs5xnbkl95gsenwijzp...@mail.gmail.com
---
 src/backend/postmaster/autovacuum.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/src/backend/postmaster/autovacuum.c 
b/src/backend/postmaster/autovacuum.c
index 00b1e823af..ed1a288475 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -525,6 +525,12 @@ AutoVacLauncherMain(int argc, char *argv[])
                AbortCurrentTransaction();
 
                /*
+                * Release lwlocks.  Normally done as part of 
AbortCurrentTransaction,
+                * but launcher is not in a transaction at all times.
+                */
+               LWLockReleaseAll();
+
+               /*
                 * Now return to normal top-level context and clear 
ErrorContext for
                 * next time.
                 */
-- 
2.11.0

>From a56869c1da23f56d747d2d05695879d6659cb8ba Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvhe...@alvh.no-ip.org>
Date: Mon, 14 Aug 2017 13:57:50 -0300
Subject: [PATCH 2/2] Don't repeatedly try to attach to shared memory

If the first attempt fails, give up on it.

Reported-by: Robert Haas
Discussion: 
https://postgr.es/m/ca+tgmobqvbz4k_+rsmim9herkpy3vs5xnbkl95gsenwijzp...@mail.gmail.com
---
 src/backend/postmaster/autovacuum.c | 45 ++++++++++++++++++++++++-------------
 1 file changed, 30 insertions(+), 15 deletions(-)

diff --git a/src/backend/postmaster/autovacuum.c 
b/src/backend/postmaster/autovacuum.c
index ed1a288475..ac47b1ce1b 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -255,6 +255,7 @@ typedef enum
  * av_startingWorker pointer to WorkerInfo currently being started (cleared by
  *                                     the worker itself as soon as it's up 
and running)
  * av_dsa_handle       handle for allocatable shared memory
+ * av_dsa_failed       already tried to allocate shared memory and failed
  *
  * This struct is protected by AutovacuumLock, except for av_signal and parts
  * of the worker list (see above).  av_dsa_handle is readable unlocked.
@@ -268,6 +269,7 @@ typedef struct
        dlist_head      av_runningWorkers;
        WorkerInfo      av_startingWorker;
        dsa_handle      av_dsa_handle;
+       bool            av_dsa_failed;
        dsa_pointer av_workitems;
 } AutoVacuumShmemStruct;
 
@@ -621,22 +623,35 @@ AutoVacLauncherMain(int argc, char *argv[])
         * already exist as created by a previous launcher; and we may even be
         * already attached to it, if we're here after longjmp'ing above.
         */
-       if (!AutoVacuumShmem->av_dsa_handle)
+       if (!AutoVacuumShmem->av_dsa_failed)
        {
-               LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
-               AutoVacuumDSA = dsa_create(AutovacuumLock->tranche);
-               /* make sure it doesn't go away even if we do */
-               dsa_pin(AutoVacuumDSA);
-               dsa_pin_mapping(AutoVacuumDSA);
-               AutoVacuumShmem->av_dsa_handle = dsa_get_handle(AutoVacuumDSA);
-               /* delay array allocation until first request */
-               AutoVacuumShmem->av_workitems = InvalidDsaPointer;
-               LWLockRelease(AutovacuumLock);
-       }
-       else if (AutoVacuumDSA == NULL)
-       {
-               AutoVacuumDSA = dsa_attach(AutoVacuumShmem->av_dsa_handle);
-               dsa_pin_mapping(AutoVacuumDSA);
+               PG_TRY();
+               {
+                       if (!AutoVacuumShmem->av_dsa_handle)
+                       {
+                               LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
+                               AutoVacuumDSA = 
dsa_create(AutovacuumLock->tranche);
+                               /* make sure it doesn't go away even if we do */
+                               dsa_pin(AutoVacuumDSA);
+                               dsa_pin_mapping(AutoVacuumDSA);
+                               AutoVacuumShmem->av_dsa_handle = 
dsa_get_handle(AutoVacuumDSA);
+
+                               /* delay array allocation until first request */
+                               AutoVacuumShmem->av_workitems = 
InvalidDsaPointer;
+                               LWLockRelease(AutovacuumLock);
+                       }
+                       else if (AutoVacuumDSA == NULL)
+                       {
+                               AutoVacuumDSA = 
dsa_attach(AutoVacuumShmem->av_dsa_handle);
+                               dsa_pin_mapping(AutoVacuumDSA);
+                       }
+               }
+               PG_CATCH();
+               {
+                       AutoVacuumShmem->av_dsa_failed = true;
+                       PG_RE_THROW();
+               }
+               PG_END_TRY();
        }
 
        /* loop until shutdown request */
-- 
2.11.0

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to