On Thu, Oct 31, 2024 at 09:20:52PM -0700, Noah Misch wrote:
> Here, one of the autovacuum workers had the guilty stack trace, appearing at
> the end of this message.  heap_inplace_update_and_unlock() calls
> CacheInvalidateHeapTupleInplace() while holding BUFFER_LOCK_EXCLUSIVE on a
> buffer of pg_class.  CacheInvalidateHeapTupleInplace() may call
> CatalogCacheInitializeCache(), which opens the cache's rel.  If there's not a
> valid relcache entry for the catcache's rel, we scan pg_class to make a valid
> relcache entry.  The ensuing hang makes sense.
> 
> Tomorrow, I'll think more about fixes.  Two that might work:
> 
> 1. Call CacheInvalidateHeapTupleInplace() before locking the buffer.  Each
>    time we need to re-find the tuple, discard the previous try's inplace
>    invals and redo CacheInvalidateHeapTupleInplace().  That's because
>    concurrent activity may have changed cache key fields like relname.

Attached.  With this, I got no hangs in 1.9h of your test procedure.  Without
the patch, I got fourteen hangs in the same amount of time.

> 2. Add some function that we call before locking the buffer.  Its charter is
>    to ensure PrepareToInvalidateCacheTuple() won't have to call
>    CatalogCacheInitializeCache().

The existing InitCatalogCachePhase2() could satisfy that charter.  I liked
this a little less than (1), for two reasons.  First, confirming that it
avoids deadlocks requires thinking through more of the catcache and relcache
procedures.  Second, InitCatalogCachePhase2() would init unnecessary caches.

>    I think nothing resets catcache to the
>    extent that CatalogCacheInitializeCache() must happen again, so this should
>    suffice regardless of concurrent sinval traffic, debug_discard_caches, etc.

The comment at CatalogCacheFlushCatalog() confirms that.

> What else is worth considering?  Any preferences among those?

3. Let a process take LW_SHARED if it already holds LW_EXCLUSIVE.

That would change outcomes well beyond inplace update, and it feels like a
bandage for doing things in a suboptimal order.  (1) or (2) would be an
improvement even if we did (3), since they would reduce I/O done while holding
BUFFER_LOCK_EXCLUSIVE.
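
For completeness, (3) would look roughly like the sketch below.  The wrapper
name is invented; LWLockHeldByMeInMode() is the existing self-check, and a
real change would instead live inside LWLockAcquire() and handle the
held-locks bookkeeping on release:

#include "postgres.h"
#include "storage/lwlock.h"

/*
 * Rough sketch of option (3), not proposed: treat a share request as
 * already satisfied when this backend holds the same lock exclusively,
 * since no other backend can hold or acquire it meanwhile.  Release-side
 * bookkeeping is ignored here.
 */
static bool
LWLockAcquireSharedReentrant(LWLock *lock)
{
	if (LWLockHeldByMeInMode(lock, LW_EXCLUSIVE))
		return true;
	return LWLockAcquire(lock, LW_SHARED);
}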


This was a near miss at shipping a worst-in-years regression in a minor
release, so I'm proposing this sequence:

- Revert from non-master branches commits 8e7e672 (inplace180, "WAL-log
  inplace update before revealing it to other sessions.") and 243e9b4
  (inplace160, "For inplace update, send nontransactional invalidations.").

- Back-patch inplace230-index_update_stats-io-before-buflock to harden commit
  a07e03f (inplace110, "Fix data loss at inplace update after heap_update()").

- Push attached inplace240 to master.

- Make the commitfest entry a request for review of v17 inplace160+inplace240.
  After some amount of additional review and master bake time, the reverted
  patches would return to non-master branches.

If someone agrees or if nobody objects by 2024-11-02T15:00+0000, I'll make it
so.  That's not much time, but I want to minimize buildfarm members hanging
and maximize inplace230 bake time before the release wrap.


Less urgently, we likely should add defense in depth against introducing
rarely-reached LWLock deadlocks.  Perhaps one or both of:

- Assert-fail if entering a conditional CatalogCacheInitializeCache() caller
  (e.g. SearchCatCacheInternal()) while holding a catalog buffer lock.

- elog(ERROR) instead of sleeping in LWLockAcquire() if the current process is
  the current lock holder (rough sketch below).
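
For the second of those, the check might be as simple as the sketch below;
the function name and message are placeholders, and the intended call site
is the point in LWLockAcquire() where the backend would otherwise queue
itself to sleep:

#include "postgres.h"
#include "storage/lwlock.h"

/*
 * Hypothetical hardening: refuse to sleep on an LWLock this backend
 * already holds, turning an undetected self-deadlock into an error.
 * LWLockHeldByMe() is the existing self-check used by assertions.
 */
static void
ErrorOnLWLockSelfDeadlock(LWLock *lock)
{
	if (LWLockHeldByMe(lock))
		elog(ERROR, "would self-deadlock: this backend already holds the requested LWLock");
}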

Thanks,
nm
Author:     Noah Misch <n...@leadboat.com>
Commit:     Noah Misch <n...@leadboat.com>

    Fix inplace update buffer self-deadlock.
    
    A CacheInvalidateHeapTuple* callee might call
    CatalogCacheInitializeCache(), which needs a relcache entry.  Acquiring
    a valid relcache entry might scan pg_class.  Hence, to prevent
    undetected LWLock self-deadlock, CacheInvalidateHeapTuple* callers must
    not hold BUFFER_LOCK_EXCLUSIVE on buffers of pg_class.  Move the
    CacheInvalidateHeapTupleInplace() before the BUFFER_LOCK_EXCLUSIVE.
    Commit 243e9b40f1b2dd09d6e5bf91ebf6e822a2cd3704 introduced this
    regression.  No back-patch, since I've reverted the culprit from
    non-master branches.
    
    Reported by Alexander Lakhin.  Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/10ec0bc3-5933-1189-6bb8-5dec41145...@gmail.com

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1748eaf..9a31cdc 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6214,6 +6214,17 @@ heap_inplace_lock(Relation relation,
 
        Assert(BufferIsValid(buffer));
 
+       /*
+        * Construct shared cache inval if necessary.  Because we pass a tuple
+        * version without our own inplace changes or inplace changes other
+        * sessions complete while we wait for locks, inplace update mustn't
+        * change catcache lookup keys.  But we aren't bothering with index
+        * updates either, so that's true a fortiori.  After LockBuffer(), it
+        * would be too late, because this might reach a
+        * CatalogCacheInitializeCache() that locks "buffer".
+        */
+       CacheInvalidateHeapTupleInplace(relation, oldtup_ptr, NULL);
+
        LockTuple(relation, &oldtup.t_self, InplaceUpdateTupleLock);
        LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
 
@@ -6309,6 +6320,7 @@ heap_inplace_lock(Relation relation,
        if (!ret)
        {
                UnlockTuple(relation, &oldtup.t_self, InplaceUpdateTupleLock);
+               ForgetInplace_Inval();
                InvalidateCatalogSnapshot();
        }
        return ret;
@@ -6345,14 +6357,6 @@ heap_inplace_update_and_unlock(Relation relation,
        dst = (char *) htup + htup->t_hoff;
        src = (char *) tuple->t_data + tuple->t_data->t_hoff;
 
-       /*
-        * Construct shared cache inval if necessary.  Note that because we only
-        * pass the new version of the tuple, this mustn't be used for any
-        * operations that could change catcache lookup keys.  But we aren't
-        * bothering with index updates either, so that's true a fortiori.
-        */
-       CacheInvalidateHeapTupleInplace(relation, tuple, NULL);
-
        /* Like RecordTransactionCommit(), log only if needed */
        if (XLogStandbyInfoActive())
                nmsgs = inplaceGetInvalidationMessages(&invalMessages,
@@ -6481,6 +6485,7 @@ heap_inplace_unlock(Relation relation,
 {
        LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
        UnlockTuple(relation, &oldtup->t_self, InplaceUpdateTupleLock);
+       ForgetInplace_Inval();
 }
 
 #define                FRM_NOOP                                0x0001
diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index 986850c..fc972ed 100644
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -1202,6 +1202,18 @@ AtInplace_Inval(void)
 }
 
 /*
+ * ForgetInplace_Inval
+ *             Alternative to PreInplace_Inval()+AtInplace_Inval(): discard queued-up
+ *             invalidations.  This lets inplace update enumerate invalidations
+ *             optimistically, before locking the buffer.
+ */
+void
+ForgetInplace_Inval(void)
+{
+       inplaceInvalInfo = NULL;
+}
+
+/*
  * AtEOSubXact_Inval
 *             Process queued-up invalidation messages at end of subtransaction.
  *
diff --git a/src/include/utils/inval.h b/src/include/utils/inval.h
index 3390e7a..299cd75 100644
--- a/src/include/utils/inval.h
+++ b/src/include/utils/inval.h
@@ -30,6 +30,7 @@ extern void AtEOXact_Inval(bool isCommit);
 
 extern void PreInplace_Inval(void);
 extern void AtInplace_Inval(void);
+extern void ForgetInplace_Inval(void);
 
 extern void AtEOSubXact_Inval(bool isCommit);
 
