On Wed, May 22, 2024 at 05:05:48PM -0700, Noah Misch wrote:
> https://postgr.es/m/20240512232923.aa.nmi...@google.com wrote:
> > Separable, nontrivial things not fixed in the attached patch stack:
> > 
> > - Inplace update uses transactional CacheInvalidateHeapTuple().  ROLLBACK of
> >   CREATE INDEX wrongly discards the inval, leading to the relhasindex=t loss
> >   still seen in inplace-inval.spec.  CacheInvalidateRelmap() does this right.
> 
> I plan to fix that like CacheInvalidateRelmap(): send the inval immediately,
> inside the critical section.  Send it in heap_xlog_inplace(), too.

> a. Within logical decoding, cease processing invalidations for inplace

I'm attaching the implementation.  This applies atop the v3 patch stack from
https://postgr.es/m/20240614003549.c2.nmi...@google.com, but the threads are
mostly orthogonal and intended for independent review.  Translating a tuple
into inval messages uses more infrastructure than relmapper, which needs just
a database ID.  Hence, this ended up more like a miniature of inval.c's
participation in the transaction commit sequence.
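
To summarize for reviewers, heap_inplace_update_finish() ends up with roughly
this shape (a condensed sketch of the last patch; buffer handling and WAL
registration details elided):

	/* collect invals for the new tuple version */
	CacheInvalidateHeapTupleInplace(relation, tuple, NULL);
	if (XLogStandbyInfoActive())
		nmsgs = inplaceGetInvalidationMessages(&invalMessages,
											   &RelcacheInitFileInval);
	PreInplace_Inval();			/* unlink init files; take RelCacheInitLock */
	START_CRIT_SECTION();
	/* ... overwrite the tuple in place, XLogInsert(XLOG_HEAP_INPLACE) ... */
	AtInplace_Inval();			/* send SI messages before UnlockTuple() */
	END_CRIT_SECTION();
	UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
	AcceptInvalidationMessages();	/* local processing of just-sent invals */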

I waffled on whether to back-patch inplace150-inval-durability-atcommit.  The
consequences of that bug are plenty bad, but reaching them requires an error
between TransactionIdCommitTree() and AtEOXact_Inval().  I've not heard
reports of that, and I don't have a recipe for making it happen on demand.
For now, I'm leaning toward back-patch.  The main risk would be me overlooking
an LWLock deadlock scenario reachable from the new, earlier RelCacheInitLock
timing.  Alternatives for RelCacheInitLock:

- RelCacheInitLock before PreCommit_Notify(), because notify concurrency
  matters more than init file concurrency.  I chose this.
- RelCacheInitLock after PreCommit_Notify(), because PreCommit_Notify() uses a
  heavyweight lock, giving it less risk of undetected deadlock.
- Replace RelCacheInitLock with a heavyweight lock, and keep it before
  PreCommit_Notify().
- Fold PreCommit_Inval() back into AtCommit_Inval(), accepting that EIO in
  unlink_initfile() will PANIC.

Opinions on that?
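
For concreteness, the ordering the patch establishes is roughly this (a
condensed sketch of the xact.c changes; unrelated commit steps elided):

	/* in CommitTransaction() */
	smgrDoPendingSyncs(true, is_parallel_worker);	/* I/O-heavy; stays first */
	PreCommit_Inval();		/* may take RelCacheInitLock, unlink init files */
	PreCommit_Notify();		/* uses a heavyweight lock */
	/* ... */
	/* in RecordTransactionCommit(), when we write a commit record */
	START_CRIT_SECTION();
	TransactionIdCommitTree(xid, nchildren, children);
	/* ... */
	AtCommit_Inval();		/* send SI messages; failure here is a PANIC */
	END_CRIT_SECTION();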

The patch changes xl_heap_inplace of XLOG_HEAP_INPLACE.  For back branches, we
could choose among:

- Same change, no WAL version bump.  Standby must update before primary.  This
  is best long-term, but the transition is more disruptive.  I'm leaning
  toward this one, but the second option isn't bad:

- heap_xlog_inplace() could set the shared-inval-queue overflow signal on
  every backend.  This is more wasteful, but inplace updates might be rare
  enough (~once per VACUUM) to make it tolerable.

- Use LogStandbyInvalidations() just after XLOG_HEAP_INPLACE.  This isn't
  correct if one ends recovery between the two records, but you'd need to be
  unlucky to notice.  Noticing would need a procedure like the following.  A
  hot standby backend populates a relcache entry, then does DDL on the rel
  after recovery ends.

Future cleanup work could eliminate LogStandbyInvalidations() and the case of
!markXidCommitted && nmsgs != 0.  Currently, the src/test/regress suite still
reaches that case:

- AlterDomainDropConstraint() queues an inval even if !found; it can stop
  that.

- ON COMMIT DELETE ROWS nontransactionally rebuilds an index, which sends a
  relcache inval.  The point of that inval is, I think, to force access
  methods like btree and hash to reload the metapage copy that they store in
  rd_amcache.  Since no assigned XID implies no changes to the temp index, the
  no-XID case could simply skip the index rebuild.  (temp.sql reaches this
  with a read-only transaction that selects from an ON COMMIT DELETE ROWS
  table.  Realistic usage will tend not to do that.)  ON COMMIT DELETE ROWS
  has another preexisting problem for indexes, mentioned in a code comment.
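
A hypothetical sketch of that no-XID skip (not part of the attached patches;
the call site and exact guard are assumptions, for illustration only):

	/* Hypothetical: in the ON COMMIT DELETE ROWS truncation path */
	if (TransactionIdIsValid(GetTopTransactionIdIfAny()))
		RelationTruncateIndexes(rel);	/* existing nontransactional rebuild */
	/* else: no XID assigned implies no changes to the temp index, so skip
	 * the rebuild and the relcache inval it would queue */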

Thanks,
nm
Author:     Noah Misch <n...@leadboat.com>
Commit:     Noah Misch <n...@leadboat.com>

    Fix comments in and referring to AtEOXact_RelationCache().
    
    The first point below prompted this change.  We can fix a bug by calling
    AtEOXact_Inval(true) earlier, in the COMMIT critical section.  Remove
    comment text that would have required AtEOXact_RelationCache() to move
    with it.  Full list of defects fixed:
    
    - Commit 8de3e410faa06ab20ec1aa6d0abb0a2c040261ba (2014-02) made
      relcache.c defer rebuilds if !IsTransactionState().  That removed the
      functional implications of swapping the order of
      AtEOXact_RelationCache() and AtEOXact_Inval().  Even without that
      change, we'd have other layers of defense.  At COMMIT:
    
      - Code that opened rels already closed them.
      - AtEOXact_RelationCache() essentially sets some non-pointer fields
        and calls RelationDestroyRelation() to pfree() rels for which we're
        committing both a CREATE and a DROP.  None of that needs a
        transaction.
      - AtEOXact_Inval() doesn't locally execute messages.  The next
        AcceptInvalidationMessages() does that.
    
      At ABORT:
    
      - resowner.c already closed all rels.
      - AtEOXact_RelationCache() essentially sets some non-pointer fields
        and calls RelationDestroyRelation() to pfree() rels for which we're
        aborting a CREATE.
    
    - Commit f884dca4959f64bd47e78102d1dadd2c77d49201 (2019-05) removed
      "forced index lists".
    
    - The header comment listed just a subset of the function's activities.
      Commit fdd965d074d46765c295223b119ca437dbcac973 (back-patched to v9.6)
      added in_progress_list cleanup.  Commit
      e5550d5fec66aa74caad1f79b79826ec64898688 (2014-04) added
      EOXactTupleDescArray cleanup.
    
    - When commit d5b31cc32b0762fa8920a9c1f70af2f82fb0aaa5 (2013-01)
      introduced the eoxact_list mechanism, the function ceased crashing
      when called before initialization.
    
    - While "current transaction created any relations" is the activity with
      the most code here, that appeared at the end of a paragraph that
      started with pre-8.1 behavior.
    
    Back-patch to v12 (all supported versions), the plan for calling
    AtEOXact_Inval(true) earlier.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20240523000548.58.nmi...@google.com

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 9bda1aa..ee389fc 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2369,11 +2369,10 @@ CommitTransaction(void)
        AtEOXact_RelationCache(true);
 
        /*
-        * Make catalog changes visible to all backends.  This has to happen after
-        * relcache references are dropped (see comments for
-        * AtEOXact_RelationCache), but before locks are released (if anyone is
-        * waiting for lock on a relation we've modified, we want them to know
-        * about the catalog change before they start using the relation).
+        * Make catalog changes visible to all backends.  This has to happen
+        * before locks are released (if anyone is waiting for lock on a relation
+        * we've modified, we want them to know about the catalog change before
+        * they start using the relation).
         */
        AtEOXact_Inval(true);
 
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index cbf9cf2..ed4a8eb 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -3218,17 +3218,6 @@ AssertPendingSyncs_RelationCache(void)
  * AtEOXact_RelationCache
  *
  *     Clean up the relcache at main-transaction commit or abort.
- *
- * Note: this must be called *before* processing invalidation messages.
- * In the case of abort, we don't want to try to rebuild any invalidated
- * cache entries (since we can't safely do database accesses).  Therefore
- * we must reset refcnts before handling pending invalidations.
- *
- * As of PostgreSQL 8.1, relcache refcnts should get released by the
- * ResourceOwner mechanism.  This routine just does a debugging
- * cross-check that no pins remain.  However, we also need to do special
- * cleanup when the current transaction created any relations or made use
- * of forced index lists.
  */
 void
 AtEOXact_RelationCache(bool isCommit)
@@ -3245,8 +3234,9 @@ AtEOXact_RelationCache(bool isCommit)
        in_progress_list_len = 0;
 
        /*
-        * Unless the eoxact_list[] overflowed, we only need to examine the rels
-        * listed in it.  Otherwise fall back on a hash_seq_search scan.
+        * Clean up rels created in the current transaction.  Unless the
+        * eoxact_list[] overflowed, we only need to examine the rels listed in
+        * it.  Otherwise fall back on a hash_seq_search scan.
         *
         * For simplicity, eoxact_list[] entries are not deleted till end of
         * top-level transaction, even though we could remove them at
@@ -3307,7 +3297,8 @@ AtEOXact_cleanup(Relation relation, bool isCommit)
 
        /*
         * The relcache entry's ref count should be back to its normal
-        * not-in-a-transaction state: 0 unless it's nailed in cache.
+        * not-in-a-transaction state: 0 unless it's nailed in cache.  The
+        * ResourceOwner mechanism handles that.
         *
         * In bootstrap mode, this is NOT true, so don't check it --- the
         * bootstrap code expects relations to stay open across start/commit
@@ -3982,10 +3973,7 @@ RelationAssumeNewRelfilelocator(Relation relation)
  *             This initializes the relation descriptor cache.  At the time
  *             that this is invoked, we can't do database access yet (mainly
  *             because the transaction subsystem is not up); all we are doing
- *             is making an empty cache hashtable.  This must be done before
- *             starting the initialization transaction, because otherwise
- *             AtEOXact_RelationCache would crash if that transaction aborts
- *             before we can get the relcache set up.
+ *             is making an empty cache hashtable.
  */
 
 #define INITRELCACHESIZE               400
Author:     Noah Misch <n...@leadboat.com>
Commit:     Noah Misch <n...@leadboat.com>

    Remove comment about xl_heap_inplace "AT END OF STRUCT".
    
    Commit 2c03216d831160bedd72d45f712601b6f7d03f1c moved the tuple data
    from there to the buffer-0 data.  Back-patch to v12 (all supported
    versions), the plan for the next change to this struct.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20240523000548.58.nmi...@google.com

diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 22a1747..42736f3 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -425,7 +425,6 @@ typedef struct xl_heap_confirm
 typedef struct xl_heap_inplace
 {
        OffsetNumber offnum;            /* updated tuple's offset on page */
-       /* TUPLE DATA FOLLOWS AT END OF STRUCT */
 } xl_heap_inplace;
 
 #define SizeOfHeapInplace      (offsetof(xl_heap_inplace, offnum) + sizeof(OffsetNumber))
Author:     Noah Misch <n...@leadboat.com>
Commit:     Noah Misch <n...@leadboat.com>

    Move SendSharedInvalidMessages() into the COMMIT critical section.
    
    If a backend commits transactional DDL without making associated invals
    global, other backends may, for example, perform I/O on the wrong
    relfilenode.  This change doesn't raise or lower protection for inplace
    updates or ON COMMIT DELETE ROWS.  Back-patch to v12 (all supported
    versions).  PGXN has no AtEOXact_Inval() references, and there's no
    known use case for calling it from an extension.  Hence, drop it from
    all branches.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20240523000548.58.nmi...@google.com

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index bf451d4..a142868 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -1630,7 +1630,7 @@ FinishPreparedTransaction(const char *gid, bool isCommit)
         *
         * Relcache init file invalidation requires processing both before and
         * after we send the SI messages, only when committing.  See
-        * AtEOXact_Inval().
+        * PreCommit_Inval() and AtCommit_Inval().
         */
        if (isCommit)
        {
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index ee389fc..6f0f80c 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -1358,14 +1358,25 @@ RecordTransactionCommit(void)
 
                /*
                 * Transactions without an assigned xid can contain invalidation
-                * messages (e.g. explicit relcache invalidations or catcache
-                * invalidations for inplace updates); standbys need to process those.
-                * We can't emit a commit record without an xid, and we don't want to
-                * force assigning an xid, because that'd be problematic for e.g.
-                * vacuum.  Hence we emit a bespoke record for the invalidations. We
-                * don't want to use that in case a commit record is emitted, so they
-                * happen synchronously with commits (besides not wanting to emit more
-                * WAL records).
+                * messages.  Inplace updates do so, and standbys need to process
+                * those invals.  We can't emit a commit record without an xid, and we
+                * don't want to force assigning an xid, because that'd be problematic
+                * for e.g. vacuum.  Hence we emit a bespoke record for the
+                * invalidations. We don't want to use that in case a commit record is
+                * emitted, so they happen synchronously with commits (besides not
+                * wanting to emit more WAL records).
+                *
+                * XXX Every known use of this capability is a defect.  Since an XID
+                * isn't controlling visibility of the change that prompted invals,
+                * other sessions need the inval even if this transaction aborts.
+                *
+                * ON COMMIT DELETE ROWS does a nontransactional index_build(), which
+                * queues a relcache inval.  Standbys don't need those invals, but
+                * we've not done the work to withhold them.  ON COMMIT DELETE ROWS
+                * can't cope with an error in index_build(), which is more likely
+                * than an error here.  ON COMMIT DELETE ROWS would need a deeper
+                * redesign to become safe against arbitrary errors.  Meanwhile, the
+                * damage from this is limited to temp tables of one session.
                 */
                if (nmsgs != 0)
                {
@@ -1373,6 +1384,9 @@ RecordTransactionCommit(void)
                        LogStandbyInvalidations(nmsgs, invalMessages,
                                                                        RelcacheInitFileInval);
                        wrote_xlog = true;      /* not strictly necessary */
                }
+               START_CRIT_SECTION();
+               AtCommit_Inval();
+               END_CRIT_SECTION();
 
                /*
                 * If we didn't create XLOG entries, we're done here; otherwise we
@@ -1510,13 +1524,15 @@ RecordTransactionCommit(void)
                        TransactionIdAsyncCommitTree(xid, nchildren, children, XactLastRecEnd);
        }
 
-       /*
-        * If we entered a commit critical section, leave it now, and let
-        * checkpoints proceed.
-        */
+       /* If we entered a commit critical section, finish it. */
        if (markXidCommitted)
        {
+               /* Let checkpoints proceed. */
                MyProc->delayChkptFlags &= ~DELAY_CHKPT_START;
+
+               /* Make catalog changes visible to all backends. */
+               AtCommit_Inval();
+
                END_CRIT_SECTION();
        }
 
@@ -2271,6 +2287,15 @@ CommitTransaction(void)
        AtEOXact_LargeObject(true);
 
        /*
+        * Unlink relcache init files as needed.  If unlinking, acquire
+        * RelCacheInitLock until after commit.  Do this before
+        * PreCommit_Notify(), since notify concurrency is more important than
+        * DDL/connection establishment concurrency.  Due to that LWLock, don't
+        * move this before I/O-heavy smgrDoPendingSyncs().
+        */
+       PreCommit_Inval();
+
+       /*
         * Insert notifications sent by NOTIFY commands into the queue.  This
         * should be late in the pre-commit sequence to minimize time spent
         * holding the notify-insertion lock.  However, this could result in
@@ -2368,14 +2393,6 @@ CommitTransaction(void)
        /* Clean up the relation cache */
        AtEOXact_RelationCache(true);
 
-       /*
-        * Make catalog changes visible to all backends.  This has to happen
-        * before locks are released (if anyone is waiting for lock on a relation
-        * we've modified, we want them to know about the catalog change before
-        * they start using the relation).
-        */
-       AtEOXact_Inval(true);
-
        AtEOXact_MultiXact();
 
        ResourceOwnerRelease(TopTransactionResourceOwner,
@@ -2906,7 +2923,7 @@ AbortTransaction(void)
                                                         false, true);
                AtEOXact_Buffers(false);
                AtEOXact_RelationCache(false);
-               AtEOXact_Inval(false);
+               AtAbort_Inval();
                AtEOXact_MultiXact();
                ResourceOwnerRelease(TopTransactionResourceOwner,
                                                         RESOURCE_RELEASE_LOCKS,
diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index 603aa41..0772314 100644
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -612,6 +612,8 @@ PrepareInvalidationState(void)
 {
        TransInvalidationInfo *myInfo;
 
+       Assert(IsTransactionState());
+
        if (transInvalInfo != NULL &&
                transInvalInfo->my_level == GetCurrentTransactionNestLevel())
                return;
@@ -862,14 +864,14 @@ AcceptInvalidationMessages(void)
 void
 PostPrepare_Inval(void)
 {
-       AtEOXact_Inval(false);
+       AtAbort_Inval();
 }
 
 /*
  * xactGetCommittedInvalidationMessages() is called by
  * RecordTransactionCommit() to collect invalidation messages to add to the
  * commit record. This applies only to commit message types, never to
- * abort records. Must always run before AtEOXact_Inval(), since that
+ * abort records. Must always run before AtCommit_Inval(), since that
  * removes the data we need to see.
  *
  * Remember that this runs before we have officially committed, so we
@@ -908,7 +910,7 @@ xactGetCommittedInvalidationMessages(SharedInvalidationMessage **msgs,
         * Collect all the pending messages into a single contiguous array of
         * invalidation messages, to simplify what needs to happen while building
         * the commit WAL message.  Maintain the order that they would be
-        * processed in by AtEOXact_Inval(), to ensure emulated behaviour in redo
+        * processed in by AtCommit_Inval(), to ensure emulated behaviour in redo
         * is as similar as possible to original.  We want the same bugs, if any,
         * not new ones.
         */
@@ -955,7 +957,8 @@ xactGetCommittedInvalidationMessages(SharedInvalidationMessage **msgs,
  * only at end-of-xact.
  *
  * Relcache init file invalidation requires processing both
- * before and after we send the SI messages. See AtEOXact_Inval()
+ * before and after we send the SI messages. See PreCommit_Inval()
+ * and AtCommit_Inval().
  */
 void
 ProcessCommittedInvalidationMessages(SharedInvalidationMessage *msgs,
@@ -998,63 +1001,99 @@ ProcessCommittedInvalidationMessages(SharedInvalidationMessage *msgs,
 }
 
 /*
- * AtEOXact_Inval
- *             Process queued-up invalidation messages at end of main transaction.
- *
- * If isCommit, we must send out the messages in our PriorCmdInvalidMsgs list
- * to the shared invalidation message queue.  Note that these will be read
- * not only by other backends, but also by our own backend at the next
- * transaction start (via AcceptInvalidationMessages).  This means that
- * we can skip immediate local processing of anything that's still in
- * CurrentCmdInvalidMsgs, and just send that list out too.
- *
- * If not isCommit, we are aborting, and must locally process the messages
- * in PriorCmdInvalidMsgs.  No messages need be sent to other backends,
- * since they'll not have seen our changed tuples anyway.  We can forget
- * about CurrentCmdInvalidMsgs too, since those changes haven't touched
- * the caches yet.
+ * PreCommit_Inval
+ *             Process queued-up invalidation before commit of main transaction.
+ *
+ * Call this after the last pre-commit action that could queue transactional
+ * invalidations.  (Direct SendSharedInvalidMessages() remains fine.)  Call
+ * this after any expensive processing, because we may hold an LWLock from
+ * here till end of xact.
+ *
+ * Tasks belong here if they are safe even if the xact later aborts.
+ * Currently, this just unlinks a file, failure of which does abort the xact.
+ */
+void
+PreCommit_Inval(void)
+{
+       /* This is a separate function just to run before the critical section. */
+       Assert(CritSectionCount == 0);
+
+       if (transInvalInfo == NULL)
+               return;
+
+       /* Must be at top of stack */
+       Assert(transInvalInfo->my_level == 1 && transInvalInfo->parent == NULL);
+
+       /*
+        * Relcache init file invalidation requires processing both before and
+        * after we send the SI messages.
+        */
+       if (transInvalInfo->RelcacheInitFileInval)
+               RelationCacheInitFilePreInvalidate();
+}
+
+/*
+ * AtCommit_Inval
+ *             Process queued-up invalidations after commit of main transaction.
+ *
+ * Call this after TransactionIdCommitTree(), so consumers of the
+ * invalidations find any new rows.  Call it before locks are released (if
+ * anyone is waiting for lock on a relation we've modified, we want them to
+ * know about the catalog change before they start using the relation).
+ *
+ * We must send out the messages in our PriorCmdInvalidMsgs list to the shared
+ * invalidation message queue.  Note that these will be read not only by other
+ * backends, but also by our own backend at the next transaction start (via
+ * AcceptInvalidationMessages).  This means that we can skip immediate local
+ * processing of anything that's still in CurrentCmdInvalidMsgs, and just send
+ * that list out too.
  *
  * In any case, reset our state to empty.  We need not physically
  * free memory here, since TopTransactionContext is about to be emptied
  * anyway.
+ */
+void
+AtCommit_Inval(void)
+{
+       /* PANIC rather than let other backends use stale cache entries. */
+       Assert(CritSectionCount > 0);
+
+       if (transInvalInfo == NULL)
+               return;
+
+       AppendInvalidationMessages(&transInvalInfo->PriorCmdInvalidMsgs,
+                                                          &transInvalInfo->CurrentCmdInvalidMsgs);
+
+       ProcessInvalidationMessagesMulti(&transInvalInfo->PriorCmdInvalidMsgs,
+                                                                        SendSharedInvalidMessages);
+
+       if (transInvalInfo->RelcacheInitFileInval)
+               RelationCacheInitFilePostInvalidate();
+
+       /* Need not free anything explicitly */
+       transInvalInfo = NULL;
+}
+
+/*
+ * AtAbort_Inval
+ *             Process queued-up invalidation messages at abort of main transaction.
  *
- * Note:
- *             This should be called as the last step in processing a transaction.
+ * We must locally process the messages in PriorCmdInvalidMsgs.  No messages
+ * need be sent to other backends, since they'll not have seen our changed
+ * tuples anyway.  We can forget about CurrentCmdInvalidMsgs too, since those
+ * changes haven't touched the caches yet.
  */
 void
-AtEOXact_Inval(bool isCommit)
+AtAbort_Inval(void)
 {
-       /* Quick exit if no messages */
        if (transInvalInfo == NULL)
                return;
 
        /* Must be at top of stack */
        Assert(transInvalInfo->my_level == 1 && transInvalInfo->parent == NULL);
 
-       if (isCommit)
-       {
-               /*
-                * Relcache init file invalidation requires processing both before and
-                * after we send the SI messages.  However, we need not do anything
-                * unless we committed.
-                */
-               if (transInvalInfo->RelcacheInitFileInval)
-                       RelationCacheInitFilePreInvalidate();
-
-               AppendInvalidationMessages(&transInvalInfo->PriorCmdInvalidMsgs,
-                                                                  &transInvalInfo->CurrentCmdInvalidMsgs);
-
-               ProcessInvalidationMessagesMulti(&transInvalInfo->PriorCmdInvalidMsgs,
-                                                                                SendSharedInvalidMessages);
-
-               if (transInvalInfo->RelcacheInitFileInval)
-                       RelationCacheInitFilePostInvalidate();
-       }
-       else
-       {
-               ProcessInvalidationMessages(&transInvalInfo->PriorCmdInvalidMsgs,
-                                                                       LocalExecuteInvalidationMessage);
-       }
+       ProcessInvalidationMessages(&transInvalInfo->PriorCmdInvalidMsgs,
+                                                               LocalExecuteInvalidationMessage);
 
        /* Need not free anything explicitly */
        transInvalInfo = NULL;
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 50c9440..8147cf6 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -352,7 +352,7 @@ SearchSysCacheLocked1(int cacheId,
                /*
                 * If an inplace update just finished, ensure we process the syscache
                 * inval.  XXX this is insufficient: the inplace updater may not yet
-                * have reached AtEOXact_Inval().  See test at inplace-inval.spec.
+                * have reached AtCommit_Inval().  See test at inplace-inval.spec.
                 *
                 * If a heap_update() call just released its LOCKTAG_TUPLE, we'll
                 * probably find the old tuple and reach "tuple concurrently updated".
diff --git a/src/include/utils/inval.h b/src/include/utils/inval.h
index 24695fa..94d8b44 100644
--- a/src/include/utils/inval.h
+++ b/src/include/utils/inval.h
@@ -26,7 +26,9 @@ typedef void (*RelcacheCallbackFunction) (Datum arg, Oid relid);
 
 extern void AcceptInvalidationMessages(void);
 
-extern void AtEOXact_Inval(bool isCommit);
+extern void PreCommit_Inval(void);
+extern void AtCommit_Inval(void);
+extern void AtAbort_Inval(void);
 
 extern void AtEOSubXact_Inval(bool isCommit);
 
Author:     Noah Misch <n...@leadboat.com>
Commit:     Noah Misch <n...@leadboat.com>

    For inplace update, send nontransactional invalidations.
    
    The inplace update survives ROLLBACK.  The inval didn't, so another
    backend's DDL could then update the row without incorporating the
    inplace update.  In the test this fixes, a mix of CREATE INDEX and ALTER
    TABLE resulted in a table with an index, yet relhasindex=f.  That is a
    source of index corruption.
    
    Core code no longer needs XLOG_INVALIDATIONS, but this leaves removing
    it for future work.  Extensions could be relying on that mechanism, so
    that removal would not be back-patch material.  Back-patch to v12 (all
    supported versions).
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20240523000548.58.nmi...@google.com

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 797bddf..d7e417f 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6305,6 +6305,9 @@ heap_inplace_update_finish(void *state, HeapTuple tuple)
        Relation        relation = scan->heap_rel;
        uint32          oldlen;
        uint32          newlen;
+       int                     nmsgs = 0;
+       SharedInvalidationMessage *invalMessages = NULL;
+       bool            RelcacheInitFileInval = false;
 
        Assert(ItemPointerEquals(&oldtup->t_self, &tuple->t_self));
        oldlen = oldtup->t_len - htup->t_hoff;
@@ -6312,6 +6315,29 @@ heap_inplace_update_finish(void *state, HeapTuple tuple)
        if (oldlen != newlen || htup->t_hoff != tuple->t_data->t_hoff)
                elog(ERROR, "wrong tuple length");
 
+       /*
+        * Construct shared cache inval if necessary.  Note that because we only
+        * pass the new version of the tuple, this mustn't be used for any
+        * operations that could change catcache lookup keys.  But we aren't
+        * bothering with index updates either, so that's true a fortiori.
+        */
+       CacheInvalidateHeapTupleInplace(relation, tuple, NULL);
+
+       /* Like RecordTransactionCommit(), log only if needed */
+       if (XLogStandbyInfoActive())
+               nmsgs = inplaceGetInvalidationMessages(&invalMessages,
+                                                                                          &RelcacheInitFileInval);
+
+       /*
+        * Unlink relcache init files as needed.  If unlinking, acquire
+        * RelCacheInitLock until after associated invalidations.  By doing this
+        * in advance, if we checkpoint and then crash between inplace
+        * XLogInsert() and inval, we don't rely on StartupXLOG() ->
+        * RelationCacheInitFileRemove().  That uses elevel==LOG, so replay would
+        * neglect to PANIC on EIO.
+        */
+       PreInplace_Inval();
+
        /* NO EREPORT(ERROR) from here till changes are logged */
        START_CRIT_SECTION();
 
@@ -6341,9 +6367,16 @@ heap_inplace_update_finish(void *state, HeapTuple tuple)
                XLogRecPtr      recptr;
 
                xlrec.offnum = ItemPointerGetOffsetNumber(&tuple->t_self);
+               xlrec.dbId = MyDatabaseId;
+               xlrec.tsId = MyDatabaseTableSpace;
+               xlrec.relcacheInitFileInval = RelcacheInitFileInval;
+               xlrec.nmsgs = nmsgs;
 
                XLogBeginInsert();
-               XLogRegisterData((char *) &xlrec, SizeOfHeapInplace);
+               XLogRegisterData((char *) &xlrec, MinSizeOfHeapInplace);
+               if (nmsgs != 0)
+                       XLogRegisterData((char *) invalMessages,
+                                                        nmsgs * sizeof(SharedInvalidationMessage));
 
                XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
                XLogRegisterBufData(0, (char *) htup + htup->t_hoff, newlen);
@@ -6355,22 +6388,23 @@ heap_inplace_update_finish(void *state, HeapTuple tuple)
                PageSetLSN(BufferGetPage(buffer), recptr);
        }
 
-       END_CRIT_SECTION();
-
        LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+
+       /*
+        * Send invalidations to shared queue.  SearchSysCacheLocked1() assumes we
+        * do this before UnlockTuple().
+        *
+        * If we're mutating a tuple visible only to this transaction, there's an
+        * equivalent transactional inval from the action that created the tuple,
+        * and this inval is superfluous.
+        */
+       AtInplace_Inval();
+
+       END_CRIT_SECTION();
        UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
        systable_endscan(scan);
 
-       /*
-        * Send out shared cache inval if necessary.  Note that because we only
-        * pass the new version of the tuple, this mustn't be used for any
-        * operations that could change catcache lookup keys.  But we aren't
-        * bothering with index updates either, so that's true a fortiori.
-        *
-        * XXX ROLLBACK discards the invalidation.  See test inplace-inval.spec.
-        */
-       if (!IsBootstrapProcessingMode())
-               CacheInvalidateHeapTuple(relation, tuple, NULL);
+       AcceptInvalidationMessages();   /* local processing of just-sent inval */
 }
 
 /*
@@ -10268,6 +10302,12 @@ heap_xlog_inplace(XLogReaderState *record)
        }
        if (BufferIsValid(buffer))
                UnlockReleaseBuffer(buffer);
+
+       ProcessCommittedInvalidationMessages(xlrec->msgs,
+                                                                                xlrec->nmsgs,
+                                                                                xlrec->relcacheInitFileInval,
+                                                                                xlrec->dbId,
+                                                                                xlrec->tsId);
 }
 
 void
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 5f5673e..f31cc3a 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -16,6 +16,7 @@
 
 #include "access/heapam_xlog.h"
 #include "access/rmgrdesc_utils.h"
+#include "storage/standbydefs.h"
 
 /*
  * NOTE: "keyname" argument cannot have trailing spaces or punctuation
@@ -253,6 +254,9 @@ heap_desc(StringInfo buf, XLogReaderState *record)
                xl_heap_inplace *xlrec = (xl_heap_inplace *) rec;
 
                appendStringInfo(buf, "off: %u", xlrec->offnum);
+               standby_desc_invalidations(buf, xlrec->nmsgs, xlrec->msgs,
+                                                                  xlrec->dbId, xlrec->tsId,
+                                                                  xlrec->relcacheInitFileInval);
        }
 }
 
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 6f0f80c..bffbcb4 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -1358,13 +1358,14 @@ RecordTransactionCommit(void)
 
                /*
                 * Transactions without an assigned xid can contain invalidation
-                * messages.  Inplace updates do so, and standbys need to process
-                * those invals.  We can't emit a commit record without an xid, and we
-                * don't want to force assigning an xid, because that'd be problematic
-                * for e.g. vacuum.  Hence we emit a bespoke record for the
-                * invalidations. We don't want to use that in case a commit record is
-                * emitted, so they happen synchronously with commits (besides not
-                * wanting to emit more WAL records).
+                * messages.  While inplace updates formerly did so, they now send
+                * immediate invalidations.  Extensions might still do so, and
+                * standbys may need to process those invals.  We can't emit a commit
+                * record without an xid, and we don't want to force assigning an xid,
+                * because that'd be problematic for e.g. vacuum.  Hence we emit a
+                * bespoke record for the invalidations. We don't want to use that in
+                * case a commit record is emitted, so they happen synchronously with
+                * commits (besides not wanting to emit more WAL records).
                 *
                 * XXX Every known use of this capability is a defect.  Since an XID
                 * isn't controlling visibility of the change that prompted invals,
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index b4b68b1..ba576c6 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2884,19 +2884,21 @@ index_update_stats(Relation rel,
        }
 
        /*
-        * If anything changed, write out the tuple
+        * If anything changed, write out the tuple and immediate invals
         */
        if (dirty)
-       {
                heap_inplace_update_finish(state, tuple);
-               /* the above sends a cache inval message */
-       }
        else
-       {
                heap_inplace_update_cancel(state);
-               /* no need to change tuple, but force relcache inval anyway */
-               CacheInvalidateRelcacheByTuple(tuple);
-       }
+
+       /*
+        * Queue a transactional relcache inval.  CREATE INDEX needs an immediate
+        * inval for the relhasindex change, but it also needs a transactional
+        * inval for when the new index's rows become visible.  Other CREATE INDEX
+        * and REINDEX code happens to also queue a transactional inval, but keep
+        * this in case rare callers rely on this part of our API contract.
+        */
+       CacheInvalidateRelcacheByTuple(tuple);
 
        heap_freetuple(tuple);
 
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 36d82bd..5d4173a 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -975,11 +975,6 @@ EventTriggerOnLogin(void)
                                 * this instead of regular updates serves two purposes. First,
                                 * that avoids possible waiting on the row-level lock. Second,
                                 * that avoids dealing with TOAST.
-                                *
-                                * Changes made by inplace update may be lost due to
-                                * concurrent normal updates; see inplace-inval.spec. However,
-                                * we are OK with that.  The subsequent connections will still
-                                * have a chance to set "dathasloginevt" to false.
                                 */
                                heap_inplace_update_finish(state, tuple);
                        }
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 8ec5adf..b2cf14e 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -508,23 +508,18 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 
                        /*
                         * Inplace updates are only ever performed on catalog tuples and
-                        * can, per definition, not change tuple visibility.  Since we
-                        * don't decode catalog tuples, we're not interested in the
-                        * record's contents.
+                        * can, per definition, not change tuple visibility.  Inplace
+                        * updates don't affect storage or interpretation of table rows,
+                        * so they don't affect logicalrep_write_tuple() outcomes.  Hence,
+                        * we don't process invalidations from the original operation.  If
+                        * inplace updates did affect those things, invalidations wouldn't
+                        * make it work, since there are no snapshot-specific versions of
+                        * inplace-updated values.  Since we also don't decode catalog
+                        * tuples, we're not interested in the record's contents.
                         *
-                        * In-place updates can be used either by XID-bearing transactions
-                        * (e.g.  in CREATE INDEX CONCURRENTLY) or by XID-less
-                        * transactions (e.g.  VACUUM).  In the former case, the commit
-                        * record will include cache invalidations, so we mark the
-                        * transaction as catalog modifying here. Currently that's
-                        * redundant because the commit will do that as well, but once we
-                        * support decoding in-progress relations, this will be important.
+                        * Older WAL may contain commit-time invals for inplace updates.
+                        * Excess invalidation is safe.
                         */
-                       if (!TransactionIdIsValid(xid))
-                               break;
-
-                       (void) SnapBuildProcessChange(builder, xid, buf->origptr);
-                       ReorderBufferXidSetCatalogChanges(ctx->reorder, xid, buf->origptr);
                        break;
 
                case XLOG_HEAP_CONFIRM:
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 98aa527..b40e38c 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -2287,7 +2287,8 @@ void
 PrepareToInvalidateCacheTuple(Relation relation,
                                                          HeapTuple tuple,
                                                          HeapTuple newtuple,
-                                                         void (*function) (int, uint32, Oid))
+                                                         void (*function) (int, uint32, Oid, void *),
+                                                         void *context)
 {
        slist_iter      iter;
        Oid                     reloid;
@@ -2328,7 +2329,7 @@ PrepareToInvalidateCacheTuple(Relation relation,
                hashvalue = CatalogCacheComputeTupleHashValue(ccp, ccp->cc_nkeys, tuple);
                dbid = ccp->cc_relisshared ? (Oid) 0 : MyDatabaseId;
 
-               (*function) (ccp->id, hashvalue, dbid);
+               (*function) (ccp->id, hashvalue, dbid, context);
 
                if (newtuple)
                {
@@ -2337,7 +2338,7 @@ PrepareToInvalidateCacheTuple(Relation relation,
                        newhashvalue = CatalogCacheComputeTupleHashValue(ccp, ccp->cc_nkeys, newtuple);
 
                        if (newhashvalue != hashvalue)
-                               (*function) (ccp->id, newhashvalue, dbid);
+                               (*function) (ccp->id, newhashvalue, dbid, context);
                }
        }
 }
diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index 0772314..00566fb 100644
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -94,6 +94,10 @@
  *     worth trying to avoid sending such inval traffic in the future, if those
  *     problems can be overcome cheaply.
  *
+ *     When making a nontransactional change to a cacheable object, we must
+ *     likewise send the invalidation immediately, before ending the change's
+ *     critical section.  This includes inplace heap updates, relmap, and smgr.
+ *
+ *     When wal_level=logical, write invalidations into WAL at each command end to
  *     support the decoding of the in-progress transactions.  See
  *     CommandEndInvalidationMessages.
@@ -130,13 +134,15 @@
 
 /*
  * Pending requests are stored as ready-to-send SharedInvalidationMessages.
- * We keep the messages themselves in arrays in TopTransactionContext
- * (there are separate arrays for catcache and relcache messages).  Control
- * information is kept in a chain of TransInvalidationInfo structs, also
- * allocated in TopTransactionContext.  (We could keep a subtransaction's
- * TransInvalidationInfo in its CurTransactionContext; but that's more
- * wasteful not less so, since in very many scenarios it'd be the only
- * allocation in the subtransaction's CurTransactionContext.)
+ * We keep the messages themselves in arrays in TopTransactionContext (there
+ * are separate arrays for catcache and relcache messages).  For transactional
+ * messages, control information is kept in a chain of TransInvalidationInfo
+ * structs, also allocated in TopTransactionContext.  (We could keep a
+ * subtransaction's TransInvalidationInfo in its CurTransactionContext; but
+ * that's more wasteful not less so, since in very many scenarios it'd be the
+ * only allocation in the subtransaction's CurTransactionContext.)  For
+ * inplace update messages, control information appears in an
+ * InvalidationInfo, allocated in CurrentMemoryContext.
  *
  * We can store the message arrays densely, and yet avoid moving data around
  * within an array, because within any one subtransaction we need only
@@ -147,7 +153,9 @@
  * struct.  Similarly, we need distinguish messages of prior subtransactions
  * from those of the current subtransaction only until the subtransaction
  * completes, after which we adjust the array indexes in the parent's
- * TransInvalidationInfo to include the subtransaction's messages.
+ * TransInvalidationInfo to include the subtransaction's messages.  Inplace
+ * invalidations don't need a concept of command or subtransaction boundaries,
+ * since we send them during the WAL insertion critical section.
  *
  * The ordering of the individual messages within a command's or
  * subtransaction's output is not considered significant, although this
@@ -200,7 +208,7 @@ typedef struct InvalidationMsgsGroup
 
 
 /*----------------
- * Invalidation messages are divided into two groups:
+ * Transactional invalidation messages are divided into two groups:
  *     1) events so far in current command, not yet reflected to caches.
  *     2) events in previous commands of current transaction; these have
  *        been reflected to local caches, and must be either broadcast to
@@ -216,26 +224,36 @@ typedef struct InvalidationMsgsGroup
  *----------------
  */
 
-typedef struct TransInvalidationInfo
+/* fields common to both transactional and inplace invalidation */
+typedef struct InvalidationInfo
 {
-       /* Back link to parent transaction's info */
-       struct TransInvalidationInfo *parent;
-
-       /* Subtransaction nesting depth */
-       int                     my_level;
-
        /* Events emitted by current command */
        InvalidationMsgsGroup CurrentCmdInvalidMsgs;
 
-       /* Events emitted by previous commands of this (sub)transaction */
-       InvalidationMsgsGroup PriorCmdInvalidMsgs;
-
        /* init file must be invalidated? */
        bool            RelcacheInitFileInval;
+} InvalidationInfo;
+
+/* subclass adding fields specific to transactional invalidation */
+typedef struct TransInvalidationInfo
+{
+       /* Base class */
+       struct InvalidationInfo ii;
+
+       /* Events emitted by previous commands of this (sub)transaction */
+       InvalidationMsgsGroup PriorCmdInvalidMsgs;
+
+       /* Back link to parent transaction's info */
+       struct TransInvalidationInfo *parent;
+
+       /* Subtransaction nesting depth */
+       int                     my_level;
 } TransInvalidationInfo;
 
 static TransInvalidationInfo *transInvalInfo = NULL;
 
+static InvalidationInfo *inplaceInvalInfo = NULL;
+
 /* GUC storage */
 int                    debug_discard_caches = 0;
 
@@ -543,9 +561,12 @@ ProcessInvalidationMessagesMulti(InvalidationMsgsGroup *group,
 static void
 RegisterCatcacheInvalidation(int cacheId,
                                                         uint32 hashValue,
-                                                        Oid dbId)
+                                                        Oid dbId,
+                                                        void *context)
 {
-       AddCatcacheInvalidationMessage(&transInvalInfo->CurrentCmdInvalidMsgs,
+       InvalidationInfo *info = (InvalidationInfo *) context;
+
+       AddCatcacheInvalidationMessage(&info->CurrentCmdInvalidMsgs,
                                                                   cacheId, hashValue, dbId);
 }
 
@@ -555,10 +576,9 @@ RegisterCatcacheInvalidation(int cacheId,
  * Register an invalidation event for all catcache entries from a catalog.
  */
 static void
-RegisterCatalogInvalidation(Oid dbId, Oid catId)
+RegisterCatalogInvalidation(InvalidationInfo *info, Oid dbId, Oid catId)
 {
-       AddCatalogInvalidationMessage(&transInvalInfo->CurrentCmdInvalidMsgs,
-                                                                 dbId, catId);
+       AddCatalogInvalidationMessage(&info->CurrentCmdInvalidMsgs, dbId, catId);
 }
 
 /*
@@ -567,10 +587,9 @@ RegisterCatalogInvalidation(Oid dbId, Oid catId)
  * As above, but register a relcache invalidation event.
  */
 static void
-RegisterRelcacheInvalidation(Oid dbId, Oid relId)
+RegisterRelcacheInvalidation(InvalidationInfo *info, Oid dbId, Oid relId)
 {
-       AddRelcacheInvalidationMessage(&transInvalInfo->CurrentCmdInvalidMsgs,
-                                                                  dbId, relId);
+       AddRelcacheInvalidationMessage(&info->CurrentCmdInvalidMsgs, dbId, relId);
 
        /*
         * Most of the time, relcache invalidation is associated with system
@@ -587,7 +606,7 @@ RegisterRelcacheInvalidation(Oid dbId, Oid relId)
         * as well.  Also zap when we are invalidating whole relcache.
         */
        if (relId == InvalidOid || RelationIdIsInInitFile(relId))
-               transInvalInfo->RelcacheInitFileInval = true;
+               info->RelcacheInitFileInval = true;
 }
 
 /*
@@ -597,26 +616,27 @@ RegisterRelcacheInvalidation(Oid dbId, Oid relId)
  * Only needed for catalogs that don't have catcaches.
  */
 static void
-RegisterSnapshotInvalidation(Oid dbId, Oid relId)
+RegisterSnapshotInvalidation(InvalidationInfo *info, Oid dbId, Oid relId)
 {
-       AddSnapshotInvalidationMessage(&transInvalInfo->CurrentCmdInvalidMsgs,
-                                                                  dbId, relId);
+       AddSnapshotInvalidationMessage(&info->CurrentCmdInvalidMsgs, dbId, relId);
 }
 
 /*
  * PrepareInvalidationState
  *             Initialize inval data for the current (sub)transaction.
  */
-static void
+static InvalidationInfo *
 PrepareInvalidationState(void)
 {
        TransInvalidationInfo *myInfo;
 
        Assert(IsTransactionState());
+       /* Can't queue transactional message while collecting inplace messages. */
+       Assert(inplaceInvalInfo == NULL);
 
        if (transInvalInfo != NULL &&
                transInvalInfo->my_level == GetCurrentTransactionNestLevel())
-               return;
+               return (InvalidationInfo *) transInvalInfo;
 
        myInfo = (TransInvalidationInfo *)
                MemoryContextAllocZero(TopTransactionContext,
@@ -639,7 +659,7 @@ PrepareInvalidationState(void)
                 * counter.  This is a convenient place to check for that, as well as
                 * being important to keep management of the message arrays simple.
                 */
-               if (NumMessagesInGroup(&transInvalInfo->CurrentCmdInvalidMsgs) != 0)
+               if (NumMessagesInGroup(&transInvalInfo->ii.CurrentCmdInvalidMsgs) != 0)
                        elog(ERROR, "cannot start a subtransaction when there are unprocessed inval messages");
 
                /*
@@ -648,8 +668,8 @@ PrepareInvalidationState(void)
                 * to update them to follow whatever is already in the arrays.
                 */
                SetGroupToFollow(&myInfo->PriorCmdInvalidMsgs,
-                                                &transInvalInfo->CurrentCmdInvalidMsgs);
-               SetGroupToFollow(&myInfo->CurrentCmdInvalidMsgs,
+                                                &transInvalInfo->ii.CurrentCmdInvalidMsgs);
+               SetGroupToFollow(&myInfo->ii.CurrentCmdInvalidMsgs,
+               SetGroupToFollow(&myInfo->ii.CurrentCmdInvalidMsgs,
                                                 &myInfo->PriorCmdInvalidMsgs);
        }
        else
@@ -665,6 +685,40 @@ PrepareInvalidationState(void)
        }
 
        transInvalInfo = myInfo;
+       return (InvalidationInfo *) myInfo;
+}
+
+/*
+ * PrepareInplaceInvalidationState
+ *             Initialize inval data for an inplace update.
+ *
+ * See previous function for more background.
+ */
+static InvalidationInfo *
+PrepareInplaceInvalidationState(void)
+{
+       InvalidationInfo *myInfo;
+
+       /* limit of one inplace update under assembly */
+       Assert(inplaceInvalInfo == NULL);
+
+       /* gone after WAL insertion CritSection ends, so use current context */
+       myInfo = (InvalidationInfo *) palloc0(sizeof(InvalidationInfo));
+
+       /* Stash our messages past end of the transactional messages, if any. */
+       if (transInvalInfo != NULL)
+               SetGroupToFollow(&myInfo->CurrentCmdInvalidMsgs,
+                                                &transInvalInfo->ii.CurrentCmdInvalidMsgs);
+       else
+       {
+               InvalMessageArrays[CatCacheMsgs].msgs = NULL;
+               InvalMessageArrays[CatCacheMsgs].maxmsgs = 0;
+               InvalMessageArrays[RelCacheMsgs].msgs = NULL;
+               InvalMessageArrays[RelCacheMsgs].maxmsgs = 0;
+       }
+
+       inplaceInvalInfo = myInfo;
+       return myInfo;
 }
 
 /* ----------------------------------------------------------------
@@ -904,7 +958,7 @@ xactGetCommittedInvalidationMessages(SharedInvalidationMessage **msgs,
         * after we send the SI messages.  However, we need not do anything unless
         * we committed.
         */
-       *RelcacheInitFileInval = transInvalInfo->RelcacheInitFileInval;
+       *RelcacheInitFileInval = transInvalInfo->ii.RelcacheInitFileInval;
 
        /*
         * Collect all the pending messages into a single contiguous array of
@@ -915,7 +969,7 @@ xactGetCommittedInvalidationMessages(SharedInvalidationMessage **msgs,
         * not new ones.
         */
        nummsgs = NumMessagesInGroup(&transInvalInfo->PriorCmdInvalidMsgs) +
-               NumMessagesInGroup(&transInvalInfo->CurrentCmdInvalidMsgs);
+               NumMessagesInGroup(&transInvalInfo->ii.CurrentCmdInvalidMsgs);
 
        *msgs = msgarray = (SharedInvalidationMessage *)
                MemoryContextAlloc(CurTransactionContext,
@@ -928,7 +982,7 @@ xactGetCommittedInvalidationMessages(SharedInvalidationMessage **msgs,
                                                                                msgs,
                                                                                n * sizeof(SharedInvalidationMessage)),
                                                                 nmsgs += n));
-       ProcessMessageSubGroupMulti(&transInvalInfo->CurrentCmdInvalidMsgs,
+       ProcessMessageSubGroupMulti(&transInvalInfo->ii.CurrentCmdInvalidMsgs,
                                                                CatCacheMsgs,
                                                                (memcpy(msgarray + nmsgs,
                                                                                msgs,
@@ -940,7 +994,51 @@ xactGetCommittedInvalidationMessages(SharedInvalidationMessage **msgs,
                                                                                msgs,
                                                                                n * sizeof(SharedInvalidationMessage)),
                                                                 nmsgs += n));
-       ProcessMessageSubGroupMulti(&transInvalInfo->CurrentCmdInvalidMsgs,
+       ProcessMessageSubGroupMulti(&transInvalInfo->ii.CurrentCmdInvalidMsgs,
+                                                               RelCacheMsgs,
+                                                               (memcpy(msgarray + nmsgs,
+                                                                               msgs,
+                                                                               n * sizeof(SharedInvalidationMessage)),
+                                                                nmsgs += n));
+       Assert(nmsgs == nummsgs);
+
+       return nmsgs;
+}
+
+/*
+ * inplaceGetInvalidationMessages() is called by the inplace update to collect
+ * invalidation messages to add to its WAL record.  Like the previous
+ * function, we might still fail.
+ */
+int
+inplaceGetInvalidationMessages(SharedInvalidationMessage **msgs,
+                                                          bool *RelcacheInitFileInval)
+{
+       SharedInvalidationMessage *msgarray;
+       int                     nummsgs;
+       int                     nmsgs;
+
+       /* Quick exit if we haven't done anything with invalidation messages. */
+       if (inplaceInvalInfo == NULL)
+       {
+               *RelcacheInitFileInval = false;
+               *msgs = NULL;
+               return 0;
+       }
+
+       *RelcacheInitFileInval = inplaceInvalInfo->RelcacheInitFileInval;
+       nummsgs = NumMessagesInGroup(&inplaceInvalInfo->CurrentCmdInvalidMsgs);
+       *msgs = msgarray = (SharedInvalidationMessage *)
+               palloc(nummsgs * sizeof(SharedInvalidationMessage));
+
+       nmsgs = 0;
+       ProcessMessageSubGroupMulti(&inplaceInvalInfo->CurrentCmdInvalidMsgs,
+                                                               CatCacheMsgs,
+                                                               (memcpy(msgarray + nmsgs,
+                                                                               msgs,
+                                                                               n * sizeof(SharedInvalidationMessage)),
+                                                                nmsgs += n));
+       ProcessMessageSubGroupMulti(&inplaceInvalInfo->CurrentCmdInvalidMsgs,
                                                                RelCacheMsgs,
                                                                
(memcpy(msgarray + nmsgs,
                                                                                
msgs,
@@ -1015,6 +1113,7 @@ ProcessCommittedInvalidationMessages(SharedInvalidationMessage *msgs,
 void
 PreCommit_Inval(void)
 {
+       Assert(inplaceInvalInfo == NULL);
        /* This is a separate function just to run before the critical section. */
        Assert(CritSectionCount == 0);
 
@@ -1028,7 +1127,20 @@ PreCommit_Inval(void)
         * Relcache init file invalidation requires processing both before and
         * after we send the SI messages.
         */
-       if (transInvalInfo->RelcacheInitFileInval)
+       if (transInvalInfo->ii.RelcacheInitFileInval)
+               RelationCacheInitFilePreInvalidate();
+}
+
+/*
+ * PreInplace_Inval
+ *             Like the previous function, but for inplace updates.
+ */
+void
+PreInplace_Inval(void)
+{
+       Assert(CritSectionCount == 0);
+
+       if (inplaceInvalInfo && inplaceInvalInfo->RelcacheInitFileInval)
                RelationCacheInitFilePreInvalidate();
 }
 
@@ -1062,12 +1174,12 @@ AtCommit_Inval(void)
                return;
 
        AppendInvalidationMessages(&transInvalInfo->PriorCmdInvalidMsgs,
-                                                          &transInvalInfo->CurrentCmdInvalidMsgs);
+                                                          &transInvalInfo->ii.CurrentCmdInvalidMsgs);
 
        ProcessInvalidationMessagesMulti(&transInvalInfo->PriorCmdInvalidMsgs,
                                                                         SendSharedInvalidMessages);
 
-       if (transInvalInfo->RelcacheInitFileInval)
+       if (transInvalInfo->ii.RelcacheInitFileInval)
                RelationCacheInitFilePostInvalidate();
 
        /* Need not free anything explicitly */
@@ -1075,6 +1187,27 @@ AtCommit_Inval(void)
 }
 
 /*
+ * AtInplace_Inval
+ *             Like the previous function, but for inplace updates.
+ */
+void
+AtInplace_Inval(void)
+{
+       Assert(CritSectionCount > 0);
+
+       if (inplaceInvalInfo == NULL)
+               return;
+
+       ProcessInvalidationMessagesMulti(&inplaceInvalInfo->CurrentCmdInvalidMsgs,
+                                                                        SendSharedInvalidMessages);
+
+       if (inplaceInvalInfo->RelcacheInitFileInval)
+               RelationCacheInitFilePostInvalidate();
+
+       inplaceInvalInfo = NULL;
+}
+
+/*
  * AtAbort_Inval
  *             Process queued-up invalidation messages at abort of main transaction.
  *
@@ -1097,6 +1230,7 @@ AtAbort_Inval(void)
 
        /* Need not free anything explicitly */
        transInvalInfo = NULL;
+       inplaceInvalInfo = NULL;
 }
 
 /*
@@ -1164,18 +1298,21 @@ AtEOSubXact_Inval(bool isCommit)
                                                                   &myInfo->PriorCmdInvalidMsgs);
 
                /* Must readjust parent's CurrentCmdInvalidMsgs indexes now */
-               SetGroupToFollow(&myInfo->parent->CurrentCmdInvalidMsgs,
+               SetGroupToFollow(&myInfo->parent->ii.CurrentCmdInvalidMsgs,
                                                 &myInfo->parent->PriorCmdInvalidMsgs);
 
                /* Pending relcache inval becomes parent's problem too */
-               if (myInfo->RelcacheInitFileInval)
-                       myInfo->parent->RelcacheInitFileInval = true;
+               if (myInfo->ii.RelcacheInitFileInval)
+                       myInfo->parent->ii.RelcacheInitFileInval = true;
 
                /* Pop the transaction state stack */
                transInvalInfo = myInfo->parent;
 
                /* Need not free anything else explicitly */
                pfree(myInfo);
+
+               /* Successful inplace update must clear this. */
+               Assert(inplaceInvalInfo == NULL);
        }
        else
        {
@@ -1187,6 +1324,9 @@ AtEOSubXact_Inval(bool isCommit)
 
                /* Need not free anything else explicitly */
                pfree(myInfo);
+
+               /* Reset from aborted inplace update. */
+               inplaceInvalInfo = NULL;
        }
 }
 
@@ -1216,7 +1356,7 @@ CommandEndInvalidationMessages(void)
        if (transInvalInfo == NULL)
                return;
 
-       ProcessInvalidationMessages(&transInvalInfo->CurrentCmdInvalidMsgs,
+       ProcessInvalidationMessages(&transInvalInfo->ii.CurrentCmdInvalidMsgs,
                                                                LocalExecuteInvalidationMessage);
 
        /* WAL Log per-command invalidation messages for wal_level=logical */
@@ -1224,26 +1364,21 @@ CommandEndInvalidationMessages(void)
                LogLogicalInvalidations();
 
        AppendInvalidationMessages(&transInvalInfo->PriorCmdInvalidMsgs,
-                                                          &transInvalInfo->CurrentCmdInvalidMsgs);
+                                                          &transInvalInfo->ii.CurrentCmdInvalidMsgs);
 }
 
 
 /*
- * CacheInvalidateHeapTuple
- *             Register the given tuple for invalidation at end of command
- *             (ie, current command is creating or outdating this tuple).
- *             Also, detect whether a relcache invalidation is implied.
- *
- * For an insert or delete, tuple is the target tuple and newtuple is NULL.
- * For an update, we are called just once, with tuple being the old tuple
- * version and newtuple the new version.  This allows avoidance of duplicate
- * effort during an update.
+ * CacheInvalidateHeapTupleCommon
+ *             Common logic for end-of-command and inplace variants.
  */
-void
-CacheInvalidateHeapTuple(Relation relation,
-                                                HeapTuple tuple,
-                                                HeapTuple newtuple)
+static void
+CacheInvalidateHeapTupleCommon(Relation relation,
+                                                          HeapTuple tuple,
+                                                          HeapTuple newtuple,
+                                                          InvalidationInfo *(*prepare_callback) (void))
 {
+       InvalidationInfo *info;
        Oid                     tupleRelId;
        Oid                     databaseId;
        Oid                     relationId;
@@ -1267,11 +1402,8 @@ CacheInvalidateHeapTuple(Relation relation,
        if (IsToastRelation(relation))
                return;
 
-       /*
-        * If we're not prepared to queue invalidation messages for this
-        * subtransaction level, get ready now.
-        */
-       PrepareInvalidationState();
+       /* Allocate any required resources. */
+       info = prepare_callback();
 
        /*
         * First let the catcache do its thing
@@ -1280,11 +1412,12 @@ CacheInvalidateHeapTuple(Relation relation,
        if (RelationInvalidatesSnapshotsOnly(tupleRelId))
        {
                databaseId = IsSharedRelation(tupleRelId) ? InvalidOid : MyDatabaseId;
-               RegisterSnapshotInvalidation(databaseId, tupleRelId);
+               RegisterSnapshotInvalidation(info, databaseId, tupleRelId);
        }
        else
                PrepareToInvalidateCacheTuple(relation, tuple, newtuple,
-                                                                         RegisterCatcacheInvalidation);
+                                                                         RegisterCatcacheInvalidation,
+                                                                         (void *) info);
 
        /*
         * Now, is this tuple one of the primary definers of a relcache entry? See
@@ -1357,7 +1490,44 @@ CacheInvalidateHeapTuple(Relation relation,
        /*
         * Yes.  We need to register a relcache invalidation event.
         */
-       RegisterRelcacheInvalidation(databaseId, relationId);
+       RegisterRelcacheInvalidation(info, databaseId, relationId);
+}
+
+/*
+ * CacheInvalidateHeapTuple
+ *             Register the given tuple for invalidation at end of command
+ *             (ie, current command is creating or outdating this tuple) and end of
+ *             transaction.  Also, detect whether a relcache invalidation is implied.
+ *
+ * For an insert or delete, tuple is the target tuple and newtuple is NULL.
+ * For an update, we are called just once, with tuple being the old tuple
+ * version and newtuple the new version.  This allows avoidance of duplicate
+ * effort during an update.
+ */
+void
+CacheInvalidateHeapTuple(Relation relation,
+                                                HeapTuple tuple,
+                                                HeapTuple newtuple)
+{
+       CacheInvalidateHeapTupleCommon(relation, tuple, newtuple,
+                                                                  PrepareInvalidationState);
+}
+
+/*
+ * CacheInvalidateHeapTupleInplace
+ *             Register the given tuple for nontransactional invalidation pertaining
+ *             to an inplace update.  Also, detect whether a relcache invalidation is
+ *             implied.
+ *
+ * Like CacheInvalidateHeapTuple(), but for inplace updates.
+ */
+void
+CacheInvalidateHeapTupleInplace(Relation relation,
+                                                               HeapTuple tuple,
+                                                               HeapTuple newtuple)
+{
+       CacheInvalidateHeapTupleCommon(relation, tuple, newtuple,
+                                                                  PrepareInplaceInvalidationState);
 }
 
 /*
@@ -1376,14 +1546,13 @@ CacheInvalidateCatalog(Oid catalogId)
 {
        Oid                     databaseId;
 
-       PrepareInvalidationState();
-
        if (IsSharedRelation(catalogId))
                databaseId = InvalidOid;
        else
                databaseId = MyDatabaseId;
 
-       RegisterCatalogInvalidation(databaseId, catalogId);
+       RegisterCatalogInvalidation(PrepareInvalidationState(),
+                                                               databaseId, catalogId);
 }
 
 /*
@@ -1401,15 +1570,14 @@ CacheInvalidateRelcache(Relation relation)
        Oid                     databaseId;
        Oid                     relationId;
 
-       PrepareInvalidationState();
-
        relationId = RelationGetRelid(relation);
        if (relation->rd_rel->relisshared)
                databaseId = InvalidOid;
        else
                databaseId = MyDatabaseId;
 
-       RegisterRelcacheInvalidation(databaseId, relationId);
+       RegisterRelcacheInvalidation(PrepareInvalidationState(),
+                                                                databaseId, relationId);
 }
 
 /*
@@ -1422,9 +1590,8 @@ CacheInvalidateRelcache(Relation relation)
 void
 CacheInvalidateRelcacheAll(void)
 {
-       PrepareInvalidationState();
-
-       RegisterRelcacheInvalidation(InvalidOid, InvalidOid);
+       RegisterRelcacheInvalidation(PrepareInvalidationState(),
+                                                                InvalidOid, InvalidOid);
 }
 
 /*
@@ -1438,14 +1605,13 @@ CacheInvalidateRelcacheByTuple(HeapTuple classTuple)
        Oid                     databaseId;
        Oid                     relationId;
 
-       PrepareInvalidationState();
-
        relationId = classtup->oid;
        if (classtup->relisshared)
                databaseId = InvalidOid;
        else
                databaseId = MyDatabaseId;
-       RegisterRelcacheInvalidation(databaseId, relationId);
+       RegisterRelcacheInvalidation(PrepareInvalidationState(),
+                                                                databaseId, relationId);
 }
 
 /*
@@ -1459,8 +1625,6 @@ CacheInvalidateRelcacheByRelid(Oid relid)
 {
        HeapTuple       tup;
 
-       PrepareInvalidationState();
-
        tup = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
        if (!HeapTupleIsValid(tup))
                elog(ERROR, "cache lookup failed for relation %u", relid);
@@ -1650,7 +1814,7 @@ LogLogicalInvalidations(void)
        if (transInvalInfo == NULL)
                return;
 
-       group = &transInvalInfo->CurrentCmdInvalidMsgs;
+       group = &transInvalInfo->ii.CurrentCmdInvalidMsgs;
        nmsgs = NumMessagesInGroup(group);
 
        if (nmsgs > 0)
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 8147cf6..f41b1c2 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -351,8 +351,7 @@ SearchSysCacheLocked1(int cacheId,
 
                /*
                 * If an inplace update just finished, ensure we process the syscache
-                * inval.  XXX this is insufficient: the inplace updater may not yet
-                * have reached AtCommit_Inval().  See test at inplace-inval.spec.
+                * inval.
                 *
                 * If a heap_update() call just released its LOCKTAG_TUPLE, we'll
                 * probably find the old tuple and reach "tuple concurrently updated".
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 42736f3..4591e9a 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -20,6 +20,7 @@
 #include "storage/buf.h"
 #include "storage/bufpage.h"
 #include "storage/relfilelocator.h"
+#include "storage/sinval.h"
 #include "utils/relcache.h"
 
 
@@ -425,9 +426,14 @@ typedef struct xl_heap_confirm
 typedef struct xl_heap_inplace
 {
        OffsetNumber offnum;            /* updated tuple's offset on page */
+       Oid                     dbId;                   /* MyDatabaseId */
+       Oid                     tsId;                   /* MyDatabaseTableSpace */
+       bool            relcacheInitFileInval;  /* invalidate relcache init files */
+       int                     nmsgs;                  /* number of shared inval msgs */
+       SharedInvalidationMessage msgs[FLEXIBLE_ARRAY_MEMBER];
 } xl_heap_inplace;
 
-#define SizeOfHeapInplace      (offsetof(xl_heap_inplace, offnum) + sizeof(OffsetNumber))
+#define MinSizeOfHeapInplace   (offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
 /*
  * This is what we need to know about setting a visibility map bit
diff --git a/src/include/storage/sinval.h b/src/include/storage/sinval.h
index 8f5744b..c812237 100644
--- a/src/include/storage/sinval.h
+++ b/src/include/storage/sinval.h
@@ -144,6 +144,8 @@ extern void ProcessCatchupInterrupt(void);
 
extern int     xactGetCommittedInvalidationMessages(SharedInvalidationMessage **msgs,
                                                                                                 bool *RelcacheInitFileInval);
+extern int     inplaceGetInvalidationMessages(SharedInvalidationMessage **msgs,
+                                                                                  bool *RelcacheInitFileInval);
 extern void ProcessCommittedInvalidationMessages(SharedInvalidationMessage *msgs,
                                                                                                 int nmsgs, bool RelcacheInitFileInval,
                                                                                                 Oid dbid, Oid tsid);
diff --git a/src/include/utils/catcache.h b/src/include/utils/catcache.h
index 3fb9647..8f04bb8 100644
--- a/src/include/utils/catcache.h
+++ b/src/include/utils/catcache.h
@@ -225,6 +225,7 @@ extern void CatCacheInvalidate(CatCache *cache, uint32 hashValue);
 extern void PrepareToInvalidateCacheTuple(Relation relation,
                                                                                  HeapTuple tuple,
                                                                                  HeapTuple newtuple,
-                                                                                 void (*function) (int, uint32, Oid));
+                                                                                 void (*function) (int, uint32, Oid, void *),
+                                                                                 void *context);
 
 #endif                                                 /* CATCACHE_H */
diff --git a/src/include/utils/inval.h b/src/include/utils/inval.h
index 94d8b44..64c0265 100644
--- a/src/include/utils/inval.h
+++ b/src/include/utils/inval.h
@@ -30,6 +30,9 @@ extern void PreCommit_Inval(void);
 extern void AtCommit_Inval(void);
 extern void AtAbort_Inval(void);
 
+extern void PreInplace_Inval(void);
+extern void AtInplace_Inval(void);
+
 extern void AtEOSubXact_Inval(bool isCommit);
 
 extern void PostPrepare_Inval(void);
@@ -39,6 +42,9 @@ extern void CommandEndInvalidationMessages(void);
extern void CacheInvalidateHeapTuple(Relation relation,
                                                                         HeapTuple tuple,
                                                                         HeapTuple newtuple);
+extern void CacheInvalidateHeapTupleInplace(Relation relation,
+                                                                                       HeapTuple tuple,
+                                                                                       HeapTuple newtuple);
 
 extern void CacheInvalidateCatalog(Oid catalogId);
 
diff --git a/src/test/isolation/expected/inplace-inval.out b/src/test/isolation/expected/inplace-inval.out
index 67b34ad..c35895a 100644
--- a/src/test/isolation/expected/inplace-inval.out
+++ b/src/test/isolation/expected/inplace-inval.out
@@ -14,7 +14,7 @@ step read1:
 
 relhasindex
 -----------
-f          
+t          
 (1 row)
 
 
diff --git a/src/test/isolation/specs/inplace-inval.spec b/src/test/isolation/specs/inplace-inval.spec
index d8e1c98..b99112d 100644
--- a/src/test/isolation/specs/inplace-inval.spec
+++ b/src/test/isolation/specs/inplace-inval.spec
@@ -1,7 +1,7 @@
-# If a heap_update() caller retrieves its oldtup from a cache, it's possible
-# for that cache entry to predate an inplace update, causing loss of that
-# inplace update.  This arises because the transaction may abort before
-# sending the inplace invalidation message to the shared queue.
+# An inplace update had been able to abort before sending the inplace
+# invalidation message to the shared queue.  If a heap_update() caller then
+# retrieved its oldtup from a cache, the heap_update() could revert the
+# inplace update.
 
 setup
 {
@@ -32,7 +32,7 @@ permutation
        cir1    # sets relhasindex=true; rollback discards cache inval
        cic2    # sees relhasindex=true, skips changing it (so no inval)
        ddl3    # cached row as the oldtup of an update, losing relhasindex
-       read1   # observe damage XXX is an extant bug
+       read1   # observe damage
 
 # without cachefill3, no bug
 permutation cir1 cic2 ddl3 read1
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 5696604..b2b565f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1252,6 +1252,7 @@ Interval
 IntervalAggState
 IntoClause
 InvalMessageArray
+InvalidationInfo
 InvalidationMsgsGroup
 IpcMemoryId
 IpcMemoryKey

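For readers skimming just this excerpt: the heapam.c side of the patch stack
(not quoted above) is meant to drive the new inval.c entry points roughly as
sketched below.  This is an illustration only, not a quote from the patch;
inplace_update_sketch() is a placeholder name, and the elided buffer/WAL work
is summarized in comments.  The ordering constraints do come from the code
above: the prepare and collect steps can fail, so they precede the critical
section (PreInplace_Inval() asserts CritSectionCount == 0), while the send
step runs inside it (AtInplace_Inval() asserts CritSectionCount > 0).

#include "postgres.h"

#include "access/htup.h"
#include "access/xlog.h"                /* XLogStandbyInfoActive() */
#include "miscadmin.h"                  /* START/END_CRIT_SECTION() */
#include "storage/sinval.h"             /* inplaceGetInvalidationMessages() */
#include "utils/inval.h"                /* PreInplace_Inval(), AtInplace_Inval() */
#include "utils/rel.h"

/* Placeholder for the inplace updater's call sequence; illustrative only. */
static void
inplace_update_sketch(Relation relation, HeapTuple tuple)
{
        SharedInvalidationMessage *invalMessages = NULL;
        bool            RelcacheInitFileInval = false;
        int             nmsgs = 0;

        /* Queue the invals; this allocates, hence outside the critical section. */
        CacheInvalidateHeapTupleInplace(relation, tuple, NULL);

        /* Collect them for the xl_heap_inplace record, when standby info is needed. */
        if (XLogStandbyInfoActive())
                nmsgs = inplaceGetInvalidationMessages(&invalMessages,
                                                                                           &RelcacheInitFileInval);

        /* Relcache-init-file pre-processing; must run before the critical section. */
        PreInplace_Inval();

        START_CRIT_SECTION();

        /*
         * ... overwrite the tuple, MarkBufferDirty(), and emit XLOG_HEAP_INPLACE
         * carrying dbId, tsId, relcacheInitFileInval, nmsgs, and msgs[] ...
         */

        /* Send the invals immediately, inside the critical section. */
        AtInplace_Inval();

        END_CRIT_SECTION();
}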