On Thu, Aug 29, 2024 at 09:08:43PM +0530, Nitin Motiani wrote:
> On Thu, Aug 29, 2024 at 8:11 PM Noah Misch <n...@leadboat.com> wrote:
> > On Tue, Aug 20, 2024 at 11:59:45AM +0300, Heikki Linnakangas wrote:
> > > My order of preference is: 2, 1, 3.
> >
> > I kept tuple locking responsibility in heapam.c.  That's simpler and
> > better for modularity, but it does mean we release+acquire after any
> > xmax wait.  Before, we avoided that if the next genam.c scan found the
> > same TID.  (If the next scan finds the same TID, the xmax probably
> > aborted.)  I think DDL aborts are rare enough to justify simplifying as
> > this version does.  I don't expect anyone to notice the starvation
> > outside of tests built to show it.  (With previous versions, one can
> > show it with a purpose-built test that commits instead of aborting,
> > like the "001_pgbench_grant@9" test.)
> >
> > This move also loses the optimization of unpinning before
> > XactLockTableWait().  heap_update() doesn't optimize that way, so
> > that's fine.
> >
> > The move ended up more like (1), though I did do
> > s/systable_inplace_update_scan/systable_inplace_update_begin/ like in
> > (2).  I felt that worked better than (2) to achieve lock release before
> > CacheInvalidateHeapTuple().  Alternatives that could be fine:
>
> From a consistency point of view, I find it cleaner if we can have all
> the heap_inplace_lock and heap_inplace_unlock in the same set of
> functions.  So here those would be the systable_inplace_* functions.
That will technically be the case after inplace160, and I could make it so
here by inlining heap_inplace_unlock() into its heapam.c caller.  Would that
be cleaner or less clean?

> > - In the cancel case, call both systable_inplace_update_cancel and
> >   systable_inplace_update_end.  _finish or _cancel would own unlock,
> >   while _end would own systable_endscan().
>
> What happens to CacheInvalidateHeapTuple() in this approach? I think
> it will still need to be brought to the genam.c layer if we are
> releasing the lock in systable_inplace_update_finish.

Cancel scenarios don't do invalidation.  (Same under other alternatives.)

> > - Hoist the CacheInvalidateHeapTuple() up to the genam.c layer.  While
> >   tolerable now, this gets less attractive after the inplace160 patch
> >   from https://postgr.es/m/flat/20240523000548.58.nmi...@google.com
>
> I skimmed through the inplace160 patch. It wasn't clear to me why this
> becomes less attractive with the patch. I see there is a new
> CacheInvalidateHeapTupleInPlace but that looks like it would be called
> while holding the lock. And then there is an
> AcceptInvalidationMessages which can perhaps be moved to the genam.c
> layer too. Is the concern that one invalidation call will be in the
> heapam layer and the other will be in the genam layer?

That, or a critical section would start in heapam.c, then end in genam.c.
Current call tree at inplace160 v4:

genam.c:systable_inplace_update_finish
  heapam.c:heap_inplace_update
    PreInplace_Inval
    START_CRIT_SECTION
    BUFFER_LOCK_UNLOCK
    AtInplace_Inval
    END_CRIT_SECTION
    UnlockTuple
    AcceptInvalidationMessages

If we hoisted all of invalidation up to the genam.c layer, a critical
section that starts in heapam.c would end in genam.c:

genam.c:systable_inplace_update_finish
  PreInplace_Inval
  heapam.c:heap_inplace_update
    START_CRIT_SECTION
    BUFFER_LOCK_UNLOCK
  AtInplace_Inval
  END_CRIT_SECTION
  UnlockTuple
  AcceptInvalidationMessages

If we didn't accept splitting the critical section but did accept splitting
invalidation responsibilities, one gets perhaps:

genam.c:systable_inplace_update_finish
  PreInplace_Inval
  heapam.c:heap_inplace_update
    START_CRIT_SECTION
    BUFFER_LOCK_UNLOCK
    AtInplace_Inval
    END_CRIT_SECTION
    UnlockTuple
  AcceptInvalidationMessages

That's how I ended up at inplace120 v9's design.

> Also I have a small question from inplace120.
>
> I see that all the places we check resultRelInfo->ri_needLockTagTuple,
> we can just call
> IsInplaceUpdateRelation(resultRelInfo->ri_RelationDesc). Is there a
> big advantage of storing a separate bool field? Also there is another

No, not a big advantage.  I felt it was more in line with the typical style
of src/backend/executor.

> write to ri_RelationDesc in CatalogOpenIndexes in
> src/backend/catalog/indexing.c. I think ri_needLockTagTuple needs to
> be set there also to keep it consistent with ri_RelationDesc. Please
> let me know if I am missing something about the usage of the new
> field.

Can you say more about consequences you found?

Only the full executor reads the field, doing so when it fetches the most
recent version of a row.  CatalogOpenIndexes() callers lack the full
executor's practice of fetching the most recent version of a row, so they
couldn't benefit from reading the field.
I don't think any CatalogOpenIndexes() caller passes its ResultRelInfo to
the full executor, and "typedef struct ResultRelInfo *CatalogIndexState"
exists in part to keep it that way.  Since CatalogOpenIndexes() skips
ri_TrigDesc and other fields, I would expect other malfunctions if some
caller tried.

Thanks,
nm