On Thu, Aug 29, 2024 at 09:08:43PM +0530, Nitin Motiani wrote:
> On Thu, Aug 29, 2024 at 8:11 PM Noah Misch <n...@leadboat.com> wrote:
> > On Tue, Aug 20, 2024 at 11:59:45AM +0300, Heikki Linnakangas wrote:
> > > My order of preference is: 2, 1, 3.
> >
> > I kept tuple locking responsibility in heapam.c.  That's simpler and
> > better for modularity, but it does mean we release+acquire after any
> > xmax wait.  Before, we avoided that if the next genam.c scan found the
> > same TID.  (If the next scan finds the same TID, the xmax probably
> > aborted.)  I think DDL aborts are rare enough to justify simplifying as
> > this version does.  I don't expect anyone to notice the starvation
> > outside of tests built to show it.  (With previous versions, one can
> > show it with a purpose-built test that commits instead of aborting,
> > like the "001_pgbench_grant@9" test.)
> >
> > This move also loses the optimization of unpinning before
> > XactLockTableWait().  heap_update() doesn't optimize that way, so
> > that's fine.
> >
> > The move ended up more like (1), though I did do
> > s/systable_inplace_update_scan/systable_inplace_update_begin/ like in
> > (2).  I felt that worked better than (2) to achieve lock release before
> > CacheInvalidateHeapTuple().  Alternatives that could be fine:
>
> From a consistency point of view, I find it cleaner if we can have all
> the heap_inplace_lock and heap_inplace_unlock in the same set of
> functions.  So here those would be the systable_inplace_* functions.
That will technically be the case after inplace160, and I could make it so
here by inlining heap_inplace_unlock() into its heapam.c caller.  Would that
be cleaner or less clean?

> > - In the cancel case, call both systable_inplace_update_cancel and
> >   systable_inplace_update_end.  _finish or _cancel would own unlock,
> >   while _end would own systable_endscan().
>
> What happens to CacheInvalidateHeapTuple() in this approach? I think
> it will still need to be brought to the genam.c layer if we are
> releasing the lock in systable_inplace_update_finish.

Cancel scenarios don't do invalidation.  (Same under other alternatives.)

> > - Hoist the CacheInvalidateHeapTuple() up to the genam.c layer.  While
> >   tolerable now, this gets less attractive after the inplace160 patch
> >   from https://postgr.es/m/flat/20240523000548.58.nmi...@google.com
>
> I skimmed through the inplace160 patch. It wasn't clear to me why this
> becomes less attractive with the patch. I see there is a new
> CacheInvalidateHeapTupleInPlace but that looks like it would be called
> while holding the lock. And then there is an
> AcceptInvalidationMessages which can perhaps be moved to the genam.c
> layer too. Is the concern that one invalidation call will be in the
> heapam layer and the other will be in the genam layer?

That, or a critical section would start in heapam.c, then end in genam.c.
Current call tree at inplace160 v4:

genam.c:systable_inplace_update_finish
  heapam.c:heap_inplace_update
    PreInplace_Inval
    START_CRIT_SECTION
    BUFFER_LOCK_UNLOCK
    AtInplace_Inval
    END_CRIT_SECTION
    UnlockTuple
    AcceptInvalidationMessages

If we hoisted all of invalidation up to the genam.c layer, a critical
section that starts in heapam.c would end in genam.c:

genam.c:systable_inplace_update_finish
  PreInplace_Inval
  heapam.c:heap_inplace_update
    START_CRIT_SECTION
    BUFFER_LOCK_UNLOCK
  AtInplace_Inval
  END_CRIT_SECTION
  UnlockTuple
  AcceptInvalidationMessages

If we didn't accept splitting the critical section but did accept splitting
invalidation responsibilities, one gets perhaps:

genam.c:systable_inplace_update_finish
  PreInplace_Inval
  heapam.c:heap_inplace_update
    START_CRIT_SECTION
    BUFFER_LOCK_UNLOCK
    AtInplace_Inval
    END_CRIT_SECTION
    UnlockTuple
  AcceptInvalidationMessages

That's how I ended up at inplace120 v9's design.

> Also I have a small question from inplace120.
>
> I see that all the places we check resultRelInfo->ri_needLockTagTuple,
> we can just call
> IsInplaceUpdateRelation(resultRelInfo->ri_RelationDesc). Is there a
> big advantage of storing a separate bool field? Also there is another

No, not a big advantage.  I felt it was more in line with the typical style
of src/backend/executor.

> write to ri_RelationDesc in CatalogOpenIndexes in
> src/backend/catalog/indexing.c. I think ri_needLockTagTuple needs to
> be set there also to keep it consistent with ri_RelationDesc. Please
> let me know if I am missing something about the usage of the new
> field.

Can you say more about consequences you found?

Only the full executor reads the field, doing so when it fetches the most
recent version of a row.  CatalogOpenIndexes() callers lack the full
executor's practice of fetching the most recent version of a row, so they
couldn't benefit from reading the field.
I don't think any CatalogOpenIndexes() caller passes its ResultRelInfo to
the full executor, and "typedef struct ResultRelInfo *CatalogIndexState"
exists in part to keep it that way.  Since CatalogOpenIndexes() skips
ri_TrigDesc and other fields, I would expect other malfunctions if some
caller tried.

Thanks,
nm