On Wed, Oct 23, 2024 at 7:54 PM Noah Misch <[email protected]> wrote:
> With the releases wrapping in 2.5 weeks, I'm ambivalent about pushing this > before the release or after. Pushing before means fewer occurrences of > corruption, but pushing after gives more bake time to discover these > changes > were defective. It's hard to predict which helps users more, on a > risk-adjusted basis. I'm leaning toward pushing this week. Opinions? > > On Sun, Oct 20, 2024 at 06:41:37PM +0530, Nitin Motiani wrote: > > I tested the patch locally and it works. And I have no other question > > regarding the structure. So this patch looks good to me to commit. > > Thanks. While resolving a back-branch merge conflict, I found > AtEOXact_Inval() and AtEOSubXact_Inval() were skipping inplaceInvalInfo > tasks > if transInvalInfo==NULL. If one PreInplace_Inval() failed, the session's > next > inplace update would get an assertion failure. Non-assert builds left > inplaceInvalInfo pointing to freed memory, but I didn't find a reachable > malfunction. Separately, the xact.c comment edit wasn't reflecting that v4 > brought back the transactional inval. v6 fixes those. Regarding the > back-branch alternatives to the WAL format change: > > On Tue, Jun 18, 2024 at 08:23:49AM -0700, Noah Misch wrote: > > On Mon, Jun 17, 2024 at 06:57:30PM -0700, Andres Freund wrote: > > > On 2024-06-17 16:58:54 -0700, Noah Misch wrote: > > > > On Sat, Jun 15, 2024 at 03:37:18PM -0700, Noah Misch wrote: > > > > > - heap_xlog_inplace() could set the shared-inval-queue overflow > signal on > > > > > every backend. This is more wasteful, but inplace updates might > be rare > > > > > enough (~once per VACUUM) to make it tolerable. > > > > > > We already set that surprisingly frequently, as > > > a) The size of the sinval queue is small > > > b) If a backend is busy, it does not process catchup interrupts > > > (i.e. executing queries, waiting for a lock prevents processing) > > > c) There's no deduplication of invals, we often end up sending the > same inval > > > over and over. > > > > > > So I suspect this might not be too bad, compared to the current > badness. > > > > That is good. > > I benchmarked that by hacking 027_stream_regress.pl to run "pgbench > --no-vacuum --client=4 -T 30 -b select-only" on the standby, concurrent > with > the primary running the regression tests. Standby clients acted on sinval > resetState ~16k times, and pgbench tps decreased 4.5%. That doesn't > necessarily mean the real-life cost would be unacceptable, but it was > enough > of a decrease that I switched to the next choice: > > > We might be able to do the overflow signal once at end of > > recovery, like RelationCacheInitFileRemove() does for the init file. > That's > > mildly harder to reason about, but it would be cheaper. Hmmm. > > I'm attaching the branch-specific patches for that and for the main fix. > Other notes from back-patching: > > - All branches change the ABI of PrepareToInvalidateCacheTuple(), a > function > catcache.c exports for the benefit of inval.c. No PGXN extension calls > that, and I can't think of a use case in extensions. > Unfortunately, I can think of four. I have four Table Access Methods that I now need to fork to be compatible with 18.0 and 18.1 on the one hand, and 18.2 onward on the other. I'm sorry I didn't follow this thread before it got pushed. Is there a reason for doing this change in back branches? The thread is pretty long, and I'm struggling to find a security or stability justification for the ABI break, but perhaps there is one. > > - Due to v15 commit 3aafc03, the patch for v14 differs to an unusual degree > from the patch for v15+v16. The difference is mostly mechanical, though. > > Thanks, > nm > -- *Mark Dilger*
