Re: [HACKERS] Rework the way multixact truncations work

Noah Misch Thu, 03 Dec 2015 01:40:12 -0800

On Wed, Dec 02, 2015 at 04:46:26PM +0100, Andres Freund wrote:
> On 2015-12-02 09:57:19 -0500, Noah Misch wrote:
> > Hackers have been too reticent to revert and redo defect-laden
> > commits.  If doing that is weird today, let it be normal.
> 
> Why?


See my paragraph ending with the two sentences you quoted.

> Especially if reverting and redoing includes conflicts that mainly
> increases the chance of accidental bugs.

True.  (That doesn't apply to these patches.)

> > git remote add community git://git.postgresql.org/git/postgresql.git
> > git remote add nmisch_github https://github.com/nmisch/postgresql.git
> > git fetch --multiple community nmisch_github
> > git diff community/master...nmisch_github/mxt4-rm-legacy
> 
> That's a nearly useless diff, because it includes a lot of other changes
> (218 files changed, 2828 insertions(+), 8742 deletions(-)) made since
> you published the changes.

Perhaps you used "git diff a..b", not "git diff a...b".  If not, please send
the outputs of "git rev-parse community/master nmisch_github/mxt4-rm-legacy"
and "git --version".

> > @@ -2190,7 +2194,8 @@ MultiXactSetNextMXact(MultiXactId nextMulti,
> > -   /*
> > -    * Computing the actual limits is only possible once the data directory 
> > is
> > -    * in a consistent state. There's no need to compute the limits while
> > -    * still replaying WAL - no decisions about new multis are made even
> > -    * though multixact creations might be replayed. So we'll only do 
> > further
> > -    * checks after TrimMultiXact() has been called.
> > -    */
> > +   /* Before the TrimMultiXact() call at end of recovery, skip the rest. */
> >     if (!MultiXactState->finishedStartup)
> >             return;
> > -
> >     Assert(!InRecovery);
> > 
> > -   /* Set limits for offset vacuum. */
> > +   /*
> > +    * Setting MultiXactState->oldestOffset entails a find_multixact_start()
> > +    * call, which is only possible once the data directory is in a 
> > consistent
> > +    * state. There's no need for an offset limit while still replaying WAL;
> > +    * no decisions about new multis are made even though multixact 
> > creations
> > +    * might be replayed.
> > +    */
> >     needs_offset_vacuum = SetOffsetVacuumLimit();
> 
> I don't really see the benefit of this change. The previous location of
> the comment is where we return early, so it seems appropriate to
> document the reason there?

I made that low-importance change for two reasons.  First, returning at that
point skips more than just the setting a limit; it also skips autovacuum
signalling and wraparound warnings.  Second, the function has just computed
mxid "actual limits", so branch mxt2-cosmetic made the comment specify that we
defer an offset limit, not any and all limits.

> >  static bool
> >  SetOffsetVacuumLimit(void)
> >  {
> > -   MultiXactId oldestMultiXactId;
> > +   MultiXactId     oldestMultiXactId;
> >     MultiXactId nextMXact;
> > -   MultiXactOffset oldestOffset = 0;       /* placate compiler */
> > -   MultiXactOffset prevOldestOffset;
> > +   MultiXactOffset oldestOffset = 0;               /* placate compiler */
> >     MultiXactOffset nextOffset;
> >     bool            oldestOffsetKnown = false;
> > +   MultiXactOffset prevOldestOffset;
> >     bool            prevOldestOffsetKnown;
> > -   MultiXactOffset offsetStopLimit = 0;
> 
> I don't see the benefit of the order changes here.

I reacted the same way.  Commit 4f627f8 reordered some declarations, which I
reverted when I refinished that commit as branch mxt3-main.

> > -           if (oldestOffsetKnown)
> > -                   ereport(DEBUG1,
> > -                                   (errmsg("oldest MultiXactId member is 
> > at offset %u",
> > -                                                   oldestOffset)));
> 
> That's imo a rather useful debug message.

The branches emit that message at the same times 4f627f8^ and earlier emit it.

> > -           else
> > +           if (!oldestOffsetKnown)
> > +           {
> > +                   /* XXX This message is incorrect if 
> > prevOldestOffsetKnown. */
> >                     ereport(LOG,
> >                                     (errmsg("MultiXact member wraparound 
> > protections are disabled because oldest checkpointed MultiXact %u does not 
> > exist on disk",
> >                                                     oldestMultiXactId)));
> > +           }
> >     }
> 
> Hm, the XXX is a "problem" in all 9.3+ - should we just fix it everywhere?

I welcome a project to fix it.

> >     LWLockRelease(MultiXactTruncationLock);
> > 
> >     /*
> > -    * If we can, compute limits (and install them MultiXactState) to 
> > prevent
> > -    * overrun of old data in the members SLRU area. We can only do so if 
> > the
> > -    * oldest offset is known though.
> > +    * There's no need to update anything if we don't know the oldest offset
> > +    * or if it hasn't changed.
> >      */
> 
> Is that really a worthwhile optimization?

I would neither remove that longstanding optimization nor add it from scratch
today.  Branch commit 06c9979 restored it as part of a larger restoration to
the pre-4f627f8 structure of SetOffsetVacuumLimit().

> > -typedef struct mxtruncinfo
> > -{
> > -   int                     earliestExistingPage;
> > -} mxtruncinfo;
> > -
> > -/*
> > - * SlruScanDirectory callback
> > - *         This callback determines the earliest existing page number.
> > - */
> > -static bool
> > -SlruScanDirCbFindEarliest(SlruCtl ctl, char *filename, int segpage, void 
> > *data)
> > -{
> > -   mxtruncinfo *trunc = (mxtruncinfo *) data;
> > -
> > -   if (trunc->earliestExistingPage == -1 ||
> > -           ctl->PagePrecedes(segpage, trunc->earliestExistingPage))
> > -   {
> > -           trunc->earliestExistingPage = segpage;
> > -   }
> > -
> > -   return false;                           /* keep going */
> > -}
> > -
> 
> That really seems like an independent change, deserving its own commit +
> explanation.

Indeed.  I explained that change at length in
http://www.postgresql.org/message-id/20151108085407.ga1097...@tornado.leadboat.com,
including that it's alone on a branch (mxt1-disk-independent), to become its
own PostgreSQL commit.

> > [branch commit 89a7232]
> 
> I don't think that's really a good idea - this way we restart after
> every single page write, whereas currently we only restart after passing
> through all pages once. In nearly all cases we'll only ever have to
> retry once in total, be because such old pages aren't usually going to
> be reread/redirtied.

Your improvement sounds fine, then.  Would both SimpleLruTruncate() and
SlruDeleteSegment() benefit from it?

> > @@ -9216,10 +9212,8 @@ xlog_redo(XLogReaderState *record)
> >             LWLockRelease(OidGenLock);
> >             MultiXactSetNextMXact(checkPoint.nextMulti,
> >                                                       
> > checkPoint.nextMultiOffset);
> > -
> > -           MultiXactAdvanceOldest(checkPoint.oldestMulti,
> > -                                                      
> > checkPoint.oldestMultiDB);
> >             SetTransactionIdLimit(checkPoint.oldestXid, 
> > checkPoint.oldestXidDB);
> > +           SetMultiXactIdLimit(checkPoint.oldestMulti, 
> > checkPoint.oldestMultiDB);
> > 
> >             /*
> >              * If we see a shutdown checkpoint while waiting for an 
> > end-of-backup
> > @@ -9309,17 +9303,13 @@ xlog_redo(XLogReaderState *record)
> >             LWLockRelease(OidGenLock);
> >             MultiXactAdvanceNextMXact(checkPoint.nextMulti,
> >                                                               
> > checkPoint.nextMultiOffset);
> > -
> > -           /*
> > -            * NB: This may perform multixact truncation when replaying WAL
> > -            * generated by an older primary.
> > -            */
> > -           MultiXactAdvanceOldest(checkPoint.oldestMulti,
> > -                                                      
> > checkPoint.oldestMultiDB);
> >             if (TransactionIdPrecedes(ShmemVariableCache->oldestXid,
> >                                                               
> > checkPoint.oldestXid))
> >                     SetTransactionIdLimit(checkPoint.oldestXid,
> >                                                               
> > checkPoint.oldestXidDB);
> > +           MultiXactAdvanceOldest(checkPoint.oldestMulti,
> > +                                                      
> > checkPoint.oldestMultiDB);
> > +
> >             /* ControlFile->checkPointCopy always tracks the latest ckpt 
> > XID */
> >             ControlFile->checkPointCopy.nextXidEpoch = 
> > checkPoint.nextXidEpoch;
> >             ControlFile->checkPointCopy.nextXid =
> >             checkPoint.nextXid;
> 
> Why?

master does not and will not have legacy truncation, so the deleted comment
does not belong in master.  Regarding the SetMultiXactIdLimit() call:

commit 611a2ec
Author:     Noah Misch <n...@leadboat.com>
AuthorDate: Sat Nov 7 15:06:28 2015 -0500
Commit:     Noah Misch <n...@leadboat.com>
CommitDate: Sat Nov 7 15:06:28 2015 -0500

    In xlog_redo(), believe a SHUTDOWN checkPoint.oldestMulti exactly.
    
    It was so before this branch.  This restores consistency with the
    handling of nextXid, nextMulti and oldestMulti: we treat them as exact
    for XLOG_CHECKPOINT_SHUTDOWN and as minima for XLOG_CHECKPOINT_ONLINE.
    I do not know of a case where this definitely matters for any of these
    counters.  It might matter if a bug causes oldestXid to move forward
    wrongly, causing it to then move backward later.  (I don't know if
    VACUUM does ever move oldestXid backward, but it's a plausible thing to
    do if on-disk state fully agrees with an older value.)  That example has
    no counterpart for oldestMultiXactId, because any update first arrives
    in an XLOG_MULTIXACT_TRUNCATE_ID record.  Therefore, this commit is
    probably cosmetic.

> > -   /*
> > -    * Truncate CLOG, multixact and CommitTs to the oldest computed value.
> > -    */
> >     TruncateCLOG(frozenXID);
> >     TruncateCommitTs(frozenXID);
> >     TruncateMultiXact(minMulti, minmulti_datoid);
> 
> Why? Sure, it's not a super important comment, but ...?

Yeah, it scarcely matters either way.  Commit 4f627f8 reduced this comment to
merely restating the code, so I removed it instead.

> > @@ -2204,8 +2202,8 @@ GetOldestSafeDecodingTransactionId(void)
> >   * the result is somewhat indeterminate, but we don't really care.  Even in
> >   * a multiprocessor with delayed writes to shared memory, it should be 
> > certain
> >   * that setting of delayChkpt will propagate to shared memory when the 
> > backend
> > - * takes a lock, so we cannot fail to see a virtual xact as delayChkpt if
> > - * it's already inserted its commit record.  Whether it takes a little 
> > while
> > + * takes a lock, so we cannot fail to see a virtual xact as delayChkpt if 
> > it's
> > + * already inserted its critical xlog record.  Whether it takes a little 
> > while
> >   * for clearing of delayChkpt to propagate is unimportant for correctness.
> >   */
> 
> Seems unrelated, given that this is already used in
> MarkBufferDirtyHint(). Don't get me wrong, I think the changes are a
> good idea, but it's not really tied to the truncation changes.

Quite so; its branch (one branch = one proposed PostgreSQL commit),
mxt2-cosmetic, contains no truncation changes.  Likewise for the other
independent comment improvements you noted.

nm


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Rework the way multixact truncations work

Reply via email to