On Thu, Oct 29, 2015 at 08:46:52AM +0100, Andres Freund wrote:
> On October 29, 2015 7:59:03 AM GMT+01:00, Noah Misch <n...@leadboat.com> 
> wrote:
> >That helps; thanks.  Your design seems good.  I've located only insipid
> >defects.
> 
> Great!
> 
> > I propose to save some time by writing a patch series
> >eliminating
> >them, which you could hopefully review.  Does that sound good?
> 
> Yes, it does.

I have pushed a stack of branches to https://github.com/nmisch/postgresql.git:

mxt0-revert - reverts commits 4f627f8 and aa29c1c
mxt1-disk-independent - see below
mxt2-cosmetic - update already-wrong comments and formatting
mxt3-main - replaces commit 4f627f8
mxt4-rm-legacy - replaces commit aa29c1c

The plan is to squash each branch into one PostgreSQL commit.  In addition to
examining overall "git diff mxt2-cosmetic mxt3-main", I recommend reviewing
itemized changes and commit log entries in "git log -p --reverse --no-merges
mxt2-cosmetic..mxt3-main".  In particular, when a change involves something
you discussed upthread or was otherwise not obvious, I put a statement of
rationale in the commit log.

> +      * Make sure to only attempt truncation if there's values to truncate
> +      * away. In normal processing values shouldn't go backwards, but there's
> +      * some corner cases (due to bugs) where that's possible.

Which bugs are those?  I would like to include more detail if available.


If anything here requires careful study, it's the small mxt1-disk-independent
change, which I have also attached in patch form.  That patch removes the
SlruScanDirCbFindEarliest() test from TruncateMultiXact(), which in turn makes
multixact control decisions independent of whether TruncateMultiXact() is
successful at unlinking segments.  Today, one undeletable segment file can
cause TruncateMultiXact() to give up on truncation completely for a span of
hundreds of millions of MultiXactId.  Patched multixact.c will, like CLOG,
make its decisions strictly based on the content of files it expects to exist.
It will no longer depend on the absence of files it hopes will not exist.

To aid in explaining the change's effects, I will define some terms.  A
"logical wrap" occurs when no range of 2^31 integers covers the set of
MultiXactId stored in xmax fields.  A "file-level wrap" occurs when there
exists a pair of pg_multixact/offsets segment files such that:

  | segno_a * SLRU_PAGES_PER_SEGMENT * MULTIXACT_OFFSETS_PER_PAGE - 
    segno_b * SLRU_PAGES_PER_SEGMENT * MULTIXACT_OFFSETS_PER_PAGE | > 2^31

A logical wrap implies either a file-level wrap or missing visibility
metadata, but a file-level wrap does not imply other consequences.  The
SlruScanDirCbFindEarliest() test is usually redundant with
find_multixact_start(), because MultiXactIdPrecedes(oldestMXact, earliest)
almost implies that find_multixact_start() will fail.  The exception arises
when pg_multixact/offsets files compose a file-level wrap, which can happen
when TruncateMultiXact() fails to unlink segments as planned.  When it does
happen, the result of SlruScanDirCbFindEarliest(), and therefore the computed
"earliest" value, is undefined.  (This outcome is connected to our requirement
to use only half the pg_clog or pg_multixact/offsets address space at any one
time.  The PagePrecedes callbacks for these SLRUs cease to be transitive if
more than half their address space is in use.)

The SlruScanDirCbFindEarliest() test can be helpful when a file-level wrap
coexists with incorrect oldestMultiXactId (extant xmax values define "correct
oldestMultiXactId").  If we're lucky with readdir() order, the test will block
truncation so we don't delete still-needed segments.  I am content to lose
that, because (a) the code isn't reliable for (or even directed toward) that
purpose and (b) sites running on today's latest point releases no longer have
incorrect oldestMultiXactId.
diff --git a/src/backend/access/transam/multixact.c 
b/src/backend/access/transam/multixact.c
index cb12fc3..cb4a0cd 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -2945,29 +2945,6 @@ SlruScanDirCbRemoveMembers(SlruCtl ctl, char *filename, 
int segpage,
        return false;                           /* keep going */
 }
 
-typedef struct mxtruncinfo
-{
-       int                     earliestExistingPage;
-} mxtruncinfo;
-
-/*
- * SlruScanDirectory callback
- *             This callback determines the earliest existing page number.
- */
-static bool
-SlruScanDirCbFindEarliest(SlruCtl ctl, char *filename, int segpage, void *data)
-{
-       mxtruncinfo *trunc = (mxtruncinfo *) data;
-
-       if (trunc->earliestExistingPage == -1 ||
-               ctl->PagePrecedes(segpage, trunc->earliestExistingPage))
-       {
-               trunc->earliestExistingPage = segpage;
-       }
-
-       return false;                           /* keep going */
-}
-
 /*
  * Remove all MultiXactOffset and MultiXactMember segments before the oldest
  * ones still of interest.
@@ -2986,8 +2963,6 @@ TruncateMultiXact(void)
        MultiXactOffset oldestOffset;
        MultiXactId             nextMXact;
        MultiXactOffset nextOffset;
-       mxtruncinfo trunc;
-       MultiXactId earliest;
        MembersLiveRange range;
 
        Assert(AmCheckpointerProcess() || AmStartupProcess() ||
@@ -3001,49 +2976,20 @@ TruncateMultiXact(void)
        Assert(MultiXactIdIsValid(oldestMXact));
 
        /*
-        * Note we can't just plow ahead with the truncation; it's possible that
-        * there are no segments to truncate, which is a problem because we are
-        * going to attempt to read the offsets page to determine where to
-        * truncate the members SLRU.  So we first scan the directory to 
determine
-        * the earliest offsets page number that we can read without error.
-        */
-       trunc.earliestExistingPage = -1;
-       SlruScanDirectory(MultiXactOffsetCtl, SlruScanDirCbFindEarliest, 
&trunc);
-       earliest = trunc.earliestExistingPage * MULTIXACT_OFFSETS_PER_PAGE;
-       if (earliest < FirstMultiXactId)
-               earliest = FirstMultiXactId;
-
-       /*
-        * If there's nothing to remove, we can bail out early.
-        *
-        * Due to bugs in early releases of PostgreSQL 9.3.X and 9.4.X,
-        * oldestMXact might point to a multixact that does not exist.
-        * Autovacuum will eventually advance it to a value that does exist,
-        * and we want to set a proper offsetStopLimit when that happens,
-        * so call DetermineSafeOldestOffset here even if we're not actually
-        * truncating.
-        */
-       if (MultiXactIdPrecedes(oldestMXact, earliest))
-       {
-               DetermineSafeOldestOffset(oldestMXact);
-               return;
-       }
-
-       /*
         * First, compute the safe truncation point for MultiXactMember. This is
         * the starting offset of the oldest multixact.
         *
-        * Hopefully, find_multixact_start will always work here, because we've
-        * already checked that it doesn't precede the earliest MultiXact on
-        * disk.  But if it fails, don't truncate anything, and log a message.
+        * Due to bugs in early releases of PostgreSQL 9.3.X and 9.4.X,
+        * oldestMXact might point to a multixact that does not exist.  Call
+        * DetermineSafeOldestOffset() to emit the message about disabled member
+        * wraparound protection.  Autovacuum will eventually advance 
oldestMXact
+        * to a value that does exist.
         */
        if (oldestMXact == nextMXact)
                oldestOffset = nextOffset;              /* there are NO 
MultiXacts */
        else if (!find_multixact_start(oldestMXact, &oldestOffset))
        {
-               ereport(LOG,
-                               (errmsg("oldest MultiXact %u not found, 
earliest MultiXact %u, skipping truncation",
-                                       oldestMXact, earliest)));
+               DetermineSafeOldestOffset(oldestMXact);
                return;
        }
 
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to