On 2015-12-09 16:50:06 -0500, Robert Haas wrote:
> Now, if subsequent to this an index scan happens to sweep through and
> try to fetch a block in 123456.2, it will work!  This happens because
> _mdfd_getseg() doesn't care about the length of the segments; it only
> cares whether or not they exist.

Unless in recovery in the startup process, or when EXTENSION_CREATE is
passed to it. Depending on whether it or mdnblocks were called first,
and depending on which segment is missing.  In that case it'd *possibly*
pad the last block in a sgment with zeroes, Yes, only the last block:
                         * We have to maintain the invariant that segments 
before the last
                         * active segment are of size RELSEG_SIZE; therefore, 
pad them out
                         * with zeroes if needed.  (This only matters if caller 
                         * extending the relation discontiguously, but that can 
happen in
                         * hash indexes.)
                        if (behavior == EXTENSION_CREATE || InRecovery)
                                if (_mdnblocks(reln, forknum, v) < RELSEG_SIZE)
                                        char       *zerobuf = palloc0(BLCKSZ);

                                        mdextend(reln, forknum,
                                                         nextsegno * 
((BlockNumber) RELSEG_SIZE) - 1,
                                                         zerobuf, skipFsync);
                                v->mdfd_chain = _mdfd_openseg(reln, forknum, 
+nextsegno, O_CREAT);

> If 123456.1 were actually missing,
> then we'd never test whether 123456.2 exists and we'd get an error.
> But because mdnblocks() created 123456.1, _mdfd_getseg() is now quite
> happy to fetch blocks in 123456.2; it considers the empty 123456.1
> file to be a sufficient reason to look for 123456.2, and seeing that
> file, and finding the block it wants therein, it happily returns that
> block, blithely ignoring the fact that it passed over a completely .1
> segment before returning a block from .2.  This is maybe not the
> smartest thing ever.

I think it might be worthwhile to add a check against this. It'd not be
very hard to ensure the segment is indeed large enough iff further
segments exist. Possibly not an ERROR, but a WARNING might be useful. To
avoid overhead and absurd log spamming we should make that check only if
the the segment isn't already open.

> The comment in mdnblocks.c says this:
>                          * Because we pass O_CREAT, we will create the
> next segment (with
>                          * zero length) immediately, if the last
> segment is of length
>                          * RELSEG_SIZE.  While perhaps not strictly
> necessary, this keeps
>                          * the logic simple.
> I don't really see how this "keeps the logic simple".  What it does is
> allow sequential scans and index scans to have two completely
> different notions of how many accessible blocks there are in the
> relation.  Granted, this kind of thing really shouldn't happen, but
> sometimes bad things do happen.  Therefore, I propose the attached
> patch.

I'm in favor of changing this behaviour - it's indeed weird and somewhat
error prone.

Do you want to change this in master, or backpatch a change? To me the
code in md.c is subtle and complicated enough that I'm rather inclined
to only change this in master.

This code is really hard to test effectively :(.


Andres Freund

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to