Hi,

We just came across a situation where a corrupted HFS+ filesystem
appears to return ERANGE on a customer machine.  Our first reaction was
to turn zero_damaged_pages on to allow taking a pg_dump backup of the
database, but surprisingly this does not work.  A quick glance at the
code shows the reason:

        if (nbytes != BLCKSZ)
        {
                if (nbytes < 0)
                        ereport(ERROR,
                                        (errcode_for_file_access(),
                                         errmsg("could not read block %u in file \"%s\": %m",
                                                        blocknum, FilePathName(v->mdfd_vfd))));

                /*
                 * Short read: we are at or past EOF, or we read a partial block at
                 * EOF.  Normally this is an error; upper levels should never try to
                 * read a nonexistent block.  However, if zero_damaged_pages is ON or
                 * we are InRecovery, we should instead return zeroes without
                 * complaining.  This allows, for example, the case of trying to
                 * update a block that was later truncated away.
                 */
                if (zero_damaged_pages || InRecovery)
                        MemSet(buffer, 0, BLCKSZ);
                else
                        ereport(ERROR,
                                        (errcode(ERRCODE_DATA_CORRUPTED),
                                         errmsg("could not read block %u in file \"%s\": read only %d of %d bytes",
                                                        blocknum, FilePathName(v->mdfd_vfd),
                                                        nbytes, BLCKSZ)));


Note that zero_damaged_pages only enters the picture if it's a short
read, not if the read actually fails completely.
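
To make the distinction concrete, here is a minimal standalone sketch of
the same decision, stripped of the PostgreSQL machinery.  It is not the
actual md.c code: classify_read, ReadOutcome and BLOCK_SIZE are names made
up for illustration, and the function only models what happens to the
buffer for a given read() return value.

```c
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for BLCKSZ; 8192 is only an assumed value. */
#define BLOCK_SIZE 8192

/* Hypothetical outcome codes, not part of PostgreSQL. */
typedef enum
{
    READ_OK,        /* a full block was read */
    READ_FAILED,    /* read() returned -1: always an error */
    READ_ZEROED,    /* short read, buffer replaced with zeroes */
    READ_SHORT      /* short read, reported as an error */
} ReadOutcome;

/*
 * Model of the branch structure quoted above: nbytes is what read()
 * returned, zero_damaged plays the role of
 * (zero_damaged_pages || InRecovery).
 */
static ReadOutcome
classify_read(ssize_t nbytes, int zero_damaged, char *buffer)
{
    if (nbytes == BLOCK_SIZE)
        return READ_OK;

    /* A failed read errors out before the flag is ever consulted. */
    if (nbytes < 0)
        return READ_FAILED;

    /* Only a short read reaches the zero_damaged check. */
    if (zero_damaged)
    {
        memset(buffer, 0, BLOCK_SIZE);
        return READ_ZEROED;
    }
    return READ_SHORT;
}
```

With this model, classify_read(-1, 1, buf) yields READ_FAILED and leaves
buf untouched, which is the behaviour we are hitting with ERANGE.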

Is this by design, or is this just an oversight?

See
http://lists.gnu.org/archive/html/rdiff-backup-users/2007-12/msg00053.html

I don't yet have any evidence that the filesystem is actually corrupt,
but the error message from the kernel is "Result out of range" (ERANGE),
which is not documented as a possible result of read() on Mac OS X.

-- 
Álvaro Herrera <alvhe...@alvh.no-ip.org>
