On 2019-03-26 18:22:55 +0100, Michael Banck wrote:
> Hi,
> 
> Am Dienstag, den 19.03.2019, 13:00 -0700 schrieb Andres Freund:
> > CREATE TABLE corruptme AS SELECT g.i::text AS data FROM generate_series(1, 
> > 1000000) g(i);
> > SELECT pg_relation_size('corruptme');
> > postgres[22890][1]=# SELECT current_setting('data_directory') || '/' || 
> > pg_relation_filepath('corruptme');
> > ┌─────────────────────────────────────┐
> > │              ?column?               │
> > ├─────────────────────────────────────┤
> > │ /srv/dev/pgdev-dev/base/13390/16384 │
> > └─────────────────────────────────────┘
> > (1 row)
> > dd if=/dev/urandom of=/srv/dev/pgdev-dev/base/13390/16384 bs=8192 count=1 
> > conv=notrunc
> > 
> > Try a basebackup and see how many times it'll detect the corrupt
> > data. In the vast majority of cases you're going to see checksum
> > failures when reading the data for normal operation, but not when using
> > basebackup (or this new tool).
> > 
> > At the very very least this would need to do
> > 
> > a) checks that the page is all zeroes if PageIsNew() (like
> >    PageIsVerified() does for the backend). That avoids missing cases
> >    where corruption just zeroed out the header, but not the whole page.
> > b) Check that pd_lsn is between startlsn and the insertion pointer. That
> >    avoids accepting just about all random data.
> > 
> > And that'd *still* be less strenuous than what normal backends
> > check. And that's already not great (due to not noticing zeroed out
> > data).
> 
> I've done the above in the attached patch now. Well, literally like an
> hour ago, then went jogging and came back to see you outlined about
> fixing this differently in a separate thread. Still might be helpful for
> the TAP test changes at least.

Sorry, I just hadn't seen much movement on this, and I'm a bit concerned
about such a critical issue not being addressed.


>                               /*
> -                              * Only check pages which have not been 
> modified since the
> -                              * start of the base backup. Otherwise, they 
> might have been
> -                              * written only halfway and the checksum would 
> not be valid.
> -                              * However, replaying WAL would reinstate the 
> correct page in
> -                              * this case. We also skip completely new 
> pages, since they
> -                              * don't have a checksum yet.
> +                              * We skip completely new pages after checking 
> they are
> +                              * all-zero, since they don't have a checksum 
> yet.
>                                */
> -                             if (!PageIsNew(page) && PageGetLSN(page) < 
> startptr)
> +                             if (PageIsNew(page))
>                               {
> -                                     checksum = pg_checksum_page((char *) 
> page, blkno + segmentno * RELSEG_SIZE);
> -                                     phdr = (PageHeader) page;
> -                                     if (phdr->pd_checksum != checksum)
> +                                     all_zeroes = true;
> +                                     pagebytes = (size_t *) page;
> +                                     for (int i = 0; i < (BLCKSZ / 
> sizeof(size_t)); i++)

Can we please abstract the zeroeness check into a separate function to
be used both by PageIsVerified() and this?


> +                                     if (!all_zeroes)
> +                                     {
> +                                             /*
> +                                              * pd_upper is zero, but the 
> page is not all zero.  We
> +                                              * cannot run 
> pg_checksum_page() on the page as it
> +                                              * would throw an assertion 
> failure.  Consider this a
> +                                              * checksum failure.
> +                                              */

I don't think the assertion failure is the relevant bit here, it's htat
the page is corrupted, no?


> +                                     /*
> +                                      * Only check pages which have not been 
> modified since the
> +                                      * start of the base backup. Otherwise, 
> they might have been
> +                                      * written only halfway and the 
> checksum would not be valid.
> +                                      * However, replaying WAL would 
> reinstate the correct page in
> +                                      * this case. If the page LSN is larger 
> than the current redo
> +                                      * pointer then we assume a bogus LSN 
> due to random page header
> +                                      * corruption and do verify the 
> checksum.
> +                                      */
> +                                     if (PageGetLSN(page) < startptr || 
> PageGetLSN(page) > GetRedoRecPtr())

I don't think GetRedoRecPtr() is the right check? Wouldn't it need to be
GetInsertRecPtr()?

Greetings,

Andres Freund

Reply via email to