Re: pgsql: Validate page level checksums in base backups

2018-04-04 Thread Michael Banck
Hi,

On Wed, Apr 04, 2018 at 11:38:35AM +0200, Magnus Hagander wrote:
> On Tue, Apr 3, 2018 at 10:48 PM, Michael Banck 
> wrote:
> 
> > Hi,
> >
> > On Tue, Apr 03, 2018 at 08:48:08PM +0200, Magnus Hagander wrote:
> > > On Tue, Apr 3, 2018 at 8:29 PM, Tom Lane  wrote:
> > > I'd bet a good lunch that nondefault BLCKSZ would break it, as well,
> > > > since the way in which the corruption is induced is just guessing
> > > > as to where page boundaries are.
> > >
> > > Yeah, that might be a problem. Those should be calculated from the block
> > > size.
> > >
> > > Also, scribbling on tables as sensitive as pg_class is just asking for
> > > > trouble IMO.  I don't see anything in this test, for example, that
> > > > prevents autovacuum from running and causing a PANIC before the test
> > > > can complete.  Even with AV off, there's a good chance that clobber-
> > > > cache-always animals will fall over because they do so many more
> > > > physical accesses to the system catalogs.  I'd suggest inducing the
> > > > corruption in some user table(s) that we can more tightly constrain
> > > > the source server's accesses to.
> > >
> > > Yeah, that seems like a good idea. And probably also shut the server down
> > > while writing the corruption, just in case.
> > >
> > > Will stick looking into that on my todo for when I'm back, unless beaten
> > to
> > > it. Michael, you want a stab at it?
> >
> > Attached is a patch which does that hopefully:
> >
> > 1. creates two user tables, one large enough for at least 6 blocks
> > (around 360kb), the other just one block.
> >
> > 2. stops the cluster before scribbling over its data and starts it
> > afterwards.
> >
> > 3. uses the blocksize (and the pager header size) to determine offsets
> > for scribbling.
> >
> > I've tested it with blocksizes 8 and 32 now, the latter should make sure
> > that the first table is indeed large enough, but maybe something less
> > arbitrary than "1 integers" should be used?
> >
> > Anyway, sorry for the hassle.
> >
> 
> Applied, with the addition that I explicitly disabled autovacuum on those
> tables as well.
 
Thanks! It looks like there were no further builfarm failures so far,
let's see how this goes.

> We might want to enhance it further by calculating the figure 10,000 based
> on blocksize perhaps?

10,000 was roughly twice the size needed for 32k block sizes. If there
are concerns that this might not be enough, I am happy to invest some
more time here (next week probably). However, the pg_basebackup
testsuite takes up 800+ MB to run, so I don't see the urgent need of
optimizing away 50-100 KB (which clearly everybody else thought as well)
if we are talking about disk space overhead.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer



Re: pgsql: Validate page level checksums in base backups

2018-04-04 Thread Magnus Hagander
On Tue, Apr 3, 2018 at 10:48 PM, Michael Banck 
wrote:

> Hi,
>
> On Tue, Apr 03, 2018 at 08:48:08PM +0200, Magnus Hagander wrote:
> > On Tue, Apr 3, 2018 at 8:29 PM, Tom Lane  wrote:
> > I'd bet a good lunch that nondefault BLCKSZ would break it, as well,
> > > since the way in which the corruption is induced is just guessing
> > > as to where page boundaries are.
> >
> > Yeah, that might be a problem. Those should be calculated from the block
> > size.
> >
> > Also, scribbling on tables as sensitive as pg_class is just asking for
> > > trouble IMO.  I don't see anything in this test, for example, that
> > > prevents autovacuum from running and causing a PANIC before the test
> > > can complete.  Even with AV off, there's a good chance that clobber-
> > > cache-always animals will fall over because they do so many more
> > > physical accesses to the system catalogs.  I'd suggest inducing the
> > > corruption in some user table(s) that we can more tightly constrain
> > > the source server's accesses to.
> >
> > Yeah, that seems like a good idea. And probably also shut the server down
> > while writing the corruption, just in case.
> >
> > Will stick looking into that on my todo for when I'm back, unless beaten
> to
> > it. Michael, you want a stab at it?
>
> Attached is a patch which does that hopefully:
>
> 1. creates two user tables, one large enough for at least 6 blocks
> (around 360kb), the other just one block.
>
> 2. stops the cluster before scribbling over its data and starts it
> afterwards.
>
> 3. uses the blocksize (and the pager header size) to determine offsets
> for scribbling.
>
> I've tested it with blocksizes 8 and 32 now, the latter should make sure
> that the first table is indeed large enough, but maybe something less
> arbitrary than "1 integers" should be used?
>
> Anyway, sorry for the hassle.
>

Applied, with the addition that I explicitly disabled autovacuum on those
tables as well.

We might want to enhance it further by calculating the figure 10,000 based
on blocksize perhaps?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: pgsql: Validate page level checksums in base backups

2018-04-03 Thread David Steele

On 4/3/18 4:48 PM, Michael Banck wrote:


Attached is a patch which does that hopefully:

1. creates two user tables, one large enough for at least 6 blocks
(around 360kb), the other just one block.

2. stops the cluster before scribbling over its data and starts it
afterwards.

3. uses the blocksize (and the pager header size) to determine offsets
for scribbling.


This patch looks reasonable to me.


I've tested it with blocksizes 8 and 32 now, the latter should make sure
that the first table is indeed large enough, but maybe something less
arbitrary than "1 integers" should be used?


It might be quicker to just stop the cluster and then write out an 
arbitrary number of zero pages.  Zero pages are always considered valid 
so you can then corrupt whichever pages you want for testing.


--
-David
da...@pgmasters.net



Re: pgsql: Validate page level checksums in base backups

2018-04-03 Thread Michael Banck
Hi,

On Tue, Apr 03, 2018 at 08:48:08PM +0200, Magnus Hagander wrote:
> On Tue, Apr 3, 2018 at 8:29 PM, Tom Lane  wrote:
> I'd bet a good lunch that nondefault BLCKSZ would break it, as well,
> > since the way in which the corruption is induced is just guessing
> > as to where page boundaries are.
> 
> Yeah, that might be a problem. Those should be calculated from the block
> size.
> 
> Also, scribbling on tables as sensitive as pg_class is just asking for
> > trouble IMO.  I don't see anything in this test, for example, that
> > prevents autovacuum from running and causing a PANIC before the test
> > can complete.  Even with AV off, there's a good chance that clobber-
> > cache-always animals will fall over because they do so many more
> > physical accesses to the system catalogs.  I'd suggest inducing the
> > corruption in some user table(s) that we can more tightly constrain
> > the source server's accesses to.
> 
> Yeah, that seems like a good idea. And probably also shut the server down
> while writing the corruption, just in case.
> 
> Will stick looking into that on my todo for when I'm back, unless beaten to
> it. Michael, you want a stab at it?

Attached is a patch which does that hopefully:

1. creates two user tables, one large enough for at least 6 blocks
(around 360kb), the other just one block.

2. stops the cluster before scribbling over its data and starts it
afterwards.

3. uses the blocksize (and the pager header size) to determine offsets
for scribbling.

I've tested it with blocksizes 8 and 32 now, the latter should make sure
that the first table is indeed large enough, but maybe something less
arbitrary than "1 integers" should be used?

Anyway, sorry for the hassle.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index 3162cdcd01..3fe49f68a7 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -406,19 +406,25 @@ like(
 my $checksum = $node->safe_psql('postgres', 'SHOW data_checksums;');
 is($checksum, 'on', 'checksums are enabled');
 
-# get relfilenodes of relations to corrupt
-my $pg_class = $node->safe_psql('postgres',
-	q{SELECT pg_relation_filepath('pg_class')}
+# create tables to corrupt and get their relfilenodes
+my $file_corrupt1 = $node->safe_psql('postgres',
+q{SELECT a INTO corrupt1 FROM generate_series(1,1) AS a; SELECT pg_relation_filepath('corrupt1')}
 );
-my $pg_index = $node->safe_psql('postgres',
-	q{SELECT pg_relation_filepath('pg_index')}
+my $file_corrupt2 = $node->safe_psql('postgres',
+q{SELECT b INTO corrupt2 FROM generate_series(1,2) AS b; SELECT pg_relation_filepath('corrupt2')}
 );
 
+# set page header and block sizes
+my $pageheader_size = 24;
+my $block_size = $node->safe_psql('postgres', 'SHOW block_size;');
+
 # induce corruption
-open $file, '+<', "$pgdata/$pg_class";
-seek($file, 4000, 0);
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+open $file, '+<', "$pgdata/$file_corrupt1";
+seek($file, $pageheader_size, 0);
 syswrite($file, '\0\0\0\0\0\0\0\0\0');
 close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
 
 $node->command_checks_all([ 'pg_basebackup', '-D', "$tempdir/backup_corrupt"],
 	1,
@@ -428,13 +434,15 @@ $node->command_checks_all([ 'pg_basebackup', '-D', "$tempdir/backup_corrupt"],
 );
 
 # induce further corruption in 5 more blocks
-open $file, '+<', "$pgdata/$pg_class";
-my @offsets = (12192, 20384, 28576, 36768, 44960);
-foreach my $offset (@offsets) {
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+open $file, '+<', "$pgdata/$file_corrupt1";
+for my $i ( 1..5 ) {
+  my $offset = $pageheader_size + $i * $block_size;
   seek($file, $offset, 0);
   syswrite($file, '\0\0\0\0\0\0\0\0\0');
 }
 close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
 
 $node->command_checks_all([ 'pg_basebackup', '-D', "$tempdir/backup_corrupt2"],
 1,
@@ -444,10 +452,12 @@ $node->command_checks_all([ 'pg_basebackup', '-D', "$tempdir/backup_corrupt2"],
 );
 
 # induce corruption in a second file
-open $file, '+<', "$pgdata/$pg_index";
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+open $file, '+<', "$pgdata/$file_corrupt2";
 seek($file, 4000, 0);
 syswrite($file, '\0\0\0\0\0\0\0\0\0');
 close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
 
 $node->command_checks_all([ 'pg_basebackup', '-D', "$tempdir/backup_corrupt3"],
 1,
@@ -460,3 +470,6 @@ $node->command_checks_all([ 'pg_basebackup', '-D', "$tempdir/backup_corrupt3"],
 $node->command_ok(
 	[   'pg_basebackup', '-D', "$tempdir/backup_corrupt4", '-k' ],
 	'pg_basebackup with -k does not report 

Re: pgsql: Validate page level checksums in base backups

2018-04-03 Thread Tom Lane
Magnus Hagander  writes:
> On Tue, Apr 3, 2018 at 8:29 PM, Tom Lane  wrote:
>> It's scribbling on the source cluster's disk files and assuming that that
>> translates one-for-one to what gets sent to the slave server --- but what
>> if some of the blocks that it modifies on-disk are resident in the
>> source's shared buffers?  I think you'd have to shut down the source and
>> then apply the corruption if you want stable results.

> It doesn't actually use a slave server as part of the tests.
> And basebackups don't read from the sources shared buffers, but it *does*
> read from the kernel buffers.

Right, so the failure case occurs when the source server writes back a
dirty page from its shared buffers to disk, in between the test script's
corrupting that page on-disk and then trying to read it.

Again, you'd have several orders of magnitude better chance of getting
reproducible behavior if you were targeting something that wasn't a
system catalog.

regards, tom lane



Re: pgsql: Validate page level checksums in base backups

2018-04-03 Thread Magnus Hagander
On Tue, Apr 3, 2018 at 8:29 PM, Tom Lane  wrote:

> Magnus Hagander  writes:
> > Yeah, there's clearly a second problem here.
>
> I think this test script is broken in many ways.
>
> It's scribbling on the source cluster's disk files and assuming that that
> translates one-for-one to what gets sent to the slave server --- but what
> if some of the blocks that it modifies on-disk are resident in the
> source's shared buffers?  I think you'd have to shut down the source and
> then apply the corruption if you want stable results.
>

It doesn't actually use a slave server as part of the tests.

And basebackups don't read from the sources shared buffers, but it *does*
read from the kernel buffers.


I'd bet a good lunch that nondefault BLCKSZ would break it, as well,
> since the way in which the corruption is induced is just guessing
> as to where page boundaries are.
>

Yeah, that might be a problem. Those should be calculated from the block
size.


Also, scribbling on tables as sensitive as pg_class is just asking for
> trouble IMO.  I don't see anything in this test, for example, that
> prevents autovacuum from running and causing a PANIC before the test
> can complete.  Even with AV off, there's a good chance that clobber-
> cache-always animals will fall over because they do so many more
> physical accesses to the system catalogs.  I'd suggest inducing the
> corruption in some user table(s) that we can more tightly constrain
> the source server's accesses to.
>

Yeah, that seems like a good idea. And probably also shut the server down
while writing the corruption, just in case.

Will stick looking into that on my todo for when I'm back, unless beaten to
it. Michael, you want a stab at it?

//Magnus


Re: pgsql: Validate page level checksums in base backups

2018-04-03 Thread Peter Geoghegan
On Tue, Apr 3, 2018 at 11:29 AM, Tom Lane  wrote:
> Also, scribbling on tables as sensitive as pg_class is just asking for
> trouble IMO.  I don't see anything in this test, for example, that
> prevents autovacuum from running and causing a PANIC before the test
> can complete.

+1

> Even with AV off, there's a good chance that clobber-
> cache-always animals will fall over because they do so many more
> physical accesses to the system catalogs.  I'd suggest inducing the
> corruption in some user table(s) that we can more tightly constrain
> the source server's accesses to.

I've simulated quite a few novel corruption scenarios using pg_hexedit
in the past year. It would be nice if pg_prewarm offered an easy way
to evict from shared_buffers for repeated testing. Obviously you can
accomplish the same thing in other ways, but that isn't ideal, and
particularly hinders automated testing.

-- 
Peter Geoghegan



Re: pgsql: Validate page level checksums in base backups

2018-04-03 Thread Tom Lane
Magnus Hagander  writes:
> Yeah, there's clearly a second problem here.

I think this test script is broken in many ways.

It's scribbling on the source cluster's disk files and assuming that that
translates one-for-one to what gets sent to the slave server --- but what
if some of the blocks that it modifies on-disk are resident in the
source's shared buffers?  I think you'd have to shut down the source and
then apply the corruption if you want stable results.

I'd bet a good lunch that nondefault BLCKSZ would break it, as well,
since the way in which the corruption is induced is just guessing
as to where page boundaries are.

Also, scribbling on tables as sensitive as pg_class is just asking for
trouble IMO.  I don't see anything in this test, for example, that
prevents autovacuum from running and causing a PANIC before the test
can complete.  Even with AV off, there's a good chance that clobber-
cache-always animals will fall over because they do so many more
physical accesses to the system catalogs.  I'd suggest inducing the
corruption in some user table(s) that we can more tightly constrain
the source server's accesses to.

regards, tom lane



Re: pgsql: Validate page level checksums in base backups

2018-04-03 Thread Magnus Hagander
On Tue, Apr 3, 2018 at 7:13 PM, Andres Freund  wrote:

> On 2018-04-03 11:52:26 +, Magnus Hagander wrote:
> > Validate page level checksums in base backups
> >
> > When base backups are run over the replication protocol (for example
> > using pg_basebackup), verify the checksums of all data blocks if
> > checksums are enabled. If checksum failures are encountered, log them
> > as warnings but don't abort the backup.
> >
> > This becomes the default behaviour in pg_basebackup (provided checksums
> > are enabled on the server), so add a switch (-k) to disable the checks
> > if necessary.
> >
> > Author: Michael Banck
> > Reviewed-By: Magnus Hagander, David Steele
> > Discussion: https://postgr.es/m/20180228180856.GE13784@
> nighthawk.caipicrew.dd-dns.de
>
> Hm, something isn't quite right with the test:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=
> pogona=2018-04-03%2016%3A10%3A01
>
> # WARNING:  6 total checksum verification failures
> # pg_basebackup: checksum error occured
> # '
> # doesn't match '(?^s:^WARNING.*7 total checksum verification
> failures)'
>

Yeah, there's clearly a second problem here.

I'm stuck in meetings most of tonight, so I might not get to it right away.
If someonen else figures it out meanwhile please do, otherwise I'll get to
it as soon as I can.


-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: pgsql: Validate page level checksums in base backups

2018-04-03 Thread Andres Freund
On 2018-04-03 11:52:26 +, Magnus Hagander wrote:
> Validate page level checksums in base backups
> 
> When base backups are run over the replication protocol (for example
> using pg_basebackup), verify the checksums of all data blocks if
> checksums are enabled. If checksum failures are encountered, log them
> as warnings but don't abort the backup.
> 
> This becomes the default behaviour in pg_basebackup (provided checksums
> are enabled on the server), so add a switch (-k) to disable the checks
> if necessary.
> 
> Author: Michael Banck
> Reviewed-By: Magnus Hagander, David Steele
> Discussion: 
> https://postgr.es/m/20180228180856.ge13...@nighthawk.caipicrew.dd-dns.de

Hm, something isn't quite right with the test:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pogona=2018-04-03%2016%3A10%3A01

# WARNING:  6 total checksum verification failures
# pg_basebackup: checksum error occured
# '
# doesn't match '(?^s:^WARNING.*7 total checksum verification failures)'

Greetings,

Andres Freund



pgsql: Validate page level checksums in base backups

2018-04-03 Thread Magnus Hagander
Validate page level checksums in base backups

When base backups are run over the replication protocol (for example
using pg_basebackup), verify the checksums of all data blocks if
checksums are enabled. If checksum failures are encountered, log them
as warnings but don't abort the backup.

This becomes the default behaviour in pg_basebackup (provided checksums
are enabled on the server), so add a switch (-k) to disable the checks
if necessary.

Author: Michael Banck
Reviewed-By: Magnus Hagander, David Steele
Discussion: 
https://postgr.es/m/20180228180856.ge13...@nighthawk.caipicrew.dd-dns.de

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/4eb77d50c21ddd35b77421c27e0d7853eb4f9202

Modified Files
--
doc/src/sgml/protocol.sgml   |  13 +-
doc/src/sgml/ref/pg_basebackup.sgml  |  16 ++
src/backend/replication/basebackup.c | 227 ++-
src/backend/replication/repl_gram.y  |   8 +-
src/backend/replication/repl_scanner.l   |   1 +
src/bin/pg_basebackup/pg_basebackup.c|  38 -
src/bin/pg_basebackup/t/010_pg_basebackup.pl |  62 +++-
7 files changed, 352 insertions(+), 13 deletions(-)