Re: pgsql: Validate page level checksums in base backups

2018-04-15 Thread Magnus Hagander
On Tue, Apr 10, 2018 at 11:45 PM, David Steele  wrote:

> On 4/10/18 5:24 PM, Tomas Vondra wrote:
>
>>
>> I think there's a bug in sendFile(). We do check checksums on all pages
>> that pass this LSN check:
>>
>>  /*
>>   * Only check pages which have not been modified since the
>>   * start of the base backup. Otherwise, they might have been
>>   * written only halfway and the checksum would not be valid.
>>   * However, replaying WAL would reinstate the correct page in
>>   * this case.
>>   */
>>  if (PageGetLSN(page) < startptr)
>>  {
>>  ...
>>  }
>>
>> Now, imagine the page is new, i.e. all-zeroes. That means the LSN is 0/0
>> too, and we'll try to verify the checksum - but we actually do not set
>> checksums on empty pages.
>>
>> So I think it should be something like this:
>>
>>  if ((!PageIsNew(page)) && (PageGetLSN(page) < startptr))
>>  {
>>  ...
>>  }
>>
>> It might be worth verifying that the page is actually all-zeroes (and
>> not just with a corrupted pd_upper value). Not sure it's worth it.
>>
>> I've found this by fairly trivial stress testing - running pgbench and
>> pg_basebackup in a loop. It was failing pretty reliably (~75% of runs).
>> With the proposed change I see no further failures.
>>
>
> Good catch, Tomas!
>
> I should have seen this since I had the same issue when I implemented this
> feature in pgBackRest.
>
> Anyway, I agree that your fix looks correct.
>

I've applied this fix, along with the same thing in pg_verify_checksums.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: pgsql: Validate page level checksums in base backups

2018-04-10 Thread Michael Paquier
On Tue, Apr 10, 2018 at 11:35:21PM +0200, Tomas Vondra wrote:
> BTW pg_verify_checksums needs the same fix.

Yes, you are right here.  Just for the record: what needs to be done is
to check PageIsNew() in scan_file() after reading each individual block
and before calling pg_checksum_page(), which applies even to what HEAD
has for pg_verify_checksums.c.

I am adding an open item for this one.
--
Michael




Re: pgsql: Validate page level checksums in base backups

2018-04-10 Thread David Steele

On 4/10/18 5:24 PM, Tomas Vondra wrote:


> I think there's a bug in sendFile(). We do check checksums on all pages
> that pass this LSN check:
>
>  /*
>   * Only check pages which have not been modified since the
>   * start of the base backup. Otherwise, they might have been
>   * written only halfway and the checksum would not be valid.
>   * However, replaying WAL would reinstate the correct page in
>   * this case.
>   */
>  if (PageGetLSN(page) < startptr)
>  {
>  ...
>  }
>
> Now, imagine the page is new, i.e. all-zeroes. That means the LSN is 0/0
> too, and we'll try to verify the checksum - but we actually do not set
> checksums on empty pages.
>
> So I think it should be something like this:
>
>  if ((!PageIsNew(page)) && (PageGetLSN(page) < startptr))
>  {
>  ...
>  }
>
> It might be worth verifying that the page is actually all-zeroes (and
> not just with a corrupted pd_upper value). Not sure it's worth it.
>
> I've found this by fairly trivial stress testing - running pgbench and
> pg_basebackup in a loop. It was failing pretty reliably (~75% of runs).
> With the proposed change I see no further failures.


Good catch, Tomas!

I should have seen this since I had the same issue when I implemented 
this feature in pgBackRest.


Anyway, I agree that your fix looks correct.

Thanks,
--
-David
da...@pgmasters.net



Re: pgsql: Validate page level checksums in base backups

2018-04-10 Thread Tomas Vondra


On 04/10/2018 11:24 PM, Tomas Vondra wrote:
> Hi,
> 
> I think there's a bug in sendFile(). We do check checksums on all pages
> that pass this LSN check:
> 
> /*
>  * Only check pages which have not been modified since the
>  * start of the base backup. Otherwise, they might have been
>  * written only halfway and the checksum would not be valid.
>  * However, replaying WAL would reinstate the correct page in
>  * this case.
>  */
> if (PageGetLSN(page) < startptr)
> {
> ...
> }
> 
> Now, imagine the page is new, i.e. all-zeroes. That means the LSN is 0/0
> too, and we'll try to verify the checksum - but we actually do not set
> checksums on empty pages.
> 
> So I think it should be something like this:
> 
> if ((!PageIsNew(page)) && (PageGetLSN(page) < startptr))
> {
> ...
> }
> 
> It might be worth verifying that the page is actually all-zeroes (and
> not just with a corrupted pd_upper value). Not sure it's worth it.
> 
> I've found this by fairly trivial stress testing - running pgbench and
> pg_basebackup in a loop. It was failing pretty reliably (~75% of runs).
> With the proposed change I see no further failures.
> 

BTW pg_verify_checksums needs the same fix.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pgsql: Validate page level checksums in base backups

2018-04-10 Thread Tomas Vondra
Hi,

I think there's a bug in sendFile(). We do check checksums on all pages
that pass this LSN check:

/*
 * Only check pages which have not been modified since the
 * start of the base backup. Otherwise, they might have been
 * written only halfway and the checksum would not be valid.
 * However, replaying WAL would reinstate the correct page in
 * this case.
 */
if (PageGetLSN(page) < startptr)
{
...
}

Now, imagine the page is new, i.e. all-zeroes. That means the LSN is 0/0
too, and we'll try to verify the checksum - but we actually do not set
checksums on empty pages.

So I think it should be something like this:

if ((!PageIsNew(page)) && (PageGetLSN(page) < startptr))
{
...
}

It might be worth verifying that the page is actually all-zeroes (and
not just with a corrupted pd_upper value). Not sure it's worth it.

I've found this by fairly trivial stress testing - running pgbench and
pg_basebackup in a loop. It was failing pretty reliably (~75% of runs).
With the proposed change I see no further failures.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pgsql: Validate page level checksums in base backups

2018-04-06 Thread Magnus Hagander
On Thu, Apr 5, 2018 at 2:41 PM, Michael Banck 
wrote:

> Hi,
>
> On Thu, Apr 05, 2018 at 01:02:27PM +0200, Magnus Hagander wrote:
> > On Wed, Apr 4, 2018 at 8:22 PM, Michael Banck  >
> > wrote:
> > > Otherwise, I had a quick look and there is no obvious outlier; the
> > > pgdata is 220 MB after the testrun (195 MB of which is WAL, maybe that
> > > could be cut down somehow?) and the base backups are 22-40 MB each, and
> > > there are around 20 of them, so that adds up to more than 750 MB.
> >
> > It certainly seems reasonable to delete the base backups once they're
> made,
> > after each step, rather than keeping them around forever.
>
> I had a look at this and found a copy-pasto in one of the test cases
> while testing, patch attached.
>

Cute. Applied.


> I've also attached a second patch (that applies on top of the first)
> that removes the base backups once they are no longer needed, also
> attached (but see below).
>
> > Do we have a precedent somewhere for how we do this, or does our test
> > framework already have a way to do it? How are all the actual data
> > directories etc cleaned up?
>
> They (and the base backups) are getting purged on success of the whole
> testsuite. So to be clear - we are not leaving behind 1 GB of disk space
> on success, but we use 1 GB of disk space during the test.
>
> > Or should it just be a matter of sprinkling some unlink() calls
> throughout
> > the test file?
>
> I used rmtree() from File::Path (which is also used by PostgresNode to
> clean up) to remove them during the run.
>
>
This seems fine to me. Applied as well.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: pgsql: Validate page level checksums in base backups

2018-04-05 Thread Alvaro Herrera
Michael Banck wrote:
> Hi,

> > Do we have a precedent somewhere for how we do this, or does our test
> > framework already have a way to do it? How are all the actual data
> > directories etc cleaned up?
> 
> They (and the base backups) are getting purged on success of the whole
> testsuite. So to be clear - we are not leaving behind 1 GB of disk space
> on success, but we use 1 GB of disk space during the test.

Yeah, but it means you force the OS to keep 1 GB of useless dung in
cache.  I would hope that creating 40 MB, deleting those, then creating
further 40 MB (lather, rinse, repeat) would not overall evict 1 GB from
my OS cache.  Unless the OS keeps the unlinked files in cache, which
would be stupid.

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pgsql: Validate page level checksums in base backups

2018-04-05 Thread Michael Banck
Hi,

On Thu, Apr 05, 2018 at 01:02:27PM +0200, Magnus Hagander wrote:
> On Wed, Apr 4, 2018 at 8:22 PM, Michael Banck 
> wrote:
> > Otherwise, I had a quick look and there is no obvious outlier; the
> > pgdata is 220 MB after the testrun (195 MB of which is WAL, maybe that
> > could be cut down somehow?) and the base backups are 22-40 MB each, and
> > there are around 20 of them, so that adds up to more than 750 MB.
>
> It certainly seems reasonable to delete the base backups once they're made,
> after each step, rather than keeping them around forever.

I had a look at this and found a copy-pasto in one of the test cases
while testing, patch attached.

I've also attached a second patch (that applies on top of the first)
that removes the base backups once they are no longer needed, also
attached (but see below).
 
> Do we have a precedent somewhere for how we do this, or does our test
> framework already have a way to do it? How are all the actual data
> directories etc cleaned up?

They (and the base backups) are getting purged on success of the whole
testsuite. So to be clear - we are not leaving behind 1 GB of disk space
on success, but we use 1 GB of disk space during the test.
 
> Or should it just be a matter of sprinkling some unlink() calls throughout
> the test file?

I used rmtree() from File::Path (which is also used by PostgresNode to
clean up) to remove them during the run.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index e2f1465472..8e2d1ddb2c 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -334,7 +334,7 @@ ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxf/pg_wal")),
 $node->command_ok(
 	[ 'pg_basebackup', '-D', "$tempdir/backupxs", '-X', 'stream' ],
 	'pg_basebackup -X stream runs');
-ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxf/pg_wal")),
+ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxs/pg_wal")),
 	'WAL files copied');
 $node->command_ok(
 	[ 'pg_basebackup', '-D', "$tempdir/backupxst", '-X', 'stream', '-Ft' ],
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index e2f1465472..afb392dbb3 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -3,6 +3,7 @@ use warnings;
 use Cwd;
 use Config;
 use File::Basename qw(basename dirname);
+use File::Path qw(rmtree);
 use PostgresNode;
 use TestLib;
 use Test::More tests => 104;
@@ -135,6 +136,7 @@ foreach my $filename (@tempRelationFiles)
 # Make sure existing backup_label was ignored.
 isnt(slurp_file("$tempdir/backup/backup_label"),
 	'DONOTCOPY', 'existing backup_label not copied');
+rmtree("$tempdir/backup");
 
 $node->command_ok(
 	[   'pg_basebackup', '-D', "$tempdir/backup2", '--waldir',
@@ -142,10 +144,13 @@ $node->command_ok(
 	'separate xlog directory');
 ok(-f "$tempdir/backup2/PG_VERSION", 'backup was created');
 ok(-d "$tempdir/xlog2/", 'xlog directory was created');
+rmtree("$tempdir/backup2");
+rmtree("$tempdir/xlog2");
 
 $node->command_ok([ 'pg_basebackup', '-D', "$tempdir/tarbackup", '-Ft' ],
 	'tar format');
 ok(-f "$tempdir/tarbackup/base.tar", 'backup tar was created');
+rmtree("$tempdir/tarbackup");
 
 $node->command_fails(
 	[ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-T=/foo" ],
@@ -212,6 +217,7 @@ SKIP:
 	ok(-f "$tempdir/tarbackup2/base.tar", 'backup tar was created');
 	my @tblspc_tars = glob "$tempdir/tarbackup2/[0-9]*.tar";
 	is(scalar(@tblspc_tars), 1, 'one tablespace tar was created');
+	rmtree("$tempdir/tarbackup2");
 
 	# Create an unlogged table to test that forks other than init are not copied.
 	$node->safe_psql('postgres',
@@ -281,6 +287,7 @@ SKIP:
 
 	ok( -d "$tempdir/backup1/pg_replslot",
 		'pg_replslot symlink copied as directory');
+	rmtree("$tempdir/backup1");
 
 	mkdir "$tempdir/tbl=spc2";
 	$node->safe_psql('postgres', "DROP TABLE test1;");
@@ -295,6 +302,7 @@ SKIP:
 	ok(-d "$tempdir/tbackup/tbl=spc2",
 		'tablespace with = sign was relocated');
 	$node->safe_psql('postgres', "DROP TABLESPACE tblspc2;");
+	rmtree("$tempdir/backup3");
 
 	mkdir "$tempdir/$superlongname";
 	$node->safe_psql('postgres',
@@ -303,12 +311,14 @@ SKIP:
 		[ 'pg_basebackup', '-D', "$tempdir/tarbackup_l3", '-Ft' ],
 		'pg_basebackup tar with long symlink target');
 	$node->safe_psql('postgres', "DROP TABLESPACE tblspc3;");
+	rmtree("$tempdir/tarbackup_l3");
 }
 
 $node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backupR", '-R' ],
 	'pg_basebackup -R runs');
 ok(-f "$tempdir/backupR/recovery.conf", 'recovery.conf was created');
 my $recover

Re: pgsql: Validate page level checksums in base backups

2018-04-05 Thread Magnus Hagander
On Wed, Apr 4, 2018 at 8:22 PM, Michael Banck 
wrote:

> Hi,
>
> On Wed, Apr 04, 2018 at 07:25:11PM +0200, Magnus Hagander wrote:
> > On Wed, Apr 4, 2018 at 3:47 PM, Alvaro Herrera 
> > wrote:
> > > Michael Banck wrote:
> > >
> > > > However, the pg_basebackup testsuite takes up 800+ MB to run,
> > >
> > > Uh, you're right.  This seems a bit over the top.  Can we reduce that
> > > without losing coverage?  We've gone to great lengths to avoid large
> > > amounts of data in tests elsewhere.
> >
> > That didn't come out of this patch, right? This is a pre-existing issue?
>
> It was around that ballpark before, but this patch probably made
> things worse as it adds four additional datadirs (around 40 MB each
> here) and we are close to 1 GB now.
>
> I haven't looked at the other testsuites, but if it is OK to clean up the
> basebackups after each set of tests succeeded, that would alleviate the
> problem.
>
> Otherwise, I had a quick look and there is no obvious outlier; the
> pgdata is 220 MB after the testrun (195 MB of which is WAL, maybe that
> could be cut down somehow?) and the base backups are 22-40 MB each, and
> there are around 20 of them, so that adds up to more than 750 MB.
>
>
It certainly seems reasonable to delete the base backups once they're made,
after each step, rather than keeping them around forever.

Do we have a precedent somewhere for how we do this, or does our test
framework already have a way to do it? How are all the actual data
directories etc cleaned up?

Or should it just be a matter of sprinkling some unlink() calls throughout
the test file?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: pgsql: Validate page level checksums in base backups

2018-04-04 Thread Michael Banck
Hi,

On Wed, Apr 04, 2018 at 07:25:11PM +0200, Magnus Hagander wrote:
> On Wed, Apr 4, 2018 at 3:47 PM, Alvaro Herrera 
> wrote:
> > Michael Banck wrote:
> >
> > > However, the pg_basebackup testsuite takes up 800+ MB to run,
> >
> > Uh, you're right.  This seems a bit over the top.  Can we reduce that
> > without losing coverage?  We've gone to great lengths to avoid large
> > amounts of data in tests elsewhere.
> 
> That didn't come out of this patch, right? This is a pre-existing issue?

It was around that ballpark before, but this patch probably made
things worse as it adds four additional datadirs (around 40 MB each
here) and we are close to 1 GB now.

I haven't looked at the other testsuites, but if it is OK to clean up the
basebackups after each set of tests succeeded, that would alleviate the
problem.

Otherwise, I had a quick look and there is no obvious outlier; the
pgdata is 220 MB after the testrun (195 MB of which is WAL, maybe that
could be cut down somehow?) and the base backups are 22-40 MB each, and
there are around 20 of them, so that adds up to more than 750 MB.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer



Re: pgsql: Validate page level checksums in base backups

2018-04-04 Thread Magnus Hagander
On Wed, Apr 4, 2018 at 3:47 PM, Alvaro Herrera 
wrote:

> Michael Banck wrote:
>
> > However, the pg_basebackup testsuite takes up 800+ MB to run,
>
> Uh, you're right.  This seems a bit over the top.  Can we reduce that
> without losing coverage?  We've gone to great lengths to avoid large
> amounts of data in tests elsewhere.
>

That didn't come out of this patch, right? This is a pre-existing issue?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: pgsql: Validate page level checksums in base backups

2018-04-04 Thread Alvaro Herrera
Michael Banck wrote:

> However, the pg_basebackup testsuite takes up 800+ MB to run,

Uh, you're right.  This seems a bit over the top.  Can we reduce that
without losing coverage?  We've gone to great lengths to avoid large
amounts of data in tests elsewhere.

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pgsql: Validate page level checksums in base backups

2018-04-04 Thread Michael Banck
Hi,

On Wed, Apr 04, 2018 at 11:38:35AM +0200, Magnus Hagander wrote:
> On Tue, Apr 3, 2018 at 10:48 PM, Michael Banck 
> wrote:
> 
> > Hi,
> >
> > On Tue, Apr 03, 2018 at 08:48:08PM +0200, Magnus Hagander wrote:
> > > On Tue, Apr 3, 2018 at 8:29 PM, Tom Lane  wrote:
> > > > I'd bet a good lunch that nondefault BLCKSZ would break it, as well,
> > > > since the way in which the corruption is induced is just guessing
> > > > as to where page boundaries are.
> > >
> > > Yeah, that might be a problem. Those should be calculated from the block
> > > size.
> > >
> > > > Also, scribbling on tables as sensitive as pg_class is just asking for
> > > > trouble IMO.  I don't see anything in this test, for example, that
> > > > prevents autovacuum from running and causing a PANIC before the test
> > > > can complete.  Even with AV off, there's a good chance that clobber-
> > > > cache-always animals will fall over because they do so many more
> > > > physical accesses to the system catalogs.  I'd suggest inducing the
> > > > corruption in some user table(s) that we can more tightly constrain
> > > > the source server's accesses to.
> > >
> > > Yeah, that seems like a good idea. And probably also shut the server down
> > > while writing the corruption, just in case.
> > >
> > > Will stick looking into that on my todo for when I'm back, unless beaten
> > to
> > > it. Michael, you want a stab at it?
> >
> > Attached is a patch which does that hopefully:
> >
> > 1. creates two user tables, one large enough for at least 6 blocks
> > (around 360kb), the other just one block.
> >
> > 2. stops the cluster before scribbling over its data and starts it
> > afterwards.
> >
> > 3. uses the blocksize (and the page header size) to determine offsets
> > for scribbling.
> >
> > I've tested it with blocksizes 8 and 32 now, the latter should make sure
> > that the first table is indeed large enough, but maybe something less
> > arbitrary than "10,000 integers" should be used?
> >
> > Anyway, sorry for the hassle.
> >
> 
> Applied, with the addition that I explicitly disabled autovacuum on those
> tables as well.
 
Thanks! It looks like there have been no further buildfarm failures so
far; let's see how this goes.

> We might want to enhance it further by calculating the figure 10,000 based
> on blocksize perhaps?

10,000 was roughly twice the size needed for 32k block sizes. If there
are concerns that this might not be enough, I am happy to invest some
more time here (next week probably). However, the pg_basebackup
testsuite takes up 800+ MB to run, so I don't see an urgent need to
optimize away 50-100 KB (which clearly everybody else thought as well)
if we are talking about disk space overhead.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer



Re: pgsql: Validate page level checksums in base backups

2018-04-04 Thread Magnus Hagander
On Tue, Apr 3, 2018 at 10:48 PM, Michael Banck 
wrote:

> Hi,
>
> On Tue, Apr 03, 2018 at 08:48:08PM +0200, Magnus Hagander wrote:
> > On Tue, Apr 3, 2018 at 8:29 PM, Tom Lane  wrote:
> > > I'd bet a good lunch that nondefault BLCKSZ would break it, as well,
> > > since the way in which the corruption is induced is just guessing
> > > as to where page boundaries are.
> >
> > Yeah, that might be a problem. Those should be calculated from the block
> > size.
> >
> > > Also, scribbling on tables as sensitive as pg_class is just asking for
> > > trouble IMO.  I don't see anything in this test, for example, that
> > > prevents autovacuum from running and causing a PANIC before the test
> > > can complete.  Even with AV off, there's a good chance that clobber-
> > > cache-always animals will fall over because they do so many more
> > > physical accesses to the system catalogs.  I'd suggest inducing the
> > > corruption in some user table(s) that we can more tightly constrain
> > > the source server's accesses to.
> >
> > Yeah, that seems like a good idea. And probably also shut the server down
> > while writing the corruption, just in case.
> >
> > Will stick looking into that on my todo for when I'm back, unless beaten
> to
> > it. Michael, you want a stab at it?
>
> Attached is a patch which does that hopefully:
>
> 1. creates two user tables, one large enough for at least 6 blocks
> (around 360kb), the other just one block.
>
> 2. stops the cluster before scribbling over its data and starts it
> afterwards.
>
> 3. uses the blocksize (and the page header size) to determine offsets
> for scribbling.
>
> I've tested it with blocksizes 8 and 32 now, the latter should make sure
> that the first table is indeed large enough, but maybe something less
> arbitrary than "10,000 integers" should be used?
>
> Anyway, sorry for the hassle.
>

Applied, with the addition that I explicitly disabled autovacuum on those
tables as well.

We might want to enhance it further by calculating the figure 10,000 based
on blocksize perhaps?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: pgsql: Validate page level checksums in base backups

2018-04-03 Thread David Steele

On 4/3/18 4:48 PM, Michael Banck wrote:


> Attached is a patch which does that hopefully:
>
> 1. creates two user tables, one large enough for at least 6 blocks
> (around 360kb), the other just one block.
>
> 2. stops the cluster before scribbling over its data and starts it
> afterwards.
>
> 3. uses the blocksize (and the page header size) to determine offsets
> for scribbling.

This patch looks reasonable to me.

> I've tested it with blocksizes 8 and 32 now, the latter should make sure
> that the first table is indeed large enough, but maybe something less
> arbitrary than "10,000 integers" should be used?


It might be quicker to just stop the cluster and then write out an 
arbitrary number of zero pages.  Zero pages are always considered valid 
so you can then corrupt whichever pages you want for testing.


--
-David
da...@pgmasters.net



Re: pgsql: Validate page level checksums in base backups

2018-04-03 Thread Michael Banck
Hi,

On Tue, Apr 03, 2018 at 08:48:08PM +0200, Magnus Hagander wrote:
> On Tue, Apr 3, 2018 at 8:29 PM, Tom Lane  wrote:
> > I'd bet a good lunch that nondefault BLCKSZ would break it, as well,
> > since the way in which the corruption is induced is just guessing
> > as to where page boundaries are.
> 
> Yeah, that might be a problem. Those should be calculated from the block
> size.
> 
> > Also, scribbling on tables as sensitive as pg_class is just asking for
> > trouble IMO.  I don't see anything in this test, for example, that
> > prevents autovacuum from running and causing a PANIC before the test
> > can complete.  Even with AV off, there's a good chance that clobber-
> > cache-always animals will fall over because they do so many more
> > physical accesses to the system catalogs.  I'd suggest inducing the
> > corruption in some user table(s) that we can more tightly constrain
> > the source server's accesses to.
> 
> Yeah, that seems like a good idea. And probably also shut the server down
> while writing the corruption, just in case.
> 
> Will stick looking into that on my todo for when I'm back, unless beaten to
> it. Michael, you want a stab at it?

Attached is a patch which does that hopefully:

1. creates two user tables, one large enough for at least 6 blocks
(around 360kb), the other just one block.

2. stops the cluster before scribbling over its data and starts it
afterwards.

3. uses the blocksize (and the page header size) to determine offsets
for scribbling.

I've tested it with blocksizes 8 and 32 now, the latter should make sure
that the first table is indeed large enough, but maybe something less
arbitrary than "10,000 integers" should be used?

Anyway, sorry for the hassle.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index 3162cdcd01..3fe49f68a7 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -406,19 +406,25 @@ like(
 my $checksum = $node->safe_psql('postgres', 'SHOW data_checksums;');
 is($checksum, 'on', 'checksums are enabled');
 
-# get relfilenodes of relations to corrupt
-my $pg_class = $node->safe_psql('postgres',
-	q{SELECT pg_relation_filepath('pg_class')}
+# create tables to corrupt and get their relfilenodes
+my $file_corrupt1 = $node->safe_psql('postgres',
+q{SELECT a INTO corrupt1 FROM generate_series(1,10000) AS a; SELECT pg_relation_filepath('corrupt1')}
 );
-my $pg_index = $node->safe_psql('postgres',
-	q{SELECT pg_relation_filepath('pg_index')}
+my $file_corrupt2 = $node->safe_psql('postgres',
+q{SELECT b INTO corrupt2 FROM generate_series(1,2) AS b; SELECT pg_relation_filepath('corrupt2')}
 );
 
+# set page header and block sizes
+my $pageheader_size = 24;
+my $block_size = $node->safe_psql('postgres', 'SHOW block_size;');
+
 # induce corruption
-open $file, '+<', "$pgdata/$pg_class";
-seek($file, 4000, 0);
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+open $file, '+<', "$pgdata/$file_corrupt1";
+seek($file, $pageheader_size, 0);
 syswrite($file, '\0\0\0\0\0\0\0\0\0');
 close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
 
 $node->command_checks_all([ 'pg_basebackup', '-D', "$tempdir/backup_corrupt"],
 	1,
@@ -428,13 +434,15 @@ $node->command_checks_all([ 'pg_basebackup', '-D', "$tempdir/backup_corrupt"],
 );
 
 # induce further corruption in 5 more blocks
-open $file, '+<', "$pgdata/$pg_class";
-my @offsets = (12192, 20384, 28576, 36768, 44960);
-foreach my $offset (@offsets) {
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+open $file, '+<', "$pgdata/$file_corrupt1";
+for my $i ( 1..5 ) {
+  my $offset = $pageheader_size + $i * $block_size;
   seek($file, $offset, 0);
   syswrite($file, '\0\0\0\0\0\0\0\0\0');
 }
 close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
 
 $node->command_checks_all([ 'pg_basebackup', '-D', "$tempdir/backup_corrupt2"],
 1,
@@ -444,10 +452,12 @@ $node->command_checks_all([ 'pg_basebackup', '-D', "$tempdir/backup_corrupt2"],
 );
 
 # induce corruption in a second file
-open $file, '+<', "$pgdata/$pg_index";
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+open $file, '+<', "$pgdata/$file_corrupt2";
 seek($file, 4000, 0);
 syswrite($file, '\0\0\0\0\0\0\0\0\0');
 close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
 
 $node->command_checks_all([ 'pg_basebackup', '-D', "$tempdir/backup_corrupt3"],
 1,
@@ -460,3 +470,6 @@ $node->command_checks_all([ 'pg_basebackup', '-D', "$tempdir/backup_corrupt3"],
 $node->command_ok(
 	[   'pg_basebackup', '-D', "$tempdir/backup_corrupt4", '-k' ],
 	'pg_basebackup with -k does not report checksum mismatch');
+
+

Re: pgsql: Validate page level checksums in base backups

2018-04-03 Thread Tom Lane
Magnus Hagander  writes:
> On Tue, Apr 3, 2018 at 8:29 PM, Tom Lane  wrote:
>> It's scribbling on the source cluster's disk files and assuming that that
>> translates one-for-one to what gets sent to the slave server --- but what
>> if some of the blocks that it modifies on-disk are resident in the
>> source's shared buffers?  I think you'd have to shut down the source and
>> then apply the corruption if you want stable results.

> It doesn't actually use a slave server as part of the tests.
> And basebackups don't read from the source's shared buffers, but it *does*
> read from the kernel buffers.

Right, so the failure case occurs when the source server writes back a
dirty page from its shared buffers to disk, in between the test script's
corrupting that page on-disk and then trying to read it.

Again, you'd have several orders of magnitude better chance of getting
reproducible behavior if you were targeting something that wasn't a
system catalog.

regards, tom lane



Re: pgsql: Validate page level checksums in base backups

2018-04-03 Thread Magnus Hagander
On Tue, Apr 3, 2018 at 8:29 PM, Tom Lane  wrote:

> Magnus Hagander  writes:
> > Yeah, there's clearly a second problem here.
>
> I think this test script is broken in many ways.
>
> It's scribbling on the source cluster's disk files and assuming that that
> translates one-for-one to what gets sent to the slave server --- but what
> if some of the blocks that it modifies on-disk are resident in the
> source's shared buffers?  I think you'd have to shut down the source and
> then apply the corruption if you want stable results.
>

It doesn't actually use a slave server as part of the tests.

And basebackups don't read from the source's shared buffers, but it *does*
read from the kernel buffers.


> I'd bet a good lunch that nondefault BLCKSZ would break it, as well,
> since the way in which the corruption is induced is just guessing
> as to where page boundaries are.
>

Yeah, that might be a problem. Those should be calculated from the block
size.


> Also, scribbling on tables as sensitive as pg_class is just asking for
> trouble IMO.  I don't see anything in this test, for example, that
> prevents autovacuum from running and causing a PANIC before the test
> can complete.  Even with AV off, there's a good chance that clobber-
> cache-always animals will fall over because they do so many more
> physical accesses to the system catalogs.  I'd suggest inducing the
> corruption in some user table(s) that we can more tightly constrain
> the source server's accesses to.
>

Yeah, that seems like a good idea. And probably also shut the server down
while writing the corruption, just in case.

Will put looking into that on my todo list for when I'm back, unless someone
beats me to it. Michael, you want a stab at it?

//Magnus


Re: pgsql: Validate page level checksums in base backups

2018-04-03 Thread Peter Geoghegan
On Tue, Apr 3, 2018 at 11:29 AM, Tom Lane  wrote:
> Also, scribbling on tables as sensitive as pg_class is just asking for
> trouble IMO.  I don't see anything in this test, for example, that
> prevents autovacuum from running and causing a PANIC before the test
> can complete.

+1

> Even with AV off, there's a good chance that clobber-
> cache-always animals will fall over because they do so many more
> physical accesses to the system catalogs.  I'd suggest inducing the
> corruption in some user table(s) that we can more tightly constrain
> the source server's accesses to.

I've simulated quite a few novel corruption scenarios using pg_hexedit
in the past year. It would be nice if pg_prewarm offered an easy way
to evict from shared_buffers for repeated testing. Obviously you can
accomplish the same thing in other ways, but that isn't ideal, and
particularly hinders automated testing.

-- 
Peter Geoghegan



Re: pgsql: Validate page level checksums in base backups

2018-04-03 Thread Tom Lane
Magnus Hagander  writes:
> Yeah, there's clearly a second problem here.

I think this test script is broken in many ways.

It's scribbling on the source cluster's disk files and assuming that that
translates one-for-one to what gets sent to the slave server --- but what
if some of the blocks that it modifies on-disk are resident in the
source's shared buffers?  I think you'd have to shut down the source and
then apply the corruption if you want stable results.

I'd bet a good lunch that nondefault BLCKSZ would break it, as well,
since the way in which the corruption is induced is just guessing
as to where page boundaries are.

Also, scribbling on tables as sensitive as pg_class is just asking for
trouble IMO.  I don't see anything in this test, for example, that
prevents autovacuum from running and causing a PANIC before the test
can complete.  Even with AV off, there's a good chance that clobber-
cache-always animals will fall over because they do so many more
physical accesses to the system catalogs.  I'd suggest inducing the
corruption in some user table(s) that we can more tightly constrain
the source server's accesses to.

regards, tom lane



Re: pgsql: Validate page level checksums in base backups

2018-04-03 Thread Magnus Hagander
On Tue, Apr 3, 2018 at 7:13 PM, Andres Freund  wrote:

> On 2018-04-03 11:52:26 +, Magnus Hagander wrote:
> > Validate page level checksums in base backups
> >
> > When base backups are run over the replication protocol (for example
> > using pg_basebackup), verify the checksums of all data blocks if
> > checksums are enabled. If checksum failures are encountered, log them
> > as warnings but don't abort the backup.
> >
> > This becomes the default behaviour in pg_basebackup (provided checksums
> > are enabled on the server), so add a switch (-k) to disable the checks
> > if necessary.
> >
> > Author: Michael Banck
> > Reviewed-By: Magnus Hagander, David Steele
> > Discussion: https://postgr.es/m/20180228180856.GE13784@nighthawk.caipicrew.dd-dns.de
>
> Hm, something isn't quite right with the test:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pogona&dt=2018-04-03%2016%3A10%3A01
>
> # WARNING:  6 total checksum verification failures
> # pg_basebackup: checksum error occured
> # '
> # doesn't match '(?^s:^WARNING.*7 total checksum verification failures)'
>

Yeah, there's clearly a second problem here.

I'm stuck in meetings most of tonight, so I might not get to it right away.
If someone else figures it out meanwhile, please do; otherwise I'll get to
it as soon as I can.


-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: pgsql: Validate page level checksums in base backups

2018-04-03 Thread Andres Freund
On 2018-04-03 11:52:26 +, Magnus Hagander wrote:
> Validate page level checksums in base backups
> 
> When base backups are run over the replication protocol (for example
> using pg_basebackup), verify the checksums of all data blocks if
> checksums are enabled. If checksum failures are encountered, log them
> as warnings but don't abort the backup.
> 
> This becomes the default behaviour in pg_basebackup (provided checksums
> are enabled on the server), so add a switch (-k) to disable the checks
> if necessary.
> 
> Author: Michael Banck
> Reviewed-By: Magnus Hagander, David Steele
> Discussion: 
> https://postgr.es/m/20180228180856.ge13...@nighthawk.caipicrew.dd-dns.de

Hm, something isn't quite right with the test:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pogona&dt=2018-04-03%2016%3A10%3A01

# WARNING:  6 total checksum verification failures
# pg_basebackup: checksum error occured
# '
# doesn't match '(?^s:^WARNING.*7 total checksum verification failures)'
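Pinning the exact failure count into the regex is what made the match brittle: one page repaired by a concurrent write-back and the "7" no longer appears. A sketch of a looser check (the log text is copied verbatim from the buildfarm output above, typo included; the relaxed pattern is just one possible direction, not the fix that was actually committed):

```python
import re

log = ("WARNING:  6 total checksum verification failures\n"
       "pg_basebackup: checksum error occured\n")

# Matching an exact count, as the failing test did, breaks as soon as the
# server heals one corrupted page before the backup reads it:
assert not re.search(r"^WARNING.*7 total checksum verification failures",
                     log, re.M | re.S)

# Capturing the count and only requiring "at least one failure" tolerates
# pages rewritten between the corruption and the backup:
m = re.search(r"(\d+) total checksum verification failures", log)
assert m and int(m.group(1)) >= 1
print(int(m.group(1)))  # 6
```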

Greetings,

Andres Freund