Re: More issues with pg_verify_checksums and checksum verification in base backups

2019-08-06 Thread Michael Paquier
On Tue, Aug 06, 2019 at 04:20:43PM -0400, Stephen Frost wrote:
> On Tue, Aug 6, 2019 at 15:45 Magnus Hagander  wrote:
>> When agreement cannot be found, perhaps a parameter is in order?
>>
>> That is, have the tool complain about such files by default but with a
>> HINT that it may or may not be a problem, and a switch that makes it stop
>> complaining?
> 
> WFM.

Fine by me.  I'd also rather not change the behavior that we have now
without the switch.
--
Michael


signature.asc
Description: PGP signature


Re: More issues with pg_verify_checksums and checksum verification in base backups

2019-08-06 Thread Stephen Frost
Greetings,

On Tue, Aug 6, 2019 at 15:45 Magnus Hagander  wrote:

> On Tue, Aug 6, 2019 at 6:07 PM Stephen Frost  wrote:
>
>> Greetings,
>>
>> * Andres Freund (and...@anarazel.de) wrote:
>> > On 2019-08-06 10:58:15 -0400, Stephen Frost wrote:
>> > > * Michael Banck (michael.ba...@credativ.de) wrote:
>> > > > Independently of the whitelist/blacklist question, I believe
>> > > > pg_checksums should not error out as soon as it encounters a weird
>> looking
>> > > > file, but either (i) still checksum it or (ii) skip it? Or is that
>> to be
>> > > > considered a pilot error and it's fine for pg_checksums to fold?
>> > >
>> > > imv, random files that we don't know about are exactly 'pilot error'
>> to
>> > > be complained about..  This is exactly why the whitelist idea falls
>> > > over.
>> >
>> > I still think this whole assumption is bad, and that you're fixing
>> > non-problems, and creating serious usability issues with zero benefits.
>>
>> I doubt we're going to get to agreement on this, unfortunately.
>>
>
> When agreement cannot be found, perhaps a parameter is in order?
>
> That is, have the tool complain about such files by default but with a
> HINT that it may or may not be a problem, and a switch that makes it stop
> complaining?
>

WFM.

Thanks!

Stephen

>


Re: More issues with pg_verify_checksums and checksum verification in base backups

2019-08-06 Thread Magnus Hagander
On Tue, Aug 6, 2019 at 6:07 PM Stephen Frost  wrote:

> Greetings,
>
> * Andres Freund (and...@anarazel.de) wrote:
> > On 2019-08-06 10:58:15 -0400, Stephen Frost wrote:
> > > * Michael Banck (michael.ba...@credativ.de) wrote:
> > > > Independently of the whitelist/blacklist question, I believe
> > > > pg_checksums should not error out as soon as it encounters a weird
> looking
> > > > file, but either (i) still checksum it or (ii) skip it? Or is that
> to be
> > > > considered a pilot error and it's fine for pg_checksums to fold?
> > >
> > > imv, random files that we don't know about are exactly 'pilot error' to
> > > be complained about..  This is exactly why the whitelist idea falls
> > > over.
> >
> > I still think this whole assumption is bad, and that you're fixing
> > non-problems, and creating serious usability issues with zero benefits.
>
> I doubt we're going to get to agreement on this, unfortunately.
>

When agreement cannot be found, perhaps a parameter is in order?

That is, have the tool complain about such files by default but with a HINT
that it may or may not be a problem, and a switch that makes it stop
complaining?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: More issues with pg_verify_checksums and checksum verification in base backups

2019-08-06 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2019-08-06 10:58:15 -0400, Stephen Frost wrote:
> > * Michael Banck (michael.ba...@credativ.de) wrote:
> > > Independently of the whitelist/blacklist question, I believe
> > > pg_checksums should not error out as soon as it encounters a weird looking
> > > file, but either (i) still checksum it or (ii) skip it? Or is that to be
> > > considered a pilot error and it's fine for pg_checksums to fold?
> > 
> > imv, random files that we don't know about are exactly 'pilot error' to
> > be complained about..  This is exactly why the whitelist idea falls
> > over.
> 
> I still think this whole assumption is bad, and that you're fixing
> non-problems, and creating serious usability issues with zero benefits.

I doubt we're going to get to agreement on this, unfortunately.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: More issues with pg_verify_checksums and checksum verification in base backups

2019-08-06 Thread Andres Freund
Hi,

On 2019-08-06 10:58:15 -0400, Stephen Frost wrote:
> * Michael Banck (michael.ba...@credativ.de) wrote:
> > Independently of the whitelist/blacklist question, I believe
> > pg_checksums should not error out as soon as it encounters a weird looking
> > file, but either (i) still checksum it or (ii) skip it? Or is that to be
> > considered a pilot error and it's fine for pg_checksums to fold?
> 
> imv, random files that we don't know about are exactly 'pilot error' to
> be complained about..  This is exactly why the whitelist idea falls
> over.

I still think this whole assumption is bad, and that you're fixing
non-problems, and creating serious usability issues with zero benefits.

Greetings,

Andres Freund




Re: More issues with pg_verify_checksums and checksum verification in base backups

2019-08-06 Thread Stephen Frost
Greetings,

* Michael Banck (michael.ba...@credativ.de) wrote:
> Independently of the whitelist/blacklist question, I believe
> pg_checksums should not error out as soon as it encounters a weird looking
> file, but either (i) still checksum it or (ii) skip it? Or is that to be
> considered a pilot error and it's fine for pg_checksums to fold?

imv, random files that we don't know about are exactly 'pilot error' to
be complained about..  This is exactly why the whitelist idea falls
over.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: More issues with pg_verify_checksums and checksum verification in base backups

2019-08-05 Thread Michael Paquier
On Sat, Aug 03, 2019 at 06:47:48PM +0200, Michael Banck wrote:
> first off, a bit of a meta-question: Did the whitelist approach die
> completely, or are we going to tackle it again for v13 or later?

At this stage, it is burried.  Amen.

> This is something I still have in the test suite of my pg_checksums
> fork, cause I never reverted that one from isRelFile() back to
> skipfile() (so it doesn't fail on the above cause `123.' is not
> considered a relation file worth checksumming).

We could actually fix this one.  It is confusing to have pg_checksums
generate a report about a segment number which is actually incorrect.

> Independently of the whitelist/blacklist question, I believe
> pg_checksums should not error out as soon as it encounters a weird looking
> file, but either (i) still checksum it or (ii) skip it? Or is that to be
> considered a pilot error and it's fine for pg_checksums to fold?

That's actually the distinctions between the black and white lists
which would have handled that.
--
Michael


signature.asc
Description: PGP signature


Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-11-29 Thread Michael Paquier
On Thu, Nov 29, 2018 at 11:03:54AM +0900, Michael Paquier wrote:
> Thanks David for the input.  I think that I will be able to finish
> wrapping and commit this stuff tomorrow.

And done.  I kept the split into two commits for clarity as suggested by
Stephen, as bisect would actually complain only when using EXEC_BACKEND,
and only for the test suite of pg_verify_checksums.
--
Michael


signature.asc
Description: PGP signature


Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-11-28 Thread Michael Paquier
On Wed, Nov 28, 2018 at 12:32:29PM -0500, David Steele wrote:
> Looks good to me.

Thanks David for the input.  I think that I will be able to finish
wrapping and commit this stuff tomorrow.
--
Michael


signature.asc
Description: PGP signature


Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-11-28 Thread David Steele
On 11/27/18 8:32 PM, Michael Paquier wrote:
> 
>>> Attached are two patches to fix all the mess:
>>> - 0001 is a revert of the whitelist, minus the set of regression tests
>>> checking after corrupted files and empty files.
>>> - 0002 is a fix for all the issues reported on this thread, with tests
>>> added (including the tablespace test from Michael Banck):
>>> -- Base backups gain EXEC_BACKEND files in their warning filters.
>>> -- pg_verify_checksums gains the same files.
>>> -- temporary files are filtered out.
>>> -- pg_verify_checksums performs filtering checks only on regular files,
>>> not on paths.
>>>
>>> 0001 and 0002 need to be merged as 0001 would cause the buildfarm to
>>> turn red on Windows if applied alone.  Can you know see my point?
>>
>> Yes, I think they could be merged to address that, though I'm not sure
>> that it's necessairly a huge deal either, if they're going to be pushed
>> together.
> 
> This avoids noise failures when bisecting a regression, which matters in
> some cases.  To keep the history cleaner perhaps you are right and it
> would be cleaner to split into two commits.
> 
> Let's wait a bit and see if others have extra opinions to offer.

Looks good to me.

-- 
-David
da...@pgmasters.net



signature.asc
Description: OpenPGP digital signature


Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-11-27 Thread Michael Paquier
On Tue, Nov 27, 2018 at 08:17:12PM -0500, Stephen Frost wrote:
> * Michael Paquier (mich...@paquier.xyz) wrote:
>> Please see 0002 attached, which moves the call to skipfile() where I
>> think it should go.
> 
> Alright, on a quick glance that seems ok.

Thanks.

>> Base backups are impacted as well as this causes spurious warnings.
> 
> Right- could you please also add comments around the two lists to make
> other hackers aware that the list shows up in two places and that any
> changes should be made in both places..?

Good point.  Added in my local branch.

>> Attached are two patches to fix all the mess:
>> - 0001 is a revert of the whitelist, minus the set of regression tests
>> checking after corrupted files and empty files.
>> - 0002 is a fix for all the issues reported on this thread, with tests
>> added (including the tablespace test from Michael Banck):
>> -- Base backups gain EXEC_BACKEND files in their warning filters.
>> -- pg_verify_checksums gains the same files.
>> -- temporary files are filtered out.
>> -- pg_verify_checksums performs filtering checks only on regular files,
>> not on paths.
>> 
>> 0001 and 0002 need to be merged as 0001 would cause the buildfarm to
>> turn red on Windows if applied alone.  Can you know see my point?
> 
> Yes, I think they could be merged to address that, though I'm not sure
> that it's necessairly a huge deal either, if they're going to be pushed
> together.

This avoids noise failures when bisecting a regression, which matters in
some cases.  To keep the history cleaner perhaps you are right and it
would be cleaner to split into two commits.

Let's wait a bit and see if others have extra opinions to offer.
--
Michael


signature.asc
Description: PGP signature


Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-11-27 Thread Stephen Frost
Greetings,

* Michael Paquier (mich...@paquier.xyz) wrote:
> On Tue, Nov 27, 2018 at 06:27:57PM -0500, Stephen Frost wrote:
> > * Michael Paquier (mich...@paquier.xyz) wrote:
> >> Believe me or not, but we have spent so much energy on this stuff that I
> >> am ready to give up on the whitelist patch and focus on other business.
> > 
> > I would have hoped that you'd see why I was concerned from the start
> > about this now that we have a released version of pg_verify_checksums
> > in 11.1 that is paper-bag-worthy.
> 
> That's unrelated.  Let's be clear, I still don't like the fact that we
> don't use a whitelist approach, per the arguments raised upthread.  The
> tool also clearly lacked testing and coverage from the beginning and has
> never been able to work on Windows, which was a very bad decision.
> Still we need to live with that and I take my share of the blame,
> willing to make this stuff work better for all.

I don't agree that it's unrelated and I'm disappointed that you feel it
is, but I'm not going to belabor the point.

> >> - skipfile should be called after making sure that we work on a file.
> > 
> > It's not clear to me what this is referring to, exactly...?  Can you
> > clarify?
> 
> Please see 0002 attached, which moves the call to skipfile() where I
> think it should go.

Alright, on a quick glance that seems ok.

> >> - it is necessary to exclude EXEC_BACKEND files.
> > 
> > Agreed, they should be added to the exclude list.
> 
> Base backups are impacted as well as this causes spurious warnings.

Right- could you please also add comments around the two lists to make
other hackers aware that the list shows up in two places and that any
changes should be made in both places..?

> Attached are two patches to fix all the mess:
> - 0001 is a revert of the whitelist, minus the set of regression tests
> checking after corrupted files and empty files.
> - 0002 is a fix for all the issues reported on this thread, with tests
> added (including the tablespace test from Michael Banck):
> -- Base backups gain EXEC_BACKEND files in their warning filters.
> -- pg_verify_checksums gains the same files.
> -- temporary files are filtered out.
> -- pg_verify_checksums performs filtering checks only on regular files,
> not on paths.
> 
> 0001 and 0002 need to be merged as 0001 would cause the buildfarm to
> turn red on Windows if applied alone.  Can you know see my point?

Yes, I think they could be merged to address that, though I'm not sure
that it's necessairly a huge deal either, if they're going to be pushed
together.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-11-27 Thread Michael Paquier
On Tue, Nov 27, 2018 at 06:27:57PM -0500, Stephen Frost wrote:
> * Michael Paquier (mich...@paquier.xyz) wrote:
>> Believe me or not, but we have spent so much energy on this stuff that I
>> am ready to give up on the whitelist patch and focus on other business.
> 
> I would have hoped that you'd see why I was concerned from the start
> about this now that we have a released version of pg_verify_checksums
> in 11.1 that is paper-bag-worthy.

That's unrelated.  Let's be clear, I still don't like the fact that we
don't use a whitelist approach, per the arguments raised upthread.  The
tool also clearly lacked testing and coverage from the beginning and has
never been able to work on Windows, which was a very bad decision.
Still we need to live with that and I take my share of the blame,
willing to make this stuff work better for all.

>> - skipfile should be called after making sure that we work on a file.
> 
> It's not clear to me what this is referring to, exactly...?  Can you
> clarify?

Please see 0002 attached, which moves the call to skipfile() where I
think it should go.

>> - it is necessary to exclude EXEC_BACKEND files.
> 
> Agreed, they should be added to the exclude list.

Base backups are impacted as well as this causes spurious warnings.

Attached are two patches to fix all the mess:
- 0001 is a revert of the whitelist, minus the set of regression tests
checking after corrupted files and empty files.
- 0002 is a fix for all the issues reported on this thread, with tests
added (including the tablespace test from Michael Banck):
-- Base backups gain EXEC_BACKEND files in their warning filters.
-- pg_verify_checksums gains the same files.
-- temporary files are filtered out.
-- pg_verify_checksums performs filtering checks only on regular files,
not on paths.

0001 and 0002 need to be merged as 0001 would cause the buildfarm to
turn red on Windows if applied alone.  Can you know see my point?
--
Michael
From ad07e7c4ef9c40e2fc13ff59bb51a071ab69ebbe Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Wed, 28 Nov 2018 09:32:44 +0900
Subject: [PATCH 1/2] Switch pg_verify_checksums back to a blacklist

This basically reverts commit d55241af705667d4503638e3f77d3689fd6be31,
leaving around a portion of the regression tests still adapted with
empty relation files, and corrupted cases.
---
 .../pg_verify_checksums/pg_verify_checksums.c | 77 ---
 src/bin/pg_verify_checksums/t/002_actions.pl  | 17 
 2 files changed, 17 insertions(+), 77 deletions(-)

diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
index f0e09bea20..1bc020ab6c 100644
--- a/src/bin/pg_verify_checksums/pg_verify_checksums.c
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -15,7 +15,6 @@
 
 #include "catalog/pg_control.h"
 #include "common/controldata_utils.h"
-#include "common/relpath.h"
 #include "getopt_long.h"
 #include "pg_getopt.h"
 #include "storage/bufpage.h"
@@ -50,69 +49,27 @@ usage(void)
 	printf(_("Report bugs to .\n"));
 }
 
-/*
- * isRelFileName
- *
- * Check if the given file name is authorized for checksum verification.
- */
+static const char *const skip[] = {
+	"pg_control",
+	"pg_filenode.map",
+	"pg_internal.init",
+	"PG_VERSION",
+	NULL,
+};
+
 static bool
-isRelFileName(const char *fn)
+skipfile(const char *fn)
 {
-	int			pos;
+	const char *const *f;
 
-	/*--
-	 * Only files including data checksums are authorized for verification.
-	 * This is guessed based on the file name by reverse-engineering
-	 * GetRelationPath() so make sure to update both code paths if any
-	 * updates are done.  The following file name formats are allowed:
-	 * 
-	 * .
-	 * _
-	 * _.
-	 *
-	 * Note that temporary files, beginning with 't', are also skipped.
-	 *
-	 *--
-	 */
-
-	/* A non-empty string of digits should follow */
-	for (pos = 0; isdigit((unsigned char) fn[pos]); ++pos)
-		;
-	/* leave if no digits */
-	if (pos == 0)
-		return false;
-	/* good to go if only digits */
-	if (fn[pos] == '\0')
+	if (strcmp(fn, ".") == 0 ||
+		strcmp(fn, "..") == 0)
 		return true;
 
-	/* Authorized fork files can be scanned */
-	if (fn[pos] == '_')
-	{
-		int			forkchar = forkname_chars([pos + 1], NULL);
-
-		if (forkchar <= 0)
-			return false;
-
-		pos += forkchar + 1;
-	}
-
-	/* Check for an optional segment number */
-	if (fn[pos] == '.')
-	{
-		int			segchar;
-
-		for (segchar = 1; isdigit((unsigned char) fn[pos + segchar]); ++segchar)
-			;
-
-		if (segchar <= 1)
-			return false;
-		pos += segchar;
-	}
-
-	/* Now this should be the end */
-	if (fn[pos] != '\0')
-		return false;
-	return true;
+	for (f = skip; *f; f++)
+		if (strcmp(*f, fn) == 0)
+			return true;
+	return false;
 }
 
 static void
@@ -189,7 +146,7 @@ scan_directory(const char *basedir, const char *subdir)
 		char		fn[MAXPGPATH];
 		struct stat st;
 
-		if (!isRelFileName(de->d_name))
+		if (skipfile(de->d_name))
 			continue;
 
 		snprintf(fn, sizeof(fn), "%s/%s", path, 

Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-11-27 Thread Stephen Frost
Greetings,

* Michael Paquier (mich...@paquier.xyz) wrote:
> On Tue, Nov 27, 2018 at 05:45:41PM -0500, Stephen Frost wrote:
> > This doesn't exactly change my opinion regarding this discussion and I'd
> > rather we revert the "whitelist" patch and use the very minimal patch
> > from here:
> > 
> > https://www.postgresql.org/message-id/20181012013918.GA30064%40paquier.xyz
> 
> Believe me or not, but we have spent so much energy on this stuff that I
> am ready to give up on the whitelist patch and focus on other business.

I would have hoped that you'd see why I was concerned from the start
about this now that we have a released version of pg_verify_checksums
in 11.1 that is paper-bag-worthy.

> This doesn't change a couple of things though, so it is not *just* a
> simple revert with the patch you mention applied:
> - Adding a test for tablespaces makes sense.

I agree with this, of course.

> - skipfile should be called after making sure that we work on a file.

It's not clear to me what this is referring to, exactly...?  Can you
clarify?

> - temporary files and temporary paths should be ignored.

There's example code for how to do this in basebackup.c

> - it is necessary to exclude EXEC_BACKEND files.

Agreed, they should be added to the exclude list.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-11-27 Thread Michael Paquier
On Tue, Nov 27, 2018 at 05:45:41PM -0500, Stephen Frost wrote:
> This doesn't exactly change my opinion regarding this discussion and I'd
> rather we revert the "whitelist" patch and use the very minimal patch
> from here:
> 
> https://www.postgresql.org/message-id/20181012013918.GA30064%40paquier.xyz

Believe me or not, but we have spent so much energy on this stuff that I
am ready to give up on the whitelist patch and focus on other business.
This doesn't change a couple of things though, so it is not *just* a
simple revert with the patch you mention applied:
- Adding a test for tablespaces makes sense.
- skipfile should be called after making sure that we work on a file.
- temporary files and temporary paths should be ignored.
- it is necessary to exclude EXEC_BACKEND files.
--
Michael


signature.asc
Description: PGP signature


Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-11-27 Thread Stephen Frost
Greetings,

* Michael Paquier (mich...@paquier.xyz) wrote:
> On Mon, Nov 19, 2018 at 10:17:19PM -0500, Stephen Frost wrote:
> > Let's try to not conflate these two issues though, they're quite
> > independent.
> 
> This is a poke about a recent issue raised by Michael Banck here:
> https://www.postgresql.org/message-id/f1543332405.17247.9.ca...@credativ.de
> And for which I am suggesting a minimal fix, which is extracted from a
> patch at the most-top of this thread:
> https://www.postgresql.org/message-id/20181127213625.gm1...@paquier.xyz
> 
> If there are no objections, I would like to fix the actual issue.  Then
> I'll rebase a patch on top of what has been fixed for this thread for
> what I proposed in the base backup code.

So the as-committed "whitelist" approach did, in fact, end up excluding
huge portions of the database from being checked, that is, everything
inside of tablespaces.

This doesn't exactly change my opinion regarding this discussion and I'd
rather we revert the "whitelist" patch and use the very minimal patch
from here:

https://www.postgresql.org/message-id/20181012013918.GA30064%40paquier.xyz

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-11-27 Thread Michael Paquier
On Mon, Nov 19, 2018 at 10:17:19PM -0500, Stephen Frost wrote:
> Let's try to not conflate these two issues though, they're quite
> independent.

This is a poke about a recent issue raised by Michael Banck here:
https://www.postgresql.org/message-id/f1543332405.17247.9.ca...@credativ.de
And for which I am suggesting a minimal fix, which is extracted from a
patch at the most-top of this thread:
https://www.postgresql.org/message-id/20181127213625.gm1...@paquier.xyz

If there are no objections, I would like to fix the actual issue.  Then
I'll rebase a patch on top of what has been fixed for this thread for
what I proposed in the base backup code.
--
Michael


signature.asc
Description: PGP signature


Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-11-19 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2018-11-19 21:18:43 -0500, Stephen Frost wrote:
> > As has been mentioned elsewhere, there's really a 'right' way to do
> > things and allowing PG to be 'extensible' by simply ignoring random
> > files showing up isn't that- if we want PG to be extensible in this way
> > then we need to provide a mechanism for that to happen.
> 
> I still don't buy this argument. I'm giving up here, as I just don't
> have enough energy to keep up with this discussion.
> 
> FWIW, I think it's bad, that we don't error out on checksum failures in
> basebackups by default. And that's only really feasible with a
> whitelisting approach.

No, we could error out on checksum failures in either approach, but we
explicitly don't with good reason: if you're doing a backup, you
probably want to actually capture the current data.

This is something we've thought quite a bit about.  In fact, as I
recall, the original pg_basebackup code actually *did* error out, even
with the blacklist approach, and we made a solid argument which was
ultimately agreed to by those involved at the time that erroring out
half-way through was a bad idea.

What we do instead is exit with a non-zero exit code to make it clear
that there was an issue, to allow the user to capture that and raise
alarms, but to still have all of the data which we were able to
capture in the hopes that the backup is at least salvagable.  In
addition, at least in pgbackrest, we don't consider such a backup to be
pristine and therefore we don't expire out the prior backups- we don't
do any backup expiration in pg_basebackup, so it's up to the user to
make sure that if pg_basebackup exits with a non-zero exit code that
they capture and handle that and *don't* blow away a previously good
backup.

The very last thing *any* backup tool should do is give up half-way
through and throw a nasty error, leaving you with the knowledge that
your system is hosed *and* no backup of what was there exist and
making it extremely likely that whatever corruption exists is being
propagated further.

Let's try to not conflate these two issues though, they're quite
independent.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-11-19 Thread Andres Freund
Hi,

On 2018-11-19 21:18:43 -0500, Stephen Frost wrote:
> As has been mentioned elsewhere, there's really a 'right' way to do
> things and allowing PG to be 'extensible' by simply ignoring random
> files showing up isn't that- if we want PG to be extensible in this way
> then we need to provide a mechanism for that to happen.

I still don't buy this argument. I'm giving up here, as I just don't
have enough energy to keep up with this discussion.

FWIW, I think it's bad, that we don't error out on checksum failures in
basebackups by default. And that's only really feasible with a
whitelisting approach.

Greetings,

Andres Freund



Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-11-19 Thread Stephen Frost
Greetings Michael,

* Michael Paquier (mich...@paquier.xyz) wrote:
> I am still a fan of the whitelist approach as there is no actual point
> in restricting what people can do with Postgres in terms of
> extensibility (relying on tablespace paths for storage plugin looks like
> an important thing to me, and we would close doors with a black list,
> causing warnings to be generated for basically everything which is not
> from heap).  What worries me the most is actually the fact that we have
> not heard from the original authors of the pg_verify_checksums what they
> think on the matter and how we ought to do things, because their
> opinion is important.  If there is a clear agreement for the direction
> to take, I am of course perfectly fine if the conclusion is the opposite
> of what I think, but a 3vs2, (3vs3 if I count Andres) is kind of hard to
> conclude that we have an actual agreement. 

I can understand that we want PostgreSQL to be extensible, but as David
pointed out up-thread, what we've actually seen in the wild are cases
where random files have mistakenly ended up in the data directory and
those have been cases where it's been quite good to have the warnings
thrown to indicate that there's been some mistake.  I don't think we do
our users any service by simply ignoring random files showing up in the
data directories.

As has been mentioned elsewhere, there's really a 'right' way to do
things and allowing PG to be 'extensible' by simply ignoring random
files showing up isn't that- if we want PG to be extensible in this way
then we need to provide a mechanism for that to happen.

While I'd also like to hear from the authors of pg_verify_checksums as
to their thoughts, I'm guessing that they're mostly watching from the
sidelines while we discuss and not wanting to end up picking the wrong
side.

When it comes to what we typically do, at least imv, when there's an
impass or a disagreement of approaches is to actually not move forward
with one side of it over what was in place previously.  David, in
particular, was certainly involved in the verify checksums work and in
the changes for pg_basebackup, having had quite a bit of experience
implementing that same mechanism in pgbackrest quite a while before it
got into PG proper.  That real-world experience with exactly this
feature is really quite relevant, imv.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-11-19 Thread Michael Paquier
On Mon, Nov 19, 2018 at 08:45:29PM -0500, Stephen Frost wrote:
> Michael, this obviously didn't happen and instead we ended up releasing
> 11.1 with your changes, but I don't feel like this issue is closed and
> I'm a bit disappointed that there hasn't been any further responses or
> discussions on this.

This issue is not closed.

> I discussed this at length with David and Amit, both of whom have now
> also commented on this issue, at PGConf.Eu, but still there hasn't been
> a response from you.  Is your thought here that your lack of response
> should be taken as meaning I should simply revert your commit and then
> commit your earlier patch to just add the param file?  While we
> typically take silence as acceptance, it's a bit different when it comes
> to reverting someone else's change, at least to my mind.
>
> I'm happy to go ahead and make those changes if there's no disagreement
> regarding it.

Well, I did not have any express feeling to offer a response as it seems
to me that the topic of how to handle things has not moved a iota to an
agreement.  From what I can see, there are still two school of thoughts:
- Use a white list.  Individuals which have expressed an interest on
this approach, based on the status of this thread are myself, Kyotaro
Horiguchi.  And at least a third person which I think would prefer the
white-list approach is Andres, but he did not reply directly to this
thread.
- Use a black list, which a least a set of three people have expressed
an opinion about on this thread: Amit Kapila, David Steele and
yourself.

> Also, just to be clear, I don't intend this with any animosity and I
> certainly understand if it just has fallen through the cracks or been
> lost in the shuffle but I really don't like the implication put forward
> that we're happy to not throw any kind of warning or notice about random
> files showing up in the PG data directories.

Don't worry about that!  Thanks for trying to make this thread moving
on.

I am still a fan of the whitelist approach as there is no actual point
in restricting what people can do with Postgres in terms of
extensibility (relying on tablespace paths for storage plugin looks like
an important thing to me, and we would close doors with a black list,
causing warnings to be generated for basically everything which is not
from heap).  What worries me the most is actually the fact that we have
not heard from the original authors of the pg_verify_checksums what they
think on the matter and how we ought to do things, because their
opinion is important.  If there is a clear agreement for the direction
to take, I am of course perfectly fine if the conclusion is the opposite
of what I think, but a 3vs2, (3vs3 if I count Andres) is kind of hard to
conclude that we have an actual agreement. 
--
Michael


signature.asc
Description: PGP signature


Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-11-19 Thread Stephen Frost
Greetings,

* David Steele (da...@pgmasters.net) wrote:
> On 10/30/18 11:59 AM, Stephen Frost wrote:
> > * Kyotaro HORIGUCHI (horiguchi.kyot...@lab.ntt.co.jp) wrote:
> >> So I'm +1 for the Michael's current patch as (I think) we can't
> >> make visible or large changes.
> >>
> >> That said, I agree with Stephen's concern on the point we could
> >> omit requried files in future, but on the other hand I don't want
> >> random files are simply rejected.
> > 
> > They aren't rejected- there's a warning thrown about them.
> 
> pgBackRest has been using a whitelist/blacklist method for identifying
> checksummable files for almost 2 years we haven't seen any issues.  The
> few times a "random" file appeared in the logs with checksum warnings it
> was later identified as having been mistakenly copied into $PGDATA.  The
> backup still completed successfully in these cases.
> 
> So to be clear, we whitelist the global, base, and pg_tblspc dirs and
> blacklist PG_VERSION, pg_filenode.map, pg_internal.init, and pg_control
> (just for global) when deciding which files to checksum.  Recently we
> added logic to exclude unlogged and temporary relations as well, though
> that's not required.
> 
> For PG11 I would recommend just adding the param file generated by exec
> backend to the black list for both pg_basebackup and pg_verifychecksums,
> then create a common facility for blacklisting for PG12.

Michael, this obviously didn't happen and instead we ended up releasing
11.1 with your changes, but I don't feel like this issue is closed and
I'm a bit disappointed that there hasn't been any further responses or
discussions on this.

I discussed this at length with David and Amit, both of whom have now
also commented on this issue, at PGConf.Eu, but still there hasn't been
a response from you.  Is your thought here that your lack of response
should be taken as meaning I should simply revert your commit and then
commit your earlier patch to just add the param file?  While we
typically take silence as acceptance, it's a bit different when it comes
to reverting someone else's change, at least to my mind.

I'm happy to go ahead and make those changes if there's no disagreement
regarding it.

Also, just to be clear, I don't intend this with any animosity and I
certainly understand if it just has fallen through the cracks or been
lost in the shuffle but I really don't like the implication put forward
that we're happy to not throw any kind of warning or notice about random
files showing up in the PG data directories.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-11-01 Thread Michael Paquier
On Thu, Nov 01, 2018 at 04:44:40PM +0530, Amit Kapila wrote:
> This sounds like a good argument for having a whitelist approach, but
> is it really a big problem if a user gets warning for files that the
> utility is not able to verify checksums for?  I think in some sense
> this message can be useful to the user as it can allow him to know
> which files are not verified by the utility for some form of
> corruption.  I guess one can say that users might not be interested in
> this information in which case such a check could be optional as you
> seem to be suggesting in the following paragraph.

The replication protocol supports NOVERIFY_CHECKSUMS to avoid the
warnings so they enabled by default, and can be disabled at will.
pg_basebackup supports the same interface.
--
Michael


signature.asc
Description: PGP signature


Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-11-01 Thread Amit Kapila
On Sun, Oct 21, 2018 at 7:12 PM Michael Paquier  wrote:
>
> Hi all,
>
> This is a follow-up of the following thread:
> https://www.postgresql.org/message-id/20181012010411.re53cwcistcpi...@alap3.anarazel.de
>
> In a nutshell, b34e84f1 has added TAP tests for pg_verify_checksums, and
> the buildfarm has immediately complained about files generated by
> EXEC_BACKEND missing, causing pg_verify_checksums to fail for such
> builds.  An extra issue has been noticed by Andres in the tool: it
> misses to check for temporary files, hence files like
> base/pgsql_tmp/pgsql_tmp$PID.NN.sharedfileset/1.1 can also cause the
> tool to fail.  After a crash, those files would not be removed, and
> stopping the instance would still let them around.
>
> pg_verify_checksums used first a blacklist to decide if files have
> checksums or not.  So with this approach all files should have checksums
> except files like pg_control, pg_filenode.map, PG_VERSION, etc.
>
> d55241a has added a first fix to silence the buildfarm for EXEC_BACKEND
> builds.  After discussion, Andres has pointed out that some extensions
> may want to drop files within global/ or base/ as part of their logic.
> cstore_fdw was mentioned but if you look at their README that's not the
> case.  However, I think that Andres argument is pretty good as with
> pluggable storage we should allow extensions to put custom files for
> different tablespaces.  So this commit has changed the logic of
> pg_verify_checksums to use a whitelist, which assumes that only normal
> relation files have checksums:
> 
> .
> _
> _.
> After more discussion on the thread mentioned above, Stephen has pointed
> out that base backups use the same blacklist logic when verifying
> checksums.  This has the same problem with EXEC_BACKEND-specific files,
> resulting in spurious warnings (that's an example!) which are confusing
> for the user:
> $ pg_basebackup -D popo
> WARNING:  cannot verify checksum in file "./global/config_exec_params",
> block 0: read buffer size 5 and page size 8192 differ
>
> Folks on this thread agreed that both pg_verify_checksums and checksum
> verification for base backups should use the same logic.  It seems to me
> that using a whitelist-based approach for both is easier to maintain as
> the patterns of files supporting checksums is more limited than files
> which do not support checksums.  So this way we allow extensions
> bundling custom files to still work with pg_verify_checksums and
> checksum verification in base backups.
>

This sounds like a good argument for having a whitelist approach, but
is it really a big problem if a user gets warning for files that the
utility is not able to verify checksums for?  I think in some sense
this message can be useful to the user as it can allow him to know
which files are not verified by the utility for some form of
corruption.  I guess one can say that users might not be interested in
this information in which case such a check could be optional as you
seem to be suggesting in the following paragraph.

> Something else which has been discussed on this thread is that we should
> have a dedicated API to decide if a file has checksums or not, and if it
> should be skipped in both cases.  That's work only for HEAD though, so
> we need to do something for HEAD and v11, which is simple.
>

+1.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-10-30 Thread David Steele
On 10/30/18 11:59 AM, Stephen Frost wrote:
> 
> * Kyotaro HORIGUCHI (horiguchi.kyot...@lab.ntt.co.jp) wrote:
>>
>> So I'm +1 for the Michael's current patch as (I think) we can't
>> make visible or large changes.
>>
>> That said, I agree with Stephen's concern on the point we could
>> omit requried files in future, but on the other hand I don't want
>> random files are simply rejected.
> 
> They aren't rejected- there's a warning thrown about them.

pgBackRest has been using a whitelist/blacklist method for identifying
checksummable files for almost 2 years we haven't seen any issues.  The
few times a "random" file appeared in the logs with checksum warnings it
was later identified as having been mistakenly copied into $PGDATA.  The
backup still completed successfully in these cases.

So to be clear, we whitelist the global, base, and pg_tblspc dirs and
blacklist PG_VERSION, pg_filenode.map, pg_internal.init, and pg_control
(just for global) when deciding which files to checksum.  Recently we
added logic to exclude unlogged and temporary relations as well, though
that's not required.

For PG11 I would recommend just adding the param file generated by exec
backend to the black list for both pg_basebackup and pg_verifychecksums,
then create a common facility for blacklisting for PG12.

I'm not very excited about the idea of encouraging extensions to drop
files in the postgres relation directories (base, global, pg_tblspc).
If we don't say we support it then in my mind that means we don't.
There are lots of ways extension authors could make naming mistakes that
would lead to their files being cleaned up by Postgres at startup or
included in a DROP DATABASE.

I am OK with allowing an extension directory for each tablespace/db dir
where extensions can safe drop files for PG12, if we decide that's
something worth doing.

Regards,
-- 
-David
da...@pgmasters.net



signature.asc
Description: OpenPGP digital signature


Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-10-30 Thread Stephen Frost
Greetings,

* Kyotaro HORIGUCHI (horiguchi.kyot...@lab.ntt.co.jp) wrote:
> At Wed, 24 Oct 2018 14:31:37 +0900, Michael Paquier  
> wrote in <20181024053137.gl1...@paquier.xyz>
> > On Sun, Oct 21, 2018 at 08:56:32PM -0400, Stephen Frost wrote:
> > > All of this pie-in-the-sky about what pluggable storage might have is
> > > just hand-waving, in my opinion, and not worth much more than that.  I
> > > hope (and suspect..) that the actual pluggable storage that's being
> > > worked on doesn't do any of this "just drop a file somewhere" because
> > > there's a lot of downsides to it- and if it did, it wouldn't be much
> > > more than what we can do with an FDW, so why go through and add it?
> > 
> > Well, there is no point in enforcing a rule that something is forbidden
> > if if was never implied and never documented (the rule here is to be
> > able to drop custom files into global/, base/ or pg_tblspc.).  Postgres
> > is highly-customizable, I would prefer if features in core are designed
> > so as we keep things extensible, the checksum verification for base
> > backup on the contrary restricts things.
> > 
> > So, do we have other opinions about this thread?
> 
> I believe of freedom for anyone of dropping any files into
> anywhere in $PGDATA if he thinks it sefe for the time
> being. Changes in core sometimes breaks some extensions and they
> used to be 'fixed' in the manner or claim for a replacement to
> core. This is the same for changes of (undocumented) APIs. I
> think things have been going around in this manner for years and
> I don't think core side is unable to define a definite border of
> what extensions are allowed to do since we are not knowledgeable
> of all extensions that will be created in future or that have
> created.
> 
> So I'm +1 for the Michael's current patch as (I think) we can't
> make visible or large changes.
> 
> 
> That said, I agree with Stephen's concern on the point we could
> omit requried files in future, but on the other hand I don't want
> random files are simply rejected.

They aren't rejected- there's a warning thrown about them.

> So I propose to sort files into roughly three categories. One is
> files we know of but to skip. Second is files we know of and need
> checksum verification. The last is the files we just don't know of.

That last set really should be empty though.

> In the attached PoC (checksum_)find_file_type categorizes a file
> into 6 (or 7) categories.

I'm certainly not thrilled with the idea of adding a bunch more code to
v11 for this.

> ENTRY_TO_IGNORE: to be always ignored, for "." and "..".
> FILE_TO_SKIP:we know of and to skip. files in the black list.
> DIR_TO_SKIP: directories same to the above.
> DIR_TO_SCAN:   we know any file to scan may be in it.
> HEAP_TO_SCAN:  we know it has checksums in heap format.
> FILE_UNKNOWN:we don't know of.
> (STAT_FAILED:stat filed for the file)
> 
> Among these types, what are related to the discussion above are
> FILE_TO_SKIP, HEAP_TO_SCAN and FILE_UNKNOWN. Others are for
> controlling search loop in pg_verify_checksums.
> 
> Based on this categorization, pg_verify_checksum's result is shown as:
> 
> > Checksum scan completed
> > Data checksum version: 1
> > Files scanned:  1095
> > Blocks scanned: 2976
> > Bad checksums:  0
> + Files skipped: 8
> + Unknown files skipped: 1
> + Skipped directories: 1

I'd rather those files be shown as a warning than just listed as
'Unknown'.  Previously, throwing a warning is exactly what we did and
seems like the most sensible thing to do when we come across random
files which have shown up in the data directory that we don't recognize
or understand.

> (Data checksum version is bonded with heap checksums..)
> 
> If this sort of change is acceptable for us, I believe it
> comforms everyone here's wishes. If skipped unknown is not zero
> on a buildfarm animal, it is a sign of something is forgotten.

Having a buildfarm checking would certainly be good, but I'm really not
sure that the added complexity here makes sense.  If we're going to go
down this road, it also seems like it'd make sense to complain about
things that we don't recognize in other directories too.

> The second attached (also PoC) is common'ize the file sorter. The
> function is moved to src/common/file_checksums.c and both
> pg_verify_checksums.c and basebackup.c uses it.  With
> log_min_messages=debug1, you will see the following message for
> unkown files during basebackup.

I'm definitely on-board with figuring out a way to provide more
inspection of the data directory through src/common functions that can
be leveraged by the frontend and backend tools, as well as other tools.

> > DEBUG:  checksum verification was skipped for unknown file: ./base/hoge
> 
> This changed one behavior of the tool. -r now can take
> 'dbid/relid' addition to just 'relid'.  I think we could have
> .verify_checksum.exlucde file so that extensions can declare
> files.

Generally speaking, if we really want 

Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-10-25 Thread Kyotaro HORIGUCHI
Mmm. It took too long time than expected because I was repeatedly
teased by git..

At Wed, 24 Oct 2018 14:31:37 +0900, Michael Paquier  wrote 
in <20181024053137.gl1...@paquier.xyz>
> On Sun, Oct 21, 2018 at 08:56:32PM -0400, Stephen Frost wrote:
> > All of this pie-in-the-sky about what pluggable storage might have is
> > just hand-waving, in my opinion, and not worth much more than that.  I
> > hope (and suspect..) that the actual pluggable storage that's being
> > worked on doesn't do any of this "just drop a file somewhere" because
> > there's a lot of downsides to it- and if it did, it wouldn't be much
> > more than what we can do with an FDW, so why go through and add it?
> 
> Well, there is no point in enforcing a rule that something is forbidden
> if if was never implied and never documented (the rule here is to be
> able to drop custom files into global/, base/ or pg_tblspc.).  Postgres
> is highly-customizable, I would prefer if features in core are designed
> so as we keep things extensible, the checksum verification for base
> backup on the contrary restricts things.
> 
> So, do we have other opinions about this thread?

I believe of freedom for anyone of dropping any files into
anywhere in $PGDATA if he thinks it sefe for the time
being. Changes in core sometimes breaks some extensions and they
used to be 'fixed' in the manner or claim for a replacement to
core. This is the same for changes of (undocumented) APIs. I
think things have been going around in this manner for years and
I don't think core side is unable to define a definite border of
what extensions are allowed to do since we are not knowledgeable
of all extensions that will be created in future or that have
created.

So I'm +1 for the Michael's current patch as (I think) we can't
make visible or large changes.


That said, I agree with Stephen's concern on the point we could
omit requried files in future, but on the other hand I don't want
random files are simply rejected.

So I propose to sort files into roughly three categories. One is
files we know of but to skip. Second is files we know of and need
checksum verification. The last is the files we just don't know of.

In the attached PoC (checksum_)find_file_type categorizes a file
into 6 (or 7) categories.

ENTRY_TO_IGNORE: to be always ignored, for "." and "..".
FILE_TO_SKIP:we know of and to skip. files in the black list.
DIR_TO_SKIP: directories same to the above.
DIR_TO_SCAN: we know any file to scan may be in it.
HEAP_TO_SCAN:we know it has checksums in heap format.
FILE_UNKNOWN:we don't know of.
(STAT_FAILED:stat filed for the file)

Among these types, what are related to the discussion above are
FILE_TO_SKIP, HEAP_TO_SCAN and FILE_UNKNOWN. Others are for
controlling search loop in pg_verify_checksums.

Based on this categorization, pg_verify_checksum's result is shown as:

> Checksum scan completed
> Data checksum version: 1
> Files scanned:  1095
> Blocks scanned: 2976
> Bad checksums:  0
+ Files skipped: 8
+ Unknown files skipped: 1
+ Skipped directories: 1

(Data checksum version is bonded with heap checksums..)

If this sort of change is acceptable for us, I believe it
comforms everyone here's wishes. If skipped unknown is not zero
on a buildfarm animal, it is a sign of something is forgotten.

The second attached (also PoC) is common'ize the file sorter. The
function is moved to src/common/file_checksums.c and both
pg_verify_checksums.c and basebackup.c uses it.  With
log_min_messages=debug1, you will see the following message for
unkown files during basebackup.

> DEBUG:  checksum verification was skipped for unknown file: ./base/hoge

This changed one behavior of the tool. -r now can take
'dbid/relid' addition to just 'relid'.  I think we could have
.verify_checksum.exlucde file so that extensions can declare
files.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 187d84a0ffc94fb9d5c9c0f6708227cc8f47fa3c Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 25 Oct 2018 16:48:47 +0900
Subject: [PATCH 1/2] Make pg_verify_checksums conscious of unknown files

---
 src/bin/pg_verify_checksums/pg_verify_checksums.c | 225 +-
 1 file changed, 179 insertions(+), 46 deletions(-)

diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
index f0e09bea20..4b527913c1 100644
--- a/src/bin/pg_verify_checksums/pg_verify_checksums.c
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -26,6 +26,9 @@
 static int64 files = 0;
 static int64 blocks = 0;
 static int64 badblocks = 0;
+static int64 skipped_known = 0;
+static int64 skipped_unknown = 0;
+static int64 skipped_dirs = 0;
 static ControlFileData *ControlFile;
 
 static char *only_relfilenode = NULL;
@@ -33,6 +36,46 @@ static bool verbose = false;
 
 static const char *progname;
 
+/* struct for checksum verification paremter*/
+typedef struct
+{
+	union
+	{
+		struct
+		{
+			BlockNumber	

Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-10-23 Thread Michael Paquier
On Sun, Oct 21, 2018 at 08:56:32PM -0400, Stephen Frost wrote:
> All of this pie-in-the-sky about what pluggable storage might have is
> just hand-waving, in my opinion, and not worth much more than that.  I
> hope (and suspect..) that the actual pluggable storage that's being
> worked on doesn't do any of this "just drop a file somewhere" because
> there's a lot of downsides to it- and if it did, it wouldn't be much
> more than what we can do with an FDW, so why go through and add it?

Well, there is no point in enforcing a rule that something is forbidden
if if was never implied and never documented (the rule here is to be
able to drop custom files into global/, base/ or pg_tblspc.).  Postgres
is highly-customizable, I would prefer if features in core are designed
so as we keep things extensible, the checksum verification for base
backup on the contrary restricts things.

So, do we have other opinions about this thread?
--
Michael


signature.asc
Description: PGP signature


Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-10-21 Thread Stephen Frost
Greetings,

* Michael Paquier (mich...@paquier.xyz) wrote:
> On Sun, Oct 21, 2018 at 11:03:30AM -0400, Stephen Frost wrote:
> > This doesn't change my opinion of the bigger question though, which is
> > to what extent we should be implicitly supporting extensions and
> > whatever else putting files into the database and tablespace
> > directories.
> 
> Well, the whole point is that I have never seen either that it is
> forbidden for extensions to drop files into global/ and/or base/.  I am
> pretty sure that I'd actually want to be able to do that myself by the
> way.  If I had pluggable storage APIs and the possibility to write by
> myself a storage engine out-of-the-box, I would like to be able to rely
> on the default tablespace path and use other tablespace paths.

All of this pie-in-the-sky about what pluggable storage might have is
just hand-waving, in my opinion, and not worth much more than that.  I
hope (and suspect..) that the actual pluggable storage that's being
worked on doesn't do any of this "just drop a file somewhere" because
there's a lot of downsides to it- and if it did, it wouldn't be much
more than what we can do with an FDW, so why go through and add it?

Considering the example given doesn't today do that (maybe it once did?)
as you mentioned up-thread, it seems like maybe there was a realization
that it's not a good idea even without this issue around pg_basebackup
and pg_verify_checksums.

> > Even if we go with this proposed change to look at the relation
> > filename, I'd be happier with some kind of warning being thrown when we
> > come across files we don't recognize in directories that aren't intended
> > to have random files showing up.
> 
> Yes, that could be something we could do, as an option I would guess as
> this does not match with what v10 does.  I'll wait for more people to
> provide input on this thread before answering more, but if possible I
> think that we should focus on fixing v11 and HEAD first, then try to
> figure out what we'd want to do later on.

pg_basebackup works the way it does in v10 because we've accepted
letting users drop files into the root of PGDATA and even encouraged it
to some extent.  I don't think there was ever a serious intent that it
would be used to back up an extension's self-created files on the
filesystem in tablespaces, since there's no way for it to know how to do
so in a way that would ensure that they're consistent or useful or
sensible to backup online.

What are you thinking the 'option' would look like?  Everything I
come up with seems awful confusing and not very sensible.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-10-21 Thread Michael Paquier
On Sun, Oct 21, 2018 at 11:03:30AM -0400, Stephen Frost wrote:
> This doesn't change my opinion of the bigger question though, which is
> to what extent we should be implicitly supporting extensions and
> whatever else putting files into the database and tablespace
> directories.

Well, the whole point is that I have never seen either that it is
forbidden for extensions to drop files into global/ and/or base/.  I am
pretty sure that I'd actually want to be able to do that myself by the
way.  If I had pluggable storage APIs and the possibility to write by
myself a storage engine out-of-the-box, I would like to be able to rely
on the default tablespace path and use other tablespace paths.

> Even if we go with this proposed change to look at the relation
> filename, I'd be happier with some kind of warning being thrown when we
> come across files we don't recognize in directories that aren't intended
> to have random files showing up.

Yes, that could be something we could do, as an option I would guess as
this does not match with what v10 does.  I'll wait for more people to
provide input on this thread before answering more, but if possible I
think that we should focus on fixing v11 and HEAD first, then try to
figure out what we'd want to do later on.
--
Michael


signature.asc
Description: PGP signature


Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-10-21 Thread Stephen Frost
Greetings,

* Michael Paquier (mich...@paquier.xyz) wrote:
> This is a follow-up of the following thread:
> https://www.postgresql.org/message-id/20181012010411.re53cwcistcpi...@alap3.anarazel.de

Thanks for starting this thread Michael.

> pg_verify_checksums used first a blacklist to decide if files have
> checksums or not.  So with this approach all files should have checksums
> except files like pg_control, pg_filenode.map, PG_VERSION, etc.

So, this is a great example- pg_control actually *does* have a CRC and
we really should be checking it in tools like pg_verify_checksums and
pg_basebackup.  Similairly, we should be trying to get to a point where
we have checksums for anything else that we actually care about.

> d55241a has added a first fix to silence the buildfarm for EXEC_BACKEND
> builds.  After discussion, Andres has pointed out that some extensions
> may want to drop files within global/ or base/ as part of their logic.

> cstore_fdw was mentioned but if you look at their README that's not the
> case.  

So the one example that's been put forward doesn't actually do this..?

> However, I think that Andres argument is pretty good as with
> pluggable storage we should allow extensions to put custom files for
> different tablespaces.  

Andres' wasn't argueing, that I saw at least, that pluggable storage
would result in random files showing up in tablespace directories that
the core code has no knowledge of.  In fact, he seemed to be saying in
20181019205724.ewnuhvrsanaci...@alap3.anarazel.de that pluggable storage
results in the core code being aware of the files that those other
storage mechanisms create, so this entire line of argument seems to be
without merit.

> So this commit has changed the logic of
> pg_verify_checksums to use a whitelist, which assumes that only normal
> relation files have checksums:
> 
> .
> _
> _. After more discussion on the thread mentioned above, Stephen has pointed
> out that base backups use the same blacklist logic when verifying
> checksums.  This has the same problem with EXEC_BACKEND-specific files,
> resulting in spurious warnings (that's an example!) which are confusing
> for the user:
> $ pg_basebackup -D popo
> WARNING:  cannot verify checksum in file "./global/config_exec_params",
> block 0: read buffer size 5 and page size 8192 differ

Stephen also extensively argued that extensions shouldn't be dropping
random files into PG's database and tablespace directories and that we
shouldn't be writing code which attempts to work around that (and,
indeed, ultimately can't since technically extension authors could drop
files into those directories which match these "whitelist patterns").

Further, using a whitelist risks possibly missing files that should be
included, unlike having an exclude list which ensures that we actually
check everything and complain about anything found that's out of the
ordinary.  Additionally, having these checks would hopefully make it
clear that people shouldn't be dropping random files into our database
and tablespace directories- something we didn't try to do for the root
of the data directory, resulting in, frankly, a big mess.  Allowing
random files to exist in the data directory has lead to quite a few
issues including things like pg_basebackup breaking because of a
root-owned file or similar that can't be read, and this extends that.

I thought the point of this new thread was to encourage discussion of
that point and the pros and cons seen for each side, not to only include
one side of it, so I'm rather disappointed.

> Folks on this thread agreed that both pg_verify_checksums and checksum
> verification for base backups should use the same logic.  It seems to me
> that using a whitelist-based approach for both is easier to maintain as
> the patterns of files supporting checksums is more limited than files
> which do not support checksums.  So this way we allow extensions
> bundling custom files to still work with pg_verify_checksums and
> checksum verification in base backups.

This is not an accurate statement- those random files which some
extension drops into these directories don't "work" with
pg_verify_checksums, this just makes pg_verify_checksums ignore them.
That is not the same thing at all.  Worse, we end up with things like
pg_basebackup copying/backing up those files, but skipping them for
validation checking, when we have no reason to expect that those files
will be at all valid when they're copied and no way to see if they are
valid in the resulting restore.

Other parts of the system will continue to similairly do things that
people might not expect- DROP DATABASE will happily nuke any file it
finds, no matter if it matches these patterns or not.

Basically, the way I see it, at least, is that we should either maintain
that PG's database and tablespace directories are under the purview of
PG and other things shouldn't be putting files there without the core
code's knowledge, or we decide that it's ok for 

More issues with pg_verify_checksums and checksum verification in base backups

2018-10-21 Thread Michael Paquier
Hi all,

This is a follow-up of the following thread:
https://www.postgresql.org/message-id/20181012010411.re53cwcistcpi...@alap3.anarazel.de

In a nutshell, b34e84f1 has added TAP tests for pg_verify_checksums, and
the buildfarm has immediately complained about files generated by
EXEC_BACKEND missing, causing pg_verify_checksums to fail for such
builds.  An extra issue has been noticed by Andres in the tool: it
misses to check for temporary files, hence files like
base/pgsql_tmp/pgsql_tmp$PID.NN.sharedfileset/1.1 can also cause the
tool to fail.  After a crash, those files would not be removed, and
stopping the instance would still let them around.

pg_verify_checksums used first a blacklist to decide if files have
checksums or not.  So with this approach all files should have checksums
except files like pg_control, pg_filenode.map, PG_VERSION, etc.

d55241a has added a first fix to silence the buildfarm for EXEC_BACKEND
builds.  After discussion, Andres has pointed out that some extensions
may want to drop files within global/ or base/ as part of their logic.
cstore_fdw was mentioned but if you look at their README that's not the
case.  However, I think that Andres argument is pretty good as with
pluggable storage we should allow extensions to put custom files for
different tablespaces.  So this commit has changed the logic of
pg_verify_checksums to use a whitelist, which assumes that only normal
relation files have checksums:

.
_
_.diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index b20f6c379c..207e2facb8 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -72,7 +72,6 @@ static void parse_basebackup_options(List *options, basebackup_options *opt);
 static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
 static int	compareWalFileNames(const void *a, const void *b);
 static void throttle(size_t increment);
-static bool is_checksummed_file(const char *fullpath, const char *filename);
 
 /* Was the backup currently in-progress initiated in recovery mode? */
 static bool backup_started_in_recovery = false;
@@ -187,17 +186,6 @@ static const char *excludeFiles[] =
 	NULL
 };
 
-/*
- * List of files excluded from checksum validation.
- */
-static const char *const noChecksumFiles[] = {
-	"pg_control",
-	"pg_filenode.map",
-	"pg_internal.init",
-	"PG_VERSION",
-	NULL,
-};
-
 
 /*
  * Called when ERROR or FATAL happens in perform_base_backup() after
@@ -1101,8 +1089,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
 
 		/* Exclude all forks for unlogged tables except the init fork */
 		if (isDbDir &&
-			parse_filename_for_nontemp_relation(de->d_name, ,
-))
+			ParseRelationFileName(de->d_name, , ))
 		{
 			/* Never exclude init forks */
 			if (relForkNum != INIT_FORKNUM)
@@ -1312,32 +1299,6 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
 	return size;
 }
 
-/*
- * Check if a file should have its checksum validated.
- * We validate checksums on files in regular tablespaces
- * (including global and default) only, and in those there
- * are some files that are explicitly excluded.
- */
-static bool
-is_checksummed_file(const char *fullpath, const char *filename)
-{
-	const char *const *f;
-
-	/* Check that the file is in a tablespace */
-	if (strncmp(fullpath, "./global/", 9) == 0 ||
-		strncmp(fullpath, "./base/", 7) == 0 ||
-		strncmp(fullpath, "/", 1) == 0)
-	{
-		/* Compare file against noChecksumFiles skiplist */
-		for (f = noChecksumFiles; *f; f++)
-			if (strcmp(*f, filename) == 0)
-return false;
-
-		return true;
-	}
-	else
-		return false;
-}
 
 /*
  * Functions for handling tar file format
@@ -1391,13 +1352,15 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 		char	   *filename;
 
 		/*
-		 * Get the filename (excluding path).  As last_dir_separator()
-		 * includes the last directory separator, we chop that off by
-		 * incrementing the pointer.
+		 * Check if a given file should have its checksums verified or not.
+		 * Only relation files should have this property now.  First get the
+		 * filename (excluding path).  As last_dir_separator() includes the
+		 * last directory separator, we chop that off by incrementing the
+		 * pointer.
 		 */
 		filename = last_dir_separator(readfilename) + 1;
 
-		if (is_checksummed_file(readfilename, filename))
+		if (ParseRelationFileName(filename, NULL, NULL))
 		{
 			verify_checksum = true;
 
diff --git a/src/backend/storage/file/reinit.c b/src/backend/storage/file/reinit.c
index 74ff6c359b..eae32e9a5d 100644
--- a/src/backend/storage/file/reinit.c
+++ b/src/backend/storage/file/reinit.c
@@ -186,8 +186,7 @@ ResetUnloggedRelationsInDbspaceDir(const char *dbspacedirname, int op)
 			unlogged_relation_entry ent;
 
 			/* Skip anything that doesn't look like a relation data file. */
-			if (!parse_filename_for_nontemp_relation(de->d_name, ,
-