Re: backup manifests

2020-04-24 Thread Robert Haas
On Thu, Apr 23, 2020 at 5:16 PM Andres Freund wrote: > Do you not see a warning when compiling with optimizations enabled? No, I don't. I tried it with -O{0,1,2,3} and I always use -Wall -Werror. No warnings. [rhaas pgsql]$ clang -v clang version 5.0.2 (tags/RELEASE_502/final) Target:

Re: backup manifests

2020-04-23 Thread Andres Freund
Hi, On 2020-04-23 08:57:39 -0400, Robert Haas wrote: > On Sun, Apr 5, 2020 at 3:31 PM Andres Freund wrote: > > The warnings don't seem too unreasonable. The compiler can't see that > > the error_cb inside json_manifest_parse_failure() is not expected to > > return. Probably worth adding a

Re: backup manifests

2020-04-23 Thread Robert Haas
On Sun, Apr 5, 2020 at 3:31 PM Andres Freund wrote: > The warnings don't seem too unreasonable. The compiler can't see that > the error_cb inside json_manifest_parse_failure() is not expected to > return. Probably worth adding a wrapper around the calls to > context->error_cb and mark that as

Re: backup manifests

2020-04-22 Thread Fujii Masao
On 2020/04/23 1:28, Robert Haas wrote: On Wed, Apr 22, 2020 at 12:21 PM Fujii Masao wrote: I found three minor issues in pg_verifybackup. + {"print-parse-wal", no_argument, NULL, 'p'}, This is unused option, so this line should be removed. + printf(_(" -m,

Re: backup manifests

2020-04-22 Thread Robert Haas
On Wed, Apr 22, 2020 at 12:21 PM Fujii Masao wrote: > I found three minor issues in pg_verifybackup. > > + {"print-parse-wal", no_argument, NULL, 'p'}, > > This is unused option, so this line should be removed. > > + printf(_(" -m, --manifest=PATH use specified path

Re: backup manifests

2020-04-22 Thread Fujii Masao
On 2020/04/15 11:18, Fujii Masao wrote: On 2020/04/14 0:15, Robert Haas wrote: On Sun, Apr 12, 2020 at 10:09 PM Fujii Masao wrote: I found other minor issues. I think these are all correct fixes. Thanks for the post-commit review, and sorry for this mistakes. Thanks for the review,

Re: backup manifests

2020-04-14 Thread Fujii Masao
On 2020/04/14 0:15, Robert Haas wrote: On Sun, Apr 12, 2020 at 10:09 PM Fujii Masao wrote: I found other minor issues. I think these are all correct fixes. Thanks for the post-commit review, and sorry for this mistakes. Thanks for the review, Michael and Robert. Pushed the patches!

Re: backup manifests

2020-04-13 Thread Robert Haas
On Sun, Apr 12, 2020 at 10:09 PM Fujii Masao wrote: > I found other minor issues. I think these are all correct fixes. Thanks for the post-commit review, and sorry for this mistakes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

Re: backup manifests

2020-04-12 Thread Michael Paquier
On Mon, Apr 13, 2020 at 11:09:34AM +0900, Fujii Masao wrote: > - while ((c = getopt_long(argc, argv, > "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP", > + while ((c = getopt_long(argc, argv, > "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvPm:", > > "m:" seems unnecessary, so should be removed? >

Re: backup manifests

2020-04-12 Thread Fujii Masao
On 2020/04/09 23:06, Fujii Masao wrote: On 2020/04/09 2:35, Robert Haas wrote: On Wed, Apr 8, 2020 at 1:15 AM Fujii Masao wrote: When there is a backup_manifest in the database cluster, it's included in the backup even when --no-manifest is specified. ISTM that this is problematic because

Re: backup manifests

2020-04-09 Thread Fujii Masao
On 2020/04/09 23:10, Stephen Frost wrote: Greetings, * Fujii Masao (masao.fu...@oss.nttdata.com) wrote: On 2020/04/09 2:35, Robert Haas wrote: On Wed, Apr 8, 2020 at 1:15 AM Fujii Masao wrote: When there is a backup_manifest in the database cluster, it's included in the backup even when

Re: backup manifests

2020-04-09 Thread Stephen Frost
Greetings, * Fujii Masao (masao.fu...@oss.nttdata.com) wrote: > On 2020/04/09 2:35, Robert Haas wrote: > >On Wed, Apr 8, 2020 at 1:15 AM Fujii Masao > >wrote: > >>When there is a backup_manifest in the database cluster, it's included in > >>the backup even when --no-manifest is specified. ISTM

Re: backup manifests

2020-04-09 Thread Fujii Masao
On 2020/04/09 2:35, Robert Haas wrote: On Wed, Apr 8, 2020 at 1:15 AM Fujii Masao wrote: When there is a backup_manifest in the database cluster, it's included in the backup even when --no-manifest is specified. ISTM that this is problematic because the backup_manifest is obviously not

Re: backup manifests and contemporaneous buildfarm failures

2020-04-08 Thread Tom Lane
Andrew Dunstan writes: > On 4/8/20 3:41 PM, Robert Haas wrote: >> I don't understand what the local $ENV{MSYS2_ARG_CONV_EXCL} = >> $source_ts_prefix does, > You don't want to know > See for the > gory details. I don't want to

Re: backup manifests and contemporaneous buildfarm failures

2020-04-08 Thread Andrew Dunstan
On 4/8/20 3:41 PM, Robert Haas wrote: > On Wed, Apr 8, 2020 at 1:59 PM Tom Lane wrote: >> I guess we could commit it and find out. I'm all for the simpler >> coding if it works. > I don't understand what the local $ENV{MSYS2_ARG_CONV_EXCL} = > $source_ts_prefix does, You don't want to know

Re: backup manifests and contemporaneous buildfarm failures

2020-04-08 Thread Robert Haas
On Wed, Apr 8, 2020 at 1:59 PM Tom Lane wrote: > I guess we could commit it and find out. I'm all for the simpler > coding if it works. I don't understand what the local $ENV{MSYS2_ARG_CONV_EXCL} = $source_ts_prefix does, but the remove/unlink condition was suggested by Amit Kapila on the basis

Re: backup manifests and contemporaneous buildfarm failures

2020-04-08 Thread Tom Lane
Andrew Dunstan writes: > OK, tricky, but here's what I did to get this working on fairywren. > First, on Msys2 there is a problem with name mangling. We've had to fix > this before by telling it to ignore certain argument prefixes. > Second, once that was fixed rmdir was failing on the

Re: backup manifests and contemporaneous buildfarm failures

2020-04-08 Thread Andrew Dunstan
On 4/7/20 9:42 AM, Andrew Dunstan wrote: > On Tue, Apr 7, 2020 at 12:37 AM Tom Lane wrote: >> Robert Haas writes: >>> Taking stock of the situation this morning, most of the buildfarm is >>> now green. There are three failures, on eelpout (6 hours ago), >>> fairywren (17 hours ago), and hyrax

Re: backup manifests

2020-04-08 Thread Robert Haas
On Wed, Apr 8, 2020 at 1:15 AM Fujii Masao wrote: > When there is a backup_manifest in the database cluster, it's included in > the backup even when --no-manifest is specified. ISTM that this is problematic > because the backup_manifest is obviously not valid for the backup. > So, isn't it better

Re: backup manifests

2020-04-07 Thread Fujii Masao
On 2020/04/04 4:22, Robert Haas wrote: On Thu, Apr 2, 2020 at 4:34 PM David Steele wrote: +1. These would be great tests to have and a win for pg_basebackup overall but I don't think they should be a prerequisite for this commit. Not to mention the server. I can't say that I have a lot of

Re: backup manifests and contemporaneous buildfarm failures

2020-04-07 Thread Andrew Dunstan
On Tue, Apr 7, 2020 at 12:37 AM Tom Lane wrote: > > Robert Haas writes: > > Taking stock of the situation this morning, most of the buildfarm is > > now green. There are three failures, on eelpout (6 hours ago), > > fairywren (17 hours ago), and hyrax (3 days, 7 hours ago). > > fairywren has now

Re: backup manifests and contemporaneous buildfarm failures

2020-04-07 Thread Andrew Dunstan
On Mon, Apr 6, 2020 at 1:18 AM Fabien COELHO wrote: > > > Hello, > > >> Do I need to precede those with some recursive chmod commands? Perhaps > >> the client should refuse to run if there is still something left after > >> these. > > > > I think the latter would be a very good idea, just so that

Re: backup manifests and contemporaneous buildfarm failures

2020-04-06 Thread Tom Lane
Robert Haas writes: > Taking stock of the situation this morning, most of the buildfarm is > now green. There are three failures, on eelpout (6 hours ago), > fairywren (17 hours ago), and hyrax (3 days, 7 hours ago). fairywren has now done this twice in the pg_validatebackupCheck step: exec

Re: backup manifests and contemporaneous buildfarm failures

2020-04-06 Thread Andrew Dunstan
On 4/6/20 7:53 AM, Robert Haas wrote: > On Sun, Apr 5, 2020 at 4:07 PM Andrew Dunstan > wrote: >> Do I need to precede those with some recursive chmod commands? > +1. > >> Perhaps >> the client should refuse to run if there is still something left after >> these. > +1 to that, too. > See

Re: backup manifests and contemporaneous buildfarm failures

2020-04-06 Thread Robert Haas
On Sun, Apr 5, 2020 at 4:07 PM Andrew Dunstan wrote: > Do I need to precede those with some recursive chmod commands? +1. > Perhaps > the client should refuse to run if there is still something left after > these. +1 to that, too. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The

Re: backup manifests and contemporaneous buildfarm failures

2020-04-05 Thread Fabien COELHO
Hello, Do I need to precede those with some recursive chmod commands? Perhaps the client should refuse to run if there is still something left after these. I think the latter would be a very good idea, just so that this sort of failure is less obscure. Not sure about whether a recursive

Re: backup manifests and contemporaneous buildfarm failures

2020-04-05 Thread Tom Lane
Andrew Dunstan writes: > Hmm, the buildfarm client does this at the beginning of each run to > remove anything that might be left over from a previous run: > rmtree("inst"); > rmtree("$pgsql") unless ($from_source && !$use_vpath); Right, the point is precisely that some versions of

Re: backup manifests and contemporaneous buildfarm failures

2020-04-05 Thread Andrew Dunstan
On 4/5/20 9:10 AM, Mikael Kjellström wrote: > On 2020-04-04 04:43, Robert Haas wrote: > >> I think I've done about as much as I can do for tonight, though. Most >> things are green now, and the ones that aren't are failing because of >> stuff that is at least plausibly fixed. By morning it

Re: backup manifests

2020-04-05 Thread Andres Freund
Hi, On 2020-04-03 15:22:23 -0400, Robert Haas wrote: > I've pushed all the patches. Seeing new warnings in an optimized build /home/andres/src/postgresql-master/src/bin/pg_validatebackup/parse_manifest.c: In function 'json_manifest_object_end':

Re: backup manifests and contemporaneous buildfarm failures

2020-04-05 Thread Mikael Kjellström
On 2020-04-04 04:43, Robert Haas wrote: I think I've done about as much as I can do for tonight, though. Most things are green now, and the ones that aren't are failing because of stuff that is at least plausibly fixed. By morning it should be clearer how much broken stuff is left, although

Re: backup manifests and contemporaneous buildfarm failures

2020-04-04 Thread Thomas Munro
On Sun, Apr 5, 2020 at 2:36 AM Robert Haas wrote: > eelpout is unhappy because: > > +WARNING: could not remove shared memory segment > "/PostgreSQL.248989127": No such file or directory > +WARNING: could not remove shared memory segment > "/PostgreSQL.1450751626": No such file or directory

Re: backup manifests and contemporaneous buildfarm failures

2020-04-04 Thread Tom Lane
Robert Haas writes: > On Sat, Apr 4, 2020 at 10:57 AM Tom Lane wrote: >> What is odd is that >> (AFAIR) we've never seen this before. Maybe somebody recently added >> an error cursor callback in a place that didn't have it before, and >> is involved in SQL-function processing? None of the

Re: backup manifests and contemporaneous buildfarm failures

2020-04-04 Thread Robert Haas
On Sat, Apr 4, 2020 at 10:57 AM Tom Lane wrote: > It's not so surprising that we could get a different result that way > from a CLOBBER_CACHE_ALWAYS animal like hyrax, since CCA-forced > cache reloads would cause extra stack expenditure at a lot of places. > And it could vary depending on totally

Re: backup manifests and contemporaneous buildfarm failures

2020-04-04 Thread Tom Lane
Robert Haas writes: > hyrax's last run was before any of this happened, so it seems to have > an unrelated problem. The last two runs, three and six days ago, both > failed like this: > -ERROR: stack depth limit exceeded > +ERROR: stack depth limit exceeded at character 8 > Not sure what

Re: backup manifests

2020-04-04 Thread Tom Lane
Robert Haas writes: > On Fri, Apr 3, 2020 at 8:18 PM Tom Lane wrote: >> I suppose that judicious s/time_t/pg_time_t/ would fix this. > I think you sent this email just after I pushed > db1531cae00941bfe4f6321fdef1e1ef355b6bed, or maybe after I'd committed > it locally and just before I pushed

Re: backup manifests and contemporaneous buildfarm failures

2020-04-04 Thread Robert Haas
On Fri, Apr 3, 2020 at 10:43 PM Robert Haas wrote: > I think I've done about as much as I can do for tonight, though. Most > things are green now, and the ones that aren't are failing because of > stuff that is at least plausibly fixed. By morning it should be > clearer how much broken stuff is

Re: backup manifests

2020-04-04 Thread Robert Haas
On Fri, Apr 3, 2020 at 8:18 PM Tom Lane wrote: > BTW, some of the buildfarm is showing a simpler portability problem: > they think you were too cavalier about the difference between time_t > and pg_time_t. (On a platform with 32-bit time_t, that's an actual > bug, probably.) lapwing is actually

Re: backup manifests and contemporaneous buildfarm failures

2020-04-04 Thread Robert Haas
On Fri, Apr 3, 2020 at 11:06 PM Andres Freund wrote: > On 2020-04-03 20:48:09 -0400, Robert Haas wrote: > > 'serinus' is also failing. This is less obviously related: > > Hm. Tests passed once since then. Yeah, but conchuela also failed once in what I think was a similar way. I suspect the fix I

Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Petr Jelinek
On 04/04/2020 05:06, Andres Freund wrote: Hi, Peter, Petr, CCed you because it's probably a bug somewhere around the initial copy code for logical replication. On 2020-04-03 20:48:09 -0400, Robert Haas wrote: 'serinus' is also failing. This is less obviously related: Hm. Tests passed once

Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Andres Freund
Hi, Peter, Petr, CCed you because it's probably a bug somewhere around the initial copy code for logical replication. On 2020-04-03 20:48:09 -0400, Robert Haas wrote: > 'serinus' is also failing. This is less obviously related: Hm. Tests passed once since then. > 2020-04-04 02:08:57.299 CEST

Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Robert Haas
On Fri, Apr 3, 2020 at 9:52 PM Tom Lane wrote: > Robert Haas writes: > > 'prairiedog' is also unhappy, and it looks related: > > Yeah, gaur also failed in the same place. Both of those are > alignment-picky 32-bit hardware, so I'm thinking the problem is > pg_gmtime() trying to fetch a 64-bit

Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Tom Lane
Robert Haas writes: > Interestingly, on my machine, rmtree coped with a mode 0 directory > just fine, but mode 0400 was more than its tiny brain could handle, so > the originally committed fix had code to revert 0400 back to 0700, but > I didn't add similar code to revert from 0 back to 0700

Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Tom Lane
Robert Haas writes: > 'prairiedog' is also unhappy, and it looks related: Yeah, gaur also failed in the same place. Both of those are alignment-picky 32-bit hardware, so I'm thinking the problem is pg_gmtime() trying to fetch a 64-bit pg_time_t from an insufficiently aligned address. I'm

Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Robert Haas
On Fri, Apr 3, 2020 at 8:12 PM Tom Lane wrote: > Yeah, so it would seem. The buildfarm script uses rmtree to clean out > the old build tree. The man page for File::Path suggests (but can't > quite bring itself to say in so many words) that by default, rmtree > will adjust the permissions on

Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Robert Haas
On Fri, Apr 3, 2020 at 6:13 PM Tom Lane wrote: > Locally, I observe that "make clean" in src/bin/pg_validatebackup fails > to clean up the tmp_check directory left behind by "make check". Fixed. I also tried to fix 'lapwing', which was complaining about about a call to pg_gmtime, saying that it

Re: backup manifests

2020-04-03 Thread Tom Lane
BTW, some of the buildfarm is showing a simpler portability problem: they think you were too cavalier about the difference between time_t and pg_time_t. (On a platform with 32-bit time_t, that's an actual bug, probably.) lapwing is actually failing:

Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Tom Lane
Robert Haas writes: > On Fri, Apr 3, 2020 at 6:48 PM Tom Lane wrote: >> I'm guessing that we're looking at a platform-specific difference in >> whether "rm -rf" fails outright on an unreadable subdirectory, or >> just tries to carry on by unlinking it anyway. > My intention was that it would be

Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Robert Haas
On Fri, Apr 3, 2020 at 5:58 PM Fabien COELHO wrote: > seawasp just failed the same way. Good news, I can see "configure" under > "HEAD/pgsql". Ah, good. > The only strange thing under buildroot I found is: > >

Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Robert Haas
On Fri, Apr 3, 2020 at 6:48 PM Tom Lane wrote: > I'm guessing that we're looking at a platform-specific difference in > whether "rm -rf" fails outright on an unreadable subdirectory, or > just tries to carry on by unlinking it anyway. My intention was that it would be cleaned by the TAP

Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Tom Lane
I wrote: > I'm guessing that we're looking at a platform-specific difference in > whether "rm -rf" fails outright on an unreadable subdirectory, or > just tries to carry on by unlinking it anyway. Yeah... on my RHEL6 box, "make check" cleans up the working directories under tmp_check, but on a

Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Tom Lane
Thomas Munro writes: > Same here, on elver. I see pg_subtrans has been chmod(0)'d, > presumably by the perl subroutine mutilate_open_directory_fails. I > see this in my inbox (the build farm wrote it to stderr or stdout > rather than the log file): > cannot chdir to child for >

Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Stephen Frost
Greetings, * Thomas Munro (thomas.mu...@gmail.com) wrote: > On Sat, Apr 4, 2020 at 11:13 AM Tom Lane wrote: > > Fabien COELHO writes: > > > The only strange thing under buildroot I found is: > > > > >

Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Thomas Munro
On Sat, Apr 4, 2020 at 11:13 AM Tom Lane wrote: > Fabien COELHO writes: > > The only strange thing under buildroot I found is: > > > HEAD/pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans/ > > > this last directory perms are

Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Alvaro Herrera
On 2020-Apr-03, Tom Lane wrote: > I wonder if VPATH versus not-VPATH might be a relevant factor ... Oh, absolutely. The ones that failed show, in the last successful run, the configure line invoked as "./configure", while the animals that are still running are invoking configure from some other

Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Tom Lane
Fabien COELHO writes: > The only strange thing under buildroot I found is: > HEAD/pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans/ > this last directory perms are d- which seems to break cleanup. Locally, I observe

Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Fabien COELHO
Hello Robert, Done now. Meanwhile, two more machines have reported the mysterious message: sh: ./configure: not found ...that first appeared on spurfowl a few hours ago. The other two machines are eelpout and elver, both of which list Thomas Munro as a maintainer. spurfowl lists Stephen

Re: backup manifests

2020-04-03 Thread Justin Pryzby
On Fri, Apr 03, 2020 at 03:22:23PM -0400, Robert Haas wrote: > I've pushed all the patches. I didn't manage to look at this in advance but have some doc fixes. word-diff: diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index 536de9a698..d84afb7b18 100644 ---

Re: backup manifests

2020-04-03 Thread Robert Haas
On Fri, Apr 3, 2020 at 4:54 PM Alvaro Herrera wrote: > Maybe it needs perl2host? *jaw drops* Wow, OK, yeah, that looks like the thing. Thanks for the suggestion; I didn't know that existed (and I kinda wish I still didn't). I'lll go see about adding that. -- Robert Haas EnterpriseDB:

Re: backup manifests

2020-04-03 Thread Alvaro Herrera
On 2020-Apr-03, Robert Haas wrote: > sub tempdir_short > { > > return File::Temp::tempdir(CLEANUP => 1); > } > > And File::Temp's documentation says that the temporary directory is > picked using File::Spec's tmpdir(), which says that it knows about > different operating systems and

Re: backup manifests

2020-04-03 Thread Robert Haas
On Fri, Apr 3, 2020 at 3:53 PM Robert Haas wrote: > 2. Also, a bunch of machines were super-unhappy with > 003_corruption.pl, failing with this sort of thing: > > pg_basebackup: error: could not get COPY data stream: ERROR: symbolic > link target too long for tar format: file name

Re: backup manifests

2020-04-03 Thread Robert Haas
On Fri, Apr 3, 2020 at 3:22 PM Robert Haas wrote: > It looks like the buildfarm is unhappy though, so I guess I'd better > go look at that. I fixed two things so far, and there seems to be at least one more possible issue that I don't understand. 1. Apparently, we have an automated perlcritic

Re: backup manifests

2020-04-03 Thread Robert Haas
On Thu, Apr 2, 2020 at 4:34 PM David Steele wrote: > +1. These would be great tests to have and a win for pg_basebackup > overall but I don't think they should be a prerequisite for this commit. Not to mention the server. I can't say that I have a lot of confidence that all of the server

Re: backup manifests

2020-04-02 Thread David Steele
On 4/2/20 3:47 PM, Andres Freund wrote: On 2020-04-02 15:42:48 -0400, Robert Haas wrote: I suspect I'm not doing quite what you had in mind here... thoughts? I have some ideas, but I think it's complicated enough that I'd not put it in the "pre commit path" for now. +1. These would be great

Re: backup manifests

2020-04-02 Thread Andres Freund
On 2020-04-02 15:42:48 -0400, Robert Haas wrote: > I suspect I'm not doing quite what you had in mind here... thoughts? I have some ideas, but I think it's complicated enough that I'd not put it in the "pre commit path" for now.

Re: backup manifests

2020-04-02 Thread Robert Haas
On Thu, Apr 2, 2020 at 2:55 PM Robert Haas wrote: > On Thu, Apr 2, 2020 at 2:23 PM Andres Freund wrote: > > > That might make the window fairly wide on normal systems, but I'm not > > > sure about Raspberry Pi BF members or things running > > > CLOBBER_CACHE_ALWAYS/RECURSIVELY. I guess I could

Re: backup manifests

2020-04-02 Thread Robert Haas
On Thu, Apr 2, 2020 at 3:26 PM David Steele wrote: > So, with the addition of the 0004 patch down-thread this looks > committable to me. Glad to hear it. Thank you. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

Re: backup manifests

2020-04-02 Thread David Steele
On 4/2/20 1:04 PM, Robert Haas wrote: > There are still some things that not everybody is happy about. In particular, Stephen and David are unhappy about using CRC-32C as the default algorithm, but Andres and Noah both think it's a reasonable choice, even if not as robust as everybody will want.

Re: backup manifests

2020-04-02 Thread Andres Freund
On 2020-04-02 14:55:19 -0400, Robert Haas wrote: > > Yes, I am asking for something to be changed: I'd like the code that > > read()s the file when computing the checksum to add up how many bytes > > were read, and compare that to the size in the manifest. And if there's > > a difference report an

Re: backup manifests

2020-04-02 Thread Andres Freund
Hi, On 2020-04-02 14:16:27 -0400, Robert Haas wrote: > On Thu, Apr 2, 2020 at 1:23 PM Andres Freund wrote: > > I suspect its possible to control the timing by preventing the > > checkpoint at the end of recovery from completing within a relevant > > timeframe. I think configuring a large

Re: backup manifests

2020-04-02 Thread Robert Haas
On Thu, Apr 2, 2020 at 1:23 PM Andres Freund wrote: > I suspect its possible to control the timing by preventing the > checkpoint at the end of recovery from completing within a relevant > timeframe. I think configuring a large checkpoint_timeout and using a > non-fast base backup ought to do the

Re: backup manifests

2020-04-02 Thread Andres Freund
Hi, On 2020-04-02 13:04:45 -0400, Robert Haas wrote: > And here's another new patch set. After some experimentation, I was > able to manually test the timeline-switch-during-a-base-backup case > and found that it had bugs in both pg_validatebackup and the code I > added to the backend's

Re: backup manifests

2020-04-01 Thread David Steele
On 3/31/20 7:57 AM, Robert Haas wrote: On Mon, Mar 30, 2020 at 7:24 PM David Steele wrote: I'm confused as to why you're not seeing that. What's the exact sequence of steps? $ pg_basebackup -D test/backup5 --manifest-checksums=SHA256 $ vi test/backup5/backup_manifest * Add 'X' to the

Re: backup manifests

2020-04-01 Thread Andres Freund
Hi, On 2020-03-31 14:56:07 +0530, Amit Kapila wrote: > On Tue, Mar 31, 2020 at 11:10 AM Noah Misch wrote: > > On Mon, Mar 30, 2020 at 12:16:31PM -0700, Andres Freund wrote: > > > On 2020-03-30 15:04:55 -0400, Robert Haas wrote: > > > I'm mildly inclined to name it pg_validate, pg_validate_dbdir

Re: backup manifests

2020-04-01 Thread Andres Freund
Hi, On 2020-03-31 22:15:04 -0700, Noah Misch wrote: > On Tue, Mar 31, 2020 at 03:50:34PM -0700, Andres Freund wrote: > > On 2020-03-31 14:10:34 -0400, Robert Haas wrote: > > > +/* > > > + * Attempt to parse the WAL files required to restore from backup using > > > + * pg_waldump. > > > + */ > > >

Re: backup manifests

2020-03-31 Thread Noah Misch
On Tue, Mar 31, 2020 at 03:50:34PM -0700, Andres Freund wrote: > On 2020-03-31 14:10:34 -0400, Robert Haas wrote: > > +/* > > + * Attempt to parse the WAL files required to restore from backup using > > + * pg_waldump. > > + */ > > +static void > > +parse_required_wal(validator_context *context,

Re: backup manifests

2020-03-31 Thread Andres Freund
Hi, On 2020-03-31 14:10:34 -0400, Robert Haas wrote: > I made an attempt to implement this. Awesome! > In the attached patch set, 0001 I'm going to work on those things. I > would appreciate *very timely* feedback on anything people do or do > not like about this, because I want to commit this

Re: backup manifests

2020-03-31 Thread Stephen Frost
Greetings, * Amit Kapila (amit.kapil...@gmail.com) wrote: > On Tue, Mar 31, 2020 at 11:10 AM Noah Misch wrote: > > On Mon, Mar 30, 2020 at 12:16:31PM -0700, Andres Freund wrote: > > > On 2020-03-30 15:04:55 -0400, Robert Haas wrote: > > > > I guess I'd like to be clear here that I have no

Re: backup manifests

2020-03-31 Thread Robert Haas
On Mon, Mar 30, 2020 at 7:24 PM David Steele wrote: > > I'm confused as to why you're not seeing that. What's the exact > > sequence of steps? > > $ pg_basebackup -D test/backup5 --manifest-checksums=SHA256 > > $ vi test/backup5/backup_manifest > * Add 'X' to the checksum of backup_label > >

Re: backup manifests

2020-03-31 Thread Amit Kapila
On Tue, Mar 31, 2020 at 11:10 AM Noah Misch wrote: > > On Mon, Mar 30, 2020 at 12:16:31PM -0700, Andres Freund wrote: > > On 2020-03-30 15:04:55 -0400, Robert Haas wrote: > > > I guess I'd like to be clear here that I have no fundamental > > > disagreement with taking this tool in any direction

Re: backup manifests

2020-03-30 Thread Noah Misch
On Mon, Mar 30, 2020 at 12:16:31PM -0700, Andres Freund wrote: > On 2020-03-30 15:04:55 -0400, Robert Haas wrote: > > I guess I'd like to be clear here that I have no fundamental > > disagreement with taking this tool in any direction that people would > > like it to go. For me it's just a

Re: backup manifests

2020-03-30 Thread David Steele
On 3/30/20 4:16 PM, Robert Haas wrote: On Fri, Mar 27, 2020 at 3:51 PM David Steele wrote: > { "Path": "backup_label", "Size": 224, "Last-Modified": "2020-03-27 18:33:18 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "b914bec9" }, Storing the checksum type with each file seems pretty

Re: backup manifests

2020-03-30 Thread David Steele
On 3/30/20 5:08 PM, Andres Freund wrote: The data in the backup label isn't sufficient though. Without having parsed the timeline file there's no way to verify that the correct WAL is present. I guess we can also add client side tools to parse timelines, add command the fetch all of the

Re: backup manifests

2020-03-30 Thread Andres Freund
Hi, On 2020-03-30 15:23:08 -0400, Robert Haas wrote: > On Mon, Mar 30, 2020 at 2:59 PM Andres Freund wrote: > > I wonder if it'd not be best, independent of whether we build in this > > verification, to include that metadata in the manifest file. That's for > > sure better than having to build a

Re: backup manifests

2020-03-30 Thread Robert Haas
On Sun, Mar 29, 2020 at 9:05 PM David Steele wrote: > Yeah, that seems reasonable. > > In our case backups are nearly always compressed and/or encrypted so > even checking the original size is a bit of work. Getting the checksum > at the same time seems like an obvious win. Makes sense. If this

Re: backup manifests

2020-03-30 Thread Robert Haas
On Mon, Mar 30, 2020 at 2:59 PM Andres Freund wrote: > I wonder if it'd not be best, independent of whether we build in this > verification, to include that metadata in the manifest file. That's for > sure better than having to build a separate tool to parse timeline > history files. I don't

Re: backup manifests

2020-03-30 Thread Andres Freund
Hi, On 2020-03-30 15:04:55 -0400, Robert Haas wrote: > I guess I'd like to be clear here that I have no fundamental > disagreement with taking this tool in any direction that people would > like it to go. For me it's just a question of timing. Feature freeze > is now a week or so away, and

Re: backup manifests

2020-03-30 Thread Robert Haas
On Mon, Mar 30, 2020 at 2:24 AM Amit Kapila wrote: > > Between those two, I would use "pg_validatebackup" if there's a fair chance > > it > > will end up doing the pg_waldump check. Otherwise, I would use > > "pg_validatemanifest". > > +1. I guess I'd like to be clear here that I have no

Re: backup manifests

2020-03-30 Thread Andres Freund
Hi, On 2020-03-30 14:35:40 -0400, Robert Haas wrote: > On Sun, Mar 29, 2020 at 10:08 PM Andres Freund wrote: > > See the attached minimal prototype for what I am thinking of. > > > > This would not correctly handle the case where the timeline changes > > while taking a base backup. But I'm not

Re: backup manifests

2020-03-30 Thread Robert Haas
On Sun, Mar 29, 2020 at 10:08 PM Andres Freund wrote: > See the attached minimal prototype for what I am thinking of. > > This would not correctly handle the case where the timeline changes > while taking a base backup. But I'm not sure that'd be all that serious > a limitation for now? > > I'd

Re: backup manifests

2020-03-30 Thread Amit Kapila
On Mon, Mar 30, 2020 at 11:28 AM Noah Misch wrote: > > On Sun, Mar 29, 2020 at 08:42:35PM -0400, Robert Haas wrote: > > > But it feels clear that the name pg_validatebackup is not going > > over very well with anyone. I think I should rename it to > > pg_validatemanifest. > > Between those two, I

Re: backup manifests

2020-03-30 Thread Noah Misch
On Sun, Mar 29, 2020 at 08:42:35PM -0400, Robert Haas wrote: > On Sat, Mar 28, 2020 at 11:40 PM Noah Misch wrote: > > I think this functionality doesn't belong in its own program. If you > > suspect > > pg_basebackup or pg_restore will eventually gain the ability to merge > > incremental

Re: backup manifests

2020-03-29 Thread Andres Freund
Hi, On 2020-03-29 21:23:06 -0400, David Steele wrote: > On 3/29/20 9:07 PM, Andres Freund wrote: > > On 2020-03-29 20:42:35 -0400, Robert Haas wrote: > > > > What do you think of having the verification process also call > > > > pg_waldump to > > > > validate the WAL CRCs (shown upthread)? That

Re: backup manifests

2020-03-29 Thread David Steele
On 3/29/20 9:07 PM, Andres Freund wrote: On 2020-03-29 20:42:35 -0400, Robert Haas wrote: What do you think of having the verification process also call pg_waldump to validate the WAL CRCs (shown upthread)? That looked helpful and simple. I don't love calls to external binaries, but I think

Re: backup manifests

2020-03-29 Thread Andres Freund
Hi, On 2020-03-29 20:42:35 -0400, Robert Haas wrote: > > What do you think of having the verification process also call pg_waldump to > > validate the WAL CRCs (shown upthread)? That looked helpful and simple. > > I don't love calls to external binaries, but I think the thing that > really

Re: backup manifests

2020-03-29 Thread David Steele
On 3/29/20 8:47 PM, Robert Haas wrote: On Fri, Mar 27, 2020 at 4:02 PM David Steele wrote: I prefer to validate the size and checksum in the same pass, but I'm not sure it's that big a deal. If the backup is being corrupted under the validate process that would also apply to files that had

Re: backup manifests

2020-03-29 Thread Andres Freund
Hi, On 2020-03-29 20:47:40 -0400, Robert Haas wrote: > Maybe that was the wrong idea, but I thought people would like the > idea of running cheaper checks first. I wasn't worried about > concurrent modification of the backup because then you're super-hosed > no matter what. I do like that

Re: backup manifests

2020-03-29 Thread David Steele
On 3/29/20 8:42 PM, Robert Haas wrote: On Sat, Mar 28, 2020 at 11:40 PM Noah Misch wrote: I don't see this freed anywhere; is it? (It's useful to make peak memory consumption not grow in proportion to the number of files backed up.) We need the hash table to remain populated for the whole

Re: backup manifests

2020-03-29 Thread David Steele
On 3/29/20 8:33 PM, Robert Haas wrote: On Fri, Mar 27, 2020 at 4:32 PM Andres Freund wrote: FWIW, I was thinking of backup_manifest.checksum potentially being desirable for another reason: The need to embed the checksum inside the document imo adds a fair bit of rigidity to the file format.

Re: backup manifests

2020-03-29 Thread Robert Haas
On Fri, Mar 27, 2020 at 4:02 PM David Steele wrote: > I prefer to validate the size and checksum in the same pass, but I'm not > sure it's that big a deal. If the backup is being corrupted under the > validate process that would also apply to files that had already been > validated. I did it

Re: backup manifests

2020-03-29 Thread Robert Haas
On Sat, Mar 28, 2020 at 11:40 PM Noah Misch wrote: > Stephen Frost mentioned that a backup could pass validation even if > pg_basebackup were killed after writing the base backup and before finishing > the writing of pg_wal. One might avoid that by simply writing the manifest to > a temporary

  1   2   3   >