In 025584a168a4b3002e19350bb8db0ebf1fd10235, which shipped with v17, I changed the way that a base backup decides whether files have checksums. At the time, I thought this was harmless refactoring, but it turns out that it was better than harmless. The old way can cause pg_basebackup failures. To reproduce, I believe you just need:
1. A cluster created with v16 or earlier with checksums enabled (i.e. initdb -k). 2. A file inside "global", "base", or a tablespace directory whose name contains a period which is followed by something that atoi() is unable to convert to a positive number. Then pg_basebackup will fail like this: pg_basebackup: error: could not get COPY data stream: ERROR: invalid segment number 0 in file "pg_control.old" One way to fix this is to back-port 025584a168a4b3002e19350bb8db0ebf1fd10235. Before that commit, we assumed all files had checksums unless the filename like one of the files that we know isn't supposed to be checksummed. After that commit, we assume files don't have checksums unless the file format looks like something we know should be checksummed. That inherently avoids trying to checksum things we're not expecting to see, such as a pg_control.old file. If we want a narrower and thus less-risky fix, we could consider just adjusting this code here: segmentno = atoi(segmentpath + 1); if (segmentno == 0) ereport(ERROR, (errmsg("invalid segment number %d in file \"%s\"", segmentno, filename))); I think we could just replace the ereport() with verify_checksum = false (and update the comments). That would leave open the possibility of trying to checksum some random file with a name like this_file_should_not_be_here, which we'll interpret as segment 0 because the name does not contain a dot; but if the file were called this.file.should.not.be.here, we'd not try to checksum it at all because "here" is not an integer. That's a little random, but it avoids taking a position on whether 025584a168a4b3002e19350bb8db0ebf1fd10235 is fully correct. It simply takes the narrower position that we shouldn't fail the entire backup because we're unable to perform a checksum calculation that in the worst case would only produce a warning anyway. At the moment, I think I slightly prefer the narrower fix, but it might be a little too narrow, since the resultant behavior as described in the preceding paragraph is somewhat wonky and nonsensical. Opinions? -- Robert Haas EDB: http://www.enterprisedb.com