On 4/13/20 4:14 PM, Robert Haas wrote:
On Mon, Apr 13, 2020 at 3:34 PM Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:

Also, I
see no mention of prettification-chars such as newlines or indentation.
I suppose if I pass a manifest file through prettification (or Windows
newline conversion), the checksum may break.

It would indeed break. I'm not sure what you want me to say here,
though. If you're trying to parse a manifest, you shouldn't care about
how the whitespace is arranged. If you're trying to generate one, you
can arrange it any way you like, as long as you also include it in the
checksum.

pgBackRest ignores whitespace but this is a legacy of the way Perl calculated checksums, not an intentional feature. This worked well when the manifest was loaded as a whole, converted to JSON, and checksummed, but it is a major pain for the streaming code we now have in C.

I guarantee that that our next manifest version will do a simple checksum of bytes as Robert has done in this feature.

So, I'm +1 as implemented.

Why is the top-level checksum only allowed to be SHA-256, if the files
can use up to SHA-512?

<snip>

I agree that it's a little bit weird that you can have a stronger
checksum for the files instead of the manifest itself, but I also
wonder what the use case would be for using a stronger checksum on the
manifest. David Steele argued that strong checksums on the files could
be useful to software that wants to rifle through all the backups
you've ever taken and find another copy of that file by looking for
something with a matching checksum. CRC-32C wouldn't be strong enough
for that, because eventually you could have enough files that you
start to have collisions. The SHA algorithms output enough bits to
make that quite unlikely. But this argument only makes sense for the
files, not the manifest.

Agreed. I think SHA-256 is *more* than enough to protect the manifest against corruption. That said, since the cost of SHA-256 vs. SHA-512 in the context on the manifest is negligible we could just use the stronger algorithm to deflect a similar question going forward.

That choice might not age well, but we could always say, well, we picked it because it was the strongest available at the time. Allowing a choice of which algorithm to use for to manifest checksum seems like it will just make verifying the file harder with no tangible benefit.

Maybe just a comment in the docs about why SHA-256 was used would be fine.

(Also, did we intentionally omit the dash in
hash names, so "SHA-256" to make it SHA256?  This will also be critical
for checksumming the manifest itself.)

I debated this with myself, settled on this spelling, and nobody
complained until now. It could be changed, though. I didn't have any
particular reason for choosing it except the feeling that people would
probably prefer to type --manifest-checksum=sha256 rather than
--manifest-checksum=sha-256.

+1 for sha256 rather than sha-256.

Regards,
--
-David
da...@pgmasters.net


Reply via email to