On Wed, Aug 21, 2024 at 7:08 AM Amul Sul <sula...@gmail.com> wrote: > I have reworked a few comments, revised error messages, and made some > minor tweaks in the attached version.
Thanks. > Additionally, I would like to discuss a couple of concerns regarding > error placement and function names to gather your suggestions. > > 0007 patch: Regarding error placement: > > 1. I'm a bit unsure about the (bytes_read != m->size) check that I > placed in verify_checksum() and whether it's in the right place. Per > our previous discussion, this check is applicable to plain backup > files since they can change while being read, but not for files > belonging to tar backups. For consistency, I included the check for > tar backups as well, as it doesn't cause any harm. Is it okay to keep > this check in verify_checksum(), or should I move it back to > verify_file_checksum() and apply it only to the plain backup format? I think it's a good sanity check. For a long time I thought it was triggerable until I eventually realized that you just get this message if the file size is wrong: pg_verifybackup: error: "pg_xact/0000" has size 8203 on disk but size 8192 in the manifest After realizing that, I agree with you that this doesn't really seem reachable for tar backups, but I don't think it hurts anything either. While I was investigating this question, I discovered this problem: $ pg_basebackup -cfast -Ft -Dx $ pg_verifybackup -n x backup successfully verified $ mkdir x/tmpdir $ tar -C x/tmpdir -xf x/base.tar $ rm x/base.tar $ tar -C x/tmpdir -cf x/base.tar . $ pg_verifybackup -n x <lots of errors> It appears that the reason why this fails is that the paths in the original base.tar from the server do not include "./" at the beginning, and the ones that I get when I create my own tarfile have that. But that shouldn't matter. Anyway, I was able to work around it like this: $ tar -C x/tmpdir -cf x/base.tar `(cd x/tmpdir; echo *)` Then the result verifies. But I feel like we should have some test cases that do this kind of stuff so that there is automated verification. In fact, the current patch seems to have no negative test cases at all. I think we should test all the cases in 003_corruption.pl with tar format backups as well as with plain format backups, which we could do by untarring one of the archives, messing something up, and then retarring it. I also think we should have some negative test case specific to tar-format backup. I suggest running a coverage analysis and trying to craft test cases that hit as much of the code as possible. There will probably be some errors you can't hit, but you should try to hit the ones you can. > 2. For the verify_checksum() function, I kept the argument name as > bytes_read. Should we rename it to something more meaningful like > computed_bytes, computed_size, or checksum_computed_size? I think it's fine the way you have it. > 0011 patch: Regarding function names: > > 1. named the function verify_tar_backup_file() to align with > verify_plain_backup_file(), but it does not perform the complete > verification as verify_plain_backup_file does. Not sure if it is the > right name. I was thinking of something like precheck_tar_backup_file(). > 2. verify_tar_file_contents() is the second and final part of tar > backup verification. Should its name be aligned with > verify_tar_backup_file()? I’m unsure what the best name would be. > Perhaps verify_tar_backup_file_final(), but then > verify_tar_backup_file() would need to be renamed to something like > verify_tar_backup_file_initial(), which might be too lengthy. verify_tar_file_contents() actually verifies the contents of all the tar files, not just one, so the name is a bit confusing. Maybe verify_all_tar_files(). > 3. verify_tar_contents() is the core of verify_tar_file_contents() > that handles the actual verification. I’m unsure about the current > naming. Should we rename it to something like > verify_tar_contents_core()? It wouldn’t be an issue if we renamed > verify_tar_file_contents() as pointed in point #2. verify_one_tar_file()? But with those renames, I think you really start to see why I'm not very comfortable with verify_backup_directory(). The tar and plain format cases aren't really doing the same thing. We're just gluing them into a single function anyway. I am also still uncomfortable with the way you've refactored some of this so that we end up with very small amounts of code far away from other code that they influence. Like you end up with this: /* Check the backup manifest entry for this file. */ m = verify_manifest_entry(context, relpath, sb.st_size); /* Validate the pg_control information */ if (should_verify_control_data(context->manifest, m)) ... if (show_progress && !context->skip_checksums && should_verify_checksum(m)) But verify_manifest_entry can return NULL or it can set m->bad and either of those change the result of should_verify_control_data() and should_verify_checksum(), but none of that is obvious when you just look at this. Admittedly, the code in master isn't brilliant in terms of making it obvious what's happening either, but I think this is worse. I'm not really sure what I think we should do about that yet, but I'm uncomfortable with it. -- Robert Haas EDB: http://www.enterprisedb.com