retitle 1010024 pristine-tar: fails to handle paths with non-ASCII in
quit
Hi,

I spent some more time looking at this. The problem that's causing this particular issue is that unquote_filename() doesn't handle the escaped high-bit characters properly, so the \[0-7]{3} escapes eventually end up being treated literally.
The attached tarball (rclone_1.60.0.orig.tar.gz) demonstrates the problem. A minimal works-for-me patch is:

diff --git a/pristine-tar b/pristine-tar
index 081dca1..4810215 100755
--- a/pristine-tar
+++ b/pristine-tar
@@ -370,6 +370,7 @@ sub unquote_filename {
   $filename =~ s/\\t/\t/g;
   $filename =~ s/\\v/\x11/g;
   $filename =~ s/\\\\/\\/g;
+  $filename =~ s/\\([0-7]{3})/chr oct $1/eg;
 
   return $filename;
 }

...but actually there's a deeper problem here, as alluded to in https://salsa.debian.org/debian/pristine-tar/-/merge_requests/4
which is that pristine-tar is now somewhat confused as to whether the manifest entries are meant to be quoted or not (which results in patches like the fix to #933031). The problem is that neither naive approach works:
i) if you use quoted paths, you cannot then use --verbatim-files-from (since it doesn't unquote them), and so you lose on paths starting with -
ii) if you use unquoted paths, you need to use --verbatim-files-from (otherwise you get stuck on paths starting with -), and then you lose on paths containing a newline
Instead, what pristine-tar needs to do is take the quoted paths, unquote them, put \0 between records (rather than \n), and then use the resulting manifest with tar --null -T.
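As a rough illustration of that unquote-then-NUL-separate step, something along the following lines might do. This is only a sketch, not the attached mangle.pl and not the code from the MR: unquote_path and null_manifest are made-up names, the two listing entries are hypothetical, and only the escapes discussed in this thread are handled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Single-character escapes and the bytes they stand for.
# (Perl has no "\v" string escape, so vertical tab is spelled \x0b.)
my %simple = (
    a => "\a", b => "\b", f => "\f", n => "\n",
    r => "\r", t => "\t", v => "\x0b", '\\' => "\\",
);

# Hypothetical helper: undo backslash escapes in one left-to-right
# pass, so an escaped backslash is never re-read as the start of
# another escape.
sub unquote_path {
    my ($quoted) = @_;
    $quoted =~ s/\\([0-7]{3}|[abfnrtv\\])/length($1) == 3 ? chr(oct($1)) : $simple{$1}/ge;
    return $quoted;
}

# Hypothetical helper: turn quoted manifest lines into one byte string
# with a NUL after every record, for use with "tar --null -T".
sub null_manifest {
    my (@quoted_lines) = @_;
    return join '', map { unquote_path($_) . "\0" } @quoted_lines;
}

# Two made-up listing lines: one starts with "-" and contains an
# escaped newline, the other contains octal-escaped high-bit bytes.
my @listing  = ('-dash/one\ntwo', 'caf\303\251.txt');
my $manifest = null_manifest(@listing);

printf "%d bytes, %d records\n", length($manifest), ($manifest =~ tr/\0//);
```

The single pass is deliberate: with one substitution per escape run in sequence, a quoted \\303 (an escaped backslash followed by digits) would first become \303 and then be decoded a second time. The manifest would need to be written out with a raw (binary) filehandle before handing it to tar.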
To demonstrate, see the attached hazard.tar.gz, which contains one file:

-bar/test\nnewline\\x2foo

There is no argument to tar -tf that produces a manifest file you can feed to tar -T [0] - a transcript demonstrating this is attached (paste_1259177.txt).
If, however, you follow my "take quoted -> unquote -> NUL-separate" approach, it works; the attached mangle.pl (taking the fixed unquote_filename from https://salsa.debian.org/debian/pristine-tar/-/merge_requests/4 and genmanifest from pristine-tar, modified to put NUL between records) makes a manifest "haz_zero" which you can then use with:
tar -xf hazard.tar.gz --null -T haz_zero

So I think this is the approach that pristine-tar needs to take; I think that would also mean we can remove the slightly hacky fix for 851286.
[I will look at trying to update the above-mentioned MR]

HTH,

Matthew

[0] with old enough tar (1.29 or earlier) this is not quite true, as --verbatim-files-from used to unescape; that was "fixed" for 1.30
rclone_1.60.0.orig.tar.gz
Description: application/gzip
matthew@tsk:~/hazard$ ls
hazard.tar.gz
matthew@tsk:~/hazard$ tar -tf hazard.tar.gz
-bar/test\nnewline\\x2foo
matthew@tsk:~/hazard$ tar -xf hazard.tar.gz
matthew@tsk:~/hazard$ ls -- -bar
'test'$'\n''newline\x2foo'
matthew@tsk:~/hazard$ rm -rf -- -bar
matthew@tsk:~/hazard$ tar -tf hazard.tar.gz > haz_quoted
matthew@tsk:~/hazard$ tar -tf hazard.tar.gz --quoting-style=literal > haz_unquoted
matthew@tsk:~/hazard$ cat -vet haz_quoted
-bar/test\nnewline\\x2foo$
matthew@tsk:~/hazard$ cat -vet haz_unquoted
-bar/test$
newline\x2foo$
matthew@tsk:~/hazard$ cat haz_quoted
-bar/test\nnewline\\x2foo
matthew@tsk:~/hazard$ cat haz_unquoted
-bar/test
newline\x2foo
matthew@tsk:~/hazard$ tar -xf hazard.tar.gz -T haz_quoted
tar: haz_quoted:1: unrecognized option
tar: Exiting with failure status due to previous errors
matthew@tsk:~/hazard$ tar -xf hazard.tar.gz --verbatim-files-from -T haz_quoted
tar: -bar/test\\nnewline\\\\x2foo: Not found in archive
tar: Exiting with failure status due to previous errors
matthew@tsk:~/hazard$ tar -xf hazard.tar.gz -T haz_unquoted
tar: haz_unquoted:1: unrecognized option
tar: newline\\x2foo: Not found in archive
tar: Exiting with failure status due to previous errors
matthew@tsk:~/hazard$ tar -xf hazard.tar.gz --verbatim-files-from -T haz_unquoted
tar: -bar/test: Not found in archive
tar: newline\\x2foo: Not found in archive
tar: Exiting with failure status due to previous errors
mangle.pl
Description: Perl program
hazard.tar.gz
Description: application/gzip