Re: [PATCH 1/3] t5004: test ZIP archives with many entries
Eric Sunshine sunsh...@sunshineco.com writes: On Sun, Aug 23, 2015 at 5:29 AM, René Scharfe l@web.de wrote: I suspected that zipinfo's output might be formatted differently on different platforms and tried to guard against it by checking for the number zero there. Git's ZIP file creation is platform independent (modulo bugs), so having a test run at least somewhere should suffice. In theory. We could add support for the one-line-summary variant on OS X easily, though. Probably, although it's looking like testing on Mac OS X won't be fruitful (see below). Can we move this topic forward by introducing a new prerequisite ZIPINFO and used at the beginning of these tests (make it a lazy prereq)? Run zipinfo on a trivial archive and see if its output is something we recognize to decide if the platform supports that ZIPINFO prerequisite and do this test only on them. After all, what _is_ being tested, i.e. our archive creation, would not change across platforms, so having a test that runs on a known subset of platforms is better than not having anything at all. Thanks. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] t5004: test ZIP archives with many entries
Junio C Hamano gits...@pobox.com writes: Eric Sunshine sunsh...@sunshineco.com writes: On Sun, Aug 23, 2015 at 5:29 AM, René Scharfe l@web.de wrote: I suspected that zipinfo's output might be formatted differently on different platforms and tried to guard against it by checking for the number zero there. Git's ZIP file creation is platform independent (modulo bugs), so having a test run at least somewhere should suffice. In theory. We could add support for the one-line-summary variant on OS X easily, though. Probably, although it's looking like testing on Mac OS X won't be fruitful (see below). Can we move this topic forward by introducing a new prerequisite ZIPINFO and used at the beginning of these tests (make it a lazy prereq)? Run zipinfo on a trivial archive and see if its output is something we recognize to decide if the platform supports that ZIPINFO prerequisite and do this test only on them. Heh, that is exactly what the patch under discussion does. So... After all, what _is_ being tested, i.e. our archive creation, would not change across platforms, so having a test that runs on a known subset of platforms is better than not having anything at all. Thanks. ...I'd say we can take this patch as-is, and those who want to have a working test on MacOS can come up with an enhancement to the way the script parses output from zipinfo that would also work on their platforms. Thanks and sorry for the noise ;-) -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] t5004: test ZIP archives with many entries
On Fri, Aug 28, 2015 at 11:57 AM, Junio C Hamano gits...@pobox.com wrote: Junio C Hamano gits...@pobox.com writes: Eric Sunshine sunsh...@sunshineco.com writes: On Sun, Aug 23, 2015 at 5:29 AM, René Scharfe l@web.de wrote: I suspected that zipinfo's output might be formatted differently on different platforms and tried to guard against it by checking for the number zero there. Git's ZIP file creation is platform independent (modulo bugs), so having a test run at least somewhere should suffice. In theory. We could add support for the one-line-summary variant on OS X easily, though. Probably, although it's looking like testing on Mac OS X won't be fruitful (see below). Can we move this topic forward by introducing a new prerequisite ZIPINFO and used at the beginning of these tests (make it a lazy prereq)? Run zipinfo on a trivial archive and see if its output is something we recognize to decide if the platform supports that ZIPINFO prerequisite and do this test only on them. Heh, that is exactly what the patch under discussion does. So... After all, what _is_ being tested, i.e. our archive creation, would not change across platforms, so having a test that runs on a known subset of platforms is better than not having anything at all. ...I'd say we can take this patch as-is, and those who want to have a working test on MacOS can come up with an enhancement to the way the script parses output from zipinfo that would also work on their platforms. Right, the new test is correctly skipped on Mac OS X and FreeBSD, so the patch is suitable as-is. We might, however, want to augment the commit message with some of the knowledge learned from this thread. Perhaps modify the last sentence of the second paragraph and then insert additional information following it, like this? ... at least provides *some* way to check this field, although presently only on Linux. zipinfo on current Mac OS X (Yosemite 10.10.5) does not support this field, and, when encountered, caps the printed file count at 65535 (and spits out warnings and errors), thus is not useful for testing. (Its output also differs from zipinfo on Linux, thus requires changes to the 'sed' recognition and extraction expressions, but that's a minor issue.) zipinfo on FreeBSD seems to have been retired altogether in favor of unzip -Z, however, only in the emasculated form unzip -Z -1 which lists archive entries but does not provide a file count, thus is not useful for this test. (I also snuck a s/can// fix in there for the last sentence of the second paragraph.) -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] t5004: test ZIP archives with many entries
On Sun, Aug 23, 2015 at 5:29 AM, René Scharfe l@web.de wrote: Am 23.08.2015 um 07:54 schrieb Eric Sunshine: On Sat, Aug 22, 2015 at 3:06 PM, René Scharfe l@web.de wrote: +test_lazy_prereq ZIPINFO ' + n=$($ZIPINFO $TEST_DIRECTORY/t5004/empty.zip | sed -n 2s/.* //p) + test x$n = x0 +' Unfortunately, this sed expression isn't portable due to dissimilar output of various zipinfo implementations. On Linux, the output of zipinfo is: $ zipinfo t/t5004/empty.zip Archive: t/t5004/empty.zip Zip file size: 62 bytes, number of entries: 0 Empty zipfile. $ however, on Mac OS X: $ zipinfo t/t5004/empty.zip Archive: t/t5004/empty.zip 62 bytes 0 files Empty zipfile. $ and on FreeBSD, the zipinfo command seems to have been removed altogether in favor of unzip -Z (emulate zipinfo). I suspected that zipinfo's output might be formatted differently on different platforms and tried to guard against it by checking for the number zero there. Git's ZIP file creation is platform independent (modulo bugs), so having a test run at least somewhere should suffice. In theory. We could add support for the one-line-summary variant on OS X easily, though. Probably, although it's looking like testing on Mac OS X won't be fruitful (see below). One might hope that unzip -Z would be a reasonable replacement for zipinfo, however, it is apparently only partially implemented on FreeBSD, and requires that -1 be passed, as well. Even with unzip -Z -1, there are issues. The output on Linux and Mac OS X is: $ unzip -Z -1 t/t5004/empty.zip Empty zipfile. $ but FreeBSD differs: $ unzip -Z -1 t/t5004/empty.zip $ With a non-empty zip file, the output is identical on all platforms: $ unzip -Z -1 twofiles.zip file1 file2 $ So, if you combine that with wc -l or test_line_count, you may have a portable and reliable entry counter. Counting all entries is slow, and more importantly it's not what we want. In this test we need the number of entries recorded in the ZIP directory, not the actual number of entries found by scanning the archive, or the directory. Ah, right. The commit message did state this clearly enough... On Linux unzip -Z -1 many.zip | wc -l reports 65792 even before adding ZIP64 support; only without -1 we get the interesting numbers (specifically with unzip -Z many.zip | sed -n '2p;$p'): Zip file size: 6841366 bytes, number of entries: 256 65792 files, 0 bytes uncompressed, 0 bytes compressed: 0.0% With these three patches applied, Mac OS X has trouble with 'many.zip': $ unzip -Z -1 many.zip warning [many.zip]: 76 extra bytes at beginning or within zipfile (attempting to process anyway) error [many.zip]: reported length of central directory is -76 bytes too long (Atari STZip zipfile? J.H.Holm ZIPSPLIT 1.1 zipfile?). Compensating... 00/ 00/00 ... ff/ff error: expected central file header signature not found (file #65793). (please check that you have transferred or created the zipfile in the appropriate BINARY mode and that you have compiled UnZip properly) And FreeBSD doesn't like it either: $ unzip -Z -1 many.zip unzip: Invalid central directory signature $ Looks like they don't support ZIP64. Or I got some of the fields wrong after all. A 65536 file zip created on Mac OS X with Mac's zip command given to unzip or zipinfo results in exactly the same warnings/errors as above (including the bit about 76 extra bytes and -76 bytes too long), so it doesn't seem to be a problem with your implementation. https://en.wikipedia.org/wiki/Zip_%28file_format%29#ZIP64 says: OS X Yosemite does support the creation of ZIP64 archives, but does not support unzipping these archives using the shipped unzip command-line utility or graphical Archive Utility.[citation needed]. How does unzip react to a ZIP file with more than 65535 entries that was created natively on these platforms? And what does zipinfo (a real one, without -1) report at the top for such files? On Mac OS X, unzip does extract all the files (although complains as noted above). zipinfo caps out at reporting 65535 for the number of files (although it lists them all fine). With the warnings/errors filtered out for clarity: $ zipinfo biggy.zip Archive: biggy.zip 9642874 bytes 65535 files ... -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] t5004: test ZIP archives with many entries
Am 23.08.2015 um 07:54 schrieb Eric Sunshine: On Sat, Aug 22, 2015 at 3:06 PM, René Scharfe l@web.de wrote: diff --git a/t/t5004-archive-corner-cases.sh b/t/t5004-archive-corner-cases.sh index 654adda..c6bd729 100755 --- a/t/t5004-archive-corner-cases.sh +++ b/t/t5004-archive-corner-cases.sh @@ -115,4 +115,44 @@ test_expect_success 'archive empty subtree by direct pathspec' ' check_dir extract sub ' +ZIPINFO=zipinfo + +test_lazy_prereq ZIPINFO ' + n=$($ZIPINFO $TEST_DIRECTORY/t5004/empty.zip | sed -n 2s/.* //p) + test x$n = x0 +' Unfortunately, this sed expression isn't portable due to dissimilar output of various zipinfo implementations. On Linux, the output of zipinfo is: $ zipinfo t/t5004/empty.zip Archive: t/t5004/empty.zip Zip file size: 62 bytes, number of entries: 0 Empty zipfile. $ however, on Mac OS X: $ zipinfo t/t5004/empty.zip Archive: t/t5004/empty.zip 62 bytes 0 files Empty zipfile. $ and on FreeBSD, the zipinfo command seems to have been removed altogether in favor of unzip -Z (emulate zipinfo). Thanks for your thorough checks! I suspected that zipinfo's output might be formatted differently on different platforms and tried to guard against it by checking for the number zero there. Git's ZIP file creation is platform independent (modulo bugs), so having a test run at least somewhere should suffice. In theory. We could add support for the one-line-summary variant on OS X easily, though. One might hope that unzip -Z would be a reasonable replacement for zipinfo, however, it is apparently only partially implemented on FreeBSD, and requires that -1 be passed, as well. Even with unzip -Z -1, there are issues. The output on Linux and Mac OS X is: $ unzip -Z -1 t/t5004/empty.zip Empty zipfile. $ but FreeBSD differs: $ unzip -Z -1 t/t5004/empty.zip $ With a non-empty zip file, the output is identical on all platforms: $ unzip -Z -1 twofiles.zip file1 file2 $ So, if you combine that with wc -l or test_line_count, you may have a portable and reliable entry counter. Counting all entries is slow, and more importantly it's not what we want. In this test we need the number of entries recorded in the ZIP directory, not the actual number of entries found by scanning the archive, or the directory. On Linux unzip -Z -1 many.zip | wc -l reports 65792 even before adding ZIP64 support; only without -1 we get the interesting numbers (specifically with unzip -Z many.zip | sed -n '2p;$p'): Zip file size: 6841366 bytes, number of entries: 256 65792 files, 0 bytes uncompressed, 0 bytes compressed: 0.0% With these three patches applied, Mac OS X has trouble with 'many.zip': $ unzip -Z -1 many.zip warning [many.zip]: 76 extra bytes at beginning or within zipfile (attempting to process anyway) error [many.zip]: reported length of central directory is -76 bytes too long (Atari STZip zipfile? J.H.Holm ZIPSPLIT 1.1 zipfile?). Compensating... 00/ 00/00 ... ff/ff error: expected central file header signature not found (file #65793). (please check that you have transferred or created the zipfile in the appropriate BINARY mode and that you have compiled UnZip properly) And FreeBSD doesn't like it either: $ unzip -Z -1 many.zip unzip: Invalid central directory signature $ Looks like they don't support ZIP64. Or I got some of the fields wrong after all. https://en.wikipedia.org/wiki/Zip_%28file_format%29#ZIP64 says: OS X Yosemite does support the creation of ZIP64 archives, but does not support unzipping these archives using the shipped unzip command-line utility or graphical Archive Utility.[citation needed]. How does unzip react to a ZIP file with more than 65535 entries that was created natively on these platforms? And what does zipinfo (a real one, without -1) report at the top for such files? Thanks, René -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] t5004: test ZIP archives with many entries
A ZIP file directory has a 16-bit field for the number of entries it contains. There are 64-bit extensions to deal with that. Demonstrate that git archive --format=zip currently doesn't use them and instead overflows the field. InfoZIP's unzip doesn't care about this field and extracts all files anyway. Software that uses the directory for presenting a filesystem like view quickly -- notably Windows -- depends on it, but doesn't lend itself to an automatic test case easily. Use InfoZIP's zipinfo, which probably isn't available everywhere but at least can provides *some* way to check this field. To speed things up a bit create and commit only a subset of the files and build a fake tree out of duplicates and pass that to git archive. Signed-off-by: Rene Scharfe l@web.de --- t/t5004-archive-corner-cases.sh | 40 1 file changed, 40 insertions(+) diff --git a/t/t5004-archive-corner-cases.sh b/t/t5004-archive-corner-cases.sh index 654adda..c6bd729 100755 --- a/t/t5004-archive-corner-cases.sh +++ b/t/t5004-archive-corner-cases.sh @@ -115,4 +115,44 @@ test_expect_success 'archive empty subtree by direct pathspec' ' check_dir extract sub ' +ZIPINFO=zipinfo + +test_lazy_prereq ZIPINFO ' + n=$($ZIPINFO $TEST_DIRECTORY/t5004/empty.zip | sed -n 2s/.* //p) + test x$n = x0 +' + +test_expect_failure ZIPINFO 'zip archive with many entries' ' + # add a directory with 256 files + mkdir 00 + for a in 0 1 2 3 4 5 6 7 8 9 a b c d e f + do + for b in 0 1 2 3 4 5 6 7 8 9 a b c d e f + do + : 00/$a$b + done + done + git add 00 + git commit -m 256 files in 1 directory + + # duplicate it to get 65536 files in 256 directories + subtree=$(git write-tree --prefix=00/) + for c in 0 1 2 3 4 5 6 7 8 9 a b c d e f + do + for d in 0 1 2 3 4 5 6 7 8 9 a b c d e f + do + echo 04 tree $subtree $c$d + done + done tree + tree=$(git mktree tree) + + # zip them + git archive -o many.zip $tree + + # check the number of entries in the ZIP file directory + expr 65536 + 256 expect + $ZIPINFO many.zip | head -2 | sed -n 2s/.* //p actual + test_cmp expect actual +' + test_done -- 2.5.0 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] t5004: test ZIP archives with many entries
On Sat, Aug 22, 2015 at 3:06 PM, René Scharfe l@web.de wrote: A ZIP file directory has a 16-bit field for the number of entries it contains. There are 64-bit extensions to deal with that. Demonstrate that git archive --format=zip currently doesn't use them and instead overflows the field. InfoZIP's unzip doesn't care about this field and extracts all files anyway. Software that uses the directory for presenting a filesystem like view quickly -- notably Windows -- depends on it, but doesn't lend itself to an automatic test case easily. Use InfoZIP's zipinfo, which probably isn't available everywhere but at least can provides *some* way to check this field. To speed things up a bit create and commit only a subset of the files and build a fake tree out of duplicates and pass that to git archive. Signed-off-by: Rene Scharfe l@web.de --- diff --git a/t/t5004-archive-corner-cases.sh b/t/t5004-archive-corner-cases.sh index 654adda..c6bd729 100755 --- a/t/t5004-archive-corner-cases.sh +++ b/t/t5004-archive-corner-cases.sh @@ -115,4 +115,44 @@ test_expect_success 'archive empty subtree by direct pathspec' ' check_dir extract sub ' +ZIPINFO=zipinfo + +test_lazy_prereq ZIPINFO ' + n=$($ZIPINFO $TEST_DIRECTORY/t5004/empty.zip | sed -n 2s/.* //p) + test x$n = x0 +' Unfortunately, this sed expression isn't portable due to dissimilar output of various zipinfo implementations. On Linux, the output of zipinfo is: $ zipinfo t/t5004/empty.zip Archive: t/t5004/empty.zip Zip file size: 62 bytes, number of entries: 0 Empty zipfile. $ however, on Mac OS X: $ zipinfo t/t5004/empty.zip Archive: t/t5004/empty.zip 62 bytes 0 files Empty zipfile. $ and on FreeBSD, the zipinfo command seems to have been removed altogether in favor of unzip -Z (emulate zipinfo). One might hope that unzip -Z would be a reasonable replacement for zipinfo, however, it is apparently only partially implemented on FreeBSD, and requires that -1 be passed, as well. Even with unzip -Z -1, there are issues. The output on Linux and Mac OS X is: $ unzip -Z -1 t/t5004/empty.zip Empty zipfile. $ but FreeBSD differs: $ unzip -Z -1 t/t5004/empty.zip $ With a non-empty zip file, the output is identical on all platforms: $ unzip -Z -1 twofiles.zip file1 file2 $ So, if you combine that with wc -l or test_line_count, you may have a portable and reliable entry counter. More below... +test_expect_failure ZIPINFO 'zip archive with many entries' ' + # add a directory with 256 files + mkdir 00 + for a in 0 1 2 3 4 5 6 7 8 9 a b c d e f + do + for b in 0 1 2 3 4 5 6 7 8 9 a b c d e f + do + : 00/$a$b + done + done + git add 00 + git commit -m 256 files in 1 directory + + # duplicate it to get 65536 files in 256 directories + subtree=$(git write-tree --prefix=00/) + for c in 0 1 2 3 4 5 6 7 8 9 a b c d e f + do + for d in 0 1 2 3 4 5 6 7 8 9 a b c d e f + do + echo 04 tree $subtree $c$d + done + done tree + tree=$(git mktree tree) + + # zip them + git archive -o many.zip $tree + + # check the number of entries in the ZIP file directory + expr 65536 + 256 expect + $ZIPINFO many.zip | head -2 | sed -n 2s/.* //p actual With these three patches applied, Mac OS X has trouble with 'many.zip': $ unzip -Z -1 many.zip warning [many.zip]: 76 extra bytes at beginning or within zipfile (attempting to process anyway) error [many.zip]: reported length of central directory is -76 bytes too long (Atari STZip zipfile? J.H.Holm ZIPSPLIT 1.1 zipfile?). Compensating... 00/ 00/00 ... ff/ff error: expected central file header signature not found (file #65793). (please check that you have transferred or created the zipfile in the appropriate BINARY mode and that you have compiled UnZip properly) And FreeBSD doesn't like it either: $ unzip -Z -1 many.zip unzip: Invalid central directory signature $ + test_cmp expect actual +' + test_done -- 2.5.0 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html