Re: [PATCH 1/3] t5004: test ZIP archives with many entries

2015-08-28 Thread Junio C Hamano
Eric Sunshine sunsh...@sunshineco.com writes:

 On Sun, Aug 23, 2015 at 5:29 AM, René Scharfe l@web.de wrote:
 I suspected that zipinfo's output might be formatted differently on
 different platforms and tried to guard against it by checking for the
 number zero there. Git's ZIP file creation is platform independent
 (modulo bugs), so having a test run at least somewhere should
 suffice. In theory.

 We could add support for the one-line-summary variant on OS X easily,
 though.

 Probably, although it's looking like testing on Mac OS X won't be
 fruitful (see below).

Can we move this topic forward by introducing a new prerequisite
ZIPINFO and used at the beginning of these tests (make it a lazy
prereq)?  Run zipinfo on a trivial archive and see if its output is
something we recognize to decide if the platform supports that
ZIPINFO prerequisite and do this test only on them.

After all, what _is_ being tested, i.e. our archive creation, would
not change across platforms, so having a test that runs on a known
subset of platforms is better than not having anything at all.

Thanks.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] t5004: test ZIP archives with many entries

2015-08-28 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 Eric Sunshine sunsh...@sunshineco.com writes:

 On Sun, Aug 23, 2015 at 5:29 AM, René Scharfe l@web.de wrote:
 I suspected that zipinfo's output might be formatted differently on
 different platforms and tried to guard against it by checking for the
 number zero there. Git's ZIP file creation is platform independent
 (modulo bugs), so having a test run at least somewhere should
 suffice. In theory.

 We could add support for the one-line-summary variant on OS X easily,
 though.

 Probably, although it's looking like testing on Mac OS X won't be
 fruitful (see below).

 Can we move this topic forward by introducing a new prerequisite
 ZIPINFO and used at the beginning of these tests (make it a lazy
 prereq)?  Run zipinfo on a trivial archive and see if its output is
 something we recognize to decide if the platform supports that
 ZIPINFO prerequisite and do this test only on them.

Heh, that is exactly what the patch under discussion does.  So...

 After all, what _is_ being tested, i.e. our archive creation, would
 not change across platforms, so having a test that runs on a known
 subset of platforms is better than not having anything at all.

 Thanks.

...I'd say we can take this patch as-is, and those who want to have
a working test on MacOS can come up with an enhancement to the way
the script parses output from zipinfo that would also work on their
platforms.

Thanks and sorry for the noise ;-)
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] t5004: test ZIP archives with many entries

2015-08-28 Thread Eric Sunshine
On Fri, Aug 28, 2015 at 11:57 AM, Junio C Hamano gits...@pobox.com wrote:
 Junio C Hamano gits...@pobox.com writes:
 Eric Sunshine sunsh...@sunshineco.com writes:
 On Sun, Aug 23, 2015 at 5:29 AM, René Scharfe l@web.de wrote:
 I suspected that zipinfo's output might be formatted differently on
 different platforms and tried to guard against it by checking for the
 number zero there. Git's ZIP file creation is platform independent
 (modulo bugs), so having a test run at least somewhere should
 suffice. In theory.

 We could add support for the one-line-summary variant on OS X easily,
 though.

 Probably, although it's looking like testing on Mac OS X won't be
 fruitful (see below).

 Can we move this topic forward by introducing a new prerequisite
 ZIPINFO and used at the beginning of these tests (make it a lazy
 prereq)?  Run zipinfo on a trivial archive and see if its output is
 something we recognize to decide if the platform supports that
 ZIPINFO prerequisite and do this test only on them.

 Heh, that is exactly what the patch under discussion does.  So...

 After all, what _is_ being tested, i.e. our archive creation, would
 not change across platforms, so having a test that runs on a known
 subset of platforms is better than not having anything at all.

 ...I'd say we can take this patch as-is, and those who want to have
 a working test on MacOS can come up with an enhancement to the way
 the script parses output from zipinfo that would also work on their
 platforms.

Right, the new test is correctly skipped on Mac OS X and FreeBSD, so
the patch is suitable as-is. We might, however, want to augment the
commit message with some of the knowledge learned from this thread.
Perhaps modify the last sentence of the second paragraph and then
insert additional information following it, like this?

... at least provides
*some* way to check this field, although presently only on Linux.

zipinfo on current Mac OS X (Yosemite 10.10.5) does not support
this field, and, when encountered, caps the printed file count at
65535 (and spits out warnings and errors), thus is not useful for
testing. (Its output also differs from zipinfo on Linux, thus
requires changes to the 'sed' recognition and extraction
expressions, but that's a minor issue.)

zipinfo on FreeBSD seems to have been retired altogether in favor
of unzip -Z, however, only in the emasculated form unzip -Z
-1 which lists archive entries but does not provide a file
count, thus is not useful for this test.

(I also snuck a s/can// fix in there for the last sentence of the
second paragraph.)
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] t5004: test ZIP archives with many entries

2015-08-23 Thread Eric Sunshine
On Sun, Aug 23, 2015 at 5:29 AM, René Scharfe l@web.de wrote:
 Am 23.08.2015 um 07:54 schrieb Eric Sunshine:
 On Sat, Aug 22, 2015 at 3:06 PM, René Scharfe l@web.de wrote:
 +test_lazy_prereq ZIPINFO '
 +   n=$($ZIPINFO $TEST_DIRECTORY/t5004/empty.zip | sed -n 2s/.* 
 //p)
 +   test x$n = x0
 +'

 Unfortunately, this sed expression isn't portable due to dissimilar
 output of various zipinfo implementations. On Linux, the output of
 zipinfo is:

  $ zipinfo t/t5004/empty.zip
  Archive:  t/t5004/empty.zip
  Zip file size: 62 bytes, number of entries: 0
  Empty zipfile.
  $

 however, on Mac OS X:

  $ zipinfo t/t5004/empty.zip
  Archive:  t/t5004/empty.zip   62 bytes   0 files
  Empty zipfile.
  $

 and on FreeBSD, the zipinfo command seems to have been removed
 altogether in favor of unzip -Z (emulate zipinfo).

 I suspected that zipinfo's output might be formatted differently on
 different platforms and tried to guard against it by checking for the
 number zero there. Git's ZIP file creation is platform independent
 (modulo bugs), so having a test run at least somewhere should
 suffice. In theory.

 We could add support for the one-line-summary variant on OS X easily,
 though.

Probably, although it's looking like testing on Mac OS X won't be
fruitful (see below).

 One might hope that unzip -Z would be a reasonable replacement for
 zipinfo, however, it is apparently only partially implemented on
 FreeBSD, and requires that -1 be passed, as well. Even with unzip -Z
 -1, there are issues. The output on Linux and Mac OS X is:

  $ unzip -Z -1 t/t5004/empty.zip
  Empty zipfile.
  $

 but FreeBSD differs:

  $ unzip -Z -1 t/t5004/empty.zip
  $

 With a non-empty zip file, the output is identical on all platforms:

  $ unzip -Z -1 twofiles.zip
  file1
  file2
  $

 So, if you combine that with wc -l or test_line_count, you may have
 a portable and reliable entry counter.

 Counting all entries is slow, and more importantly it's not what we
 want. In this test we need the number of entries recorded in the ZIP
 directory, not the actual number of entries found by scanning the
 archive, or the directory.

Ah, right. The commit message did state this clearly enough...

 On Linux unzip -Z -1 many.zip | wc -l reports 65792 even before
 adding ZIP64 support; only without -1 we get the interesting numbers
 (specifically with unzip -Z many.zip | sed -n '2p;$p'):

 Zip file size: 6841366 bytes, number of entries: 256
 65792 files, 0 bytes uncompressed, 0 bytes compressed: 0.0%

 With these three patches applied, Mac OS X has trouble with 'many.zip':

  $ unzip -Z -1 many.zip
  warning [many.zip]:  76 extra bytes at beginning or within zipfile
(attempting to process anyway)
  error [many.zip]:  reported length of central directory is
-76 bytes too long (Atari STZip zipfile?  J.H.Holm ZIPSPLIT 1.1
zipfile?).  Compensating...
  00/
  00/00
  ...
  ff/ff
  error: expected central file header signature not found (file
#65793). (please check that you have transferred or created the
zipfile in the appropriate BINARY mode and that you have compiled
UnZip properly)

 And FreeBSD doesn't like it either:

  $ unzip -Z -1 many.zip
  unzip: Invalid central directory signature
  $


 Looks like they don't support ZIP64. Or I got some of the fields wrong
 after all.

A 65536 file zip created on Mac OS X with Mac's zip command given
to unzip or zipinfo results in exactly the same warnings/errors as
above (including the bit about 76 extra bytes and -76 bytes too
long), so it doesn't seem to be a problem with your implementation.

 https://en.wikipedia.org/wiki/Zip_%28file_format%29#ZIP64 says: OS X
 Yosemite does support the creation of ZIP64 archives, but does not
 support unzipping these archives using the shipped unzip command-line
 utility or graphical Archive Utility.[citation needed].

 How does unzip react to a ZIP file with more than 65535 entries that
 was created natively on these platforms? And what does zipinfo (a real
 one, without -1) report at the top for such files?

On Mac OS X, unzip does extract all the files (although complains as
noted above). zipinfo caps out at reporting 65535 for the number of
files (although it lists them all fine). With the warnings/errors
filtered out for clarity:

$ zipinfo biggy.zip
Archive:  biggy.zip   9642874 bytes   65535 files
...
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] t5004: test ZIP archives with many entries

2015-08-23 Thread René Scharfe
Am 23.08.2015 um 07:54 schrieb Eric Sunshine:
 On Sat, Aug 22, 2015 at 3:06 PM, René Scharfe l@web.de wrote:
 diff --git a/t/t5004-archive-corner-cases.sh 
 b/t/t5004-archive-corner-cases.sh
 index 654adda..c6bd729 100755
 --- a/t/t5004-archive-corner-cases.sh
 +++ b/t/t5004-archive-corner-cases.sh
 @@ -115,4 +115,44 @@ test_expect_success 'archive empty subtree by direct 
 pathspec' '
  check_dir extract sub
   '

 +ZIPINFO=zipinfo
 +
 +test_lazy_prereq ZIPINFO '
 +   n=$($ZIPINFO $TEST_DIRECTORY/t5004/empty.zip | sed -n 2s/.* 
 //p)
 +   test x$n = x0
 +'
 
 Unfortunately, this sed expression isn't portable due to dissimilar
 output of various zipinfo implementations. On Linux, the output of
 zipinfo is:
 
  $ zipinfo t/t5004/empty.zip
  Archive:  t/t5004/empty.zip
  Zip file size: 62 bytes, number of entries: 0
  Empty zipfile.
  $
 
 however, on Mac OS X:
 
  $ zipinfo t/t5004/empty.zip
  Archive:  t/t5004/empty.zip   62 bytes   0 files
  Empty zipfile.
  $
 
 and on FreeBSD, the zipinfo command seems to have been removed
 altogether in favor of unzip -Z (emulate zipinfo).

Thanks for your thorough checks!

I suspected that zipinfo's output might be formatted differently on
different platforms and tried to guard against it by checking for the
number zero there. Git's ZIP file creation is platform independent
(modulo bugs), so having a test run at least somewhere should
suffice. In theory.

We could add support for the one-line-summary variant on OS X easily,
though.

 One might hope that unzip -Z would be a reasonable replacement for
 zipinfo, however, it is apparently only partially implemented on
 FreeBSD, and requires that -1 be passed, as well. Even with unzip -Z
 -1, there are issues. The output on Linux and Mac OS X is:
 
  $ unzip -Z -1 t/t5004/empty.zip
  Empty zipfile.
  $
 
 but FreeBSD differs:
 
  $ unzip -Z -1 t/t5004/empty.zip
  $
 
 With a non-empty zip file, the output is identical on all platforms:
 
  $ unzip -Z -1 twofiles.zip
  file1
  file2
  $
 
 So, if you combine that with wc -l or test_line_count, you may have
 a portable and reliable entry counter.

Counting all entries is slow, and more importantly it's not what we
want. In this test we need the number of entries recorded in the ZIP
directory, not the actual number of entries found by scanning the
archive, or the directory.

On Linux unzip -Z -1 many.zip | wc -l reports 65792 even before
adding ZIP64 support; only without -1 we get the interesting numbers
(specifically with unzip -Z many.zip | sed -n '2p;$p'):

Zip file size: 6841366 bytes, number of entries: 256
65792 files, 0 bytes uncompressed, 0 bytes compressed: 0.0%

 With these three patches applied, Mac OS X has trouble with 'many.zip':
 
  $ unzip -Z -1 many.zip
  warning [many.zip]:  76 extra bytes at beginning or within zipfile
(attempting to process anyway)
  error [many.zip]:  reported length of central directory is
-76 bytes too long (Atari STZip zipfile?  J.H.Holm ZIPSPLIT 1.1
zipfile?).  Compensating...
  00/
  00/00
  ...
  ff/ff
  error: expected central file header signature not found (file
#65793). (please check that you have transferred or created the
zipfile in the appropriate BINARY mode and that you have compiled
UnZip properly)
 
 And FreeBSD doesn't like it either:
 
  $ unzip -Z -1 many.zip
  unzip: Invalid central directory signature
  $
 

Looks like they don't support ZIP64. Or I got some of the fields wrong
after all.

https://en.wikipedia.org/wiki/Zip_%28file_format%29#ZIP64 says: OS X
Yosemite does support the creation of ZIP64 archives, but does not
support unzipping these archives using the shipped unzip command-line
utility or graphical Archive Utility.[citation needed].

How does unzip react to a ZIP file with more than 65535 entries that
was created natively on these platforms? And what does zipinfo (a real
one, without -1) report at the top for such files?

Thanks,
René

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] t5004: test ZIP archives with many entries

2015-08-22 Thread René Scharfe
A ZIP file directory has a 16-bit field for the number of entries it
contains.  There are 64-bit extensions to deal with that.  Demonstrate
that git archive --format=zip currently doesn't use them and instead
overflows the field.

InfoZIP's unzip doesn't care about this field and extracts all files
anyway.  Software that uses the directory for presenting a filesystem
like view quickly -- notably Windows -- depends on it, but doesn't
lend itself to an automatic test case easily.  Use InfoZIP's zipinfo,
which probably isn't available everywhere but at least can provides
*some* way to check this field.

To speed things up a bit create and commit only a subset of the files
and build a fake tree out of duplicates and pass that to git archive.

Signed-off-by: Rene Scharfe l@web.de
---
 t/t5004-archive-corner-cases.sh | 40 
 1 file changed, 40 insertions(+)

diff --git a/t/t5004-archive-corner-cases.sh b/t/t5004-archive-corner-cases.sh
index 654adda..c6bd729 100755
--- a/t/t5004-archive-corner-cases.sh
+++ b/t/t5004-archive-corner-cases.sh
@@ -115,4 +115,44 @@ test_expect_success 'archive empty subtree by direct 
pathspec' '
check_dir extract sub
 '
 
+ZIPINFO=zipinfo
+
+test_lazy_prereq ZIPINFO '
+   n=$($ZIPINFO $TEST_DIRECTORY/t5004/empty.zip | sed -n 2s/.* //p)
+   test x$n = x0
+'
+
+test_expect_failure ZIPINFO 'zip archive with many entries' '
+   # add a directory with 256 files
+   mkdir 00 
+   for a in 0 1 2 3 4 5 6 7 8 9 a b c d e f
+   do
+   for b in 0 1 2 3 4 5 6 7 8 9 a b c d e f
+   do
+   : 00/$a$b
+   done
+   done 
+   git add 00 
+   git commit -m 256 files in 1 directory 
+
+   # duplicate it to get 65536 files in 256 directories
+   subtree=$(git write-tree --prefix=00/) 
+   for c in 0 1 2 3 4 5 6 7 8 9 a b c d e f
+   do
+   for d in 0 1 2 3 4 5 6 7 8 9 a b c d e f
+   do
+   echo 04 tree $subtree  $c$d
+   done
+   done tree 
+   tree=$(git mktree tree) 
+
+   # zip them
+   git archive -o many.zip $tree 
+
+   # check the number of entries in the ZIP file directory
+   expr 65536 + 256 expect 
+   $ZIPINFO many.zip | head -2 | sed -n 2s/.* //p actual 
+   test_cmp expect actual
+'
+
 test_done
-- 
2.5.0


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] t5004: test ZIP archives with many entries

2015-08-22 Thread Eric Sunshine
On Sat, Aug 22, 2015 at 3:06 PM, René Scharfe l@web.de wrote:
 A ZIP file directory has a 16-bit field for the number of entries it
 contains.  There are 64-bit extensions to deal with that.  Demonstrate
 that git archive --format=zip currently doesn't use them and instead
 overflows the field.

 InfoZIP's unzip doesn't care about this field and extracts all files
 anyway.  Software that uses the directory for presenting a filesystem
 like view quickly -- notably Windows -- depends on it, but doesn't
 lend itself to an automatic test case easily.  Use InfoZIP's zipinfo,
 which probably isn't available everywhere but at least can provides
 *some* way to check this field.

 To speed things up a bit create and commit only a subset of the files
 and build a fake tree out of duplicates and pass that to git archive.

 Signed-off-by: Rene Scharfe l@web.de
 ---
 diff --git a/t/t5004-archive-corner-cases.sh b/t/t5004-archive-corner-cases.sh
 index 654adda..c6bd729 100755
 --- a/t/t5004-archive-corner-cases.sh
 +++ b/t/t5004-archive-corner-cases.sh
 @@ -115,4 +115,44 @@ test_expect_success 'archive empty subtree by direct 
 pathspec' '
 check_dir extract sub
  '

 +ZIPINFO=zipinfo
 +
 +test_lazy_prereq ZIPINFO '
 +   n=$($ZIPINFO $TEST_DIRECTORY/t5004/empty.zip | sed -n 2s/.* //p)
 +   test x$n = x0
 +'

Unfortunately, this sed expression isn't portable due to dissimilar
output of various zipinfo implementations. On Linux, the output of
zipinfo is:

$ zipinfo t/t5004/empty.zip
Archive:  t/t5004/empty.zip
Zip file size: 62 bytes, number of entries: 0
Empty zipfile.
$

however, on Mac OS X:

$ zipinfo t/t5004/empty.zip
Archive:  t/t5004/empty.zip   62 bytes   0 files
Empty zipfile.
$

and on FreeBSD, the zipinfo command seems to have been removed
altogether in favor of unzip -Z (emulate zipinfo).

One might hope that unzip -Z would be a reasonable replacement for
zipinfo, however, it is apparently only partially implemented on
FreeBSD, and requires that -1 be passed, as well. Even with unzip -Z
-1, there are issues. The output on Linux and Mac OS X is:

$ unzip -Z -1 t/t5004/empty.zip
Empty zipfile.
$

but FreeBSD differs:

$ unzip -Z -1 t/t5004/empty.zip
$

With a non-empty zip file, the output is identical on all platforms:

$ unzip -Z -1 twofiles.zip
file1
file2
$

So, if you combine that with wc -l or test_line_count, you may have
a portable and reliable entry counter.

More below...

 +test_expect_failure ZIPINFO 'zip archive with many entries' '
 +   # add a directory with 256 files
 +   mkdir 00 
 +   for a in 0 1 2 3 4 5 6 7 8 9 a b c d e f
 +   do
 +   for b in 0 1 2 3 4 5 6 7 8 9 a b c d e f
 +   do
 +   : 00/$a$b
 +   done
 +   done 
 +   git add 00 
 +   git commit -m 256 files in 1 directory 
 +
 +   # duplicate it to get 65536 files in 256 directories
 +   subtree=$(git write-tree --prefix=00/) 
 +   for c in 0 1 2 3 4 5 6 7 8 9 a b c d e f
 +   do
 +   for d in 0 1 2 3 4 5 6 7 8 9 a b c d e f
 +   do
 +   echo 04 tree $subtree  $c$d
 +   done
 +   done tree 
 +   tree=$(git mktree tree) 
 +
 +   # zip them
 +   git archive -o many.zip $tree 
 +
 +   # check the number of entries in the ZIP file directory
 +   expr 65536 + 256 expect 
 +   $ZIPINFO many.zip | head -2 | sed -n 2s/.* //p actual 

With these three patches applied, Mac OS X has trouble with 'many.zip':

$ unzip -Z -1 many.zip
warning [many.zip]:  76 extra bytes at beginning or within zipfile
  (attempting to process anyway)
error [many.zip]:  reported length of central directory is
  -76 bytes too long (Atari STZip zipfile?  J.H.Holm ZIPSPLIT 1.1
  zipfile?).  Compensating...
00/
00/00
...
ff/ff
error: expected central file header signature not found (file
  #65793). (please check that you have transferred or created the
  zipfile in the appropriate BINARY mode and that you have compiled
  UnZip properly)

And FreeBSD doesn't like it either:

$ unzip -Z -1 many.zip
unzip: Invalid central directory signature
$

 +   test_cmp expect actual
 +'
 +
  test_done
 --
 2.5.0
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html