[jira] [Comment Edited] (COMPRESS-380) Support for ENHANCED_DEFLATED (Deflate64) in ZIP files

2018-01-09 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16319080#comment-16319080
 ] 

Dawid Weiss edited comment on COMPRESS-380 at 1/9/18 9:00 PM:
--

The thing is: I really like when the compiler makes it explicit for me. I do 
read the docs, but mistakes do happen. Also, this really doesn't look trappy in 
the current form:

{code}
try (ZipFile zfile = new ZipFile("/my/file.zip")) {
  Enumeration<ZipArchiveEntry> entries = zfile.getEntries();
  while (entries.hasMoreElements()) {
    ZipArchiveEntry e = entries.nextElement();
    try (InputStream is = zfile.getInputStream(e)) {
      // .. read is in blocks or wrap in a BufferedInputStream... doesn't matter,
      // it'll be slow.
    }
  }
}
{code}

and it is trappy. That constructor on ZipFile creates an unbuffered stream, and 
this makes reads about 10x slower than they would be if the stream were 
buffered. I don't see how it can be fixed from the user side, actually: even if 
you do wrap the output from zfile.getInputStream in a buffered input stream (or 
read in large byte[] blocks), the performance will still be very, very poor.
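
A minimal sketch of why buffering on the caller's side may not help, assuming the cost comes from a decoder that pulls its input in tiny reads from an unbuffered source. The classes below (BufferingDemo, CountingInputStream, ByteAtATimeDecoder) are hypothetical stand-ins written for this illustration; they are not Commons Compress code.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class BufferingDemo {
    /** Counts how many read calls reach the underlying source. */
    static final class CountingInputStream extends FilterInputStream {
        long calls;
        CountingInputStream(InputStream in) { super(in); }
        @Override public int read() throws IOException {
            calls++;
            return super.read();
        }
        @Override public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Hypothetical decoder stand-in: it always asks its source for one byte at a time. */
    static final class ByteAtATimeDecoder extends InputStream {
        private final InputStream source;
        ByteAtATimeDecoder(InputStream source) { this.source = source; }
        @Override public int read() throws IOException { return source.read(); }
    }

    /** Reads the stream to the end in 8 KiB blocks, as a careful caller would. */
    static long drain(InputStream in) {
        byte[] block = new byte[8192];
        long total = 0;
        try {
            for (int n; (n = in.read(block)) != -1; ) {
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    /** Caller-side fix: buffer the decoder's output. The source still sees one call per byte. */
    static long callsWhenOutputBuffered(byte[] data) {
        CountingInputStream raw = new CountingInputStream(new ByteArrayInputStream(data));
        drain(new BufferedInputStream(new ByteAtATimeDecoder(raw)));
        return raw.calls;
    }

    /** Library-side fix: buffer the decoder's input. The source sees one call per buffer fill. */
    static long callsWhenInputBuffered(byte[] data) {
        CountingInputStream raw = new CountingInputStream(new ByteArrayInputStream(data));
        drain(new ByteAtATimeDecoder(new BufferedInputStream(raw)));
        return raw.calls;
    }
}
```

In this sketch, wrapping the decoder's output leaves the number of calls into the source roughly equal to the number of bytes, while wrapping its input reduces it to a handful of buffer fills. Only the second wrapping is something a ZipFile-like class could do internally, which matches the point that the user cannot fix it from outside.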




> Support for ENHANCED_DEFLATED (Deflate64) in ZIP files
> --
>
> Key: COMPRESS-380
> URL: https://issues.apache.org/jira/browse/COMPRESS-380
> Project: Commons Compress
>  Issue Type: New Feature
>Reporter: Dawid Weiss
> Fix For: 1.16
>
> Attachments: archive-deflate.zip, archive-deflate64.zip, archive.zip, 
> archive64.zip, compress-380.diff, hello.world, input2
>
>
> Some of the (large) ZIP files we try to process currently will throw this:
> {code}
> UnsupportedZipFeatureException: unsupported feature method 
> 'ENHANCED_DEFLATED' 
> {code}
> which is a bummer since JDK's implementation also doesn't support Deflate64. 
> This seems to be a PKWARE extension, although code to decode it exists in 
> zlib (and is appropriately licensed, I believe).
> https://github.com/madler/zlib/tree/master/contrib/infback9



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (COMPRESS-380) Support for ENHANCED_DEFLATED (Deflate64) in ZIP files

2018-01-08 Thread Stefan Bodewig (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16316203#comment-16316203
 ] 

Stefan Bodewig edited comment on COMPRESS-380 at 1/8/18 12:25 PM:
--

I'm looking at the zlib infback9 files linked in the description of this issue.

DEFLATE64 isn't documented officially by PKWARE, but 
http://binaryessence.com/dct/imp/en000225.htm is useful to see how it differs 
from DEFLATE. A DEFLATE64 decoder should be able to decode a DEFLATE stream that 
doesn't use the length code 285 (i.e. with no back-reference length of exactly 
258 bytes).
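
To make the length code 285 difference concrete, here is a small sketch. The constants reflect my reading of RFC 1951 and of the tables in zlib's contrib/infback9 (base length 3 with 16 extra bits for Deflate64); please verify them against those sources. The class name LengthCode285 is made up for this example.

```java
final class LengthCode285 {
    static final int CODE = 285;

    /** DEFLATE (RFC 1951): code 285 has no extra bits and always means length 258. */
    static int deflateLength() {
        return 258;
    }

    /**
     * Deflate64 (assumption, per zlib's infback9 tables): code 285 is followed
     * by 16 extra bits that are added to a base length of 3.
     */
    static int deflate64Length(int extraBits) {
        if (extraBits < 0 || extraBits > 0xFFFF) {
            throw new IllegalArgumentException("expected a 16-bit value: " + extraBits);
        }
        return 3 + extraBits;
    }
}
```

Because the same code means different things in the two formats, a plain DEFLATE stream that contains code 285 would be misread by a Deflate64 decoder; streams that avoid it decode identically as far as length codes are concerned.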


was (Author: bodewig):
I'm looking at the zlib infback9 files linked in the description of this issue.

DEFLATE64 isn't documented officially by PKWARE, 
http://binaryessence.com/dct/imp/en000225.htm is useful to see the differences 
to DEFLATE. A DEFLATE64 decoder should be able to decode a DEFLATE stream that 
doesn't use the length code 285 (i.e. with no distance of exactly 258 bytes).



[jira] [Comment Edited] (COMPRESS-380) Support for ENHANCED_DEFLATED (Deflate64) in ZIP files

2018-01-06 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16314726#comment-16314726
 ] 

Dawid Weiss edited comment on COMPRESS-380 at 1/6/18 4:19 PM:
--

I isolated a smaller example file that still fails at runtime, with a more 
complex exception. Total Commander (Info-ZIP) decompresses this file just fine 
(it's a PNG file), so it has to be something in the decoding routine.

https://github.com/apache/commons-compress/pull/58

{code}
java.lang.IllegalStateException: Attempt to read beyond memory: dist=5955
    at org.apache.commons.compress.compressors.deflate64.HuffmanDecoder$DecodingMemory.recordToBuffer(HuffmanDecoder.java:471)
    at org.apache.commons.compress.compressors.deflate64.HuffmanDecoder$HuffmanCodes.decodeNext(HuffmanDecoder.java:292)
    at org.apache.commons.compress.compressors.deflate64.HuffmanDecoder$HuffmanCodes.read(HuffmanDecoder.java:264)
    at org.apache.commons.compress.compressors.deflate64.HuffmanDecoder.decode(HuffmanDecoder.java:165)
    at org.apache.commons.compress.compressors.deflate64.Deflate64CompressorInputStream.read(Deflate64CompressorInputStream.java:77)
    at java.io.InputStream.read(InputStream.java:101)
    at org.apache.commons.compress.compressors.deflate64.Deflate64BugTest.readBeyondMemoryException(Deflate64BugTest.java:23)
{code}




[jira] [Comment Edited] (COMPRESS-380) Support for ENHANCED_DEFLATED (Deflate64) in ZIP files

2018-01-04 Thread Stefan Bodewig (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16311543#comment-16311543
 ] 

Stefan Bodewig edited comment on COMPRESS-380 at 1/4/18 3:54 PM:
-

https://github.com/apache/commons-compress/commit/07cc1a278b217d45cb090ff6cb3a7934105cb2d0
 changes {{available}}. Does this look OK? I'll certainly have to add a few 
more tests.




[jira] [Comment Edited] (COMPRESS-380) Support for ENHANCED_DEFLATED (Deflate64) in ZIP files

2018-01-04 Thread Stefan Bodewig (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16311099#comment-16311099
 ] 

Stefan Bodewig edited comment on COMPRESS-380 at 1/4/18 9:56 AM:
-

[~chalmagr84] I'd like to remove the {{uncompressedSize}} from the stream's 
constructor for two reasons:

* {{ze.getSize()}}: when the stream is used in a {{ZipArchiveInputStream}} 
context, the uncompressed size may be unknown, as it may be stored inside of a 
data descriptor rather than the local file header
* the stream only uses it inside of {{available}}, which is supposed to return 
the number of bytes that can be read without blocking. The implementation of 
{{available}} is probably not correct for general {{InputStream}} s, as we may 
well be blocking while trying to read bits from it; it is probably OK for the 
seekable input underlying {{ZipFile}}

I'd make {{available}} return 0 unconditionally. Alternatively the 
{{DecoderState}} s may know a bit more about data they have already read and 
could provide a less pessimistic answer.
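
For illustration only, the "return 0 unconditionally" option could look like the sketch below. ZeroAvailableInputStream is a hypothetical wrapper written for this example, not the actual Deflate64 stream; it only shows that a constant 0 stays within the {{available}} contract (an estimate of bytes readable without blocking) while reads keep working.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class ZeroAvailableInputStream extends FilterInputStream {
    ZeroAvailableInputStream(InputStream in) {
        super(in);
    }

    /** Pessimistic but always safe: never promises any non-blocking bytes. */
    @Override
    public int available() {
        return 0;
    }

    /** Returns {available before read, first byte, available after read}. */
    static int[] probe(byte[] data) {
        ZeroAvailableInputStream in =
                new ZeroAvailableInputStream(new ByteArrayInputStream(data));
        try {
            return new int[] { in.available(), in.read(), in.available() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The trade-off is that callers who size their buffers from {{available}} get no hint at all, which is why letting the decoder state report already-decoded bytes would be the less pessimistic alternative.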

Any objections?




[jira] [Comment Edited] (COMPRESS-380) Support for ENHANCED_DEFLATED (Deflate64) in ZIP files

2017-12-28 Thread Stefan Bodewig (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16305196#comment-16305196
 ] 

Stefan Bodewig edited comment on COMPRESS-380 at 12/28/17 11:28 AM:


[~chalmagr84] many thanks. I'm still comparing your code with the one in zlib 
and am not completely through it yet. This looks really good.

I'd probably change the package name from ...compressors.zip to 
compressors.deflate64 (and maybe add a standalone {{CompressorInputStream}} as 
well) - and certainly add the license headers and some javadocs. I can do all 
of these steps myself unless you want to update the patch.




[jira] [Comment Edited] (COMPRESS-380) Support for ENHANCED_DEFLATED (Deflate64) in ZIP files

2017-12-28 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16305302#comment-16305302
 ] 

Dawid Weiss edited comment on COMPRESS-380 at 12/28/17 10:25 AM:
-

The attached streams (archive.zip, archive64.zip, input2) aren't 
binary-identical; perhaps they'll be of greater help. I generated a random 
sequence and folded it around another random sequence. The input is also 
attached for reference.

{code}
rm -f archive*.zip 

dd if=/dev/urandom bs=1024 count=1 2>/dev/null > input1

cat input1 > input2
dd if=/dev/urandom bs=1024 count=1 2>/dev/null >> input2
cat input1 >> input2
rm input1

7za a -mm=deflate64 archive64.zip input2
7za a -mm=deflate archive.zip input2

ls -l archive*.zip
{code}
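
For anyone without dd at hand, the same input layout (random block A, random block B, then A again) can be sketched in Java. FoldedInput and its build method are names made up for this example; the 1024-byte block size mirrors the bs=1024 count=1 above.

```java
import java.util.Random;

final class FoldedInput {
    /** Builds A + B + A, where A and B are independent random blocks of blockSize bytes. */
    static byte[] build(Random rnd, int blockSize) {
        byte[] a = new byte[blockSize];
        byte[] b = new byte[blockSize];
        rnd.nextBytes(a);
        rnd.nextBytes(b);

        byte[] out = new byte[3 * blockSize];
        System.arraycopy(a, 0, out, 0, blockSize);
        System.arraycopy(b, 0, out, blockSize, blockSize);
        // The repeated block gives the compressor a back-reference at distance 2 * blockSize.
        System.arraycopy(a, 0, out, 2 * blockSize, blockSize);
        return out;
    }
}
```

Writing the result of build(new Random(), 1024) to a file named input2 and running the two 7za commands on it should give archives comparable to the attached ones, although the random content will of course differ.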




[jira] [Comment Edited] (COMPRESS-380) Support for ENHANCED_DEFLATED (Deflate64) in ZIP files

2017-12-13 Thread Christian Marquez Grabia (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16289388#comment-16289388
 ] 

Christian Marquez Grabia edited comment on COMPRESS-380 at 12/13/17 3:30 PM:
-

[~dawid.weiss] Thanks for the updates. I created these classes based on 
research into multiple deflate decompression samples I was able to find, using 
the masking concepts for the literal / distance tables as well as the state 
concept, all brought into Java 'style'.

I left it all in JIRA: since this is mostly about providing patches to the 
Apache library, I figured it would be relevant mostly to JIRA users (I didn't 
want to add noise around it).

-- EDIT --
About the Apache headers, I'm OK with it. I will try to get those from other 
classes and update the patch later today.

