[jira] [Comment Edited] (COMPRESS-380) Support for ENHANCED_DEFLATED (Deflate64) in ZIP files
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16319080#comment-16319080 ]

Dawid Weiss edited comment on COMPRESS-380 at 1/9/18 9:00 PM:
--------------------------------------------------------------

The thing is: I really like it when the compiler makes it explicit for me. I do read the docs, but mistakes do happen. Also, this really doesn't look trappy in its current form:

{code}
try (ZipFile zfile = new ZipFile("/my/file.zip")) {
  Enumeration<ZipArchiveEntry> entries = zfile.getEntries();
  while (entries.hasMoreElements()) {
    ZipArchiveEntry e = entries.nextElement();
    try (InputStream is = zfile.getInputStream(e)) {
      // .. read is in blocks or wrap in a BufferedInputStream... doesn't matter,
      // it'll be slow.
    }
  }
}
{code}

and yet it is trappy. That constructor on ZipFile creates an unbuffered stream, and this makes reading about 10x slower than it could have been if the stream were buffered. I don't see how it can be fixed from the user side, actually: even if you do wrap the output from zfile.getInputStream in a buffered input stream (or read in large byte[] blocks), the performance will still be very, very poor.

was (Author: dweiss):
The thing is: I really like it when the compiler makes it explicit for me. I do read the docs, but mistakes do happen. Also, this really doesn't look trappy in its current form:

{code}
try (ZipFile zfile = new ZipFile("/my/file.zip")) {
  Enumeration<ZipArchiveEntry> entries = zfile.getEntries();
  while (entries.hasMoreElements()) {
    ZipArchiveEntry e = entries.nextElement();
    try (InputStream is = zfile.getInputStream(e)) {
      // .. read is in blocks or wrap in a BufferedInputStream... doesn't matter,
      // it'll be slow.
    }
  }
}
{code}

and yet it is trappy. That constructor on ZipFile creates an unbuffered stream, and this makes reading about 10x slower than it could have been if the stream were buffered. I don't see how it can be fixed from the user side, actually, as even if you do wrap the output from zfile.getInputStream in a buffered input stream (or read in large byte[] blocks), the performance will still be very, very poor.

> Support for ENHANCED_DEFLATED (Deflate64) in ZIP files
> ------------------------------------------------------
>
> Key: COMPRESS-380
> URL: https://issues.apache.org/jira/browse/COMPRESS-380
> Project: Commons Compress
> Issue Type: New Feature
> Reporter: Dawid Weiss
> Fix For: 1.16
>
> Attachments: archive-deflate.zip, archive-deflate64.zip, archive.zip, archive64.zip, compress-380.diff, hello.world, input2
>
>
> Some of the (large) ZIP files we try to process currently will throw this:
> {code}
> UnsupportedZipFeatureException: unsupported feature method 'ENHANCED_DEFLATED'
> {code}
> which is a bummer since JDK's implementation also doesn't support Deflate64.
> This seems to be a PKWARE extension, although code to decode it exists in zlib (and is appropriately licensed, I believe).
> https://github.com/madler/zlib/tree/master/contrib/infback9

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
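One plausible mechanism behind the remark that wrapping the result of getInputStream does not help: if a decoder pulls single bytes from an unbuffered source, a buffer added downstream of the decoder cannot reduce the number of reads that reach the source, whereas a buffer placed between source and decoder can. The sketch below demonstrates this with plain java.io only. CountingSource and ByteAtATimeDecoder are hypothetical stand-ins invented for illustration; they are not Commons Compress classes, and nothing here is meant to describe how ZipFile is actually implemented.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class BufferingPlacementDemo {

    /** Hypothetical stand-in for an unbuffered source; counts every read call it receives. */
    static final class CountingSource extends FilterInputStream {
        long calls;

        CountingSource(byte[] data) {
            super(new ByteArrayInputStream(data));
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Hypothetical "decoder" that only ever asks its input for one byte at a time. */
    static final class ByteAtATimeDecoder extends InputStream {
        private final InputStream source;

        ByteAtATimeDecoder(InputStream source) {
            this.source = source;
        }

        @Override
        public int read() throws IOException {
            return source.read();
        }
    }

    /** Reads the stream to the end in 8 KiB blocks, as a careful caller would. */
    static void drain(InputStream in) {
        byte[] block = new byte[8192];
        try {
            while (in.read(block) != -1) {
                // discard the data; only the read pattern matters here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Buffer added by the caller, downstream of the decoder. */
    static long callsWithOuterBuffer(int size) {
        CountingSource src = new CountingSource(new byte[size]);
        drain(new BufferedInputStream(new ByteAtATimeDecoder(src)));
        return src.calls;
    }

    /** Buffer placed between the source and the decoder. */
    static long callsWithInnerBuffer(int size) {
        CountingSource src = new CountingSource(new byte[size]);
        drain(new ByteAtATimeDecoder(new BufferedInputStream(src)));
        return src.calls;
    }

    public static void main(String[] args) {
        System.out.println("reads hitting the source, outer buffer: " + callsWithOuterBuffer(1 << 16));
        System.out.println("reads hitting the source, inner buffer: " + callsWithInnerBuffer(1 << 16));
    }
}
```

With the outer buffer the source still sees at least one call per byte; with the inner buffer it sees only a handful of block reads. Under that assumption, the fix has to live inside the library, which matches the complaint above.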
[jira] [Comment Edited] (COMPRESS-380) Support for ENHANCED_DEFLATED (Deflate64) in ZIP files
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16316203#comment-16316203 ]

Stefan Bodewig edited comment on COMPRESS-380 at 1/8/18 12:25 PM:
------------------------------------------------------------------

I'm looking at the zlib infback9 files linked in the description of this issue. DEFLATE64 isn't documented officially by PKWARE; http://binaryessence.com/dct/imp/en000225.htm is useful for seeing how it differs from DEFLATE.

A DEFLATE64 decoder should be able to decode a DEFLATE stream that doesn't use the length code 285 (i.e. with no back-reference length of exactly 258 bytes).

was (Author: bodewig):
I'm looking at the zlib infback9 files linked in the description of this issue. DEFLATE64 isn't documented officially by PKWARE; http://binaryessence.com/dct/imp/en000225.htm is useful for seeing how it differs from DEFLATE.

A DEFLATE64 decoder should be able to decode a DEFLATE stream that doesn't use the length code 285 (i.e. with no distance of exactly 258 bytes).
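The point about length code 285 can be made concrete with a small sketch. The DEFLATE values follow RFC 1951; the DEFLATE64 values are my reading of the tables in zlib's contrib/infback9 (code 285 having base length 3 with 16 extra bits, code 284 using all 32 values of its 5 extra bits) and should be treated as assumptions, since, as noted above, the format has no official documentation. The class and method names are hypothetical and only the two codes that matter for the comparison are handled.

```java
class LengthCode285 {

    /**
     * Returns the match length for length codes 284 and 285.
     *
     * @param code      the length code (only 284 and 285 are handled in this sketch)
     * @param extraBits the value of the extra bits that follow the code in the stream
     * @param deflate64 true to interpret the code as DEFLATE64, false for plain DEFLATE
     */
    static int matchLength(int code, int extraBits, boolean deflate64) {
        switch (code) {
            case 284:
                // Base 227 with 5 extra bits. Plain DEFLATE only permits 227..257 here,
                // because 258 has its own code; DEFLATE64 (assumed) permits 227..258.
                if (extraBits < 0 || extraBits > (deflate64 ? 31 : 30)) {
                    throw new IllegalArgumentException("extra bits out of range: " + extraBits);
                }
                return 227 + extraBits;
            case 285:
                if (!deflate64) {
                    // Plain DEFLATE: fixed length 258, no extra bits.
                    return 258;
                }
                // DEFLATE64 (assumed from infback9): base 3 plus 16 extra bits, i.e. 3..65538.
                if (extraBits < 0 || extraBits > 0xFFFF) {
                    throw new IllegalArgumentException("extra bits out of range: " + extraBits);
                }
                return 3 + extraBits;
            default:
                throw new IllegalArgumentException("only codes 284 and 285 are sketched: " + code);
        }
    }

    public static void main(String[] args) {
        System.out.println("DEFLATE   code 285: " + matchLength(285, 0, false));
        System.out.println("DEFLATE64 code 285, extra 0: " + matchLength(285, 0, true));
        System.out.println("DEFLATE64 code 285, extra 0xFFFF: " + matchLength(285, 0xFFFF, true));
    }
}
```

Under these assumptions the two formats disagree only on how code 285 is interpreted, which is why a DEFLATE stream that never emits that code should decode identically with a DEFLATE64 decoder.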
[jira] [Comment Edited] (COMPRESS-380) Support for ENHANCED_DEFLATED (Deflate64) in ZIP files
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16314726#comment-16314726 ]

Dawid Weiss edited comment on COMPRESS-380 at 1/6/18 4:19 PM:
--------------------------------------------------------------

I isolated a smaller example file that still fails at runtime, with a more complex exception. Total Commander (Info-ZIP) decompresses this file just fine (it's a PNG file), so it has to be something in the decoding routine.

https://github.com/apache/commons-compress/pull/58

{code}
java.lang.IllegalStateException: Attempt to read beyond memory: dist=5955
	at org.apache.commons.compress.compressors.deflate64.HuffmanDecoder$DecodingMemory.recordToBuffer(HuffmanDecoder.java:471)
	at org.apache.commons.compress.compressors.deflate64.HuffmanDecoder$HuffmanCodes.decodeNext(HuffmanDecoder.java:292)
	at org.apache.commons.compress.compressors.deflate64.HuffmanDecoder$HuffmanCodes.read(HuffmanDecoder.java:264)
	at org.apache.commons.compress.compressors.deflate64.HuffmanDecoder.decode(HuffmanDecoder.java:165)
	at org.apache.commons.compress.compressors.deflate64.Deflate64CompressorInputStream.read(Deflate64CompressorInputStream.java:77)
	at java.io.InputStream.read(InputStream.java:101)
	at org.apache.commons.compress.compressors.deflate64.Deflate64BugTest.readBeyondMemoryException(Deflate64BugTest.java:23)
{code}

was (Author: dweiss):
I isolated a smaller example file that still fails at runtime, with a more complex exception. Total Commander (Info-ZIP) decompresses this file just fine (it's a PNG file), so it has to be something in the decoding routine.

https://github.com/apache/commons-compress/pull/58
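For readers unfamiliar with this failure mode: an LZ77-style decoder resolves each back-reference by copying from a window of previously decoded bytes, and a distance that reaches further back than the data decoded so far cannot be satisfied. The following is a hypothetical sketch of such a window with that kind of check. It is not the actual HuffmanDecoder.DecodingMemory code, and the class name, field names and window size are assumptions made for illustration; only the exception message is borrowed from the stack trace above.

```java
class SlidingWindow {
    private final byte[] window;
    private final int mask;
    private int writePos;   // index at which the next decoded byte is stored
    private long written;   // total number of bytes decoded so far

    /** Creates a window of 2^bits bytes (16 would give the 64 KiB commonly cited for Deflate64). */
    SlidingWindow(int bits) {
        window = new byte[1 << bits];
        mask = window.length - 1;
    }

    /** Records one decoded literal byte. */
    void add(byte b) {
        window[writePos] = b;
        writePos = (writePos + 1) & mask;
        written++;
    }

    /**
     * Resolves a back-reference: copies length bytes starting distance bytes back.
     * Each copied byte is appended to the window immediately, so overlapping
     * references (distance smaller than length) repeat the most recent bytes.
     */
    byte[] copy(int distance, int length) {
        if (distance < 1 || distance > window.length || distance > written) {
            throw new IllegalStateException("Attempt to read beyond memory: dist=" + distance);
        }
        byte[] out = new byte[length];
        int readPos = (writePos - distance) & mask;
        for (int i = 0; i < length; i++) {
            byte b = window[readPos];
            readPos = (readPos + 1) & mask;
            out[i] = b;
            add(b);
        }
        return out;
    }

    public static void main(String[] args) {
        SlidingWindow w = new SlidingWindow(16);
        w.add((byte) 'a');
        w.add((byte) 'b');
        System.out.println(new String(w.copy(2, 5), java.nio.charset.StandardCharsets.US_ASCII));
    }
}
```

When a well-formed stream trips a check like this, the decoder rather than the file is the likelier culprit, which is the conclusion drawn in the comment above.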
[jira] [Comment Edited] (COMPRESS-380) Support for ENHANCED_DEFLATED (Deflate64) in ZIP files
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16311543#comment-16311543 ]

Stefan Bodewig edited comment on COMPRESS-380 at 1/4/18 3:54 PM:
-----------------------------------------------------------------

https://github.com/apache/commons-compress/commit/07cc1a278b217d45cb090ff6cb3a7934105cb2d0 changes {{available}}. Does this look OK?

I'll certainly have to add a few more tests.

was (Author: bodewig):
https://github.com/apache/commons-compress/commit/07cc1a278b217d45cb090ff6cb3a7934105cb2d0 changes {{available}}. Does this look OK?
[jira] [Comment Edited] (COMPRESS-380) Support for ENHANCED_DEFLATED (Deflate64) in ZIP files
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16311099#comment-16311099 ]

Stefan Bodewig edited comment on COMPRESS-380 at 1/4/18 9:56 AM:
-----------------------------------------------------------------

[~chalmagr84] I'd like to remove the {{uncompressedSize}} from the stream's constructor for two reasons:

* when the stream is used in a {{ZipArchiveInputStream}} context, the uncompressed size ({{ze.getSize()}}) may be unknown, as it may be stored inside a data descriptor rather than the local file header
* the stream only uses it inside of {{available}}, which is supposed to return the number of bytes that can be read without blocking. The implementation of {{available}} is probably not correct for general {{InputStream}} s, as we may well be blocking while trying to read bits from them; it is probably OK for the seekable input underlying {{ZipFile}}

I'd make {{available}} return 0 unconditionally. Alternatively, the {{DecoderState}} s may know a bit more about data they have already read and could provide a less pessimistic answer.

Any objections?

was (Author: bodewig):
[~chalmagr84] I'd like to remove the {{uncompressedSize}} from the stream's constructor for two reasons:

* when the stream is used in a {{ZipArchiveInputStream}} context, the uncompressed size ({{ze.getSize()}}) may be unknown, as it may be stored inside a data descriptor rather than the local file header
* the stream only uses it inside of {{available}}, which is supposed to return the number of bytes that can be read without blocking. The implementation of {{available}} is probably not correct for general {{InputStream}}s, as we may well be blocking while trying to read bits from them; it is probably OK for the seekable input underlying {{ZipFile}}

I'd make {{available}} return 0 unconditionally. Alternatively, the {{DecoderState}}s may know a bit more about data they have already read and could provide a less pessimistic answer.

Any objections?
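The proposal to have {{available}} return 0 unconditionally could look roughly like the sketch below. It is written against plain java.io rather than the actual Deflate64 stream, and the class name ConservativeAvailableStream is hypothetical. Returning 0 stays within the InputStream contract, which describes available only as an estimate of what can be read without blocking; the base InputStream implementation returns 0 as well.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical decoding stream that never promises non-blocking reads. */
class ConservativeAvailableStream extends FilterInputStream {

    ConservativeAvailableStream(InputStream in) {
        super(in);
    }

    /**
     * Always reports 0: the decoder may have to block on the underlying
     * stream before it can produce even a single decoded byte.
     */
    @Override
    public int available() {
        return 0;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3};
        try (InputStream in = new ConservativeAvailableStream(new ByteArrayInputStream(data))) {
            System.out.println("available: " + in.available());
            System.out.println("first byte: " + in.read());
        }
    }
}
```

Callers that loop on read until it returns -1 are unaffected by this choice; only code that sizes buffers from available would see a difference, and such code cannot rely on that value being exact anyway.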
[jira] [Comment Edited] (COMPRESS-380) Support for ENHANCED_DEFLATED (Deflate64) in ZIP files
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16305196#comment-16305196 ]

Stefan Bodewig edited comment on COMPRESS-380 at 12/28/17 11:28 AM:
--------------------------------------------------------------------

[~chalmagr84] many thanks. I'm still comparing your code with the one in zlib and am not completely through yet. This looks really good.

I'd probably change the package name from ...compressors.zip to compressors.deflate64 (and maybe add a standalone {{CompressorInputStream}} as well) - and certainly add the license headers and some javadocs. I can do all of these steps myself unless you want to update the patch.

was (Author: bodewig):
@chalmagr84 many thanks. I'm still comparing your code with the one in zlib and am not completely through yet. This looks really good.

I'd probably change the package name from ...compressors.zip to compressors.deflate64 (and maybe add a standalone {{CompressorInputStream}} as well) - and certainly add the license headers and some javadocs. I can do all of these steps myself unless you want to update the patch.
[jira] [Comment Edited] (COMPRESS-380) Support for ENHANCED_DEFLATED (Deflate64) in ZIP files
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16305302#comment-16305302 ]

Dawid Weiss edited comment on COMPRESS-380 at 12/28/17 10:25 AM:
-----------------------------------------------------------------

The attached streams (archive.zip, archive64.zip, input2) aren't binary-identical, so perhaps they'll be of greater help. I generated a random sequence and folded it around another random sequence. The input is also attached for reference.

{code}
rm -f archive*.zip
dd if=/dev/urandom bs=1024 count=1 2>/dev/null > input1
cat input1 > input2
dd if=/dev/urandom bs=1024 count=1 2>/dev/null >> input2
cat input1 >> input2
rm input1
7za a -mm=deflate64 archive64.zip input2
7za a -mm=deflate archive.zip input2
ls -l archive*.zip
{code}

was (Author: dweiss):
These streams aren't binary-identical. I generated a random sequence and folded it around another random sequence. The input is also attached for reference.

{code}
rm -f archive*.zip
dd if=/dev/urandom bs=1024 count=1 2>/dev/null > input1
cat input1 > input2
dd if=/dev/urandom bs=1024 count=1 2>/dev/null >> input2
cat input1 >> input2
rm input1
7za a -mm=deflate64 archive64.zip input2
7za a -mm=deflate archive.zip input2
ls -l archive*.zip
{code}
[jira] [Comment Edited] (COMPRESS-380) Support for ENHANCED_DEFLATED (Deflate64) in ZIP files
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16289388#comment-16289388 ]

Christian Marquez Grabia edited comment on COMPRESS-380 at 12/13/17 3:30 PM:
-----------------------------------------------------------------------------

[~dawid.weiss] Thanks for the updates. I created these classes based on research into the multiple deflate decompression samples I was able to find, using the masking concepts for the literal / distance tables as well as the state concept, all brought into Java 'style'.

I left it all in JIRA: since this is more about providing patches to the Apache library, I figured it would be relevant mostly to JIRA users (I didn't want to add noise around it).

-- EDIT --
About the Apache headers, I'm OK with it. I will try to get those from other classes and update the patch later today.

was (Author: chalmagr84):
[~dawid.weiss] Thanks for the updates. I created these classes based on research into the multiple deflate decompression samples I was able to find, using the masking concepts for the literal / distance tables as well as the state concept, all brought into Java 'style'.

I left it all in JIRA: since this is more about providing patches to the Apache library, I figured it would be relevant mostly to JIRA users (I didn't want to add noise around it).

> Support for ENHANCED_DEFLATED (Deflate64) in ZIP files
> ------------------------------------------------------
>
> Key: COMPRESS-380
> URL: https://issues.apache.org/jira/browse/COMPRESS-380
> Project: Commons Compress
> Issue Type: New Feature
> Reporter: Dawid Weiss
> Attachments: compress-380.diff
>
>
> Some of the (large) ZIP files we try to process currently will throw this:
> {code}
> UnsupportedZipFeatureException: unsupported feature method 'ENHANCED_DEFLATED'
> {code}
> which is a bummer since JDK's implementation also doesn't support Deflate64.
> This seems to be a PKWARE extension, although code to decode it exists in zlib (and is appropriately licensed, I believe).
> https://github.com/madler/zlib/tree/master/contrib/infback9