[jira] [Commented] (COMPRESS-514) SevenZFile fails with encoded header over 2GiB

2021-06-14 Thread A Kelday (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362915#comment-17362915
 ] 

A Kelday commented on COMPRESS-514:
---

Hi,

I see there's a new GitHub comment about rebasing the current (very old) PR. 
I'll hopefully have some time to look at this again soon, but I no longer have 
the "test case" 7zip to work with. It contained 3rd party data which cannot be 
retained or shared, plus it was over 1TB.

I'll attempt to create a smaller test case 7zip which reproduces the original 
problem, but it could take some time since it probably requires at least 20 
million central directory paths.

> SevenZFile fails with encoded header over 2GiB
> --
>
> Key: COMPRESS-514
> URL: https://issues.apache.org/jira/browse/COMPRESS-514
> Project: Commons Compress
> Issue Type: Bug
> Components: Archivers
> Affects Versions: 1.20
> Reporter: A Kelday
> Priority: Minor
> Attachments: HeaderChannelBuffer.java
>
> Time Spent: 2h
> Remaining Estimate: 0h
>
> When reading what some may call a large encrypted 7zip file (1.2TB with 22 
> million files), the read fails at the header stage with the trace below. Is 
> this within the spec? I've written some code to handle it, because I did 
> actually need to extract the file in Java. If that's of any use I can provide 
> it (it's a naive wrapper that just pages in a buffer at a time).
>  
> {code:java}
> Exception in thread "main" java.io.IOException: Cannot handle unpackSize241696
>     at org.apache.commons.compress.archivers.sevenz.SevenZFile.assertFitsIntoInt(SevenZFile.java:1523)
>     at org.apache.commons.compress.archivers.sevenz.SevenZFile.readEncodedHeader(SevenZFile.java:622)
>     at org.apache.commons.compress.archivers.sevenz.SevenZFile.initializeArchive(SevenZFile.java:532)
>     at org.apache.commons.compress.archivers.sevenz.SevenZFile.readHeaders(SevenZFile.java:468)
>     at org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:337)
>     at org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:129)
>     at org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:116)
> {code}
> 7zip itself can also open it (and display/extract etc.), here are the stats:
>  
>  
> {code:java}
> Size: 2 489 903 580 875
> Packed Size: 1 349 110 308 832
> Folders: 40 005
> Files: 22 073 957
> CRC: E26F6A96
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (COMPRESS-514) SevenZFile fails with encoded header over 2GiB

2020-06-01 Thread A Kelday (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17121361#comment-17121361
 ] 

A Kelday commented on COMPRESS-514:
---

Thanks [~bodewig] and [~peterlee] for the input. Happy to work on it more at 
some point if you choose an option (I'll keep thinking about it anyway and do 
some more checks).

Peter explained my concern exactly: given corrupt data, in most cases we can 
expect some exception other than the CRC failure to happen _before_ the end of 
the stream is ever reached (because we aren't just transferring data, we're 
branching based on it). That's really the best case; the worse outcome is that 
a garbage filename list gets created. What I'm very conscious of is not making 
the code for the common use case worse.



[jira] [Commented] (COMPRESS-514) SevenZFile fails with encoded header over 2GiB

2020-05-29 Thread Peter Lee (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120104#comment-17120104
 ] 

Peter Lee commented on COMPRESS-514:


> _There are a few places where the 7z format says a certain value is a UINT64 
> and we store it inside of a Java long at best. Even if we fix this particular 
> case in some way there will be more problems lurking (that I hope we all catch 
> before they cause ArrayIndexOutOfBoundsExceptions or similar things). Because 
> of this I'd be fine with listing the known limitations._

I think you are talking about the _assertFitsIntoInt_ in SevenZFile (needed 
because we are using arrays and Java limits the length of an array).

That's a complicated problem and I will try to find a solution.

 

>  _As long as we detect a bad CRC inside of SevenZFile's constructor, your 
>option 3 sounds reasonable._

 

+1 for this. We are doing a similar thing in _CRC32VerifyingInputStream_. I 
think [~akelday] is worried that the result of the CRC check can only be known 
once all the data in _HeaderChannelBuffer_ has been consumed, which means we 
have already done a lot of work on the corrupted data. But it seems we do not 
have other options if we are handling a giant amount of data.

 

And for this particular issue, I'm not sure whether we should merge PR #98 or 
not: for an encoded header the PR is OK, because it's hard to imagine that the 
header for the encoded header (header of header LOL :)) is bigger than 16MB, 
but it may cause some problems for a normal (not encoded) header, because we 
can no longer obtain the CRC if its size is more than 16MB.

I'm not sure if this is a good idea or not: we could pass the expected CRC to 
HeaderChannelBuffer's constructor and throw an exception when the data in 
HeaderChannelBuffer is exhausted and the CRC does not match, acting similarly 
to _CRC32VerifyingInputStream_.
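
The idea in the last paragraph could look roughly like the sketch below. It is 
only an illustration under stated assumptions: CrcCheckingHeaderStream is a 
hypothetical class, not the HeaderChannelBuffer from the PR and not the 
existing CRC32VerifyingInputStream. It takes the expected CRC and the header 
size in its constructor, updates a CRC32 as bytes are read, and throws as soon 
as the last header byte has been consumed and the checksum does not match.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

// Hypothetical sketch: verifies a CRC32 once the declared number of bytes has been read.
class CrcCheckingHeaderStream extends FilterInputStream {
    private final CRC32 crc = new CRC32();
    private final long expectedCrc;
    private long remaining;

    CrcCheckingHeaderStream(InputStream in, long size, long expectedCrc) {
        super(in);
        this.remaining = size;
        this.expectedCrc = expectedCrc;
    }

    @Override
    public int read() throws IOException {
        if (remaining == 0) {
            return -1;
        }
        int b = in.read();
        if (b < 0) {
            throw new IOException("Header truncated");
        }
        crc.update(b);
        if (--remaining == 0) {
            verify();
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (remaining == 0) {
            return -1;
        }
        int n = in.read(buf, off, (int) Math.min((long) len, remaining));
        if (n < 0) {
            throw new IOException("Header truncated");
        }
        crc.update(buf, off, n);
        remaining -= n;
        if (remaining == 0) {
            verify();
        }
        return n;
    }

    // Called exactly when the last declared header byte has been consumed.
    private void verify() throws IOException {
        if (crc.getValue() != expectedCrc) {
            throw new IOException("Header CRC mismatch");
        }
    }
}
```

Note the trade-off discussed above still applies: a parser reading through 
such a stream acts on the data before the mismatch is reported. The sketch 
also ignores skip() and mark/reset, which a real implementation would need to 
handle.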



[jira] [Commented] (COMPRESS-514) SevenZFile fails with encoded header over 2GiB

2020-05-29 Thread Peter Lee (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120095#comment-17120095
 ] 

Peter Lee commented on COMPRESS-514:


> _There are a few places where the 7z format says a certain value is a UINT64 
> and we store it inside of a Java long at best. Even if we fix this particular 
> case in some way there will be more problems lurking (that I hope we all catch 
> before they cause ArrayIndexOutOfBoundsExceptions or similar things). Because 
> of this I'd be fine with listing the known limitations._



[jira] [Commented] (COMPRESS-514) SevenZFile fails with encoded header over 2GiB

2020-05-23 Thread Stefan Bodewig (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114860#comment-17114860
 ] 

Stefan Bodewig commented on COMPRESS-514:
-

There are a few places where the 7z format says a certain value is a UINT64 and 
we store it inside of a Java long at best. Even if we fix this particular case 
in some way there will be more problems lurking (that I hope we all catch 
before they cause ArrayIndexOutOfBoundsExceptions or similar things). Because 
of this I'd be fine with listing the known limitations.
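
To make the UINT64 point concrete: Java has no unsigned 64-bit type, so a 
UINT64 above Long.MAX_VALUE shows up as a negative long. The following is a 
minimal sketch of a range check that treats the long as unsigned before 
narrowing it to an int. The class and method names are hypothetical, and this 
is not the actual assertFitsIntoInt from SevenZFile.

```java
import java.io.IOException;

// Hypothetical helper: narrows a value read as UINT64 (stored in a long) to an int.
class Uint64Narrowing {
    static int toIntChecked(String what, long uint64) throws IOException {
        // A negative long is a UINT64 above Long.MAX_VALUE, so it cannot fit either.
        if (uint64 < 0 || uint64 > Integer.MAX_VALUE) {
            throw new IOException("Cannot handle " + what + " "
                    + Long.toUnsignedString(uint64));
        }
        return (int) uint64;
    }
}
```

Printing the value with Long.toUnsignedString keeps the error message 
meaningful even when the stored long is negative.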

If we decide to deal with this specific problem then I'd prefer a solution that 
doesn't penalize the normal case too much and doesn't hide problems we can find 
early with the existing code. As long as we detect a bad CRC inside of 
SevenZFile's constructor, your option 3 sounds reasonable. Nobody would see and 
use the "bad" data, or am I overlooking something?



[jira] [Commented] (COMPRESS-514) SevenZFile fails with encoded header over 2GiB

2020-05-21 Thread A Kelday (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113466#comment-17113466
 ] 

A Kelday commented on COMPRESS-514:
---

After digging in a bit more this takes me back to the same CRC problem as 
before, but with some new info after looking at the 7zip source.

It looks like 7zip does nearly the same as the current Commons Compress: it 
reads the whole header buffer into RAM and checks the CRC before parsing. The 
difference is that the size there is an unsigned int, so the maximum is 4GiB 
(above that is unsupported). Indeed 7zip uses over 5GiB of RAM simply to show 
the file list of this 1.2TB archive.

That leads to at least three options:
 # 7zip method: read all into ram (with multiple buffers up to 4G) for CRC and 
parse
 # Read the header twice if necessary: once streamed for CRC, the next using a 
small buffer to parse. If the header fits in our small buffer entirely no extra 
read is required.
 # Read the header and compute CRC at the same time (bad because you don't find 
out the data is wrong until it's too late)
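
For illustration, the first pass of option 2 could be sketched as below for a 
header stored directly in the file (an encoded header would have to be decoded 
a second time instead). HeaderCrcPass and crcOfRegion are hypothetical names, 
not part of Commons Compress. The method streams a region of a 
SeekableByteChannel through a small buffer, computes its CRC32, and rewinds the 
channel to the start of the region so that parsing can begin from there.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.util.zip.CRC32;

// Hypothetical sketch of the "CRC pass" in option 2.
class HeaderCrcPass {
    static long crcOfRegion(SeekableByteChannel ch, long offset, long size, int bufSize)
            throws IOException {
        CRC32 crc = new CRC32();
        ByteBuffer buf = ByteBuffer.allocate(bufSize);
        ch.position(offset);
        long remaining = size;
        while (remaining > 0) {
            buf.clear();
            // Never read past the end of the header region.
            buf.limit((int) Math.min((long) buf.capacity(), remaining));
            int n = ch.read(buf);
            if (n < 0) {
                throw new EOFException("Header truncated");
            }
            buf.flip();
            crc.update(buf);
            remaining -= n;
        }
        ch.position(offset); // rewind so the parsing pass starts at the header
        return crc.getValue();
    }
}
```

Memory use stays at bufSize regardless of the header size; the cost is reading 
the header region twice.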

It would be great to have some opinion here, because this is more than I'd 
hoped it would require to fix. There's always the choice to just not support 
over 2G...



[jira] [Commented] (COMPRESS-514) SevenZFile fails with encoded header over 2GiB

2020-05-18 Thread A Kelday (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110652#comment-17110652
 ] 

A Kelday commented on COMPRESS-514:
---

Hi [~peterlee] ,

As mentioned in the PR, the resource close is not handled correctly for encoded 
headers, so it's definitely not fit to merge (sorry about that). The main 
reason is that `readEncodedHeader` now returns with an open input stream.

A quick fix would be to make `HeaderBuffer` implement `Closeable`, but we would 
need to ensure `close()` is called only when an _encoded_ header was read 
(otherwise the underlying file channel would be closed!). I'll go with that 
plan if nothing else springs to mind.
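
One way to express "close only what was opened for the encoded header" is an 
ownership flag, sketched below. OwnedSource is a hypothetical stand-in and not 
the HeaderBuffer from the PR; it only shows the pattern of a wrapper whose 
close() is a no-op when the wrapped resource belongs to someone else (here, the 
archive's file channel).

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical sketch: closes the wrapped resource only if this wrapper owns it.
final class OwnedSource implements Closeable {
    private final Closeable delegate;
    private final boolean owned;

    private OwnedSource(Closeable delegate, boolean owned) {
        this.delegate = delegate;
        this.owned = owned;
    }

    // For a stream created just to decode an encoded header.
    static OwnedSource owning(Closeable delegate) {
        return new OwnedSource(delegate, true);
    }

    // For the shared file channel, which must stay open after header parsing.
    static OwnedSource borrowing(Closeable delegate) {
        return new OwnedSource(delegate, false);
    }

    @Override
    public void close() throws IOException {
        if (owned) {
            delegate.close();
        }
    }
}
```

With this shape the caller can always use try-with-resources and does not need 
to know which kind of header was read.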

If you have better ideas I'd be glad to hear them.

 



[jira] [Commented] (COMPRESS-514) SevenZFile fails with encoded header over 2GiB

2020-05-13 Thread A Kelday (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17106270#comment-17106270
 ] 

A Kelday commented on COMPRESS-514:
---

[~ggregory] I think the current PR fixes this issue without any new problems.

It sidesteps the problem of the end header being fully in memory for the CRC 
check (that's what current master branch does anyway), but should make it 
easier to tackle that later. I think that ought to be a separate issue. I might 
have time to work on that myself at some point if nobody else does.

[~peterlee] thanks very much for your PR comments so far, it's been most 
helpful!



[jira] [Commented] (COMPRESS-514) SevenZFile fails with encoded header over 2GiB

2020-05-11 Thread Peter Lee (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17104997#comment-17104997
 ] 

Peter Lee commented on COMPRESS-514:


> JDK 14 doesn't appear to build, the rest do.

Don't worry, that's not caused by your change. The JDK 14 build failure is 
caused by something else.



[jira] [Commented] (COMPRESS-514) SevenZFile fails with encoded header over 2GiB

2020-05-11 Thread A Kelday (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17104944#comment-17104944
 ] 

A Kelday commented on COMPRESS-514:
---

[~ggregory] for what it's worth, that's the PR in.



[jira] [Commented] (COMPRESS-514) SevenZFile fails with encoded header over 2GiB

2020-05-11 Thread Gary D. Gregory (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17104920#comment-17104920
 ] 

Gary D. Gregory commented on COMPRESS-514:
--

A PR is best since it will be tested by the CI system.

 



[jira] [Commented] (COMPRESS-514) SevenZFile fails with encoded header over 2GiB

2020-05-11 Thread A Kelday (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17104904#comment-17104904
 ] 

A Kelday commented on COMPRESS-514:
---

Hi [~ggregory] ,

Attached now is the main class I patched in (in place of direct ByteBuffer 
usage). There's obviously more to it so a diff or PR would make more sense, but 
I expect you folks will have a nicer solution anyway!
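
For readers without the attachment, the general "page in a buffer at a time" 
idea can be sketched as follows. This is not the attached 
HeaderChannelBuffer.java; PagedHeaderReader and its methods are hypothetical 
names. It reads little-endian values (the byte order 7z uses) from a channel 
through one small reusable ByteBuffer, refilling it on demand instead of 
loading the whole header into a single ByteBuffer.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.ReadableByteChannel;

// Hypothetical sketch of a paging reader; memory use is bounded by pageSize.
class PagedHeaderReader {
    private final ReadableByteChannel channel;
    private final ByteBuffer page;

    PagedHeaderReader(ReadableByteChannel channel, int pageSize) {
        if (pageSize < 8) {
            throw new IllegalArgumentException("pageSize must hold at least one long");
        }
        this.channel = channel;
        this.page = ByteBuffer.allocate(pageSize).order(ByteOrder.LITTLE_ENDIAN);
        this.page.limit(0); // start with an empty page
    }

    int readUnsignedByte() throws IOException {
        ensure(1);
        return page.get() & 0xFF;
    }

    long readLong() throws IOException {
        ensure(8);
        return page.getLong();
    }

    // Makes sure at least n unread bytes are available, paging in more if needed.
    private void ensure(int n) throws IOException {
        if (page.remaining() >= n) {
            return;
        }
        page.compact(); // keep unread bytes, switch the buffer to write mode
        while (page.position() < n) {
            if (channel.read(page) < 0) {
                throw new EOFException("Header truncated");
            }
        }
        page.flip(); // back to read mode
    }
}
```

A reader like this only moves forward, so any code that relies on random 
access into the header buffer would need a different approach.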



[jira] [Commented] (COMPRESS-514) SevenZFile fails with encoded header over 2GiB

2020-05-11 Thread Gary D. Gregory (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17104879#comment-17104879
 ] 

Gary D. Gregory commented on COMPRESS-514:
--

Why don't we change our code to handle larger entries and also add some 
configurable limits to avoid zip bombs?
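
A minimal sketch of what such a configurable limit might look like. 
`HeaderLimit` and its `check` method are hypothetical names used only for 
illustration; they are not part of Commons Compress. The assumption is that the 
caller chooses the largest header it is willing to decode, and the reader 
rejects anything larger before allocating memory for it.

```java
import java.io.IOException;

// Hypothetical helper, not an existing Commons Compress class.
final class HeaderLimit {
    private final long maxHeaderBytes;

    HeaderLimit(long maxHeaderBytes) {
        if (maxHeaderBytes < 0) {
            throw new IllegalArgumentException("limit must not be negative");
        }
        this.maxHeaderBytes = maxHeaderBytes;
    }

    // Rejects a declared header size before any buffer is allocated for it.
    void check(long declaredUnpackSize) throws IOException {
        if (declaredUnpackSize < 0) {
            throw new IOException("Negative header size: " + declaredUnpackSize);
        }
        if (declaredUnpackSize > maxHeaderBytes) {
            throw new IOException("Header size " + declaredUnpackSize
                    + " exceeds configured limit " + maxHeaderBytes);
        }
    }
}
```

With a limit of 4 GiB the header from this issue (2,416,988,886 bytes) would be 
accepted; with a limit of 1 GiB it would be rejected up front.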






[jira] [Commented] (COMPRESS-514) SevenZFile fails with encoded header over 2GiB

2020-05-11 Thread A Kelday (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17104320#comment-17104320
 ] 

A Kelday commented on COMPRESS-514:
---

Hi Peter,

There's no problem for me: as I said above, I patched Commons Compress to make 
it work. My question is whether you would expect this to be possible or not 
(e.g. maybe it's been decided not to support it).

If you would prefer to support it, then I have code to do so, which I'm happy 
to share :)

There are, however, some questions regarding CRC checks if we follow that path...






[jira] [Commented] (COMPRESS-514) SevenZFile fails with encoded header over 2GiB

2020-05-11 Thread Peter Lee (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17104168#comment-17104168
 ] 

Peter Lee commented on COMPRESS-514:


Unfortunately, your 7zip file is too big for Commons Compress to handle.

The maximum unpack size of the encoded header that Commons Compress can handle 
is 2,147,483,647 bytes (2 GiB minus one byte). In your case the unpack size is 
2,416,988,886 bytes, which exceeds that limit.

Commons Compress tries to read the entire unpacked encoded header into a single 
byte array, and a Java array is indexed by int, so it cannot hold 2 GiB or more.
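
To illustrate the limit, here is a minimal sketch of the kind of check that the 
`assertFitsIntoInt` frame in the stack trace suggests. The class and method 
below are hypothetical and written only for this example; they are not the 
actual Commons Compress source, and the exception message format is an 
assumption.

```java
import java.io.IOException;

// Hypothetical illustration, not the Commons Compress implementation.
final class HeaderSizeCheck {

    // Returns the value as an int if it can be used as a Java array length,
    // otherwise fails the way the reader does in this issue.
    static int requireFitsIntoInt(String what, long value) throws IOException {
        if (value < 0 || value > Integer.MAX_VALUE) {
            throw new IOException("Cannot handle " + what + " " + value);
        }
        return (int) value;
    }

    public static void main(String[] args) {
        long unpackSize = 2_416_988_886L; // unpack size reported in this issue
        try {
            // A small header fits and can back a byte array.
            byte[] small = new byte[requireFitsIntoInt("unpackSize", 1024L)];
            System.out.println("allocated " + small.length + " bytes");
            // The header from this issue is larger than Integer.MAX_VALUE.
            requireFitsIntoInt("unpackSize", unpackSize);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because the array length must be an int, supporting larger headers would mean 
reading the header through something other than a single byte array.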

It actually surprises me a little that there are 7zip archives with headers 
larger than 2 GiB. In most cases the headers are very small, since they hold 
only the metadata describing the compressed entries, not the compressed file 
data itself.

I don't have a good suggestion. Maybe you could divide the large 7z archive 
into two smaller ones? (I don't mean creating a split 7z archive; that would 
not help.)



