[jira] [Commented] (COMPRESS-542) Corrupt 7z allocates huge amount of SevenZEntries

2021-07-12 Thread Stefan Bodewig (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17379573#comment-17379573
 ] 

Stefan Bodewig commented on COMPRESS-542:
-

This issue has been assigned the name 
[CVE-2021-35516|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-35516]
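
For users who open potentially untrusted archives, a minimal usage sketch of 
capping the memory {{SevenZFile}} may use, via {{SevenZFileOptions}} and its 
{{withMaxMemoryLimitInKb}} builder option (assuming, as in 1.21, that the limit 
is also honoured while the archive metadata is parsed):

{code:java}
import java.io.File;
import java.io.IOException;

import org.apache.commons.compress.archivers.sevenz.SevenZArchiveEntry;
import org.apache.commons.compress.archivers.sevenz.SevenZFile;
import org.apache.commons.compress.archivers.sevenz.SevenZFileOptions;

public class LimitedSevenZOpen {
    public static void main(String[] args) {
        // Cap the memory SevenZFile may allocate: 1 GiB, expressed in KiB.
        SevenZFileOptions options = SevenZFileOptions.builder()
                .withMaxMemoryLimitInKb(1024 * 1024)
                .build();
        try (SevenZFile archive = new SevenZFile(new File(args[0]), options)) {
            SevenZArchiveEntry entry;
            while ((entry = archive.getNextEntry()) != null) {
                System.out.println(entry.getName());
            }
        } catch (IOException ex) {
            // Corrupt archives, or ones that would exceed the limit, are
            // rejected here instead of exhausting the heap.
            System.err.println("Rejected: " + ex.getMessage());
        }
    }
}
{code}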

> Corrupt 7z allocates huge amount of SevenZEntries
> -
>
> Key: COMPRESS-542
> URL: https://issues.apache.org/jira/browse/COMPRESS-542
> Project: Commons Compress
>  Issue Type: Bug
>Affects Versions: 1.20
>Reporter: Robin Schimpf
>Priority: Major
> Fix For: 1.21
>
> Attachments: 
> Reduced_memory_allocation_for_corrupted_7z_archives.patch, 
> endheadercorrupted.7z, endheadercorrupted2.7z
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> We ran into a problem where a 1.43 GB corrupt 7z file tried to allocate about 
> 138 million SevenZArchiveEntries, which would use about 12 GB of memory. Sadly 
> I'm unable to share the file. If you have enough memory available, the 
> following exception is thrown.
> {code:java}
> java.io.IOException: Start header corrupt and unable to guess end Header
>   at org.apache.commons.compress.archivers.sevenz.SevenZFile.tryToLocateEndHeader(SevenZFile.java:511)
>   at org.apache.commons.compress.archivers.sevenz.SevenZFile.readHeaders(SevenZFile.java:470)
>   at org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:336)
>   at org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:128)
>   at org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:369)
> {code}
> 7z itself aborts very quickly when I try to list the contents of the file.
> {code:java}
> 7z l "corrupt.7z"
> 7-Zip 18.01 (x64) : Copyright (c) 1999-2018 Igor Pavlov : 2018-01-28
> Scanning the drive for archives:
> 1 file, 1537752212 bytes (1467 MiB)
> Listing archive: corrupt.7z
> ERROR: corrupt.7z : corrupt.7z
> Open ERROR: Can not open the file as [7z] archive
> ERRORS:
> Is not archive
> Errors: 1
> {code}
> I hacked together the attached patch, which reduces the memory allocation 
> to about 1 GB. So lazy instantiation of the entries could be a good solution 
> to the problem. Optimal would be to create the entries only once the headers 
> have been parsed correctly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (COMPRESS-542) Corrupt 7z allocates huge amount of SevenZEntries

2021-06-27 Thread Stefan Bodewig (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370314#comment-17370314
 ] 

Stefan Bodewig commented on COMPRESS-542:
-

OK, my benchmarks at 
https://github.com/bodewig/commons-compress-benchmarks/wiki/COMPRESS-542 show 
there is a certain performance hit when there are very many entries. Then again, 
I believe this is acceptable, as we need to do the checks anyway and it is 
better to perform them before we allocate too much memory.
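
As a rough illustration of the kind of measurement involved (a plain 
{{System.nanoTime}} sketch rather than the JMH setup behind the linked numbers; 
the archive path is a placeholder):

{code:java}
import java.io.File;
import java.io.IOException;

import org.apache.commons.compress.archivers.sevenz.SevenZFile;

public class OpenTimingSketch {
    public static void main(String[] args) throws IOException {
        File archive = new File(args[0]); // ideally an archive with very many entries
        // Warm up: the 7z headers are parsed in the SevenZFile constructor.
        for (int i = 0; i < 3; i++) {
            new SevenZFile(archive).close();
        }
        long start = System.nanoTime();
        try (SevenZFile f = new SevenZFile(archive)) {
            // Header parsing (the part affected by the sanity checks) has
            // already happened by the time the constructor returns.
        }
        System.out.println("open took " + (System.nanoTime() - start) / 1_000_000 + " ms");
    }
}
{code}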




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (COMPRESS-542) Corrupt 7z allocates huge amount of SevenZEntries

2021-05-17 Thread Robin Schimpf (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17346406#comment-17346406
 ] 

Robin Schimpf commented on COMPRESS-542:


Didn't have the time for a close look, but just running the test and seeing it 
go down to ~100 ms is impressive. Great work!




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (COMPRESS-542) Corrupt 7z allocates huge amount of SevenZEntries

2021-05-16 Thread Stefan Bodewig (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17345655#comment-17345655
 ] 

Stefan Bodewig commented on COMPRESS-542:
-

I've added a few more counts during the sanity-check phase and the result is 
convincing IMHO. The unit test {{testNoOOMOnCorruptedHeader}} added earlier 
initially took something like 15s on my machine and now finishes in less than 
half a second (with almost no memory being consumed at all).

I'll run a few benchmarks to see whether - and if so, how much - the additional 
parser run hurts for valid archives, to get some data on whether to enable 
sanity checking for every archive (as it is right now) or only for archives 
suspected to be broken.
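
For illustration, a hedged sketch of what a test along those lines might look 
like (the actual {{testNoOOMOnCorruptedHeader}} in the project may differ; the 
resource path here is hypothetical, and the file name matches the attachment on 
this issue):

{code:java}
import static org.junit.Assert.fail;

import java.io.File;
import java.io.IOException;

import org.apache.commons.compress.archivers.sevenz.SevenZFile;
import org.junit.Test;

public class CorruptedHeaderTest {

    @Test
    public void noOOMOnCorruptedHeader() {
        // Opening the corrupted archive should fail fast with an IOException
        // instead of allocating millions of entries and exhausting the heap.
        try (SevenZFile f = new SevenZFile(new File("src/test/resources/endheadercorrupted.7z"))) {
            fail("Expected an IOException for the corrupted header");
        } catch (IOException expected) {
            // expected: the header cannot be parsed
        }
    }
}
{code}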




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (COMPRESS-542) Corrupt 7z allocates huge amount of SevenZEntries

2021-05-13 Thread Stefan Bodewig (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344081#comment-17344081
 ] 

Stefan Bodewig commented on COMPRESS-542:
-

Commit 26924e96 contains an extended sanity check which gets away with a lot 
less memory allocation (basically a smallish constant amount plus a few bytes - 
up to two longs plus a bit - per claimed folder inside the archive, currently). 
It could still check a few more things and I'll work on that. I've enabled the 
check unconditionally but will revisit that once done, as it seems it would 
slow down normal operation - we'll have to see how significant the effect is, 
and how much it is reduced by removing the checks that are now performed a 
second time when actually filling the metadata structures.
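
To illustrate the kind of bookkeeping described above (purely a hypothetical 
sketch, not the code of commit 26924e96): the check only keeps a couple of 
primitive values per claimed folder instead of building full metadata objects.

{code:java}
import java.io.IOException;

/** Hypothetical sketch: cheap per-folder statistics gathered while sanity checking. */
final class FolderStats {
    // Two longs per claimed folder instead of a full Folder/SevenZArchiveEntry object.
    private final long[] unpackedSizes;
    private final long[] numUnpackStreams;

    FolderStats(final long claimedNumberOfFolders) throws IOException {
        if (claimedNumberOfFolders < 0 || claimedNumberOfFolders > Integer.MAX_VALUE) {
            throw new IOException("Corrupt header: implausible folder count " + claimedNumberOfFolders);
        }
        this.unpackedSizes = new long[(int) claimedNumberOfFolders];
        this.numUnpackStreams = new long[(int) claimedNumberOfFolders];
    }

    void record(final int folderIndex, final long unpackedSize, final long streams) {
        unpackedSizes[folderIndex] = unpackedSize;
        numUnpackStreams[folderIndex] = streams;
    }
}
{code}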




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (COMPRESS-542) Corrupt 7z allocates huge amount of SevenZEntries

2021-05-02 Thread Stefan Bodewig (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337996#comment-17337996
 ] 

Stefan Bodewig commented on COMPRESS-542:
-

As [~akelday] points out in the GitHub PR, one problem is that {{SevenZFile}} 
tries to recover from a corrupted archive and in doing so gives up an important 
CRC check. Of course, an OOM could also happen on a legitimate archive that is 
really big. For the latter case we could reuse the memory limit of 
{{SevenZFileOptions}} that we currently only use when decoding streams.

One idea I'm currently pondering: if either a limit has been set or we know the 
archive is corrupt, we parse the metadata in two passes. The first pass would 
only collect enough information to guess the amount of memory we'd need to 
allocate and at the same time sanity-check values - array sizes must be 
positive, positions of other "places" must lie inside the bounds of the 
archive, and so on.

This would probably slow down opening an archive in certain cases, but it would 
allow us to reject corrupted archives without running out of memory - and it 
would allow users to commit to a certain amount of memory. Of course we'd only 
be guessing the amount of memory we require, but that sounds better than not 
providing any control at all.
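
As a rough illustration of what such a first pass could check before anything 
is allocated (a hypothetical sketch, not the actual implementation):

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;

final class HeaderSanityCheck {

    /**
     * A claimed element count must be non-negative and must not exceed what
     * the remaining header bytes could possibly describe (at least one byte
     * per element is assumed here).
     */
    static int checkedCount(final ByteBuffer header, final long claimedCount) throws IOException {
        if (claimedCount < 0 || claimedCount > Integer.MAX_VALUE) {
            throw new IOException("Corrupt header: implausible count " + claimedCount);
        }
        if (claimedCount > header.remaining()) {
            throw new IOException("Corrupt header: " + claimedCount
                    + " elements claimed but only " + header.remaining() + " bytes left");
        }
        return (int) claimedCount;
    }

    /** A position referenced from the header must lie inside the archive. */
    static long checkedPosition(final long position, final long archiveSize) throws IOException {
        if (position < 0 || position > archiveSize) {
            throw new IOException("Corrupt header: position " + position
                    + " outside of archive of size " + archiveSize);
        }
        return position;
    }
}
{code}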




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (COMPRESS-542) Corrupt 7z allocates huge amount of SevenZEntries

2020-08-06 Thread A Kelday (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17172118#comment-17172118
 ] 

A Kelday commented on COMPRESS-542:
---

Here are a couple of test files which I think will reproduce the issue:

[^endheadercorrupted.7z]

[^endheadercorrupted2.7z]
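
A minimal sketch of how the attachments can be used to reproduce the report 
(run against Commons Compress 1.20; the file is assumed to be in the working 
directory):

{code:java}
import java.io.File;
import java.io.IOException;

import org.apache.commons.compress.archivers.sevenz.SevenZFile;

public class Reproduce542 {
    public static void main(String[] args) {
        // Expected to fail with an IOException such as
        // "Start header corrupt and unable to guess end Header",
        // potentially after allocating a large number of entries.
        try (SevenZFile f = new SevenZFile(new File("endheadercorrupted.7z"))) {
            System.out.println("Archive opened unexpectedly");
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }
}
{code}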




--
This message was sent by Atlassian Jira
(v8.3.4#803005)