[jira] [Commented] (COMPRESS-555) ZipArchiveInputStream should allow stored entries with data descriptor by default

2020-09-11 Thread Trevor Bentley (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17194396#comment-17194396
 ] 

Trevor Bentley commented on COMPRESS-555:
-

I agree [~ggregory].

[~bodewig] - I appreciate the additional info on STORED entries. After digging 
deeper into Tika, this seems like something that could be handled on the Tika 
end. When an UnsupportedZipFeatureException is thrown because of the data 
descriptor, we could retry reading the zip using a ZipArchiveInputStream with 
allowStoredEntriesWithDataDescriptor enabled. 
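
A rough sketch of that retry, assuming the archive bytes are already buffered 
in memory so a second pass is possible. The class and method names 
(StoredEntryFallbackSketch, readAll, readWithFallback) are hypothetical and 
this is not Tika's actual code; only the Commons Compress types and the 
four-argument ZipArchiveInputStream constructor come from the library.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.compress.archivers.ArchiveEntry;
import org.apache.commons.compress.archivers.zip.UnsupportedZipFeatureException;
import org.apache.commons.compress.archivers.zip.ZipArchiveInputStream;

// Hypothetical helper, for illustration only.
class StoredEntryFallbackSketch {

    /** Reads every entry to its end and returns the entry names in archive order. */
    static List<String> readAll(byte[] zipBytes, boolean allowStoredWithDataDescriptor)
            throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipArchiveInputStream zin = new ZipArchiveInputStream(
                new ByteArrayInputStream(zipBytes), "UTF-8", true,
                allowStoredWithDataDescriptor)) {
            byte[] buffer = new byte[8192];
            ArchiveEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                while (zin.read(buffer) != -1) {
                    // A real caller would hand these bytes to its parser.
                }
                names.add(entry.getName());
            }
        }
        return names;
    }

    /** First pass with the default; second pass only if a data descriptor was the problem. */
    static List<String> readWithFallback(byte[] zipBytes) throws IOException {
        try {
            return readAll(zipBytes, false);
        } catch (UnsupportedZipFeatureException e) {
            if (e.getFeature() != UnsupportedZipFeatureException.Feature.DATA_DESCRIPTOR) {
                throw e; // some other unsupported feature; retrying would not help
            }
            // Deliberate opt-in: this pass may still fail with an IOException.
            return readAll(zipBytes, true);
        }
    }
}
```

The second pass starts again from the first entry, so anything collected 
during the failed first pass has to be discarded by the caller.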

I created a new ticket for this: https://issues.apache.org/jira/browse/TIKA-3196

I will close this issue, since changing the default is the wrong way to solve 
the problem.

> ZipArchiveInputStream should allow stored entries with data descriptor by 
> default
> -
>
> Key: COMPRESS-555
> URL: https://issues.apache.org/jira/browse/COMPRESS-555
> Project: Commons Compress
> Issue Type: Improvement
> Components: Archivers
> Affects Versions: 1.20
> Reporter: Trevor Bentley
> Priority: Major
> Fix For: 1.21
>
>
> We are currently using Tika for text extraction, which uses commons-compress 
> for handling zips. Some sites are returning zips that contain stored entries 
> with data descriptors, which fail to extract because 
> ZipArchiveInputStream defaults 'allowStoredEntriesWithDataDescriptor' 
> to false.
> Reading stored entries with data descriptors should be enabled by 
> default.
> PR: https://github.com/apache/commons-compress/pull/137



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (COMPRESS-555) ZipArchiveInputStream should allow stored entries with data descriptor by default

2020-09-11 Thread Gary D. Gregory (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17194276#comment-17194276
 ] 

Gary D. Gregory commented on COMPRESS-555:
--

The comments here and 
[https://github.com/apache/commons-compress/pull/137#issuecomment-690835644] 
make me think we should capture this information in the class-level Javadoc.



[jira] [Commented] (COMPRESS-555) ZipArchiveInputStream should allow stored entries with data descriptor by default

2020-09-10 Thread Stefan Bodewig (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17194010#comment-17194010
 ] 

Stefan Bodewig commented on COMPRESS-555:
-

Unfortunately, trying to read STORED entries that use a data descriptor is 
unreliable, to say the least. It is very easy to do if you can read the central 
directory at the end of the archive, which is why ZipFile handles them just 
fine, but reading the archive as a stream is a very different matter.

By default, the stream tells you "I don't think I can handle this entry" when 
you call the {{canReadEntryData}} method. The non-default option reads 
forward until it finds something that looks like the signature of the next ZIP 
entry. This breaks down completely if the STORED entry itself contains such a 
sequence of bytes - ZIPs in ZIPs are the primary example of this (think WARs 
containing JARs). In recent versions we try to verify that the claimed size 
read from what we believe to be the data descriptor matches the length we've 
actually read, but then you are faced with an IOException when reading an 
entry that the stream claimed to be able to handle.
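
To illustrate why scanning for the next signature is fragile, here is a 
minimal sketch that uses only java.util.zip. It is not the Commons Compress 
implementation: the names (SignatureScanSketch, indexOf, innerZip) are made 
up for illustration, and it searches only for the local file header signature 
PK\003\004.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, for illustration only.
class SignatureScanSketch {

    // Local file header signature "PK\003\004" that starts every ZIP entry.
    static final byte[] LOCAL_FILE_HEADER = {0x50, 0x4b, 0x03, 0x04};

    /** Returns the index of the first occurrence of sig in data at or after from, or -1. */
    static int indexOf(byte[] data, byte[] sig, int from) {
        outer:
        for (int i = Math.max(from, 0); i <= data.length - sig.length; i++) {
            for (int j = 0; j < sig.length; j++) {
                if (data[i + j] != sig[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    /** Builds a one-entry ZIP in memory, standing in for a JAR nested inside a WAR. */
    static byte[] innerZip() {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry("inner.txt"));
            zos.write("hello".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        // Treat the nested archive as the raw data of a STORED outer entry.
        byte[] payload = innerZip();
        int hit = indexOf(payload, LOCAL_FILE_HEADER, 0);
        System.out.println("payload length: " + payload.length
                + ", first signature at offset: " + hit);
    }
}
```

The nested archive begins with its own local file header, so the search hits 
at offset 0 of the payload. A naive scanner looking for the next entry would 
therefore stop long before the real end of the STORED data.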

Personally, I believe the option would cause too much confusion to enable it 
by default. I prefer to have users make a deliberate choice and realize what 
they are signing up for. Even better, they would find a way to make the initial 
stream seekable and use ZipFile rather than ZipArchiveInputStream.
