[jira] [Commented] (COMPRESS-555) ZipArchiveInputStream should allow stored entries with data descriptor by default
[ https://issues.apache.org/jira/browse/COMPRESS-555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17194396#comment-17194396 ]

Trevor Bentley commented on COMPRESS-555:
-----------------------------------------

I agree, [~ggregory].

[~bodewig] - I appreciate the additional info on the STORED entries. After digging deeper into Tika, this seems like something that could be handled on the Tika end. When the UnsupportedZipException is thrown because of the data descriptor, we could try to read the zip again using a ZipArchiveInputStream with allowStoredEntriesWithDataDescriptor enabled.

I created a new ticket for this: https://issues.apache.org/jira/browse/TIKA-3196

I will close this issue, since this is the wrong route to take to solve the problem.

> ZipArchiveInputStream should allow stored entries with data descriptor by default
> ---------------------------------------------------------------------------------
>
>                 Key: COMPRESS-555
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-555
>             Project: Commons Compress
>          Issue Type: Improvement
>          Components: Archivers
>    Affects Versions: 1.20
>            Reporter: Trevor Bentley
>            Priority: Major
>             Fix For: 1.21
>
> We are currently using tika for text extraction which uses commons-compress for handling zips. Currently some sites are returning zips that have entries with stored data descriptors which fail to extract due to the ZipArchiveInputStream defaulting to false for 'allowStoredEntriesWithDataDescriptor'.
> Allowing the reading of stored entries on Zip archives should be enabled by default.
> PR: https://github.com/apache/commons-compress/pull/137

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
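The fallback described in the comment above could look roughly like the following sketch. It uses only the JDK so that it stays self-contained: ZipReader, readWithFallback and the two reader arguments are hypothetical names, and java.util.zip.ZipException merely stands in for the unsupported-feature failure. In Tika, both readers would presumably wrap a ZipArchiveInputStream, the lenient one with allowStoredEntriesWithDataDescriptor enabled. The point it illustrates is that the input has to be buffered (or otherwise re-readable) so that the second attempt can start again from the first byte.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Collections;
import java.util.List;
import java.util.zip.ZipException;

final class ZipRetrySketch {

    /** Hypothetical stand-in for code that walks a ZIP stream and returns entry names. */
    interface ZipReader {
        List<String> read(InputStream in) throws IOException;
    }

    /**
     * Tries the strict reader first. If it fails with a ZipException (assumed here to
     * signal the unsupported STORED-with-data-descriptor case), the lenient reader is
     * given a fresh stream over the same buffered bytes.
     */
    static List<String> readWithFallback(byte[] zipBytes, ZipReader strict, ZipReader lenient)
            throws IOException {
        try {
            return strict.read(new ByteArrayInputStream(zipBytes));
        } catch (ZipException e) {
            // The first stream may be partially consumed, so start over from byte 0.
            return lenient.read(new ByteArrayInputStream(zipBytes));
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] placeholderBytes = {0x50, 0x4B, 0x03, 0x04}; // not a real archive
        ZipReader strict = in -> {
            throw new ZipException("STORED entry with data descriptor");
        };
        ZipReader lenient = in -> Collections.singletonList("entry read by lenient reader");
        System.out.println(readWithFallback(placeholderBytes, strict, lenient));
    }
}
```

Buffering the whole archive in memory is only reasonable for small inputs. A real implementation would more likely spool to a temporary file, at which point a seekable reader also becomes an option.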
[jira] [Commented] (COMPRESS-555) ZipArchiveInputStream should allow stored entries with data descriptor by default
[ https://issues.apache.org/jira/browse/COMPRESS-555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17194276#comment-17194276 ]

Gary D. Gregory commented on COMPRESS-555:
------------------------------------------

The comments here and [https://github.com/apache/commons-compress/pull/137#issuecomment-690835644] make me think we should capture this information in the class-level Javadoc.
[jira] [Commented] (COMPRESS-555) ZipArchiveInputStream should allow stored entries with data descriptor by default
[ https://issues.apache.org/jira/browse/COMPRESS-555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17194010#comment-17194010 ]

Stefan Bodewig commented on COMPRESS-555:
-----------------------------------------

Unfortunately, trying to read STORED entries that use a data descriptor is unreliable, to say the least. It is very easy to do if you can read the central directory at the end of the archive, and thus ZipFile handles them just fine, but reading the archive as a stream is a very different issue.

The default right now will tell you "I don't think I can handle this entry" if you use the {{canReadEntryData}} method. The non-default option will read forward until it finds something that looks like the signature of the next ZIP entry. This will completely break down if the STORED entry itself contains such a sequence of bytes. ZIPs in ZIPs are the primary example of this (think of WARs containing JARs).

In recent versions we try to verify that the claimed size we read from what we believe to be the data descriptor matches the length we have read, but then you are faced with an IOException for reading an entry that the stream claimed to be able to handle.

Personally, I believe the option will lead to too much confusion to enable it by default. I prefer to have users make the deliberate choice and realize what they are signing up for. Even better, they would find a way to make the initial stream seekable and use ZipFile rather than ZipArchiveInputStream.
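The "ZIPs in ZIPs" failure mode can be illustrated with a small JDK-only sketch. This is a simplified stand-in for the forward scan, not the Commons Compress implementation: SignatureScanSketch, indexOfSignature and buildZip are hypothetical names, and the scan looks only for the local file header signature (bytes 50 4B 03 04). The inner archive plays the role of the raw data of a STORED entry in an outer archive.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class SignatureScanSketch {

    // Local file header signature "PK\3\4" in on-disk byte order.
    private static final byte[] LFH = {0x50, 0x4B, 0x03, 0x04};

    /** Returns the index of the first local file header signature at or after from, or -1. */
    static int indexOfSignature(byte[] data, int from) {
        for (int i = Math.max(from, 0); i + LFH.length <= data.length; i++) {
            if (data[i] == LFH[0] && data[i + 1] == LFH[1]
                    && data[i + 2] == LFH[2] && data[i + 3] == LFH[3]) {
                return i;
            }
        }
        return -1;
    }

    /** Builds a one-entry ZIP in memory; it stands in for a JAR nested inside a WAR. */
    static byte[] buildZip(String name, byte[] content) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry(name));
            zos.write(content);
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] innerJar = buildZip("A.class", "payload".getBytes(StandardCharsets.UTF_8));
        // Treat innerJar as the data of a STORED entry whose size is unknown up front.
        int hit = indexOfSignature(innerJar, 0);
        System.out.println("entry data length: " + innerJar.length
                + ", first signature inside the data at offset: " + hit);
    }
}
```

Because the nested archive begins with its own local file header, a scan that stops at the first signature would end the outer entry before any of its data has been read. A reader with access to the central directory does not have to guess, since the entry size is recorded there.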