[ https://issues.apache.org/jira/browse/TIKA-346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jukka Zitting updated TIKA-346:
-------------------------------

    Attachment: TIKA-346.patch

The attached patch fixes this problem after recent Commons Compress changes related to COMPRESS-93. We can apply the patch once Commons Compress 1.1 is available.

> ZipParser throws "invalid compression method" error for some archives
> ---------------------------------------------------------------------
>
>                 Key: TIKA-346
>                 URL: https://issues.apache.org/jira/browse/TIKA-346
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.5
>         Environment: Windows XP, JVM 1.6.16
>            Reporter: Robert Trickey
>         Attachments: moby.zip, TIKA-346.patch
>
>
> This could be a bug in the underlying apache-commons code. When trying to
> parse the attached file to extract text content, an error is thrown with the
> following stacktrace:
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pkg.zippar...@1b963c4
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:122)
>       at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
>       at my.code.wherever.....
> Caused by: java.lang.IllegalArgumentException: invalid compression method
>       at java.util.zip.ZipEntry.setMethod(ZipEntry.java:209)
>       at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.getNextZipEntry(ZipArchiveInputStream.java:146)
>       at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.getNextEntry(ZipArchiveInputStream.java:188)
>       at org.apache.tika.parser.pkg.PackageParser.parseArchive(PackageParser.java:66)
>       at org.apache.tika.parser.pkg.ZipParser.parse(ZipParser.java:49)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
>       ... 25 more
> I have extracted the content of the zip and ran the autodetect parser against
> all content files without problems, so it is definitely the zip that is the
> problem.
> The attached zip is from Project Gutenberg and hence public domain.
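
For context, the innermost frame of the quoted trace is java.util.zip.ZipEntry.setMethod, which accepts only the STORED and DEFLATED method codes and throws IllegalArgumentException for anything else. The following is a minimal sketch of that behaviour using only the JDK. The class and helper names are hypothetical, and the method code 6 is an arbitrary example: the report does not say which compression method moby.zip actually uses, and this is not the code from TIKA-346.patch.

```java
import java.util.zip.ZipEntry;

// Hypothetical helper for illustration; not part of Tika or Commons Compress.
final class CompressionMethodCheck {

    // Returns true if java.util.zip.ZipEntry.setMethod accepts the given
    // method code, false if it rejects it with IllegalArgumentException.
    static boolean isAcceptedByZipEntry(int method) {
        try {
            new ZipEntry("probe").setMethod(method);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 6 is an arbitrary non-STORED, non-DEFLATED code (assumption for the demo).
        int[] methods = { ZipEntry.STORED, ZipEntry.DEFLATED, 6 };
        for (int m : methods) {
            System.out.println("method " + m + " accepted: " + isAcceptedByZipEntry(m));
        }
    }
}
```

Any code that passes an archive's raw method field straight to setMethod will therefore fail on such entries before their contents are ever read, which is consistent with the reporter's observation that the extracted files themselves parse fine.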
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.