ZipParser throws "invalid compression method" error for some archives ---------------------------------------------------------------------
Key: TIKA-346 URL: https://issues.apache.org/jira/browse/TIKA-346 Project: Tika Issue Type: Bug Components: parser Affects Versions: 0.5 Environment: Windows XP, JVM 1.6.16 Reporter: Robert Trickey Attachments: moby.zip This could be a bug in the underlying apache-commons code. When trying to parse the attached file to extract text content, an error is thrown with the following stacktrace: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pkg.zippar...@1b963c4 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:122) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101) at my.code.wherever..... Caused by: java.lang.IllegalArgumentException: invalid compression method at java.util.zip.ZipEntry.setMethod(ZipEntry.java:209) at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.getNextZipEntry(ZipArchiveInputStream.java:146) at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.getNextEntry(ZipArchiveInputStream.java:188) at org.apache.tika.parser.pkg.PackageParser.parseArchive(PackageParser.java:66) at org.apache.tika.parser.pkg.ZipParser.parse(ZipParser.java:49) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120) ... 25 more I have extracted the content of the zip and ran the autodetect parser against all content files without problems, so it is definitely the zip that is the problem. The attached zip is from Project Gutenberg and hence public domain. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.