ZipParser throws "invalid compression method" error for some archives
---------------------------------------------------------------------

                 Key: TIKA-346
                 URL: https://issues.apache.org/jira/browse/TIKA-346
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 0.5
         Environment: Windows XP, JVM 1.6.16
            Reporter: Robert Trickey
         Attachments: moby.zip

This may be a bug in the underlying Apache Commons Compress code. Parsing the 
attached file to extract its text content fails with the following stack 
trace:

org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pkg.zippar...@1b963c4
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:122)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
        at my.code.wherever.....
Caused by: java.lang.IllegalArgumentException: invalid compression method
        at java.util.zip.ZipEntry.setMethod(ZipEntry.java:209)
        at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.getNextZipEntry(ZipArchiveInputStream.java:146)
        at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.getNextEntry(ZipArchiveInputStream.java:188)
        at org.apache.tika.parser.pkg.PackageParser.parseArchive(PackageParser.java:66)
        at org.apache.tika.parser.pkg.ZipParser.parse(ZipParser.java:49)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
        ... 25 more
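
The innermost frame is java.util.zip.ZipEntry.setMethod, which accepts only 
ZipEntry.STORED (0) and ZipEntry.DEFLATED (8) and rejects every other method 
code. The sketch below reproduces that behaviour using the JDK alone. The 
class name ZipMethodCheck, the helper trySetMethod, and the method code 6 are 
chosen for illustration only; the report does not say which compression 
method moby.zip actually uses.

```java
import java.util.zip.ZipEntry;

public class ZipMethodCheck {

    // Hypothetical helper: returns "accepted" if ZipEntry takes the method
    // code, otherwise the message of the IllegalArgumentException it throws.
    static String trySetMethod(int method) {
        ZipEntry entry = new ZipEntry("example.txt");
        try {
            entry.setMethod(method);
            return "accepted";
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // The two method codes that java.util.zip.ZipEntry supports.
        System.out.println("STORED (0):   " + trySetMethod(ZipEntry.STORED));
        System.out.println("DEFLATED (8): " + trySetMethod(ZipEntry.DEFLATED));
        // An arbitrary other code, picked only to trigger the rejection.
        System.out.println("method 6:     " + trySetMethod(6));
    }
}
```

If the archive contains an entry whose header carries any other method code, 
a reader that copies that code into a java.util.zip.ZipEntry would fail in 
this way before any entry content is read.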

I extracted the contents of the zip and ran the AutoDetectParser against every 
extracted file without problems, so the failure lies in reading the zip 
archive itself, not in the files it contains.
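
One way to narrow this down further is to look at the compression method 
recorded in the archive's first local file header: in the ZIP format, that 
header starts with the signature bytes 'P' 'K' 3 4, and the method is a 
little-endian 16-bit value at byte offset 8. The following is a minimal 
sketch under those assumptions, not a full ZIP reader. ZipHeaderPeek, 
firstEntryMethod and sampleZip are hypothetical names, the sample archive is 
built in memory so the example is self-contained, and only the first entry 
is inspected.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipHeaderPeek {

    // Reads the compression method from the first local file header.
    // Layout: signature (4 bytes), version needed (2), flags (2), method (2).
    static int firstEntryMethod(byte[] zip) {
        if (zip.length < 10
                || zip[0] != 'P' || zip[1] != 'K' || zip[2] != 3 || zip[3] != 4) {
            throw new IllegalArgumentException("no local file header at offset 0");
        }
        // The method field is stored little-endian at offset 8.
        return (zip[8] & 0xFF) | ((zip[9] & 0xFF) << 8);
    }

    // Builds a one-entry archive in memory; ZipOutputStream deflates by default.
    static byte[] sampleZip() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(buf)) {
            out.putNextEntry(new ZipEntry("hello.txt"));
            out.write("hello".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("compression method: " + firstEntryMethod(sampleZip()));
    }
}
```

To check a real archive, the first ten bytes of the file could be passed to 
firstEntryMethod instead of the in-memory sample. A value other than 0 or 8 
would be consistent with the exception in the stack trace.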

The attached zip is from Project Gutenberg and hence public domain.

