[ 
https://issues.apache.org/jira/browse/TIKA-346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting updated TIKA-346:
-------------------------------

    Attachment: TIKA-346.patch

The attached patch fixes this problem, building on the recent Commons Compress 
changes related to COMPRESS-93. We can apply the patch once Commons Compress 
1.1 is available.

> ZipParser throws "invalid compression method" error for some archives
> ---------------------------------------------------------------------
>
>                 Key: TIKA-346
>                 URL: https://issues.apache.org/jira/browse/TIKA-346
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.5
>         Environment: Windows XP, JVM 1.6.16
>            Reporter: Robert Trickey
>         Attachments: moby.zip, TIKA-346.patch
>
>
> This could be a bug in the underlying apache-commons code. When trying to 
> parse the attached file to extract text content, an error is thrown with the 
> following stacktrace:
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pkg.zippar...@1b963c4
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:122)
>       at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
>       at my.code.wherever.....
> Caused by: java.lang.IllegalArgumentException: invalid compression method
>       at java.util.zip.ZipEntry.setMethod(ZipEntry.java:209)
>       at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.getNextZipEntry(ZipArchiveInputStream.java:146)
>       at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.getNextEntry(ZipArchiveInputStream.java:188)
>       at org.apache.tika.parser.pkg.PackageParser.parseArchive(PackageParser.java:66)
>       at org.apache.tika.parser.pkg.ZipParser.parse(ZipParser.java:49)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
>       ... 25 more
> I have extracted the contents of the zip and run the autodetect parser 
> against all content files without problems, so the problem is definitely 
> the zip archive itself.
> The attached zip is from Project Gutenberg and hence public domain.
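
The innermost frame of the stack trace above is java.util.zip.ZipEntry.setMethod, which only accepts the STORED and DEFLATED methods. The following is a minimal sketch of that behaviour using only the JDK; the class name CompressionMethodCheck and the helper trySetMethod are hypothetical and are not part of Tika or Commons Compress. Method code 6 ("imploded" in the ZIP format specification) is used purely as an illustration; which method the entries in moby.zip actually use is not stated in this report.

```java
import java.util.zip.ZipEntry;

public class CompressionMethodCheck {

    // Hypothetical helper: tries to set the given compression method on a
    // fresh ZipEntry and reports either "ok" or the exception message.
    static String trySetMethod(int method) {
        ZipEntry entry = new ZipEntry("example.txt");
        try {
            entry.setMethod(method);
            return "ok";
        } catch (IllegalArgumentException e) {
            // Same exception type as the "Caused by" line in the report.
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // The two methods java.util.zip.ZipEntry accepts.
        System.out.println("STORED:   " + trySetMethod(ZipEntry.STORED));
        System.out.println("DEFLATED: " + trySetMethod(ZipEntry.DEFLATED));
        // Any other method code is rejected; 6 is an illustrative choice.
        System.out.println("method 6: " + trySetMethod(6));
    }
}
```

Under this reading, any archive entry whose header carries a method code other than those two would trigger the exception as soon as the code is passed to setMethod, before any entry data is read.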

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
