Ok, so it looks like there are two issues here, am I understanding this correctly?
The first is a bug: the parser doesn't finish parsing the rest of the document if it encounters an encrypted zip file, and returns an empty result instead. The second is a feature request to make it possible to give the parser the password for decrypting the zip. This part is blocked by COMPRESS-88?

If the bug was fixed, I could deal with this in my Python code without waiting for the feature request to complete; the fix would only be needed until the feature request completes. Will the bug get fixed, or is this in limbo until the zip feature is complete?

-- Juha

On Nov 21, 2012, at 12:54 PM, Nick Burch <[email protected]> wrote:

> On Wed, 21 Nov 2012, Juha Haaga wrote:
>> Caused by:
>> org.apache.commons.compress.archivers.zip.UnsupportedZipFeatureException:
>> unsupported feature encryption used in entry …
>>
>> Is this error caused by lack of password or lack of zip decrypting
>> functionality? Is it possible to provide the zip file password to the
>> tika-server in the http headers?
>
> Some Tika parsers do support decrypting password protected files. This works
> by you supplying a PasswordProvider object on the ParseContext, which is used
> to get the decryption password during parsing.
>
> However, it looks like the zip parser doesn't do this. I'd suggest you open
> an enhancement request in JIRA for it. It looks like it'll need a bit of work
> on Tika to add the password fetching to the compress parser, and some work
> with Commons Compress to finish off COMPRESS-88 so that the underlying library
> supports it.
>
> It may take some time, but if it's important to you I'd suggest you give the
> commons team a hand!
>
> Nick
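
For reference, a rough sketch of the kind of Python-side handling I have in mind. It only covers the case where the zip itself is available as bytes on the client (not a zip embedded inside another document), and it relies on bit 0 of each entry's general purpose flag marking encryption. The function name encrypted_entries is made up for illustration and is not part of Tika or tika-server:

```python
import io
import zipfile

# Bit 0 of the zip "general purpose bit flag" marks an encrypted entry.
ENCRYPTED_FLAG = 0x1


def encrypted_entries(data: bytes) -> list[str]:
    """Return the names of entries in a zip archive that are flagged as encrypted.

    Only the central directory is read, so no password is needed.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return [
            info.filename
            for info in archive.infolist()
            if info.flag_bits & ENCRYPTED_FLAG
        ]
```

With something like this I could skip or flag encrypted archives before handing them to the parser, instead of getting an empty result back.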
