There seems to be a bug in the current Solr 1.4.1 release: you cannot extract any content at all, regardless of content type.

Try getting a fresh version from the SVN repository. I did that earlier today and can verify that Tika will now extract the content. I'm not sure about zip files, though.

Tika version 0.8 is not included in the latest release/trunk from SVN.

Erlend

On 25.01.11 11.19, Gary Taylor wrote:
Hi,

I posted a question in November last year about indexing content from
multiple binary files into a single Solr document and Jayendra responded
with a simple solution to zip them up and send that single file to Solr.
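
As a rough illustration of the zip approach mentioned above, here is a minimal Python sketch that packs several binary files into a single in-memory archive. The function name bundle_files and the sample file names are hypothetical and were not part of the original exchange; only the standard library is used.

```python
import io
import zipfile


def bundle_files(files):
    """Pack a mapping of {archive name: bytes} into one in-memory zip archive.

    Hypothetical helper for illustration: returns the raw bytes of the zip,
    which could then be sent to Solr as a single file.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            # Each entry becomes one file inside the archive.
            archive.writestr(name, data)
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up sample content standing in for the real binary files.
    payload = bundle_files({"report.txt": b"first file", "notes.txt": b"second file"})
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        print(archive.namelist())
```

The resulting bytes would then be posted to Solr's extracting request handler (typically mapped to /update/extract) with a literal.id parameter. Whether the contents of the zipped files are actually indexed depends on the Tika version in use, which is the question here.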

I understand that the Tika 0.4 JARs supplied with Solr 1.4.1 don't
currently allow this to work and only the file names of the zipped files
are indexed (and not their contents).

I've tried downloading and building the latest Tika (0.8) and replacing
the tika-parsers and tika-core JARs in
<solr-root>\contrib\extraction\lib but this still isn't indexing the
file contents, and now doesn't even index the file names!

Is there a version of Tika that works with the Solr 1.4.1 released
distribution which does index the contents of the zipped files?

Thanks and kind regards,
Gary



--
Erlend Garåsen
Center for Information Technology Services
University of Oslo
P.O. Box 1086 Blindern, N-0317 OSLO, Norway
Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
