There seems to be a bug in the current Solr 1.4.1 release: you cannot extract any content at all, regardless of content type.
Try getting a fresh version from the SVN repository. I did that earlier today and can verify that Tika will now extract the content. I'm not sure about zip files, though.
Tika version 0.8 is not included in the latest release/trunk from SVN.

Erlend

On 25.01.11 11.19, Gary Taylor wrote:
Hi,

I posted a question in November last year about indexing content from multiple binary files into a single Solr document, and Jayendra responded with a simple solution: zip them up and send that single file to Solr.

I understand that the Tika 0.4 JARs supplied with Solr 1.4.1 don't currently allow this to work, and only the file names of the zipped files are indexed (not their contents).

I've tried downloading and building the latest Tika (0.8) and replacing the tika-parsers and tika-core JARs in <solr-root>\contrib\extraction\lib, but this still isn't indexing the file contents, and now doesn't even index the file names!

Is there a version of Tika that works with the Solr 1.4.1 released distribution and does index the contents of the zipped files?

Thanks and kind regards,
Gary
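
For reference, the zip-and-send approach described above can be sketched roughly as follows. This is only an illustration, not a tested setup from this thread: the helper names (build_zip, extract_url, post_zip), the base URL and the document id are made up for the example, and it assumes the extraction handler is mapped at /update/extract and accepts a raw request body with an application/zip content type. Whether the contents of the zipped files actually get indexed still depends on the Tika version Solr is running with.

```python
import io
import urllib.parse
import urllib.request
import zipfile


def build_zip(files):
    """Pack a mapping of {archive name: bytes} into one in-memory zip."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def extract_url(base_url, doc_id):
    """Build the extraction URL (handler path /update/extract is assumed)."""
    params = urllib.parse.urlencode({"literal.id": doc_id, "commit": "true"})
    return f"{base_url}/update/extract?{params}"


def post_zip(base_url, doc_id, zip_bytes):
    """POST the zip as the raw request body; needs a running Solr instance."""
    req = urllib.request.Request(
        extract_url(base_url, doc_id),
        data=zip_bytes,
        headers={"Content-Type": "application/zip"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

With a local instance running, something like post_zip("http://localhost:8983/solr", "doc1", build_zip({"a.txt": b"hello"})) would send the archive as a single document with id doc1.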
-- 
Erlend Garåsen
Center for Information Technology Services
University of Oslo
P.O. Box 1086 Blindern, N-0317 OSLO, Norway
Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050