I'm incorporating Tika into a Joomla indexing component I am building, and I would like to let users select a local Tika path to speed up attachment parsing.

I'm running the latest version of Apache Solr (3.1.0), which I believe bundles Tika 0.8. My initial test script sends a file to Solr for extraction, and Solr returns various information about the document: name, size, creator, title, description, etc.
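
For reference, an extract-only request of this kind can be sketched as below. This is a minimal sketch and not the actual test script: the Solr base URL, the /update/extract handler path (as mapped in the stock example configuration) and the form field name "file" are all assumptions.

```shell
#!/bin/sh
# Assumed values: adjust to the local Solr install and the document under test.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"
DOC="${DOC:-/var/www/test.odt}"

# extractOnly=true asks Solr to return the extracted text and metadata
# without indexing the document.
EXTRACT_URL="$SOLR_URL/update/extract?extractOnly=true"

# Send the request only if the document exists, so the sketch is safe to run anywhere.
if [ -f "$DOC" ]; then
  curl -s "$EXTRACT_URL" -F "file=@$DOC"
fi
```

The response should contain the extracted body plus a metadata section, which is where fields such as creator and title appear in my Solr test.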

However, when I run the tika-app jar against the same document, I only get Content-Length, Content-Type and resourceName; I am unable to retrieve creator, title, description, creation date, etc.

I run Tika from the command line using:

java -jar tika-app-0.9.jar /var/www/test.odt

Should tika-app.jar provide me with more information, or do I need to develop against the API to get it?

Cheers


Hayden
