I'm incorporating Tika into a Joomla indexing component I am building
and would like to allow users to select a local Tika path to speed up
attachment parsing.
I'm running the latest version of Apache Solr (3.1.0), which I believe
bundles Tika 0.8. My initial test script sends a file to Solr for
extraction, and Solr returns various metadata about the document:
name, size, creator, title, description, etc.
However, when I run the tika-app jar against the same document I only get
Content-Length, Content-Type and resourceName, and am unable to retrieve
creator, title, description, creation date, etc.
I run Tika from the command line using:

    java -jar tika-app-0.9.jar /var/www/test.odt

Should tika-app.jar provide me with more information or do I need to
develop against the API to get more information?
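
To frame the question, here is a rough sketch of the kind of wrapper I have in mind for calling a user-selected local jar. It is only an illustration: the class and method names (TikaCliMetadata, read, parse) are made up, and it assumes the tika-app build accepts a --metadata option and prints one "name: value" pair per line, which I have not confirmed for 0.9.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Tika: shells out to tika-app and
// collects whatever metadata lines it prints.
class TikaCliMetadata {

    // Parse "name: value" lines into an ordered map.
    // Lines without a ": " separator are skipped; a repeated name keeps the last value.
    static Map<String, String> parse(String output) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : output.split("\\r?\\n")) {
            int sep = line.indexOf(": ");
            if (sep > 0) {
                fields.put(line.substring(0, sep).trim(),
                           line.substring(sep + 2).trim());
            }
        }
        return fields;
    }

    // Run the jar against one document and return the parsed metadata.
    // Assumption: the --metadata flag is supported by the jar at jarPath.
    static Map<String, String> read(String jarPath, String documentPath)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(
                "java", "-jar", jarPath, "--metadata", documentPath)
                // Pass stderr through so warnings are not parsed as fields.
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        StringBuilder output = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
        }
        process.waitFor();
        return parse(output.toString());
    }
}
```

With something like this, TikaCliMetadata.read("tika-app-0.9.jar", "/var/www/test.odt") would hand back whatever fields the jar reports, so it would only help if the command-line tool can actually emit creator, title and the rest.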
Cheers
Hayden