On Thu, 7 Aug 2014, Bratislav Stojanovic wrote:
> Hmm, I apologize, but I'm afraid this does not work. If you specify:
> java -jar tika-app-1.5-SNAPSHOT.jar --text --metadata --extract
> --extract-dir=out example.doc
> ...it will only extract attachments, not everything (text + meta +
> attachments). Any flags I'm missing?

With the Tika App, you'll need to run it three times: once for text, once
for metadata, and once for embedded resource extraction.
If you want to do all three in one go, you'll need to write a few lines of
Java.
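
For illustration, a rough sketch of what those few lines might look like,
assuming the Tika parsers jar is on the classpath. The class name ExtractAll,
the helper safeName, and the default "out" directory are made up for this
example and are not part of Tika. Note that because the custom extractor
below only saves each embedded resource to disk, the text of the embedded
documents is not folded into the main text output.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

import org.apache.tika.extractor.EmbeddedDocumentExtractor;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.ContentHandler;

// Hypothetical example class: text + metadata + embedded resources in one parse.
public class ExtractAll {

    // Metadata key Tika uses for a resource's name
    // (exposed as Metadata.RESOURCE_NAME_KEY in Tika 1.x).
    static final String RESOURCE_NAME = "resourceName";

    // Hypothetical helper: strip any directory part from an embedded
    // resource name, falling back to a counter-based name if nothing usable is left.
    static String safeName(String name, int index) {
        String fallback = "embedded-" + index;
        if (name == null || name.trim().isEmpty()) {
            return fallback;
        }
        String base = new File(name.replace('\\', '/')).getName();
        if (base.isEmpty() || base.equals(".") || base.equals("..")) {
            return fallback;
        }
        return base;
    }

    public static void main(String[] args) throws Exception {
        File input = new File(args[0]);
        final File outDir = new File(args.length > 1 ? args[1] : "out");
        Files.createDirectories(outDir.toPath());

        Parser parser = new AutoDetectParser();
        BodyContentHandler text = new BodyContentHandler(-1); // -1 = no write limit
        Metadata metadata = new Metadata();
        ParseContext context = new ParseContext();

        // Save each embedded resource to outDir instead of parsing it inline.
        context.set(EmbeddedDocumentExtractor.class, new EmbeddedDocumentExtractor() {
            private int count = 0;

            public boolean shouldParseEmbedded(Metadata m) {
                return true;
            }

            public void parseEmbedded(InputStream stream, ContentHandler handler,
                                      Metadata m, boolean outputHtml) throws IOException {
                String name = safeName(m.get(RESOURCE_NAME), count++);
                // Files.copy leaves the stream open; Tika owns and closes it.
                Files.copy(stream, new File(outDir, name).toPath(),
                        StandardCopyOption.REPLACE_EXISTING);
            }
        });

        try (InputStream in = new FileInputStream(input)) {
            parser.parse(in, text, metadata, context);
        }

        // Metadata first, then the extracted text.
        for (String key : metadata.names()) {
            System.out.println(key + ": " + metadata.get(key));
        }
        System.out.println(text.toString());
    }
}
```

Run it with something like "java -cp tika-app-1.5-SNAPSHOT.jar:. ExtractAll
example.doc out" (the classpath here is a guess for your setup). Embedded
resources that share a name will overwrite each other in this sketch.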
Nick