On Fri, 15 Apr 2016, Thaddaeus Fillmore - US wrote:
Thanks for the reply! I actually got it to work using ExtractorFactory, though (I had a typo in the path to the JAR files). Is Tika just for Office documents, or can it also read other formats? Ideally I'd like something that could process plain text, Word documents, PDFs, and images, but right now I'm able to handle all of those formats using a variety of means.
Apache Tika can probably get text out of your kitchen sink! Especially if it's Panamanian... ;-)
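For the mix of formats described above (plain text, Word, PDF, images), a minimal sketch using Tika's `org.apache.tika.Tika` facade might look like the following. It assumes Tika 1.x with the full parser set on the classpath (for example the `tika-app` or `tika-parsers` JARs); `tika-core` alone can detect types but won't extract Office or PDF text. The names `TikaExtract`, `mediaType` and `extractText` are hypothetical. Note that `parseToString` truncates its output to a default maximum length, and images generally yield only metadata unless Tesseract OCR is installed for Tika to call.

```java
import java.io.File;
import java.io.IOException;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

// Hypothetical helper class; assumes Tika 1.x plus its parsers are on the classpath.
public class TikaExtract {
    // One shared facade instance; it auto-detects the type and picks a parser.
    private static final Tika TIKA = new Tika();

    // Returns the detected media type, e.g. "application/pdf".
    public static String mediaType(File file) throws IOException {
        return TIKA.detect(file);
    }

    // Returns the extracted plain text (truncated at Tika's default max length).
    public static String extractText(File file) throws IOException, TikaException {
        return TIKA.parseToString(file);
    }

    public static void main(String[] args) throws Exception {
        for (String path : args) {
            File f = new File(path);
            System.out.println(path + " (" + mediaType(f) + ")");
            System.out.println(extractText(f));
        }
    }
}
```

Run it over a few files, e.g. `java -cp tika-app-1.12.jar:. TikaExtract report.docx scan.pdf notes.txt` (file names illustrative). The same two calls handle every format Tika supports, which is what would let you drop the "variety of means".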
Nick

Current formats = http://tika.apache.org/1.12/formats.html
Tika's use on the Panama Papers = https://source.opennews.org/en-US/articles/people-and-tech-behind-panama-papers/
