On Tue, 5 Mar 2013, CL wrote:
Thanks for your feedback. I may go that route if I have to, but I'm not finding any good converters. I was hoping to avoid writing my own, which is why I'm trying Tika. Do you know if there's a relatively simple way to extend a Tika class to filter out hidden content?

There are several examples in Apache POI, and the code behind Tika is open source. Skipping certain slides should be fairly easy; other things will depend on how the "hiding" ends up written into the file.
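
As a rough sketch of the slide-skipping idea only: the code below is not POI or Tika API. The SlideView type, its isHidden() flag, and the extractVisible() helper are all hypothetical stand-ins, assuming the parser can tell you per slide whether it is hidden. A real filter would sit wherever the parser walks the slides.

```java
import java.util.ArrayList;
import java.util.List;

public class VisibleSlideText {

    /** Hypothetical stand-in for a parsed slide; not a POI or Tika type. */
    public static final class SlideView {
        private final String text;
        private final boolean hidden;

        public SlideView(String text, boolean hidden) {
            this.text = text;
            this.hidden = hidden;
        }

        public String getText() { return text; }

        public boolean isHidden() { return hidden; }
    }

    /** Collects the text of slides that are not flagged hidden, in order. */
    public static List<String> extractVisible(List<SlideView> slides) {
        List<String> out = new ArrayList<>();
        for (SlideView slide : slides) {
            if (slide.isHidden()) {
                continue; // skip hidden slides entirely
            }
            out.add(slide.getText());
        }
        return out;
    }
}
```

Anything hidden at a finer level than a whole slide (a shape, a text run) would need its own check, which is the part that depends on the file format.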

Seems like that should be the default anyway, doesn't it?

A lot of people use Tika to feed indexing systems, and want all the text they can get!

Nick
