On Wed, 20 Nov 2013, Kudrettin Güleryüz wrote:
As a first step I'd like to disable indexing of non-text documents. Can Tika help with this? Does Tika provide a method like isBinary()?

Could you perhaps run detection as normal, then apply some logic to the detected media type, such as "type == text, or (type of any parent == text)"?

(Use the media type registry, MediaTypeRegistry, to look up the parent (supertype) of any given type, and repeat that to walk up the tree.)
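Here is a rough sketch of that walk in Java. It does not depend on the Tika jar, so it runs on its own. The parent lookup is passed in as a function. With Tika you would instead pass a lambda that calls MediaTypeRegistry.getSupertype(...) on a registry such as MediaTypeRegistry.getDefaultRegistry(), converting between String and MediaType. You would then feed in the string returned by Tika.detect(...). The class name, the TOY_PARENTS table and the toySupertype helper are hypothetical. The table is a hand-written illustration, not Tika's actual type hierarchy.

```java
import java.util.Map;
import java.util.function.UnaryOperator;

class TextTypeFilter {

    /**
     * Returns true if the media type, or any ancestor reached by repeatedly
     * applying supertypeOf, has the top-level type "text".
     * supertypeOf must return null at the root of the tree (Tika's
     * MediaTypeRegistry.getSupertype returns null for application/octet-stream).
     */
    static boolean isTextual(String mediaType, UnaryOperator<String> supertypeOf) {
        for (String t = mediaType; t != null; t = supertypeOf.apply(t)) {
            if (t.startsWith("text/")) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical stand-in for the registry's parent links (illustration only).
    static final Map<String, String> TOY_PARENTS = Map.of(
        "application/xhtml+xml", "application/xml",
        "application/xml", "text/plain",
        "image/svg+xml", "application/xml",
        "image/png", "application/octet-stream",
        "application/pdf", "application/octet-stream"
    );

    // Hypothetical lookup: returns null when no parent is listed, which ends the walk.
    static String toySupertype(String type) {
        return TOY_PARENTS.get(type);
    }

    public static void main(String[] args) {
        String[] detected = {
            "text/html", "application/xhtml+xml", "image/svg+xml", "image/png", "application/pdf"
        };
        for (String t : detected) {
            boolean index = isTextual(t, TextTypeFilter::toySupertype);
            System.out.println(t + " -> " + (index ? "index" : "skip"));
        }
    }
}
```

Running main prints "index" for the first three types and "skip" for image/png and application/pdf. Because the lookup is passed in as a function, the walk itself does not depend on where the hierarchy comes from.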

Nick
