On Fri, 3 Jun 2022, Cihad Guzel wrote:
> I want to pass the content's words through some filters while parsing in
> Tika. How can I add custom filtering?
> Does the content handler work for this? Is there a document about this?

A custom content handler is a pretty good way to do that. Tika just uses
standard Java SAX content handlers (org.xml.sax.ContentHandler), so you
don't need a Tika-specific tutorial on writing one.
Depending on what you want to do, you can use Tika's TeeContentHandler to
send the SAX events to both your custom handler and a normal one.
Alternatively, ContentHandlerDecorator wraps an existing handler and lets
you override only the callbacks you care about.
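
To make the decorator idea concrete, here is a minimal sketch in plain SAX, with no Tika dependency at all. It uses the JDK's XMLFilterImpl as a stand-in for the wrapping role that ContentHandlerDecorator plays in Tika. The class names (WordFilterDemo, StopWordFilter, TextCollector), the stop-word list and the sample markup are all made up for illustration, so treat it as a starting point rather than a recommended implementation.

```java
import java.io.StringReader;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class WordFilterDemo {

    /** Hypothetical filter: drops listed words from text events, forwards all other events. */
    static class StopWordFilter extends XMLFilterImpl {
        private final Set<String> stopWords;

        StopWordFilter(Set<String> stopWords, ContentHandler downstream) {
            this.stopWords = stopWords;
            setContentHandler(downstream); // XMLFilterImpl forwards events to this handler
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            // Caveat: SAX may deliver one text run in several calls, so a word can be
            // cut in two here. A more careful handler would buffer before splitting.
            StringBuilder kept = new StringBuilder();
            for (String word : new String(ch, start, length).split("\\s+")) {
                if (!word.isEmpty() && !stopWords.contains(word.toLowerCase(Locale.ROOT))) {
                    kept.append(word).append(' ');
                }
            }
            char[] out = kept.toString().toCharArray();
            super.characters(out, 0, out.length); // pass the filtered text downstream
        }
    }

    /** Plain downstream handler that just collects whatever text reaches it. */
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Parses the given XML and returns its text with the stop words removed. */
    static String filter(String xml, Set<String> stopWords) throws Exception {
        TextCollector collector = new TextCollector();
        StopWordFilter filter = new StopWordFilter(stopWords, collector);
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(filter);
        reader.parse(new InputSource(new StringReader(xml)));
        return collector.text.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<html><body><p>the quick brown fox</p>"
                + "<p>jumps over the lazy dog</p></body></html>";
        System.out.println(filter(xml, Set.of("the", "over")));
    }
}
```

With Tika on the classpath, the same characters() override should fit in a subclass of ContentHandlerDecorator wrapped around something like a BodyContentHandler, and a TeeContentHandler could take both the filtering handler and an unfiltered one if you want the two outputs side by side. I haven't tested that wiring here, so check it against the Tika javadocs.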
Nick