On Fri, 3 Jun 2022, Cihad Guzel wrote:
I want to pass the words in the content through some filters while Tika is parsing it. How can I add custom filtering?

Would a content handler work for this? Is there any documentation about it?

A custom content handler is a pretty good way to do that. Tika just uses regular Java XML (SAX) content handlers, so you don't need a Tika-specific tutorial on writing one.
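Because the handler is an ordinary SAX `ContentHandler`, a word filter can be written against the JDK alone. The sketch below is one possible approach, not an official Tika recipe: it buffers text from `characters()`, splits it into words at element boundaries, and drops a small stop-word list. The class name, the stop words and the lowercasing are all hypothetical choices for illustration. To keep it runnable without Tika, it is driven by the JDK's `SAXParser` on a small XHTML string; with Tika you would pass the same handler to a parser's `parse(stream, handler, metadata, context)` call instead.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WordFilterHandler extends DefaultHandler {
    // Hypothetical filter: these words are dropped, everything else is lowercased.
    private static final Set<String> STOP_WORDS = Set.of("the", "a", "an", "of");

    private final StringBuilder pending = new StringBuilder();
    private final List<String> words = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver one text run in several chunks, so buffer it first.
        pending.append(ch, start, length);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flush(); // treat element boundaries as word boundaries
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush();
    }

    @Override
    public void endDocument() {
        flush();
    }

    // Split the buffered text into words and keep those that pass the filter.
    private void flush() {
        for (String token : pending.toString().split("\\W+")) {
            String word = token.toLowerCase(Locale.ROOT);
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                words.add(word);
            }
        }
        pending.setLength(0);
    }

    public List<String> getWords() {
        return words;
    }

    // Runs the handler over an XHTML string with the JDK's built-in SAX parser.
    public static List<String> filterWords(String xhtml) {
        WordFilterHandler handler = new WordFilterHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse input", e);
        }
        return handler.getWords();
    }

    public static void main(String[] args) {
        System.out.println(filterWords(
                "<html><body><p>The Quick Fox</p><p>a lazy dog</p></body></html>"));
    }
}
```

Running `main` should print the filtered words `[quick, fox, lazy, dog]`. With Tika in place of the JDK parser, the equivalent call would be something like `new AutoDetectParser().parse(stream, handler, new Metadata(), new ParseContext())`.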

Depending on what you want to do, you can use Tika's TeeContentHandler to send the events to both your custom handler and a normal one. ContentHandlerDecorator can also be used to override just some of the events while passing the rest through unchanged.
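A minimal sketch of both classes together, assuming tika-core is on the classpath. `UpperCaseHandler` is a hypothetical stand-in for a real filter: it extends `ContentHandlerDecorator` and overrides only `characters()`, so every other SAX event is forwarded to the wrapped handler untouched. `TeeContentHandler` then sends the same events both to a plain `ToTextContentHandler` and to the filtered one. The events are fired by hand to keep the example short; with a real document you would pass the tee to the parser's `parse` method instead.

```java
import java.util.List;
import java.util.Locale;
import org.apache.tika.sax.ContentHandlerDecorator;
import org.apache.tika.sax.TeeContentHandler;
import org.apache.tika.sax.ToTextContentHandler;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

public class DecoratorTeeDemo {

    // Hypothetical filter: overrides only characters(); the decorator
    // forwards all other events to the downstream handler as-is.
    static class UpperCaseHandler extends ContentHandlerDecorator {
        UpperCaseHandler(ContentHandler downstream) {
            super(downstream);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            // Upper-casing is safe per chunk, even if SAX splits a word across calls.
            char[] out = new String(ch, start, length).toUpperCase(Locale.ROOT).toCharArray();
            super.characters(out, 0, out.length);
        }
    }

    // Returns [unfiltered text, filtered text] for the given input.
    public static List<String> run(String text) {
        ToTextContentHandler raw = new ToTextContentHandler();
        ToTextContentHandler filtered = new ToTextContentHandler();
        // One event stream, two consumers: the plain handler and the filtered one.
        ContentHandler tee = new TeeContentHandler(raw, new UpperCaseHandler(filtered));
        char[] chars = text.toCharArray();
        try {
            // Simulated parser events; a Tika Parser would emit these itself.
            tee.startDocument();
            tee.characters(chars, 0, chars.length);
            tee.endDocument();
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return List.of(raw.toString(), filtered.toString());
    }

    public static void main(String[] args) {
        System.out.println(run("Hello World"));
    }
}
```

Running `main` should print `[Hello World, HELLO WORLD]`: the unfiltered copy and the filtered copy come from a single pass over the content.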

Nick
