First draft of that page is up. Let me know if you have any questions.

On Fri, Jun 3, 2022 at 2:03 PM Tim Allison wrote:
> I just added the ability to wrap a content handler via tika-config.xml, and
> it will be out in 2.4.1 shortly. Let me document it on our wiki. I've
> started a stub here:
> https://cwiki.apache.org/confluence/display/TIKA/ModifyingContentWithHandlersAndMetadataFilters
>
> On Fri, Jun 3, 2022 at 1:41 PM Cihad Guzel wrote:
>
>> Hi Nick,
>>
>> Thanks for the information.
>>
>> If I use embedded Tika, I think I can set the custom content handler
>> using the API.
>>
>> On the other hand, if I use Tika Server, how can I set the custom content
>> handler on the server? Is there a way to do it from the config file?
>>
>> Regards,
>> Cihad Guzel
>>
>>
>> On Fri, 3 Jun 2022 at 19:09, Nick Burch wrote:
>>
>>> On Fri, 3 Jun 2022, Cihad Guzel wrote:
>>> > I want to pass the content's words through some filters while parsing
>>> > in Tika. How can I add custom filtering?
>>> >
>>> > Does the content handler work for this? Is there a document about this?
>>>
>>> A custom content handler is a pretty good way to do that. Tika just uses
>>> regular Java XML content handlers, so you don't need a Tika-specific
>>> tutorial on writing one.
>>>
>>> Depending on what you're wanting to do, you can use Tika's
>>> TeeContentHandler to send the events to both your custom handler and a
>>> normal one. ContentHandlerDecorator can also be used to override just
>>> some bits.
>>>
>>> Nick
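Nick's point is that Tika emits ordinary SAX events, so a word filter is just a SAX content handler. The sketch below uses only the JDK to show that idea. It wraps a downstream handler in `org.xml.sax.helpers.XMLFilterImpl`, which forwards every event to the handler it wraps. Inside Tika you would more likely extend `org.apache.tika.sax.ContentHandlerDecorator` for the same purpose and pass the wrapper to a parser.

A few parts are illustrative assumptions rather than Tika API:

- The names `WordFilteringHandler`, `TextCollector` and `filter` are hypothetical.
- The `***` mask is a made-up filtering rule.
- The small XHTML string stands in for the XHTML events a Tika parser would produce.

Text is buffered until the next element boundary because SAX may split one run of text across several `characters()` calls.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class WordFilterDemo {

    /** Hypothetical decorator: masks blocked words, forwards everything else unchanged. */
    static class WordFilteringHandler extends XMLFilterImpl {
        private static final Pattern WORD = Pattern.compile("\\p{L}+");
        private final Set<String> blocked;
        private final StringBuilder pending = new StringBuilder();

        WordFilteringHandler(ContentHandler downstream, Set<String> blocked) {
            setContentHandler(downstream); // events are forwarded to this handler
            this.blocked = blocked;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Buffer: one text run may arrive in several characters() calls.
            pending.append(ch, start, length);
        }

        /** Filters the buffered text and forwards it downstream. */
        private void flush() throws SAXException {
            if (pending.length() == 0) return;
            String text = pending.toString();
            pending.setLength(0);
            StringBuilder out = new StringBuilder();
            Matcher m = WORD.matcher(text);
            int last = 0;
            while (m.find()) {
                out.append(text, last, m.start()); // keep punctuation/whitespace
                String word = m.group();
                out.append(blocked.contains(word.toLowerCase(Locale.ROOT)) ? "***" : word);
                last = m.end();
            }
            out.append(text, last, text.length());
            char[] filtered = out.toString().toCharArray();
            super.characters(filtered, 0, filtered.length);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            flush();
            super.startElement(uri, localName, qName, atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            flush();
            super.endElement(uri, localName, qName);
        }

        @Override
        public void endDocument() throws SAXException {
            flush();
            super.endDocument();
        }
    }

    /** Plain downstream handler that collects the (already filtered) text. */
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Parses an XHTML string through the filter and returns the collected text. */
    static String filter(String xhtml, Set<String> blocked) {
        TextCollector collector = new TextCollector();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setContentHandler(new WordFilteringHandler(collector, blocked));
            reader.parse(new InputSource(new StringReader(xhtml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return collector.text.toString();
    }

    public static void main(String[] args) {
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
                + "<p>The secret plan</p>\n<p>is Secret.</p></body></html>";
        System.out.println(filter(xhtml, Set.of("secret")));
    }
}
```

To get Nick's "both handlers" setup in Tika, you could pass the filtering decorator and an unfiltered handler together to a `TeeContentHandler`. That way one parse yields both the raw and the filtered text.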
