Hi Tim,

This document looks pretty good. Maybe an example can be added for
TeeContentHandler as well.

Regards,
Cihad Guzel


On Fri, Jun 3, 2022 at 10:24 PM Tim Allison <[email protected]> wrote:

> First draft of that page is up.  Let me know if you have any questions.
>
> On Fri, Jun 3, 2022 at 2:03 PM Tim Allison <[email protected]> wrote:
>
>> I just added the ability to wrap a content handler via tika-config.xml
>> and it will be out in 2.4.1 shortly.  Let me document it on our wiki.  I've
>> started a stub here:
>> https://cwiki.apache.org/confluence/display/TIKA/ModifyingContentWithHandlersAndMetadataFilters
>>
>> On Fri, Jun 3, 2022 at 1:41 PM Cihad Guzel <[email protected]> wrote:
>>
>>> Hi Nick,
>>>
>>> Thanks for your information.
>>>
>>> If I use embedded Tika, I think I can set the custom content
>>> handler through the API.
>>>
>>> On the other hand, if I use Tika Server, how can I set the custom content
>>> handler? Is there a way to do it from the config file?
>>>
>>> Regards,
>>> Cihad Guzel
>>>
>>>
>>> On Fri, Jun 3, 2022 at 7:09 PM Nick Burch <[email protected]> wrote:
>>>
>>>> On Fri, 3 Jun 2022, Cihad Guzel wrote:
>>>> > I want to pass the content's words through some filters while parsing in
>>>> > Tika. How can I add custom filtering?
>>>> >
>>>> > Does the content handler work for this? Is there a document about this?
>>>>
>>>> A custom content handler is a pretty good way to do that. Tika just uses
>>>> regular Java XML content handlers, so you don't need a Tika-specific
>>>> tutorial on writing one.
>>>>
>>>> Depending on what you want to do, you can use Tika's
>>>> TeeContentHandler to send the events to both your custom handler and a
>>>> normal one. ContentHandlerDecorator can also be used to override just
>>>> some bits.
>>>>
>>>> Nick
>>>>
>>>>
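
Since Tika handlers are regular SAX content handlers, the word-filtering idea
discussed above can be sketched with the JDK alone. This is only a minimal
sketch under stated assumptions, not Tika's own API: the names
WordFilterSketch, TextCollector, WordFilterHandler and filter are hypothetical,
and the JDK's DefaultHandler stands in for the wrapping role that Tika's
ContentHandlerDecorator would likely play in a real setup.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; uses only JDK SAX types, no Tika classes.
public class WordFilterSketch {

    /** Collects character events into a string; stands in for a "normal" handler. */
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /**
     * Forwards character events to a delegate, dropping blocked words.
     * Only characters() is forwarded here; all other SAX events are ignored.
     * Assumption: each word arrives within a single characters() call, which
     * SAX does not guarantee for large text nodes.
     */
    static class WordFilterHandler extends DefaultHandler {
        private final ContentHandler delegate;
        private final Set<String> blocked;

        WordFilterHandler(ContentHandler delegate, Set<String> blocked) {
            this.delegate = delegate;
            this.blocked = blocked;
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            StringBuilder kept = new StringBuilder();
            // Split on whitespace and keep words that are not on the block list.
            // Whitespace is normalized to single spaces as a side effect.
            for (String word : new String(ch, start, length).split("\\s+")) {
                if (!word.isEmpty() && !blocked.contains(word.toLowerCase())) {
                    if (kept.length() > 0) {
                        kept.append(' ');
                    }
                    kept.append(word);
                }
            }
            char[] out = kept.toString().toCharArray();
            delegate.characters(out, 0, out.length);
        }
    }

    /** Parses the XML with the filtering handler and returns the surviving text. */
    static String filter(String xml, Set<String> blocked) throws Exception {
        TextCollector collector = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new WordFilterHandler(collector, blocked));
        return collector.text.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(filter("<p>hello secret world</p>", Set.of("secret")));
    }
}
```

A handler meant for real use would also need to forward the remaining SAX
events (startElement, endElement and so on) to the delegate and buffer words
that are split across characters() calls.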
