First draft of that page is up.  Let me know if you have any questions.

On Fri, Jun 3, 2022 at 2:03 PM Tim Allison <[email protected]> wrote:

> I just added the ability to wrap a content handler via tika-config.xml and
> it will be out in 2.4.1 shortly.  Let me document it on our wiki.  I've
> started a stub here:
> https://cwiki.apache.org/confluence/display/TIKA/ModifyingContentWithHandlersAndMetadataFilters
>
> On Fri, Jun 3, 2022 at 1:41 PM Cihad Guzel <[email protected]> wrote:
>
>> Hi Nick,
>>
>> Thanks for your information.
>>
>> If I use embedded Tika, I think I can set the custom content handler
>> using the API.
>>
>> On the other hand, if I use Tika server, how can I set the custom content
>> handler on the server? Is there a way to do it from the config file?
>>
>> Regards,
>> Cihad Guzel
>>
>>
>> On Fri, Jun 3, 2022 at 7:09 PM Nick Burch <[email protected]> wrote:
>>
>>> On Fri, 3 Jun 2022, Cihad Guzel wrote:
>>> > I want to pass the content's words through some filters while parsing in
>>> > Tika. How can I add custom filtering?
>>> >
>>> > Does the content handler work for this? Is there a document about this?
>>>
>>> A custom content handler is a pretty good way to do that. Tika just uses
>>> regular Java XML content handlers, so you don't need a Tika-specific
>>> tutorial on writing one.
>>>
>>> Depending on what you're wanting to do, you can use Tika's
>>> TeeContentHandler to send the events to both your custom handler and a
>>> normal one. ContentHandlerDecorator can also be used to override just
>>> some bits.
>>>
>>> Nick
>>>
>>>
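
For anyone landing on this thread later, here is a minimal sketch of the
decorator idea Nick describes, written against the plain JDK SAX classes only,
since Tika uses regular Java content handlers. The names WordFilterDemo,
WordFilterHandler, TextCollector and filter() are hypothetical and exist only
for this illustration; they are not part of Tika. With Tika on the classpath
you would probably extend org.apache.tika.sax.ContentHandlerDecorator rather
than XMLFilterImpl, but the shape should be about the same. The sketch assumes
each characters() call carries whole words, which a real parser does not
guarantee.

```java
import java.util.Locale;
import java.util.Set;
import java.util.StringJoiner;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class WordFilterDemo {

    /** Hypothetical decorator: drops blocked words, forwards every other event untouched. */
    static class WordFilterHandler extends XMLFilterImpl {
        private final Set<String> blocked;

        WordFilterHandler(ContentHandler delegate, Set<String> blocked) {
            // XMLFilterImpl forwards all ContentHandler events to this delegate.
            setContentHandler(delegate);
            this.blocked = blocked;
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            // Assumption: whole words arrive in one call. Real parsers may split
            // text across calls, so production code would need to buffer.
            StringJoiner kept = new StringJoiner(" ");
            for (String word : new String(ch, start, length).split("\\s+")) {
                if (!word.isEmpty() && !blocked.contains(word.toLowerCase(Locale.ROOT))) {
                    kept.add(word);
                }
            }
            char[] out = kept.toString().toCharArray();
            super.characters(out, 0, out.length);
        }
    }

    /** Stand-in for a "normal" handler that just collects the text it receives. */
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Pushes one run of text through the filter and returns what the collector saw. */
    static String filter(String input, Set<String> blocked) {
        TextCollector collector = new TextCollector();
        ContentHandler handler = new WordFilterHandler(collector, blocked);
        try {
            handler.startDocument();
            char[] chars = input.toCharArray();
            handler.characters(chars, 0, chars.length);
            handler.endDocument();
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return collector.text.toString();
    }

    public static void main(String[] args) {
        System.out.println(filter("the quick brown fox", Set.of("quick")));
    }
}
```

In embedded use, the wrapped handler is what you would pass to the parser in
place of the plain one. If you need both the filtered and the unfiltered
stream, Tika's TeeContentHandler can fan the same events out to several
handlers.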
