I was wrong: "excluded" parsers are not loaded. We have a unit test for
this. If we find that they are loaded, that's a bug that needs to be fixed.

As mentioned elsewhere, https://issues.apache.org/jira/browse/TIKA-4215 is
the source of your problem. We were loading two "default" Tika configs just
to get the version number in tika-server. The fix for that issue removes
that extra loading, and it may improve the loading speed of tika-server,
too. :D
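
For anyone reading this thread later, the kind of exclusion being discussed
looks roughly like the sketch below. It assumes the <parsers> element is the
one described on the TikaOCR wiki page linked in the original message; the
actual config from the original report was not posted, so treat this as an
illustration rather than the exact file in use.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of a tika-config.xml that keeps the default parsers
     but leaves the Tesseract OCR parser out of the composite parser. -->
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser">
      <!-- Excluded parsers should not be loaded or initialized. -->
      <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
    </parser>
  </parsers>
</properties>
```

With a config like this, a remaining ImageMagick check would come from the
extra "default" config load described in TIKA-4215, not from the exclusion
being ignored.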

On Wed, Mar 20, 2024 at 3:54 PM Tim Allison <talli...@apache.org> wrote:

> Looking at TikaConfig, it looks like the "excluded" parsers are actually
> loaded and initialized, but they are not added to the composite parser if
> they're on the exclude list.
>
> We should try to avoid loading them at all if they are excluded. IIRC,
> this is a bit complex in TikaConfig. Let me take a look...
>
> On Wed, Mar 20, 2024 at 3:25 PM Josh Burchard <burch...@pnp-hcl.com>
> wrote:
>
>> Hi all,
>>
>> I've got Tika 2.9.1 server running on Linux, and Tika is checking for the
>> presence of ImageMagick. I tried disabling the TesseractOCR parser in my
>> XML config file, but the check is still happening. I can certainly try
>> changing my request headers to disable it, but that's in compiled code and
>> I was hoping to make the XML change as a more immediate workaround.
>>
>> I can reply with my config if interested, but I'm just using exactly
>> what's mentioned in the doc
>> <https://cwiki.apache.org/confluence/display/TIKA/TikaOCR> as far as the
>> <parsers> element is concerned.
>>
>> Josh Burchard
>> HCL Domino
>>
>
