What is your use case? I've done massive crawls of TBs of data across millions
of Office docs, and that work inspired Tika pipes, which may be what you're
looking for.

But I'd like to hear more about the ask here.

On Fri, Feb 24, 2023, 8:04 AM Radim Řehůřek <[email protected]> wrote:

> On Fri, Feb 24, 2023 at 1:20 PM Tim Allison <[email protected]> wrote:
>
>> Can you tell if tika-server is restarting from the logs, and if so,
>> what is the cause of the restarts?
>>
>
> Unfortunately I can't.
>
>
>> Tika server is built from Apache CXF and doesn't directly manage
>> threads or concurrency limits.
>>
>
> OK. I dug into the code for Tika server a little, and the issue might be
> related to timeouts. The default "taskTimeoutMillis" seems to be 5 minutes
> per task; is that right? Is it known how many "tasks" each request
> comprises, giving a total per-request timeout? E.g. 10 tasks per file *
> 5 minutes = a maximum 50-minute timeout?
>
> I ask because when a file is not processed within 1 minute, we kill its
> processing on our end (incl. its Tika request) and move on to another file.
> So I'm thinking that maybe Tika keeps munching on some slow requests for 5
> minutes or more (depending on which "tasks" are involved per file), while
> we keep piling on more requests. Depending on what Apache CXF does, this
> might fill up an internal thread/process pool or queue in the server,
> leading to the observed errors.
>
> Does that sound plausible? Is there a way to set a per-request timeout,
> rather than per-task?
>
> Thanks,
> Radim
>
>>
>> On Fri, Feb 24, 2023 at 6:31 AM Radim Řehůřek <[email protected]>
>> wrote:
>> >
>> > Hi all,
>> >
>> > I use Tika server's /rmeta endpoint (default forked mode, version 2.7.0),
>> > but my requests frequently fail with 503 or even "[Errno 111] Connection
>> > refused".
>> >
>> > I was thinking maybe I'm hammering the Tika server too hard and should
>> > increase its parallelization level, but couldn't find any info on that.
>> >
>> > Does Tika server use threads or processes for concurrency?
>> >
>> > Is there a concurrency limit and if so, how do I increase it?
>> >
>> > Many thanks,
>> > Radim
>> >
>>
>
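For reference, here is a minimal client-side sketch of the throttling discussed in this thread. It assumes a Python client (the "[Errno 111] Connection refused" wording suggests one) and tika-server on its default port 9998. It caps how many /rmeta requests are in flight, applies a per-request socket timeout, and retries with exponential backoff on a 503 or a refused connection. All names here (TIKA_RMETA_URL, parse_with_tika, with_retries, parse_all, max_in_flight) are hypothetical and not part of Tika.

```python
import json
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: tika-server on its default port; adjust for your deployment.
TIKA_RMETA_URL = "http://localhost:9998/rmeta"


def parse_with_tika(path, url=TIKA_RMETA_URL, timeout_s=60.0):
    """PUT one file to /rmeta and return the decoded JSON metadata list."""
    with open(path, "rb") as f:
        body = f.read()
    req = urllib.request.Request(url, data=body, method="PUT",
                                 headers={"Accept": "application/json"})
    # This timeout is client-side only; it does not by itself cancel
    # work that tika-server has already started on this file.
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read())


def is_retryable(exc):
    """Treat 503 (busy/restarting) and refused connections as transient."""
    if isinstance(exc, urllib.error.HTTPError):  # check before URLError
        return exc.code == 503
    if isinstance(exc, urllib.error.URLError):
        # urlopen wraps socket errors, so inspect the underlying reason.
        return isinstance(exc.reason, ConnectionRefusedError)
    return isinstance(exc, ConnectionRefusedError)


def with_retries(fn, *args, attempts=4, base_delay_s=1.0, sleep=time.sleep):
    """Call fn(*args), sleeping 1s, 2s, 4s, ... between retryable failures."""
    for attempt in range(attempts):
        try:
            return fn(*args)
        except Exception as exc:
            if attempt == attempts - 1 or not is_retryable(exc):
                raise
            sleep(base_delay_s * 2 ** attempt)


def parse_all(paths, max_in_flight=4, parse=parse_with_tika):
    """Parse files with at most max_in_flight requests open at once."""
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        # pool.map preserves input order in the returned results.
        return list(pool.map(lambda p: with_retries(parse, p), paths))
```

Usage would look like `results = parse_all(paths, max_in_flight=4)`. Note that the cap only bounds what this client has open: a request abandoned after `timeout_s` may keep running inside tika-server until its own task timeout fires, so the effective server load can still exceed `max_in_flight`.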
