On Fri, Feb 24, 2023 at 1:20 PM Tim Allison <[email protected]> wrote:

> Can you tell if tika-server is restarting from the logs, and if so,
> what is the cause of the restarts?
>

Unfortunately I can't.


> Tika server is built from Apache CXF and doesn't directly manage
> threads or concurrency limits.
>

OK. I dug into the code for Tika server a little and the issue might be
related to timeouts. The default "taskTimeoutMillis" seems to be 5 minutes
per task, is that right? Is it known how many "tasks" each request
comprises, so I can work out a total per-request timeout? E.g. would 10
tasks per file mean a maximum timeout of 10 * 5 = 50 minutes?

The reason I ask: when a file is not processed within 1 minute, we kill its
processing on our end (incl. its Tika request) and move on to another file.
So I'm thinking that maybe Tika keeps munching on some slow requests for 5
minutes or more (depending on what "tasks" are involved per file) while we
keep piling on more requests. Depending on what Apache CXF does, this
might fill up some internal server thread/process pool or queue, leading
to the observed errors.
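
To put a rough number on that pile-up, here is a back-of-the-envelope
sketch. It rests on an unverified assumption: that the server keeps working
on a request until its own timeout expires even after the client has
dropped the connection, and that each client worker sends its next request
as soon as it abandons the previous one. The function name and parameters
are made up for illustration; nothing here is a Tika or CXF API.

```python
import math


def max_in_flight(client_timeout_s: float,
                  server_timeout_s: float,
                  workers: int = 1) -> int:
    """Upper bound on requests the server may be processing at once.

    ASSUMPTION (hypothetical, not confirmed for Tika server): a slow
    request occupies the server for the full server-side timeout, even
    though the client gave up after client_timeout_s and moved on.
    """
    if client_timeout_s <= 0 or server_timeout_s <= 0:
        raise ValueError("timeouts must be positive")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    # Each worker starts a new request every client_timeout_s seconds,
    # and each of those stays alive for server_timeout_s seconds.
    per_worker = math.ceil(server_timeout_s / client_timeout_s)
    return per_worker * workers


# 1-minute client timeout against a 5-minute server-side timeout:
print(max_in_flight(60, 300))             # one client worker
print(max_in_flight(60, 300, workers=8))  # eight client workers
```

Under that assumption, a single worker on our side could keep up to 5 slow
requests alive on the server at once, and 8 workers up to 40, which would
make exhausting a fixed-size pool or queue at least plausible.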

Does that sound plausible? Is there a way to set a per-request timeout,
rather than per-task?

Thanks,
Radim




>
> On Fri, Feb 24, 2023 at 6:31 AM Radim Řehůřek <[email protected]> wrote:
> >
> > Hi all,
> >
> > I use Tika server's /rmeta endpoint (default forked mode, version
> 2.7.0), but my requests frequently fail with 503 or even "[Errno 111]
> Connection refused".
> >
> > I was thinking maybe I'm hammering the Tika server too hard and should
> increase its parallelization level, but couldn't find any info on that.
> >
> > Does Tika server use threads or processes for concurrency?
> >
> > Is there a concurrency limit and if so, how do I increase it?
> >
> > Many thanks,
> > Radim
> >
>
