What is your use case? I've run massive crawls covering terabytes of data and millions of office documents, and that work inspired Tika pipes, which may be what you're looking for.
But I'd like to hear more about the ask here.

On Fri, Feb 24, 2023, 8:04 AM Radim Řehůřek <[email protected]> wrote:

> On Fri, Feb 24, 2023 at 1:20 PM Tim Allison <[email protected]> wrote:
>
>> Can you tell if tika-server is restarting from the logs, and if so,
>> what is the cause of the restarts?
>
> Unfortunately I can't.
>
>> Tika server is built from Apache CXF and doesn't directly manage
>> threads or concurrency limits.
>
> OK. I dug into the code for Tika server a little and the issue might be
> related to timeouts. The default "taskTimeoutMillis" seems to be 5 minutes
> per task, is that right? Is it known how many "tasks" each request
> comprises, for a total per-request timeout? E.g. 10 tasks per file = 10*5 =
> max 50 minute timeout?
>
> Because when a file is not processed within 1 minute, we kill its
> processing on our end (incl. its Tika request) and process another file. So
> I'm thinking that maybe Tika is munching on some slow requests, for 5
> minutes or more (depending on what "tasks" are involved per file), and we
> keep piling more requests. Depending on what the Apache CFX does, this
> might fill up some internal server thread/process pool or queue or
> something, leading to the observed errors.
>
> Does that sound plausible? Is there a way to set a per-request timeout,
> rather than per-task?
>
> Thanks,
> Radim
>
>> On Fri, Feb 24, 2023 at 6:31 AM Radim Řehůřek <[email protected]> wrote:
>> >
>> > Hi all,
>> >
>> > I use Tika server's /rmeta endpoint (default forked mode, version 2.7.0), but my requests frequently fails with 503 or even "[Errno 111] Connection refused".
>> >
>> > I was thinking maybe I'm hammering the Tika server too hard and should increase its parallelization level, but couldn't find any info on that.
>> >
>> > Does Tika server use threads or processes for concurrency?
>> >
>> > Is there a concurrency limit and if so, how do I increase it?
>> >
>> > Many thanks,
>> > Radim
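
For the client side of the thread above, here is a minimal sketch of one way to cope with the 503 and "[Errno 111] Connection refused" responses: back off and retry rather than immediately sending the next file. It is only an illustration under stated assumptions, not a recommended configuration. The URL assumes tika-server on its default port on localhost, and the function name, the 60-second client timeout, the attempt count, and the backoff values are all hypothetical choices. It does not change any server-side timeout.

```python
import time
import urllib.error
import urllib.request

# Assumption: tika-server is listening on its default port on localhost.
RMETA_URL = "http://localhost:9998/rmeta"


def put_with_retry(data, url=RMETA_URL, timeout_s=60.0, max_attempts=4,
                   backoff_s=2.0, opener=urllib.request.urlopen,
                   sleep=time.sleep):
    """PUT `data` to `url`, retrying on HTTP 503 or a refused connection.

    Returns the response body as bytes, or re-raises the last error once
    `max_attempts` is exhausted. `opener` and `sleep` are injectable so the
    retry logic can be exercised without a running server.
    """
    request = urllib.request.Request(url, data=data, method="PUT")
    for attempt in range(1, max_attempts + 1):
        try:
            with opener(request, timeout=timeout_s) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Only 503 is treated as transient here; other statuses propagate.
            if err.code != 503 or attempt == max_attempts:
                raise
        except urllib.error.URLError as err:
            # urllib wraps socket-level failures such as ECONNREFUSED.
            refused = isinstance(err.reason, ConnectionRefusedError)
            if not refused or attempt == max_attempts:
                raise
        # Exponential backoff: backoff_s, 2 * backoff_s, 4 * backoff_s, ...
        sleep(backoff_s * 2 ** (attempt - 1))
```

A call would look like `put_with_retry(open("report.docx", "rb").read())`, where the file name is a placeholder. A request that exceeds the client timeout is deliberately not retried, which matches the "give up after one minute and move on" behaviour described in the thread.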
