Sorry…

Sergey Beryozkin

On Wed, Jun 23, 2021 at 6:46 AM Tim Allison <[email protected]> wrote:
> Hi Cristi,
>
> I regret that I don't have precise answers for these questions. tika-server uses Apache cxf and most of your questions are handled at that level. There is no logic in Tika for number of connections, identifying contention or even keeping track of the number of parallel requests.
>
> If you're running in --spawnChild mode in 1.x or running in default mode in 2.x, the server can go down and drop connections if a file has caused a catastrophic problem (timeout, oom or other crash), but that doesn't necessarily mean that CPU will be saturated.
>
> In practice, I've found that it is better to run multiple tika-servers (on different ports?) and have one tika-server per client so that you effectively avoid multithreading...this also enables you to know which file caused a catastrophic problem. If you're running multiple requests on a single server, and one of the files causes a shutdown/restart, you won't know which of the active files caused the problem.
>
> Nicholas DiPiazza has experience with pegging tika-servers. He might be willing to chime in?
>
> Sergey Beryozkin is our cxf expert...he might have better insight on the cxf layer.
>
> The above input applies to the standard /tika, /rmeta endpoints. The new pipes /pipes and /async handlers fork multiple sub-processes and do the parsing there. I have not yet experimented with overwhelming them in practice/production, but the /async handler at least has a return value for "queue is full, please don't send any more requests".
>
> Best,
>
> Tim
>
> On Tue, Jun 22, 2021 at 3:28 AM Cristian Zamfir <[email protected]> wrote:
> >
> > Hello, please let me know if somebody has looked into this or whether I should look at the source code instead? Thanks!
> >
> > On Fri, Jun 18, 2021 at 5:04 PM Cristian Zamfir <[email protected]> wrote:
> >>
> >> Hi,
> >>
> >> I have a few questions about the concurrency level of tika-server in the default configuration:
> >> - how many connections will it accept before not accepting new connections?
> >> - how many files can be scanned in parallel?
> >> - what is the return code to expect when there is contention on the server?
> >> - is it a safe assumption that for connections to be dropped, CPU will be saturated?
> >>
> >> Thanks,
> >> Cristi
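Tim's suggestion of one tika-server per client can be sketched as a small bookkeeping layer on the client side. The Python below is a minimal, hypothetical sketch, not part of Tika or cxf, and it makes no HTTP calls. It assumes each client worker i talks only to its own tika-server on port base_port + i (9998 is tika-server's default port; the base_port + i layout is an assumption). The names DedicatedServerPool, ServerSlot, start, finish and culprit are illustrative only. Because each server has at most one file in flight, a dropped connection or restart on a given port points at exactly one candidate file.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ServerSlot:
    """One tika-server process dedicated to a single client worker (hypothetical model)."""
    port: int
    in_flight: Optional[str] = None  # file currently being parsed on this server, if any


class DedicatedServerPool:
    """Hypothetical client-side bookkeeping: worker i uses only the server on base_port + i,
    so at most one file is ever in flight per server."""

    def __init__(self, base_port: int, n_workers: int) -> None:
        # Assumed port layout: one tika-server per worker on consecutive ports.
        self.slots: List[ServerSlot] = [ServerSlot(base_port + i) for i in range(n_workers)]

    def start(self, worker: int, path: str) -> int:
        """Record that `worker` is sending `path`; return the port it should send it to."""
        slot = self.slots[worker]
        if slot.in_flight is not None:
            # Enforce one request per server, i.e. no multithreading inside a server.
            raise RuntimeError(f"worker {worker} already has {slot.in_flight} in flight")
        slot.in_flight = path
        return slot.port

    def finish(self, worker: int) -> None:
        """Mark the worker's current file as done (parsed or failed normally)."""
        self.slots[worker].in_flight = None

    def culprit(self, port: int) -> Optional[str]:
        """Call when the server on `port` drops the connection or restarts:
        the only file that can have caused it is the one in flight there."""
        for slot in self.slots:
            if slot.port == port:
                return slot.in_flight
        raise KeyError(port)
```

Usage: with pool = DedicatedServerPool(9998, 3), pool.start(1, "b.docx") returns 9999, and if the server on 9999 goes down, pool.culprit(9999) returns "b.docx". With several requests multiplexed on a single server, that lookup would be ambiguous, which is the attribution problem Tim describes.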
