Sorry… Sergey Beryozkin

On Wed, Jun 23, 2021 at 6:46 AM Tim Allison <[email protected]> wrote:

> Hi Cristi,
>
>    I regret that I don't have precise answers for these questions.
> tika-server uses Apache cxf and most of your questions are handled at
> that level.  There is no logic in Tika for number of connections,
> identifying contention or even keeping track of the number of parallel
> requests.
>
>    If you're running in --spawnChild mode in 1.x or in the default mode
> in 2.x, the server can go down and drop connections if a file has
> caused a catastrophic problem (timeout, OOM or other crash), but that
> doesn't necessarily mean that CPU will be saturated.
>
>    In practice, I've found that it is better to run multiple
> tika-servers (on different ports?) and have one tika-server per client
> so that you effectively avoid multithreading...this also enables you
> to know which file caused a catastrophic problem.  If you're running
> multiple requests on a single server, and one of the files causes a
> shutdown/restart, you won't know which of the active files caused the
> problem.
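
A minimal client-side sketch of the "one tika-server per client" setup described above, assuming each server was started separately on its own port. The class name `TikaServerPool`, the helper `extract_text`, and the second port are hypothetical; 9998 is used because it is the usual tika-server default, and the PUT to /tika with `Accept: text/plain` is assumed to match your server version.

```python
import queue
import urllib.request
from contextlib import contextmanager


class TikaServerPool:
    """Leases each tika-server base URL to at most one caller at a time."""

    def __init__(self, host="localhost", ports=(9998, 9999)):
        # Ports are illustrative; list whatever ports your servers listen on.
        self._free = queue.Queue()
        for port in ports:
            self._free.put(f"http://{host}:{port}")

    @contextmanager
    def lease(self, timeout=None):
        # Blocks while every server is busy, so a server never sees
        # more than one in-flight request from this pool.
        base_url = self._free.get(timeout=timeout)
        try:
            yield base_url
        finally:
            self._free.put(base_url)


def extract_text(pool, path, request_timeout=300):
    """Send one file to a leased server; name the file if the request fails."""
    with pool.lease() as base_url:
        with open(path, "rb") as fh:
            req = urllib.request.Request(
                f"{base_url}/tika",
                data=fh.read(),
                method="PUT",
                headers={"Accept": "text/plain"},
            )
        try:
            with urllib.request.urlopen(req, timeout=request_timeout) as resp:
                return resp.read().decode("utf-8", errors="replace")
        except OSError as exc:
            # Only one file was active on this server, so a dropped
            # connection or timeout can be attributed to this file.
            raise RuntimeError(f"{path} failed on {base_url}") from exc
```

Because a lease is exclusive, a crash or restart on one port points at exactly one file, which is the attribution benefit mentioned above. Worker threads can share one pool and call `extract_text(pool, path)`.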
>
>    Nicholas DiPiazza has experience with pegging tika-servers.  He
> might be willing to chime in?
>
>    Sergey Beryozkin is our cxf expert...he might have better insight on
> the cxf layer.
>
>    The above input applies to the standard /tika, /rmeta endpoints.
> The new pipes /pipes and /async handlers fork multiple sub-processes
> and do the parsing there.  I have not yet experimented with
> overwhelming them in practice/production, but the /async handler at
> least has a return value for "queue is full, please don't send any
> more requests".
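
A rough sketch of how a client might react to the "queue is full" signal from /async, assuming nothing about what that signal looks like on the wire. Both `send` and `is_queue_full` are hypothetical callables supplied by the caller; check the response your tika-server version returns before writing the predicate.

```python
import time


def submit_with_backoff(send, is_queue_full, max_attempts=5,
                        base_delay=1.0, sleep=time.sleep):
    """Call send() until the response no longer signals a full queue.

    send          -- hypothetical callable that submits the request
    is_queue_full -- hypothetical predicate over the response
    """
    for attempt in range(max_attempts):
        response = send()
        if not is_queue_full(response):
            return response
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"queue still full after {max_attempts} attempts")
```

The `sleep` parameter is injectable only so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.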
>
>      Best,
>
>           Tim
>
> On Tue, Jun 22, 2021 at 3:28 AM Cristian Zamfir <[email protected]>
> wrote:
> >
> > Hello, please let me know if somebody has looked into this or I should
> > look at the source code instead? Thanks!
> >
> > On Fri, Jun 18, 2021 at 5:04 PM Cristian Zamfir <[email protected]>
> > wrote:
> >>
> >> Hi,
> >>
> >> I have a few questions about the concurrency level of tika-server in
> >> the default configuration:
> >> - how many connections will it accept before not accepting new
> >>   connections?
> >> - how many files can be scanned in parallel?
> >> - what is the return code to expect when there is contention on the
> >>   server?
> >> - is it a safe assumption that for connections to be dropped, CPU will
> >>   be saturated?
> >>
> >> Thanks,
> >> Cristi
> >>
>
