Re: best practices for avoiding OOM for tika docker

2021-06-10 Thread Cristian Zamfir
Actually maybe this is related:
java -jar ./tika-server-standard-2.0.0-BETA.jar
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Tika 1.x

Re: best practices for avoiding OOM for tika docker

2021-06-10 Thread Cristian Zamfir
I see, thanks. Has TIKA_CHILD_JVM_OPTS=-JXmx been replaced by the configuration option forkedJvmArgs, or do both still work? I am guessing that it has been fully replaced. When I switched to a config file for the server, I noticed that some of the options I can see in the GitHub repo do not seem to work.

Re: best practices for avoiding OOM for tika docker

2021-06-10 Thread Tim Allison
I just updated the wiki. I haven't put in an anchor yet, but see: https://cwiki.apache.org/confluence/display/TIKA/TikaServer and search for 'status' at the bottom of the page. Please let us know if you have any questions. Best, Tim On Thu, Jun 10, 2021 at 11:22 AM Cristian Zamf

Re: best practices for avoiding OOM for tika docker

2021-06-10 Thread Cristian Zamfir
It appears that the -status option was dropped in 2.x - was it replaced by something else? Thanks, Cristi
On Wed, Jun 2, 2021 at 4:54 PM Tim Allison wrote:
> >I wanted to double check that -JXX:+ExitOnOutOfMemoryError should be provided to the main process or to the child, can you please con

Fwd: best practices for avoiding OOM for tika docker

2021-06-02 Thread Tim Allison
dropped cc...
>I noticed that Tika prints in the logs OOM (null), but seems to recover by itself even when not using -spawnChild. Is this the expected behavior?
When not in -spawnChild mode, Tika is catching OOM exceptions (when it can), but it isn't "recovering"... the jvm may be in an inconsi

Re: best practices for avoiding OOM for tika docker

2021-06-02 Thread Nick Burch
On Wed, 2 Jun 2021, Cristian Zamfir wrote:
> 1. Do you have a recommendation for a stress test that would allow me to easily test OOM behavior?
Depends what kind of OOM you're interested in. If you fire a lot of memory-hungry documents at a single server at once, you can trigger an OOM. Alterna
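A rough sketch of the "fire a lot of documents at a single server at once" approach described above. It assumes a Tika server is reachable on localhost at port 9998 and accepts PUT requests on /tika; the URL, worker count, timeout, and function names are illustrative choices, not something taken from this thread.

```python
import concurrent.futures
import urllib.request

# Assumption: a Tika server is listening locally on port 9998.
TIKA_URL = "http://localhost:9998/tika"


def put_document(path, url=TIKA_URL, timeout=120):
    """PUT one file to the server and return the HTTP status code."""
    with open(path, "rb") as fh:
        req = urllib.request.Request(url, data=fh.read(), method="PUT")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def fire_at_once(paths, workers=16, send=put_document):
    """Send every path concurrently.

    Returns a dict mapping each path to its status code, or to the
    exception raised (timeout, connection reset, HTTP error, ...).
    """
    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(send, p): p for p in paths}
        for fut in concurrent.futures.as_completed(futures):
            path = futures[fut]
            try:
                results[path] = fut.result()
            except Exception as exc:
                results[path] = exc
    return results
```

Calling fire_at_once() with a list of large or memory-hungry files and a high worker count, then watching the server logs, is one way to provoke the behavior; failures come back as exceptions in the result dict rather than aborting the run.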

Re: best practices for avoiding OOM for tika docker

2021-06-02 Thread Cristian Zamfir
Hi! I noticed that Tika prints in the logs OOM (null), but seems to recover by itself even when not using -spawnChild. Is this the expected behavior? I am trying to figure out when logs containing "OOM" are critical and would require a container restart. I also wanted to bring up two of my questi

Re: best practices for avoiding OOM for tika docker

2021-05-29 Thread Cristian Zamfir
> On 28 May 2021, at 19:03, Tim Allison wrote:
>
> Tika 2.x should help with this in pipes and async. Your system should expect to go oom or crash at some point if you're processing enough files.
I believe that this is what is happening in my case, it’s not due to a single file, it happe

Re: best practices for avoiding OOM for tika docker

2021-05-28 Thread Tim Allison
Tika 2.x should help with this in pipes and async. Your system should expect to go OOM or crash at some point if you're processing enough files. Right, --spawnChild is not the default in 1.x, but it will be in 2.x. And, yes, you should be using it. To set the Xmx in the forked process add -J, as in -
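For illustration only, here is a small sketch that assembles a 1.x server command line with -spawnChild and a -J-prefixed heap setting for the forked process. The jar name and the heap size are hypothetical placeholders, not values given in this thread; the heap should be chosen well below the container limit so that the parent process and non-heap memory still fit.

```python
def tika_server_command(jar, child_heap_mb):
    """Build an argv list for a 1.x server that forks a child with a capped heap.

    `jar` and `child_heap_mb` are caller-supplied; nothing here is a
    recommended value.
    """
    if child_heap_mb <= 0:
        raise ValueError("child_heap_mb must be positive")
    return [
        "java", "-jar", jar,
        "-spawnChild",             # do the parsing in a forked child process
        f"-JXmx{child_heap_mb}m",  # the -J prefix hands -Xmx... to the child JVM
    ]
```

The resulting list can be passed to subprocess.run() or joined into a container entrypoint.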

Re: best practices for avoiding OOM for tika docker

2021-05-28 Thread Tim Allison
>Is it reasonable to assume that detect-only is much safer? Safer, yes. Safe, no. At one point, a user reported a feature of the Compressor/Package detector that read a 5 (?) byte file and allocated 2GB (?): https://issues.apache.org/jira/browse/TIKA-2330 If you're only doing detection with the

Re: best practices for avoiding OOM for tika docker

2021-05-28 Thread John Ulric
> For now, the general advice is documented at: https://cwiki.apache.org/confluence/display/TIKA/The+Robustness+of+Apache+Tika
General question here: How would you compare the robustness in that respect between callin

Re: best practices for avoiding OOM for tika docker

2021-05-28 Thread Cristian Zamfir
Thanks for your answer Nick! I am running apache/tika:latest-full, which uses 1.25. It looks like I need at least version 1.26 for https://issues.apache.org/jira/browse/TIKA-3353, but I am not sure whether that is overkill for implementing basic liveness health checks. It's clear that -spawnChild
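A basic liveness check can be as simple as an HTTP request that must answer within a deadline. The sketch below assumes the server exposes a cheap GET endpoint at /version on port 9998; the URL, timeout, and function names are assumptions for illustration. A probe like this only shows that the server still responds, not that the JVM is healthy after an OOM.

```python
import urllib.request


def fetch_status(url, timeout):
    """Return the HTTP status code of a GET request to `url`."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def is_alive(url="http://localhost:9998/version", timeout=5.0, fetch=fetch_status):
    """True if the server answers with HTTP 200 before the timeout expires."""
    try:
        return fetch(url, timeout) == 200
    except Exception:
        # Refused connection, timeout, HTTP error, malformed response, ...
        return False
```

A container health check could run this and exit non-zero when is_alive() returns False, leaving the restart decision to the orchestrator.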

Re: best practices for avoiding OOM for tika docker

2021-05-28 Thread Nick Burch
On Thu, 27 May 2021, Cristian Zamfir wrote: I am running some stress tests of the latest tika server docker (not modified in any way, just pulled from the registry) and seeing that after a few hours I see OOM in the logs. The container has a limit of 4GB set in K8S. I am wondering if you have any

Re: best practices for avoiding OOM for tika docker

2021-05-27 Thread Cristian Zamfir
On Thu, May 27, 2021 at 11:31 PM Cristian Zamfir wrote:
> Hi,
>
> I am running some stress tests of the latest tika server docker (not modified in any way, just pulled from the registry) and seeing that after a few hours I see OOM in the logs. The container has a limit of 4GB set in K8S. I

best practices for avoiding OOM for tika docker

2021-05-27 Thread Cristian Zamfir
Hi, I am running some stress tests of the latest tika server docker (not modified in any way, just pulled from the registry) and seeing that after a few hours I see OOM in the logs. The container has a limit of 4GB set in K8S. I am wondering if you have any best practices on how to avoid this. For