On Wed, Oct 25, 2023 at 2:39 AM Tim Allison <[email protected]> wrote:
> Sorry for my delay.

Thanks a lot Tim!

> > My preliminary conclusion is that the JVM is not able to enforce these
> > flags 100% of the time quickly enough before the cgroup limits kick in
> > and the kernel OOM killer kicks in. Did anyone else experience this?
>
> Y, that's my guess as well. There's a chance that some parsers are
> using off-heap memory, and that may be causing the problems. If
> there's a way that the WatchDog process can monitor actual memory
> usage (esp. as we move to Java 11 in 3.x), we should do that. As
> currently coded, though, we do rely on java and -Xmx for limiting
> memory usage.

The flags -XX:MaxRAMPercentage or -XX:MaxRAM should do the trick, but I am
investigating whether they are enforced fast enough before the system OOM
killer kicks in. So far I would say that is not the case.

> I don't think parsers are using native code within
> tika-parsers-standard. Some parsers in tika-parsers-extended may use
> native code (e.g. sqlite).
>
> There is no concurrency control in tika-server for the non-pipes
> endpoints, e.g. /tika, /rmeta. We rely on cxf. If we can set
> concurrency via cxf, we should let users configure that. Otherwise,
> clients are responsible for not dos'ing the server. If we can't
> limit concurrency via cxf, I don't think there's a straightforward way
> to handle this.

Alright, will do rate limiting upstream.

> Y, pipes does limit concurrency.

Feel free to point me to any documentation about this, but as I understand
from the docs
<https://cwiki.apache.org/confluence/display/TIKA/tika-pipes#tikapipes-tika-app>
(under Fetchers in the classic tika-server endpoints), using Tika server
this way would allow one to configure concurrency and provide request
isolation? I am mainly interested in a convenient way to process requests
in isolation while still treating Tika as a black box, without having to
write Java code that would need to be maintained as Tika evolves.

> > I remember seeing somewhere on the list a timeout per request, but
> > cannot find it now.
>
> You can set a timeout less than or equal to the setting in
> tika-config.xml (<taskTimeoutMillis/>) via this header:
> X-Tika-Timeout-Millis

Understood, thanks.
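
For the per-request timeout, I assume the header is used roughly like this
(the endpoint, file name, and value below are just placeholders):

    # cap a single parse at 60s; must be <= taskTimeoutMillis in tika-config.xml
    curl -T sample.pdf -H "X-Tika-Timeout-Millis: 60000" http://localhost:9998/tika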
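
Going back to the memory flags: for concreteness, this is roughly how I am
passing them to the forked parse processes. It is only a sketch based on my
reading of the tika-server 2.x configuration docs, so please correct me if
<forkedJvmArgs/> is not the right place; the values are purely illustrative:

    <?xml version="1.0" encoding="UTF-8"?>
    <properties>
      <server>
        <params>
          <!-- JVM args handed to each forked parse process (example values only) -->
          <forkedJvmArgs>
            <arg>-XX:MaxRAMPercentage=50.0</arg>
            <arg>-XX:+ExitOnOutOfMemoryError</arg>
          </forkedJvmArgs>
          <!-- per-task timeout enforced by the watchdog -->
          <taskTimeoutMillis>120000</taskTimeoutMillis>
        </params>
      </server>
    </properties>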
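
One extra sanity check I can do on my side is to compare the JVM's view of
available memory with the limit the container is actually running under
(which path applies depends on whether the host uses cgroup v1 or v2):

    # cgroup v2
    cat /sys/fs/cgroup/memory.max
    # cgroup v1
    cat /sys/fs/cgroup/memory/memory.limit_in_bytes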

Cristi

> > > With these settings, the JVM quite often deals well with terminating
> > > processes that hit the memory cap, and the watchdog restarts them:
> > >
> > > [pool-2-thread-1] 21:38:26,395 org.apache.tika.server.core.TikaServerWatchDog forked process exited with exit value 137
> > >
> > > However, from time to time, the JVM seems unable to deal with it, the
> > > OS kicks in, and the container is killed with OOM. My only explanation
> > > so far is that the JVM is too slow to kill the forked process and the
> > > memory usage blows up quite quickly. You can see below how the
> > > total-vm values are close to 6GB at OOM time. This does not make sense
> > > IMO; the JVM should kill these processes way before reaching, e.g.,
> > > the 5613608kB value, and the forked process should not exceed 1.8GB if
> > > we take MaxRAMPercentage into account.
> > >
> > > Another puzzling fact is that the anon + file RSS do not really add up
> > > to the total-vm size, so I am guessing that this is not actually due
> > > to heap. Could this be caused by some native code?
> > >
> > > dmesg -T | grep "Killed process"
> > >
> > > [Fri Oct 20 21:14:13 2023] Memory cgroup out of memory: Killed process 109549 (java) total-vm:5632740kB, anon-rss:1036696kB, file-rss:24668kB, shmem-rss:0kB, UID:35002 pgtables:2532kB oom_score_adj:-997
> > > [Fri Oct 20 21:14:27 2023] Memory cgroup out of memory: Killed process 109713 (java) total-vm:5613608kB, anon-rss:1029280kB, file-rss:24380kB, shmem-rss:0kB, UID:35002 pgtables:2456kB oom_score_adj:-997
> > > [Fri Oct 20 21:14:34 2023] Memory cgroup out of memory: Killed process 109839 (java) total-vm:5607392kB, anon-rss:976664kB, file-rss:24116kB, shmem-rss:0kB, UID:35002 pgtables:2336kB oom_score_adj:-997
> > > [Fri Oct 20 21:14:52 2023] Memory cgroup out of memory: Killed process 109970 (java) total-vm:5598332kB, anon-rss:954312kB, file-rss:24592kB, shmem-rss:0kB, UID:35002 pgtables:2272kB oom_score_adj:-997
> > > [Fri Oct 20 21:15:19 2023] Memory cgroup out of memory: Killed process 110089 (java) total-vm:5615776kB, anon-rss:946484kB, file-rss:24672kB, shmem-rss:0kB, UID:35002 pgtables:2280kB oom_score_adj:-997
> > > [Fri Oct 20 21:15:29 2023] Memory cgroup out of memory: Killed process 110269 (java) total-vm:5602004kB, anon-rss:948548kB, file-rss:24412kB, shmem-rss:0kB, UID:35002 pgtables:2280kB oom_score_adj:-997
> > > [Fri Oct 20 21:15:42 2023] Memory cgroup out of memory: Killed process 110367 (java) total-vm:5607104kB, anon-rss:942636kB, file-rss:24524kB, shmem-rss:0kB, UID:35002 pgtables:2284kB oom_score_adj:-997
> > > [Fri Oct 20 21:16:07 2023] Memory cgroup out of memory: Killed process 110464 (java) total-vm:5593792kB, anon-rss:940524kB, file-rss:24712kB, shmem-rss:0kB, UID:35002 pgtables:2216kB oom_score_adj:-997
> > > [Fri Oct 20 21:16:17 2023] Memory cgroup out of memory: Killed process 110684 (java) total-vm:5627620kB, anon-rss:910000kB, file-rss:24340kB, shmem-rss:0kB, UID:35002 pgtables:2224kB oom_score_adj:-997
> > > [Fri Oct 20 21:16:25 2023] Memory cgroup out of memory: Killed process 110798 (java) total-vm:5616588kB, anon-rss:889436kB, file-rss:24500kB, shmem-rss:0kB, UID:35002 pgtables:2216kB oom_score_adj:-997
> > > [Fri Oct 20 21:16:31 2023] Memory cgroup out of memory: Killed process 110939 (java) total-vm:5619708kB, anon-rss:839724kB, file-rss:23796kB, shmem-rss:0kB, UID:35002 pgtables:2100kB oom_score_adj:-997
> > > [Fri Oct 20 21:16:43 2023] Memory cgroup out of memory: Killed process 111042 (java) total-vm:5601976kB, anon-rss:807116kB, file-rss:24420kB, shmem-rss:0kB, UID:35002 pgtables:2000kB oom_score_adj:-997
> > > [Fri Oct 20 21:17:03 2023] Memory cgroup out of memory: Killed process 111165 (java) total-vm:5599008kB, anon-rss:792704kB, file-rss:24724kB, shmem-rss:0kB, UID:35002 pgtables:1944kB oom_score_adj:-997
> > > [Fri Oct 20 21:17:09 2023] Memory cgroup out of memory: Killed process 111317 (java) total-vm:5612224kB, anon-rss:767304kB, file-rss:24400kB, shmem-rss:0kB, UID:35002 pgtables:1984kB oom_score_adj:-997
> > > [Fri Oct 20 21:17:16 2023] Memory cgroup out of memory: Killed process 111427 (java) total-vm:5613572kB, anon-rss:739720kB, file-rss:24196kB, shmem-rss:0kB, UID:35002 pgtables:1892kB oom_score_adj:-997
> > > [Fri Oct 20 21:17:28 2023] Memory cgroup out of memory: Killed process 111525 (java) total-vm:5603008kB, anon-rss:737940kB, file-rss:24796kB, shmem-rss:0kB, UID:35002 pgtables:1860kB oom_score_adj:-997
> > > [Fri Oct 20 21:17:36 2023] Memory cgroup out of memory: Killed process 111620 (java) total-vm:5602048kB, anon-rss:728384kB, file-rss:24480kB, shmem-rss:0kB, UID:35002 pgtables:1828kB oom_score_adj:-997
> > > [Fri Oct 20 21:17:43 2023] Memory cgroup out of memory: Killed process 111711 (java) total-vm:5601984kB, anon-rss:710832kB, file-rss:24648kB, shmem-rss:0kB, UID:35002 pgtables:1804kB oom_score_adj:-997
> > > [Fri Oct 20 21:17:55 2023] Memory cgroup out of memory: Killed process 111776 (java) total-vm:5594816kB, anon-rss:709584kB, file-rss:24444kB, shmem-rss:0kB, UID:35002 pgtables:1824kB oom_score_adj:-997
> > >
> > > I guess my question is whether I am missing something that explains
> > > this, and whether I could configure tika-server to preempt this issue.
> > >
> > > Going forward, however, I realize that I need to set up the following
> > > three things, and I have a question for each:
> > >
> > > - concurrency control to avoid overwhelming tika-server (it seems I
> > >   could only control concurrency on the sender side, since tika-server
> > >   does not provide a way to limit the number of concurrent requests).
> > >   Is that correct?
> >
> > AFAIU previously this is not possible except if we move to Tika pipes.
> > I just wanted to check if that is accurate.
> >
> > > - request isolation, to avoid a single file bringing down an entire
> > >   instance -> is the only recommended solution to use tika pipes?
> >
> > Is there a plan to implement isolation between requests in the Tika
> > standalone server?
> >
> > > - timeouts and memory limits per request, to avoid a single request
> > >   going haywire and using too much CPU and/or memory -> is there a way
> > >   to configure this already that maybe I missed?
> >
> > I remember seeing somewhere on the list a timeout per request, but
> > cannot find it now.
> >
> > Thanks!
> > Cristi
> >
> > > Thanks! I realize these are a lot of questions 🙂
> > >
> > > Cristi
