Hi,

Just a gentle ping on this ticket. I am adding a few clarifications inline below.


On 21 Oct 2023 at 00:10:34, Cristian Zamfir <[email protected]> wrote:

> Hello!
>
> I have been using the tika docker image pretty much out of the box so far
> and I am puzzled by an OOM issue that has been going on for a while now:
> despite quite conservative memory limits given to the JVM in terms of both
> heap and total max memory, containers still crash with OOM.
> These are the settings I am using inside containers capped at 6GB of
> memory, running tika-server with the watchdog config:
>
>
>       <forkedJvmArgs>
>         <arg>-Xmx3g</arg>
>         <arg>-Dlog4j.configurationFile=log4j2.xml</arg>
>         <arg>-XX:+UseContainerSupport</arg>
>         <arg>-XX:+UnlockExperimentalVMOptions</arg>
>         <arg>-XX:MaxRAMPercentage=30</arg>
>       </forkedJvmArgs>
>
>
>
My preliminary conclusion is that the JVM is not able to enforce these
flags quickly enough 100% of the time: the cgroup limit is hit and the
kernel OOM killer steps in before the JVM can react. Has anyone else
experienced this?
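
For reference, this is roughly how I have been comparing what the cgroup
allows with what the forked JVM actually resolves for its heap (just a
sketch, run inside the container; the first path is cgroup v2, the second
cgroup v1, and the java flags mirror the forkedJvmArgs above):

    # effective container memory limit
    cat /sys/fs/cgroup/memory.max 2>/dev/null \
      || cat /sys/fs/cgroup/memory/memory.limit_in_bytes

    # heap/RAM limits the JVM derives from the same flags
    java -Xmx3g -XX:+UseContainerSupport -XX:+UnlockExperimentalVMOptions \
         -XX:MaxRAMPercentage=30 -XX:+PrintFlagsFinal -version \
      | grep -Ei 'maxheapsize|maxram'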


> With these settings, the JVM quite often deals well with terminating
> processes that hit the memory cap and the watchdog restarts them:
>
> [pool-2-thread-1] 21:38:26,395 org.apache.tika.server.core.TikaServerWatchDog 
> forked process exited with exit value 137
>
>
> However, from time to time, the JVM seems unable to deal with it: the OS
> kicks in and the container is killed with OOM. My only explanation so far
> is that the JVM is too slow to kill the forked process and memory usage
> blows up quite quickly. You can see below how the total-vm values are
> close to 6GB at OOM time. This does not make sense IMO; the JVM should
> kill these processes well before reaching, e.g., the 5613608kB value.
> In fact, the forked process should not exceed 1.8GB if we take
> MaxRAMPercentage into account (30% of the 6GB cap).
>
> Another puzzling fact is that the anon + file RSS do not really add up to
> the total-vm size, so I am guessing that this is not actually due to heap.
> Could this be caused by some native code?
>
>
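
On the native-code question above: one thing I plan to try is the JVM's
native memory tracking, roughly like this (a sketch; the pid placeholder
is the forked java process, not the watchdog):

    # add to forkedJvmArgs:
    <arg>-XX:NativeMemoryTracking=summary</arg>

    # then, inside the container:
    jcmd <forked-java-pid> VM.native_memory summary
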
> dmesg -T | grep "Killed process"
>
> [Fri Oct 20 21:14:13 2023] Memory cgroup out of memory: Killed process 109549 
> (java) total-vm:5632740kB, anon-rss:1036696kB, file-rss:24668kB, 
> shmem-rss:0kB, UID:35002 pgtables:2532kB oom_score_adj:-997
> [Fri Oct 20 21:14:27 2023] Memory cgroup out of memory: Killed process 109713 
> (java) total-vm:5613608kB, anon-rss:1029280kB, file-rss:24380kB, 
> shmem-rss:0kB, UID:35002 pgtables:2456kB oom_score_adj:-997
> [Fri Oct 20 21:14:34 2023] Memory cgroup out of memory: Killed process 109839 
> (java) total-vm:5607392kB, anon-rss:976664kB, file-rss:24116kB, 
> shmem-rss:0kB, UID:35002 pgtables:2336kB oom_score_adj:-997
> [Fri Oct 20 21:14:52 2023] Memory cgroup out of memory: Killed process 109970 
> (java) total-vm:5598332kB, anon-rss:954312kB, file-rss:24592kB, 
> shmem-rss:0kB, UID:35002 pgtables:2272kB oom_score_adj:-997
> [Fri Oct 20 21:15:19 2023] Memory cgroup out of memory: Killed process 110089 
> (java) total-vm:5615776kB, anon-rss:946484kB, file-rss:24672kB, 
> shmem-rss:0kB, UID:35002 pgtables:2280kB oom_score_adj:-997
> [Fri Oct 20 21:15:29 2023] Memory cgroup out of memory: Killed process 110269 
> (java) total-vm:5602004kB, anon-rss:948548kB, file-rss:24412kB, 
> shmem-rss:0kB, UID:35002 pgtables:2280kB oom_score_adj:-997
> [Fri Oct 20 21:15:42 2023] Memory cgroup out of memory: Killed process 110367 
> (java) total-vm:5607104kB, anon-rss:942636kB, file-rss:24524kB, 
> shmem-rss:0kB, UID:35002 pgtables:2284kB oom_score_adj:-997
> [Fri Oct 20 21:16:07 2023] Memory cgroup out of memory: Killed process 110464 
> (java) total-vm:5593792kB, anon-rss:940524kB, file-rss:24712kB, 
> shmem-rss:0kB, UID:35002 pgtables:2216kB oom_score_adj:-997
> [Fri Oct 20 21:16:17 2023] Memory cgroup out of memory: Killed process 110684 
> (java) total-vm:5627620kB, anon-rss:910000kB, file-rss:24340kB, 
> shmem-rss:0kB, UID:35002 pgtables:2224kB oom_score_adj:-997
> [Fri Oct 20 21:16:25 2023] Memory cgroup out of memory: Killed process 110798 
> (java) total-vm:5616588kB, anon-rss:889436kB, file-rss:24500kB, 
> shmem-rss:0kB, UID:35002 pgtables:2216kB oom_score_adj:-997
> [Fri Oct 20 21:16:31 2023] Memory cgroup out of memory: Killed process 110939 
> (java) total-vm:5619708kB, anon-rss:839724kB, file-rss:23796kB, 
> shmem-rss:0kB, UID:35002 pgtables:2100kB oom_score_adj:-997
> [Fri Oct 20 21:16:43 2023] Memory cgroup out of memory: Killed process 111042 
> (java) total-vm:5601976kB, anon-rss:807116kB, file-rss:24420kB, 
> shmem-rss:0kB, UID:35002 pgtables:2000kB oom_score_adj:-997
> [Fri Oct 20 21:17:03 2023] Memory cgroup out of memory: Killed process 111165 
> (java) total-vm:5599008kB, anon-rss:792704kB, file-rss:24724kB, 
> shmem-rss:0kB, UID:35002 pgtables:1944kB oom_score_adj:-997
> [Fri Oct 20 21:17:09 2023] Memory cgroup out of memory: Killed process 111317 
> (java) total-vm:5612224kB, anon-rss:767304kB, file-rss:24400kB, 
> shmem-rss:0kB, UID:35002 pgtables:1984kB oom_score_adj:-997
> [Fri Oct 20 21:17:16 2023] Memory cgroup out of memory: Killed process 111427 
> (java) total-vm:5613572kB, anon-rss:739720kB, file-rss:24196kB, 
> shmem-rss:0kB, UID:35002 pgtables:1892kB oom_score_adj:-997
> [Fri Oct 20 21:17:28 2023] Memory cgroup out of memory: Killed process 111525 
> (java) total-vm:5603008kB, anon-rss:737940kB, file-rss:24796kB, 
> shmem-rss:0kB, UID:35002 pgtables:1860kB oom_score_adj:-997
> [Fri Oct 20 21:17:36 2023] Memory cgroup out of memory: Killed process 111620 
> (java) total-vm:5602048kB, anon-rss:728384kB, file-rss:24480kB, 
> shmem-rss:0kB, UID:35002 pgtables:1828kB oom_score_adj:-997
> [Fri Oct 20 21:17:43 2023] Memory cgroup out of memory: Killed process 111711 
> (java) total-vm:5601984kB, anon-rss:710832kB, file-rss:24648kB, 
> shmem-rss:0kB, UID:35002 pgtables:1804kB oom_score_adj:-997
> [Fri Oct 20 21:17:55 2023] Memory cgroup out of memory: Killed process 111776 
> (java) total-vm:5594816kB, anon-rss:709584kB, file-rss:24444kB, 
> shmem-rss:0kB, UID:35002 pgtables:1824kB oom_score_adj:-997
>
>
>
> I guess my question is whether I am missing something that explains this,
> and whether I could configure tika-server to prevent this issue.
>
>
> Going forward, however, I realize that I need to set up the following
> three things, and I have a question about each:
>
>    1. concurrency control to avoid overwhelming tika-server (it seems I
>    can only control concurrency on the sender side, since tika-server does
>    not provide a way to limit the number of concurrent requests). Is that
>    correct?
>
>
As far as I understand from previous discussions, this is not possible
unless we move to Tika Pipes. I just wanted to check whether that is
accurate.
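
In the meantime, this is roughly how I am capping concurrency on the
sender side (a sketch only; it assumes the stock PUT /tika endpoint on the
default port 9998 and a hypothetical ./docs input directory, and uses
xargs -P to bound the number of parallel curl calls):

    # at most 4 concurrent requests from the client side
    find ./docs -type f -print0 \
      | xargs -0 -P 4 -I{} \
          curl -s -T {} -H 'Accept: text/plain' http://localhost:9998/tika -o {}.txt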


>    2. request isolation, so that a single file cannot bring down an entire
>    instance -> is the only recommended solution to use Tika Pipes?
>
>
Is there a plan to implement isolation between requests in Tika standalone
server?


>    3. timeouts and memory limits per request, so that a single request
>    cannot go haywire and use too much CPU and/or memory -> is there a way
>    to configure this already that I perhaps missed?
>
>
I remember seeing a per-request timeout mentioned somewhere on the list,
but I cannot find it now.
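
If I remember correctly it was a parameter in the <server> section of the
tika config; the sketch below is from memory, so the element names and
values are only my best guess and may well be off:

    <properties>
      <server>
        <params>
          <!-- abort a parse task that runs longer than this -->
          <taskTimeoutMillis>120000</taskTimeoutMillis>
          <!-- how often the watchdog checks on the forked process -->
          <taskPulseMillis>10000</taskPulseMillis>
          <forkedJvmArgs>
            <arg>-Xmx3g</arg>
          </forkedJvmArgs>
        </params>
      </server>
    </properties>

If someone can confirm whether that is the right knob for a per-request
timeout, that would already help a lot.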


Thanks!
Cristi


> Thanks! I realize these are a lot of questions 🙂
>
> Cristi
