Sorry for my delay.

> My preliminary conclusion is that the jvm is not able to enforce these flags
> 100% of the time quickly enough before the cgroup limits kick in and the
> kernel oom kicks in. Did anyone else experience this?
Y, that's my guess as well. There's a chance that some parsers are using
off-heap memory, and that may be causing the problems. If there's a way that
the WatchDog process can monitor actual memory usage (especially as we move to
Java 11 in 3.x), we should do that; a rough sketch of the kind of check I mean
is inline below. As currently coded, though, we do rely on java and -Xmx for
limiting memory usage. I don't think parsers are using native code within
tika-parsers-standard. Some parsers in tika-parsers-extended may use native
code (e.g. sqlite).

There is no concurrency control in tika-server for the non-pipes endpoints,
e.g. /tika and /rmeta. We rely on cxf. If we can set concurrency via cxf, we
should let users configure that. Otherwise, clients are responsible for not
DoS'ing the server (a client-side sketch is at the end of this message). If we
can't limit concurrency via cxf, I don't think there's a straightforward way
to handle this. Y, pipes does limit concurrency.

> I remember seeing somewhere on the list a timeout per request, but cannot
> find it now.

You can set a timeout less than or equal to the setting in tika-config.xml
(<taskTimeoutMillis/>) via this header: X-Tika-Timeout-Millis. It is also
shown in the sketch at the end of this message.

>> With these settings, the JVM quite often deals well with terminating
>> processes that hit the memory cap and the watchdog restarts them:
>>
>> [pool-2-thread-1] 21:38:26,395
>> org.apache.tika.server.core.TikaServerWatchDog forked process exited with
>> exit value 137
>>
>> However, from time to time, the JVM seems not to be able to deal with it;
>> the OS kicks in and the container is killed with OOM. My only explanation
>> so far is that the JVM is too slow to kill the forked process and the
>> memory usage blows up quite quickly. You can see below how the total-vm
>> values are close to 6GB at OOM time. This does not make sense IMO; the JVM
>> should kill these processes way before reaching, e.g., the 5613608kB value.
>> Actually, the forked process should not exceed 1.8GB if we take
>> MaxRAMPercentage into account.
>>
>> Another puzzling fact is that the anon + file RSS do not really add up to
>> the total-vm size, so I am guessing that this is not actually due to heap.
>> Could this be caused by some native code?
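(On the native-code question, see my note above about off-heap memory and
tika-parsers-extended.) To make the WatchDog idea concrete: by "monitor actual
memory usage" I mean watching the forked process's resident set size rather
than trusting -Xmx, since off-heap and native allocations never show up in the
Java heap. A rough, Linux-only sketch of that kind of check (purely
illustrative: this is not how TikaServerWatchDog works today, and the class
name, pid argument, limit and restart hook are all placeholders):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ForkedRssCheck {

    // Resident set size of a process in kB, parsed from /proc/<pid>/status (Linux only).
    static long rssKb(long pid) throws IOException {
        for (String line : Files.readAllLines(Path.of("/proc/" + pid + "/status"))) {
            if (line.startsWith("VmRSS:")) {
                return Long.parseLong(line.replaceAll("\\D", ""));
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        long pid = Long.parseLong(args[0]);   // pid of the forked parser process
        long limitKb = 1_800_000L;            // hypothetical cap, roughly the expected 1.8GB
        if (rssKb(pid) > limitKb) {
            // a real watchdog would destroy and restart the forked process here
            System.out.println("forked process " + pid + " is over " + limitKb + "kB RSS");
        }
    }
}

A watchdog loop could poll something like this and kill/restart the fork as
soon as the limit trips, instead of waiting for the cgroup OOM killer.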

>> dmesg -T | grep "Killed process"
>>
>> [Fri Oct 20 21:14:13 2023] Memory cgroup out of memory: Killed process 109549 (java) total-vm:5632740kB, anon-rss:1036696kB, file-rss:24668kB, shmem-rss:0kB, UID:35002 pgtables:2532kB oom_score_adj:-997
>> [Fri Oct 20 21:14:27 2023] Memory cgroup out of memory: Killed process 109713 (java) total-vm:5613608kB, anon-rss:1029280kB, file-rss:24380kB, shmem-rss:0kB, UID:35002 pgtables:2456kB oom_score_adj:-997
>> [Fri Oct 20 21:14:34 2023] Memory cgroup out of memory: Killed process 109839 (java) total-vm:5607392kB, anon-rss:976664kB, file-rss:24116kB, shmem-rss:0kB, UID:35002 pgtables:2336kB oom_score_adj:-997
>> [Fri Oct 20 21:14:52 2023] Memory cgroup out of memory: Killed process 109970 (java) total-vm:5598332kB, anon-rss:954312kB, file-rss:24592kB, shmem-rss:0kB, UID:35002 pgtables:2272kB oom_score_adj:-997
>> [Fri Oct 20 21:15:19 2023] Memory cgroup out of memory: Killed process 110089 (java) total-vm:5615776kB, anon-rss:946484kB, file-rss:24672kB, shmem-rss:0kB, UID:35002 pgtables:2280kB oom_score_adj:-997
>> [Fri Oct 20 21:15:29 2023] Memory cgroup out of memory: Killed process 110269 (java) total-vm:5602004kB, anon-rss:948548kB, file-rss:24412kB, shmem-rss:0kB, UID:35002 pgtables:2280kB oom_score_adj:-997
>> [Fri Oct 20 21:15:42 2023] Memory cgroup out of memory: Killed process 110367 (java) total-vm:5607104kB, anon-rss:942636kB, file-rss:24524kB, shmem-rss:0kB, UID:35002 pgtables:2284kB oom_score_adj:-997
>> [Fri Oct 20 21:16:07 2023] Memory cgroup out of memory: Killed process 110464 (java) total-vm:5593792kB, anon-rss:940524kB, file-rss:24712kB, shmem-rss:0kB, UID:35002 pgtables:2216kB oom_score_adj:-997
>> [Fri Oct 20 21:16:17 2023] Memory cgroup out of memory: Killed process 110684 (java) total-vm:5627620kB, anon-rss:910000kB, file-rss:24340kB, shmem-rss:0kB, UID:35002 pgtables:2224kB oom_score_adj:-997
>> [Fri Oct 20 21:16:25 2023] Memory cgroup out of memory: Killed process 110798 (java) total-vm:5616588kB, anon-rss:889436kB, file-rss:24500kB, shmem-rss:0kB, UID:35002 pgtables:2216kB oom_score_adj:-997
>> [Fri Oct 20 21:16:31 2023] Memory cgroup out of memory: Killed process 110939 (java) total-vm:5619708kB, anon-rss:839724kB, file-rss:23796kB, shmem-rss:0kB, UID:35002 pgtables:2100kB oom_score_adj:-997
>> [Fri Oct 20 21:16:43 2023] Memory cgroup out of memory: Killed process 111042 (java) total-vm:5601976kB, anon-rss:807116kB, file-rss:24420kB, shmem-rss:0kB, UID:35002 pgtables:2000kB oom_score_adj:-997
>> [Fri Oct 20 21:17:03 2023] Memory cgroup out of memory: Killed process 111165 (java) total-vm:5599008kB, anon-rss:792704kB, file-rss:24724kB, shmem-rss:0kB, UID:35002 pgtables:1944kB oom_score_adj:-997
>> [Fri Oct 20 21:17:09 2023] Memory cgroup out of memory: Killed process 111317 (java) total-vm:5612224kB, anon-rss:767304kB, file-rss:24400kB, shmem-rss:0kB, UID:35002 pgtables:1984kB oom_score_adj:-997
>> [Fri Oct 20 21:17:16 2023] Memory cgroup out of memory: Killed process 111427 (java) total-vm:5613572kB, anon-rss:739720kB, file-rss:24196kB, shmem-rss:0kB, UID:35002 pgtables:1892kB oom_score_adj:-997
>> [Fri Oct 20 21:17:28 2023] Memory cgroup out of memory: Killed process 111525 (java) total-vm:5603008kB, anon-rss:737940kB, file-rss:24796kB, shmem-rss:0kB, UID:35002 pgtables:1860kB oom_score_adj:-997
>> [Fri Oct 20 21:17:36 2023] Memory cgroup out of memory: Killed process 111620 (java) total-vm:5602048kB, anon-rss:728384kB, file-rss:24480kB, shmem-rss:0kB, UID:35002 pgtables:1828kB oom_score_adj:-997
>> [Fri Oct 20 21:17:43 2023] Memory cgroup out of memory: Killed process 111711 (java) total-vm:5601984kB, anon-rss:710832kB, file-rss:24648kB, shmem-rss:0kB, UID:35002 pgtables:1804kB oom_score_adj:-997
>> [Fri Oct 20 21:17:55 2023] Memory cgroup out of memory: Killed process 111776 (java) total-vm:5594816kB, anon-rss:709584kB, file-rss:24444kB, shmem-rss:0kB, UID:35002 pgtables:1824kB oom_score_adj:-997
>>
>> I guess my question is if I am missing something that explains this and I
>> could configure tika-server to preempt this issue.
>>
>> Going forward, however, I realize that I need to set up the following 3,
>> and I have a question for each:
>>
>> concurrency control to avoid overwhelming tika-server (seems like I could
>> only control concurrency on the sender side, since tika-server does not
>> provide a way to limit the number of concurrent requests). Is that correct?
>
> AFAIU previously this is not possible except if we move to Tika pipes. I
> just wanted to check if that is accurate.
>
>> request isolation to avoid that a single file brings down an entire
>> instance -> is the only recommended solution to use tika pipes?
>
> Is there a plan to implement isolation between requests in Tika standalone
> server?
>
>> implement timeouts and memory limits per request, to avoid that a single
>> request can go haywire and use too much CPU and/or memory -> is there a way
>> to configure this already and maybe I missed it?
>
> I remember seeing somewhere on the list a timeout per request, but cannot
> find it now.
>
> Thanks!
> Cristi
>
>> Thanks! I realize these are a lot of questions 🙂
>>
>> Cristi
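Here is the client-side sketch I referred to above. It shows the two things a
client can do today against the plain /rmeta (or /tika) endpoint: cap its own
number of in-flight requests and set the per-request timeout header. The URL,
port (9998 is the usual default), permit count and timeout value are all
assumptions to adjust for your deployment; keep the header value at or below
<taskTimeoutMillis/> in tika-config.xml.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;
import java.util.concurrent.Semaphore;

public class ThrottledTikaClient {

    // Assumed server location; the /rmeta endpoint mentioned above.
    private static final String RMETA = "http://localhost:9998/rmeta";

    private final HttpClient http = HttpClient.newHttpClient();

    // Client-side cap on concurrent requests, since /tika and /rmeta do not limit concurrency.
    private final Semaphore permits = new Semaphore(4);

    public String parse(Path file) throws Exception {
        permits.acquire();
        try {
            HttpRequest req = HttpRequest.newBuilder(URI.create(RMETA))
                    // per-request timeout; keep this <= taskTimeoutMillis in tika-config.xml
                    .header("X-Tika-Timeout-Millis", "60000")
                    .PUT(HttpRequest.BodyPublishers.ofFile(file))
                    .build();
            return http.send(req, HttpResponse.BodyHandlers.ofString()).body();
        } finally {
            permits.release();
        }
    }
}

Note that this only keeps a single client from overwhelming the server; with
several clients you would need to coordinate the total yourself, or move to
tika-pipes, which does limit concurrency.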
