On Wed, Oct 25, 2023 at 2:39 AM Tim Allison <[email protected]> wrote:
> Sorry for my delay.

Thanks a lot Tim!

> > My preliminary conclusion is that the JVM is not able to enforce these
> > flags 100% of the time quickly enough before the cgroup limits kick in
> > and the kernel OOM killer kicks in. Did anyone else experience this?
>
> Y, that's my guess as well. There's a chance that some parsers are
> using off-heap memory, and that may be causing the problems. If
> there's a way that the WatchDog process can monitor actual memory
> usage (esp. as we move to Java 11 in 3.x), we should do that. As
> currently coded, though, we do rely on java and -Xmx for limiting
> memory usage.

The flags -XX:MaxRAMPercentage or -XX:MaxRAM should do the trick, but I am
investigating whether they are enforced fast enough before the system OOM
killer kicks in. So far I would say that is not the case.

> I don't think parsers are using native code within
> tika-parsers-standard. Some parsers in tika-parsers-extended may use
> native code (e.g. sqlite).
>
> There is no concurrency control in tika-server for the non-pipes
> endpoints, e.g. /tika, /rmeta. We rely on cxf. If we can set
> concurrency via cxf, we should let users configure that. Otherwise,
> clients are responsible for not dos'ing the server. If we can't
> limit concurrency via cxf, I don't think there's a straightforward way
> to handle this.

Alright, will do rate limiting upstream.

> Y, pipes does limit concurrency.

Feel free to point me to any documentation about this, but as I understand
from the docs
<https://cwiki.apache.org/confluence/display/TIKA/tika-pipes#tikapipes-tika-app>
(under Fetchers in the classic tika-server endpoints), using Tika server
this way would allow one to configure concurrency and provide request
isolation? I am mainly interested in a convenient way to process requests
in isolation while still treating Tika as a black box, without having to
write Java code that would need to be maintained as Tika evolves.

> > I remember seeing somewhere on the list a timeout per request, but
> > cannot find it now.
>
> You can set a timeout less than or equal to the setting in
> tika-config.xml (<taskTimeoutMillis/>) via this header:
> X-Tika-Timeout-Millis

Understood, thanks.
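
For the per-request timeout, I assume the header is used roughly like this
(the endpoint, file name, and value below are just placeholders):

    # cap a single parse at 60s; must be <= taskTimeoutMillis in tika-config.xml
    curl -T sample.pdf -H "X-Tika-Timeout-Millis: 60000" http://localhost:9998/tika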
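
Going back to the memory flags: for concreteness, this is roughly how I am
passing them to the forked parse processes. It is only a sketch based on my
reading of the tika-server 2.x configuration docs, so please correct me if
<forkedJvmArgs/> is not the right place; the values are purely illustrative:

    <?xml version="1.0" encoding="UTF-8"?>
    <properties>
      <server>
        <params>
          <!-- JVM args handed to each forked parse process (example values only) -->
          <forkedJvmArgs>
            <arg>-XX:MaxRAMPercentage=50.0</arg>
            <arg>-XX:+ExitOnOutOfMemoryError</arg>
          </forkedJvmArgs>
          <!-- per-task timeout enforced by the watchdog -->
          <taskTimeoutMillis>120000</taskTimeoutMillis>
        </params>
      </server>
    </properties>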
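
One extra sanity check I can do on my side is to compare the JVM's view of
available memory with the limit the container is actually running under
(which path applies depends on whether the host uses cgroup v1 or v2):

    # cgroup v2
    cat /sys/fs/cgroup/memory.max
    # cgroup v1
    cat /sys/fs/cgroup/memory/memory.limit_in_bytes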

Cristi

> > > With these settings, the JVM quite often deals well with terminating
> > > processes that hit the memory cap, and the watchdog restarts them:
> > >
> > > [pool-2-thread-1] 21:38:26,395 org.apache.tika.server.core.TikaServerWatchDog forked process exited with exit value 137
> > >
> > > However, from time to time, the JVM seems unable to deal with it, the
> > > OS kicks in, and the container is killed with OOM. My only explanation
> > > so far is that the JVM is too slow to kill the forked process and the
> > > memory usage blows up quite quickly. You can see below how the
> > > total-vm values are close to 6GB at OOM time. This does not make sense
> > > IMO; the JVM should kill these processes way before reaching, e.g.,
> > > the 5613608kB value, and the forked process should not exceed 1.8GB if
> > > we take MaxRAMPercentage into account.
> > >
> > > Another puzzling fact is that the anon + file RSS do not really add up
> > > to the total-vm size, so I am guessing that this is not actually due
> > > to heap. Could this be caused by some native code?
> > >
> > > dmesg -T | grep "Killed process"
> > >
> > > [Fri Oct 20 21:14:13 2023] Memory cgroup out of memory: Killed process 109549 (java) total-vm:5632740kB, anon-rss:1036696kB, file-rss:24668kB, shmem-rss:0kB, UID:35002 pgtables:2532kB oom_score_adj:-997
> > > [Fri Oct 20 21:14:27 2023] Memory cgroup out of memory: Killed process 109713 (java) total-vm:5613608kB, anon-rss:1029280kB, file-rss:24380kB, shmem-rss:0kB, UID:35002 pgtables:2456kB oom_score_adj:-997
> > > [Fri Oct 20 21:14:34 2023] Memory cgroup out of memory: Killed process 109839 (java) total-vm:5607392kB, anon-rss:976664kB, file-rss:24116kB, shmem-rss:0kB, UID:35002 pgtables:2336kB oom_score_adj:-997
> > > [Fri Oct 20 21:14:52 2023] Memory cgroup out of memory: Killed process 109970 (java) total-vm:5598332kB, anon-rss:954312kB, file-rss:24592kB, shmem-rss:0kB, UID:35002 pgtables:2272kB oom_score_adj:-997
> > > [Fri Oct 20 21:15:19 2023] Memory cgroup out of memory: Killed process 110089 (java) total-vm:5615776kB, anon-rss:946484kB, file-rss:24672kB, shmem-rss:0kB, UID:35002 pgtables:2280kB oom_score_adj:-997
> > > [Fri Oct 20 21:15:29 2023] Memory cgroup out of memory: Killed process 110269 (java) total-vm:5602004kB, anon-rss:948548kB, file-rss:24412kB, shmem-rss:0kB, UID:35002 pgtables:2280kB oom_score_adj:-997
> > > [Fri Oct 20 21:15:42 2023] Memory cgroup out of memory: Killed process 110367 (java) total-vm:5607104kB, anon-rss:942636kB, file-rss:24524kB, shmem-rss:0kB, UID:35002 pgtables:2284kB oom_score_adj:-997
> > > [Fri Oct 20 21:16:07 2023] Memory cgroup out of memory: Killed process 110464 (java) total-vm:5593792kB, anon-rss:940524kB, file-rss:24712kB, shmem-rss:0kB, UID:35002 pgtables:2216kB oom_score_adj:-997
> > > [Fri Oct 20 21:16:17 2023] Memory cgroup out of memory: Killed process 110684 (java) total-vm:5627620kB, anon-rss:910000kB, file-rss:24340kB, shmem-rss:0kB, UID:35002 pgtables:2224kB oom_score_adj:-997
> > > [Fri Oct 20 21:16:25 2023] Memory cgroup out of memory: Killed process 110798 (java) total-vm:5616588kB, anon-rss:889436kB, file-rss:24500kB, shmem-rss:0kB, UID:35002 pgtables:2216kB oom_score_adj:-997
> > > [Fri Oct 20 21:16:31 2023] Memory cgroup out of memory: Killed process 110939 (java) total-vm:5619708kB, anon-rss:839724kB, file-rss:23796kB, shmem-rss:0kB, UID:35002 pgtables:2100kB oom_score_adj:-997
> > > [Fri Oct 20 21:16:43 2023] Memory cgroup out of memory: Killed process 111042 (java) total-vm:5601976kB, anon-rss:807116kB, file-rss:24420kB, shmem-rss:0kB, UID:35002 pgtables:2000kB oom_score_adj:-997
> > > [Fri Oct 20 21:17:03 2023] Memory cgroup out of memory: Killed process 111165 (java) total-vm:5599008kB, anon-rss:792704kB, file-rss:24724kB, shmem-rss:0kB, UID:35002 pgtables:1944kB oom_score_adj:-997
> > > [Fri Oct 20 21:17:09 2023] Memory cgroup out of memory: Killed process 111317 (java) total-vm:5612224kB, anon-rss:767304kB, file-rss:24400kB, shmem-rss:0kB, UID:35002 pgtables:1984kB oom_score_adj:-997
> > > [Fri Oct 20 21:17:16 2023] Memory cgroup out of memory: Killed process 111427 (java) total-vm:5613572kB, anon-rss:739720kB, file-rss:24196kB, shmem-rss:0kB, UID:35002 pgtables:1892kB oom_score_adj:-997
> > > [Fri Oct 20 21:17:28 2023] Memory cgroup out of memory: Killed process 111525 (java) total-vm:5603008kB, anon-rss:737940kB, file-rss:24796kB, shmem-rss:0kB, UID:35002 pgtables:1860kB oom_score_adj:-997
> > > [Fri Oct 20 21:17:36 2023] Memory cgroup out of memory: Killed process 111620 (java) total-vm:5602048kB, anon-rss:728384kB, file-rss:24480kB, shmem-rss:0kB, UID:35002 pgtables:1828kB oom_score_adj:-997
> > > [Fri Oct 20 21:17:43 2023] Memory cgroup out of memory: Killed process 111711 (java) total-vm:5601984kB, anon-rss:710832kB, file-rss:24648kB, shmem-rss:0kB, UID:35002 pgtables:1804kB oom_score_adj:-997
> > > [Fri Oct 20 21:17:55 2023] Memory cgroup out of memory: Killed process 111776 (java) total-vm:5594816kB, anon-rss:709584kB, file-rss:24444kB, shmem-rss:0kB, UID:35002 pgtables:1824kB oom_score_adj:-997
> > >
> > > I guess my question is whether I am missing something that explains
> > > this, and whether I could configure tika-server to preempt this issue.
> > >
> > > Going forward, however, I realize that I need to set up the following
> > > three things, and I have a question for each:
> > >
> > > - concurrency control to avoid overwhelming tika-server (it seems I
> > >   could only control concurrency on the sender side, since tika-server
> > >   does not provide a way to limit the number of concurrent requests).
> > >   Is that correct?
> >
> > AFAIU previously this is not possible except if we move to Tika pipes.
> > I just wanted to check if that is accurate.
> >
> > > - request isolation, to avoid a single file bringing down an entire
> > >   instance -> is the only recommended solution to use tika pipes?
> >
> > Is there a plan to implement isolation between requests in the Tika
> > standalone server?
> >
> > > - timeouts and memory limits per request, to avoid a single request
> > >   going haywire and using too much CPU and/or memory -> is there a way
> > >   to configure this already that maybe I missed?
> >
> > I remember seeing somewhere on the list a timeout per request, but
> > cannot find it now.
> >
> > Thanks!
> > Cristi
> >
> > > Thanks! I realize these are a lot of questions 🙂
> > >
> > > Cristi
