More often than not, excessive JVM heap use is caused by indexes with a lot of documents and very large filter caches, which get blown up either all at once by autowarming or over time due to a near-zero hit rate. With that in mind:
1) How many documents are in your index?
2) What are your filter cache settings, and what are your autowarm settings?
3) Can you provide a typical query? Preferably copied from solr.log, so that we can see all the parameters.

Worst case for a filter cache entry is 1 bit per document in your index, so if you have e.g. 64M documents in your index, an entry will be 8 megabytes. If your filterCache has a max size of 40,000 and you autowarm just as many, that is up to 40,000 x 8 MB = 320 GB, which would fill your (very large) heap when Solr is restarted.

- Toke Eskildsen
________________________________________
From: Vignan Malyala <[email protected]>
Sent: Monday, May 10, 2021 07:54
To: [email protected]; solr_user lucene_apache
Subject: Solr JVM Heap becomes full and stops when we try to restart

Hi everyone,

We have 3 cluster solr running in 3 different machines with an index size of 300 GB.

RAM: 300 GB per node
Heap - Xms: 240GB Xmx: 300GB
Index size: 300GB

GC_TUNE="-XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=45 -XX:ConcGCThreads=6 -XX:ParallelGCThreads=30 -XX:G1ReservePercent=20

<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:400000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
</autoSoftCommit>

Our cloud servers suddenly stopped yesterday. When we try to restart, our JVM heap size goes to max of 300 GB just in few seconds and we get the following message before stopping automatically.
Heap before GC invocations=0 (full 0):
 garbage-first heap total 251658240K, used 360448K [0x00007eba80000000, 0x00007eba8200f000, 0x00007f0580000000)
  region size 32768K, 12 young (393216K), 0 survivors (0K)
 Metaspace used 20504K, capacity 21158K, committed 21248K, reserved 22528K
2021-05-10T05:31:59.511+0000: 3.036: [GC pause (Metadata GC Threshold) (young) (initial-mark)
Desired survivor size 805306368 bytes, new threshold 15 (max 15)
{Heap before GC invocations=11 (full 0):
 garbage-first heap total 288849920K, used 20398080K [0x00007eba80000000, 0x00007eba82011378, 0x00007f0580000000)
  region size 32768K, 440 young (14417920K), 54 survivors (1769472K)
 Metaspace used 58413K, capacity 61495K, committed 61696K, reserved 63488K
2021-05-10T05:33:15.477+0000: 79.002: [GC pause (G1 Evacuation Pause) (young)
Desired survivor size 922746880 bytes, new threshold 1 (max 15)
- age 1: 1043976736 bytes, 1043976736 total
- age 2: 766998080 bytes, 1810974816 total
, 0.4319767 secs]
   [Parallel Time: 408.3 ms, GC Workers: 30]
      [GC Worker Start (ms): Min: 79002.5, Avg: 79003.0, Max: 79003.6, Diff: 1.2]
      [Ext Root Scanning (ms): Min: 0.1, Avg: 0.8, Max: 2.7, Diff: 2.6, Sum: 23.7]
      [Update RS (ms): Min: 0.0, Avg: 1.7, Max: 3.1, Diff: 3.1, Sum: 51.7]
         [Processed Buffers: Min: 0, Avg: 3.8, Max: 17, Diff: 17, Sum: 113]
      [Scan RS (ms): Min: 13.9, Avg: 15.8, Max: 16.7, Diff: 2.8, Sum: 474.0]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.1, Max: 2.1, Diff: 2.1, Sum: 4.3]
      [Object Copy (ms): Min: 385.5, Avg: 387.5, Max: 390.6, Diff: 5.1, Sum: 11624.2]
      [Termination (ms): Min: 0.1, Avg: 0.5, Max: 0.9, Diff: 0.9, Sum: 13.8]
         [Termination Attempts: Min: 1, Avg: 82.1, Max: 172, Diff: 171, Sum: 2464]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.4, Diff: 0.4, Sum: 3.6]
      [GC Worker Total (ms): Min: 405.9, Avg: 406.5, Max: 407.3, Diff: 1.4, Sum: 12195.3]
      [GC Worker End (ms): Min: 79409.4, Avg: 79409.5, Max: 79409.8, Diff: 0.4]
   [Code Root Fixup: 0.1 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 6.7 ms]
   [Other: 16.9 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 5.2 ms]
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 9.2 ms]
      [Humongous Register: 0.3 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 0.7 ms]

Please help to solve this issue!
Thanks in advance!

Regards!
Vigz
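
Addendum to the filter cache estimate in the reply at the top of this thread: the worst-case arithmetic (1 bit per document per cache entry) can be checked with a small script. This is only a rough sketch. The function names are made up for illustration, and the 64M documents and 40,000 entries are the hypothetical figures from the reply, not the actual settings of the cluster in question. Real entries can be smaller, since sparse filters may be stored more compactly.

```python
def filter_cache_entry_bytes(num_docs: int) -> int:
    """Worst-case size of one filter cache entry: 1 bit per document,
    rounded up to whole bytes. (Hypothetical helper, not a Solr API.)"""
    return (num_docs + 7) // 8


def worst_case_filter_cache_bytes(num_docs: int, max_entries: int) -> int:
    """Worst-case heap used by a completely full filter cache."""
    return filter_cache_entry_bytes(num_docs) * max_entries


if __name__ == "__main__":
    # Example figures from the reply, not measured values.
    docs = 64_000_000
    entries = 40_000

    per_entry = filter_cache_entry_bytes(docs)
    total = worst_case_filter_cache_bytes(docs, entries)

    print(f"per entry: {per_entry / 1e6:.0f} MB")
    print(f"full cache: {total / 1e9:.0f} GB")
```

Plugging in your own document count and the filterCache size and autowarmCount from solrconfig.xml gives an upper bound to compare against Xmx.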
