On 2/12/2019 7:35 AM, Joe Obernberger wrote:
> Yesterday, we upgraded our 40 node cluster from Solr 7.6.0 to Solr
> 7.7.0. This morning, all the nodes are using 1200+% of CPU. It looks
> like it's in garbage collection. We did reduce our HDFS cache size from
> 11G to 6G, but other than that, no other parameters were changed.
Your message included a small excerpt from the GC log. That is not
helpful. We will need the entire GC log, possibly more than one log.
The log or logs should fully cover the timeframe where the problem
occurs. Full disclosure: Once obtained, I would use this website to
analyze GC log data:
http://gceasy.io
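
For gathering the complete set of logs, here is a rough sketch rather than a definitive procedure. It assumes the GC logs sit in your Solr logs directory and are named solr_gc.log* (including rotated files); both are assumptions about your install, and collect_gc_logs is just a hypothetical helper name.

```shell
#!/bin/sh
# Bundle every GC log file, including rotated ones, into one archive.
# ASSUMPTION: GC logs are named solr_gc.log* inside the given directory.
# $1 = logs directory, $2 = ABSOLUTE path of the archive to create.
collect_gc_logs() {
  logs_dir=$1
  out=$2
  (
    cd "$logs_dir" || exit 1
    # List the files oldest first so you can check that their timestamps
    # cover the whole window where CPU usage was high.
    ls -ltr solr_gc.log* || exit 1
    tar -czf "$out" solr_gc.log*
  )
}

# Example call (both paths are placeholders for your environment):
# collect_gc_logs /var/solr/logs "/tmp/solr-gc-logs-$(hostname).tar.gz"
```

Running this on one or two of the affected nodes should be enough to start with.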
> Parameters are:
> GC_TUNE="-XX:+UseG1GC \
> -XX:MaxDirectMemorySize=6g \
> -XX:+PerfDisableSharedMem \
> -XX:+ParallelRefProcEnabled \
> -XX:G1HeapRegionSize=16m \
> -XX:MaxGCPauseMillis=300 \
> -XX:InitiatingHeapOccupancyPercent=75 \
> -XX:+UseLargePages \
> -XX:ParallelGCThreads=16 \
> -XX:-ResizePLAB \
> -XX:+AggressiveOpts"
Looks like you've chosen to use G1 settings very similar to what I put
on my wiki page:
https://wiki.apache.org/solr/ShawnHeisey#Current_experiments
Those settings are not intended to be a canonical resource that everyone
can use. Your heap size is different than what I was using when I
worked on that, so you may need different settings.
Have you considered dropping your own GC tuning and letting Solr's start
script handle that instead?
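
If you want to try that, something along these lines should work. This is a sketch that assumes your GC_TUNE is defined in solr.in.sh. One thing to watch: -XX:MaxDirectMemorySize is not a GC option, and the HDFS block cache relies on direct memory, so I would keep that setting by moving it to SOLR_OPTS.

```shell
# solr.in.sh (its location depends on how Solr was installed)

# Comment out or delete the whole custom GC_TUNE="..." block. With GC_TUNE
# unset, bin/solr falls back to its own built-in GC settings.
unset GC_TUNE

# Keep the direct memory limit, since it is unrelated to GC tuning.
# 6g is the value from your current settings.
SOLR_OPTS="$SOLR_OPTS -XX:MaxDirectMemorySize=6g"
```

Restart the nodes after the change and collect fresh GC logs so the two configurations can be compared.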
With the limited information available, my initial guess is that you
need a larger heap, and that Java is spending all its time freeing up
just enough memory to keep the program running.
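
The heap is normally set with SOLR_HEAP in solr.in.sh. Your message did not say what the heap size is now, so the number below is only a placeholder, not a recommendation:

```shell
# solr.in.sh
# PLACEHOLDER VALUE: replace 16g with something larger than your current
# heap. SOLR_HEAP sets both the minimum (-Xms) and maximum (-Xmx) heap size.
SOLR_HEAP="16g"
```

The complete GC logs would show whether a larger heap is really what's needed, and roughly how much.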
Thanks,
Shawn