When I ran load benchmarks with 6.3.0, an overloaded cluster would get super 
slow but keep functioning. With 6.5.1, we hit 100% CPU, then start getting 
OOMs. That is really bad, because it means we need to reboot every node in the 
cluster.

Also, the JVM OOM hook isn’t running the process killer (JVM 1.8.0_121-b13). 
We are using the G1 collector with the Shawn Heisey settings in an 8G heap:

GC_TUNE=" \
-XX:+UseG1GC \
-XX:+ParallelRefProcEnabled \
-XX:G1HeapRegionSize=8m \
-XX:MaxGCPauseMillis=200 \
-XX:+UseLargePages \
-XX:+AggressiveOpts \
"
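
One possible stopgap for the OOM hook not firing, offered as an untested sketch rather than a confirmed fix: Java 8u92 and later accept -XX:+ExitOnOutOfMemoryError, which makes the JVM exit on the first OutOfMemoryError without running an external kill script. The settings below are the same GC_TUNE with that one flag added. Whether it helps here is an assumption, since a node pinned at 100% CPU in GC may take a long time to actually throw an OOM.

```shell
# Sketch only: same GC settings as above, plus one extra flag.
# -XX:+ExitOnOutOfMemoryError (Java 8u92+) tells the JVM to exit on the
# first OutOfMemoryError. It does not depend on the OOM hook script.
# Adding it here is an assumption, not something tested on this cluster.
GC_TUNE=" \
-XX:+UseG1GC \
-XX:+ParallelRefProcEnabled \
-XX:G1HeapRegionSize=8m \
-XX:MaxGCPauseMillis=200 \
-XX:+UseLargePages \
-XX:+AggressiveOpts \
-XX:+ExitOnOutOfMemoryError \
"
```

With this in place the process should exit instead of sitting in the bad state, so whatever supervises it can restart it without waiting for a page.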

This is not good behavior in prod. The process goes to the bad place, then we 
need to wait until someone is paged and kills it manually. Luckily, it usually 
drops out of the live nodes for each collection and doesn’t take user traffic.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

