Hi Solr user group,

Sorry if this isn't directly a Solr question. Once in a blue moon, the GC crashes on a server in our Solr 3.6.1 slave farm. It happens on only a couple of the twelve slaves we have deployed, and only very rarely on those. It doesn't seem to affect Solr directly: in the logs, Solr appears to keep working after the time of the exception. However, our external monitoring tool reports the Solr service as down, so our operations department restarts Solr on that box and alerts me. The Solr logs show nothing unusual, but the exception does show up in the catalina.out log file. Does this happen to anyone else?

Here is the basic error; I have also attached the crash dump file. Our total uptime on these boxes is over a year now, BTW.

#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00002b5379346612, pid=13724, tid=1082353984
#
# JRE version: 6.0_25-b06
# Java VM: Java HotSpot(TM) 64-Bit Server VM (20.0-b11 mixed mode linux-amd64 )
# Problematic frame:
# V [libjvm.so+0x3c4612] Par_ConcMarkingClosure::trim_queue(unsigned long)+0x82
#
# An error report file with more information is saved as:
# /var/LucidWorks/lucidworks/hs_err_pid13724.log
#
# If you would like to submit a bug report, please visit:
# http://java.sun.com/webapps/bugreport/crash.jsp
#

VM Arguments:
jvm_args: -Djava.util.logging.config.file=/var/LucidWorks/lucidworks/tomcat/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Xmx32768m -Xms32768m -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=6060 -Djava.endorsed.dirs=/var/LucidWorks/lucidworks/tomcat/endorsed -Dcatalina.base=/var/LucidWorks/lucidworks/tomcat -Dcatalina.home=/var/LucidWorks/lucidworks/tomcat -Djava.io.tmpdir=/var/LucidWorks/lucidworks/tomcat/temp
java_command: org.apache.catalina.startup.Bootstrap -server -Dsolr.solr.home=lucidworks/solr start
Launcher Type: SUN_STANDARD

Stack: [0x0000000000000000,0x0000000000000000], sp=0x0000000040835eb0, free space=1056983k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x3c4612] Par_ConcMarkingClosure::trim_queue(unsigned long)+0x82
V [libjvm.so+0x3c481a] CMSConcMarkingTask::do_work_steal(int)+0xfa
V [libjvm.so+0x3c3dcf] CMSConcMarkingTask::work(int)+0xef
V [libjvm.so+0x8783dc] YieldingFlexibleGangWorker::loop()+0xbc
V [libjvm.so+0x8755b4] GangWorker::run()+0x24
V [libjvm.so+0x71096f] java_start(Thread*)+0x13f

Heap
 par new generation total 345024K, used 180672K [0x00002aaaae120000, 0x00002aaac5780000, 0x00002aaac5780000)
  eden space 306688K, 53% used [0x00002aaaae120000, 0x00002aaab8243c28, 0x00002aaac0ca0000)
  from space 38336K, 40% used [0x00002aaac3210000, 0x00002aaac415c3f8, 0x00002aaac5780000)
  to space 38336K, 0% used [0x00002aaac0ca0000, 0x00002aaac0ca0000, 0x00002aaac3210000)
 concurrent mark-sweep generation total 33171072K, used 12144213K [0x00002aaac5780000, 0x00002ab2ae120000, 0x00002ab2ae120000)
 concurrent-mark-sweep perm gen total 83968K, used 50650K [0x00002ab2ae120000, 0x00002ab2b3320000, 0x00002ab2b3320000)

Code Cache [0x00002aaaab054000, 0x00002aaaab9a4000, 0x00002aaaae054000)
 total_blobs=2800 nmethods=2273 adapters=480 free_code_cache=40752512 largest_free_block=15808

Thanks,
Robert (Robi) Petersen
Senior Software Engineer
Search Department