One other note on the JVM options, even though those aren’t the cause of the problem.
Don’t run four GC threads when you have four processors. That can use 100% of the CPU just doing GC. With four processors, I’d run one thread.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Apr 11, 2018, at 7:49 AM, Walter Underwood <wun...@wunderwood.org> wrote:
>
> For readability, I’d use -Xmx12G instead of -XX:MaxHeapSize=12884901888. Also, I always use a start size the same as the max size, since servers will eventually grow to the max size. So:
>
> -Xmx12G -Xms12G
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
>> On Apr 11, 2018, at 6:29 AM, Sujay Bawaskar <sujaybawas...@gmail.com> wrote:
>>
>> What is the directory factory defined in solrconfig.xml? Your JVM heap should be tuned with respect to that.
>> How is Solr being used: is it more updates and fewer queries, or fewer updates and more queries?
>> What is the OOM error? Is it frequent GC or error 12?
>>
>> On Wed, Apr 11, 2018 at 6:05 PM, Adam Harrison-Fuller <aharrison-ful...@mintel.com> wrote:
>>
>>> Hey Jesus,
>>>
>>> Thanks for the suggestions. The Solr nodes have 4 CPUs assigned to them.
>>>
>>> Cheers!
>>> Adam
>>>
>>> On 11 April 2018 at 11:22, Jesus Olivan <jesus.oli...@letgo.com> wrote:
>>>
>>>> Hi Adam,
>>>>
>>>> IMHO you could try increasing the heap to 20 GB (with 46 GB of physical RAM, your JVM can afford a larger heap without penalties from starving the memory needed outside the heap).
>>>>
>>>> Another good change would be to increase -XX:CMSInitiatingOccupancyFraction from 50 to 75. I think the CMS collector works better when the old generation space is more populated.
>>>>
>>>> I usually set the survivor spaces to a smaller size. If you try SurvivorRatio=6, I think performance would improve.
>>>>
>>>> Another good practice, in my experience, is to set a static NewSize instead of -XX:NewRatio=3. You could try -XX:NewSize=7000m and -XX:MaxNewSize=7000m (one third of the total heap space is recommended).
>>>>
>>>> Finally, my best results after deep JVM R&D related to Solr came from removing the ScavengeBeforeRemark flag and adding a new one: ParGCCardsPerStrideChunk.
>>>>
>>>> However, it would also be good to set ParallelGCThreads and ConcGCThreads to their optimal values, and we need your system’s CPU count to work those out. Can you provide this data, please?
>>>>
>>>> Regards
>>>>
>>>> 2018-04-11 12:01 GMT+02:00 Adam Harrison-Fuller <aharrison-ful...@mintel.com>:
>>>>
>>>>> Hey all,
>>>>>
>>>>> I was wondering if I could get some JVM/GC tuning advice to resolve an issue that we are experiencing.
>>>>>
>>>>> Full disclaimer, I am in no way a JVM/Solr expert, so any advice you can render would be greatly appreciated.
>>>>>
>>>>> Our SolrCloud nodes are throwing OOM exceptions under load. This issue has only started manifesting itself over the last few months, during which time the only change I can discern is an increase in index size. They are running Solr 5.5.2 on OpenJDK version "1.8.0_101". The index is currently 58G and the server has 46G of physical RAM and runs nothing other than the Solr node.
>>>>>
>>>>> The JVM is invoked with the following JVM options:
>>>>> -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000
>>>>> -XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark
>>>>> -XX:ConcGCThreads=4 -XX:InitialHeapSize=12884901888 -XX:+ManagementServer
>>>>> -XX:MaxHeapSize=12884901888 -XX:MaxTenuringThreshold=8
>>>>> -XX:NewRatio=3 -XX:OldPLABSize=16
>>>>> -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 30000 /data/gnpd/solr/logs
>>>>> -XX:ParallelGCThreads=4
>>>>> -XX:+ParallelRefProcEnabled -XX:PretenureSizeThreshold=67108864
>>>>> -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps
>>>>> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
>>>>> -XX:+PrintTenuringDistribution -XX:SurvivorRatio=4
>>>>> -XX:TargetSurvivorRatio=90
>>>>> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers
>>>>> -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
>>>>>
>>>>> These values were decided upon several years ago by a colleague, based on suggestions from this mailing list, when the index size was ~25G.
>>>>>
>>>>> I have imported the GC logs into GCViewer and attached a link to a screenshot showing the lead-up to an OOM crash. Interestingly, the young generation space is almost empty before the repeated GCs and subsequent crash.
>>>>> https://imgur.com/a/Wtlez
>>>>>
>>>>> I was considering slowly increasing the amount of heap available to the JVM until the crashes stop; any other suggestions? I'm trying to get the nodes stable without the GC taking forever to run.
>>>>>
>>>>> Additional information can be provided on request.
>>>>>
>>>>> Cheers!
>>>>> Adam
>>>>
>>>
>>
>> --
>> Thanks,
>> Sujay P Bawaskar
>> M:+91-77091 53669
>
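For illustration only, here is roughly what the changed options might look like after folding in the suggestions from this thread. This is a sketch, not a tested configuration: it assumes the heap stays at 12G, follows the -Xms/-Xmx and single-GC-thread advice above for a 4-CPU box (applied here to both ParallelGCThreads and ConcGCThreads), and takes SurvivorRatio=6 and CMSInitiatingOccupancyFraction=75 from Jesus's suggestions. Jesus's NewSize figures are omitted because they assumed a 20G heap, and every flag not shown would be carried over unchanged from Adam's current options.

-Xms12G -Xmx12G
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC
-XX:ParallelGCThreads=1 -XX:ConcGCThreads=1
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
-XX:SurvivorRatio=6 -XX:TargetSurvivorRatio=90

The first two lines cover heap sizing and thread count; the last two are the CMS and survivor-space changes suggested by Jesus. Whether they help should be judged from the GC logs, not assumed.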