Re: Java GC issue investigation
First thing is to stop using CMS and use G1GC. We've been using these settings with over a hundred machines in prod for nearly four years.

SOLR_HEAP=8g
# Use G1 GC -- wunder 2017-01-23
# Settings from https://wiki.apache.org/solr/ShawnHeisey
GC_TUNE=" \
    -XX:+UseG1GC \
    -XX:+ParallelRefProcEnabled \
    -XX:G1HeapRegionSize=8m \
    -XX:MaxGCPauseMillis=200 \
    -XX:+UseLargePages \
    -XX:+AggressiveOpts \
"

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Oct 7, 2020, at 2:39 AM, Karol Grzyb wrote:
> [...]
Re: Java GC issue investigation
Hi Matthew, Erick!

Thank you very much for the feedback, I'll try to convince them to
reduce the heap size.

Current GC settings:

-XX:+CMSParallelRemarkEnabled
-XX:+CMSScavengeBeforeRemark
-XX:+ParallelRefProcEnabled
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:CMSInitiatingOccupancyFraction=50
-XX:CMSMaxAbortablePrecleanTime=6000
-XX:ConcGCThreads=4
-XX:MaxTenuringThreshold=8
-XX:NewRatio=3
-XX:ParallelGCThreads=4
-XX:PretenureSizeThreshold=64m
-XX:SurvivorRatio=4
-XX:TargetSurvivorRatio=90

Kind regards,
Karol

> On Tue, 6 Oct 2020 at 16:52, Erick Erickson wrote:
> [...]
Re: Java GC issue investigation
12G is not that huge, it's surprising that you're seeing this problem.

However, there are a couple of things to look at:

1> If you're saying that you have 16G total physical memory and are
allocating 12G to Solr, that's an anti-pattern. See:
https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
If at all possible, you should allocate between 25% and 50% of your
physical memory to Solr...

2> What garbage collector are you using? G1GC might be a better choice.

> On Oct 6, 2020, at 10:44 AM, matthew sporleder wrote:
> [...]
Re: Java GC issue investigation
Your index is so small that it should easily get cached into OS memory
as it is accessed. Having a too-big heap is a known problem situation.

https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems#SolrPerformanceProblems-HowmuchheapspacedoIneed?

> On Tue, Oct 6, 2020 at 9:44 AM Karol Grzyb wrote:
> [...]
Re: Java GC issue investigation
Hi Matthew,

Thank you for the answer. I cannot reproduce the setup locally; I'll
try to convince them to reduce Xmx. I guess they won't agree to 1GB,
but something less than 12G for sure. We also need a proper dev setup,
because for now we can only test on prod or stage, which are difficult
to adjust.

Is being stuck in GC common behaviour when the index is small compared
to the available heap under higher load? I was more worried about the
ratio of heap to total host memory.

Regards,
Karol

> On Tue, 6 Oct 2020 at 14:39, matthew sporleder wrote:
> [...]
Re: Java GC issue investigation
You have a 12G heap for a 200MB index? Can you just try changing Xmx
to, like, 1g?

> On Tue, Oct 6, 2020 at 7:43 AM Karol Grzyb wrote:
> [...]
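For a quick load-test with a smaller heap, the stock Solr launch scripts accept a memory flag; a sketch, assuming a standard `bin/solr` install (paths and defaults may differ in your deployment):

```shell
# One-off, for a test run: -m sets both -Xms and -Xmx for this start.
bin/solr restart -m 1g

# Persistent alternative: set the heap in solr.in.sh instead.
SOLR_HEAP="1g"
```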
Java GC issue investigation
Hi,

I'm involved in investigating an issue with huge GC overhead during
performance tests on Solr nodes. The Solr version is 6.1. The last
tests were done on a staging env, and we ran into problems at fewer
than 100 requests/second.

The size of the index itself is ~200MB (~50K docs).
The index has small updates every 15min.

Queries involve sorting and faceting.

I've gathered some heap dumps; I can see from them that most of the
heap memory is retained by objects of the following classes:

- org.apache.lucene.search.grouping.term.TermSecondPassGroupingCollector
  (>4G, 91% of heap)
- org.apache.lucene.search.grouping.AbstractSecondPassGroupingCollector$SearchGroupDocs
- org.apache.lucene.search.FieldValueHitQueue$MultiComparatorsFieldValueHitQueue
- org.apache.lucene.search.TopFieldCollector$SimpleFieldCollector
  (>3.7G, 76% of heap)

Based on the information above, is there anything generic that can be
looked at as a source of potential improvement without diving deeply
into the schema and queries (which may be very difficult to change at
this moment)? I don't see docValues being enabled - could this help? If
I read the docs correctly, it's specifically helpful when there are
many sorts/groupings/facets.

Additionally, I see that many threads are blocked on LRUCache.get;
should I recommend switching to FastLRUCache?

Also, I wonder if -Xmx12288m for the Java heap is not too much for 16G
of memory? I see some (~5/s) page faults in Dynatrace during the
biggest traffic.

Thank you very much for any help,
Kind regards,
Karol
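On the docValues question raised above: enabling them stores sort/facet/group data in memory-mapped column storage instead of building it on the Java heap, which directly targets the collector classes dominating this heap dump. A hypothetical schema.xml fragment; the field names and types are invented for illustration, and note that changing docValues on an existing field requires a full reindex:

```xml
<!-- Hypothetical example fields; real names/types come from your schema. -->
<!-- docValues="true" keeps sort/facet/group data off the JVM heap. -->
<field name="category" type="string" indexed="true" stored="false"
       docValues="true"/> <!-- faceted/grouped field -->
<field name="price" type="long" indexed="true" stored="true"
       docValues="true"/> <!-- sort field -->
```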