[jira] [Commented] (SOLR-11196) Solr 6.5.0 consuming entire Heap suddenly while working smoothly on Solr 6.1.0
[ https://issues.apache.org/jira/browse/SOLR-11196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804718#comment-16804718 ]

Ishan Chattopadhyaya commented on SOLR-11196:
---------------------------------------------

Solr 7.x, 8.0, or Solr 6.6.6 (releasing shortly) will address SOLR-10506, which could potentially be the cause of this.

> Solr 6.5.0 consuming entire Heap suddenly while working smoothly on Solr 6.1.0
> ------------------------------------------------------------------------------
>
>              Key: SOLR-11196
>              URL: https://issues.apache.org/jira/browse/SOLR-11196
>          Project: Solr
>       Issue Type: Bug
>   Security Level: Public (Default Security Level. Issues are Public)
> Affects Versions: 6.5, 6.6
>         Reporter: Amit
>         Priority: Major
>
> Please note, this issue does not occur on Solr 6.1.0, while it does occur on Solr 6.5.0 and above. To fix this we had to move back to Solr 6.1.0.
> We have been hit by a Solr behavior in production which we are unable to debug. To start with, here are the configurations for Solr:
> Solr version: 6.5, master with 1 slave of the same configuration as mentioned below.
>
> *JVM Config:*
> {code:java}
> -Xms2048m
> -Xmx4096m
> -XX:+ParallelRefProcEnabled
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:CMSInitiatingOccupancyFraction=50
> {code}
> All other values are defaults.
>
> *Solr Config:*
> {code:java}
> ${solr.autoCommit.maxTime:30}
> false
>
> ${solr.autoSoftCommit.maxTime:90}
>
> 1024
> autowarmCount="0" />
> autowarmCount="0" />
> autowarmCount="0" />
> initialSize="0" autowarmCount="10" regenerator="solr.NoOpRegenerator" />
> true
> 20
> ${solr.query.max.docs:40}
> false
> 2
> {code}
> *The Host (AWS) configurations are:*
> RAM: 7.65 GB
> Cores: 4
>
> Now, our Solr works perfectly fine for hours and sometimes for days, but sometimes memory suddenly jumps up and the GC kicks in, causing long pauses with not much to recover. We are seeing this happening most often when one or multiple segments get added or deleted after a hard commit. It doesn't matter how many documents got indexed. The attached images show that just 1 document was indexed, causing the addition of one segment, and it all got messed up until we restarted Solr.
> Here are the images from New Relic and Sematext (kindly click on the links to view):
> [JVM Heap Memory Image | https://i.stack.imgur.com/9dQAy.png]
> [1 Document and 1 Segment addition Image | https://i.stack.imgur.com/6N4FC.png]
>
> Update: Here is the jmap output from when Solr last died; we have since increased the JVM memory to an Xmx of 12 GB:
> {code:java}
>  num    #instances      #bytes  class name
> ------------------------------------------
>    1:     11210921  1076248416  org.apache.lucene.codecs.lucene50.Lucene50PostingsFormat$IntBlockTermState
>    2:     10623486   934866768  [Lorg.apache.lucene.index.TermState;
>    3:     15567646   475873992  [B
>    4:     10623485   424939400  org.apache.lucene.search.spans.SpanTermQuery$SpanTermWeight
>    5:     15508972   372215328  org.apache.lucene.util.BytesRef
>    6:     15485834   371660016  org.apache.lucene.index.Term
>    7:     15477679   371464296  org.apache.lucene.search.spans.SpanTermQuery
>    8:     10623486   339951552  org.apache.lucene.index.TermContext
>    9:      1516724   150564320  [Ljava.lang.Object;
>   10:       724486    50948800  [C
>   11:      1528110    36674640  java.util.ArrayList
>   12:       849884    27196288  org.apache.lucene.search.spans.SpanNearQuery
>   13:       582008    23280320  org.apache.lucene.search.spans.SpanNearQuery$SpanNearWeight
>   14:       481601    23116848  org.apache.lucene.document.FieldType
>   15:       623073    19938336  org.apache.lucene.document.StoredField
>   16:       721649    17319576  java.lang.String
>   17:       327297329640  [J
>   18:       146435788376  [F
> {code}
> The load on Solr is not much; at max it goes to 2000 requests per minute. The indexing load can sometimes come in bursts, but most of the time it is pretty low. But as mentioned above, sometimes even a single document being indexed can put Solr into a tizzy, and sometimes it just works like a charm.
>
> Edit:
> The last configuration on which 6.1 works but not 6.5 is:
> *JVM Config:*
> {code:java}
> Xms: 2 GB
> Xmx: 12 GB
> {code}
> *Solr Config:*
> We also removed the soft commit.
> {code:java}
> ${solr.autoCommit.maxTime:30}
> true
> {code}
> *The Host (AWS) configurations:*
> RAM: 16 GB
> Cores: 4

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
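A quick way to see what dominates a `jmap -histo` dump like the one above is to total the `#bytes` column by package prefix. A minimal sketch; the sample lines are a subset of the histogram in this report, and the two-component grouping depth is an arbitrary choice:

```python
# Total the #bytes column of a `jmap -histo` dump by package prefix.
# The sample lines below are a subset of the histogram in this report.
histo = """\
1: 11210921 1076248416 org.apache.lucene.codecs.lucene50.Lucene50PostingsFormat$IntBlockTermState
4: 10623485 424939400 org.apache.lucene.search.spans.SpanTermQuery$SpanTermWeight
7: 15477679 371464296 org.apache.lucene.search.spans.SpanTermQuery
16: 721649 17319576 java.lang.String"""

totals: dict[str, int] = {}
for line in histo.splitlines():
    _rank, _instances, nbytes, cls = line.split()
    # Group by the first two package components, e.g. "org.apache".
    key = ".".join(cls.split(".")[:2])
    totals[key] = totals.get(key, 0) + int(nbytes)

for pkg, nbytes in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{pkg}: {nbytes / 2**20:.0f} MiB")
```

Run against the full histogram, this makes it obvious that Lucene query/term objects, not application data, account for almost all of the retained heap.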
[jira] [Commented] (SOLR-11196) Solr 6.5.0 consuming entire Heap suddenly while working smoothly on Solr 6.1.0
[ https://issues.apache.org/jira/browse/SOLR-11196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119629#comment-16119629 ]

Amit commented on SOLR-11196:
-----------------------------

This is a master-slave architecture. Indexing happens on the master only, while searching goes to both master and slave through a load balancer. Both master and slave hit OOM frequently. Both work smoothly on 6.1.0 with the same configurations.
[jira] [Commented] (SOLR-11196) Solr 6.5.0 consuming entire Heap suddenly while working smoothly on Solr 6.1.0
[ https://issues.apache.org/jira/browse/SOLR-11196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118615#comment-16118615 ]

Tomás Fernández Löbbe commented on SOLR-11196:
----------------------------------------------

bq. As this issue occurs on 6.5.0 and not on 6.1.0 so I expect it to be a bug.
+1, let's reopen until we are sure this is not some bug.
Is it correct that this is a master-slave architecture (not SolrCloud)? You are indexing on the master only, and searching on the slave only? Which server is getting the OOM, master or slave? Your jmap output lists a bunch of (span) queries, so I'd assume you are talking about the slave here; however, you also say this happens when you add docs. Could you clarify?
[jira] [Commented] (SOLR-11196) Solr 6.5.0 consuming entire Heap suddenly while working smoothly on Solr 6.1.0
[ https://issues.apache.org/jira/browse/SOLR-11196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118160#comment-16118160 ]

Amit commented on SOLR-11196:
-----------------------------

1. As this issue occurs on 6.5.0 and not on 6.1.0, I expect it to be a bug.
2. Please note, we now have an Xmx of 12 GB on AWS with 16 GB of RAM; I have made an edit to the description, please refer to it.
3. We are running our instances on gp2-type EBS storage. The Solr indexes are on a magnetic EBS disk.
[jira] [Commented] (SOLR-11196) Solr 6.5.0 consuming entire Heap suddenly while working smoothly on Solr 6.1.0
[ https://issues.apache.org/jira/browse/SOLR-11196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114510#comment-16114510 ]

Walter Underwood commented on SOLR-11196:
-----------------------------------------

Ah, I missed that openSearcher was false. This host is named production-solr-master, so it might be master-slave.
[jira] [Commented] (SOLR-11196) Solr 6.5.0 consuming entire Heap suddenly while working smoothly on Solr 6.1.0
[ https://issues.apache.org/jira/browse/SOLR-11196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114500#comment-16114500 ]

Erick Erickson commented on SOLR-11196:
---------------------------------------

I disagree with your point 7. The hard commit has openSearcher set to false, so the soft commit is the only thing making documents visible. This is a relatively common pattern for limiting the size of the tlog without doing the work of opening new searchers.
Otherwise I agree totally, and would add that the caches are very large relative to the memory. You have a filterCache set to 8192. Each entry can consume maxDoc/8 bytes; have you examined how much actually gets used when you go into the bad state?
You say "we have now increased the JVM memory to xmx of 12GB". Where is that coming from when you only have 7.65 GB available? My rule of thumb is to reserve _at least_ half the physical memory for the OS for MMapDirectory's use; see: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
All in all this is a misconfigured system; I doubt it's anything Solr can do much about. I'll close this JIRA. We can re-open it if you can show this is really a Solr problem and not just misconfiguration on your part, but let's discuss this on the user's list first.
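The maxDoc/8 rule of thumb above can be turned into a quick worst-case estimate of filterCache heap pressure. A minimal sketch; only the 8192 entry count comes from this thread, and the 50-million-document index size is a made-up illustration:

```python
def filter_cache_worst_case_bytes(max_doc: int, cache_size: int) -> int:
    """Worst case for Solr's filterCache: each entry can hold a bitset of
    one bit per document (maxDoc/8 bytes), and the cache can keep up to
    cache_size such entries."""
    return cache_size * (max_doc // 8)

# filterCache size 8192 as discussed above; 50M documents is hypothetical.
worst = filter_cache_worst_case_bytes(max_doc=50_000_000, cache_size=8192)
print(f"{worst / 2**30:.1f} GiB")
```

Even a modest fraction of that worst case dwarfs a 4 GB heap, which is why the cache sizing question matters here.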
[jira] [Commented] (SOLR-11196) Solr 6.5.0 consuming entire Heap suddenly while working smoothly on Solr 6.1.0
[ https://issues.apache.org/jira/browse/SOLR-11196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114490#comment-16114490 ]

Erick Erickson commented on SOLR-11196:
---------------------------------------

Cut-and-paste from the reply on the user's list from Walter Underwood:

1. This should be a question on solr-u...@lucene.apache.org, not a bug report.
2. A 12 GB heap on an instance with 7.65 GB of RAM is a fatal configuration. A full GC will cause lots of swapping and an extreme slowdown.
3. A 4 GB heap on an instance with 7.65 GB of RAM is not a good configuration. That does not leave enough room for the OS, other processes, and file buffers to cache Solr's index files.
4. That instance is pretty small for Solr. The smallest AWS instance we run has 15 GB of RAM, and we run an 8 GB heap. Check the disk access on New Relic during the slowdown.
5. Does this instance swap to magnetic disk? Are the Solr indexes on magnetic ephemeral or magnetic EBS? Check the iops on New Relic. When you hit the max iops for a disk volume, very bad performance things happen.
6. Set -Xms equal to -Xmx. Growing the heap to max at startup is a waste of time and makes Solr slow at the beginning. The heap will always get to max.
7. Setting a longer time for auto soft commit than for auto hard commit is nonsense. Just don't do the soft commit.

wunder
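Points 2, 3, and 6 above amount to a simple sanity check of heap size against physical RAM. A minimal sketch; the half-of-RAM threshold is the rule of thumb quoted in this thread, not a hard limit:

```python
def heap_warnings(ram_gb: float, xms_gb: float, xmx_gb: float) -> list[str]:
    """Flag the heap-sizing problems called out in this thread."""
    warnings = []
    if xmx_gb > ram_gb:
        # Point 2: a heap bigger than RAM means a full GC touches swapped pages.
        warnings.append("heap larger than physical RAM")
    elif xmx_gb > ram_gb / 2:
        # Point 3: leave room for the OS page cache holding the index files.
        warnings.append("less than half of RAM left for the OS and page cache")
    if xms_gb != xmx_gb:
        # Point 6: the heap will reach -Xmx anyway; growing it is wasted work.
        warnings.append("-Xms should equal -Xmx")
    return warnings

# The two configurations described in the report:
print(heap_warnings(ram_gb=7.65, xms_gb=2, xmx_gb=4))   # original setup
print(heap_warnings(ram_gb=16, xms_gb=2, xmx_gb=12))    # after the edit
```

Note that both of the reported configurations trip at least two of the warnings, which matches the diagnosis in this comment.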