Re: OOM during indexing with 24G heap - Solr 6.5.1
On 10/16/2017 5:38 PM, Randy Fradin wrote:
> Each shard has around 4.2 million documents which are around 40GB on disk.
> Two nodes have 3 shard replicas each and the third has 2 shard replicas.
>
> The text of the exception is: java.lang.OutOfMemoryError: Java heap space
> And the heap dump is a full 24GB indicating the full heap space was being
> used.
>
> Here is the solrconfig as output by the config request handler:

I was hoping for the actual XML, but I don't see any red flags in the output you've provided. It does look like you've probably got a very minimal configuration. Some things that I expected to see (and do see on my own systems) aren't in the handler output at all.

With only 12 million docs on the machine, I would not expect any need for a 24GB heap except in the case of a large number of particularly RAM-hungry complex queries. The ratio of index size to document count says that the documents are bigger than what I think of as typical, but not what I would call enormous. If there's any way you can adjust your schema to remove unused parts and reduce the index size, that would be a good idea, but I don't consider that an immediate action item. Your index size is well within what Solr should be able to handle easily -- if there are sufficient system resources, memory in particular.

The 6.5.1 version of Solr that you're running should have most known memory leak issues fixed -- and there are not many of those. I'm not aware of any leak problems that would affect Lucene's DocumentsWriter class, where you said most of the heap was being consumed. That doesn't necessarily mean there isn't a leak bug that applies, just that I am not aware of one.

You have indicated that you're doing a very large number of concurrent update requests, up to 240 at the same time. I cannot imagine a situation where Lucene would require a buffer (100MB in your config) for every indexing thread. That would cause major memory issues with many Lucene and Solr installations.

Your description of what you have in your heap sounds a little different from a buffer per indexing thread. It sounds like your indexing has resulted in a LOT of flushes, which is probably normal, except that the flush queue doesn't appear to be getting emptied. If I'm right, either your indexing is happening faster than Lucene can flush the segments that get built, or something is preventing Lucene from actually doing the flush. I do not see any indication in the code that Lucene ever imposes a limit on the number of queued flushes, but in a system that's working correctly, it probably doesn't have to. My theories here should be validated by somebody who has much better insight into Lucene than I do.

I'm interested in seeing some details about the system and the processes running. What OS is this running on? If it's something other than Windows, you probably have the "top" utility installed. The GNU version of top has a keyboard shortcut (shift-M) to sort by memory usage. If it's available, run top (not htop or any other variant), press the key to sort by memory, and grab a screenshot. On recent versions of Windows, there's a program called Resource Monitor. If you're on Windows, run that program, click on the Memory tab, sort by Private, make sure that the memory graph and MB counts below the process list are fully visible, and grab a screenshot. It is unlikely that you'll be able to send a screenshot image to the list, so you'll probably need a file sharing website.

Thanks,
Shawn
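The flush-queue theory above (indexing outpacing segment flushes, with no apparent cap on the number of queued flushes) can be illustrated with a toy simulation. This is a hedged sketch in Python, not Lucene's actual DocumentsWriterFlushControl logic; the per-flush size and the rates are made-up numbers in the spirit of this thread:

```python
from collections import deque

def simulate(ticks, producers, flushes_per_producer_per_tick,
             drains_per_tick, flush_mb):
    """Toy model of an unbounded queue of pending flushes.

    Each tick, every producer enqueues some pending flushes of
    `flush_mb` MB each; a single consumer drains `drains_per_tick`
    of them.  Returns the peak retained size in MB.
    """
    queue = deque()
    peak_mb = 0
    for _ in range(ticks):
        for _ in range(producers * flushes_per_producer_per_tick):
            queue.append(flush_mb)
        for _ in range(min(drains_per_tick, len(queue))):
            queue.popleft()
        peak_mb = max(peak_mb, sum(queue))
    return peak_mb

# 240 producers each adding one ~57MB flush per tick, one drained per tick:
# the backlog (and retained memory) grows linearly, without bound.
peak = simulate(ticks=100, producers=240, flushes_per_producer_per_tick=1,
                drains_per_tick=1, flush_mb=57)
print(peak)
```

With producers outpacing the single consumer, retained memory grows with every tick; nothing in the toy model bounds it, which is the failure mode hypothesized above.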
Re: [EXTERNAL] Re: OOM during indexing with 24G heap - Solr 6.5.1
Randy,

That is one issue; I don't know if it fixes everything for you or not. However, Lucene doesn't put a limit on the number of incoming requests, and after https://issues.apache.org/jira/browse/LUCENE-6659 Solr has no way (that I know of, at least) to limit threads. So if you have a ton of parallel updates reaching the Solr web server, it can cause a performance problem.

On Tue, Oct 17, 2017 at 10:52 AM, Randy Fradin wrote:
> I've been trying to understand DocumentsWriterFlushControl.java to figure
> this one out. I don't really have a firm grasp of it but I'm starting to
> suspect that blocked flushes in aggregate can take up to (ramBufferSizeMB *
> maximum # of concurrent update requests * # of cores) of heap space and
> that I need to limit how many concurrent update requests are sent to the
> same Solr node at the same time to something much lower than my current
> 240. I don't know this for sure.. it is mostly a guess based on the fact
> that one of the DocumentsWriter instances in my heap dump has just under
> 240 items in the blockedFlushes list and each of those is retaining up to
> 57MB of heap space (which is less than ramBufferSizeMB=100 but in the
> ballpark).
>
> Can anyone shed light on whether I'm going down the right path here?
>
> [remainder of quoted thread snipped; it repeats the messages below in full]
Re: [EXTERNAL] Re: OOM during indexing with 24G heap - Solr 6.5.1
I've been trying to understand DocumentsWriterFlushControl.java to figure this one out. I don't really have a firm grasp of it but I'm starting to suspect that blocked flushes in aggregate can take up to (ramBufferSizeMB * maximum # of concurrent update requests * # of cores) of heap space and that I need to limit how many concurrent update requests are sent to the same Solr node at the same time to something much lower than my current 240. I don't know this for sure.. it is mostly a guess based on the fact that one of the DocumentsWriter instances in my heap dump has just under 240 items in the blockedFlushes list and each of those is retaining up to 57MB of heap space (which is less than ramBufferSizeMB=100 but in the ballpark).

Can anyone shed light on whether I'm going down the right path here?

On Mon, Oct 16, 2017 at 5:34 PM David M Giannone wrote:
>
> Sent via the Samsung Galaxy S® 6, an AT&T 4G LTE smartphone
>
> -------- Original message --------
> From: Randy Fradin
> Date: 10/16/17 7:38 PM (GMT-05:00)
> To: solr-user@lucene.apache.org
> Subject: [EXTERNAL] Re: OOM during indexing with 24G heap - Solr 6.5.1
>
> Each shard has around 4.2 million documents which are around 40GB on disk.
> Two nodes have 3 shard replicas each and the third has 2 shard replicas.
>
> The text of the exception is: java.lang.OutOfMemoryError: Java heap space
> And the heap dump is a full 24GB indicating the full heap space was being
> used.
> Here is the solrconfig as output by the config request handler:
>
> [solrconfig JSON snipped; it appears in full in Randy's original message further down the thread]
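Randy's worst-case formula above can be put into numbers. This is back-of-the-envelope arithmetic only; the three writers per node correspond to the 3 DocumentsWriter instances seen in the heap dump, and ~57MB is the observed retained size per blocked flush, not a Lucene guarantee:

```python
# Worst case per the hypothesis: every concurrent update request can leave
# a blocked flush of up to ramBufferSizeMB behind, in each core's writer.
ram_buffer_mb = 100
max_concurrent_updates = 240
cores_per_node = 3  # the heap dump showed 3 DocumentsWriter instances

worst_case_gb = ram_buffer_mb * max_concurrent_updates * cores_per_node / 1024
print(round(worst_case_gb, 1))  # prints 70.3 -- far beyond the 24 GB heap

# The observed figure: ~240 blocked flushes retaining up to ~57 MB each,
# in a single DocumentsWriter instance.
observed_gb = 57 * 240 / 1024
print(round(observed_gb, 1))  # prints 13.4 -- per writer, at the observed sizes
```

If the hypothesis holds, even the observed per-writer backlog alone is enough to explain an OOM on a 24GB heap once multiple writers fill up.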
Re: [EXTERNAL] Re: OOM during indexing with 24G heap - Solr 6.5.1
Sent via the Samsung Galaxy S® 6, an AT&T 4G LTE smartphone

-------- Original message --------
From: Randy Fradin
Date: 10/16/17 7:38 PM (GMT-05:00)
To: solr-user@lucene.apache.org
Subject: [EXTERNAL] Re: OOM during indexing with 24G heap - Solr 6.5.1

[forwarded message body snipped; it is identical to Randy's reply below]
Re: OOM during indexing with 24G heap - Solr 6.5.1
Each shard has around 4.2 million documents which are around 40GB on disk. Two nodes have 3 shard replicas each and the third has 2 shard replicas.

The text of the exception is: java.lang.OutOfMemoryError: Java heap space
And the heap dump is a full 24GB indicating the full heap space was being used.

Here is the solrconfig as output by the config request handler:

{
  "responseHeader":{
    "status":0,
    "QTime":0},
  "config":{
    "znodeVersion":0,
    "luceneMatchVersion":"org.apache.lucene.util.Version:6.5.1",
    "updateHandler":{
      "indexWriter":{"closeWaitsForMerges":true},
      "commitWithin":{"softCommit":true},
      "autoCommit":{
        "maxDocs":5,
        "maxTime":30,
        "openSearcher":false},
      "autoSoftCommit":{
        "maxDocs":-1,
        "maxTime":3}},
    "query":{
      "useFilterForSortedQuery":false,
      "queryResultWindowSize":1,
      "queryResultMaxDocsCached":2147483647,
      "enableLazyFieldLoading":false,
      "maxBooleanClauses":1024,
      "":{
        "size":"1",
        "showItems":"-1",
        "initialSize":"10",
        "name":"fieldValueCache"}},
    "jmx":{
      "agentId":null,
      "serviceUrl":null,
      "rootName":null},
    "requestHandler":{
      "/select":{
        "name":"/select",
        "defaults":{
          "rows":10,
          "echoParams":"explicit"},
        "class":"solr.SearchHandler"},
      "/update":{
        "useParams":"_UPDATE",
        "class":"solr.UpdateRequestHandler",
        "name":"/update"},
      "/update/json":{
        "useParams":"_UPDATE_JSON",
        "class":"solr.UpdateRequestHandler",
        "invariants":{"update.contentType":"application/json"},
        "name":"/update/json"},
      "/update/csv":{
        "useParams":"_UPDATE_CSV",
        "class":"solr.UpdateRequestHandler",
        "invariants":{"update.contentType":"application/csv"},
        "name":"/update/csv"},
      "/update/json/docs":{
        "useParams":"_UPDATE_JSON_DOCS",
        "class":"solr.UpdateRequestHandler",
        "invariants":{
          "update.contentType":"application/json",
          "json.command":"false"},
        "name":"/update/json/docs"},
      "update":{
        "class":"solr.UpdateRequestHandlerApi",
        "useParams":"_UPDATE_JSON_DOCS",
        "name":"update"},
      "/config":{
        "useParams":"_CONFIG",
        "class":"solr.SolrConfigHandler",
        "name":"/config"},
      "/schema":{
        "class":"solr.SchemaHandler",
        "useParams":"_SCHEMA",
        "name":"/schema"},
      "/replication":{
        "class":"solr.ReplicationHandler",
        "useParams":"_REPLICATION",
        "name":"/replication"},
      "/get":{
        "class":"solr.RealTimeGetHandler",
        "useParams":"_GET",
        "defaults":{
          "omitHeader":true,
          "wt":"json",
          "indent":true},
        "name":"/get"},
      "/admin/ping":{
        "class":"solr.PingRequestHandler",
        "useParams":"_ADMIN_PING",
        "invariants":{
          "echoParams":"all",
          "q":"{!lucene}*:*"},
        "name":"/admin/ping"},
      "/admin/segments":{
        "class":"solr.SegmentsInfoRequestHandler",
        "useParams":"_ADMIN_SEGMENTS",
        "name":"/admin/segments"},
      "/admin/luke":{
        "class":"solr.LukeRequestHandler",
        "useParams":"_ADMIN_LUKE",
        "name":"/admin/luke"},
      "/admin/system":{
        "class":"solr.SystemInfoHandler",
        "useParams":"_ADMIN_SYSTEM",
        "name":"/admin/system"},
      "/admin/mbeans":{
        "class":"solr.SolrInfoMBeanHandler",
        "useParams":"_ADMIN_MBEANS",
        "name":"/admin/mbeans"},
      "/admin/plugins":{
        "class":"solr.PluginInfoHandler",
        "name":"/admin/plugins"},
      "/admin/threads":{
        "class":"solr.ThreadDumpHandler",
        "useParams":"_ADMIN_THREADS",
        "name":"/admin/threads"},
      "/admin/properties":{
        "class":"solr.PropertiesRequestHandler",
        "useParams":"_ADMIN_PROPERTIES",
        "name":"/admin/properties"},
      "/admin/logging":{
        "class":"solr.LoggingHandler",
        "useParams":"_ADMIN_LOGGING",
        "name":"/admin/logging"},
      "/admin/file":{
        "class":"solr.ShowFileRequestHandler",
        "useParams":"_ADMIN_FILE",
        "name":"/admin/file"},
      "/export":{
        "class":"solr.ExportHandler",
        "useParams":"_EXPORT",
        "components":["query"],
        "defaults":{"wt":"json"},
        "invariants":{
          "rq":"{!xport}",
          "distrib":false},
        "name":"/export"},
      "/graph":{
        "class":"solr.GraphHandler",
        "useParams":"_ADMIN_GRAPH",
        "invariants":{
          "wt":"graphml",
          "distrib":false},
        "name":"/graph"},
      "/stream":{
        "class":"solr.StreamHandler",
        "useParams":"_STREAM",
        "defaults":{"wt":"json"},
        "invariants":{"distrib":false},
        "name":"/stream"},
      "/sql":{
        "class"
Re: OOM during indexing with 24G heap - Solr 6.5.1
On 10/16/2017 3:19 PM, Randy Fradin wrote:
> We are seeing a lot of full GC events and eventual OOM errors in Solr
> during indexing. This is Solr 6.5.1 running in cloud mode with a 24G heap.
> At these times indexing is the only activity taking place. The collection
> has 4 shards and 2 replicas across 3 nodes. Each document is ~10KB (a few
> hundred fields each), and indexing is using the normal update handler, 1
> document per request, up to 240 requests at a time.
>
> The heap dump taken automatically on OOM shows 18.3GB of heap taken by 3
> instances of DocumentsWriter. Within those instances, all of the heap is
> retained by the blockedFlushes LinkedList inside the flushControl object.
> Each node in the LinkedList appears to be retaining around 55MB.
>
> Clearly something to do with flushing is at play here but I'm at a loss
> what tuning parameters I should be looking at. I would expect things to
> start blocking if I fall too far behind on flushing but apparently that's
> not happening. The ramBufferSizeMB is set to the default 100. My heap size
> is already absurdly more than I thought we would need for this volume.

One of the first things we need to find out is about your index size. In each of your shards, how many documents are there? How much disk space does one shard replica take up? How many shard replica cores does each node have on it in total?

I would also like to get a look at your full solrconfig.xml file. The schema may be helpful at a later date, along with an example of a document that you're indexing.

With ramBufferSizeMB at the default, having a ton of memory used up by a class used for indexing seems very odd. Do you have the text of the OOM exception? Is it saying out of heap space, or some other problem?

Thanks,
Shawn
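The sizing questions above can be cross-checked against the numbers Randy gives elsewhere in the thread (around 4.2 million docs and ~40GB on disk per shard, with 3 shard replicas on the busiest nodes). A quick arithmetic sketch:

```python
docs_per_shard = 4_200_000
shard_size_gb = 40

# Average on-disk index size per document.
bytes_per_doc = shard_size_gb * 1024**3 / docs_per_shard
print(round(bytes_per_doc / 1024, 1))  # prints 10.0 -- matching the "~10KB" per doc

# Total docs on a node hosting 3 shard replicas.
docs_on_node = 3 * docs_per_shard
print(docs_on_node)  # prints 12600000 -- the "12 million docs" figure
```

These numbers are consistent with the thread's conclusion: document size is above typical but not extreme, so the heap exhaustion points at indexing concurrency rather than index size.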