Safe to change numVersionBuckets?
I have a SolrCloud cluster (version 6.5.1) with around 3300 cores per instance. I've been investigating what is driving heap utilization, since it is higher than I expected. I took a heap dump and found that the largest driver of heap utilization is the array of VersionBucket objects in the org.apache.solr.update.VersionInfo class. The array has size 65536 and there is one per SolrCore instance. Each instance of the array is 1.8MB, so the aggregate size is about 6GB of heap.

I understand from reading the discussion in SOLR-6820 that 65536 is now the recommended default for this setting because it results in higher document write rates than the old default of 256. I would like to reduce my heap utilization, and I'm OK with somewhat slower document write throughput. My question is: is it safe to reduce the value of numVersionBuckets on all of my existing cores without reindexing my data?

My solrconfig.xml contains this for all of my collections:

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
</updateLog>

Assuming it is safe to change, can I just add a JVM arg to the Solr process like "-Dsolr.ulog.numVersionBuckets=256" to override the value for all cores at once? Or do I have to change and re-upload the solrconfig.xml files and reload the cores?

Thanks
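For scale, here is a back-of-envelope check of the numbers above. The per-bucket byte count is inferred from the reported 1.8MB array size, not measured from Solr internals:

```python
# Rough heap math for the VersionBucket arrays described above.
# All inputs come from the figures quoted in this message.
num_buckets = 65536
per_core_mb = 1.8   # reported size of one VersionBucket array
cores = 3300        # cores per instance

bytes_per_bucket = per_core_mb * 1024 * 1024 / num_buckets   # ~29 bytes
total_gb = per_core_mb * cores / 1024                        # ~5.8 GB aggregate

# Dropping back to the old default of 256 buckets shrinks each array ~256x:
reduced_total_mb = per_core_mb * 256 / num_buckets * cores   # ~23 MB aggregate
print(bytes_per_bucket, total_gb, reduced_total_mb)
```

So the change would recover nearly all of the 6GB, at the cost of coarser version-lock granularity during indexing.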
Re: [EXTERNAL] Re: OOM during indexing with 24G heap - Solr 6.5.1
I've been trying to understand DocumentsWriterFlushControl.java to figure this one out. I don't have a firm grasp of it yet, but I'm starting to suspect that blocked flushes in aggregate can take up to (ramBufferSizeMB * maximum # of concurrent update requests * # of cores) of heap space, and that I need to limit how many concurrent update requests are sent to the same Solr node at the same time to something much lower than my current 240. I don't know this for sure; it is mostly a guess based on the fact that one of the DocumentsWriter instances in my heap dump has just under 240 items in its blockedFlushes list, and each of those is retaining up to 57MB of heap space (which is less than ramBufferSizeMB=100 but in the ballpark).

Can anyone shed light on whether I'm going down the right path here?

On Mon, Oct 16, 2017 at 5:34 PM David M Giannone wrote:
>
> Sent via the Samsung Galaxy S® 6, an AT&T 4G LTE smartphone
>
> -------- Original message --------
> From: Randy Fradin
> Date: 10/16/17 7:38 PM (GMT-05:00)
> To: solr-user@lucene.apache.org
> Subject: [EXTERNAL] Re: OOM during indexing with 24G heap - Solr 6.5.1
>
> Each shard has around 4.2 million documents which are around 40GB on disk.
> Two nodes have 3 shard replicas each and the third has 2 shard replicas.
>
> The text of the exception is: java.lang.OutOfMemoryError: Java heap space
> And the heap dump is a full 24GB, indicating the full heap space was being
> used.
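If that suspicion is right, a quick worst-case estimate using the figures from this thread shows why even a 24G heap could be exhausted. The formula is my guess from reading the code, not documented Lucene behavior:

```python
# Worst-case heap retained by blocked flushes, if the guessed ceiling of
# (ramBufferSizeMB x concurrent update requests x writers per node) holds.
ram_buffer_mb = 100        # solrconfig default, per this thread
concurrent_requests = 240  # indexing client concurrency
writers_per_node = 3       # DocumentsWriter instances seen in the heap dump

worst_case_gb = ram_buffer_mb * concurrent_requests * writers_per_node / 1024
# ~70 GB worst case, far above the 24 GB heap

# The observed numbers fit comfortably under that ceiling: one writer with
# just under 240 blocked flushes at up to 57 MB each is already
one_writer_gb = 57 * 240 / 1024   # ~13 GB in a single DocumentsWriter
print(worst_case_gb, one_writer_gb)
```

Under those assumptions, only the concurrency term is practical to reduce without touching ramBufferSizeMB or the shard layout.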
> Here is the solrconfig as output by the config request handler:
>
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":0},
>   "config":{
>     "znodeVersion":0,
>     "luceneMatchVersion":"org.apache.lucene.util.Version:6.5.1",
>     "updateHandler":{
>       "indexWriter":{"closeWaitsForMerges":true},
>       "commitWithin":{"softCommit":true},
>       "autoCommit":{
>         "maxDocs":5,
>         "maxTime":30,
>         "openSearcher":false},
>       "autoSoftCommit":{
>         "maxDocs":-1,
>         "maxTime":3}},
>     "query":{
>       "useFilterForSortedQuery":false,
>       "queryResultWindowSize":1,
>       "queryResultMaxDocsCached":2147483647,
>       "enableLazyFieldLoading":false,
>       "maxBooleanClauses":1024,
>       "":{
>         "size":"1",
>         "showItems":"-1",
>         "initialSize":"10",
>         "name":"fieldValueCache"}},
>     "jmx":{
>       "agentId":null,
>       "serviceUrl":null,
>       "rootName":null},
>     "requestHandler":{
>       "/select":{
>         "name":"/select",
>         "defaults":{
>           "rows":10,
>           "echoParams":"explicit"},
>         "class":"solr.SearchHandler"},
>       "/update":{
>         "useParams":"_UPDATE",
>         "class":"solr.UpdateRequestHandler",
>         "name":"/update"},
>       "/update/json":{
>         "useParams":"_UPDATE_JSON",
>         "class":"solr.UpdateRequestHandler",
>         "invariants":{"update.contentType":"application/json"},
>         "name":"/update/json"},
>       "/update/csv":{
>         "useParams":"_UPDATE_CSV",
>         "class":"solr.UpdateRequestHandler",
>         "invariants":{"update.contentType":"application/csv"},
>         "name":"/update/csv"},
>       "/update/json/docs":{
>         "useParams":"_UPDATE_JSON_DOCS",
>         "class":"solr.UpdateRequestHandler",
>         "invariants":{
>           "update.contentType":"application/json",
>           "json.command":"false"},
>         "name":"/update/json/docs"},
>       "update":{
>         "class":"solr.UpdateRequestHandlerApi",
>         "useParams":"_UPDATE_JSON_DOCS",
>         "name":"update"},
>       "/config":{
>         "useParams":"_CONFIG",
>         "class":&qu
Re: OOM during indexing with 24G heap - Solr 6.5.1
foHandler",
        "useParams":"_ADMIN_SYSTEM",
        "name":"/admin/system"},
      "/admin/mbeans":{
        "class":"solr.SolrInfoMBeanHandler",
        "useParams":"_ADMIN_MBEANS",
        "name":"/admin/mbeans"},
      "/admin/plugins":{
        "class":"solr.PluginInfoHandler",
        "name":"/admin/plugins"},
      "/admin/threads":{
        "class":"solr.ThreadDumpHandler",
        "useParams":"_ADMIN_THREADS",
        "name":"/admin/threads"},
      "/admin/properties":{
        "class":"solr.PropertiesRequestHandler",
        "useParams":"_ADMIN_PROPERTIES",
        "name":"/admin/properties"},
      "/admin/logging":{
        "class":"solr.LoggingHandler",
        "useParams":"_ADMIN_LOGGING",
        "name":"/admin/logging"},
      "/admin/file":{
        "class":"solr.ShowFileRequestHandler",
        "useParams":"_ADMIN_FILE",
        "name":"/admin/file"},
      "/export":{
        "class":"solr.ExportHandler",
        "useParams":"_EXPORT",
        "components":["query"],
        "defaults":{"wt":"json"},
        "invariants":{
          "rq":"{!xport}",
          "distrib":false},
        "name":"/export"},
      "/graph":{
        "class":"solr.GraphHandler",
        "useParams":"_ADMIN_GRAPH",
        "invariants":{
          "wt":"graphml",
          "distrib":false},
        "name":"/graph"},
      "/stream":{
        "class":"solr.StreamHandler",
        "useParams":"_STREAM",
        "defaults":{"wt":"json"},
        "invariants":{"distrib":false},
        "name":"/stream"},
      "/sql":{
        "class":"solr.SQLHandler",
        "useParams":"_SQL",
        "defaults":{"wt":"json"},
        "invariants":{"distrib":false},
        "name":"/sql"},
      "/terms":{
        "class":"solr.SearchHandler",
        "useParams":"_TERMS",
        "components":["terms"],
        "name":"/terms"},
      "/analysis/document":{
        "class":"solr.DocumentAnalysisRequestHandler",
        "startup":"lazy",
        "useParams":"_ANALYSIS_DOCUMENT",
        "name":"/analysis/document"},
      "/analysis/field":{
        "class":"solr.FieldAnalysisRequestHandler",
        "startup":"lazy",
        "useParams":"_ANALYSIS_FIELD",
        "name":"/analysis/field"},
      "/debug/dump":{
        "class":"solr.DumpRequestHandler",
        "useParams":"_DEBUG_DUMP",
        "defaults":{
          "echoParams":"explicit",
          "echoHandler":true},
        "name":"/debug/dump"}},
    "updateRequestProcessorChain":[{
      "default":"true",
      "name":"customupdatechain",
      "":[{"class":"org.apache.solr.update.processor.CustomDedupProcessorFactory"},
        {"class":"solr.LogUpdateProcessorFactory"},
        {"class":"solr.RunUpdateProcessorFactory"}]}],
    "updateHandlerupdateLog":{
      "dir":"",
      "numVersionBuckets":65536},
    "requestDispatcher":{
      "handleSelect":true,
      "httpCaching":{
        "never304":false,
        "etagSeed":"Solr",
        "lastModFrom":"opentime",
        "cacheControl":null},
      "requestParsers":{
        "multipartUploadLimitKB":2048,
        "formUploadLimitKB":2048,
        "addHttpRequestToContext":false}},
    "indexConfig":{
      "useCompoundFile":false,
      "maxBufferedDocs":-1,
      "maxMergeDocs":-1,
      "mergeFactor":-1,
      "ramBufferSizeMB":100.0,
      "writeLockTimeout":-1,
      "lockType":"native",
      "infoStreamEnabled":false,
      "metrics":{}},
    "peerSync":{"useRangeVersions":true}}}

On Mon, Oct 16, 2017 at 3:38 PM Shawn Heisey wrote:

> On 10/16/2017 3:19 PM, Randy Fradin wrote:
> > We are seeing a lot of full GC events and eventual OOM errors in Solr
> > during indexing. This is Solr 6.5.1 running in cloud mode with a 24G
> > heap. At these times indexing is the only activity taking place. The
> > collection has 4 shards and 2 replicas across 3 nodes. Each document is
> > ~10KB (a few hundred fields each), and indexing is using the normal
> > update handler, 1 document per request, up to 240 requests at a time.
> >
> > The heap dump taken automatically on OOM shows 18.3GB of heap taken by 3
> > instances of DocumentsWriter. Within those instances, all of the heap is
> > retained by the blockedFlushes LinkedList inside the flushControl
> > object. Each node in the LinkedList appears to be retaining around 55MB.
> >
> > Clearly something to do with flushing is at play here but I'm at a loss
> > as to what tuning parameters I should be looking at. I would expect
> > things to start blocking if I fall too far behind on flushing but
> > apparently that's not happening. The ramBufferSizeMB is set to the
> > default 100. My heap size is already absurdly more than I thought we
> > would need for this volume.
> One of the first things we need to find out is about your index size.
>
> In each of your shards, how many documents are there?  How much disk
> space does one shard replica take up?  How many shard replica cores does
> each node have on it in total?
>
> I would also like to get a look at your full solrconfig.xml file.  The
> schema may be helpful at a later date, along with an example of a
> document that you're indexing.  With ramBufferSizeMB at the default,
> having a ton of memory used up by a class used for indexing seems very odd.
>
> Do you have the text of the OOM exception?  Is it saying out of heap
> space, or some other problem?
>
> Thanks,
> Shawn
OOM during indexing with 24G heap - Solr 6.5.1
We are seeing a lot of full GC events and eventual OOM errors in Solr during indexing. This is Solr 6.5.1 running in cloud mode with a 24G heap. At these times indexing is the only activity taking place. The collection has 4 shards and 2 replicas across 3 nodes. Each document is ~10KB (a few hundred fields each), and indexing is using the normal update handler, 1 document per request, up to 240 requests at a time.

The heap dump taken automatically on OOM shows 18.3GB of heap taken by 3 instances of DocumentsWriter. Within those instances, all of the heap is retained by the blockedFlushes LinkedList inside the flushControl object. Each node in the LinkedList appears to be retaining around 55MB.

Clearly something to do with flushing is at play here, but I'm at a loss as to what tuning parameters I should be looking at. I would expect things to start blocking if I fall too far behind on flushing, but apparently that's not happening. The ramBufferSizeMB is set to the default 100. My heap size is already absurdly more than I thought we would need for this volume. Any idea what could be causing this?
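One mitigation consistent with the later analysis in this thread is to cap client-side indexing concurrency well below 240. A minimal sketch of the gating pattern, where send_update() is a stand-in for the real HTTP POST to /update (the names and the cap of 16 are illustrative, not a Solr API or a recommended value):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_IN_FLIGHT = 16  # illustrative cap, far below the 240 used in this thread
gate = threading.Semaphore(MAX_IN_FLIGHT)

def send_update(doc):
    # Placeholder for the real single-document POST to the /update handler.
    return f"indexed {doc['id']}"

def guarded_send(doc):
    # Blocks once MAX_IN_FLIGHT requests are already in flight, so the
    # number of concurrent updates hitting one node stays bounded even if
    # the caller's own thread count is much larger.
    with gate:
        return send_update(doc)

docs = [{"id": str(i)} for i in range(100)]
with ThreadPoolExecutor(max_workers=64) as pool:
    results = list(pool.map(guarded_send, docs))
```

A bounded executor alone would also cap concurrency; the semaphore is shown separately because it still works when requests originate from many independent threads or pools.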