Safe to change numVersionBuckets?

2018-03-30 Thread Randy Fradin
I have a SolrCloud cluster (version 6.5.1) with around 3300 cores per
instance. I've been investigating what is driving heap utilization since it
is higher than I expected. I took a heap dump and found the largest driver
of heap utilization is the array of VersionBucket objects in the
org.apache.solr.update.VersionInfo class. The array is size 65536 and there
is one per SolrCore instance. Each instance of the array is 1.8MB so the
aggregate size is 6GB in heap.

I understand from reading the discussion in SOLR-6820 that 65536 is the
recommended default for this setting now because it results in higher
document write rates than the old default of 256. I would like to reduce my
heap utilization and I'm OK with somewhat slower document writing
throughput. My question is: is it safe to reduce the value
of numVersionBuckets on all of my existing cores without reindexing my data?
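As a back-of-envelope check of those numbers (the 1.8MB-per-core figure is from my heap dump; the assumption that the array cost scales linearly with bucket count is mine):

```python
# Sanity check of the figures above. per_core_bytes is the observed size of
# one VersionBucket[] array; linear scaling with bucket count is an assumption.
buckets = 65536
per_core_bytes = 1.8 * 1024**2
cores = 3300

total_gb = per_core_bytes * cores / 1024**3
print(round(total_gb, 1))            # ~5.8 GB aggregate, matching the ~6GB observed

reduced_mb = total_gb * 1024 * 256 / buckets
print(round(reduced_mb, 1))          # ~23.2 MB aggregate at the old default of 256
```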

My solrconfig.xml contains this for all of my collections:

  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
    <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
  </updateLog>

Assuming it is safe to change, can I just add a JVM argument to the Solr
process like "-Dsolr.ulog.numVersionBuckets=256" to override the value for all
cores at once? Or do I have to change and re-upload the solrconfig.xml
files and reload the cores?

Thanks


Re: [EXTERNAL] Re: OOM during indexing with 24G heap - Solr 6.5.1

2017-10-17 Thread Randy Fradin
I've been trying to understand DocumentsWriterFlushControl.java to figure
this one out. I don't really have a firm grasp of it but I'm starting to
suspect that blocked flushes in aggregate can take up to (ramBufferSizeMB *
maximum # of concurrent update requests * # of cores) of heap space and
that I need to limit how many concurrent update requests are sent to the
same Solr node at the same time to something much lower than my current
240. I don't know this for sure; it is mostly a guess based on the fact
that one of the DocumentsWriter instances in my heap dump has just under
240 items in the blockedFlushes list and each of those is retaining up to
57MB of heap space (which is less than ramBufferSizeMB=100 but in the
ballpark).
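
As a rough sanity check on that guess (both numbers taken from the heap dump):

```python
# Rough check of the blocked-flush hypothesis using heap-dump figures.
blocked_flushes = 240   # items seen in one DocumentsWriter's blockedFlushes list
mb_per_flush = 57       # retained heap per list entry (< ramBufferSizeMB=100)

gb_retained = blocked_flushes * mb_per_flush / 1024
print(round(gb_retained, 1))   # ~13.4 GB retained by a single DocumentsWriter
```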

Can anyone shed light on whether I'm going down the right path here?


On Mon, Oct 16, 2017 at 5:34 PM David M Giannone 
wrote:

> ---- Original message 
> From: Randy Fradin 
> Date: 10/16/17 7:38 PM (GMT-05:00)
> To: solr-user@lucene.apache.org
> Subject: [EXTERNAL] Re: OOM during indexing with 24G heap - Solr 6.5.1
>
> Each shard has around 4.2 million documents which are around 40GB on disk.
> Two nodes have 3 shard replicas each and the third has 2 shard replicas.
>
> The text of the exception is: java.lang.OutOfMemoryError: Java heap space
> And the heap dump is a full 24GB indicating the full heap space was being
> used.
>
> Here is the solrconfig as output by the config request handler:
>
> {
>   "responseHeader":{
> "status":0,
> "QTime":0},
>   "config":{
> "znodeVersion":0,
> "luceneMatchVersion":"org.apache.lucene.util.Version:6.5.1",
> "updateHandler":{
>   "indexWriter":{"closeWaitsForMerges":true},
>   "commitWithin":{"softCommit":true},
>   "autoCommit":{
> "maxDocs":5,
> "maxTime":30,
> "openSearcher":false},
>   "autoSoftCommit":{
> "maxDocs":-1,
> "maxTime":3}},
> "query":{
>   "useFilterForSortedQuery":false,
>   "queryResultWindowSize":1,
>   "queryResultMaxDocsCached":2147483647,
>   "enableLazyFieldLoading":false,
>   "maxBooleanClauses":1024,
>   "":{
> "size":"1",
> "showItems":"-1",
> "initialSize":"10",
> "name":"fieldValueCache"}},
> "jmx":{
>   "agentId":null,
>   "serviceUrl":null,
>   "rootName":null},
> "requestHandler":{
>   "/select":{
> "name":"/select",
> "defaults":{
>   "rows":10,
>   "echoParams":"explicit"},
> "class":"solr.SearchHandler"},
>   "/update":{
> "useParams":"_UPDATE",
> "class":"solr.UpdateRequestHandler",
> "name":"/update"},
>   "/update/json":{
> "useParams":"_UPDATE_JSON",
> "class":"solr.UpdateRequestHandler",
> "invariants":{"update.contentType":"application/json"},
> "name":"/update/json"},
>   "/update/csv":{
> "useParams":"_UPDATE_CSV",
> "class":"solr.UpdateRequestHandler",
> "invariants":{"update.contentType":"application/csv"},
> "name":"/update/csv"},
>   "/update/json/docs":{
> "useParams":"_UPDATE_JSON_DOCS",
> "class":"solr.UpdateRequestHandler",
> "invariants":{
>   "update.contentType":"application/json",
>   "json.command":"false"},
> "name":"/update/json/docs"},
>   "update":{
> "class":"solr.UpdateRequestHandlerApi",
> "useParams":"_UPDATE_JSON_DOCS",
> "name":"update"},
>   "/config":{
> "useParams":"_CONFIG",
> "class":"

Re: OOM during indexing with 24G heap - Solr 6.5.1

2017-10-16 Thread Randy Fradin
foHandler",
"useParams":"_ADMIN_SYSTEM",
"name":"/admin/system"},
  "/admin/mbeans":{
"class":"solr.SolrInfoMBeanHandler",
"useParams":"_ADMIN_MBEANS",
"name":"/admin/mbeans"},
  "/admin/plugins":{
"class":"solr.PluginInfoHandler",
"name":"/admin/plugins"},
  "/admin/threads":{
"class":"solr.ThreadDumpHandler",
"useParams":"_ADMIN_THREADS",
"name":"/admin/threads"},
  "/admin/properties":{
"class":"solr.PropertiesRequestHandler",
"useParams":"_ADMIN_PROPERTIES",
"name":"/admin/properties"},
  "/admin/logging":{
"class":"solr.LoggingHandler",
"useParams":"_ADMIN_LOGGING",
"name":"/admin/logging"},
  "/admin/file":{
"class":"solr.ShowFileRequestHandler",
"useParams":"_ADMIN_FILE",
"name":"/admin/file"},
  "/export":{
"class":"solr.ExportHandler",
"useParams":"_EXPORT",
"components":["query"],
"defaults":{"wt":"json"},
"invariants":{
  "rq":"{!xport}",
  "distrib":false},
"name":"/export"},
  "/graph":{
"class":"solr.GraphHandler",
"useParams":"_ADMIN_GRAPH",
"invariants":{
  "wt":"graphml",
  "distrib":false},
"name":"/graph"},
  "/stream":{
"class":"solr.StreamHandler",
"useParams":"_STREAM",
"defaults":{"wt":"json"},
"invariants":{"distrib":false},
"name":"/stream"},
  "/sql":{
"class":"solr.SQLHandler",
"useParams":"_SQL",
"defaults":{"wt":"json"},
"invariants":{"distrib":false},
"name":"/sql"},
  "/terms":{
"class":"solr.SearchHandler",
"useParams":"_TERMS",
"components":["terms"],
"name":"/terms"},
  "/analysis/document":{
"class":"solr.DocumentAnalysisRequestHandler",
"startup":"lazy",
"useParams":"_ANALYSIS_DOCUMENT",
"name":"/analysis/document"},
  "/analysis/field":{
"class":"solr.FieldAnalysisRequestHandler",
"startup":"lazy",
"useParams":"_ANALYSIS_FIELD",
"name":"/analysis/field"},
  "/debug/dump":{
"class":"solr.DumpRequestHandler",
"useParams":"_DEBUG_DUMP",
"defaults":{
  "echoParams":"explicit",
  "echoHandler":true},
"name":"/debug/dump"}},
"updateRequestProcessorChain":[{
"default":"true",
"name":"customupdatechain",

"":[{"class":"org.apache.solr.update.processor.CustomDedupProcessorFactory"},
  {"class":"solr.LogUpdateProcessorFactory"},
  {"class":"solr.RunUpdateProcessorFactory"}]}],
"updateHandlerupdateLog":{
  "dir":"",
  "numVersionBuckets":65536},
"requestDispatcher":{
  "handleSelect":true,
  "httpCaching":{
"never304":false,
"etagSeed":"Solr",
"lastModFrom":"opentime",
"cacheControl":null},
  "requestParsers":{
"multipartUploadLimitKB":2048,
"formUploadLimitKB":2048,
"addHttpRequestToContext":false}},
"indexConfig":{
  "useCompoundFile":false,
  "maxBufferedDocs":-1,
  "maxMergeDocs":-1,
  "mergeFactor":-1,
  "ramBufferSizeMB":100.0,
  "writeLockTimeout":-1,
  "lockType":"native",
  "infoStreamEnabled":false,
  "metrics":{}},
"peerSync":{"useRangeVersions":true}}}



On Mon, Oct 16, 2017 at 3:38 PM Shawn Heisey  wrote:

> On 10/16/2017 3:19 PM, Randy Fradin wrote:
> > We are seeing a lot of full GC events and eventual OOM errors in Solr
> > during indexing. This is Solr 6.5.1 running in cloud mode with a 24G
> heap.
> > At these times indexing is the only activity taking place. The collection
> > has 4 shards and 2 replicas across 3 nodes. Each document is ~10KB (a few
> > hundred fields each), and indexing is using the normal update handler, 1
> > document per request, up to 240 requests at a time.
> >
> > The heap dump taken automatically on OOM shows 18.3GB of heap taken by 3
> > instances of DocumentsWriter. Within those instances, all of the heap is
> > retained by the blockedFlushes LinkedList inside the flushControl object.
> > Each node in the LinkedList appears to be retaining around 55MB.
> >
> > Clearly something to do with flushing is at play here but I'm at a loss
> > what tuning parameters I should be looking at. I would expect things to
> > start blocking if I fall too far behind on flushing but apparently that's
> > not happening. The ramBufferSizeMB is set to the default 100. My heap
> size
> > is already absurdly more than I thought we would need for this volume.
>
> One of the first things we need to find out is about your index size.
>
> In each of your shards, how many documents are there?  How much disk
> space does one shard replica take up?  How many shard replica cores does
> each node have on it in total?
>
> I would also like to get a look at your full solrconfig.xml file.  The
> schema may be helpful at a later date, along with an example of a
> document that you're indexing.  With ramBufferSizeMB at the default,
> having a ton of memory used up by a class used for indexing seems very odd.
>
> Do you have the text of the OOM exception? Is it saying out of heap
> space, or some other problem?
>
> Thanks,
> Shawn
>
>


OOM during indexing with 24G heap - Solr 6.5.1

2017-10-16 Thread Randy Fradin
We are seeing a lot of full GC events and eventual OOM errors in Solr
during indexing. This is Solr 6.5.1 running in cloud mode with a 24G heap.
At these times indexing is the only activity taking place. The collection
has 4 shards and 2 replicas across 3 nodes. Each document is ~10KB (a few
hundred fields each), and indexing is using the normal update handler, 1
document per request, up to 240 requests at a time.

The heap dump taken automatically on OOM shows 18.3GB of heap taken by 3
instances of DocumentsWriter. Within those instances, all of the heap is
retained by the blockedFlushes LinkedList inside the flushControl object.
Each node in the LinkedList appears to be retaining around 55MB.
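
To put rough numbers on that (figures from the heap dump):

```python
# Decomposing the 18.3 GB retained by the three DocumentsWriter instances.
total_gb = 18.3
writers = 3
mb_per_node = 55        # retained heap per blockedFlushes list node

nodes_per_list = total_gb / writers * 1024 / mb_per_node
print(round(nodes_per_list))   # ~114 blocked flushes queued per writer
```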

Clearly something to do with flushing is at play here but I'm at a loss
what tuning parameters I should be looking at. I would expect things to
start blocking if I fall too far behind on flushing but apparently that's
not happening. The ramBufferSizeMB is set to the default 100. My heap size
is already absurdly more than I thought we would need for this volume.

Any idea what could be causing this?