Re: OOM during indexing with 24G heap - Solr 6.5.1

2017-10-20 Thread Shawn Heisey
On 10/16/2017 5:38 PM, Randy Fradin wrote:
> Each shard has around 4.2 million documents which are around 40GB on disk.
> Two nodes have 3 shard replicas each and the third has 2 shard replicas.
>
> The text of the exception is: java.lang.OutOfMemoryError: Java heap space
> And the heap dump is a full 24GB indicating the full heap space was being
> used.
>
> Here is the solrconfig as output by the config request handler:

I was hoping for the actual XML, but I don't see any red flags in the
output you've provided.  It does look like you've probably got a very
minimal configuration.  Some things that I expected to see (and do see
on my own systems) aren't in the handler output at all.

With only 12 million docs on the machine, I would not expect any need
for 24GB of heap except in the case of a large number of particularly
RAM-hungry complex queries.  The ratio of index size to document count
says that the documents are bigger than what I think of as typical, but
not what I would call enormous.  If there's any way you can adjust your
schema to remove unused parts and reduce the index size, that would be a
good idea, but I don't consider that to be an immediate action item. 
Your index size is well within what Solr should be able to handle easily
-- if there are sufficient system resources, memory in particular.
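
As a rough sketch of that ratio, using only the figures quoted above
(decimal gigabytes and the variable names are assumptions made for
illustration; index bytes per document is not the same thing as the size
of the source document):

```python
# Figures quoted in this thread.
docs_per_shard = 4_200_000        # "around 4.2 million documents"
shard_size_bytes = 40 * 10**9     # "around 40GB on disk" (decimal GB assumed)
replicas_on_node = 3              # the two busiest nodes hold 3 shard replicas

# Average on-disk index footprint per document.
bytes_per_doc = shard_size_bytes / docs_per_shard

# Totals for one of the nodes holding 3 replicas.
docs_on_node = docs_per_shard * replicas_on_node
index_on_node_gb = 40 * replicas_on_node

print(f"index bytes per document: {bytes_per_doc:,.0f}")
print(f"documents on a 3-replica node: {docs_on_node:,}")
print(f"index on a 3-replica node: {index_on_node_gb} GB")
```

That works out to a little under 10KB of index per document and roughly
12.6 million documents on the larger nodes.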

The 6.5.1 version of Solr that you're running should have most known
memory leak issues fixed -- and there are not many of those.  I'm not
aware of any leak problems that would affect Lucene's DocumentsWriter
class, where you said most of the heap was being consumed.  That doesn't
necessarily mean there isn't a leak bug that applies, just that I am not
aware of any.

You have indicated that you're doing a very large number of concurrent
update requests, up to 240 at the same time.  I cannot imagine a
situation where Lucene would require a buffer (100 MB in your config)
for every indexing thread.  That would really cause some major memory
issues with Lucene and Solr installations.

Your description of what you have in your heap sounds a little bit
different than a buffer per indexing thread.  It sounds like your
indexing has resulted in a LOT of flushes, which is probably normal,
except that the flush queue doesn't appear to be getting emptied.  If
I'm right, either your indexing is happening faster than Lucene can
flush the segments that get built, or there is something preventing
Lucene from actually doing the flush.  I do not see any indication in
the code that Lucene ever imposes a limit on the number of queued
flushes, but in a system that's working correctly, it probably doesn't
have to.  My theories here should be validated by somebody who has much
better insight into Lucene than I do.

I'm interested in seeing some details about the system and the processes
running.  What OS is this running on?  If it's something other than
Windows, you probably have the "top" utility installed.  The GNU version
of top has a keyboard shortcut (shift-M) to sort by memory usage.  If
it's available, run top (not htop or any other variant), press the key
to sort by memory, and grab a screenshot.

On recent versions of Windows, there's a program called Resource
Monitor.  If you're on Windows, run that program, click on the memory
tab, sort by Private, make sure that the memory graph and MB counts
below the process list are fully visible, and grab a screenshot.

It is unlikely that you'll be able to send a screenshot image to the
list, so you'll probably need a file sharing website.

Thanks,
Shawn



Re: [EXTERNAL] Re: OOM during indexing with 24G heap - Solr 6.5.1

2017-10-17 Thread Nawab Zada Asad Iqbal
Randy

That is one issue; I don't know whether it fixes everything for you or not.
However, Lucene doesn't put a limit on the number of incoming requests, and
after https://issues.apache.org/jira/browse/LUCENE-6659 , Solr has no way
(that I know of, at least) to limit threads. So if you have a ton of
parallel updates reaching the Solr web server, it can cause a performance
problem.
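
One way to cap the number of in-flight updates on the client side is a
fixed-size worker pool. A minimal sketch, assuming a `send` callable that
posts one document to Solr (hypothetical here; `FakeSolr` is a stand-in so
the example runs without a server, and the limit of 8 is an arbitrary
illustration, not a recommendation):

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def index_all(docs, send, max_in_flight=8):
    """Call send(doc) for every doc, with at most max_in_flight calls running at once."""
    # The pool size is the cap: a worker only picks up the next document
    # after its previous send() has returned.
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        return list(pool.map(send, docs))


class FakeSolr:
    """Stand-in for a Solr node; records the peak number of simultaneous requests."""

    def __init__(self):
        self._lock = threading.Lock()
        self.in_flight = 0
        self.peak = 0

    def send(self, doc):
        with self._lock:
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
        time.sleep(0.001)  # pretend the update request takes a moment
        with self._lock:
            self.in_flight -= 1
        return doc["id"]


solr = FakeSolr()
ids = index_all([{"id": i} for i in range(200)], solr.send, max_in_flight=8)
print(f"indexed {len(ids)} docs, peak concurrent requests: {solr.peak}")
```

With a real client, `send` would be whatever function issues the update
request; the point is only that the caller, not Solr, enforces the limit.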




On Tue, Oct 17, 2017 at 10:52 AM, Randy Fradin wrote:

> I've been trying to understand DocumentsWriterFlushControl.java to figure
> this one out. I don't really have a firm grasp of it but I'm starting to
> suspect that blocked flushes in aggregate can take up to (ramBufferSizeMB *
> maximum # of concurrent update requests * # of cores) of heap space and
> that I need to limit how many concurrent update requests are sent to the
> same Solr node at the same time to something much lower than my current
> 240. I don't know this for sure; it is mostly a guess based on the fact
> that one of the DocumentsWriter instances in my heap dump has just under
> 240 items in the blockedFlushes list and each of those is retaining up to
> 57MB of heap space (which is less than ramBufferSizeMB=100 but in the
> ballpark).
>
> Can anyone shed light on whether I'm going down the right path here?

Re: [EXTERNAL] Re: OOM during indexing with 24G heap - Solr 6.5.1

2017-10-17 Thread Randy Fradin
I've been trying to understand DocumentsWriterFlushControl.java to figure
this one out. I don't really have a firm grasp of it but I'm starting to
suspect that blocked flushes in aggregate can take up to (ramBufferSizeMB *
maximum # of concurrent update requests * # of cores) of heap space and
that I need to limit how many concurrent update requests are sent to the
same Solr node at the same time to something much lower than my current
240. I don't know this for sure; it is mostly a guess based on the fact
that one of the DocumentsWriter instances in my heap dump has just under
240 items in the blockedFlushes list and each of those is retaining up to
57MB of heap space (which is less than ramBufferSizeMB=100 but in the
ballpark).

Can anyone shed light on whether I'm going down the right path here?
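
As a rough sketch of the arithmetic behind that guess (the formula is the
unconfirmed suspicion above, not documented Lucene behavior; the 12 GB
budget and the helper name `max_concurrent_for` are illustrative
assumptions):

```python
MB_PER_GB = 1024

# Figures reported from the heap dump and config in this thread.
blocked_flushes = 240          # "just under 240" entries in one blockedFlushes list
retained_mb_each = 57          # each entry retains "up to 57MB"
ram_buffer_mb = 100            # ramBufferSizeMB
max_concurrent_updates = 240   # peak concurrent update requests
cores_on_node = 3              # shard replicas on the busiest nodes

# Upper bound on what one DocumentsWriter was retaining in the dump.
observed_one_writer_gb = blocked_flushes * retained_mb_each / MB_PER_GB

# Suspected ceiling: ramBufferSizeMB * concurrent requests * cores.
suspected_ceiling_gb = (
    ram_buffer_mb * max_concurrent_updates * cores_on_node / MB_PER_GB
)


def max_concurrent_for(heap_budget_gb):
    """Largest request count whose suspected ceiling fits in the given budget."""
    return int(heap_budget_gb * MB_PER_GB // (ram_buffer_mb * cores_on_node))


print(f"one DocumentsWriter, observed upper bound: {observed_one_writer_gb:.1f} GB")
print(f"suspected ceiling for the node: {suspected_ceiling_gb:.1f} GB")
print(f"requests that fit a 12 GB budget: {max_concurrent_for(12)}")
```

If the suspicion holds, the ceiling at 240 concurrent requests is far
above a 24G heap, which would be consistent with the OOM.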



Re: [EXTERNAL] Re: OOM during indexing with 24G heap - Solr 6.5.1

2017-10-16 Thread David M Giannone






 Original message 
From: Randy Fradin 
Date: 10/16/17 7:38 PM (GMT-05:00)
To: solr-user@lucene.apache.org
Subject: [EXTERNAL] Re: OOM during indexing with 24G heap - Solr 6.5.1


Re: OOM during indexing with 24G heap - Solr 6.5.1

2017-10-16 Thread Randy Fradin
Each shard has around 4.2 million documents which are around 40GB on disk.
Two nodes have 3 shard replicas each and the third has 2 shard replicas.

The text of the exception is: java.lang.OutOfMemoryError: Java heap space
And the heap dump is a full 24GB indicating the full heap space was being
used.

Here is the solrconfig as output by the config request handler:

{
  "responseHeader":{
    "status":0,
    "QTime":0},
  "config":{
    "znodeVersion":0,
    "luceneMatchVersion":"org.apache.lucene.util.Version:6.5.1",
    "updateHandler":{
      "indexWriter":{"closeWaitsForMerges":true},
      "commitWithin":{"softCommit":true},
      "autoCommit":{
        "maxDocs":5,
        "maxTime":30,
        "openSearcher":false},
      "autoSoftCommit":{
        "maxDocs":-1,
        "maxTime":3}},
    "query":{
      "useFilterForSortedQuery":false,
      "queryResultWindowSize":1,
      "queryResultMaxDocsCached":2147483647,
      "enableLazyFieldLoading":false,
      "maxBooleanClauses":1024,
      "":{
        "size":"1",
        "showItems":"-1",
        "initialSize":"10",
        "name":"fieldValueCache"}},
    "jmx":{
      "agentId":null,
      "serviceUrl":null,
      "rootName":null},
    "requestHandler":{
      "/select":{
        "name":"/select",
        "defaults":{
          "rows":10,
          "echoParams":"explicit"},
        "class":"solr.SearchHandler"},
      "/update":{
        "useParams":"_UPDATE",
        "class":"solr.UpdateRequestHandler",
        "name":"/update"},
      "/update/json":{
        "useParams":"_UPDATE_JSON",
        "class":"solr.UpdateRequestHandler",
        "invariants":{"update.contentType":"application/json"},
        "name":"/update/json"},
      "/update/csv":{
        "useParams":"_UPDATE_CSV",
        "class":"solr.UpdateRequestHandler",
        "invariants":{"update.contentType":"application/csv"},
        "name":"/update/csv"},
      "/update/json/docs":{
        "useParams":"_UPDATE_JSON_DOCS",
        "class":"solr.UpdateRequestHandler",
        "invariants":{
          "update.contentType":"application/json",
          "json.command":"false"},
        "name":"/update/json/docs"},
      "update":{
        "class":"solr.UpdateRequestHandlerApi",
        "useParams":"_UPDATE_JSON_DOCS",
        "name":"update"},
      "/config":{
        "useParams":"_CONFIG",
        "class":"solr.SolrConfigHandler",
        "name":"/config"},
      "/schema":{
        "class":"solr.SchemaHandler",
        "useParams":"_SCHEMA",
        "name":"/schema"},
      "/replication":{
        "class":"solr.ReplicationHandler",
        "useParams":"_REPLICATION",
        "name":"/replication"},
      "/get":{
        "class":"solr.RealTimeGetHandler",
        "useParams":"_GET",
        "defaults":{
          "omitHeader":true,
          "wt":"json",
          "indent":true},
        "name":"/get"},
      "/admin/ping":{
        "class":"solr.PingRequestHandler",
        "useParams":"_ADMIN_PING",
        "invariants":{
          "echoParams":"all",
          "q":"{!lucene}*:*"},
        "name":"/admin/ping"},
      "/admin/segments":{
        "class":"solr.SegmentsInfoRequestHandler",
        "useParams":"_ADMIN_SEGMENTS",
        "name":"/admin/segments"},
      "/admin/luke":{
        "class":"solr.LukeRequestHandler",
        "useParams":"_ADMIN_LUKE",
        "name":"/admin/luke"},
      "/admin/system":{
        "class":"solr.SystemInfoHandler",
        "useParams":"_ADMIN_SYSTEM",
        "name":"/admin/system"},
      "/admin/mbeans":{
        "class":"solr.SolrInfoMBeanHandler",
        "useParams":"_ADMIN_MBEANS",
        "name":"/admin/mbeans"},
      "/admin/plugins":{
        "class":"solr.PluginInfoHandler",
        "name":"/admin/plugins"},
      "/admin/threads":{
        "class":"solr.ThreadDumpHandler",
        "useParams":"_ADMIN_THREADS",
        "name":"/admin/threads"},
      "/admin/properties":{
        "class":"solr.PropertiesRequestHandler",
        "useParams":"_ADMIN_PROPERTIES",
        "name":"/admin/properties"},
      "/admin/logging":{
        "class":"solr.LoggingHandler",
        "useParams":"_ADMIN_LOGGING",
        "name":"/admin/logging"},
      "/admin/file":{
        "class":"solr.ShowFileRequestHandler",
        "useParams":"_ADMIN_FILE",
        "name":"/admin/file"},
      "/export":{
        "class":"solr.ExportHandler",
        "useParams":"_EXPORT",
        "components":["query"],
        "defaults":{"wt":"json"},
        "invariants":{
          "rq":"{!xport}",
          "distrib":false},
        "name":"/export"},
      "/graph":{
        "class":"solr.GraphHandler",
        "useParams":"_ADMIN_GRAPH",
        "invariants":{
          "wt":"graphml",
          "distrib":false},
        "name":"/graph"},
      "/stream":{
        "class":"solr.StreamHandler",
        "useParams":"_STREAM",
        "defaults":{"wt":"json"},
        "invariants":{"distrib":false},
        "name":"/stream"},
      "/sql":{
        "class"

Re: OOM during indexing with 24G heap - Solr 6.5.1

2017-10-16 Thread Shawn Heisey
On 10/16/2017 3:19 PM, Randy Fradin wrote:
> We are seeing a lot of full GC events and eventual OOM errors in Solr
> during indexing. This is Solr 6.5.1 running in cloud mode with a 24G heap.
> At these times indexing is the only activity taking place. The collection
> has 4 shards and 2 replicas across 3 nodes. Each document is ~10KB (a few
> hundred fields each), and indexing is using the normal update handler, 1
> document per request, up to 240 requests at a time.
>
> The heap dump taken automatically on OOM shows 18.3GB of heap taken by 3
> instances of DocumentsWriter. Within those instances, all of the heap is
> retained by the blockedFlushes LinkedList inside the flushControl object.
> Each node in the LinkedList appears to be retaining around 55MB.
>
> Clearly something to do with flushing is at play here but I'm at a loss
> what tuning parameters I should be looking at. I would expect things to
> start blocking if I fall too far behind on flushing but apparently that's
> not happening. The ramBufferSizeMB is set to the default 100. My heap size
> is already absurdly more than I thought we would need for this volume.

One of the first things we need to find out is about your index size.

In each of your shards, how many documents are there?  How much disk
space does one shard replica take up?  How many shard replica cores does
each node have on it in total?

I would also like to get a look at your full solrconfig.xml file.  The
schema may be helpful at a later date, along with an example of a
document that you're indexing.  With ramBufferSizeMB at the default,
having a ton of memory used up by a class used for indexing seems very odd.

Do you have the text of the OOM exception? Is it saying out of heap
space, or some other problem?

Thanks,
Shawn