Re: Solr has a CPU% spike when indexing a batch of data

2016-12-15 Thread forest_soup
Thanks a lot, Shawn.

We'll consider your suggestion to tune our solr servers. Will let you know
the result. 

Thanks!





Re: Solr has a CPU% spike when indexing a batch of data

2016-12-14 Thread Shawn Heisey
On 12/14/2016 1:28 AM, forest_soup wrote:
> We are doing indexing on the same HTTP endpoint. But since we have shardnum=1 and
> replicafactor=1, each collection only has one core, so there should be no
> distributed update/query; we are using SolrJ's CloudSolrClient, which looks up
> the target URL of the Solr node for each collection before sending the request.
>
> For the questions:
> * What is the total physical memory in the machine? 
> 128GB
>
> * What is the max heap on each of the two Solr processes? 
> 32GB for each 
>
> * What is the total index size in each Solr process?
> Each Solr node (process) has 16 cores, with 130GB for each Solr core, so in
> total >2000GB for each Solr node.

This means that you have approximately 64GB left  for your OS after
deducting the heap sizes, which it must use for itself and for OS disk
caching.  With nearly 2 terabytes of index data on the machine, 64GB is
nowhere near enough for good performance.  The server will be VERY busy
whenever there is query activity, so the CPU spike is what I would
expect.  For that much index data, I would hope to have somewhere
between 512GB and 2 terabytes of memory.  Adding machines and/or
increasing memory in each machine would make your performance better and
reduce CPU load.

https://wiki.apache.org/solr/SolrPerformanceProblems

> * What is the total tlog size in each Solr process? 
> 25MB for each core, so 400MB in total for each Solr node.
>
> <updateLog>
>   <str name="dir">${solr.ulog.dir:}</str>
>   <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
>   1
>   100
> </updateLog>

Compared to the amount of index data, 400MB is tiny, but this will take
a long time to process on restart.  You might want to consider lowering
the amount of data that the update log keeps so restarts are faster.
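
For illustration, the usual knobs for update log retention are the
numRecordsToKeep and maxNumLogsToKeep options of <updateLog> in solrconfig.xml.
The values below are only an example sketch, not settings taken from this thread:

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
  <!-- example values: keep fewer records and fewer old tlog files so that
       log replay on restart stays short -->
  <int name="numRecordsToKeep">100</int>
  <int name="maxNumLogsToKeep">2</int>
</updateLog>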

> * What are your commit characteristics like -- both manual and automatic. 
>
> <autoCommit>
>   <maxDocs>1</maxDocs>
>   <maxTime>${solr.autoCommit.maxTime:59000}</maxTime>
>   <openSearcher>false</openSearcher>
> </autoCommit>
>
> <autoSoftCommit>
>   <maxDocs>5000</maxDocs>
>   <maxTime>${solr.autoSoftCommit.maxTime:31000}</maxTime>
> </autoSoftCommit>

I would personally remove the "maxDocs" portion of these settings and do
the automatic commits based purely on time.  For the amount of data
you're handling, those are very low maxDocs numbers, and could result in
very frequent commits when you index.  The time values are lower than I
would prefer, but are probably OK.
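
As a sketch of that suggestion -- the same settings with only the maxDocs
triggers removed, keeping the time values quoted above -- it could look like:

<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:59000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:31000}</maxTime>
</autoSoftCommit>

With openSearcher=false, hard commits flush data to stable storage without
opening a new searcher; it is the soft commit (here roughly every 31 seconds)
that makes newly indexed documents visible to queries.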

The number of collections should be no problem.  If there were hundreds
or thousands, that might be different.

Thanks,
Shawn



Re: Solr has a CPU% spike when indexing a batch of data

2016-12-14 Thread forest_soup
Thanks, Shawn!

We are doing indexing on the same HTTP endpoint. But since we have shardnum=1 and
replicafactor=1, each collection only has one core, so there should be no
distributed update/query; we are using SolrJ's CloudSolrClient, which looks up
the target URL of the Solr node for each collection before sending the request.

For the questions:
* What is the total physical memory in the machine? 
128GB

* What is the max heap on each of the two Solr processes? 
32GB for each 

* What is the total index size in each Solr process?
Each Solr node (process) has 16 cores, with 130GB for each Solr core, so in
total >2000GB for each Solr node.
 
* What is the total tlog size in each Solr process? 
25MB for each core, so 400MB in total for each Solr node.


<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
  1
  100
</updateLog>


* What are your commit characteristics like -- both manual and automatic. 


<autoCommit>
  <maxDocs>1</maxDocs>
  <maxTime>${solr.autoCommit.maxTime:59000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxDocs>5000</maxDocs>
  <maxTime>${solr.autoSoftCommit.maxTime:31000}</maxTime>
</autoSoftCommit>



* Do you have WARN or ERROR messages in your logfile? 
No.

* How many collections are in each cloud? 
80 collections, each with only one shard, and replicafactor=1.

* How many servers are in each cloud? 
5 Solr nodes, so each Solr node has 16 cores.






Re: Solr has a CPU% spike when indexing a batch of data

2016-12-13 Thread Shawn Heisey
On 12/13/2016 10:25 AM, Shawn Heisey wrote:
> That stacktrace indicates the thread is doing a query. If most of the
> threads have that stacktrace, it means Solr is handling a lot of
> simultaneous queries. That can cause a CPU spike. I checked one of the
> thread dumps

I didn't complete that sentence.

I checked one of the thread dumps and saw a number of ongoing queries,
plus some delegated requests to other hosts, and some ongoing indexing
requests.  I am unsure what conclusions can be made from what I saw.

I thought of some more questions:

* How many collections are in each cloud?
* How many servers are in each cloud?

Thanks,
Shawn



Re: Solr has a CPU% spike when indexing a batch of data

2016-12-13 Thread Shawn Heisey
On 12/13/2016 6:09 AM, forest_soup wrote:
> I posted this issue to a JIRA. Could anyone help comment? Thanks!
>
> https://issues.apache.org/jira/browse/SOLR-9741

Please use the mailing list *before* opening an issue in Jira.  If at
all possible, we want to be sure that problems are caused by a real bug
in the software before an issue is created.

> When we do a batch of index and search operations against SolrCloud v5.3.2, we
> usually see a CPU% spike lasting about 10 minutes.
> We have 5 physical servers, with 2 Solr instances running on each server on
> different ports (8983 and 8984); all the 8983 instances are in one SolrCloud,
> and all the 8984 instances are in another SolrCloud.
>
> You can see the chart in the attached file screenshot-1.png.
>
> The thread dumps are in the attached file threads.zip.
>
> During the spike, the thread dump shows that most of the threads have the
> call stack below:

That stacktrace indicates the thread is doing a query.  If most of the
threads have that stacktrace, it means Solr is handling a lot of
simultaneous queries.  That can cause a CPU spike.  I checked one of the
thread dumps

Indexing tends to use a lot of resources.  If you are doing all your
indexing to the same HTTP endpoint (in a way that doesn't send the
request to the correct shard leader), that will also make Solr work harder.

You appear to be running Solr with SSL.  This is going to increase CPU
requirements.  I wouldn't expect the increase to be very high, but if
CPU is already a problem, that will make it worse.

Your iowait CPU percentage appears to be nearly nonexistent, so I might
be barking up the wrong tree with some of the following questions, but
I'll go ahead and ask them anyway:

* What is the total physical memory in the machine?
* What is the max heap on each of the two Solr processes?
* What is the total index size in each Solr process?
* What is the total tlog size in each Solr process?
* What are your commit characteristics like -- both manual and automatic.
* Do you have WARN or ERROR messages in your logfile?

Thanks,
Shawn



Solr has a CPU% spike when indexing a batch of data

2016-12-13 Thread forest_soup
Hi, 

I posted this issue to a JIRA. Could anyone help comment? Thanks!

https://issues.apache.org/jira/browse/SOLR-9741

The details:

When we do a batch of index and search operations against SolrCloud v5.3.2, we
usually see a CPU% spike lasting about 10 minutes.
We have 5 physical servers, with 2 Solr instances running on each server on
different ports (8983 and 8984); all the 8983 instances are in one SolrCloud,
and all the 8984 instances are in another SolrCloud.

You can see the chart in the attached file screenshot-1.png.

The thread dumps are in the attached file threads.zip.

During the spike, the thread dump shows that most of the threads have the
call stack below:
"qtp634210724-4759" #4759 prio=5 os_prio=0 tid=0x7fb32803e000 nid=0x64e7
runnable [0x7fb3ef1ef000]
   java.lang.Thread.State: RUNNABLE
        at java.lang.ThreadLocal$ThreadLocalMap.getEntryAfterMiss(ThreadLocal.java:444)
        at java.lang.ThreadLocal$ThreadLocalMap.getEntry(ThreadLocal.java:419)
        at java.lang.ThreadLocal$ThreadLocalMap.access$000(ThreadLocal.java:298)
        at java.lang.ThreadLocal.get(ThreadLocal.java:163)
        at org.apache.solr.search.SolrQueryTimeoutImpl.get(SolrQueryTimeoutImpl.java:49)
        at org.apache.solr.search.SolrQueryTimeoutImpl.shouldExit(SolrQueryTimeoutImpl.java:57)
        at org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.checkAndThrow(ExitableDirectoryReader.java:165)
        at org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.<init>(ExitableDirectoryReader.java:157)
        at org.apache.lucene.index.ExitableDirectoryReader$ExitableTerms.iterator(ExitableDirectoryReader.java:141)
        at org.apache.lucene.index.TermContext.build(TermContext.java:93)
        at org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:192)
        at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855)
        at org.apache.lucene.search.BooleanWeight.<init>(BooleanWeight.java:56)
        at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:203)
        at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855)
        at org.apache.lucene.search.BooleanWeight.<init>(BooleanWeight.java:56)
        at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:203)
        at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855)
        at org.apache.lucene.search.BooleanWeight.<init>(BooleanWeight.java:56)
        at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:203)
        at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855)
        at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:838)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:486)
        at org.apache.solr.search.Grouping.searchWithTimeLimiter(Grouping.java:456)
        at org.apache.solr.search.Grouping.execute(Grouping.java:370)
        at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:496)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277)


