Hi Shawn,

Regarding the high CPU: while troubleshooting, we found that merge threads
keep running and take most of the CPU time (as per VisualVM). GC is not
causing any issue; we use the default GC, and we also tried G1 as you
suggested here
<https://wiki.apache.org/solr/ShawnHeisey#GC_Tuning_for_Solr>

Though merging is only a background process, we suspect it is what is
driving the CPU so high.

We use Solr for real-time indexing and depend on its results immediately in
the UI, so we keep adding documents at around 100 to 200 per second in
parallel, in batches of about 20 Solr documents per add call.

*Note*: the following is the code snippet we use for indexing / adding Solr
documents in batches, per collection:

for (SolrCollectionList solrCollection : SolrCollectionList.values()) {
    CollectionBucket collectionBucket = getCollectionBucket(solrCollection);
    List<SolrInputDocument> solrInputDocuments =
            collectionBucket.getSolrInputDocumentList();
    String collectionName = collectionBucket.getCollectionName();
    try {
        if (!solrInputDocuments.isEmpty()) {
            CloudSolrClient solrClient = PlatformIndexManager.getInstance()
                    .getCloudSolrClient(collectionName);
            solrClient.add(collectionName, solrInputDocuments);
        }
    } catch (SolrServerException | IOException e) {
        // log the failure and continue with the next collection
    }
}

*where solrClient is created as below:*
this.cloudSolrClient = new CloudSolrClient.Builder()
        .withZkHost(zooKeeperHost)
        .withHttpClient(HttpClientUtil.HttpClientFactory.createHttpClient())
        .build();
this.cloudSolrClient.setZkClientTimeout(30000);

Hard commit is left automatic and set to 15000 ms.
In this process we also see that when a merge is in progress and the
default maxMergeCount has already been reached, commits get delayed and the
SolrJ client (where we add documents) blocks; once one of the merge threads
finishes its merge, the SolrJ client returns the result.
How do we avoid this blocking of the SolrJ client? Do I need to go beyond
the default config for this scenario, i.e. change the merge configuration?

Can you suggest what the merge config should be for such a scenario? Based
on forum posts, I tried changing the merge settings to the following,

<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
        <int name="maxMergeAtOnce">30</int>
        <int name="maxMergeAtOnceExplicit">30</int>
        <int name="segmentsPerTier">30</int>
        <int name="floorSegmentMB">2048</int>
        <int name="maxMergedSegmentMB">512</int>
        <double name="noCFSRatio">0.1</double>
        <int name="maxCFSSegmentSizeMB">2048</int>
        <double name="reclaimDeletesWeight">2.0</double>
        <double name="forceMergeDeletesPctAllowed">10.0</double>
</mergePolicyFactory>

But I couldn't see much change in the behaviour.

On the same Solr node we have multiple indexes / collections. In that case,
is TieredMergePolicyFactory still the right option, or should we go for a
different merge policy (like LogByteSize etc.) when there are multiple
collections on the same node?


Can you throw some light on these aspects?
Regards,

> Regarding auto commit, we discussed it a lot with our product owners, and
> in the end we are forced to keep it at 1 sec; we couldn't increase it
> further. Even as it is, our customers sometimes say that they have to
> refresh their pages a couple of times to get the update from Solr. So we
> can't increase it further.

I understand pressure from nontechnical departments for very low 
response times. Executives, sales, and marketing are usually the ones 
making those kinds of demands. I think you should push back on that 
particular requirement on technical grounds.

A soft commit interval that low *can* contribute to performance issues.  
It doesn't always cause them, I'm just saying that it *can*.  Maybe 
increasing it to five or ten seconds could help performance, or maybe it 
will make no real difference at all.

> Yes. As of now only Solr is running on that machine. Initially we were
> running it along with HBase region servers and it was working fine, but
> due to CPU spikes and OS disk cache pressure we were forced to move Solr
> to a separate machine. I just checked, though: our Solr data folder size
> comes to only 17GB. Two collections have around 5GB each and the others
> have 2 to 3 GB. If, as you say, only 2/3 of the total size goes into the
> OS disk cache, why does the VIRT column in top always show 28G, which is
> more than what we have?
> Please check the top output & the GC settings we used in this doc:
> <https://docs.google.com/document/d/1SaKPbGAKEPP8bSbdvfX52gaLsYWnQfDqfmV802hWIiQ/edit?usp=sharing>

The VIRT memory should be about equivalent to the RES size plus the size 
of all the index data on the system.  With 17GB of index data, a VIRT of 
28G implies roughly 11GB resident, so that looks about right.  The 
actual amount of memory allocated by Java for the heap and other memory 
structures is approximately equal to RES minus SHR.

I am not sure whether the SHR size gets counted in VIRT.  It probably 
does.  On some Linux systems, SHR grows to a very high number, but when 
that happens, it typically doesn't reflect actual memory usage.  I do 
not know why this sometimes happens.  That is a question for Oracle, 
since they are the current owners of Java.

Only 5GB is in the buff/cache area.  The system has 13GB of free 
memory.  That system is NOT low on memory.

With 4 CPUs, a load average in the 3-4 range is an indication that the 
server is busy.  I can't say for sure whether it means the server is 
overloaded.  Sometimes the load average on a system that's working well 
can go higher than the CPU count, sometimes a load average well below 
the CPU count is shown on a system with major performance issues.  It's 
difficult to say.  The instantaneous CPU usage on the Solr process in 
that screenshot is 384 percent, which means that it is exercising the 
CPUs hard.  But this might be perfectly OK.  96.3 percent of the CPU is 
being used by user processes, a VERY small amount is being used by 
system, and the iowait percentage is zero.  Typically servers that are 
struggling will have a higher percentage in system and/or iowait, and I 
don't see that here.

> Queries are quite fast, mostly simple queries with fq. Regarding
> indexing, during peak hours we index around 100 documents per second on
> average.

That's good.  And not surprising, given how little memory pressure and 
how much free memory there is.  An indexing rate of 100 per second 
doesn't seem like a lot of indexing to me, but for some indexes, it 
might be very heavy.  If your general performance is good, I wouldn't be 
too concerned about it.

> Regarding the release: initially we tried 6.4.1, and since many
> discussions here mentioned that moving to 6.5.x would solve a lot of
> performance issues, we moved to 6.5.1. We will move to 6.6.3 in the near
> future.

The 6.4.1 version had a really bad bug (SOLR-10130, in the new metrics 
code, fixed in 6.4.2) that killed performance for most users.  Some 
might not have even noticed a problem, though.  It's difficult to say 
for sure whether it would be something you would notice, or whether you 
would see an increase in performance by upgrading.

> Hope I have given enough information. One strange thing: the CPU and
> memory spikes are not seen when we move from r4.xlarge to r4.2xlarge
> (which is 8 cores with 60 GB RAM). But that would not be cost effective.
> What's making CPU and memory go high in this new version (due to
> docValues)? If I switch off docValues, will the CPU & memory spikes get
> reduced?

Overall memory usage (outside of the Java heap) looks great to me.  CPU 
usage is high, but I can't tell if it's TOO high. As a proof of concept, 
I think you should try raising autoSoftCommit to five seconds.  If 
maxDocs is configured on either autoCommit or autoSoftCommit, remove it 
so that only maxTime is there, regardless of whether you actually change 
maxTime.  If raising autoSoftCommit makes no real difference, then the 1 
second autoSoftCommit probably isn't a worry.  I bet if you raised it to 
five seconds, most users would never notice anything different.
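
For example (a sketch, not your exact config), the commit settings in the
<updateHandler> section of solrconfig.xml might end up looking like this,
keeping your current 15 second hard commit, using the five second soft
commit I'm suggesting, and with openSearcher set to false so the hard
commit doesn't open a new searcher:

<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>5000</maxTime>
</autoSoftCommit>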

If you want to provide a GC log to us that covers a relatively long 
timeframe, we can analyze that and let you know whether your heap is 
sized appropriately, or whether it might be too big or too small, and 
whether garbage collection pauses are keeping your CPU usage high.  The 
standard Solr startup in most current versions always logs GC activity.  
It will usually be in the same directory as solr.log.

Do you know what typical and peak queries per second are on your Solr 
servers?  If your query rate is high, handling that will probably 
require more servers and a higher replica count.

Thanks,
Shawn




