On 4/9/2018 4:04 AM, mganeshs wrote:
Regarding the high CPU: while troubleshooting, we found that merge threads keep running and take most of the CPU time (as per VisualVM).

With a one second autoSoftCommit, nearly constant indexing will produce a lot of very small index segments.  Those index segments will have to be merged eventually.  You have increased the merge policy numbers which will reduce the total number of merges, but each merge is going to be larger than it would with defaults, so it's going to take a little bit longer.  This isn't too big a deal with first-level merges, but at the higher levels, they do get large -- no matter what the configuration is.
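For reference, that commit cadence corresponds to an indexConfig section along these lines (a sketch, not taken from the original message; the 15000 ms hard commit is the value mentioned later in this thread, and openSearcher=false is the usual recommendation for automatic hard commits):

```xml
<!-- Sketch: a one-second autoSoftCommit under constant indexing opens a new
     searcher (and produces new small segments) every second; the automatic
     hard commit flushes segments to disk without opening a searcher. -->
<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>
```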

*Note*: the following is the code snippet we use for indexing / adding Solr documents in a batch per collection

for (SolrCollectionList solrCollection : SolrCollectionList.values()) {
    CollectionBucket collectionBucket = getCollectionBucket(solrCollection);
    List<SolrInputDocument> solrInputDocuments = collectionBucket.getSolrInputDocumentList();
    String collectionName = collectionBucket.getCollectionName();
    try {
        if (solrInputDocuments.size() > 0) {
            CloudSolrClient solrClient =
                    PlatformIndexManager.getInstance().getCloudSolrClient(collectionName);
            solrClient.add(collectionName, solrInputDocuments);
        }
    } catch (SolrServerException | IOException e) {
        // handle / log the indexing failure
    }
}

*where solrClient is created as below:*
this.cloudSolrClient = new CloudSolrClient.Builder()
        .withZkHost(zooKeeperHost)
        .withHttpClient(HttpClientUtil.HttpClientFactory.createHttpClient())
        .build();
this.cloudSolrClient.setZkClientTimeout(30000);

Is that code running on the Solr server, or on a different machine?  Are you creating a SolrClient each time you use it, or have you created client objects that get re-used?

You don't need a different SolrClient object for each collection.  Your "getCloudSolrClient" method takes a collection name, which suggests there might be a different client object for each one.  Most of the time, you need precisely one client object for the entire application.
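A sketch of that "one client per application" pattern (pure-Java illustration; CloudSolrClientStub below is a hypothetical stand-in for CloudSolrClient so the example compiles without SolrJ on the classpath): build the client once, reuse it everywhere, and pass the collection name on each request instead of creating a client per collection.

```java
public class SolrClientHolder {
    // Hypothetical stand-in for org.apache.solr.client.solrj.impl.CloudSolrClient,
    // used only so this sketch is self-contained.
    static class CloudSolrClientStub {
        final String zkHost;
        CloudSolrClientStub(String zkHost) { this.zkHost = zkHost; }
    }

    private static volatile CloudSolrClientStub client;

    // One client for the whole application: created on first use, then shared.
    // The collection is chosen per request, e.g. client.add(collectionName, docs),
    // not by building a new client for each collection.
    public static CloudSolrClientStub getClient(String zkHost) {
        if (client == null) {
            synchronized (SolrClientHolder.class) {
                if (client == null) {
                    client = new CloudSolrClientStub(zkHost);
                }
            }
        }
        return client;
    }

    public static void main(String[] args) {
        CloudSolrClientStub a = getClient("zk1:2181,zk2:2181");
        CloudSolrClientStub b = getClient("zk1:2181,zk2:2181");
        System.out.println(a == b);  // both calls return the same shared instance
    }
}
```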

Hard commit is kept as automatic and set to 15000 ms.
In this process, we also see that when a merge is happening and the default
maxMergeCount is already reached, commits get delayed and the SolrJ client
(where we add documents) is blocked; once one of the merge threads processes
the merge, the SolrJ client returns the result.
How do we avoid this blocking of the SolrJ client? Do I need to go beyond the
default config for this scenario? I mean, change the merge factor
configuration?

Can you suggest what the merge config should be for such a scenario? Based on
forum posts, I tried changing the merge settings to the following,

What are you trying to accomplish by changing the merge policy?  It's fine to find information for a config on the Internet, but you need to know what that config *does* before you use it, and make sure it aligns with your goals.  On mine, I change maxMergeAtOnce and segmentsPerTier to 35, and maxMergeAtOnceExplicit to 105.  I know exactly what I'm trying to do with this config -- reduce the frequency of merges.  Each merge is going to be larger with this config, but they will happen less frequently.  These three settings are the only ones that I change in my merge policy.  Changing all of the other settings that you have changed should not be necessary.  I make one other adjustment in this area -- to the merge scheduler.

On the same Solr node, we have multiple indexes / collections. In that case,
is TieredMergePolicyFactory the right option, or should we go for another
merge policy (like LogByteSize, etc.) for multiple collections on the same
node?

TieredMergePolicy was made the default policy after a great deal of testing and discussion by Lucene developers.  They found that it works better than the others for the vast majority of users.  It is likely the best choice for you too.

These are the settings that I use in indexConfig to reduce the impact of merges on my indexing:

  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">35</int>
    <int name="segmentsPerTier">35</int>
    <int name="maxMergeAtOnceExplicit">105</int>
  </mergePolicy>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxThreadCount">1</int>
    <int name="maxMergeCount">6</int>
  </mergeScheduler>

Note that this config is designed for 6.x and earlier.  I do not know if it will work in 7.x.  It probably needs to be adjusted to the new Factory config.  You can use it as a guide, though.
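In 7.x the equivalent would probably look something like the following (an untested sketch of the factory-based syntax; verify the class and parameter names against the reference guide for your exact Solr version):

```xml
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">35</int>
  <int name="segmentsPerTier">35</int>
  <int name="maxMergeAtOnceExplicit">105</int>
</mergePolicyFactory>
```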

Thanks,
Shawn
