We have a Solr (7.0.1) cloud on 3 Linux VM instances, each with 4 CPUs and 90 GB RAM, with a ZooKeeper (3.4.11) ensemble running on the same machines. We have 130 cores with an overall size of 45 GB. There is no sharding; almost all VMs have the same copy of the data. These nodes are behind a load balancer (LB).
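Because the nodes sit behind a load balancer and each collection has a single shard, an update can land on a replica that is not the shard leader, and SolrCloud then forwards it to the leader (the `update.distrib=TOLEADER` request in the error quoted in this post is such a forwarded update). One thing that may be worth checking is whether sending updates straight to each collection's leader reduces that forwarding. The sketch below only parses a Collections API `CLUSTERSTATUS` response (`/solr/admin/collections?action=CLUSTERSTATUS&wt=json`); it assumes the `cluster.collections.<name>.shards.<shard>.replicas` layout with `"leader": "true"` on the leader replica. The function name `find_leaders` and the sample data (including the third node's URL) are hypothetical, and fetching the response over HTTP is left out.

```python
import json
from typing import Dict, Optional


def find_leaders(cluster_status: dict) -> Dict[str, Dict[str, Optional[str]]]:
    """Map collection -> shard -> base_url of the leader replica (None if none is marked).

    Assumed layout: cluster.collections.<collection>.shards.<shard>.replicas.<replica>,
    where the leader replica carries "leader": "true".
    """
    leaders: Dict[str, Dict[str, Optional[str]]] = {}
    collections = cluster_status.get("cluster", {}).get("collections", {})
    for coll_name, coll in collections.items():
        shard_leaders: Dict[str, Optional[str]] = {}
        for shard_name, shard in coll.get("shards", {}).items():
            leader_url = None
            for replica in shard.get("replicas", {}).values():
                # The leader flag is reported as the string "true"
                if str(replica.get("leader", "false")).lower() == "true":
                    leader_url = replica.get("base_url")
                    break
            shard_leaders[shard_name] = leader_url
        leaders[coll_name] = shard_leaders
    return leaders


# Hypothetical, trimmed-down CLUSTERSTATUS response for one collection
sample = json.loads("""
{"cluster": {"collections": {"profileviews": {"shards": {"shard1": {"replicas": {
  "core_node1": {"base_url": "http://node3.example:8983/solr"},
  "core_node2": {"base_url": "http://172.10.2.18:8983/solr", "leader": "true"},
  "core_node3": {"base_url": "http://172.10.2.19:8983/solr"}
}}}}}}}
""")

if __name__ == "__main__":
    print(find_leaders(sample))
```

Leadership can move (for example after a node restart), so a client doing this would need to refresh the map periodically. For Java clients, SolrJ's `CloudSolrClient` does this kind of leader-aware routing itself.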
Index Config:
=============

    <ramBufferSizeMB>300</ramBufferSizeMB>
    <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
      <int name="maxMergeAtOnce">30</int>
      <int name="maxMergeAtOnceExplicit">100</int>
      <double name="segmentsPerTier">30.0</double>
    </mergePolicyFactory>
    <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
      <int name="maxMergeCount">18</int>
      <int name="maxThreadCount">6</int>
    </mergeScheduler>

Commit Configs:
===============

    <autoCommit>
      <maxTime>${solr.autoCommit.maxTime:600000}</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>${solr.autoSoftCommit.maxTime:60000}</maxTime>
    </autoSoftCommit>

We do 3,500 inserts/updates per second, spread across all 130 cores; we have yet to start using selects effectively. The problem we are facing is that, at times, the thread count suddenly increases sharply, which leaves Solr unresponsive or returning 503 responses to client requests (PHP, HTTP via cURL). Today (04-04-2018) the thread dump showed a peak of 13,000+ threads. Please help me fix this issue. Thanks!

Sample Threads:
===============

1.

    "name":"updateExecutor-2-thread-25746-processing-http:////172.10.2.19:8983//solr//profileviews x:profileviews r:core_node2 n:172.10.2.18:8983_solr s:shard1 c:profileviews",
    "state":"TIMED_WAITING",
    "lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@297be1d5",
    "cpuTime":"162.4371ms",
    "userTime":"120.0000ms",
    "stackTrace":["sun.misc.Unsafe.park(Native Method)",
      "java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)",
      "java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)",

2.

    ERROR true HttpSolrCall null:org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException: Async exception during distributed update: Error from server at 172.10.2.18:8983/solr/profileviews: Server Error
    request: http://172.10.2.18:8983/solr/profileviews/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F172.10.2.19%3A8983%2Fsolr%2Fprofileviews%2F&wt=javabin&version=2
    Remote error message: empty String

3. So many threads like this one:

    "name":"qtp959447386-21",
    "state":"TIMED_WAITING",
    "lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@6a1a2bf4",
    "cpuTime":"4522.0837ms",
    "userTime":"3770.0000ms",
    "stackTrace":["sun.misc.Unsafe.park(Native Method)",
      "java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)",
      "java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)",
      "org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:392)",
      "org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:563)",
      "org.eclipse.jetty.util.thread.QueuedThreadPool.access$800(QueuedThreadPool.java:48)",
      "org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:626)",
      "java.lang.Thread.run(Thread.java:748)"
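With 13,000+ threads a raw dump is hard to read, so it may help to see which pool the threads belong to and what state they are in. A minimal sketch, assuming the dump has already been parsed into a list of per-thread objects carrying the `"name"` and `"state"` keys seen in the samples above (for example, taken from the JSON served by `/solr/admin/info/threads`; the exact response envelope is not shown in this post, so extracting the list is left out). It groups threads by the leading letters of their name, so `qtp959447386-21` (Jetty's `QueuedThreadPool`) counts under `qtp` and `updateExecutor-2-thread-25746-...` under `updateExecutor`. The function names and the sample entries are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Mapping


def pool_name(thread_name: str) -> str:
    """Reduce a thread name to its leading alphabetic prefix.

    "qtp959447386-21" -> "qtp"; "updateExecutor-2-thread-..." -> "updateExecutor".
    Names with no leading letters are returned unchanged.
    """
    match = re.match(r"[A-Za-z]+", thread_name)
    return match.group(0) if match else thread_name


def summarize_threads(threads: Iterable[Mapping[str, str]]) -> Counter:
    """Count threads per (pool prefix, thread state) pair."""
    counts: Counter = Counter()
    for t in threads:
        counts[(pool_name(t.get("name", "")), t.get("state", "UNKNOWN"))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical entries shaped like the samples above ("name"/"state" only)
    dump = [
        {"name": "updateExecutor-2-thread-25746-processing-http:////172.10.2.19:8983//solr//profileviews",
         "state": "TIMED_WAITING"},
        {"name": "qtp959447386-21", "state": "TIMED_WAITING"},
        {"name": "qtp959447386-22", "state": "RUNNABLE"},
    ]
    for (pool, state), n in summarize_threads(dump).most_common():
        print(f"{pool:20} {state:15} {n}")
```

Comparing these counts between a quiet period and a spike would show whether the growth is in the Jetty request pool, the `updateExecutor` pool used for distributed updates, or elsewhere.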