We have a SolrCloud cluster with 2 Solr nodes and a 3-node ZooKeeper ensemble. One of the Solr nodes goes down as soon as we send search traffic to it, but updates work fine.
When I analyzed the thread dump, I saw a lot of blocked threads like the one below. This explains why the node could not create any more native threads and ran out of memory: the thread count went from 48 to 900 within minutes and the server went down. The other node, with the same configuration, is taking all the search and update traffic and is running fine. Any pointers would be appreciated.

"http-bio-52158-exec-59" - Thread t@589
   java.lang.Thread.State: BLOCKED on org.apache.lucene.search.FieldCache$CreationPlaceholder@29e0400b owned by: http-bio-52158-exec-61
	at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:209)
	at org.apache.lucene.search.FieldCacheImpl.getLongs(FieldCacheImpl.java:901)
	at org.apache.lucene.search.FieldComparator$LongComparator.setNextReader(FieldComparator.java:685)
	at org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:97)
	at org.apache.lucene.search.TimeLimitingCollector.setNextReader(TimeLimitingCollector.java:158)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:618)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
	at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1501)
	at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1367)
	at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:474)
	at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:434)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
	at com.trimp.search.filter.LogAndAuthFilter.execute(LogAndAuthFilter.scala:109)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
	at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:947)
	at org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:680)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
	at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1009)
	at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
	at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
	- locked org.apache.tomcat.util.net.SocketWrapper@5b4530c8
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:722)

   Locked ownable synchronizers:
	- locked java.util.concurrent.ThreadPoolExecutor$Worker@63d2720
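To check whether most of the ~900 threads are queued behind the same FieldCache entry (rather than spread across many different locks), one option is to tally the BLOCKED threads in the dump by the monitor they wait on and the thread that owns it. The following is a minimal Python sketch, assuming the dump uses the same format as the excerpt above (`"name" - Thread t@N` headers and `BLOCKED on <monitor> owned by: <thread>` state lines). Plain jstack output reports `- waiting to lock <...>` lines instead and would need a different pattern. The function name `blocked_by_owner` and the command-line usage are hypothetical.

```python
import re
import sys
from collections import Counter

# Thread header line as it appears in the dump excerpt, e.g.
#   "http-bio-52158-exec-59" - Thread t@589
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)" - Thread t@\d+', re.MULTILINE)

# State line as it appears in the dump excerpt (assumed format), e.g.
#   java.lang.Thread.State: BLOCKED on <monitor> owned by: <owner thread>
BLOCKED_ON = re.compile(
    r"java\.lang\.Thread\.State: BLOCKED\s+on (?P<monitor>\S+) owned by: (?P<owner>\S+)"
)


def blocked_by_owner(dump: str) -> Counter:
    """Count BLOCKED threads per (monitor, owner thread) pair in a thread dump."""
    counts = Counter()
    headers = list(THREAD_HEADER.finditer(dump))
    for i, header in enumerate(headers):
        # A thread's section runs until the next thread header or the end of the dump.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(dump)
        match = BLOCKED_ON.search(dump[header.end():end])
        if match:
            counts[(match.group("monitor"), match.group("owner"))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python blocked_threads.py threaddump.txt
    with open(sys.argv[1], encoding="utf-8") as f:
        for (monitor, owner), n in blocked_by_owner(f.read()).most_common():
            print(f"{n:5d} threads blocked on {monitor} (owned by {owner})")
```

If nearly all blocked threads report the same `FieldCache$CreationPlaceholder` monitor, that would suggest requests are piling up while a single thread populates the long-field cache used for sorting.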