Erick,

It is just one shard. Indexing traffic goes to the other node and is then synced to this one (both are part of the cloud). We kept that setup running for 5 days, because the defective node would just go down under search traffic, so both nodes were in sync when search was turned on. The soft commit interval is very low, around 2 seconds, but that doesn't seem to affect the other node, which is functioning normally.
Memory settings for both nodes are identical, including machine configuration.

On Wed, Sep 3, 2014 at 4:23 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> Do you have indexing traffic going to it? b/c this _looks_
> like the node is just starting up or a searcher is
> being opened and you're loading your
> index for the first time. This happens when you index data and
> when you start up your nodes. Adding some autowarming
> (firstSearcher in this case) might load up the underlying
> caches earlier. This could also be a problem due to
> very short commit intervals, although this latter should
> be identical for both nodes.
>
> And when you say 2 solr nodes, is this one shard or two?
>
> I'm guessing that you have some setting that's significantly
> different, memory perhaps?
>
> Best,
> Erick
>
> On Wed, Sep 3, 2014 at 2:40 PM, Ethan <eh198...@gmail.com> wrote:
> > Forgot to add the source thread that's blocking every other thread
> >
> > "http-bio-52158-exec-61" - Thread t@591
> >    java.lang.Thread.State: RUNNABLE
> >     at org.apache.lucene.search.FieldCacheImpl$Uninvert.uninvert(FieldCacheImpl.java:312)
> >     at org.apache.lucene.search.FieldCacheImpl$LongCache.createValue(FieldCacheImpl.java:986)
> >     at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:212)
> >     - locked org.apache.lucene.search.FieldCache$CreationPlaceholder@29e0400b
> >     at org.apache.lucene.search.FieldCacheImpl.getLongs(FieldCacheImpl.java:901)
> >     at org.apache.lucene.search.FieldComparator$LongComparator.setNextReader(FieldComparator.java:685)
> >     at org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:97)
> >     at org.apache.lucene.search.TimeLimitingCollector.setNextReader(TimeLimitingCollector.java:158)
> >     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:618)
> >     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
> >     at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1501)
> >     at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1367)
> >     at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:474)
> >     at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:434)
> >     at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
> >     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> >     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
> >     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
> >     at com.trimp.search.filter.LogAndAuthFilter.execute(LogAndAuthFilter.scala:109)
> >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
> >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
> >     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
> >     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
> >     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
> >     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
> >     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
> >     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
> >     at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:947)
> >     at org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:680)
> >     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
> >     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
> >     at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1009)
> >     at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
> >     at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
> >     - locked org.apache.tomcat.util.net.SocketWrapper@7826692
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> >     at java.lang.Thread.run(Thread.java:722)
> >
> >    Locked ownable synchronizers:
> >     - locked java.util.concurrent.ThreadPoolExecutor$Worker@2463aef
> >
> >
> > On Wed, Sep 3, 2014 at 2:31 PM, Ethan <eh198...@gmail.com> wrote:
> >
> >> We have a SolrCloud instance with 2 Solr nodes and a 3 zk ensemble. One of
> >> the Solr nodes goes down as soon as we send search traffic to it, but update
> >> works fine.
> >>
> >> When I analyzed the thread dump I saw a lot of blocked threads with the following
> >> error message. This explains why it couldn't create any native threads and
> >> ran out of memory. The thread count went from 48 to 900 within minutes and
> >> the server came down. The other node with the same configuration is taking all the
> >> search and update traffic, and it is running fine.
> >>
> >> Any pointers would be appreciated.
> >>
> >> "http-bio-52158-exec-59" - Thread t@589
> >>    java.lang.Thread.State: BLOCKED on org.apache.lucene.search.FieldCache$CreationPlaceholder@29e0400b owned by: http-bio-52158-exec-61
> >>     at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:209)
> >>     at org.apache.lucene.search.FieldCacheImpl.getLongs(FieldCacheImpl.java:901)
> >>     at org.apache.lucene.search.FieldComparator$LongComparator.setNextReader(FieldComparator.java:685)
> >>     at org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:97)
> >>     at org.apache.lucene.search.TimeLimitingCollector.setNextReader(TimeLimitingCollector.java:158)
> >>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:618)
> >>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
> >>     at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1501)
> >>     at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1367)
> >>     at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:474)
> >>     at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:434)
> >>     at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
> >>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> >>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
> >>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
> >>     at com.trimp.search.filter.LogAndAuthFilter.execute(LogAndAuthFilter.scala:109)
> >>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
> >>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
> >>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
> >>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
> >>     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
> >>     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
> >>     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
> >>     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
> >>     at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:947)
> >>     at org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:680)
> >>     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
> >>     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
> >>     at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1009)
> >>     at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
> >>     at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
> >>     - locked org.apache.tomcat.util.net.SocketWrapper@5b4530c8
> >>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> >>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> >>     at java.lang.Thread.run(Thread.java:722)
> >>
> >>    Locked ownable synchronizers:
> >>     - locked java.util.concurrent.ThreadPoolExecutor$Worker@63d2720
> >>
> >> -E
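
The dumps in this thread show one RUNNABLE thread holding the FieldCache placeholder lock while the other request threads are BLOCKED on it. Below is a minimal sketch of how that pattern could be counted across a whole dump. It assumes the dump uses the same layout as the excerpts here (a `"name" - Thread t@N` header followed by a `java.lang.Thread.State:` line) and that each state line is on one line rather than wrapped by a mail client. The names `summarize_blocked` and `SAMPLE` are made up for illustration, and the sample text is shortened from the traces in this thread.

```python
import re
from collections import Counter

# Header line of one thread, e.g.:  "http-bio-52158-exec-59" - Thread t@589
HEADER = re.compile(r'^"(?P<name>[^"]+)" - Thread t@\d+')

# State line of a blocked thread, e.g.:
#   java.lang.Thread.State: BLOCKED on <lock> owned by: <owner thread name>
BLOCKED = re.compile(
    r"java\.lang\.Thread\.State: BLOCKED on (?P<lock>\S+) owned by: (?P<owner>.+?)\s*$"
)


def summarize_blocked(dump_text):
    """Count blocked threads per (lock, owning thread) pair."""
    waiting = Counter()
    current = None  # name of the thread whose section is being read
    for line in dump_text.splitlines():
        header = HEADER.match(line.strip())
        if header:
            current = header.group("name")
            continue
        blocked = BLOCKED.search(line)
        if blocked and current is not None:
            waiting[(blocked.group("lock"), blocked.group("owner"))] += 1
    return waiting


# Shortened sample built from the two traces quoted in this thread.
SAMPLE = '''"http-bio-52158-exec-59" - Thread t@589
   java.lang.Thread.State: BLOCKED on org.apache.lucene.search.FieldCache$CreationPlaceholder@29e0400b owned by: http-bio-52158-exec-61
    at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:209)

"http-bio-52158-exec-61" - Thread t@591
   java.lang.Thread.State: RUNNABLE
    at org.apache.lucene.search.FieldCacheImpl$Uninvert.uninvert(FieldCacheImpl.java:312)
'''

if __name__ == "__main__":
    # Print the most contended locks first.
    for (lock, owner), count in summarize_blocked(SAMPLE).most_common():
        print(f"{count} thread(s) blocked on {lock}, held by {owner}")
```

Run against a full dump, a single lock/owner pair with a very large count would be consistent with what is described above: every search request waiting on the one thread that is uninverting the sort field.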