Hmmm, I'm puzzled then. I'm guessing that the node
that keeps going down is the follower, which means
it should have _less_ work to do than the node that
stays up. Not a lot less, but less still.

I'd try lengthening your commit interval. I realize you've
set it to 2 seconds for a reason; this is mostly to see
whether it has any effect and to give you a place to _start_ looking.

I'm assuming your hard commit has openSearcher set to false.
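
If it helps, here's roughly the sort of thing I mean in solrconfig.xml.
The numbers are only a place to start experimenting, not a recommendation,
and the sort field in the warming query is a placeholder for whatever you
actually sort on (this also sketches the firstSearcher warming I mentioned
before):

  <!-- under <updateHandler> -->
  <autoCommit>
    <maxTime>60000</maxTime>             <!-- hard commit every 60s -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>60000</maxTime>             <!-- try something much longer than 2s -->
  </autoSoftCommit>

  <!-- under <query>: warm the sort caches before the new searcher serves traffic -->
  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">*:*</str>
        <str name="sort">your_sort_field asc</str>
      </lst>
    </arr>
  </listener>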

Just to double check, these two nodes are just a leader and
follower, right? IOW, they're part of the same collection,
your collection just has one shard.
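
(A quick way to double-check the topology is the Cloud tab in the admin UI,
or clusterstate.json in ZK. If you're on 4.8 or later you can also hit the
Collections API, something like
http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json
-- host and port being whatever yours are. For one shard you should see a
single shard with two replicas, one of them marked as the leader.)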

m/c configuration? What's that? If it's a typo for m/s
(master/slave) then that may be an issue. In a SolrCloud
setup there is no master/slave and you shouldn't configure
them....
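
I.e., if your solrconfig.xml still carries a legacy replication block along
these lines (the URL and core name here are just illustrative), it should
come out in a SolrCloud setup; SolrCloud handles replication itself:

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
    </lst>
    <lst name="slave">
      <str name="masterUrl">http://other-host:8983/solr/yourcore</str>
      <str name="pollInterval">00:00:60</str>
    </lst>
  </requestHandler>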

Best,
Erick

On Wed, Sep 3, 2014 at 8:52 PM, Ethan <eh198...@gmail.com> wrote:
> Erick,
>
> It is just one shard.  Indexing traffic goes to the other node and is then
> synced with this one (both are part of the cloud).  We kept that setting
> running for 5 days, as the defective node would just go down under search
> traffic.  So both were in sync when search was turned on.  The soft commit
> interval is very low, around 2 seconds, but that doesn't seem to affect the
> other node, which is functioning normally.
>
> Memory settings for both nodes are identical, including m/c configuration.
>
> On Wed, Sep 3, 2014 at 4:23 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> Do you have indexing traffic going to it? b/c this _looks_
>> like the node is just starting up or a searcher is
>> being opened and you're loading your
>> index for the first time. This happens when you index data and
>> when you start up your nodes. Adding some autowarming
>> (firstSearcher in this case) might load up the underlying
>> caches earlier. This could also be a problem due to
>> very short commit intervals, although the latter should
>> be identical for both nodes.
>>
>> And when you say 2 solr nodes, is this one shard or two?
>>
>> I'm guessing that you have some setting that's significantly
>> different, memory perhaps?
>>
>> Best,
>> Erick
>>
>>
>>
>> On Wed, Sep 3, 2014 at 2:40 PM, Ethan <eh198...@gmail.com> wrote:
>> > Forgot to add the source thread that's blocking every other thread
>> >
>> >
>> > "http-bio-52158-exec-61" - Thread t@591
>> >    java.lang.Thread.State: RUNNABLE
>> >  at
>> >
>> org.apache.lucene.search.FieldCacheImpl$Uninvert.uninvert(FieldCacheImpl.java:312)
>> > at
>> >
>> org.apache.lucene.search.FieldCacheImpl$LongCache.createValue(FieldCacheImpl.java:986)
>> >  at
>> >
>> org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:212)
>> > - locked org.apache.lucene.search.FieldCache$CreationPlaceholder@29e0400b
>> >  at
>> > org.apache.lucene.search.FieldCacheImpl.getLongs(FieldCacheImpl.java:901)
>> > at
>> >
>> org.apache.lucene.search.FieldComparator$LongComparator.setNextReader(FieldComparator.java:685)
>> >  at
>> >
>> org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:97)
>> > at
>> >
>> org.apache.lucene.search.TimeLimitingCollector.setNextReader(TimeLimitingCollector.java:158)
>> >  at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:618)
>> > at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
>> >  at
>> >
>> org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1501)
>> > at
>> >
>> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1367)
>> >  at
>> >
>> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:474)
>> > at
>> >
>> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:434)
>> >  at
>> >
>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
>> > at
>> >
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>> >  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
>> > at
>> >
>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
>> >  at
>> >
>> com.trimp.search.filter.LogAndAuthFilter.execute(LogAndAuthFilter.scala:109)
>> > at
>> >
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
>> >  at
>> >
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
>> > at
>> >
>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
>> >  at
>> >
>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
>> > at
>> >
>> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
>> >  at
>> >
>> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
>> > at
>> >
>> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
>> >  at
>> >
>> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
>> > at
>> org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:947)
>> >  at
>> org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:680)
>> > at
>> >
>> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
>> >  at
>> >
>> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
>> > at
>> >
>> org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1009)
>> >  at
>> >
>> org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
>> > at
>> >
>> org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
>> >  - locked org.apache.tomcat.util.net.SocketWrapper@7826692
>> > at
>> >
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>> >  at
>> >
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>> > at java.lang.Thread.run(Thread.java:722)
>> >
>> >    Locked ownable synchronizers:
>> > - locked java.util.concurrent.ThreadPoolExecutor$Worker@2463aef
>> >
>> >
>> > On Wed, Sep 3, 2014 at 2:31 PM, Ethan <eh198...@gmail.com> wrote:
>> >
>> >> We have a SolrCloud instance with 2 Solr nodes and a 3-node ZK ensemble.
>> >> One of the Solr nodes goes down as soon as we send search traffic to it,
>> >> but updates work fine.
>> >>
>> >> When I analyzed the thread dump I saw a lot of blocked threads with the
>> >> following stack trace.  This explains why it couldn't create any native
>> >> threads and ran out of memory.  The thread count went from 48 to 900
>> >> within minutes and the server came down.  The other node, with the same
>> >> configuration, is taking all the search and update traffic and is
>> >> running fine.
>> >>
>> >> Any pointers would be appreciated.
>> >>
>> >> "http-bio-52158-exec-59" - Thread t@589
>> >>    java.lang.Thread.State: BLOCKED on org.apache.lucene.search.FieldCache$CreationPlaceholder@29e0400b owned by: http-bio-52158-exec-61
>> >>     at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:209)
>> >>     at org.apache.lucene.search.FieldCacheImpl.getLongs(FieldCacheImpl.java:901)
>> >>     at org.apache.lucene.search.FieldComparator$LongComparator.setNextReader(FieldComparator.java:685)
>> >>     at org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:97)
>> >>     at org.apache.lucene.search.TimeLimitingCollector.setNextReader(TimeLimitingCollector.java:158)
>> >>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:618)
>> >>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
>> >>     at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1501)
>> >>     at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1367)
>> >>     at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:474)
>> >>     at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:434)
>> >>     at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
>> >>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>> >>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
>> >>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
>> >>     at com.trimp.search.filter.LogAndAuthFilter.execute(LogAndAuthFilter.scala:109)
>> >>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
>> >>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
>> >>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
>> >>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
>> >>     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
>> >>     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
>> >>     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
>> >>     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
>> >>     at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:947)
>> >>     at org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:680)
>> >>     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
>> >>     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
>> >>     at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1009)
>> >>     at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
>> >>     at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
>> >>     - locked org.apache.tomcat.util.net.SocketWrapper@5b4530c8
>> >>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>> >>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>> >>     at java.lang.Thread.run(Thread.java:722)
>> >>
>> >>    Locked ownable synchronizers:
>> >>     - locked java.util.concurrent.ThreadPoolExecutor$Worker@63d2720
>> >>
>> >> -E
>> >>
>>
