Did you recently upgrade to Solr 4.7? 4.7 has a serious bug that can cause out-of-memory errors. Can you check your logs for OutOfMemoryError messages?
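If it helps, here is a minimal sketch for scanning log files for the JVM's `java.lang.OutOfMemoryError`. The script name `find_oom.py`, the helper names, and the log paths are hypothetical. Depending on how Solr is started, the error may also end up in the container's stdout/stderr log rather than in solr.log, so it is worth passing every log file you have.

```python
import re
import sys
from pathlib import Path

# Matches the JVM's out-of-memory error in a log line or stack trace,
# e.g. "java.lang.OutOfMemoryError: Java heap space". The text after the
# colon (the reason) is captured when present.
OOM_PATTERN = re.compile(r"java\.lang\.OutOfMemoryError(?::\s*(.*))?")


def find_oom_lines(lines):
    """Return (line_number, reason) for every line mentioning OutOfMemoryError."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = OOM_PATTERN.search(line)
        if match:
            hits.append((number, (match.group(1) or "").strip()))
    return hits


def scan_log(path):
    """Scan one log file; undecodable bytes are replaced rather than fatal."""
    with Path(path).open(encoding="utf-8", errors="replace") as handle:
        return find_oom_lines(handle)


if __name__ == "__main__":
    # Hypothetical usage: python find_oom.py /path/to/solr.log /path/to/stderr.log
    for log_path in sys.argv[1:]:
        for number, reason in scan_log(log_path):
            print(f"{log_path}:{number}: OutOfMemoryError {reason}".rstrip())
```

Any hit around the times of the crashes (e.g. the 04:00 to 05:20 window in the logs below) would point at memory rather than at ZooKeeper or recovery as the root cause.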
On Sun, Mar 23, 2014 at 9:07 PM, Lukas Mikuckis <lukasmikuc...@gmail.com> wrote:
> Solr version: 4.7
>
> Architecture:
> 2 Solr nodes (1 shard, leader + replica)
> 3 ZooKeeper nodes
>
> Servers:
> * ZooKeeper + Solr (heap 4 GB) - RAM 8 GB, 2 CPU cores
> * ZooKeeper + Solr (heap 4 GB) - RAM 8 GB, 2 CPU cores
> * ZooKeeper
>
> Solr data:
> * 21 collections
> * Many fields, small docs, doc count per collection from 1k to 500k
>
> About a week ago Solr started crashing. It crashes every day, 3-4 times a
> day, usually at night. I can't tell what it could be related to, because
> we haven't made any configuration changes in that time. The load hasn't
> changed either.
>
> Everything starts with "Stopping recovery for .." warnings (every warning
> is repeated several times):
>
> WARN org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for
> zkNodeName=core_node1core=******************
>
> WARN org.apache.solr.cloud.ElectionContext; cancelElection did not find
> election node to remove
>
> WARN org.apache.solr.update.PeerSync; no frame of reference to tell if
> we've missed updates
>
> WARN - 2014-03-23 04:00:26.286; org.apache.solr.update.PeerSync; no frame
> of reference to tell if we've missed updates
>
> WARN - 2014-03-23 04:00:30.728; org.apache.solr.handler.SnapPuller; File
> _f9m_Lucene41_0.doc expected to be 6218278 while it is 7759879
>
> WARN - 2014-03-23 04:00:54.126;
> org.apache.solr.update.UpdateLog$LogReplayer; Starting log replay
> tlog{file=/path/solr/collection1_shard1_replica2/data/tlog/tlog.0000000000000003272
> refcount=2} active=true starting pos=356216606
>
> Then again "Stopping recovery for .." warnings:
>
> WARN org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for
> zkNodeName=core_node1core=******************
>
> ERROR - 2014-03-23 05:19:29.566; org.apache.solr.common.SolrException;
> org.apache.solr.common.SolrException: No registered leader was found after
> waiting for 4000ms , collection: collection1 slice: shard1
>
> ERROR - 2014-03-23 05:20:03.961; org.apache.solr.common.SolrException;
> org.apache.solr.common.SolrException: I was asked to wait on state down for
> IP:PORT_solr but I still do not see the requested state. I see state:
> active live:false
>
> After this, the servers mostly didn't recover.

--
Regards,
Shalin Shekhar Mangar.