Thanks, Mark. I will dig further into the logs. There is another, related
problem.

We have collections with 3 shards (2 nodes per shard), and the collection
has about 1000 records in it. Unfortunately, after the leader went down, the
replica node failed to become the leader. In detail: after the leader node
went down, the replica node tried to become the new leader, but it logged:

=======================
ShardLeaderElectionContext.runLeaderProcess(131) - Running the leader process.
ShardLeaderElectionContext.shouldIBeLeader(331) - Checking if I should try and be the leader.
ShardLeaderElectionContext.shouldIBeLeader(339) - My last published State was Active, it's okay to be the leader.
ShardLeaderElectionContext.runLeaderProcess(164) - I may be the new leader - try and sync
SyncStrategy.sync(89) - Sync replicas to http://localhost:8486/solr/exception/
PeerSync.sync(182) - PeerSync: core=exception url=http://localhost:8486/solr START replicas=[http://localhost:8483/solr/exception/] nUpdates=100
PeerSync.sync(250) - PeerSync: core=exception url=http://localhost:8486/solr DONE. We have no versions.  sync failed.
SyncStrategy.log(114) - Sync Failed
ShardLeaderElectionContext.rejoinLeaderElection(311) - There is a better leader candidate than us - going back into recovery
DefaultSolrCoreState.doRecovery(214) - Running recovery - first canceling any ongoing recovery
========================

After that, it tries to recover from the leader node, which is already down.
It then loops: recovery, failure, recovery, and so on.
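
To check how often the loop repeats, one option is to count the relevant
messages in the node's log. The following is only a rough sketch: the message
fragments are copied from the log excerpt above, while the labels, the
function name count_markers, and the log file path (passed as the first
command-line argument) are assumptions made for illustration. It joins
wrapped lines before matching, so it should work on hard-wrapped logs too.

```python
import re
import sys

# Message fragments copied from the log excerpt above.
# The labels on the left are illustrative names, not Solr terminology.
MARKERS = {
    "election_started": "Running the leader process",
    "no_versions": "We have no versions",
    "sync_failed": "Sync Failed",
    "back_to_recovery": "going back into recovery",
    "recovery_started": "Running recovery",
}


def count_markers(lines):
    """Count how often each marker fragment occurs in the given log lines."""
    # Join wrapped lines and collapse whitespace so a message split across
    # two physical lines still matches its fragment.
    text = re.sub(r"\s+", " ", " ".join(line.strip() for line in lines))
    return {label: text.count(fragment) for label, fragment in MARKERS.items()}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_markers.py /path/to/solr.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        for label, count in count_markers(log_file).items():
            print(f"{label}: {count}")
```

If recovery_started keeps growing while election_started and sync_failed grow
with it, the node is most likely cycling through the sequence shown above.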

Is it related to SOLR-3939 and SOLR-3940? In our case, though, the index data
isn't empty.


On Thu, Jan 10, 2013 at 10:09 AM, Mark Miller <markrmil...@gmail.com> wrote:

> It may be able to do that because it's forwarding requests to other nodes
> that are up?
>
> Would be good to dig into the logs to see if you can narrow in on the
> reason for the recovery_failed.
>
> - Mark
>
> On Jan 9, 2013, at 8:52 PM, Zeng Lames <lezhi.z...@gmail.com> wrote:
>
> > Hi,
> >
> > We have run into the following strange case in our production
> > environment. In the Solr Admin Console -> Cloud -> Graph, one node is
> > shown in recovery_failed status, but at the same time that
> > recovery_failed node serves query/update requests normally.
> >
> > Any ideas? Thanks!
> >
> > --
> > Best Wishes!
> > Lames
>
>


-- 
Best Wishes!
Lames
