Solr 4.6.1 (SolrCloud admin console view attached); ZooKeeper 3.4.5, 3-node ensemble.
In my test setup I have a 3-node SolrCloud cluster with 2 shards. Today we had a power failure and all nodes went down. I started the 3-node ZooKeeper ensemble first, then the 3 SolrCloud nodes. One replica's IP address had changed due to dynamic IP allocation, but the ZooKeeper cluster state was not updated with the new IP address; it still holds the old IP address for that bad node. Do I need to manually update the cluster state in ZooKeeper? What are my options if this happens in production?

Bad node: old IP 10.249.132.35 (still exists in ZooKeeper), new IP 10.249.133.10

Log from Node1:

11:26:25,242 INFO [STDOUT] 49170786 [Thread-2-EventThread] INFO org.apache.solr.common.cloud.ZkStateReader - A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 3)
11:26:41,072 INFO [STDOUT] 49186615 [RecoveryThread] INFO org.apache.solr.cloud.ZkController - publishing core=genre_shard1_replica1 state=recovering
11:26:41,079 INFO [STDOUT] 49186622 [RecoveryThread] ERROR org.apache.solr.cloud.RecoveryStrategy - Error while trying to recover. core=genre_shard1_replica1:org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://10.249.132.35:8080/solr
11:26:41,079 INFO [STDOUT]     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:496)
11:26:41,079 INFO [STDOUT]     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:197)
11:26:41,079 INFO [STDOUT]     at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:221)
11:26:41,079 INFO [STDOUT]     at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:367)
11:26:41,079 INFO [STDOUT]     at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:244)
11:26:41,079 INFO [STDOUT] Caused by: org.apache.http.conn.HttpHostConnectException: Connection to http://10.249.132.35:8080 refused
11:27:14,036 INFO [STDOUT] 49219580 [RecoveryThread] ERROR org.apache.solr.cloud.RecoveryStrategy - Recovery failed - trying again... (9) core=geo_shard1_replica1
11:27:14,037 INFO [STDOUT] 49219581 [RecoveryThread] INFO org.apache.solr.cloud.RecoveryStrategy - Wait 600.0 seconds before trying to recover again (10)
11:27:14,958 INFO [STDOUT] 49220498 [Thread-40] INFO org.apache.solr.common.cloud.ZkStateReader - Updating cloud state from ZooKeeper...

Log from bad node with new IP address:

11:06:29,551 INFO [STDOUT] 6234 [coreLoadExecutor-4-thread-10] INFO org.apache.solr.cloud.ShardLeaderElectionContext - Enough replicas found to continue.
11:06:29,552 INFO [STDOUT] 6236 [coreLoadExecutor-4-thread-10] INFO org.apache.solr.cloud.ShardLeaderElectionContext - I may be the new leader - try and sync
11:06:29,554 INFO [STDOUT] 6237 [coreLoadExecutor-4-thread-10] INFO org.apache.solr.cloud.SyncStrategy - Sync replicas to http://10.249.132.35:8080/solr/venue_shard2_replica2/
11:06:29,555 INFO [STDOUT] 6239 [coreLoadExecutor-4-thread-10] INFO org.apache.solr.update.PeerSync - PeerSync: core=venue_shard2_replica2 url=http://10.249.132.35:8080/solr START replicas=[http://10.249.132.56:8080/solr/venue_shard2_replica1/] nUpdates=100
11:06:29,556 INFO [STDOUT] 6240 [coreLoadExecutor-4-thread-10] INFO org.apache.solr.update.PeerSync - PeerSync: core=venue_shard2_replica2 url=http://10.249.132.35:8080/solr DONE. We have no versions. sync failed.
11:06:29,556 INFO [STDOUT] 6241 [coreLoadExecutor-4-thread-10] INFO org.apache.solr.cloud.SyncStrategy - Leader's attempt to sync with shard failed, moving to the next candidate
11:06:29,558 INFO [STDOUT] 6241 [coreLoadExecutor-4-thread-10] INFO org.apache.solr.cloud.ShardLeaderElectionContext - We failed sync, but we have no versions - we can't sync in that case - we were active before, so become leader anyway
11:06:29,559 INFO [STDOUT] 6243 [coreLoadExecutor-4-thread-10] INFO org.apache.solr.cloud.ShardLeaderElectionContext - I am the new leader: http://10.249.132.35:8080/solr/venue_shard2_replica2/ shard2
11:06:29,561 INFO [STDOUT] 6245 [coreLoadExecutor-4-thread-10] INFO org.apache.solr.common.cloud.SolrZkClient - makePath: /collections/venue/leaders/shard2
11:06:29,577 INFO [STDOUT] 6261 [Thread-2-EventThread] INFO org.apache.solr.update.PeerSync - PeerSync: core=event_shard2_replica2 url=http://10.249.132.35:8080/solr Received 18 versions from 10.249.132.56:8080/solr/event_shard2_replica1/
11:06:29,578 INFO [STDOUT] 6263 [Thread-2-EventThread] INFO org.apache.solr.update.PeerSync - PeerSync: core=event_shard2_replica2 url=http://10.249.132.35:8080/solr Requesting updates from 10.249.132.56:8080/solr/event_shard2_replica1/ n=10 versions=[1457764666067386368, 1456709993140060160, 1456709989863260160, 1456709986075803648, 1456709971758546944, 1456709179685208064, 1456709137524064256, 1456709130040377344, 1456707444339113984, 1456707435417829376]
11:06:29,775 INFO [STDOUT] 6459 [Thread-2-EventThread] INFO org.apache.solr.update.processor.LogUpdateProcessor - [event_shard2_replica2] {add=[4319389 (1456707435417829376), 43185
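If it helps, this is a minimal sketch of how I check what ZooKeeper is actually holding, using the plain ZooKeeper Java client (the connect string is a placeholder for my 3-node ensemble):

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ClusterStateCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder connect string for the 3-node ZooKeeper ensemble.
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 15000, new Watcher() {
            public void process(WatchedEvent event) {
                // no-op watcher; this is only a one-off read
            }
        });
        // Dump /clusterstate.json: each replica's base_url is recorded here,
        // and this is where the stale 10.249.132.35 address still shows up.
        byte[] data = zk.getData("/clusterstate.json", false, null);
        System.out.println(new String(data, "UTF-8"));
        zk.close();
    }
}

This prints the same JSON the admin console's cloud view renders, so I can confirm the bad node's base_url still says 10.249.132.35 before deciding whether I really have to edit it by hand.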