RE: Nodes cannot recover and become unavailable
Hi Team,

What is the preferred way to upgrade from SOLR 4.0.0-BETA to SOLR 4.0.0? We saw the same errors when we did the upgrade:

Oct 29, 2012 4:55:00 PM org.apache.solr.common.SolrException log
SEVERE: Error while trying to recover. core=mediacms:org.apache.solr.common.SolrException: We are not the leader
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:401)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
    at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:199)
    at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:388)
    at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)

Thanks,
Balaji

--
View this message in context: http://lucene.472066.n3.nabble.com/Nodes-cannot-recover-and-become-unavailable-tp4008916p4017037.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: Nodes cannot recover and become unavailable
It seems my clusterstate.json is still old. Is there a way to recreate it without taking all nodes down at the same time?

-Original message-
From: Markus Jelsma markus.jel...@openindex.io
Sent: Thu 20-Sep-2012 10:14
To: solr-user@lucene.apache.org
Subject: RE: Nodes cannot recover and become unavailable

Hi - at first i didn't recreate the Zookeeper data but i got it to work. I'll check the removal of the LOG line. thanks
RE: Nodes cannot recover and become unavailable
Hi - at first I didn't recreate the ZooKeeper data, but I got it to work. I'll check the removal of the LOG line. Thanks.

-Original message-
From: Sami Siren ssi...@gmail.com
Sent: Wed 19-Sep-2012 17:45
To: solr-user@lucene.apache.org
Subject: Re: Nodes cannot recover and become unavailable

also, did you re create the cluster after upgrading to a newer version? I believe there were some changes made to the clusterstate.json recently that are not backwards compatible.

--
Sami Siren
Nodes cannot recover and become unavailable
Hi,

Since the 2012-09-17 11:10:41 build, shards have had trouble coming back online. When I restart one node, the slices on the other nodes throw exceptions and cannot be queried. I'm not sure how to remedy the problem, but stopping a node or restarting it a few times seems to help. The problem is that when I restart a node and this happens, I must not restart another node, because that may cause other slices to become unavailable.

Here are some parts of the log:

2012-09-19 14:13:18,149 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Recovery failed - trying again... core=oi_i
2012-09-19 14:13:25,818 WARN [solr.cloud.RecoveryStrategy] - [main-EventThread] - : Stopping recovery for zkNodeName=nl10.host:8080_solr_oi_icore=oi_i
2012-09-19 14:13:44,497 WARN [solr.cloud.RecoveryStrategy] - [Thread-4] - : Stopping recovery for zkNodeName=nl10.host:8080_solr_oi_jcore=oi_j
2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Error while trying to recover. core=oi_i:org.apache.solr.common.SolrException: We are not the leader
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:402)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
    at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:199)
    at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:388)
    at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)
2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Recovery failed - trying again... core=oi_i
2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Recovery failed - max retries exceeded. core=oi_i
2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Recovery failed - I give up. core=oi_i
2012-09-19 14:14:00,333 WARN [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Stopping recovery for zkNodeName=nl10.host:8080_solr_oi_icore=oi_i
ERROR [solr.cloud.SyncStrategy] - [main-EventThread] - : Sync request error: java.lang.NullPointerException
ERROR [solr.cloud.SyncStrategy] - [main-EventThread] - : http://nl10.host:8080/solr/oi_i/: Could not tell a replica to recover:java.lang.NullPointerException
    at org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:305)
    at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:102)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.init(HttpSolrServer.java:155)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.init(HttpSolrServer.java:128)
    at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:262)
    at org.apache.solr.cloud.SyncStrategy.requestRecovery(SyncStrategy.java:272)
    at org.apache.solr.cloud.SyncStrategy.syncToMe(SyncStrategy.java:203)
    at org.apache.solr.cloud.SyncStrategy.syncReplicas(SyncStrategy.java:125)
    at org.apache.solr.cloud.SyncStrategy.sync(SyncStrategy.java:87)
    at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:169)
    at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:158)
    at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:102)
    at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:275)
    at org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:326)
    at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:159)
    at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:158)
    at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:102)
    at org.apache.solr.cloud.LeaderElector.access$000(LeaderElector.java:56)
    at org.apache.solr.cloud.LeaderElector$1.process(LeaderElector.java:131)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:526)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)
ERROR [apache.zookeeper.ClientCnxn] - [main-EventThread] - : Error while calling watcher
java.lang.NullPointerException
    at org.apache.solr.cloud.LeaderElector$1.process(LeaderElector.java:139)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:526)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)
ERROR [apache.zookeeper.ClientCnxn] - [main-EventThread] - : Error while calling watcher
java.lang.NullPointerException
    at org.apache.solr.common.cloud.ZkStateReader$3.process(ZkStateReader.java:238)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:526)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)
ERROR [apache.zookeeper.ClientCnxn]
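For anyone triaging a log like the one above, it can help to tally the RecoveryStrategy ERROR lines per core to see which cores are stuck. The following is only a rough sketch: the class name `RecoveryLogSummary` and the regular expression are assumptions based solely on the log lines quoted in this thread, not on any Solr tooling.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RecoveryLogSummary {
    // Assumed pattern, derived only from the log format quoted in this thread:
    // an ERROR entry from RecoveryStrategy that mentions "core=<name>".
    private static final Pattern RECOVERY_ERROR = Pattern.compile(
            "ERROR \\[solr\\.cloud\\.RecoveryStrategy\\].*?core=(\\w+)");

    // Counts RecoveryStrategy ERROR lines per core, in order of first appearance.
    static Map<String, Integer> countErrorsPerCore(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = RECOVERY_ERROR.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "2012-09-19 14:13:18,149 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Recovery failed - trying again... core=oi_i",
                "2012-09-19 14:13:44,497 WARN [solr.cloud.RecoveryStrategy] - [Thread-4] - : Stopping recovery for zkNodeName=nl10.host:8080_solr_oi_jcore=oi_j",
                "2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Recovery failed - I give up. core=oi_i");
        // WARN lines are ignored; only the two ERROR lines for oi_i are counted.
        System.out.println(countErrorsPerCore(sample));
    }
}
```

WARN entries are skipped on purpose, since "Stopping recovery" is also logged during normal operation.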
Re: Nodes cannot recover and become unavailable
Hi,

I am having trouble understanding the reason for that NPE.

First, you could try removing line #102 in HttpClientUtil so that logging does not prevent creation of the HTTP client in SyncStrategy.

--
Sami Siren

On Wed, Sep 19, 2012 at 5:29 PM, Markus Jelsma markus.jel...@openindex.io wrote:

Hi, Since the 2012-09-17 11:10:41 build shards start to have trouble coming back online. When i restart one node the slices on the other nodes are throwing exceptions and cannot be queried. I'm not sure how to remedy the problem but stopping a node or restarting it a few times seems to help it. The problem is when i restart a node, and it happens, i must not restart another node because that may trigger other slices becoming unavailable. Here are some parts of the log: [...]
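The suggestion in this reply is to delete the log statement. A different way to get the same effect, shown here purely as an illustration, is to guard the log call so that a failing logger cannot abort the work around it. Everything in this sketch is hypothetical: `SafeLogging`, `logQuietly`, and the stand-in `createClient` are not Solr code and do not reflect how HttpClientUtil is actually written.

```java
public class SafeLogging {
    // Runs a logging action and swallows any RuntimeException it throws,
    // so a broken logger cannot abort the surrounding work.
    static void logQuietly(Runnable logCall) {
        try {
            logCall.run();
        } catch (RuntimeException e) {
            // Deliberately ignored: the log message is optional diagnostics.
        }
    }

    // Hypothetical stand-in for a client factory; the real Solr method differs.
    static String createClient(String config, Runnable logCall) {
        logQuietly(logCall);
        return "client(" + config + ")";
    }

    public static void main(String[] args) {
        // Simulate a logger that fails with a NullPointerException.
        String client = createClient("maxConnections=128", () -> {
            throw new NullPointerException("logger failed");
        });
        // The client is still created despite the logging failure.
        System.out.println(client);
    }
}
```

Swallowing the exception hides the underlying logging problem, so this would only be a stopgap while the cause of the NPE is tracked down.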
Re: Nodes cannot recover and become unavailable
Also, did you re-create the cluster after upgrading to a newer version? I believe there were some changes made to the clusterstate.json recently that are not backwards compatible.

--
Sami Siren

On Wed, Sep 19, 2012 at 6:21 PM, Sami Siren ssi...@gmail.com wrote:

Hi, I am having trouble understanding the reason for that NPE. First, you could try removing line #102 in HttpClientUtil so that logging does not prevent creation of the HTTP client in SyncStrategy.

--
Sami Siren
Re: Nodes cannot recover and become unavailable
bq. I believe there were some changes made to the clusterstate.json recently that are not backwards compatible.

Indeed - I think Yonik committed something the other day - we probably should send an email out about this.

Not sure exactly how easy an upgrade is or what steps to take - it may be something like stop your whole cluster, delete clusterstate.json and then it works, or it may take more or less than that. Whether that's the issue here, I don't know, but it's likely an issue.

On Wed, Sep 19, 2012 at 8:41 AM, Sami Siren ssi...@gmail.com wrote:

also, did you re create the cluster after upgrading to a newer version? I believe there were some changes made to the clusterstate.json recently that are not backwards compatible.

--
Sami Siren
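The tentative procedure mentioned in this reply (stop the whole cluster, then delete clusterstate.json) could be sketched roughly as below. This is an assumption-heavy illustration, not a verified upgrade path: `ZkTree` and `InMemoryTree` are hypothetical stand-ins rather than the ZooKeeper client API, and the check on `/live_nodes` assumes that running Solr nodes register themselves under that path.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class ClusterStateReset {
    // Hypothetical minimal view of a ZooKeeper tree; not a real client API.
    interface ZkTree {
        List<String> children(String path);
        boolean exists(String path);
        void delete(String path);
    }

    // Deletes /clusterstate.json only when no node is registered under
    // /live_nodes, i.e. the whole cluster is assumed to be stopped.
    // Returns false and changes nothing while any node is still live.
    static boolean resetClusterState(ZkTree zk) {
        if (!zk.children("/live_nodes").isEmpty()) {
            return false;
        }
        if (zk.exists("/clusterstate.json")) {
            zk.delete("/clusterstate.json");
        }
        return true;
    }

    // In-memory fake used only to exercise the sketch without a ZooKeeper server.
    static class InMemoryTree implements ZkTree {
        private final Set<String> paths = new TreeSet<>();

        InMemoryTree(String... initial) {
            paths.addAll(Arrays.asList(initial));
        }

        public List<String> children(String parent) {
            String prefix = parent + "/";
            List<String> out = new ArrayList<>();
            for (String p : paths) {
                // Direct children only: no further '/' after the prefix.
                if (p.startsWith(prefix) && p.indexOf('/', prefix.length()) < 0) {
                    out.add(p.substring(prefix.length()));
                }
            }
            return out;
        }

        public boolean exists(String path) {
            return paths.contains(path);
        }

        public void delete(String path) {
            paths.remove(path);
        }
    }

    public static void main(String[] args) {
        InMemoryTree running = new InMemoryTree(
                "/clusterstate.json", "/live_nodes", "/live_nodes/nl10.host:8080_solr");
        System.out.println(resetClusterState(running)); // refused: a node is still live

        InMemoryTree stopped = new InMemoryTree("/clusterstate.json", "/live_nodes");
        System.out.println(resetClusterState(stopped)); // allowed: no live nodes
        System.out.println(stopped.exists("/clusterstate.json"));
    }
}
```

Against a real ensemble the same two steps would go through a ZooKeeper client or the `zkCli.sh` shell. As the reply itself notes, the actual upgrade may take more or fewer steps than this.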
Re: Nodes cannot recover and become unavailable
On Wed, Sep 19, 2012 at 4:25 PM, Mark Miller markrmil...@gmail.com wrote:

bq. I believe there were some changes made to the clusterstate.json recently that are not backwards compatible.

Indeed - I think yonik committed something the other day - we prob should send an email out about this.

Yeah, I was just in the process of committing another change, updating CHANGES and sending a message.

-Yonik
http://lucidworks.com