RE: Nodes cannot recover and become unavailable

2012-10-30 Thread balaji.gandhi
Hi Team,

What is the preferred way to upgrade from SOLR 4.0.0-BETA to SOLR 4.0.0?

We saw the same errors when we did the upgrade:

Oct 29, 2012 4:55:00 PM org.apache.solr.common.SolrException log
SEVERE: Error while trying to recover. core=mediacms:org.apache.solr.common.SolrException: We are not the leader
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:401)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
    at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:199)
    at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:388)
    at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)

Thanks,
Balaji





RE: Nodes cannot recover and become unavailable

2012-09-24 Thread Markus Jelsma
It seems my clusterstate.json is still old. Is there a method to recreate it
without taking all nodes down at the same time?

 
 

RE: Nodes cannot recover and become unavailable

2012-09-20 Thread Markus Jelsma
Hi - at first I didn't recreate the ZooKeeper data, but I got it to work. I'll
check the removal of the LOG line.

Thanks
 

Nodes cannot recover and become unavailable

2012-09-19 Thread Markus Jelsma
Hi,

Since the 2012-09-17 11:10:41 build, shards have had trouble coming back
online. When I restart one node, the slices on the other nodes throw
exceptions and cannot be queried. I'm not sure how to remedy the problem, but
stopping a node or restarting it a few times seems to help. The catch is that
once I restart a node and the problem occurs, I must not restart another node,
because that may make other slices unavailable.

Here are some parts of the log:

2012-09-19 14:13:18,149 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Recovery failed - trying again... core=oi_i
2012-09-19 14:13:25,818 WARN [solr.cloud.RecoveryStrategy] - [main-EventThread] - : Stopping recovery for zkNodeName=nl10.host:8080_solr_oi_icore=oi_i
2012-09-19 14:13:44,497 WARN [solr.cloud.RecoveryStrategy] - [Thread-4] - : Stopping recovery for zkNodeName=nl10.host:8080_solr_oi_jcore=oi_j
2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Error while trying to recover. core=oi_i:org.apache.solr.common.SolrException: We are not the leader
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:402)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
    at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:199)
    at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:388)
    at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)

2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Recovery failed - trying again... core=oi_i
2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Recovery failed - max retries exceeded. core=oi_i
2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Recovery failed - I give up. core=oi_i
2012-09-19 14:14:00,333 WARN [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Stopping recovery for zkNodeName=nl10.host:8080_solr_oi_icore=oi_i
ERROR [solr.cloud.SyncStrategy] - [main-EventThread] - : Sync request error: java.lang.NullPointerException
ERROR [solr.cloud.SyncStrategy] - [main-EventThread] - : http://nl10.host:8080/solr/oi_i/: Could not tell a replica to recover:java.lang.NullPointerException
    at org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:305)
    at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:102)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.<init>(HttpSolrServer.java:155)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.<init>(HttpSolrServer.java:128)
    at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:262)
    at org.apache.solr.cloud.SyncStrategy.requestRecovery(SyncStrategy.java:272)
    at org.apache.solr.cloud.SyncStrategy.syncToMe(SyncStrategy.java:203)
    at org.apache.solr.cloud.SyncStrategy.syncReplicas(SyncStrategy.java:125)
    at org.apache.solr.cloud.SyncStrategy.sync(SyncStrategy.java:87)
    at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:169)
    at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:158)
    at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:102)
    at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:275)
    at org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:326)
    at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:159)
    at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:158)
    at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:102)
    at org.apache.solr.cloud.LeaderElector.access$000(LeaderElector.java:56)
    at org.apache.solr.cloud.LeaderElector$1.process(LeaderElector.java:131)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:526)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)

ERROR [apache.zookeeper.ClientCnxn] - [main-EventThread] - : Error while calling watcher
java.lang.NullPointerException
    at org.apache.solr.cloud.LeaderElector$1.process(LeaderElector.java:139)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:526)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)
ERROR [apache.zookeeper.ClientCnxn] - [main-EventThread] - : Error while calling watcher
java.lang.NullPointerException
    at org.apache.solr.common.cloud.ZkStateReader$3.process(ZkStateReader.java:238)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:526)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)
 ERROR [apache.zookeeper.ClientCnxn] 
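
[Editor's sketch] When triaging a log like the one above, it can help to list which cores abandoned recovery altogether. The following is a minimal sketch, not part of Solr: the class name `RecoveryLogScan` and the method `coresThatGaveUp` are hypothetical, and it assumes each log record has already been joined onto a single line and that the message text matches the "Recovery failed - I give up. core=..." wording seen in this excerpt.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RecoveryLogScan {

    // Assumption: the message wording is taken from the log excerpt in this thread.
    private static final Pattern GAVE_UP =
            Pattern.compile("Recovery failed - I give up\\. core=(\\S+)");

    /** Returns the core names, in first-seen order, whose recovery was abandoned. */
    public static Set<String> coresThatGaveUp(List<String> logLines) {
        Set<String> cores = new LinkedHashSet<>();
        for (String line : logLines) {
            Matcher m = GAVE_UP.matcher(line);
            if (m.find()) {
                cores.add(m.group(1));
            }
        }
        return cores;
    }

    public static void main(String[] args) {
        // Two records copied from the excerpt, each joined onto one line.
        List<String> lines = Arrays.asList(
            "2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : "
                + "Recovery failed - trying again... core=oi_i",
            "2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : "
                + "Recovery failed - I give up. core=oi_i");
        System.out.println(coresThatGaveUp(lines));
    }
}
```

A core that shows up here stopped retrying on its own, so it is a candidate for the manual restart described in the message.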

Re: Nodes cannot recover and become unavailable

2012-09-19 Thread Sami Siren
Hi,

I am having trouble understanding the reason for that NPE.

First, you could try removing line 102 in HttpClientUtil so
that logging does not prevent creation of the HTTP client in
SyncStrategy.

--
 Sami Siren
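
[Editor's sketch] The suggestion above amounts to keeping an informational log call from blocking object creation. The following is a minimal illustration of that idea only, not Solr's actual code: `GuardedClientFactory`, `Client`, and the log message are all hypothetical, and a `Consumer<String>` stands in for the logger. Removing the log line, as suggested, has the same effect; the guard is an alternative that keeps the message when logging works.

```java
import java.util.function.Consumer;

public class GuardedClientFactory {

    /** Hypothetical stand-in for an HTTP client; not a Solr or HttpClient class. */
    public static final class Client {
        public final int maxConnections;

        Client(int maxConnections) {
            this.maxConnections = maxConnections;
        }
    }

    /** Unguarded: if the logger throws, the client is never created. */
    public static Client createUnguarded(int maxConnections, Consumer<String> logger) {
        logger.accept("Creating client, maxConnections=" + maxConnections);
        return new Client(maxConnections);
    }

    /** Guarded: a failure in the purely informational log call is swallowed. */
    public static Client createGuarded(int maxConnections, Consumer<String> logger) {
        try {
            logger.accept("Creating client, maxConnections=" + maxConnections);
        } catch (RuntimeException e) {
            // Logging is best-effort here; it must not block client creation.
        }
        return new Client(maxConnections);
    }

    public static void main(String[] args) {
        // A logger that fails inside the logging call, as the trace suggests.
        Consumer<String> brokenLogger = msg -> {
            throw new NullPointerException("logger failure");
        };

        Client c = createGuarded(32, brokenLogger);
        System.out.println("guarded client created, maxConnections=" + c.maxConnections);

        try {
            createUnguarded(32, brokenLogger);
        } catch (NullPointerException e) {
            System.out.println("unguarded creation aborted by: " + e.getMessage());
        }
    }
}
```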


Re: Nodes cannot recover and become unavailable

2012-09-19 Thread Sami Siren
Also, did you re-create the cluster after upgrading to a newer
version? I believe some recent changes to clusterstate.json
are not backwards compatible.

--
 Sami Siren




Re: Nodes cannot recover and become unavailable

2012-09-19 Thread Mark Miller
bq. I believe there were some changes made to the clusterstate.json
recently that are not backwards compatible.

Indeed - I think Yonik committed something the other day - we should
probably send an email out about this. I'm not sure exactly how easy an
upgrade is or what steps to take. It may be something like: stop your
whole cluster, delete clusterstate.json, and then it works; or it may
take more or less than that. I don't know whether that's the issue here,
but it's likely an issue.


Re: Nodes cannot recover and become unavailable

2012-09-19 Thread Yonik Seeley
On Wed, Sep 19, 2012 at 4:25 PM, Mark Miller markrmil...@gmail.com wrote:
 bq. I believe there were some changes made to the clusterstate.json
 recently that are not backwards compatible.

 Indeed - I think yonik committed something the other day - we prob
 should send an email out about this.

Yeah, I was just in the process of committing another change, updating
CHANGES and sending a message.

-Yonik
http://lucidworks.com