[jira] [Commented] (SOLR-6923) kill -9 doesn't change the replica state in clusterstate.json

2015-01-11 Thread Varun Thacker (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272953#comment-14272953
 ] 

Varun Thacker commented on SOLR-6923:
-

Thanks, Tim, for pointing that out. I was not aware of this.

I'll rename the issue appropriately with this information and come up with a 
patch for AutoAddReplicas to consult live nodes too.

 kill -9 doesn't change the replica state in clusterstate.json
 -

 Key: SOLR-6923
 URL: https://issues.apache.org/jira/browse/SOLR-6923
 Project: Solr
  Issue Type: Bug
Reporter: Varun Thacker

 - I did the following 
 {code}
 ./solr start -e cloud -noprompt
 kill -9 pid-of-node2  # not the node which is running ZK
 {code}
 - /live_nodes reflects that the node is gone.
 - This is the only message which gets logged on the node1 server after 
 killing node2
 {code}
 45812 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9983] WARN  org.apache.zookeeper.server.NIOServerCnxn  – caught end of stream exception
 EndOfStreamException: Unable to read additional data from client sessionid 0x14ac40f26660001, likely client has closed socket
     at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
     at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
     at java.lang.Thread.run(Thread.java:745)
 {code}
 - The graph shows node2 in the 'Gone' state
 - clusterstate.json keeps showing the replica as 'active'
 {code}
 {"collection1":{
     "shards":{"shard1":{
         "range":"8000-7fff",
         "state":"active",
         "replicas":{
           "core_node1":{
             "state":"active",
             "core":"collection1",
             "node_name":"169.254.113.194:8983_solr",
             "base_url":"http://169.254.113.194:8983/solr",
             "leader":"true"},
           "core_node2":{
             "state":"active",
             "core":"collection1",
             "node_name":"169.254.113.194:8984_solr",
             "base_url":"http://169.254.113.194:8984/solr"}}}},
     "maxShardsPerNode":"1",
     "router":{"name":"compositeId"},
     "replicationFactor":"1",
     "autoAddReplicas":"false",
     "autoCreated":"true"}}
 {code}
 One immediate problem I can see is that AutoAddReplicas doesn't work, since 
 clusterstate.json never changes. Other features might be affected by this 
 as well.
 On first thought, I think we can handle this: the shard leader could listen 
 for changes on /live_nodes and, if it has replicas that were on a node that 
 went away, mark them as 'down' in clusterstate.json?
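
The idea proposed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Solr code: the function name `replicas_to_mark_down` is hypothetical, the cluster state is assumed to be the parsed clusterstate.json dictionary, and the two live-node collections stand in for the children of /live_nodes before and after a change notification. It only computes which replicas would need updating; writing the new state back to ZooKeeper is left out.

```python
def replicas_to_mark_down(cluster_state, old_live, new_live):
    """Return (collection, shard, replica) triples hosted on nodes that
    disappeared from the live-node set and are not yet marked 'down'.

    Hypothetical helper: cluster_state is the parsed clusterstate.json.
    """
    gone = set(old_live) - set(new_live)
    result = []
    for coll_name, coll in cluster_state.items():
        for shard_name, shard in coll["shards"].items():
            for replica_name, replica in shard["replicas"].items():
                if replica["node_name"] in gone and replica["state"] != "down":
                    result.append((coll_name, shard_name, replica_name))
    return result


# Trimmed version of the cluster state from this report.
cluster_state = {
    "collection1": {
        "shards": {
            "shard1": {
                "replicas": {
                    "core_node1": {
                        "state": "active",
                        "node_name": "169.254.113.194:8983_solr",
                    },
                    "core_node2": {
                        "state": "active",
                        "node_name": "169.254.113.194:8984_solr",
                    },
                }
            }
        }
    }
}

before = {"169.254.113.194:8983_solr", "169.254.113.194:8984_solr"}
after = {"169.254.113.194:8983_solr"}  # node2 was killed

stale = replicas_to_mark_down(cluster_state, before, after)
```

With the data above, only core_node2 is selected, since its node is the one that left the live-node set.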



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6923) kill -9 doesn't change the replica state in clusterstate.json

2015-01-07 Thread Timothy Potter (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14268278#comment-14268278
 ] 

Timothy Potter commented on SOLR-6923:
--

The actual runtime state of a replica is determined by 1) what's in 
clusterstate.json and 2) whether the node hosting the replica is live. If 
the node is not live, the state reported in clusterstate.json can be stale 
for some time. It has always worked this way in SolrCloud. Thus, 
AutoAddReplicas needs to consult live nodes before treating a replica as active.
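
As a rough illustration of the rule described above, the following Python sketch combines the two inputs. It is not Solr's implementation: the function name `effective_state` is hypothetical, the replica entries are plain dictionaries shaped like those in clusterstate.json, and reporting "down" for a replica on a non-live node is an assumption made for the example.

```python
def effective_state(replica, live_nodes):
    """Return the state a consumer should act on for one replica.

    Hypothetical helper: the state recorded in clusterstate.json is only
    trusted when the hosting node is present in the live-node set.
    """
    if replica["node_name"] not in live_nodes:
        # The recorded state may be stale; assume the replica is unusable.
        return "down"
    return replica["state"]


# Replica entries as recorded in clusterstate.json after node2 was killed.
replicas = {
    "core_node1": {"state": "active", "node_name": "169.254.113.194:8983_solr"},
    "core_node2": {"state": "active", "node_name": "169.254.113.194:8984_solr"},
}
# Children of /live_nodes at the same moment: node2 is gone.
live_nodes = {"169.254.113.194:8983_solr"}

states = {name: effective_state(r, live_nodes) for name, r in replicas.items()}
```

Here core_node2 is still recorded as active, but the combined check reports it as down because its node is missing from the live-node set.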
