[jira] [Updated] (SOLR-6923) AutoAddReplicas should consult live nodes also to see if a state has changed
[ https://issues.apache.org/jira/browse/SOLR-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-6923:

    Component/s: SolrCloud
    Fix Version/s: 4.10.5

I am marking this for 4.10.5 whenever that happens. I fixed the bug I reported in my last comment with SOLR-7178.

AutoAddReplicas should consult live nodes also to see if a state has changed

    Key: SOLR-6923
    URL: https://issues.apache.org/jira/browse/SOLR-6923
    Project: Solr
    Issue Type: Bug
    Components: SolrCloud
    Reporter: Varun Thacker
    Assignee: Mark Miller
    Fix For: 4.10.5, 5.0, Trunk
    Attachments: SOLR-6923.patch

- I did the following
{code}
./solr start -e cloud -noprompt
kill -9 pid-of-node2 //Not the node which is running ZK
{code}
- /live_nodes reflects that the node is gone.
- This is the only message which gets logged on the node1 server after killing node2
{code}
45812 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9983] WARN org.apache.zookeeper.server.NIOServerCnxn – caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x14ac40f26660001, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:745)
{code}
- The graph shows the node2 as 'Gone' state
- clusterstate.json keeps showing the replica as 'active'
{code}
{collection1:{
    shards:{shard1:{
        range:8000-7fff,
        state:active,
        replicas:{
          core_node1:{
            state:active,
            core:collection1,
            node_name:169.254.113.194:8983_solr,
            base_url:http://169.254.113.194:8983/solr,
            leader:true},
          core_node2:{
            state:active,
            core:collection1,
            node_name:169.254.113.194:8984_solr,
            base_url:http://169.254.113.194:8984/solr}}}},
    maxShardsPerNode:1,
    router:{name:compositeId},
    replicationFactor:1,
    autoAddReplicas:false,
    autoCreated:true}}
{code}

One immediate problem I can see is that AutoAddReplicas doesn't work since the clusterstate.json never changes. There might be more features which are affected by this.

On first thought I think we can handle this - The shard leader could listen to changes on /live_nodes and if it has replicas that were on that node, mark it as 'down' in the clusterstate.json?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
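The proposal at the end of the description (react to a change in /live_nodes and mark the affected replicas as 'down') can be sketched roughly as below. This is only an illustration in Python, not Solr code: the function name `replicas_to_mark_down`, the nested-dict layout, and the idea of diffing two snapshots of the live-node list are assumptions modelled on the clusterstate.json excerpt above. No ZooKeeper client is involved; the sketch only computes which replicas would need a state update.

```python
def replicas_to_mark_down(cluster_state, previous_live, current_live):
    """Return (collection, shard, replica) triples hosted on nodes that just left.

    cluster_state: dict shaped like clusterstate.json (assumed layout).
    previous_live / current_live: iterables of node names from /live_nodes.
    """
    departed = set(previous_live) - set(current_live)
    affected = []
    for coll_name, coll in cluster_state.items():
        for shard_name, shard in coll.get("shards", {}).items():
            for replica_name, replica in shard.get("replicas", {}).items():
                on_departed_node = replica.get("node_name") in departed
                # Skip replicas that are already recorded as down.
                if on_departed_node and replica.get("state") != "down":
                    affected.append((coll_name, shard_name, replica_name))
    return affected


# Hypothetical snapshot mirroring the two-node example in the report.
cluster_state = {
    "collection1": {
        "shards": {
            "shard1": {
                "replicas": {
                    "core_node1": {"state": "active",
                                   "node_name": "169.254.113.194:8983_solr"},
                    "core_node2": {"state": "active",
                                   "node_name": "169.254.113.194:8984_solr"},
                }
            }
        }
    }
}
before = {"169.254.113.194:8983_solr", "169.254.113.194:8984_solr"}
after = {"169.254.113.194:8983_solr"}  # node2 was killed with kill -9

to_mark = replicas_to_mark_down(cluster_state, before, after)
print(to_mark)
```

In this sketch only core_node2 is reported, since it is the one replica whose node disappeared between the two snapshots.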
[jira] [Updated] (SOLR-6923) AutoAddReplicas should consult live nodes also to see if a state has changed
[ https://issues.apache.org/jira/browse/SOLR-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated SOLR-6923:

    Assignee: Anshum Gupta
[jira] [Updated] (SOLR-6923) AutoAddReplicas should consult live nodes also to see if a state has changed
[ https://issues.apache.org/jira/browse/SOLR-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Thacker updated SOLR-6923:

    Attachment: SOLR-6923.patch

Simple patch which checks against live nodes before short circuiting. SharedFSAutoReplicaFailoverTest passes.
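The idea behind the patch, as described in this update, is to check a replica against the live nodes rather than trusting its recorded state alone. A rough sketch of that check follows. It is written in Python for illustration and is not the code in SOLR-6923.patch; the helper name `is_replica_usable` and the dict layout are assumptions based on the clusterstate.json excerpt in the issue description.

```python
def is_replica_usable(replica, live_nodes):
    """True only if the replica is recorded as active AND its node is live.

    replica: dict with 'state' and 'node_name' keys (assumed layout).
    live_nodes: set of node names, as listed under /live_nodes.
    """
    return (replica.get("state") == "active"
            and replica.get("node_name") in live_nodes)


replicas = {
    "core_node1": {"state": "active", "node_name": "169.254.113.194:8983_solr"},
    "core_node2": {"state": "active", "node_name": "169.254.113.194:8984_solr"},
}
# After kill -9 on node2, only node1 is still listed under /live_nodes.
live_nodes = {"169.254.113.194:8983_solr"}

usable = sorted(name for name, r in replicas.items()
                if is_replica_usable(r, live_nodes))
print(usable)
```

Checking the recorded state alone would count both replicas as healthy, which is the situation the report describes. Adding the live-node test leaves only core_node1.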
[jira] [Updated] (SOLR-6923) AutoAddReplicas should consult live nodes also to see if a state has changed
[ https://issues.apache.org/jira/browse/SOLR-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anshum Gupta updated SOLR-6923:

    Fix Version/s: Trunk
                   5.0
[jira] [Updated] (SOLR-6923) AutoAddReplicas should consult live nodes also to see if a state has changed
[ https://issues.apache.org/jira/browse/SOLR-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Thacker updated SOLR-6923:

    Summary: AutoAddReplicas should consult live nodes also to see if a state has changed (was: kill -9 doesn't change the replica state in clusterstate.json)