[ 
https://issues.apache.org/jira/browse/CASSANDRA-7825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162504#comment-14162504
 ] 

Gabriel Rosendorf commented on CASSANDRA-7825:
----------------------------------------------

I'm encountering this issue on DSC 2.0.7. Same thing as described by 
[~naydencho] above; I terminated an EC2 instance to simulate a failure, 
executed a nodetool removenode, and the node is still shown in system.peers and 
jmx. Hosts in other data centers show the correct nodes in the peers table, but 
still show the removed host in JMX. 

Is this specific to DS[C,E]? Looking in jmx at 
org.apache.cassandra.net:type=FailureDetector,AllEndpointStates, I see the host 
with a status of 'removed'. 

> node decommission leaves ghost nodes in system.peers table and JMX
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-7825
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7825
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: OS: Ubuntu 12.04.4 LTS
> Cassandra: ReleaseVersion: 2.0.8.39
> DSE 4.5.1
> OpsCenter: 5.0.0
>            Reporter: nayden kolev
>            Assignee: Brandon Williams
>            Priority: Minor
>
> I have a 4-node cluster (split in 2 DCs) running DSE 4.5.1, C* 2.0.8.39. I 
> needed to cycle a node (add a new node and remove one). I followed this doc 
> (more specifically steps 1 and 2):
> http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_remove_node_t.html
> After the decom, the decommissioned node logged this:
> INFO [RMI TCP Connection(17)-10.1.129.27] 2014-08-23 09:57:08,243 
> ThriftServer.java (line 141) Stop listening to thrift clients
> INFO [RMI TCP Connection(17)-10.1.129.27] 2014-08-23 09:57:08,269 Server.java 
> (line 182) Stop listening for CQL clients
> INFO [RMI TCP Connection(17)-10.1.129.27] 2014-08-23 09:57:08,270 
> Gossiper.java (line 1279) Announcing shutdown
> INFO [RMI TCP Connection(17)-10.1.129.27] 2014-08-23 09:57:10,271 
> MessagingService.java (line 683) Waiting for messaging service to quiesce
> INFO [ACCEPT-/10.1.129.27] 2014-08-23 09:57:10,272 MessagingService.java 
> (line 923) MessagingService has terminated the accept() thread
> INFO [RMI TCP Connection(17)-10.1.129.27] 2014-08-23 09:57:10,280 
> StorageService.java (line 1007) DECOMMISSIONED
> The decommissioned node no longer appears in OpsCenter, and 'nodetool status' 
> shows it gone from the cluster as well, with the remaining 4 nodes un UN 
> state.
> All is good... Then I noticed that the DownEndpointCount (still) shows as 1 - 
> using a JMX console, and browsing to org.apache.cassandra.net, 
> FailureDetector, Attributes, DownEdpointCount. While there, I also noticed 
> that SimpleStates shows the decommissioned node as down, and the 
> AllEndpointStates shows it as STATUS:LEFT
> I tried running a 'nodetool removenode decom-node's-host-id', but it failed 
> with "Host ID not found", which I expected, given I decommissioned it and it 
> does not show in nodetool status.
> nodetool describecluster lists only the expected 4 nodes (does not show the 
> decommissioned node)
> checking the system.peers table lists the decomm-ed node with a null host_id, 
> rack, release_version, rpc_address, schema_version, etc.
> Adding JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false" to the 
> Cassandra-env.sh as suggested here:
> https://issues.apache.org/jira/browse/CASSANDRA-6053
> does not help. I have actually tried this before, when I was decommissioning 
> a node on an older C* version and it worked, but now it does nothing. If I 
> delete the row mentioning the decommissioned node from the system.peers table 
> it stays out of there until the next dse service restart.
> This is causing apps to timeout, since they get a invalid node's IP... As a 
> workaround I remove the entry from the peers table, but it is not permanent...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to