We are seeing the same issue with Cassandra 2.0.8. "nodetool gossipinfo" still 
reports a node as down even after we decommission it from the cluster.

Thanks,
Pratik

From: kurt greaves <k...@instaclustr.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Friday, January 27, 2017 at 5:54 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Re : Decommissioned nodes show as DOWN in Cassandra versions 
2.1.12 - 2.1.16

We've seen this issue on a few clusters, including on 2.1.7 and 2.1.8. Pretty 
sure it's a known issue in gossip. It seems to be fixed in later versions.

On 24 Jan 2017 06:09, "sai krishnam raju potturi" 
<pskraj...@gmail.com> wrote:

In Cassandra versions 2.1.11 - 2.1.16, after we decommission a node or 
datacenter, the decommissioned nodes are marked as DOWN in the cluster when we 
run "nodetool describecluster". However, the nodes do not show up in the 
output of "nodetool status".
The decommissioned node also does not show up in the "system.peers" table on 
the remaining nodes.

The workaround we follow is a rolling restart of the cluster, which clears the 
decommissioned nodes from the "UNREACHABLE" state so that the actual state of 
the cluster is shown. This workaround is tedious for large clusters.

We also verified the decommission process with the CCM tool, and observed the 
same issue for clusters on versions 2.1.12 to 2.1.16. The issue was not 
observed in versions prior to or later than these.


Has anybody in the community observed a similar issue? We've also raised a JIRA 
issue for this: https://issues.apache.org/jira/browse/CASSANDRA-13144


Below are the logs observed on a version without the bug and on a version with 
the bug. The lines highlighted in yellow show the expected logs. The lines 
highlighted in red are the ones where the node is recognized as down and shows 
as UNREACHABLE.



Cassandra 2.1.1 logs showing the decommissioned node (without the bug):

2017-01-19 20:18:56,415 [GossipStage:1] DEBUG ArrivalWindow Ignoring interval 
time of 2049943233 for /X.X.X.X
2017-01-19 20:18:56,416 [GossipStage:1] DEBUG StorageService Node /X.X.X.X 
state left, tokens [ 59353109817657926242901533144729725259, 
60254520910109313597677907197875221475, 75698727618038614819889933974570742305, 
84508739091270910297310401957975430578]
2017-01-19 20:18:56,416 [GossipStage:1] DEBUG Gossiper adding expire time for 
endpoint : /X.X.X.X (1485116334088)
2017-01-19 20:18:56,417 [GossipStage:1] INFO StorageService Removing tokens 
[100434964734820719895982857900842892337, 
114144647582686041354301802358217767299, 
132090888860517964702932350041942412177, 
138409460913927199437556572481804704749] for /X.X.X.X
2017-01-19 20:18:56,418 [HintedHandoff:3] INFO HintedHandOffManager Deleting 
any stored hints for /X.X.X.X
2017-01-19 20:18:56,424 [GossipStage:1] DEBUG MessagingService Resetting 
version for /X.X.X.X
2017-01-19 20:18:56,424 [GossipStage:1] DEBUG Gossiper removing endpoint 
/X.X.X.X
2017-01-19 20:18:56,437 [GossipStage:1] DEBUG StorageService Ignoring state 
change for dead or unknown endpoint: /X.X.X.X
2017-01-19 20:19:02,022 [WRITE-/X.X.X.X] DEBUG OutboundTcpConnection attempting 
to connect to /X.X.X.X
2017-01-19 20:19:02,023 [HANDSHAKE-/X.X.X.X] INFO OutboundTcpConnection 
Handshaking version with /X.X.X.X
2017-01-19 20:19:02,023 [WRITE-/X.X.X.X] DEBUG MessagingService Setting version 
7 for /X.X.X.X
2017-01-19 20:19:08,096 [GossipStage:1] DEBUG ArrivalWindow Ignoring interval 
time of 2074454222 for /X.X.X.X
2017-01-19 20:19:54,407 [GossipStage:1] DEBUG ArrivalWindow Ignoring interval 
time of 4302985797 for /X.X.X.X
2017-01-19 20:19:57,405 [GossipTasks:1] DEBUG Gossiper 60000 elapsed, /X.X.X.X 
gossip quarantine over
2017-01-19 20:19:57,455 [GossipStage:1] DEBUG ArrivalWindow Ignoring interval 
time of 3047826501 for /X.X.X.X
2017-01-19 20:19:57,455 [GossipStage:1] DEBUG StorageService Ignoring state 
change for dead or unknown endpoint: /X.X.X.X


Cassandra 2.1.16 logs showing the decommissioned node (with the bug). The 
2.1.16 logs match the 2.1.1 logs up to "DEBUG Gossiper 60000 elapsed, /X.X.X.X 
gossip quarantine over", which is then followed by "InetAddress /X.X.X.X is 
now DOWN":

2017-01-19 19:52:23,687 [GossipStage:1] DEBUG StorageService.java:1883 - Node 
/X.X.X.X state left, tokens [-1112888759032625467, -228773855963737699, 
-3114550423754381391, -4848625944949064281, -6920961603460018610, 
-8566729719076824066, 1611098831406674636, 7278843689020594771, 
7565410054791352413, 9166885764, 8654747784805453046]
2017-01-19 19:52:23,688 [GossipStage:1] DEBUG Gossiper.java:1520 - adding 
expire time for endpoint : /X.X.X.X (1485114743567)
2017-01-19 19:52:23,688 [GossipStage:1] INFO StorageService.java:1965 - 
Removing tokens [-1112888759032625467, -228773855963737699, 
-3114550423754381391, -4848625944949064281, -6920961603460018610, 
5690722015779071557, 6202373691525063547, 7191120402564284381, 
7278843689020594771, 7565410054791352413, 8524200089166885764, 
8654747784805453046] for /X.X.X.X
2017-01-19 19:52:23,689 [HintedHandoffManager:1] INFO 
HintedHandOffManager.java:230 - Deleting any stored hints for /X.X.X.X
2017-01-19 19:52:23,689 [GossipStage:1] DEBUG MessagingService.java:840 - 
Resetting version for /X.X.X.X
2017-01-19 19:52:23,690 [GossipStage:1] DEBUG Gossiper.java:417 - removing 
endpoint /X.X.X.X
2017-01-19 19:52:23,691 [GossipStage:1] DEBUG StorageService.java:1552 - 
Ignoring state change for dead or unknown endpoint: /X.X.X.X
2017-01-19 19:52:31,617 [MessagingService-Outgoing-/X.X.X.X] DEBUG 
OutboundTcpConnection.java:372 - attempting to connect to /X.X.X.X
2017-01-19 19:52:31,618 [HANDSHAKE-/X.X.X.X] INFO 
OutboundTcpConnection.java:488 - Handshaking version with /X.X.X.X
2017-01-19 19:52:31,619 [MessagingService-Outgoing-/X.X.X.X] DEBUG 
MessagingService.java:826 - Setting version 8 for /X.X.X.X
2017-01-19 19:53:19,914 [GossipStage:1] DEBUG FailureDetector.java:423 - 
Ignoring interval time of 6004119075 for /X.X.X.X
2017-01-19 19:53:23,702 [GossipTasks:1] DEBUG Gossiper.java:795 - 60000 
elapsed, /X.X.X.X gossip quarantine over
2017-01-19 19:53:23,985 [GossipStage:1] DEBUG StorageService.java:1552 - 
Ignoring state change for dead or unknown endpoint: /X.X.X.X
2017-01-19 19:53:26,223 [GossipStage:1] DEBUG FailureDetector.java:423 - 
Ignoring interval time of 6309159352 for /X.X.X.X
2017-01-19 19:53:50,709 [GossipTasks:1] DEBUG Gossiper.java:336 - Convicting 
/X.X.X.X with status LEFT - alive true
2017-01-19 19:53:50,709 [GossipTasks:1] INFO Gossiper.java:1008 - InetAddress 
/X.X.X.X is now DOWN
2017-01-19 19:53:50,709 [GossipTasks:1] DEBUG MessagingService.java:429 - 
Resetting pool for /X.X.X.X
2017-01-19 19:53:51,710 [GossipTasks:1] DEBUG Gossiper.java:336 - Convicting 
/X.X.X.X with status LEFT - alive false
2017-01-19 19:53:53,711 [MessagingService-Outgoing-/X.X.X.X] DEBUG 
OutboundTcpConnection.java:372 - attempting to connect to /X.X.X.X
2017-01-19 19:53:53,711 [GossipTasks:1] DEBUG Gossiper.java:336 - Convicting 
/X.X.X.X with status LEFT - alive false
2017-01-19 19:53:54,711 [GossipTasks:1] DEBUG Gossiper.java:336 - Convicting 
/X.X.X.X with status LEFT - alive false
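
The pattern in the 2.1.16 log above (an endpoint logged as "state left" and 
later logged as "is now DOWN") could be checked for mechanically. The 
following is only a rough sketch in Python, not part of Cassandra or nodetool. 
It assumes each log entry sits on a single line, as in the real log file 
rather than in this wrapped email, and that the two messages read exactly as 
they do in the excerpts above. The function name find_left_then_down is made 
up for illustration.

```python
import re
from typing import Iterable, List

# Endpoint as printed in the logs, e.g. /10.0.0.5 or the redacted /X.X.X.X.
ENDPOINT = r"(/[\w.:]+)"

# Message texts are taken from the log excerpts in this thread; other
# Cassandra versions may word them differently (assumption).
LEFT_RE = re.compile(r"Node " + ENDPOINT + r" state left")
DOWN_RE = re.compile(r"InetAddress " + ENDPOINT + r" is now DOWN")


def find_left_then_down(lines: Iterable[str]) -> List[str]:
    """Return endpoints that were logged as DOWN after being logged as LEFT.

    Hypothetical helper: expects one log entry per line, in log order.
    """
    left = set()   # endpoints already seen with "state left"
    flagged = []   # endpoints later reported as DOWN, in first-seen order
    for line in lines:
        match = LEFT_RE.search(line)
        if match:
            left.add(match.group(1))
            continue
        match = DOWN_RE.search(line)
        if match:
            endpoint = match.group(1)
            if endpoint in left and endpoint not in flagged:
                flagged.append(endpoint)
    return flagged


if __name__ == "__main__":
    import sys

    # Usage: python check_left_down.py < system.log
    for endpoint in find_left_then_down(sys.stdin):
        print(endpoint)
```

Run against the log of a node that stayed in the cluster, an empty result 
would suggest the decommissioned endpoint was not marked DOWN after leaving. 
Any printed endpoint matches the behaviour shown in the 2.1.16 excerpt.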



thanks

Sai
