[jira] [Comment Edited] (CASSANDRA-10371) Decommissioned nodes can remain in gossip

2015-12-22 Thread Didier (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068186#comment-15068186
 ] 

Didier edited comment on CASSANDRA-10371 at 12/22/15 2:43 PM:
--

Hi Stefania,

You are perfectly right! I had just fixed my issue when you wrote your answer. My 
problem was that a lot of nodes were impacted in this mess, not just one 
(multi-DC, Europe / US).

I have set up these entries in the log4j-server.properties on one node:

{code}
log4j.logger.org.apache.cassandra.gms.GossipDigestSynVerbHandler=TRACE
log4j.logger.org.apache.cassandra.gms.FailureDetector=TRACE
{code}
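
For anyone repeating this on several nodes, here is a minimal sketch of pushing the same 
two loggers everywhere; SSH access and the Debian-package config path are assumptions 
about my environment, adjust as needed:

{code}
# Hypothetical helper: append the two TRACE loggers on each host.
# The host list and the config path are examples.
for host in node1 node2 node3; do
  ssh "$host" 'cat >> /etc/cassandra/log4j-server.properties <<EOF
log4j.logger.org.apache.cassandra.gms.GossipDigestSynVerbHandler=TRACE
log4j.logger.org.apache.cassandra.gms.FailureDetector=TRACE
EOF'
done
{code}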

With this trick I found the culprit nodes with a simple tail of the system.log. 
I just run:

{code}
tail -f system.log | grep "TRACE" | grep -A 10 -B 10 "192.168.136.28"
{code}

and got matches like:

{code}
TRACE [GossipStage:1] 2015-12-22 14:25:10,262 GossipDigestSynVerbHandler.java 
(line 40) Received a GossipDigestSynMessage from /10.0.2.110
TRACE [GossipStage:1] 2015-12-22 14:25:10,262 GossipDigestSynVerbHandler.java 
(line 71) Gossip syn digests are : /10.10.102.97:1448271725:7650177 
/10.10.2.23:1450793863:1377 /10.0.102.190:1448275278:7636527 
/10.0.2.36:1450792729:4816 /192.168.136.28:1449485228:258388
{code}
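
Since several phantom nodes were involved, a slightly more general scan can check all of 
them in one pass; a sketch (the phantom IP list here is just an example):

{code}
# Check the gossip TRACE output against every phantom IP at once.
PHANTOMS="192.168.136.28 192.168.128.28"
for ip in $PHANTOMS; do
  echo "=== $ip ==="
  grep TRACE system.log | grep -B 10 -A 10 "$ip"
done
{code}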

Every time I found a match with a phantom node IP in the gossip syn digests, I 
ran this on the affected node (in this example 10.0.2.110):

{code}
nodetool drain && /etc/init.d/cassandra restart
{code}
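
Across all the affected nodes this amounts to a rolling drain-and-restart; a minimal 
sketch, assuming SSH access, the init-script layout above, and an example host list:

{code}
# Drain and restart each node that still gossips the phantom entry,
# one node at a time so the ring stays healthy in between.
for host in 10.0.2.110 10.0.102.190 10.10.102.97; do
  ssh "$host" 'nodetool drain && /etc/init.d/cassandra restart'
  sleep 120   # crude settle time; polling "nodetool status" is safer
done
{code}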

After doing this on some nodes (15 nodes), I checked whether I still got entries in my 
system.log with the phantom nodes ... and voila! 
No more phantom nodes.

Thanks for your help ;)

Didier


> Decommissioned nodes can remain in gossip
> -
>
> Key: CASSANDRA-10371
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10371
> Project: Cassandra
>  Issue Type: Bug
>  Components: Distributed Metadata
>Reporter: Brandon Williams
>Assignee: Stefania
>Priority: Minor
>
> This may apply to other dead states as well.  Dead states should be expired 
> after 3 days.  In the case of decom we attach a timestamp to let the other 
> nodes know when it should be expired.  It has been observed that sometimes a 
> subset of nodes in the cluster never expire the state, and through heap 
> analysis of these nodes it is revealed that the epstate.isAlive check returns 
> true when it should return false, which would allow the state to be evicted.  
> This may have been affected by CASSANDRA-8336.





[jira] [Comment Edited] (CASSANDRA-10371) Decommissioned nodes can remain in gossip

2015-12-21 Thread Didier (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066681#comment-15066681
 ] 

Didier edited comment on CASSANDRA-10371 at 12/21/15 4:43 PM:
--

Hi Stefania,

Thanks for your quick answer.

I attach the TRACE log for phantom node 192.168.128.28:

{code}
3614313:TRACE [GossipStage:2] 2015-12-21 17:21:19,984 Gossiper.java (line 1155) 
requestAll for /192.168.128.28
3616877:TRACE [GossipStage:2] 2015-12-21 17:21:20,123 FailureDetector.java 
(line 205) reporting /192.168.128.28
3616881:TRACE [GossipStage:2] 2015-12-21 17:21:20,124 Gossiper.java (line 986) 
Adding endpoint state for /192.168.128.28
3616892:DEBUG [GossipStage:2] 2015-12-21 17:21:20,124 Gossiper.java (line 999) 
Not marking /192.168.128.28 alive due to dead state
3616897:TRACE [GossipStage:2] 2015-12-21 17:21:20,125 Gossiper.java (line 958) 
marking as down /192.168.128.28
3616908: INFO [GossipStage:2] 2015-12-21 17:21:20,125 Gossiper.java (line 962) 
InetAddress /192.168.128.28 is now DOWN
3616912:DEBUG [GossipStage:2] 2015-12-21 17:21:20,126 MessagingService.java 
(line 397) Resetting pool for /192.168.128.28
3616937:DEBUG [GossipStage:2] 2015-12-21 17:21:20,128 StorageService.java (line 
1370) Ignoring state change for dead or unknown endpoint: /192.168.128.28
3616955:DEBUG [GossipStage:2] 2015-12-21 17:21:20,128 StorageService.java (line 
1370) Ignoring state change for dead or unknown endpoint: /192.168.128.28
3616956:DEBUG [GossipStage:2] 2015-12-21 17:21:20,129 StorageService.java (line 
1370) Ignoring state change for dead or unknown endpoint: /192.168.128.28
3616958:DEBUG [GossipStage:2] 2015-12-21 17:21:20,129 StorageService.java (line 
1370) Ignoring state change for dead or unknown endpoint: /192.168.128.28
3616976:DEBUG [GossipStage:2] 2015-12-21 17:21:20,129 StorageService.java (line 
1370) Ignoring state change for dead or unknown endpoint: /192.168.128.28
3616977:DEBUG [GossipStage:2] 2015-12-21 17:21:20,130 StorageService.java (line 
1370) Ignoring state change for dead or unknown endpoint: /192.168.128.28
3616979:DEBUG [GossipStage:2] 2015-12-21 17:21:20,130 StorageService.java (line 
1370) Ignoring state change for dead or unknown endpoint: /192.168.128.28
3616992:DEBUG [GossipStage:2] 2015-12-21 17:21:20,130 StorageService.java (line 
1370) Ignoring state change for dead or unknown endpoint: /192.168.128.28
3616993:DEBUG [GossipStage:2] 2015-12-21 17:21:20,131 StorageService.java (line 
1370) Ignoring state change for dead or unknown endpoint: /192.168.128.28
3616995:DEBUG [GossipStage:2] 2015-12-21 17:21:20,131 StorageService.java (line 
1370) Ignoring state change for dead or unknown endpoint: /192.168.128.28
3617008:DEBUG [GossipStage:2] 2015-12-21 17:21:20,131 StorageService.java (line 
1370) Ignoring state change for dead or unknown endpoint: /192.168.128.28
3617317:DEBUG [GossipStage:2] 2015-12-21 17:21:20,143 StorageService.java (line 
1699) Node /192.168.128.28 state left, tokens 
[100310405581336885248896672411729131592, ... , 
99937615223192795414082780446763257757, 99975703478103230193804512094895677044]
3617321:DEBUG [GossipStage:2] 2015-12-21 17:21:20,144 Gossiper.java (line 1463) 
adding expire time for endpoint : /192.168.128.28 (1449830784335)
3617337: INFO [GossipStage:2] 2015-12-21 17:21:20,145 StorageService.java (line 
1781) Removing tokens [100310405581336885248896672411729131592, 
100598580285540169800869916837708042668, ..., 
99743016911284542884064313061048682083, 99937615223192795414082780446763257757, 
99975703478103230193804512094895677044] for /192.168.128.28
3617362:DEBUG [GossipStage:2] 2015-12-21 17:21:20,146 MessagingService.java 
(line 795) Resetting version for /192.168.128.28
3617367:DEBUG [GossipStage:2] 2015-12-21 17:21:20,147 Gossiper.java (line 410) 
removing endpoint /192.168.128.28
3631829:TRACE [GossipTasks:1] 2015-12-21 17:21:20,964 Gossiper.java (line 492) 
Gossip Digests are : /10.10.102.96:1448271659:7409547 
/10.0.102.190:1448275278:7395730 /10.10.102.94:1448271818:7409091 
/192.168.128.23:1450707984:20939 /10.10.102.8:1448271443:7409972 
/10.0.2.97:1448276012:7395072 /10.0.102.93:1448274183:7401036 
/192.168.136.26:1450708061:20700 /192.168.136.23:1450708062:20695 
/10.10.2.239:1448533274:6614346 /10.0.102.206:1448273613:7402527 
/10.0.102.92:1448274024:7401356 /10.0.2.143:1448275597:7396779 
/10.10.2.11:1448270678:7412474 /10.10.2.145:1448271264:7410576 
/192.168.128.32:1449151772:4740947 /10.0.2.5:1449149504:4746745 
/192.168.128.26:1450707983:20947 /192.168.136.22:1450708061:20700 
/10.0.102.94:1448274372:7400487 /10.0.2.109:1448276688:7393112 
/10.10.2.18:1448271203:7410982 /10.10.102.49:1448271974:7408616 
/10.10.102.192:1448271561:7409839 /192.168.128.31:1449151700:4741174 
/10.0.102.90:1448273911:7401771 /192.168.128.21:1450714541:1013 
/10.0.102.138:1448273504:7402737 /10.0.2.107:1448276554:7393892 
/10.0.2.105:1448276464:7393834 /10.10.2.10:1448270541:7412796 /10.10.
{code}
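
Note the trace line "adding expire time for endpoint : /192.168.128.28 (1449830784335)": 
that expire time is in milliseconds since the epoch and already lies about ten days in 
the past relative to this log, so the state should long since have been evicted. A quick 
way to decode it (GNU date):

{code}
date -u -d @1449830784   # Fri Dec 11 10:46:24 UTC 2015
{code}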


[jira] [Comment Edited] (CASSANDRA-10371) Decommissioned nodes can remain in gossip

2015-12-18 Thread Didier (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063764#comment-15063764
 ] 

Didier edited comment on CASSANDRA-10371 at 12/18/15 9:46 AM:
--

Is a fix planned for the 2.0.x branch for this issue?

I have this problem in production with C* 2.0.16; is it fixed in C* 2.0.17?

Every n minutes we get a gossip flood like this:

{code}
 INFO [GossipStage:2] 2015-12-18 10:29:05,082 Gossiper.java (line 962) 
InetAddress /192.168.128.27 is now DOWN
 INFO [GossipStage:2] 2015-12-18 10:29:05,083 StorageService.java (line 1781) 
Removing tokens [100029758220565479311893935069170672938, ..., 
99324782484008101117663863086419168046] for /192.168.128.27
 INFO [GossipStage:2] 2015-12-18 10:40:44,253 Gossiper.java (line 962) 
InetAddress /192.168.128.27 is now DOWN
 INFO [GossipStage:2] 2015-12-18 10:40:44,254 StorageService.java (line 1781) 
Removing tokens [100029758220565479311893935069170672938, ..., 
99324782484008101117663863086419168046] for /192.168.128.27
{code}

The impacted nodes aren't in system.peers or in nodetool ring/status, and they 
were decommissioned properly from the DC.
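
Concretely, the checks were of this shape, run from a live node (a sketch; neither 
command should print the decommissioned IP):

{code}
nodetool status | grep 192.168.128.27
echo "SELECT peer FROM system.peers;" | cqlsh | grep 192.168.128.27
{code}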

Do you plan to publish a new 2.0.18 release with a fix, or do you recommend 
upgrading to C* 2.1 or later?

We also tried to assassinate the impacted nodes via JMX, but without any success.
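
For reference, the JMX operation in question is unsafeAssassinateEndpoint on the 
Gossiper MBean (org.apache.cassandra.net:type=Gossiper); a sketch of one way to invoke 
it, using jmxterm as an example client (the jar name and the default 7199 JMX port are 
assumptions):

{code}
java -jar jmxterm-1.0-alpha-4-uber.jar -l localhost:7199 -n <<'EOF'
bean org.apache.cassandra.net:type=Gossiper
run unsafeAssassinateEndpoint 192.168.128.27
EOF
{code}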

Best regards,

Didier



