[jira] [Commented] (CASSANDRA-15335) Node can corrupt gossip state and become unreplaceable
[ https://issues.apache.org/jira/browse/CASSANDRA-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944263#comment-16944263 ]

Marcus Eriksson commented on CASSANDRA-15335:
---------------------------------------------

+1

> Node can corrupt gossip state and become unreplaceable
> ------------------------------------------------------
>
>                 Key: CASSANDRA-15335
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15335
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Gossip
>            Reporter: Blake Eggleston
>            Assignee: Blake Eggleston
>            Priority: Normal
>             Fix For: 3.0.19, 3.11.5, 4.0
>
>
> In {{StorageService#prepareToJoin}}, a starting node first sends out an
> endpoint state without any tokens. Later, in
> {{StorageService#finishJoiningRing}}, it sends out an endpoint state _with_
> its tokens. If that node dies between these two events and cannot be
> restarted due to some unrecoverable error, the ring's gossip state will be
> missing tokens for that node. This won't cause any immediate data loss, since
> TMD is populated from system.peers, but it will prevent a replacement node
> from associating that address with its tokens and replacing it. It could also
> cause data loss if other nodes are added to the ring and don't see an owned
> token where there should be one.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
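
The failure window described in the issue can be illustrated with a toy model. This is only a sketch and not Cassandra code: the class GossipTokenSketch, its methods, the address, and the token values are all hypothetical, and gossip is reduced to a single map in which a newer announcement for an address simply overwrites the older one (an assumption made for illustration). The advertiseTokensEarly flag models the idea raised elsewhere in this thread of including already-known tokens in the first endpoint state; it is not taken from the actual patch.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Toy model only: none of these types or methods exist in Cassandra.
final class GossipTokenSketch {
    // Cluster-wide gossip view: address -> tokens most recently advertised.
    // Assumption: a newer announcement replaces the previous one wholesale.
    private final Map<String, List<Long>> gossip = new HashMap<>();

    // Models the first announcement a starting node makes. Without the flag,
    // the state goes out with no tokens, as described in the issue.
    void prepareToJoin(String address, List<Long> savedTokens, boolean advertiseTokensEarly) {
        gossip.put(address, advertiseTokensEarly ? savedTokens : List.of());
    }

    // Models the later announcement that does carry the node's tokens.
    void finishJoiningRing(String address, List<Long> tokens) {
        gossip.put(address, tokens);
    }

    // A replacement node needs tokens for the dead address from gossip.
    Optional<List<Long>> tokensForReplacement(String deadAddress) {
        List<Long> tokens = gossip.get(deadAddress);
        if (tokens == null || tokens.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(tokens);
    }

    // Replays the scenario: an existing member restarts, announces itself,
    // then dies before the second announcement is ever sent.
    static boolean replaceableAfterCrash(boolean advertiseTokensEarly) {
        GossipTokenSketch ring = new GossipTokenSketch();
        List<Long> tokens = List.of(-42L, 17L);   // hypothetical token values
        String address = "10.0.0.1";              // hypothetical address

        ring.finishJoiningRing(address, tokens);  // earlier, healthy run
        ring.prepareToJoin(address, tokens, advertiseTokensEarly); // restart
        // Node dies here; finishJoiningRing is never reached again.

        return ring.tokensForReplacement(address).isPresent();
    }

    public static void main(String[] args) {
        System.out.println("tokens omitted:  replaceable = " + replaceableAfterCrash(false));
        System.out.println("tokens included: replaceable = " + replaceableAfterCrash(true));
    }
}
```

In this model the node is replaceable only when the first announcement already carries its tokens. The real gossip protocol has generations, versions, and per-node views that the sketch leaves out.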
[jira] [Commented] (CASSANDRA-15335) Node can corrupt gossip state and become unreplaceable
[ https://issues.apache.org/jira/browse/CASSANDRA-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937077#comment-16937077 ]

Brandon Williams commented on CASSANDRA-15335:
----------------------------------------------

I _think_ that's just an artifact of how the code is organized. I can't think of a reason not to advertise them if we know they are set.
[jira] [Commented] (CASSANDRA-15335) Node can corrupt gossip state and become unreplaceable
[ https://issues.apache.org/jira/browse/CASSANDRA-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937067#comment-16937067 ]

Blake Eggleston commented on CASSANDRA-15335:
---------------------------------------------

[~brandon.williams] do you know if there's a problem we're trying to avoid by omitting tokens in the first endpoint state?