[jira] [Commented] (CASSANDRA-16146) Node state incorrectly set to NORMAL after nodetool disablegossip and enablegossip during bootstrap

2020-11-06 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227544#comment-17227544
 ] 

Brandon Williams commented on CASSANDRA-16146:
--

Committed, thanks!

> Node state incorrectly set to NORMAL after nodetool disablegossip and 
> enablegossip during bootstrap
> ---
>
> Key: CASSANDRA-16146
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16146
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta3
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> At high level, {{StorageService#setGossipTokens}} set the gossip state to 
> {{NORMAL}} blindly. Therefore, re-enabling gossip (stop and start gossip) 
> overrides the actual gossip state.
>   
> It could happen in the below scenario.
> # Bootstrap failed. The gossip state remains in {{BOOT}} / {{JOINING}} and 
> code execution exits StorageService#initServer.
> # Operator runs nodetool to stop and re-start gossip. The gossip state gets 
> flipped to {{NORMAL}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16146) Node state incorrectly set to NORMAL after nodetool disablegossip and enablegossip during bootstrap

2020-11-06 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227536#comment-17227536
 ] 

Yifan Cai commented on CASSANDRA-16146:
---

Sure thing. Comment was just added for the unsafe method in each branch. 

> Node state incorrectly set to NORMAL after nodetool disablegossip and 
> enablegossip during bootstrap
> ---
>
> Key: CASSANDRA-16146
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16146
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta3
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> At high level, {{StorageService#setGossipTokens}} set the gossip state to 
> {{NORMAL}} blindly. Therefore, re-enabling gossip (stop and start gossip) 
> overrides the actual gossip state.
>   
> It could happen in the below scenario.
> # Bootstrap failed. The gossip state remains in {{BOOT}} / {{JOINING}} and 
> code execution exits StorageService#initServer.
> # Operator runs nodetool to stop and re-start gossip. The gossip state gets 
> flipped to {{NORMAL}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16146) Node state incorrectly set to NORMAL after nodetool disablegossip and enablegossip during bootstrap

2020-11-06 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227448#comment-17227448
 ] 

Brandon Williams commented on CASSANDRA-16146:
--

LGTM, supernit: I think it's a good idea to have some comment around unsafe 
methods, but that's easy enough to add on commit.

> Node state incorrectly set to NORMAL after nodetool disablegossip and 
> enablegossip during bootstrap
> ---
>
> Key: CASSANDRA-16146
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16146
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta3
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> At high level, {{StorageService#setGossipTokens}} set the gossip state to 
> {{NORMAL}} blindly. Therefore, re-enabling gossip (stop and start gossip) 
> overrides the actual gossip state.
>   
> It could happen in the below scenario.
> # Bootstrap failed. The gossip state remains in {{BOOT}} / {{JOINING}} and 
> code execution exits StorageService#initServer.
> # Operator runs nodetool to stop and re-start gossip. The gossip state gets 
> flipped to {{NORMAL}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16146) Node state incorrectly set to NORMAL after nodetool disablegossip and enablegossip during bootstrap

2020-11-05 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227175#comment-17227175
 ] 

Yifan Cai commented on CASSANDRA-16146:
---

Hi [~brandon.williams] and [~bdeggleston],

Would you please take another look at the fixup to each branch? Links are 
posted above. Although the CI result looks good when committing, several newly 
added tests were actually broken since we did not rebase and rerun the CI. 

In the 3.x branches, the fixup sets the operationMode to normal directly in 
StorageService when using mock gossip.

In the trunk, the fixup corrects the bytebuddy in the failed test class in 
addition to the operationMode fix in test. 

> Node state incorrectly set to NORMAL after nodetool disablegossip and 
> enablegossip during bootstrap
> ---
>
> Key: CASSANDRA-16146
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16146
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta3
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> At high level, {{StorageService#setGossipTokens}} set the gossip state to 
> {{NORMAL}} blindly. Therefore, re-enabling gossip (stop and start gossip) 
> overrides the actual gossip state.
>   
> It could happen in the below scenario.
> # Bootstrap failed. The gossip state remains in {{BOOT}} / {{JOINING}} and 
> code execution exits StorageService#initServer.
> # Operator runs nodetool to stop and re-start gossip. The gossip state gets 
> flipped to {{NORMAL}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16146) Node state incorrectly set to NORMAL after nodetool disablegossip and enablegossip during bootstrap

2020-11-04 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226364#comment-17226364
 ] 

David Capwell commented on CASSANDRA-16146:
---

3.x line LGTM

> Node state incorrectly set to NORMAL after nodetool disablegossip and 
> enablegossip during bootstrap
> ---
>
> Key: CASSANDRA-16146
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16146
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta3
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> At high level, {{StorageService#setGossipTokens}} set the gossip state to 
> {{NORMAL}} blindly. Therefore, re-enabling gossip (stop and start gossip) 
> overrides the actual gossip state.
>   
> It could happen in the below scenario.
> # Bootstrap failed. The gossip state remains in {{BOOT}} / {{JOINING}} and 
> code execution exits StorageService#initServer.
> # Operator runs nodetool to stop and re-start gossip. The gossip state gets 
> flipped to {{NORMAL}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16146) Node state incorrectly set to NORMAL after nodetool disablegossip and enablegossip during bootstrap

2020-11-04 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226352#comment-17226352
 ] 

Yifan Cai commented on CASSANDRA-16146:
---

Updated the 3.0 and 3.11 fixup branchs to correctly finish ring initialization 
in the mock gossip mode, instead of enabling GOSSOP in the test. 

> Node state incorrectly set to NORMAL after nodetool disablegossip and 
> enablegossip during bootstrap
> ---
>
> Key: CASSANDRA-16146
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16146
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta3
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> At high level, {{StorageService#setGossipTokens}} set the gossip state to 
> {{NORMAL}} blindly. Therefore, re-enabling gossip (stop and start gossip) 
> overrides the actual gossip state.
>   
> It could happen in the below scenario.
> # Bootstrap failed. The gossip state remains in {{BOOT}} / {{JOINING}} and 
> code execution exits StorageService#initServer.
> # Operator runs nodetool to stop and re-start gossip. The gossip state gets 
> flipped to {{NORMAL}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16146) Node state incorrectly set to NORMAL after nodetool disablegossip and enablegossip during bootstrap

2020-11-04 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226338#comment-17226338
 ] 

David Capwell commented on CASSANDRA-16146:
---

for the 3.x you added Feature.GOSSIP which wasn't needed in the test, this 
tells me the non-gossip state is not fixed; can we fix jvm dtest?

> Node state incorrectly set to NORMAL after nodetool disablegossip and 
> enablegossip during bootstrap
> ---
>
> Key: CASSANDRA-16146
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16146
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta3
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> At high level, {{StorageService#setGossipTokens}} set the gossip state to 
> {{NORMAL}} blindly. Therefore, re-enabling gossip (stop and start gossip) 
> overrides the actual gossip state.
>   
> It could happen in the below scenario.
> # Bootstrap failed. The gossip state remains in {{BOOT}} / {{JOINING}} and 
> code execution exits StorageService#initServer.
> # Operator runs nodetool to stop and re-start gossip. The gossip state gets 
> flipped to {{NORMAL}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16146) Node state incorrectly set to NORMAL after nodetool disablegossip and enablegossip during bootstrap

2020-11-03 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225750#comment-17225750
 ] 

Yifan Cai commented on CASSANDRA-16146:
---

Fixups and CI
3.0: https://github.com/yifan-c/cassandra/tree/fixup/CASSANDRA-16146-3.0
https://app.circleci.com/pipelines/github/yifan-c/cassandra/143/workflows/1a55cee6-36ef-4e8a-a8d4-6798549ca202

3.11: https://github.com/yifan-c/cassandra/tree/fixup/CASSANDRA-16146-3.11
https://app.circleci.com/pipelines/github/yifan-c/cassandra/144/workflows/52cfbd53-92cb-4644-ae0d-694a9bef398c

trunk: https://github.com/yifan-c/cassandra/tree/fixup/CASSANDRA-16146-trunk
https://app.circleci.com/pipelines/github/yifan-c/cassandra/142/workflows/480e513f-2cf5-47b7-a608-d37cb91e1393

> Node state incorrectly set to NORMAL after nodetool disablegossip and 
> enablegossip during bootstrap
> ---
>
> Key: CASSANDRA-16146
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16146
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta3
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> At high level, {{StorageService#setGossipTokens}} set the gossip state to 
> {{NORMAL}} blindly. Therefore, re-enabling gossip (stop and start gossip) 
> overrides the actual gossip state.
>   
> It could happen in the below scenario.
> # Bootstrap failed. The gossip state remains in {{BOOT}} / {{JOINING}} and 
> code execution exits StorageService#initServer.
> # Operator runs nodetool to stop and re-start gossip. The gossip state gets 
> flipped to {{NORMAL}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16146) Node state incorrectly set to NORMAL after nodetool disablegossip and enablegossip during bootstrap

2020-11-02 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225128#comment-17225128
 ] 

Yifan Cai commented on CASSANDRA-16146:
---

Thanks David!

It looks like the backports in CASSANDRA-16127 is merged after I posted the [CI 
link|https://issues.apache.org/jira/browse/CASSANDRA-16146?focusedCommentId=17203554=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17203554].
 Therefore, CI did not show any jvm dtest failures. 

For the failures in ClientNetworkStopStartTest, the cause is that the cluster 
without GOSSIP enabled does not set the instance's operation mode correctly. 
(So it stuck at Starting and fail).


> Node state incorrectly set to NORMAL after nodetool disablegossip and 
> enablegossip during bootstrap
> ---
>
> Key: CASSANDRA-16146
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16146
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta3
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> At high level, {{StorageService#setGossipTokens}} set the gossip state to 
> {{NORMAL}} blindly. Therefore, re-enabling gossip (stop and start gossip) 
> overrides the actual gossip state.
>   
> It could happen in the below scenario.
> # Bootstrap failed. The gossip state remains in {{BOOT}} / {{JOINING}} and 
> code execution exits StorageService#initServer.
> # Operator runs nodetool to stop and re-start gossip. The gossip state gets 
> flipped to {{NORMAL}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16146) Node state incorrectly set to NORMAL after nodetool disablegossip and enablegossip during bootstrap

2020-11-02 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225109#comment-17225109
 ] 

David Capwell commented on CASSANDRA-16146:
---

Also org.apache.cassandra.distributed.test.GossipTest on trunk

{code}
[junit-timeout] Testcase: 
org.apache.cassandra.distributed.test.GossipTest:testPreventStoppingGossipDuringBootstrap:
Caused an ERROR
[junit-timeout] Timeout occurred. Please note the time in the report does not 
reflect the time until the timeout.
[junit-timeout] junit.framework.AssertionFailedError: Timeout occurred. Please 
note the time in the report does not reflect the time until the timeout.
[junit-timeout] at java.util.Vector.forEach(Vector.java:1275)
[junit-timeout] at java.util.Vector.forEach(Vector.java:1275)
[junit-timeout] at java.lang.Thread.run(Thread.java:748)
[junit-timeout]
[junit-timeout]
[junit-timeout] Test org.apache.cassandra.distributed.test.GossipTest FAILED 
(timeout)
{code}

https://app.circleci.com/pipelines/github/dcapwell/cassandra/754/workflows/d47aee80-0f62-4010-aca6-39adad2d986d/jobs/4285

> Node state incorrectly set to NORMAL after nodetool disablegossip and 
> enablegossip during bootstrap
> ---
>
> Key: CASSANDRA-16146
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16146
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta3
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> At high level, {{StorageService#setGossipTokens}} set the gossip state to 
> {{NORMAL}} blindly. Therefore, re-enabling gossip (stop and start gossip) 
> overrides the actual gossip state.
>   
> It could happen in the below scenario.
> # Bootstrap failed. The gossip state remains in {{BOOT}} / {{JOINING}} and 
> code execution exits StorageService#initServer.
> # Operator runs nodetool to stop and re-start gossip. The gossip state gets 
> flipped to {{NORMAL}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16146) Node state incorrectly set to NORMAL after nodetool disablegossip and enablegossip during bootstrap

2020-11-02 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225093#comment-17225093
 ] 

David Capwell commented on CASSANDRA-16146:
---

Turns out this broke jvm-dtest 
org.apache.cassandra.distributed.test.ClientNetworkStopStartTest

{code}
[junit-timeout] Testcase: 
stopStartNative(org.apache.cassandra.distributed.test.ClientNetworkStopStartTest):
FAILED
[junit-timeout] nodetool command [enablebinary] was not successful
[junit-timeout] Notifications:
[junit-timeout] Error:
[junit-timeout] java.lang.IllegalStateException: Unable to start native 
transport because the node is not in the normal state.
[junit-timeout] at 
org.apache.cassandra.service.StorageService.checkServiceAllowedToStart(StorageService.java:4389)
[junit-timeout] at 
org.apache.cassandra.service.StorageService.startNativeTransport(StorageService.java:427)
[junit-timeout] at 
org.apache.cassandra.tools.NodeProbe.startNativeTransport(NodeProbe.java:963)
[junit-timeout] at 
org.apache.cassandra.tools.nodetool.EnableBinary.execute(EnableBinary.java:31)
[junit-timeout] at 
org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:287)
[junit-timeout] at 
org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:272)
[junit-timeout] at 
org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:178)
[junit-timeout] at 
org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:843)
[junit-timeout] at 
org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$30(Instance.java:778)
[junit-timeout] at 
java.util.concurrent.FutureTask.run(FutureTask.java:266)
[junit-timeout] at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[junit-timeout] at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[junit-timeout] at 
org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:83)
[junit-timeout] at java.lang.Thread.run(Thread.java:748)
{code}

found in 
https://app.circleci.com/pipelines/github/dcapwell/cassandra/752/workflows/cf4c3766-de15-4903-88f9-20a3dbcfd331/jobs/4268

> Node state incorrectly set to NORMAL after nodetool disablegossip and 
> enablegossip during bootstrap
> ---
>
> Key: CASSANDRA-16146
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16146
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta3
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> At high level, {{StorageService#setGossipTokens}} set the gossip state to 
> {{NORMAL}} blindly. Therefore, re-enabling gossip (stop and start gossip) 
> overrides the actual gossip state.
>   
> It could happen in the below scenario.
> # Bootstrap failed. The gossip state remains in {{BOOT}} / {{JOINING}} and 
> code execution exits StorageService#initServer.
> # Operator runs nodetool to stop and re-start gossip. The gossip state gets 
> flipped to {{NORMAL}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16146) Node state incorrectly set to NORMAL after nodetool disablegossip and enablegossip during bootstrap

2020-10-09 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211232#comment-17211232
 ] 

Brandon Williams commented on CASSANDRA-16146:
--

+1

> Node state incorrectly set to NORMAL after nodetool disablegossip and 
> enablegossip during bootstrap
> ---
>
> Key: CASSANDRA-16146
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16146
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta3
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> At high level, {{StorageService#setGossipTokens}} set the gossip state to 
> {{NORMAL}} blindly. Therefore, re-enabling gossip (stop and start gossip) 
> overrides the actual gossip state.
>   
> It could happen in the below scenario.
> # Bootstrap failed. The gossip state remains in {{BOOT}} / {{JOINING}} and 
> code execution exits StorageService#initServer.
> # Operator runs nodetool to stop and re-start gossip. The gossip state gets 
> flipped to {{NORMAL}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16146) Node state incorrectly set to NORMAL after nodetool disablegossip and enablegossip during bootstrap

2020-10-08 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210504#comment-17210504
 ] 

Yifan Cai commented on CASSANDRA-16146:
---

3.11: [https://github.com/yifan-c/cassandra/tree/f/CASSANDRA-16146-3.11]

trunk: [https://github.com/yifan-c/cassandra/tree/f/CASSANDRA-16146-trunk]

> Node state incorrectly set to NORMAL after nodetool disablegossip and 
> enablegossip during bootstrap
> ---
>
> Key: CASSANDRA-16146
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16146
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta3
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> At high level, {{StorageService#setGossipTokens}} set the gossip state to 
> {{NORMAL}} blindly. Therefore, re-enabling gossip (stop and start gossip) 
> overrides the actual gossip state.
>   
> It could happen in the below scenario.
> # Bootstrap failed. The gossip state remains in {{BOOT}} / {{JOINING}} and 
> code execution exits StorageService#initServer.
> # Operator runs nodetool to stop and re-start gossip. The gossip state gets 
> flipped to {{NORMAL}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16146) Node state incorrectly set to NORMAL after nodetool disablegossip and enablegossip during bootstrap

2020-10-02 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17206535#comment-17206535
 ] 

Yifan Cai commented on CASSANDRA-16146:
---

yep. 3.11 and trunk.

> Node state incorrectly set to NORMAL after nodetool disablegossip and 
> enablegossip during bootstrap
> ---
>
> Key: CASSANDRA-16146
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16146
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta3
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> At high level, {{StorageService#setGossipTokens}} set the gossip state to 
> {{NORMAL}} blindly. Therefore, re-enabling gossip (stop and start gossip) 
> overrides the actual gossip state.
>   
> It could happen in the below scenario.
> # Bootstrap failed. The gossip state remains in {{BOOT}} / {{JOINING}} and 
> code execution exits StorageService#initServer.
> # Operator runs nodetool to stop and re-start gossip. The gossip state gets 
> flipped to {{NORMAL}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16146) Node state incorrectly set to NORMAL after nodetool disablegossip and enablegossip during bootstrap

2020-10-02 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17206534#comment-17206534
 ] 

Brandon Williams commented on CASSANDRA-16146:
--

I am +1 on 3.0, are you still planning to do patches for other versions?

> Node state incorrectly set to NORMAL after nodetool disablegossip and 
> enablegossip during bootstrap
> ---
>
> Key: CASSANDRA-16146
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16146
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta3
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> At high level, {{StorageService#setGossipTokens}} set the gossip state to 
> {{NORMAL}} blindly. Therefore, re-enabling gossip (stop and start gossip) 
> overrides the actual gossip state.
>   
> It could happen in the below scenario.
> # Bootstrap failed. The gossip state remains in {{BOOT}} / {{JOINING}} and 
> code execution exits StorageService#initServer.
> # Operator runs nodetool to stop and re-start gossip. The gossip state gets 
> flipped to {{NORMAL}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16146) Node state incorrectly set to NORMAL after nodetool disablegossip and enablegossip during bootstrap

2020-09-29 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17204050#comment-17204050
 ] 

Brandon Williams commented on CASSANDRA-16146:
--

Indeed, that looks related to CASSANDRA-16127.

> Node state incorrectly set to NORMAL after nodetool disablegossip and 
> enablegossip during bootstrap
> ---
>
> Key: CASSANDRA-16146
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16146
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta3
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> At high level, {{StorageService#setGossipTokens}} set the gossip state to 
> {{NORMAL}} blindly. Therefore, re-enabling gossip (stop and start gossip) 
> overrides the actual gossip state.
>   
> It could happen in the below scenario.
> # Bootstrap failed. The gossip state remains in {{BOOT}} / {{JOINING}} and 
> code execution exits StorageService#initServer.
> # Operator runs nodetool to stop and re-start gossip. The gossip state gets 
> flipped to {{NORMAL}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16146) Node state incorrectly set to NORMAL after nodetool disablegossip and enablegossip during bootstrap

2020-09-28 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17203554#comment-17203554
 ] 

Yifan Cai commented on CASSANDRA-16146:
---

Updated the error message. 

Unit and jvm dtest passed. One test failure ({{test_closing_connections - 
thrift_hsha_test.TestThriftHSHA}}) from dtest. It does not look related and 
exists before this patch. 

 
[https://app.circleci.com/pipelines/github/yifan-c/cassandra/111/workflows/ae86acf0-b416-4a42-92e8-cb845d5393a7]

> Node state incorrectly set to NORMAL after nodetool disablegossip and 
> enablegossip during bootstrap
> ---
>
> Key: CASSANDRA-16146
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16146
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta3
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> At high level, {{StorageService#setGossipTokens}} set the gossip state to 
> {{NORMAL}} blindly. Therefore, re-enabling gossip (stop and start gossip) 
> overrides the actual gossip state.
>   
> It could happen in the below scenario.
> # Bootstrap failed. The gossip state remains in {{BOOT}} / {{JOINING}} and 
> code execution exits StorageService#initServer.
> # Operator runs nodetool to stop and re-start gossip. The gossip state gets 
> flipped to {{NORMAL}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16146) Node state incorrectly set to NORMAL after nodetool disablegossip and enablegossip during bootstrap

2020-09-28 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17203523#comment-17203523
 ] 

Brandon Williams commented on CASSANDRA-16146:
--

+1, with minor bikeshed: it would be nice if the error mentioned shutting the 
node down instead if they really want to do that.

> Node state incorrectly set to NORMAL after nodetool disablegossip and 
> enablegossip during bootstrap
> ---
>
> Key: CASSANDRA-16146
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16146
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta3
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> At high level, {{StorageService#setGossipTokens}} set the gossip state to 
> {{NORMAL}} blindly. Therefore, re-enabling gossip (stop and start gossip) 
> overrides the actual gossip state.
>   
> It could happen in the below scenario.
> # Bootstrap failed. The gossip state remains in {{BOOT}} / {{JOINING}} and 
> code execution exits StorageService#initServer.
> # Operator runs nodetool to stop and re-start gossip. The gossip state gets 
> flipped to {{NORMAL}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16146) Node state incorrectly set to NORMAL after nodetool disablegossip and enablegossip during bootstrap

2020-09-28 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17203515#comment-17203515
 ] 

Yifan Cai commented on CASSANDRA-16146:
---

PR (to 3.0): [https://github.com/apache/cassandra/pull/760]

CI: 
[https://app.circleci.com/pipelines/github/yifan-c/cassandra/108/workflows/d4fc0b93-111e-4cbc-bd2c-c68e1a72fe09]

The patch simply prevents calling stopGossip and startGossip when not in the 
normal mode. 

I will prepare the patch to the other branches (3.11 and trunk) once this one 
looks good. 

cc: [~bdeggleston]

> Node state incorrectly set to NORMAL after nodetool disablegossip and 
> enablegossip during bootstrap
> ---
>
> Key: CASSANDRA-16146
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16146
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta3
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> At high level, {{StorageService#setGossipTokens}} set the gossip state to 
> {{NORMAL}} blindly. Therefore, re-enabling gossip (stop and start gossip) 
> overrides the actual gossip state.
>   
> It could happen in the below scenario.
> # Bootstrap failed. The gossip state remains in {{BOOT}} / {{JOINING}} and 
> code execution exits StorageService#initServer.
> # Operator runs nodetool to stop and re-start gossip. The gossip state gets 
> flipped to {{NORMAL}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16146) Node state incorrectly set to NORMAL after nodetool disablegossip and enablegossip during bootstrap

2020-09-28 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17203429#comment-17203429
 ] 

Yifan Cai commented on CASSANDRA-16146:
---

Thanks [~brandon.williams] for commenting so quickly. 

bq. perhaps just not allow this outside of NORMAL

Adding a pre-check before both starting and stopping gossip to make sure the 
current mode is NORMAL sounds good to me.

> Node state incorrectly set to NORMAL after nodetool disablegossip and 
> enablegossip during bootstrap
> ---
>
> Key: CASSANDRA-16146
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16146
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta3
>
>
> At high level, {{StorageService#setGossipTokens}} set the gossip state to 
> {{NORMAL}} blindly. Therefore, re-enabling gossip (stop and start gossip) 
> overrides the actual gossip state.
>   
> It could happen in the below scenario.
> # Bootstrap failed. The gossip state remains in {{BOOT}} / {{JOINING}} and 
> code execution exits StorageService#initServer.
> # Operator runs nodetool to stop and re-start gossip. The gossip state gets 
> flipped to {{NORMAL}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16146) Node state incorrectly set to NORMAL after nodetool disablegossip and enablegossip during bootstrap

2020-09-28 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17203409#comment-17203409
 ] 

Brandon Williams commented on CASSANDRA-16146:
--

bq. Operator runs nodetool to stop and re-start gossip. The gossip state gets 
flipped to NORMAL

We should perhaps just not allow this outside of NORMAL, since in that case you 
probably want to just stop the the node instead.

> Node state incorrectly set to NORMAL after nodetool disablegossip and 
> enablegossip during bootstrap
> ---
>
> Key: CASSANDRA-16146
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16146
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>
> At high level, {{StorageService#setGossipTokens}} set the gossip state to 
> {{NORMAL}} blindly. Therefore, re-enabling gossip (stop and start gossip) 
> overrides the actual gossip state.
>   
> It could happen in the below scenario.
> # Bootstrap failed. The gossip state remains in {{BOOT}} / {{JOINING}} and 
> code execution exits StorageService#initServer.
> # Operator runs nodetool to stop and re-start gossip. The gossip state gets 
> flipped to {{NORMAL}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org