[jira] [Commented] (CASSANDRA-18968) StartupClusterConnectivityChecker fails on upgrade from 3.X

2023-11-10 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17784759#comment-17784759
 ] 

Stefan Miklosovic commented on CASSANDRA-18968:
---

I evaluated the failures as flaky; these tests pass locally. We are 
good to merge.

> StartupClusterConnectivityChecker fails on upgrade from 3.X
> ---
>
> Key: CASSANDRA-18968
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18968
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Startup and Shutdown
>Reporter: Paulo Motta
>Assignee: Isaac Reath
>Priority: Normal
>  Labels: lhf
> Fix For: 4.0.x, 4.1.x
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Starting up a new 4.X node on a 3.x cluster logs the following warning:
> {noformat}
> WARN  [main] 2023-10-27 15:58:22,234 
> StartupClusterConnectivityChecker.java:183 - Timed out after 10002 
> milliseconds, was waiting for remaining peers to connect: {dc1=[X.Y.Z.W, 
> A.B.C.D]}
> {noformat}
> I think this is because the PING messages used by the startup check are not 
> available on 3.X.
> To provide a smoother upgrade experience we should probably disable this 
> check on mixed-version clusters, or skip peers on versions < 4.x when doing 
> the connectivity check.
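The second option in the description (skip peers on versions < 4.x) could be expressed as a filter applied before any PING is sent. This is a minimal sketch, assuming a hypothetical map from peer address to its gossiped release version, where `null` means the version is not known yet; `PeerFilterSketch`, `peersToPing` and `majorVersion` are illustrative names, not Cassandra APIs.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Cassandra's StartupClusterConnectivityChecker.
final class PeerFilterSketch {
    // Returns the peers that should receive a startup PING.
    // Peers known to run a major version below 4 are excluded because they never answer.
    // Peers with an unknown version (null) are kept, so the check falls back to today's behavior.
    static Set<String> peersToPing(Map<String, String> peerVersions) {
        Set<String> result = new LinkedHashSet<>();
        for (Map.Entry<String, String> e : peerVersions.entrySet()) {
            String version = e.getValue();
            if (version == null || majorVersion(version) >= 4) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // Parses the leading major component of a version string such as "3.11.16" or "4.1.3".
    static int majorVersion(String version) {
        int dot = version.indexOf('.');
        return Integer.parseInt(dot < 0 ? version : version.substring(0, dot));
    }
}
```

Keeping unknown peers in the set is a deliberate conservative choice: a peer whose version gossip has not reported yet is treated as if it could answer.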



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18968) StartupClusterConnectivityChecker fails on upgrade from 3.X

2023-11-09 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17784400#comment-17784400
 ] 

Brandon Williams commented on CASSANDRA-18968:
--

Yeah, those failures looked possibly environmental to me; let's double-check.


[jira] [Commented] (CASSANDRA-18968) StartupClusterConnectivityChecker fails on upgrade from 3.X

2023-11-08 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17784286#comment-17784286
 ] 

Stefan Miklosovic commented on CASSANDRA-18968:
---

Well, I think we have a problem: 
https://app.circleci.com/pipelines/github/driftx/cassandra/1363/workflows/5fafb76b-683f-43a6-9803-5e67161bac4c/jobs/61070


[jira] [Commented] (CASSANDRA-18968) StartupClusterConnectivityChecker fails on upgrade from 3.X

2023-11-08 Thread Paulo Motta (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17784225#comment-17784225
 ] 

Paulo Motta commented on CASSANDRA-18968:
-

bq. The whole "waiting for gossip to settle" machinery is ... not ideal. Yes, it 
works in most situations, but there are edge cases where it does not. For 
example, in large clusters it may falsely conclude that gossip has "settled" 
because detecting any changes took so long that it assumed nothing was changing.

I'm aware waitToSettle is not reliable. Nevertheless, I think a "best-effort" 
skip of this check when 3.X nodes are detected in gossip is valuable. This will 
mostly work as long as gossip with a single node was successful, since the 
starting node learns the latest known versions of the other nodes from it.

In the case where gossip information is absent and there are 3.X nodes 
present in the cluster, it's not a big deal: the check will just be executed 
and the timeout warning above will be emitted unnecessarily.

We just don't want to skip this check when *all nodes are upgraded to 4.x*, but 
I don't think this would happen if waitToSettle fails.
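The best-effort rule above can be read as a tri-state decision: skip the whole check only when gossip has positively reported a pre-4.0 peer; when every known version is 4.x, or some versions are still unknown, run the check. A minimal sketch, assuming a hypothetical collection of gossiped version strings where `null` means unknown; `ConnectivityCheckGateSketch` and `shouldSkipCheck` are illustrative names, and the merged patch may be structured differently.

```java
import java.util.Collection;

// Hypothetical sketch of a version gate for the startup connectivity check.
final class ConnectivityCheckGateSketch {
    // Returns true when the startup connectivity check should be skipped.
    // Only a version that gossip has positively reported as pre-4.0 triggers the skip;
    // unknown versions (null) never do, so a fully upgraded cluster always runs the check.
    static boolean shouldSkipCheck(Collection<String> gossipedVersions) {
        for (String v : gossipedVersions) {
            if (v != null && major(v) < 4) {
                return true;
            }
        }
        return false;
    }

    // Parses the leading major component of a version string such as "3.11.16".
    private static int major(String version) {
        int dot = version.indexOf('.');
        return Integer.parseInt(dot < 0 ? version : version.substring(0, dot));
    }
}
```

The failure modes are asymmetric on purpose: missing gossip data only costs an unnecessary warning, while a fully upgraded cluster can never have the check silently disabled.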


[jira] [Commented] (CASSANDRA-18968) StartupClusterConnectivityChecker fails on upgrade from 3.X

2023-11-08 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17784224#comment-17784224
 ] 

Brandon Williams commented on CASSANDRA-18968:
--

I think it would make a lot of sense to run the upgrade tests here.


[jira] [Commented] (CASSANDRA-18968) StartupClusterConnectivityChecker fails on upgrade from 3.X

2023-11-08 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17784219#comment-17784219
 ] 

Stefan Miklosovic commented on CASSANDRA-18968:
---

[~paulo] well ... waiting for gossip via the method you showed has its own set 
of problems. My colleague recently hit them (1) and tried to solve them, but 
unfortunately we had to revert that. It is quite a read, though (2): (2) is the 
follow-up ticket which was trying to improve (1), but we never got there because 
we reverted it.

The whole "waiting for gossip to settle" machinery is ... not ideal. Yes, it 
works in most situations, but there are edge cases where it does not. For 
example, in large clusters it may falsely conclude that gossip has "settled" 
because detecting any changes took so long that it assumed nothing was changing.

(1) https://issues.apache.org/jira/browse/CASSANDRA-18543
(2) https://issues.apache.org/jira/browse/CASSANDRA-18845
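The edge case above can be illustrated with a simplified, hypothetical model of a "settle" check: poll an observed endpoint count and declare gossip settled after a fixed number of consecutive polls with no change. It is loosely inspired by the idea behind `Gossiper.waitToSettle` but is not its code; `SettleSketch`, the `IntSupplier` input and the poll thresholds are assumptions for illustration. If state changes arrive more slowly than the stability window, the loop reports "settled" before the view is complete.

```java
import java.util.function.IntSupplier;

// Hypothetical model of a "wait for gossip to settle" loop; not Cassandra's Gossiper code.
final class SettleSketch {
    // Polls the observed endpoint count until it has stayed unchanged for
    // `requiredStablePolls` consecutive polls, or until `maxPolls` polls have been made.
    // Returns the count at the moment the loop decided gossip had "settled".
    // Real code would sleep between polls; that is omitted to keep the sketch deterministic.
    static int countWhenSettled(IntSupplier observedEndpointCount, int requiredStablePolls, int maxPolls) {
        int previous = observedEndpointCount.getAsInt();
        int stablePolls = 0;
        for (int poll = 1; poll < maxPolls && stablePolls < requiredStablePolls; poll++) {
            int current = observedEndpointCount.getAsInt();
            if (current == previous) {
                stablePolls++;
            } else {
                stablePolls = 0; // any change restarts the stability window
                previous = current;
            }
        }
        return previous;
    }
}
```

With a slowly propagating view such as counts 1, 1, 1, 1, 5 and a window of three polls, the sketch stops at 1 even though 5 endpoints eventually appear.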


[jira] [Commented] (CASSANDRA-18968) StartupClusterConnectivityChecker fails on upgrade from 3.X

2023-11-08 Thread Paulo Motta (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17784212#comment-17784212
 ] 

Paulo Motta commented on CASSANDRA-18968:
-

It seems like this issue was raised by [~aleksey] on CASSANDRA-13993:
{quote}As implemented currently, we are going to send PINGs potentially to 
3.11/3.0 - unless we switch to gating by version, which we do sometimes.
{quote}
{quote}So I was thinking about a major upgrade bounce scenario. Think the first 
ever node to upgrade to 4.0 in a cluster of 3.0 nodes - will send out pings to 
every node, but receive no pongs, correct? So every node until a threshold will 
have a significantly longer bounce. Do we care about this case?
{quote}
To which [~jasobrown] replied:
{quote}So here's the rub: we don't necessarily know the peer's version yet. The 
ping messages are sent on the large/small connections, but we're not guaranteed 
that at least one round of gossip has completed wherein we would learn the 
version of the peers (we're still in the startup process).
{quote}
However, I don't think this is a problem, since we [wait for gossip to 
settle|https://github.com/apache/cassandra/blob/7b891db36d4bcfa116ee04e3f4b3f31af798d5b2/src/java/org/apache/cassandra/service/CassandraDaemon.java#L401]
 before executing this check. Can you confirm this, [~brandon.williams]?

If the version of a peer is unknown, the worst that can happen is that this 
check is executed unnecessarily, which just falls back to the current behavior. 
That is not a big deal IMO: it will just make startup slightly slower and log a 
warning.

I confirmed that when upgrading a cluster from 3.11 to 4.1, the following message 
is printed in debug.log on every node except the last one to be upgraded:
{noformat}
DEBUG [main] 2023-11-08 21:08:42,056 StartupClusterConnectivityChecker.java:97 
- Skipping startup connectivity check as some nodes may be running Cassandra 
version 3 or older which does not support connectivity checking.
{noformat}
On the last node to be upgraded, the check is executed as expected:
{noformat}
INFO  [main] 2023-11-08 21:35:30,387 StartupClusterConnectivityChecker.java:128 
- Blocking coordination until only a single peer is DOWN in the local 
datacenter, timeout=10s
INFO  [main] 2023-11-08 21:35:30,453 StartupClusterConnectivityChecker.java:181 
- Ensured sufficient healthy connections with [DC1] after 63 milliseconds
{noformat}
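The INFO lines above describe the check's contract: block until at most one peer in the local datacenter is still unacknowledged, or give up after a timeout. A minimal sketch of that contract using `java.util.concurrent.CountDownLatch`, assuming acknowledgements arrive as callbacks; `AckWaitSketch`, `onPong` and `remaining` are hypothetical names, not Cassandra's implementation. If the peers that never answer (such as 3.x nodes) exceed the allowance, `await` simply returns false after the timeout, which corresponds to the warning in the description.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of "block until only N peers are DOWN, with a timeout".
final class AckWaitSketch {
    private final Set<String> pending = ConcurrentHashMap.newKeySet();
    private final CountDownLatch latch;

    // Waits for all but `allowedDown` of the given local-DC peers to acknowledge.
    AckWaitSketch(Set<String> localPeers, int allowedDown) {
        pending.addAll(localPeers);
        latch = new CountDownLatch(Math.max(0, localPeers.size() - allowedDown));
    }

    // Called when a peer answers the startup ping; duplicate answers are ignored.
    void onPong(String peer) {
        if (pending.remove(peer)) {
            latch.countDown();
        }
    }

    // Returns true if enough peers answered within the timeout, false otherwise.
    boolean await(long timeout, TimeUnit unit) throws InterruptedException {
        return latch.await(timeout, unit);
    }

    // Peers that have not answered yet (what a timeout warning would list).
    Set<String> remaining() {
        return Set.copyOf(pending);
    }
}
```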
{quote}+1, I am waiting for another formal +1 and we can ship this!
{quote}
LGTM. Would you like to commit this, [~smiklosovic]? If so, perhaps update 
CHANGES.txt and the commit message to "Skip connectivity check when upgrading from 
3.X".


[jira] [Commented] (CASSANDRA-18968) StartupClusterConnectivityChecker fails on upgrade from 3.X

2023-11-07 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17783911#comment-17783911
 ] 

Stefan Miklosovic commented on CASSANDRA-18968:
---

[4.0 
j11-precommit|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3457/workflows/10731e97-21b1-4846-9450-800f8dc0bc05]
[4.0 
j8-precommit|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3457/workflows/b37d8e69-cc51-48ea-9d12-75e493eb1351]
[4.1 
j11-precommit|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3458/workflows/122fcd49-3e1e-494b-8949-b75bf2b9f6af]
[4.1 
j8-precommit|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3458/workflows/5539e1f8-ec73-4201-9ef4-6c12f9858aa9]

+1, I am waiting for another formal +1 and we can ship this!

These are the squashed branches I ran CI for:

[4.0|https://github.com/instaclustr/cassandra/commits/CASSANDRA-18968-4.0]
[4.1|https://github.com/instaclustr/cassandra/commits/CASSANDRA-18968-4.1]


[jira] [Commented] (CASSANDRA-18968) StartupClusterConnectivityChecker fails on upgrade from 3.X

2023-11-07 Thread Isaac Reath (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17783834#comment-17783834
 ] 

Isaac Reath commented on CASSANDRA-18968:
-

Yeah, just 4.0 and 4.1. AFAIK 5.0 isn't supposed to support upgrading from 3.x, 
so I think we're OK. 


[jira] [Commented] (CASSANDRA-18968) StartupClusterConnectivityChecker fails on upgrade from 3.X

2023-11-07 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17783684#comment-17783684
 ] 

Stefan Miklosovic commented on CASSANDRA-18968:
---

This goes just to 4.0 and 4.1, correct?  [~isaacreath] [~paulo]
 


[jira] [Commented] (CASSANDRA-18968) StartupClusterConnectivityChecker fails on upgrade from 3.X

2023-11-03 Thread Isaac Reath (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17782742#comment-17782742
 ] 

Isaac Reath commented on CASSANDRA-18968:
-

Updated for 4.0 (https://github.com/apache/cassandra/pull/2863) and 4.1 
(https://github.com/apache/cassandra/pull/2862)


[jira] [Commented] (CASSANDRA-18968) StartupClusterConnectivityChecker fails on upgrade from 3.X

2023-10-27 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17780532#comment-17780532
 ] 

Brandon Williams commented on CASSANDRA-18968:
--

bq.  this does not fail startup, it just unnecessarily emits a misleading 
warning. It's pretty harmless.

Ah, I see, that explains why tests pass.

bq. We could add a verbhandler to 3.x but at this stage I think it's 
easier/simpler to just version gate this feature

I agree.


[jira] [Commented] (CASSANDRA-18968) StartupClusterConnectivityChecker fails on upgrade from 3.X

2023-10-27 Thread Paulo Motta (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17780515#comment-17780515
 ] 

Paulo Motta commented on CASSANDRA-18968:
-

It looks like support for PING was added to 3.x in CASSANDRA-14447, but [no verb 
handler was 
implemented|https://github.com/jasobrown/cassandra/commit/795de31194f1109490d759d8b339efcf13118971],
 which explains why there are no acknowledgements. We could add a verbhandler
to 3.x but at this stage I think it's easier/simpler to just version gate this 
feature to fully upgraded 4.x clusters.
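Why a missing verb handler means no acknowledgement can be illustrated with a toy dispatcher: a node looks up a handler for the incoming verb and replies only if one is registered. This is a hypothetical model (the `Verb` enum, `ToyNode` and the reply strings are invented for illustration), not Cassandra's messaging code; it only mirrors the behavior described above, where a 3.x node knows the PING verb but has nothing registered to answer it.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical verbs for the toy model below.
enum Verb { GOSSIP_DIGEST_SYN, PING_REQ }

// Hypothetical node that dispatches incoming messages by verb.
final class ToyNode {
    private final Map<Verb, Function<String, String>> handlers = new EnumMap<>(Verb.class);

    // Registers the handler that produces a reply for `verb`; returns this for chaining.
    ToyNode register(Verb verb, Function<String, String> handler) {
        handlers.put(verb, handler);
        return this;
    }

    // Returns a reply if a handler exists; otherwise the message is dropped and no reply is sent.
    Optional<String> receive(Verb verb, String payload) {
        Function<String, String> handler = handlers.get(verb);
        return handler == null ? Optional.empty() : Optional.of(handler.apply(payload));
    }
}
```

From the sender's side an absent handler is indistinguishable from a slow peer, which is why the checker can only wait for its timeout; version-gating avoids sending the PING in the first place.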


[jira] [Commented] (CASSANDRA-18968) StartupClusterConnectivityChecker fails on upgrade from 3.X

2023-10-27 Thread Paulo Motta (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17780509#comment-17780509
 ] 

Paulo Motta commented on CASSANDRA-18968:
-

Hi [~brandon.williams] - this does not fail startup, it just unnecessarily 
emits a misleading warning. It's pretty harmless.

I'm not familiar with this patch, but I suspect 3.x nodes do not support the 
{{PING_REQ}} message [sent 
here|https://github.com/apache/cassandra/blob/f8c240147c307bf5c527ff3a34e3c0f3043b7e9c/src/java/org/apache/cassandra/net/StartupClusterConnectivityChecker.java#L143],
 so the connectivity checker does not get the necessary number of 
acknowledgements (3) to succeed.

If this is the case, I think the easiest fix is just to disable the check if 
there are 3.X nodes, so the warning is not unnecessarily emitted.
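When acknowledgements never reach the required count, the outcome surfaces as the per-datacenter list of remaining peers in the warning from the description (`{dc1=[X.Y.Z.W, A.B.C.D]}`). A minimal sketch of how such a map could be assembled, assuming hypothetical inputs (peer to datacenter, plus the set of peers that acknowledged); `RemainingPeersSketch` is an illustrative name and this does not reproduce Cassandra's own logging code.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of grouping unacknowledged peers by datacenter.
final class RemainingPeersSketch {
    // Groups peers that have not acknowledged the startup ping by datacenter.
    // TreeMap/TreeSet keep the output order deterministic for logging.
    static Map<String, Set<String>> remainingByDc(Map<String, String> peerToDc, Set<String> acked) {
        Map<String, Set<String>> remaining = new TreeMap<>();
        for (Map.Entry<String, String> e : peerToDc.entrySet()) {
            if (!acked.contains(e.getKey())) {
                remaining.computeIfAbsent(e.getValue(), dc -> new TreeSet<>()).add(e.getKey());
            }
        }
        return remaining;
    }
}
```

In a mixed cluster where only 4.x peers answer, every 3.x peer ends up in this map, which is exactly the misleading list the warning prints.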


[jira] [Commented] (CASSANDRA-18968) StartupClusterConnectivityChecker fails on upgrade from 3.X

2023-10-27 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17780500#comment-17780500
 ] 

Brandon Williams commented on CASSANDRA-18968:
--

Hmm, are you sure about this?  The upgrade tests have some environmental 
flakiness, but generally have been passing: 
https://ci-cassandra.apache.org/job/Cassandra-4.0/lastSuccessfulBuild/testReport/dtest-upgrade.upgrade_tests.upgrade_through_versions_test/
