[jira] [Updated] (CASSANDRA-19178) Cluster upgrade 3.x -> 4.x fails due to IP change

2023-12-06 Thread Aldo (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aldo updated CASSANDRA-19178:
-
Summary: Cluster upgrade 3.x -> 4.x fails due to IP change  (was: Cluster 
upgrade 3.x -> 4.x fails with no internode encryption)

> Cluster upgrade 3.x -> 4.x fails due to IP change
> -
>
> Key: CASSANDRA-19178
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19178
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Aldo
>Priority: Normal
> Attachments: cassandra7.downgrade.log, cassandra7.log
>
>
> I have a Docker swarm cluster with 3 distinct Cassandra services (named 
> {_}cassandra7{_}, {_}cassandra8{_}, {_}cassandra9{_}) running on 3 different 
> servers. The 3 services run version 3.11.16, using the official Cassandra 
> 3.11.16 image from Docker Hub. The first service is configured just with the 
> following environment variables:
> {code:java}
> CASSANDRA_LISTEN_ADDRESS="tasks.cassandra7"
> CASSANDRA_SEEDS="tasks.cassandra7,tasks.cassandra9" {code}
> which in turn, at startup, modify the {_}cassandra.yaml{_}. So, for instance, 
> the _cassandra.yaml_ for the first service contains the following (and the 
> rest is the image default):
> {code:java}
> # grep tasks /etc/cassandra/cassandra.yaml
>           - seeds: "tasks.cassandra7,tasks.cassandra9"
> listen_address: tasks.cassandra7
> broadcast_address: tasks.cassandra7
> broadcast_rpc_address: tasks.cassandra7 {code}
> Other services (8 and 9) have a similar configuration, obviously with a 
> different {{CASSANDRA_LISTEN_ADDRESS}} ({{tasks.cassandra8}} and 
> {{tasks.cassandra9}}).
> The cluster runs smoothly and all the nodes are able to rejoin the cluster 
> after any event, thanks to the Docker Swarm {{tasks.cassandraXXX}} 
> "hostname": I can kill a Docker container and wait for Docker Swarm to 
> restart it, force-update a service to force a restart, scale the service to 
> 0 and then back to 1, restart an entire server, or turn all 3 servers off 
> and on again. I have never found an issue with any of this.
> I also just completed a full upgrade of the cluster from version 2.2.8 to 
> 3.11.16 (simply upgrading the Docker official image associated with the 
> services) without issues. Thanks to a 2.2.8 snapshot on each server, I was 
> also able to perform a full downgrade to 2.2.8 and back to 3.11.16 again. I 
> finally issued a {{nodetool upgradesstables}} on all nodes, so my SSTables 
> now have the {{me-*}} prefix.
>  
> The problem I'm facing right now is the upgrade from 3.11.16 to 4.x. The 
> procedure that I follow is very simple:
>  # I start from the _cassandra7_ service (which is a seed node)
>  # {{nodetool drain}}
>  # Wait for the {{DRAINING ... DRAINED}} messages to appear in the log
>  # Upgrade the Docker image of _cassandra7_ to the official 4.1.3 version
> The procedure is exactly the same one I followed for the upgrade 2.2.8 --> 
> 3.11.16, obviously with a different version at step 4. Unfortunately the 
> upgrade 3.x --> 4.x is not working: the _cassandra7_ service restarts and 
> attempts to communicate with the other seed node ({_}cassandra9{_}), but the 
> log of _cassandra7_ shows the following:
> {code:java}
> INFO  [Messaging-EventLoop-3-3] 2023-12-06 17:15:04,727 
> OutboundConnectionInitiator.java:390 - Failed to connect to peer 
> tasks.cassandra9/10.0.2.196:7000(tasks.cassandra9/10.0.2.196:7000)
> io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: 
> Connection reset by peer{code}
> The relevant part of the log, related to the missing internode communication, 
> is attached in _cassandra7.log_.
> In the log of _cassandra9_ there is nothing after the above-mentioned step 4, 
> so only _cassandra7_ is saying something in the logs.
> I tried with multiple versions (4.0.11 but also 4.0.0) but the outcome is 
> always the same. Of course, when I follow steps 1-3, then restore the 3.x 
> snapshot and finally perform step 4 using the official 3.11.16 version, 
> node 7 restarts correctly and joins the cluster. I attached the relevant 
> part of the log (see {_}cassandra7.downgrade.log{_}) where you can see that 
> nodes 7 and 9 can communicate.
> I suspect this could be related to the port 7000 now (with Cassandra 4.x) 
> supporting both encrypted and unencrypted traffic. As stated previously I'm 
> using the untouched official Cassandra images so all my cluster, inside the 
> Docker Swarm, is not (and has never been) configured with encryption.
> I can also add the following: if I perform the 4 steps above for the 
> _cassandra9_ and _cassandra8_ services as well, the cluster works in the end. 
> But this is not acceptable, because the cluster is unavailable until I finish 
> the full upgrade of all nodes: 

[jira] [Updated] (CASSANDRA-19180) Support reloading certificate stores in cassandra-java-driver

2023-12-06 Thread Abe Ratnofsky (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abe Ratnofsky updated CASSANDRA-19180:
--
Change Category: Operability
 Complexity: Normal
 Status: Open  (was: Triage Needed)

> Support reloading certificate stores in cassandra-java-driver
> -
>
> Key: CASSANDRA-19180
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19180
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Client/java-driver
>Reporter: Abe Ratnofsky
>Assignee: Abe Ratnofsky
>Priority: Normal
>
> Currently, apache/cassandra-java-driver does not reload SSLContext when the 
> underlying certificate store files change. When the DefaultSslEngineFactory 
> (and the other factories) are set up, they build a fixed instance of 
> javax.net.ssl.SSLContext that doesn't change: 
> https://github.com/apache/cassandra-java-driver/blob/12e3e3ea027c51c5807e5e46ba542f894edfa4e7/core/src/main/java/com/datastax/oss/driver/internal/core/ssl/DefaultSslEngineFactory.java#L74
> This fixed SSLContext is used to negotiate SSL with the cluster; if a 
> keystore is reloaded on disk, the change isn't picked up by the driver, and 
> future reconnections will fail if the keystore's certificates have expired by 
> the time they're used to handshake a new connection.
> We should reload client certificates so that applications that provide them 
> can use short-lived certificates and not require a bounce to pick up new 
> certificates. This is especially relevant in a world with CASSANDRA-18554 and 
> broad use of mTLS.
> I have a patch for this that is nearly ready. Now that the project has moved 
> under apache/, who can I work with to understand how CI works?






[jira] [Updated] (CASSANDRA-19180) Support reloading certificate stores in cassandra-java-driver

2023-12-06 Thread Abe Ratnofsky (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abe Ratnofsky updated CASSANDRA-19180:
--
Impacts: Clients  (was: None)

> Support reloading certificate stores in cassandra-java-driver
> -
>
> Key: CASSANDRA-19180
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19180
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Client/java-driver
>Reporter: Abe Ratnofsky
>Assignee: Abe Ratnofsky
>Priority: Normal






[jira] [Created] (CASSANDRA-19180) Support reloading certificate stores in cassandra-java-driver

2023-12-06 Thread Abe Ratnofsky (Jira)
Abe Ratnofsky created CASSANDRA-19180:
-

 Summary: Support reloading certificate stores in 
cassandra-java-driver
 Key: CASSANDRA-19180
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19180
 Project: Cassandra
  Issue Type: New Feature
  Components: Client/java-driver
Reporter: Abe Ratnofsky
Assignee: Abe Ratnofsky


Currently, apache/cassandra-java-driver does not reload SSLContext when the 
underlying certificate store files change. When the DefaultSslEngineFactory 
(and the other factories) are set up, they build a fixed instance of 
javax.net.ssl.SSLContext that doesn't change: 
https://github.com/apache/cassandra-java-driver/blob/12e3e3ea027c51c5807e5e46ba542f894edfa4e7/core/src/main/java/com/datastax/oss/driver/internal/core/ssl/DefaultSslEngineFactory.java#L74

This fixed SSLContext is used to negotiate SSL with the cluster; if a 
keystore is reloaded on disk, the change isn't picked up by the driver, and 
future reconnections will fail if the keystore's certificates have expired by 
the time they're used to handshake a new connection.

We should reload client certificates so that applications that provide them can 
use short-lived certificates and not require a bounce to pick up new 
certificates. This is especially relevant in a world with CASSANDRA-18554 and 
broad use of mTLS.

I have a patch for this that is nearly ready. Now that the project has moved 
under apache/, who can I work with to understand how CI works?
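
For illustration, here is a minimal standalone sketch of the reloading idea. It 
is not the driver's actual factory API; the file-based keystore, the password 
handling, and the rotation check by modification time are assumptions for the 
example:
{code:java}
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.security.KeyStore;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

public class ReloadingSslContext {
    private final Path keystorePath;
    private final char[] password;
    private volatile FileTime lastModified;
    private volatile SSLContext context;

    public ReloadingSslContext(Path keystorePath, char[] password) throws Exception {
        this.keystorePath = keystorePath;
        this.password = password;
        reload();
    }

    /** Returns an SSLEngine backed by the most recently loaded keystore. */
    public synchronized SSLEngine newSslEngine() throws Exception {
        FileTime current = Files.getLastModifiedTime(keystorePath);
        if (!current.equals(lastModified))
            reload(); // keystore rotated on disk: rebuild the SSLContext
        return context.createSSLEngine();
    }

    private void reload() throws Exception {
        KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
        try (InputStream in = Files.newInputStream(keystorePath)) {
            ks.load(in, password);
        }
        KeyManagerFactory kmf =
            KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(ks, password);
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(kmf.getKeyManagers(), null, null);
        lastModified = Files.getLastModifiedTime(keystorePath);
        context = ctx;
    }
}
{code}
With something like this, new connections pick up rotated certificates without 
a client bounce; already-established connections are unaffected.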






[jira] [Comment Edited] (CASSANDRA-19178) Cluster upgrade 3.x -> 4.x fails with no internode encryption

2023-12-06 Thread Aldo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793944#comment-17793944
 ] 

Aldo edited comment on CASSANDRA-19178 at 12/6/23 11:06 PM:


I apologize in advance if reopening is not the correct behavior; please tell me 
if I need to open a new issue instead. 
I think I've discovered the root cause of the issue, and I wonder whether it's 
a bug or a misconfiguration on my side.
 
Using {{nodetool setlogginglevel org.apache.cassandra TRACE}} on both the 4.x 
upgraded node (cassandra7) and on the running 3.x seed node (cassandra9) I was 
able to isolate the relevant logs:
 
On cassandra7:
 
 
{code:java}
TRACE [Messaging-EventLoop-3-3] 2023-12-06 22:16:56,410 
EndpointMessagingVersions.java:67 - Assuming current protocol version for 
tasks.cassandra9/10.0.2.92:7000 
TRACE [Messaging-EventLoop-3-3] 2023-12-06 22:16:56,410 
OutboundConnectionInitiator.java:131 - creating outbound bootstrap to peer: 
(tasks.cassandra9/10.0.2.92:7000, tasks.cassandra9/10.0.2.92:7000), framing: 
CRC, encryption: unencrypted, requestVersion: 12
TRACE [Messaging-EventLoop-3-3] 2023-12-06 22:16:56,411 
OutboundConnectionInitiator.java:236 - starting handshake with peer 
tasks.cassandra9/10.0.2.92:7000(tasks.cassandra9/10.0.2.92:7000), msg = 
Initiate(request: 12, min: 10, max: 12, type: URGENT_MESSAGES, framing: true, 
from: tasks.cassandra7/10.0.2.137:7000) 
INFO  [Messaging-EventLoop-3-3] 2023-12-06 22:16:56,412 
OutboundConnectionInitiator.java:390 - Failed to connect to peer 
tasks.cassandra9/10.0.2.92:7000(tasks.cassandra9/10.0.2.92:7000) 
io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: 
Connection reset by peer  {code}
 
On cassandra9:
 
{code:java}
TRACE [ACCEPT-tasks.cassandra9/10.0.2.92] 2023-12-06 22:16:56,411 
MessagingService.java:1315 - Connection version 12 from /10.0.2.137
TRACE [MessagingService-Incoming-/10.0.2.137] 2023-12-06 22:16:56,412 
IncomingTcpConnection.java:111 - IOException reading from socket; closing
java.io.IOException: Peer-used messaging version 12 is larger than max 
supported 11
        at 
org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:153)
        at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:98)
TRACE [MessagingService-Incoming-/10.0.2.137] 2023-12-06 22:16:56,412 
IncomingTcpConnection.java:125 - Closing socket 
Socket[addr=/10.0.2.137,port=45680,localport=7000] - isclosed: false {code}
 
So it seems there is a mismatch on the {_}messaging version{_}.

I'm trying to understand the behaviour of _EndpointMessagingVersions.java_ and 
_OutboundConnectionInitiator.java_ on the 4.1.x trunk, and a few facts seem 
clear (see the sketch after this list):
 # the internal map of _EndpointMessagingVersions_ on the node just restarted 
(cassandra7) certainly doesn't include information about the existing node 
(cassandra9). This is because, in my network configuration, cassandra7 (or more 
precisely the tasks.cassandra7 hostname) changed IP due to the restart. So 
cassandra9 (the running 3.x node) cannot send its messaging version (=11) to 
the restarted cassandra7 until the handshake completes.
 # therefore, inside _OutboundConnectionInitiator_, the messaging version for 
the cassandra7 --> cassandra9 handshake is assumed equal to the current one 
(=12)
 # when the 3.x node (cassandra9) detects the messaging version mismatch, it 
throws an IOException and closes the connection
 # the 4.x node (cassandra7) just sees a connection reset by peer and seems 
incapable of downgrading the messaging version and retrying the handshake

I can again state that a similar upgrade path, with different versions involved 
(2.2.8 --> 3.11.16), on the exact same architecture, involving the same Docker 
swarm services, the same IP-changing behaviour, etc., worked like a charm. So I 
think something changed in the source code and broke that behavior when the 
upgrade is 3.11.16 --> 4.1.3.
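
To make the suspected sequence concrete, here is a minimal sketch in plain Java 
(not the actual Cassandra source; class names and constants are illustrative) 
of the four facts above:
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class HandshakeVersionSketch {
    static final int CURRENT_VERSION_4X = 12; // 4.x messaging version
    static final int MAX_VERSION_3X = 11;     // 3.11.x maximum

    // Stand-in for the per-endpoint version cache; it has no entry for a peer
    // whose IP changed across the restart.
    static final Map<String, Integer> knownVersions = new ConcurrentHashMap<>();

    // Facts 1-2: with no cached entry, the initiator assumes its own current
    // version ("Assuming current protocol version" in the TRACE log).
    static int versionFor(String endpoint) {
        return knownVersions.getOrDefault(endpoint, CURRENT_VERSION_4X);
    }

    // Fact 3: the 3.x acceptor rejects any version above its maximum and
    // closes the socket, which the 4.x side (fact 4) only sees as a reset.
    static void accept3x(int requestedVersion) {
        if (requestedVersion > MAX_VERSION_3X)
            throw new IllegalStateException("Peer-used messaging version "
                + requestedVersion + " is larger than max supported " + MAX_VERSION_3X);
    }

    public static void main(String[] args) {
        accept3x(versionFor("tasks.cassandra9/10.0.2.92:7000"));
    }
}
{code}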



[jira] [Commented] (CASSANDRA-19116) History Builder API 2.0

2023-12-06 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793948#comment-17793948
 ] 

Caleb Rackliffe commented on CASSANDRA-19116:
-

+1

> History Builder API 2.0
> ---
>
> Key: CASSANDRA-19116
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19116
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Test/fuzz
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Urgent
>
> Harry History Builder 2.0
>   * New History Builder API
>   * Add an ability to track LTSs visited by partition in a visited_lts 
> static column
>   * Add a model checker that checks against a different Cluster instance 
> (for example, flush vs no flush, local vs non-local, etc.)
>   * Add an ability to issue LTSs out-of-order






Re: [PR] CASSANDRA-19148: Remove unused dead code [cassandra-analytics]

2023-12-06 Thread via GitHub


frankgh commented on PR #23:
URL: 
https://github.com/apache/cassandra-analytics/pull/23#issuecomment-1843813701

   Closed via 
https://github.com/apache/cassandra-analytics/commit/680cc9395c55a88217f2de975f62ad588e8c95d5





Re: [PR] CASSANDRA-19148: Remove unused dead code [cassandra-analytics]

2023-12-06 Thread via GitHub


frankgh closed pull request #23: CASSANDRA-19148: Remove unused dead code
URL: https://github.com/apache/cassandra-analytics/pull/23





[jira] [Updated] (CASSANDRA-19178) Cluster upgrade 3.x -> 4.x fails with no internode encryption

2023-12-06 Thread Aldo (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aldo updated CASSANDRA-19178:
-
Resolution: (was: Invalid)
Status: Open  (was: Resolved)

> Cluster upgrade 3.x -> 4.x fails with no internode encryption
> -
>
> Key: CASSANDRA-19178
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19178
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Aldo
>Priority: Normal
> Attachments: cassandra7.downgrade.log, cassandra7.log

[jira] [Commented] (CASSANDRA-19178) Cluster upgrade 3.x -> 4.x fails with no internode encryption

2023-12-06 Thread Aldo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793944#comment-17793944
 ] 

Aldo commented on CASSANDRA-19178:
--


> Cluster upgrade 3.x -> 4.x fails with no internode encryption
> -
>
> Key: CASSANDRA-19178
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19178
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Aldo
>Priority: Normal
> Attachments: cassandra7.downgrade.log, cassandra7.log

Re: [PR] CASSANDRA-18969: source files missing from sources jars due to maven … [cassandra-java-driver]

2023-12-06 Thread via GitHub


michaelsembwever commented on PR #1900:
URL: 
https://github.com/apache/cassandra-java-driver/pull/1900#issuecomment-1843771617

   I'll try to wrap up the review tomorrow, and if it looks ok, merge and cut 
and stage a release.  That will help us get eyes on anything in the release 
we've missed… 





Re: [PR] CASSANDRA-19148: Remove unused dead code [cassandra-analytics]

2023-12-06 Thread via GitHub


jberragan commented on PR #23:
URL: 
https://github.com/apache/cassandra-analytics/pull/23#issuecomment-1843751944

   +1





[jira] [Commented] (CASSANDRA-19179) BLOG - Apache Cassandra 5.0 Features: Vector Search

2023-12-06 Thread Paul Au (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793938#comment-17793938
 ] 

Paul Au commented on CASSANDRA-19179:
-

* 
[https://raw.githack.com/Paul-TT/cassandra-website/CASSANDRA-19179_generated/content/_/blog/Introducing-the-Apache-Cassandra-Catalyst-Program.html]
 * 
[https://raw.githack.com/Paul-TT/cassandra-website/CASSANDRA-19179_generated/content/_/blog.html]
 * 
https://raw.githack.com/Paul-TT/cassandra-website/CASSANDRA-19179_generated/content/_/cassandra-catalyst-program.html

> BLOG - Apache Cassandra 5.0 Features: Vector Search
> ---
>
> Key: CASSANDRA-19179
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19179
> Project: Cassandra
>  Issue Type: Task
>Reporter: Paul Au
>Priority: Normal
> Attachments: blog-index.png, blog-post.png, catalyst-page.png
>
>
> This ticket is to add a blog post to the site. The change includes:
>  * Adding the post: Apache Cassandra 5.0 Features: Vector Search
>  * Updating the blog index page with the new post
>  * A small text change to the Apache Catalyst program page.






[jira] [Comment Edited] (CASSANDRA-19104) Standardize tablestats formatting and data units

2023-12-06 Thread Brad Schoening (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793912#comment-17793912
 ] 

Brad Schoening edited comment on CASSANDRA-19104 at 12/6/23 9:22 PM:
-

[~zaaath] this link explains the difference; it just depends on whether the 
size is calculated as size/1000 (KB) or size/1024 (KiB).

[What exactly are the storage/memory Units 
KiB/MiB/GiB/TiB|https://community.hpe.com/t5/hpe-primera-storage/what-exactly-are-the-storage-memory-units-kib-mib-gib-tib-in/td-p/7123450]
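
For illustration, a minimal sketch (not nodetool's actual code) contrasting the 
two conventions, where decimal units divide by 1000 and binary units by 1024:
{code:java}
public class SizeUnits {
    // size/1000 convention: KB, MB, ...
    static String decimal(long bytes) {
        if (bytes < 1000) return bytes + " B";
        String[] units = {"KB", "MB", "GB", "TB"};
        double v = bytes;
        int i = -1;
        while (v >= 1000 && i < units.length - 1) { v /= 1000; i++; }
        return String.format("%.2f %s", v, units[i]);
    }

    // size/1024 convention: KiB, MiB, ...
    static String binary(long bytes) {
        if (bytes < 1024) return bytes + " B";
        String[] units = {"KiB", "MiB", "GiB", "TiB"};
        double v = bytes;
        int i = -1;
        while (v >= 1024 && i < units.length - 1) { v /= 1024; i++; }
        return String.format("%.2f %s", v, units[i]);
    }

    public static void main(String[] args) {
        long size = 1_536_000L;
        System.out.println(decimal(size)); // 1.54 MB
        System.out.println(binary(size));  // 1.46 MiB
    }
}
{code}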



> Standardize tablestats formatting and data units
> 
>
> Key: CASSANDRA-19104
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19104
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/nodetool
>Reporter: Brad Schoening
>Assignee: Leo Toff
>Priority: Normal
>
> Tablestats reports output in plaintext, JSON or YAML. The human-readable 
> output currently has a mix of KiB and bytes with inconsistent spacing.
> Simplify and default the output to 'human readable'. Machine-readable output 
> is available as an option, and the current mixed output formatting is neither 
> friendly for human nor machine reading and can be replaced.
> !image-2023-11-27-13-49-14-247.png!
> *Not a goal now (consider a follow-up Jira):*
> Fix inconsistencies with KiB/MiB/GiB and KB/MB/GB formatting:
>  * gcstats - uses MB
>  * getcompactionthroughput - uses MB/s
>  * getstreamthroughput - uses MB/s
>  * info - uses MiB/GiB






[jira] [Updated] (CASSANDRA-19179) BLOG - Apache Cassandra 5.0 Features: Vector Search

2023-12-06 Thread Paul Au (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Au updated CASSANDRA-19179:

Attachment: blog-index.png
blog-post.png
catalyst-page.png

> BLOG - Apache Cassandra 5.0 Features: Vector Search
> ---
>
> Key: CASSANDRA-19179
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19179
> Project: Cassandra
>  Issue Type: Task
>Reporter: Paul Au
>Priority: Normal
> Attachments: blog-index.png, blog-post.png, catalyst-page.png
>
>
> This ticket is to add a blog post to the site. The change includes:
>  * Adding the post: Apache Cassandra 5.0 Features: Vector Search
>  * Updating the blog index page with the new post
>  * A small text change to the Apache Catalyst program page.






[jira] [Created] (CASSANDRA-19179) BLOG - Apache Cassandra 5.0 Features: Vector Search

2023-12-06 Thread Paul Au (Jira)
Paul Au created CASSANDRA-19179:
---

 Summary: BLOG - Apache Cassandra 5.0 Features: Vector Search
 Key: CASSANDRA-19179
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19179
 Project: Cassandra
  Issue Type: Task
Reporter: Paul Au


This ticket is to add a blog post to the site. The change includes:
 * Adding the post: Apache Cassandra 5.0 Features: Vector Search
 * Updating the blog index page with the new post
 * A small text change to the Apache Catalyst program page.






[jira] [Comment Edited] (CASSANDRA-18762) Repair triggers OOM with direct buffer memory

2023-12-06 Thread Brad Schoening (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793935#comment-17793935
 ] 

Brad Schoening edited comment on CASSANDRA-18762 at 12/6/23 9:13 PM:
-

An update: we are still seeing this occur on a cluster. They have configured 
native_transport_max_threads = 256. A large number of repair Merkle trees 
precedes the OOM crash.

System: Apache Cassandra 4.0.10, a 16GB heap, 64GB RAM, 8 vCPUs

file_cache_size_in_mb = 4096, G1HeapRegionSize=16M

!image-2023-12-06-15-58-55-007.png!

The graph above is missing time ticks, but the spike occurs at 06:16:00.

!image-2023-12-06-15-29-31-491.png!

 

Summary of the cassandra log:
{noformat}
11:17:10,289 [INFO ]  RepairSession.java:202 - [repair #838c24c0-935f-11ee-97ba-d79b6a12ccbe] Received merkle tree for table1 from ...  [repeated 35 times]
11:17:17,155 [INFO ] [Service Thread] cluster_id=99 ip_address=10.0.0.1  GCInspector.java:294 - G1 Old Generation GC in 694ms.  G1 Eden Space: 8925478912 -> 0; G1 Old Gen: 2196360784 -> 1133473904; G1 Survivor Space: 385875968 -> 0;
11:17:17,668 [INFO ] [Service Thread] cluster_id=99 ip_address=10.0.0.1  GCInspector.java:294 - G1 Old Generation GC in 505ms.  G1 Old Gen: 1133473904 -> 1133526408;
11:17:22,420 [INFO ] [ScheduledTasks:1] cluster_id=99 ip_address=10.0.0.1  NoSpamLogger.java:92 - Some operations were slow, details available at debug level (debug.log)
11:17:22,417 [INFO ] [Service Thread] cluster_id=99 ip_address=10.0.0.1  GCInspector.java:294 - G1 Old Generation GC in 787ms.  G1 Eden Space: 16777216 -> 0; G1 Old Gen: 1133526408 -> 1133545448;
11:17:23,213 [WARN ] [Service Thread] cluster_id=99 ip_address=10.0.0.1  GCInspector.java:292 - G1 Old Generation GC in 4742ms.  G1 Old Gen: 1133545448 -> 1133581144;
11:17:23,217 [INFO ] [Service Thread] cluster_id=99 ip_address=10.0.0.1  StatusLogger.java:65 - Pool Name                  Active   Pending      Completed   Blocked  All Time Blocked
StatusLogger.java:69 - ReadStage                            1         0       48261572         0                 0
StatusLogger.java:69 - Native-Transport-Requests            1         0      395189663         0                 0
StatusLogger.java:69 - ValidationExecutor                   4        73         110086         0                 0
StatusLogger.java:69 - AntiEntropyStage                     1         0         352704         0                 0
11:17:24,114 [INFO ] [Service Thread] cluster_id=99 ip_address=10.0.0.1  GCInspector.java:294 - G1 Old Generation GC in 853ms.  G1 Eden Space: 117440512 -> 0; G1 Old Gen: 1133747360 -> 1133758448;
11:17:24,564 [ERROR] [Messaging-EventLoop-3-5] cluster_id=99 ip_address=10.0.0.1  JVMStabilityInspector.java:102 - OutOfMemory error letting the JVM handle the error:
java.lang.OutOfMemoryError: Direct buffer memory
    at java.base/java.nio.Bits.reserveMemory(Bits.java:175)
    at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118)
    at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:318)
    ... etc
{noformat}
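
For context, {{java.lang.OutOfMemoryError: Direct buffer memory}} is thrown 
when off-heap buffer allocations exceed the JVM's direct-memory cap 
({{-XX:MaxDirectMemorySize}}, which in practice defaults to the maximum heap 
size when unset). A minimal sketch, assuming a deliberately small cap such as 
{{-XX:MaxDirectMemorySize=64m}}, reproduces the same error:
{code:java}
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class DirectBufferExhaustion {
    public static void main(String[] args) {
        // Keep references so the buffers cannot be freed.
        List<ByteBuffer> retained = new ArrayList<>();
        try {
            while (true) {
                retained.add(ByteBuffer.allocateDirect(16 * 1024 * 1024)); // 16 MiB off-heap
            }
        } catch (OutOfMemoryError e) {
            // Same failure mode as the repair path above.
            System.out.println("After " + retained.size() + " buffers: " + e);
        }
    }
}
{code}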



[jira] [Updated] (CASSANDRA-18762) Repair triggers OOM with direct buffer memory

2023-12-06 Thread Brad Schoening (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brad Schoening updated CASSANDRA-18762:
---
Resolution: (was: Cannot Reproduce)
Status: Open  (was: Resolved)

> Repair triggers OOM with direct buffer memory
> -
>
> Key: CASSANDRA-18762
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18762
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
>Reporter: Brad Schoening
>Priority: Normal
>  Labels: OutOfMemoryError
> Attachments: Cluster-dm-metrics-1.PNG, 
> image-2023-12-06-15-28-05-459.png, image-2023-12-06-15-29-31-491.png, 
> image-2023-12-06-15-58-55-007.png
>
>
> We are seeing repeated failures of nodes with 16GB of heap and the same size 
> (16GB) for direct memory (derived from -Xms).  This seems to be related to 
> CASSANDRA-15202, which moved merkle trees off-heap in 4.0.  Using Cassandra 
> 4.0.6.
> {noformat}
> 2023-08-09 04:30:57,470 [INFO ] [AntiEntropyStage:1] cluster_id=101 
> ip_address=169.0.0.1 RepairSession.java:202 - [repair 
> #5e55a3b0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_a from 
> /169.102.200.241:7000
> 2023-08-09 04:30:57,567 [INFO ] [AntiEntropyStage:1] cluster_id=101 
> ip_address=169.0.0.1 RepairSession.java:202 - [repair 
> #5e0d2900-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from 
> /169.93.192.29:7000
> 2023-08-09 04:30:57,568 [INFO ] [AntiEntropyStage:1] cluster_id=101 
> ip_address=169.0.0.1 RepairSession.java:202 - [repair 
> #5e1dcad0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_c from 
> /169.104.171.134:7000
> 2023-08-09 04:30:57,591 [INFO ] [AntiEntropyStage:1] cluster_id=101 
> ip_address=169.0.0.1 RepairSession.java:202 - [repair 
> #5e69a0e0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from 
> /169.79.232.67:7000
> 2023-08-09 04:30:57,876 [INFO ] [Service Thread] cluster_id=101 
> ip_address=169.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 282ms. 
> Compressed Class Space: 8444560 -> 8372152; G1 Eden Space: 7809794048 -> 0; 
> G1 Old Gen: 1453478400 -> 820942800; G1 Survivor Space: 419430400 -> 0; 
> Metaspace: 80411136 -> 80176528
> 2023-08-09 04:30:58,387 [ERROR] [AntiEntropyStage:1] cluster_id=101 
> ip_address=169.0.0.1 JVMStabilityInspector.java:102 - OutOfMemory error 
> letting the JVM handle the error:
> java.lang.OutOfMemoryError: Direct buffer memory
> at java.base/java.nio.Bits.reserveMemory(Bits.java:175)
> at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118)
> at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:318)
> at org.apache.cassandra.utils.MerkleTree.allocate(MerkleTree.java:742)
> at 
> org.apache.cassandra.utils.MerkleTree.deserializeOffHeap(MerkleTree.java:780)
> at org.apache.cassandra.utils.MerkleTree.deserializeTree(MerkleTree.java:751)
> at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:720)
> at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:698)
> at 
> org.apache.cassandra.utils.MerkleTrees$MerkleTreesSerializer.deserialize(MerkleTrees.java:416)
> at 
> org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:100)
> at 
> org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:84)
> at 
> org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:782)
> at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:642)
> at 
> org.apache.cassandra.net.InboundMessageHandler$LargeMessage.deserialize(InboundMessageHandler.java:364)
> at 
> org.apache.cassandra.net.InboundMessageHandler$LargeMessage.access$1100(InboundMessageHandler.java:317)
> at 
> org.apache.cassandra.net.InboundMessageHandler$ProcessLargeMessage.provideMessage(InboundMessageHandler.java:504)
> at 
> org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:429)
> at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.base/java.lang.Thread.run(Thread.java:834){noformat}
>  
> -XX:+AlwaysPreTouch
> -XX:+CrashOnOutOfMemoryError
> -XX:+ExitOnOutOfMemoryError
> -XX:+HeapDumpOnOutOfMemoryError
> -XX:+ParallelRefProcEnabled
> -XX:+PerfDisableSharedMem
> -XX:+ResizeTLAB
> -XX:+UseG1GC
> -XX:+UseNUMA
> -XX:+UseTLAB
> -XX:+UseThreadPriorities
> 

[jira] [Commented] (CASSANDRA-18762) Repair triggers OOM with direct buffer memory

2023-12-06 Thread Brad Schoening (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793935#comment-17793935
 ] 

Brad Schoening commented on CASSANDRA-18762:


An update: we are still seeing this occur on a cluster. It has 
native_transport_max_threads = 256 configured. A large number of repair Merkle 
trees precedes the OOM crash.

System: Apache Cassandra 4.0.10, a 16GB heap, 64GB RAM, 8 vCPUs

file_cache_size_in_mb = 4096, G1HeapRegionSize=16M
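
(For reference, a sketch of where these two settings live; the file names are the 
usual 4.0 layout and are an assumption about this deployment:)

{noformat}
# cassandra.yaml -- chunk cache ceiling; these buffers are allocated
# off-heap as direct memory
file_cache_size_in_mb: 4096

# jvm11-server.options (a JVM flag, not a yaml setting)
-XX:G1HeapRegionSize=16M
{noformat}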

!image-2023-12-06-15-58-55-007.png!

The graph above is missing time ticks, but the spike occurs at 06:16:00.

!image-2023-12-06-15-29-31-491.png!

 

Summary of the cassandra log:

11:17:10,289  [INFO ]  RepairSession.java:202 - [repair 
#838c24c0-935f-11ee-97ba-d79b6a12ccbe] Received merkle tree for table1 from ... 
 [repeated 35 times]
11:17:17,155 [INFO ] [Service Thread] cluster_id=99 ip_address=10.0.0.1  
GCInspector.java:294 - G1 Old Generation GC in 694ms.  G1 Eden Space: 
8925478912 -> 0; G1 Old Gen: 2196360784 -> 1133473904; G1 Survivor Space: 
385875968 -> 0; 
11:17:17,668 [INFO ] [Service Thread] cluster_id=99 ip_address=10.0.0.1  
GCInspector.java:294 - G1 Old Generation GC in 505ms.  G1 Old Gen: 
1133473904 -> 1133526408;  [repeated 35 times]
11:17:22,420 [INFO ] [ScheduledTasks:1] cluster_id=99 ip_address=10.0.0.1  
NoSpamLogger.java:92 - Some operations were slow, details available at debug 
level (debug.log)
11:17:22,417 [INFO ] [Service Thread] cluster_id=99 ip_address=10.0.0.1  
GCInspector.java:294 - G1 Old Generation GC in 787ms.  G1 Eden Space: 16777216 
-> 0; G1 Old Gen: 1133526408 -> 1133545448; 
11:17:23,213 [WARN ] [Service Thread] cluster_id=99 ip_address=10.0.0.1  
GCInspector.java:292 - G1 Old Generation GC in 4742ms.  G1 Old Gen: 1133545448 
-> 1133581144; 
11:17:23,217 [INFO ] [Service Thread] cluster_id=99 ip_address=10.0.0.1  
StatusLogger.java:65  [elided]
11:17:24,114 [INFO ] [Service Thread] cluster_id=99 ip_address=10.0.0.1  
GCInspector.java:294 - G1 Old Generation GC in 853ms.  G1 Eden Space: 117440512 
-> 0; G1 Old Gen: 1133747360 -> 1133758448; 
11:17:24,564 [ERROR] [Messaging-EventLoop-3-5] cluster_id=99 
ip_address=10.0.0.1  JVMStabilityInspector.java:102 - OutOfMemory error letting 
the JVM handle the error:
java.lang.OutOfMemoryError: Direct buffer memory
    at java.base/java.nio.Bits.reserveMemory(Bits.java:175)
    at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118)
    at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:318)

    ... etc
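
As background on the error class itself: "Direct buffer memory" OOMs come from 
the JDK's direct-memory accounting in java.nio.Bits, not from the heap. A 
minimal standalone sketch that reproduces the same failure, assuming the JDK 
default where an unset -XX:MaxDirectMemorySize tracks the maximum heap size:

{code:java}
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Run with e.g. -Xmx256m. Every allocateDirect() goes through
// java.nio.Bits.reserveMemory(), the same path as the stack trace above;
// once the reserved total exceeds MaxDirectMemorySize the JDK throws
// java.lang.OutOfMemoryError: Direct buffer memory.
public class DirectBufferOom
{
    public static void main(String[] args)
    {
        List<ByteBuffer> pinned = new ArrayList<>(); // keep buffers reachable
        while (true)
            pinned.add(ByteBuffer.allocateDirect(16 * 1024 * 1024)); // 16 MiB each
    }
}
{code}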

> Repair triggers OOM with direct buffer memory
> -
>
> Key: CASSANDRA-18762
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18762
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
>Reporter: Brad Schoening
>Priority: Normal
>  Labels: OutOfMemoryError
> Attachments: Cluster-dm-metrics-1.PNG, 
> image-2023-12-06-15-28-05-459.png, image-2023-12-06-15-29-31-491.png, 
> image-2023-12-06-15-58-55-007.png
>
>
> We are seeing repeated failures of nodes with 16GB of heap and the same size 
> (16GB) for direct memory (derived from -Xms). This seems to be related to 
> CASSANDRA-15202, which moved Merkle trees off-heap in 4.0. Using Cassandra 
> 4.0.6.
> {noformat}
> 2023-08-09 04:30:57,470 [INFO ] [AntiEntropyStage:1] cluster_id=101 
> ip_address=169.0.0.1 RepairSession.java:202 - [repair 
> #5e55a3b0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_a from 
> /169.102.200.241:7000
> 2023-08-09 04:30:57,567 [INFO ] [AntiEntropyStage:1] cluster_id=101 
> ip_address=169.0.0.1 RepairSession.java:202 - [repair 
> #5e0d2900-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from 
> /169.93.192.29:7000
> 2023-08-09 04:30:57,568 [INFO ] [AntiEntropyStage:1] cluster_id=101 
> ip_address=169.0.0.1 RepairSession.java:202 - [repair 
> #5e1dcad0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_c from 
> /169.104.171.134:7000
> 2023-08-09 04:30:57,591 [INFO ] [AntiEntropyStage:1] cluster_id=101 
> ip_address=169.0.0.1 RepairSession.java:202 - [repair 
> #5e69a0e0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from 
> /169.79.232.67:7000
> 2023-08-09 04:30:57,876 [INFO ] [Service Thread] cluster_id=101 
> ip_address=169.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 282ms. 
> Compressed Class Space: 8444560 -> 8372152; G1 Eden Space: 7809794048 -> 0; 
> G1 Old Gen: 1453478400 -> 820942800; G1 Survivor Space: 419430400 -> 0; 
> Metaspace: 80411136 -> 80176528
> 2023-08-09 04:30:58,387 [ERROR] [AntiEntropyStage:1] cluster_id=101 
> ip_address=169.0.0.1 JVMStabilityInspector.java:102 - OutOfMemory error 
> letting the JVM handle the error:
> java.lang.OutOfMemoryError: Direct buffer memory
> at java.base/java.nio.Bits.reserveMemory(Bits.java:175)
> at 

[jira] [Commented] (CASSANDRA-19178) Cluster upgrade 3.x -> 4.x fails with no internode encryption

2023-12-06 Thread Aldo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793934#comment-17793934
 ] 

Aldo commented on CASSANDRA-19178:
--

Thanks, I moved the question to StackExchange 
[here|https://dba.stackexchange.com/questions/333799/cassandra-cluster-upgrade-3-x-4-x-fails-with-internode-encryption-none].

> Cluster upgrade 3.x -> 4.x fails with no internode encryption
> -
>
> Key: CASSANDRA-19178
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19178
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Aldo
>Priority: Normal
> Attachments: cassandra7.downgrade.log, cassandra7.log
>
>
> I have a Docker swarm cluster with 3 distinct Cassandra services (named 
> {_}cassandra7{_}, {_}cassandra8{_}, {_}cassandra9{_}) running on 3 different 
> servers. The 3 services are running the version 3.11.16, using the official 
> Cassandra image 3.11.16 on Docker Hub. The first service is configured just 
> with the following environment variables
> {code:java}
> CASSANDRA_LISTEN_ADDRESS="tasks.cassandra7"
> CASSANDRA_SEEDS="tasks.cassandra7,tasks.cassandra9" {code}
> which in turn, at startup, modifies the {_}cassandra.yaml{_}. So for instance 
> the _cassandra.yaml_ for the first service contains the following (and the 
> rest is the image default):
> {code:java}
> # grep tasks /etc/cassandra/cassandra.yaml
>           - seeds: "tasks.cassandra7,tasks.cassandra9"
> listen_address: tasks.cassandra7
> broadcast_address: tasks.cassandra7
> broadcast_rpc_address: tasks.cassandra7 {code}
> Other services (8 and 9) have a similar configuration, obviously with a 
> different {{CASSANDRA_LISTEN_ADDRESS}} ({{tasks.cassandra8}} and 
> {{tasks.cassandra9}}).
> The cluster is running smoothly and all the nodes are perfectly able to 
> rejoin the cluster whatever event occurs, thanks to the Docker Swarm 
> {{tasks.cassandraXXX}} "hostname": I can kill a Docker container and wait for 
> Docker Swarm to restart it, force-update it to force a restart, scale the 
> service to 0 and then 1, restart an entire server, or turn off and then turn 
> on all 3 servers. I have never found an issue with this.
> I also just completed a full upgrade of the cluster from version 2.2.8 to 
> 3.11.16 (simply upgrading the Docker official image associated with the 
> services) without issues. I was also able, thanks to a 2.2.8 snapshot on each 
> server, to perform a full downgrade to 2.2.8 and back to 3.11.16 again. I 
> finally issued a {{nodetool upgradesstables}} on all nodes, so my SSTables 
> have now the {{me-*}} prefix.
>  
> The problem I'm facing right now is the upgrade from 3.11.16 to 4.x. The 
> procedure that I follow is very simple:
>  # I start from the _cassandra7_ service (which is a seed node)
>  # {{nodetool drain}}
>  # Wait for the {{DRAINING ... DRAINED}} messages to appear in the log
>  # Upgrade the Docker image of _cassandra7_ to the official 4.1.3 version
> The procedure is exactly the same I followed for the upgrade 2.2.8 --> 
> 3.11.16, obviously with a different version at step 4. Unfortunately the 
> upgrade 3.x --> 4.x is not working, the _cassandra7_ service restarts and 
> attempts to communicate with the other seed node ({_}cassandra9{_}) but the 
> log of _cassandra7_ shows the following:
> {code:java}
> INFO  [Messaging-EventLoop-3-3] 2023-12-06 17:15:04,727 
> OutboundConnectionInitiator.java:390 - Failed to connect to peer 
> tasks.cassandra9/10.0.2.196:7000(tasks.cassandra9/10.0.2.196:7000)
> io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: 
> Connection reset by peer{code}
> The relevant part of the log, related to the missing internode communication, 
> is attached as _cassandra7.log_
> In the log of _cassandra9_ there is nothing after the aforementioned step #4, 
> so only _cassandra7_ says anything in the logs.
> I tried with multiple versions (4.0.11 but also 4.0.0), but the outcome is 
> always the same. Of course, when I follow steps 1..3, then restore the 3.x 
> snapshot and finally perform step #4 using the official 3.11.16 version, 
> node 7 restarts correctly and joins the cluster. I attached the relevant 
> part of the log (see {_}cassandra7.downgrade.log{_}) where you can see that 
> nodes 7 and 9 can communicate.
> I suspect this could be related to port 7000 now (with Cassandra 4.x) 
> supporting both encrypted and unencrypted traffic. As stated previously, I'm 
> using the untouched official Cassandra images, so my whole cluster, inside the 
> Docker Swarm, is not (and has never been) configured with encryption.
> I can also add the following: if I perform the 4 steps above for the 
> _cassandra9_ and _cassandra8_ services as well, in the end the cluster works. 
> But this 

[jira] [Updated] (CASSANDRA-18762) Repair triggers OOM with direct buffer memory

2023-12-06 Thread Brad Schoening (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brad Schoening updated CASSANDRA-18762:
---
Attachment: image-2023-12-06-15-58-55-007.png

> Repair triggers OOM with direct buffer memory
> -
>
> Key: CASSANDRA-18762
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18762
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
>Reporter: Brad Schoening
>Priority: Normal
>  Labels: OutOfMemoryError
> Attachments: Cluster-dm-metrics-1.PNG, 
> image-2023-12-06-15-28-05-459.png, image-2023-12-06-15-29-31-491.png, 
> image-2023-12-06-15-58-55-007.png
>
>
> We are seeing repeated failures of nodes with 16GB of heap and the same size 
> (16GB) for direct memory (derived from -Xms). This seems to be related to 
> CASSANDRA-15202, which moved Merkle trees off-heap in 4.0. Using Cassandra 
> 4.0.6.
> {noformat}
> 2023-08-09 04:30:57,470 [INFO ] [AntiEntropyStage:1] cluster_id=101 
> ip_address=169.0.0.1 RepairSession.java:202 - [repair 
> #5e55a3b0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_a from 
> /169.102.200.241:7000
> 2023-08-09 04:30:57,567 [INFO ] [AntiEntropyStage:1] cluster_id=101 
> ip_address=169.0.0.1 RepairSession.java:202 - [repair 
> #5e0d2900-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from 
> /169.93.192.29:7000
> 2023-08-09 04:30:57,568 [INFO ] [AntiEntropyStage:1] cluster_id=101 
> ip_address=169.0.0.1 RepairSession.java:202 - [repair 
> #5e1dcad0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_c from 
> /169.104.171.134:7000
> 2023-08-09 04:30:57,591 [INFO ] [AntiEntropyStage:1] cluster_id=101 
> ip_address=169.0.0.1 RepairSession.java:202 - [repair 
> #5e69a0e0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from 
> /169.79.232.67:7000
> 2023-08-09 04:30:57,876 [INFO ] [Service Thread] cluster_id=101 
> ip_address=169.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 282ms. 
> Compressed Class Space: 8444560 -> 8372152; G1 Eden Space: 7809794048 -> 0; 
> G1 Old Gen: 1453478400 -> 820942800; G1 Survivor Space: 419430400 -> 0; 
> Metaspace: 80411136 -> 80176528
> 2023-08-09 04:30:58,387 [ERROR] [AntiEntropyStage:1] cluster_id=101 
> ip_address=169.0.0.1 JVMStabilityInspector.java:102 - OutOfMemory error 
> letting the JVM handle the error:
> java.lang.OutOfMemoryError: Direct buffer memory
> at java.base/java.nio.Bits.reserveMemory(Bits.java:175)
> at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118)
> at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:318)
> at org.apache.cassandra.utils.MerkleTree.allocate(MerkleTree.java:742)
> at 
> org.apache.cassandra.utils.MerkleTree.deserializeOffHeap(MerkleTree.java:780)
> at org.apache.cassandra.utils.MerkleTree.deserializeTree(MerkleTree.java:751)
> at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:720)
> at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:698)
> at 
> org.apache.cassandra.utils.MerkleTrees$MerkleTreesSerializer.deserialize(MerkleTrees.java:416)
> at 
> org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:100)
> at 
> org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:84)
> at 
> org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:782)
> at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:642)
> at 
> org.apache.cassandra.net.InboundMessageHandler$LargeMessage.deserialize(InboundMessageHandler.java:364)
> at 
> org.apache.cassandra.net.InboundMessageHandler$LargeMessage.access$1100(InboundMessageHandler.java:317)
> at 
> org.apache.cassandra.net.InboundMessageHandler$ProcessLargeMessage.provideMessage(InboundMessageHandler.java:504)
> at 
> org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:429)
> at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.base/java.lang.Thread.run(Thread.java:834){noformat}
>  
> -XX:+AlwaysPreTouch
> -XX:+CrashOnOutOfMemoryError
> -XX:+ExitOnOutOfMemoryError
> -XX:+HeapDumpOnOutOfMemoryError
> -XX:+ParallelRefProcEnabled
> -XX:+PerfDisableSharedMem
> -XX:+ResizeTLAB
> -XX:+UseG1GC
> -XX:+UseNUMA
> -XX:+UseTLAB
> -XX:+UseThreadPriorities
> -XX:-UseBiasedLocking
> 

[jira] [Updated] (CASSANDRA-18762) Repair triggers OOM with direct buffer memory

2023-12-06 Thread Brad Schoening (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brad Schoening updated CASSANDRA-18762:
---
Attachment: image-2023-12-06-15-29-31-491.png

> Repair triggers OOM with direct buffer memory
> -
>
> Key: CASSANDRA-18762
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18762
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
>Reporter: Brad Schoening
>Priority: Normal
>  Labels: OutOfMemoryError
> Attachments: Cluster-dm-metrics-1.PNG, 
> image-2023-12-06-15-28-05-459.png, image-2023-12-06-15-29-31-491.png
>
>
> We are seeing repeated failures of nodes with 16GB of heap and the same size 
> (16GB) for direct memory (derived from -Xms). This seems to be related to 
> CASSANDRA-15202, which moved Merkle trees off-heap in 4.0. Using Cassandra 
> 4.0.6.
> {noformat}
> 2023-08-09 04:30:57,470 [INFO ] [AntiEntropyStage:1] cluster_id=101 
> ip_address=169.0.0.1 RepairSession.java:202 - [repair 
> #5e55a3b0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_a from 
> /169.102.200.241:7000
> 2023-08-09 04:30:57,567 [INFO ] [AntiEntropyStage:1] cluster_id=101 
> ip_address=169.0.0.1 RepairSession.java:202 - [repair 
> #5e0d2900-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from 
> /169.93.192.29:7000
> 2023-08-09 04:30:57,568 [INFO ] [AntiEntropyStage:1] cluster_id=101 
> ip_address=169.0.0.1 RepairSession.java:202 - [repair 
> #5e1dcad0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_c from 
> /169.104.171.134:7000
> 2023-08-09 04:30:57,591 [INFO ] [AntiEntropyStage:1] cluster_id=101 
> ip_address=169.0.0.1 RepairSession.java:202 - [repair 
> #5e69a0e0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from 
> /169.79.232.67:7000
> 2023-08-09 04:30:57,876 [INFO ] [Service Thread] cluster_id=101 
> ip_address=169.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 282ms. 
> Compressed Class Space: 8444560 -> 8372152; G1 Eden Space: 7809794048 -> 0; 
> G1 Old Gen: 1453478400 -> 820942800; G1 Survivor Space: 419430400 -> 0; 
> Metaspace: 80411136 -> 80176528
> 2023-08-09 04:30:58,387 [ERROR] [AntiEntropyStage:1] cluster_id=101 
> ip_address=169.0.0.1 JVMStabilityInspector.java:102 - OutOfMemory error 
> letting the JVM handle the error:
> java.lang.OutOfMemoryError: Direct buffer memory
> at java.base/java.nio.Bits.reserveMemory(Bits.java:175)
> at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118)
> at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:318)
> at org.apache.cassandra.utils.MerkleTree.allocate(MerkleTree.java:742)
> at 
> org.apache.cassandra.utils.MerkleTree.deserializeOffHeap(MerkleTree.java:780)
> at org.apache.cassandra.utils.MerkleTree.deserializeTree(MerkleTree.java:751)
> at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:720)
> at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:698)
> at 
> org.apache.cassandra.utils.MerkleTrees$MerkleTreesSerializer.deserialize(MerkleTrees.java:416)
> at 
> org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:100)
> at 
> org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:84)
> at 
> org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:782)
> at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:642)
> at 
> org.apache.cassandra.net.InboundMessageHandler$LargeMessage.deserialize(InboundMessageHandler.java:364)
> at 
> org.apache.cassandra.net.InboundMessageHandler$LargeMessage.access$1100(InboundMessageHandler.java:317)
> at 
> org.apache.cassandra.net.InboundMessageHandler$ProcessLargeMessage.provideMessage(InboundMessageHandler.java:504)
> at 
> org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:429)
> at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.base/java.lang.Thread.run(Thread.java:834){noformat}
>  
> -XX:+AlwaysPreTouch
> -XX:+CrashOnOutOfMemoryError
> -XX:+ExitOnOutOfMemoryError
> -XX:+HeapDumpOnOutOfMemoryError
> -XX:+ParallelRefProcEnabled
> -XX:+PerfDisableSharedMem
> -XX:+ResizeTLAB
> -XX:+UseG1GC
> -XX:+UseNUMA
> -XX:+UseTLAB
> -XX:+UseThreadPriorities
> -XX:-UseBiasedLocking
> 

[jira] [Updated] (CASSANDRA-18762) Repair triggers OOM with direct buffer memory

2023-12-06 Thread Brad Schoening (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brad Schoening updated CASSANDRA-18762:
---
Attachment: image-2023-12-06-15-28-05-459.png

> Repair triggers OOM with direct buffer memory
> -
>
> Key: CASSANDRA-18762
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18762
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
>Reporter: Brad Schoening
>Priority: Normal
>  Labels: OutOfMemoryError
> Attachments: Cluster-dm-metrics-1.PNG, 
> image-2023-12-06-15-28-05-459.png
>
>
> We are seeing repeated failures of nodes with 16GB of heap and the same size 
> (16GB) for direct memory (derived from -Xms). This seems to be related to 
> CASSANDRA-15202, which moved Merkle trees off-heap in 4.0. Using Cassandra 
> 4.0.6.
> {noformat}
> 2023-08-09 04:30:57,470 [INFO ] [AntiEntropyStage:1] cluster_id=101 
> ip_address=169.0.0.1 RepairSession.java:202 - [repair 
> #5e55a3b0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_a from 
> /169.102.200.241:7000
> 2023-08-09 04:30:57,567 [INFO ] [AntiEntropyStage:1] cluster_id=101 
> ip_address=169.0.0.1 RepairSession.java:202 - [repair 
> #5e0d2900-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from 
> /169.93.192.29:7000
> 2023-08-09 04:30:57,568 [INFO ] [AntiEntropyStage:1] cluster_id=101 
> ip_address=169.0.0.1 RepairSession.java:202 - [repair 
> #5e1dcad0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_c from 
> /169.104.171.134:7000
> 2023-08-09 04:30:57,591 [INFO ] [AntiEntropyStage:1] cluster_id=101 
> ip_address=169.0.0.1 RepairSession.java:202 - [repair 
> #5e69a0e0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from 
> /169.79.232.67:7000
> 2023-08-09 04:30:57,876 [INFO ] [Service Thread] cluster_id=101 
> ip_address=169.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 282ms. 
> Compressed Class Space: 8444560 -> 8372152; G1 Eden Space: 7809794048 -> 0; 
> G1 Old Gen: 1453478400 -> 820942800; G1 Survivor Space: 419430400 -> 0; 
> Metaspace: 80411136 -> 80176528
> 2023-08-09 04:30:58,387 [ERROR] [AntiEntropyStage:1] cluster_id=101 
> ip_address=169.0.0.1 JVMStabilityInspector.java:102 - OutOfMemory error 
> letting the JVM handle the error:
> java.lang.OutOfMemoryError: Direct buffer memory
> at java.base/java.nio.Bits.reserveMemory(Bits.java:175)
> at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118)
> at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:318)
> at org.apache.cassandra.utils.MerkleTree.allocate(MerkleTree.java:742)
> at 
> org.apache.cassandra.utils.MerkleTree.deserializeOffHeap(MerkleTree.java:780)
> at org.apache.cassandra.utils.MerkleTree.deserializeTree(MerkleTree.java:751)
> at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:720)
> at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:698)
> at 
> org.apache.cassandra.utils.MerkleTrees$MerkleTreesSerializer.deserialize(MerkleTrees.java:416)
> at 
> org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:100)
> at 
> org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:84)
> at 
> org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:782)
> at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:642)
> at 
> org.apache.cassandra.net.InboundMessageHandler$LargeMessage.deserialize(InboundMessageHandler.java:364)
> at 
> org.apache.cassandra.net.InboundMessageHandler$LargeMessage.access$1100(InboundMessageHandler.java:317)
> at 
> org.apache.cassandra.net.InboundMessageHandler$ProcessLargeMessage.provideMessage(InboundMessageHandler.java:504)
> at 
> org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:429)
> at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.base/java.lang.Thread.run(Thread.java:834){noformat}
>  
> -XX:+AlwaysPreTouch
> -XX:+CrashOnOutOfMemoryError
> -XX:+ExitOnOutOfMemoryError
> -XX:+HeapDumpOnOutOfMemoryError
> -XX:+ParallelRefProcEnabled
> -XX:+PerfDisableSharedMem
> -XX:+ResizeTLAB
> -XX:+UseG1GC
> -XX:+UseNUMA
> -XX:+UseTLAB
> -XX:+UseThreadPriorities
> -XX:-UseBiasedLocking
> -XX:CompileCommandFile=/opt/nosql/clusters/cassandra-101/conf/hotspot_compiler
> 

[jira] [Updated] (CASSANDRA-19104) Standardize tablestats formatting and data units

2023-12-06 Thread Brad Schoening (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brad Schoening updated CASSANDRA-19104:
---
Description: 
Tablestats reports output in plaintext, JSON or YAML. The human readable output 
currently has a mix of KiB and bytes with inconsistent spacing.

Simplify and default output to 'human readable'. Machine readable output is 
available as an option, and the current mixed output formatting is neither 
friendly for human nor machine reading and can be replaced.

!image-2023-11-27-13-49-14-247.png!

*Not a goal now (consider a follow up Jira):*

Fix inconsistencies with KiB/MiB/GiB and KB/MB/GB formatting:
 * gcstats - uses MB
 * getcompactionthroughput - uses MB/s
 * getstreamthroughput - uses MB/s
 * info - uses MiB/GiB

  was:
Tablestats reports output in plaintext, JSON or YAML. The human readable output 
currently has a mix of KiB and bytes with inconsistent spacing.

Simplify and defaulting output to 'human readable'. Machine readable output is 
available as an option, and the current mixed output formatting is neither 
friendly for human nor machine reading and can be replaced.

!image-2023-11-27-13-49-14-247.png!

*Not a goal now (consider a follow up Jira):*

Fix inconsistencies with KiB/MiB/GiB and KB/MB/GB formatting:
 * gcstats - uses MB
 * getcompactionthroughput - uses MB/s
 * getstreamthroughput - uses MB/s
 * info - uses MiB/GiB


> Standardize tablestats formatting and data units
> 
>
> Key: CASSANDRA-19104
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19104
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/nodetool
>Reporter: Brad Schoening
>Assignee: Leo Toff
>Priority: Normal
>
> Tablestats reports output in plaintext, JSON or YAML. The human readable 
> output currently has a mix of KiB and bytes with inconsistent spacing.
> Simplify and default output to 'human readable'. Machine readable output is 
> available as an option, and the current mixed output formatting is neither 
> friendly for human nor machine reading and can be replaced.
> !image-2023-11-27-13-49-14-247.png!
> *Not a goal now (consider a follow up Jira):*
> Fix inconsistencies with KiB/MiB/GiB and KB/MB/GB formatting:
>  * gcstats - uses MB
>  * getcompactionthroughput - uses MB/s
>  * getstreamthroughput - uses MB/s
>  * info - uses MiB/GiB



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19104) Standardize tablestats formatting and data units

2023-12-06 Thread Brad Schoening (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793912#comment-17793912
 ] 

Brad Schoening commented on CASSANDRA-19104:


[~zaaath] this link explains the difference, so it just depends on how it's 
calculated: size/1000 (KB) or size/1024 (KiB).

[What exactly are the storage/memory Units 
KiB/MiB/GiB/TiB|https://community.hpe.com/t5/hpe-primera-storage/what-exactly-are-the-storage-memory-units-kib-mib-gib-tib-in/td-p/7123450]
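
(A toy helper showing the two conventions side by side; the method is made up 
for illustration, not nodetool's actual formatting code:)

{code:java}
public class Units
{
    // Decimal kilobytes divide by 1000, binary kibibytes divide by 1024.
    static String human(long bytes)
    {
        return String.format("%.2f KB / %.2f KiB", bytes / 1000.0, bytes / 1024.0);
    }

    public static void main(String[] args)
    {
        System.out.println(human(4096)); // prints: 4.10 KB / 4.00 KiB
    }
}
{code}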

> Standardize tablestats formatting and data units
> 
>
> Key: CASSANDRA-19104
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19104
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/nodetool
>Reporter: Brad Schoening
>Assignee: Leo Toff
>Priority: Normal
>
> Tablestats reports output in plaintext, JSON or YAML. The human readable 
> output currently has a mix of KiB and bytes with inconsistent spacing.
> Simplify and defaulting output to 'human readable'. Machine readable output 
> is available as an option, and the current mixed output formatting is neither 
> friendly for human nor machine reading and can be replaced.
> !image-2023-11-27-13-49-14-247.png!
> *Not a goal now (consider a follow up Jira):*
> Fix inconsistencies with KiB/MiB/GiB and KB/MB/GB formatting:
>  * gcstats - uses MB
>  * getcompactionthroughput - uses MB/s
>  * getstreamthroughput - uses MB/s
>  * info - uses MiB/GiB



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18857) Allow CQL client certificate authentication to work without sending an AUTHENTICATE request

2023-12-06 Thread Andy Tolbert (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793913#comment-17793913
 ] 

Andy Tolbert commented on CASSANDRA-18857:
--

Apologies for the late follow up here. I realized that for this to fully work, 
[CASSANDRA-18811] is needed. I've created a [pull 
request|https://github.com/apache/cassandra/pull/2969] that I will update as 
soon as [CASSANDRA-18811] lands in trunk.
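
For illustration, a rough sketch of the flow the ticket below proposes; every 
name here is an illustrative stand-in, not the actual transport code or the 
patch:

{code:java}
import java.security.cert.Certificate;

// Toy model of the proposed STARTUP handling; CertAuthenticator and the
// string responses are placeholders, not Cassandra's API.
public class StartupFlow
{
    interface CertAuthenticator
    {
        String roleFor(Certificate[] chain); // null if the chain is not trusted
    }

    static String onStartup(CertAuthenticator auth, Certificate[] clientChain)
    {
        if (clientChain != null)
        {
            // The certificate alone identifies the role, so the server can
            // answer READY without an AUTHENTICATE/AUTH_RESPONSE round trip.
            return auth.roleFor(clientChain) != null ? "READY" : "ERROR";
        }
        return "AUTHENTICATE"; // no client cert: fall back to the normal exchange
    }
}
{code}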

> Allow CQL client certificate authentication to work without sending an 
> AUTHENTICATE request
> ---
>
> Key: CASSANDRA-18857
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18857
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Feature/Encryption
>Reporter: Andy Tolbert
>Priority: Normal
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently when using {{MutualTlsAuthenticator}} or 
> {{MutualTlsWithPasswordFallbackAuthenticator}} a client is prompted with an 
> {{AUTHENTICATE}} message to which they must respond with an {{AUTH_RESPONSE}} 
> (e.g. a user name and password).  This shouldn't be needed as the role can be 
> identified using only the certificate.
> To address this, we could add the capability to authenticate early in 
> processing of a {{STARTUP}} message if we can determine that both the 
> configured authenticator supports certificate authentication and a client 
> certificate was provided.  If the certificate can be authenticated, a 
> {{READY}} response is returned, otherwise an {{ERROR}} is returned.
> This change can be done in a fully backwards-compatible way and requires 
> no protocol or driver changes; I will supply a patch shortly!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-19047) Guardrail for the number of tables is not working

2023-12-06 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793907#comment-17793907
 ] 

Ekaterina Dimitrova edited comment on CASSANDRA-19047 at 12/6/23 7:31 PM:
--

Apologies for the late response; I missed the latest notifications. 
{quote}Maybe we can simply remove the {{@Deprecated}} tag so 
{{YamlConfigurationLoader}} doesn't complain about it, and manually throw the 
deprecation warning if its value is anything different than the default and 
{{Integer#MAX_VALUE}}
{quote}
Only if the value is not set to MAX_VALUE, I think. If it is set to the 
default, I think people should be warned. If they want to stop emitting the 
deprecation warning, then they should set it to MAX_VALUE and/or move to 
guardrails. This is a way to prepare them for the future, as that property can 
be removed in the next major. We can add a section in NEWS.txt to explain. 
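
(A toy of that rule, with made-up names rather than the real config plumbing: 
warn on everything, including the default, unless the operator explicitly opts 
out with Integer.MAX_VALUE.)

{code:java}
public class DeprecationCheck
{
    // Emit the warning for any value except the MAX_VALUE opt-out,
    // including the default, per the reasoning above.
    static void maybeWarnDeprecated(String name, int value)
    {
        if (value != Integer.MAX_VALUE)
            System.err.println(name + " has been deprecated; see NEWS.txt");
    }
}
{code}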


was (Author: e.dimitrova):
Apologies for the late response; I missed the latest notifications. 
{quote}Maybe we can simply remove the {{@Deprecated}} tag so 
{{YamlConfigurationLoader}} doesn't complain about it, and manually throw the 
deprecation warning if its value is anything different than the default and 
{{Integer#MAX_VALUE}}
{quote}
Only if the value is not set to MAX_VALUE, I think. If it is set to the 
default, I think people should be warned. If they want to stop emitting 
it, then they should set it to MAX_VALUE and/or move to guardrails. This is a 
way to prepare them for the future, as that property can be removed in the next 
major. We can add a section in NEWS.txt to explain. 

> Guardrail for the number of tables is not working
> -
>
> Key: CASSANDRA-19047
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19047
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Mohammad Aburadeh
>Priority: Urgent
>
> Hi, 
> We installed Cassandra 4.1.3 and we got the following warning when creating 
> more than 150 tables: 
> {code:java}
> WARN  [Native-Transport-Requests-6] 2023-11-21 18:35:24,585 
> CreateTableStatement.java:421 - Cluster already contains 161 tables in 6 
> keyspaces. Having a large number of tables will significantly slow down 
> schema dependent cluster operations. {code}
> I tried to disable "table_count_warn_threshold" by setting its value to "-1" 
> but that did not work. 
> Then I tried to set the guardrail for the number of tables to "-1" to disable 
> the above, but that did not work either. It seems there is no way to disable 
> checking the number of tables. 
> Also, I tried to set "tables_warn_threshold" to a value less than 
> "tables_count_warn_threshold"; it seems Cassandra always uses 
> "tables_count_warn_threshold" when throwing the warning. 
> *Two issues in Cassandra 4.1.3:* 
> 1- There should be a way to disable this feature, either by setting the 
> guardrail parameter to -1 or by setting tables_count_warn_threshold to -1. 
> 2- The guardrail for the number of tables should override 
> tables_count_warn_threshold, because I always get the following warning when I 
> try to increase the number of tables: 
> {code:java}
> WARN  [main] 2023-11-21 18:26:16,988 YamlConfigurationLoader.java:427 - 
> [keyspace_count_warn_threshold, table_count_warn_threshold] parameters have 
> been deprecated. They have new names and/or value format; For more 
> information, please refer to NEWS.txt {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19047) Guardrail for the number of tables is not working

2023-12-06 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793907#comment-17793907
 ] 

Ekaterina Dimitrova commented on CASSANDRA-19047:
-

Apologies for the late response; I missed the latest notifications. 
{quote}Maybe we can simply remove the {{@Deprecated}} tag so 
{{YamlConfigurationLoader}} doesn't complain about it, and manually throw the 
deprecation warning if its value is anything different than the default and 
{{Integer#MAX_VALUE}}
{quote}
Only if the value is not set to MAX_VALUE, I think. If it is set to the 
default, I think people should be warned. If they want to stop emitting 
it, then they should set it to MAX_VALUE and/or move to guardrails. This is a 
way to prepare them for the future, as that property can be removed in the next 
major. We can add a section in NEWS.txt to explain. 

> Guardrail for the number of tables is not working
> -
>
> Key: CASSANDRA-19047
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19047
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Mohammad Aburadeh
>Priority: Urgent
>
> Hi, 
> We installed Cassandra 4.1.3 and we got the following warning when creating 
> more than 150 tables: 
> {code:java}
> WARN  [Native-Transport-Requests-6] 2023-11-21 18:35:24,585 
> CreateTableStatement.java:421 - Cluster already contains 161 tables in 6 
> keyspaces. Having a large number of tables will significantly slow down 
> schema dependent cluster operations. {code}
> I tried to disable "table_count_warn_threshold" by setting its value to "-1" 
> but that did not work. 
> Then I tried to set the guardrail for the number of tables to "-1" to disable 
> the above, but that did not work either. It seems there is no way to disable 
> checking the number of tables. 
> Also, I tried to set "tables_warn_threshold" to a value less than 
> "tables_count_warn_threshold"; it seems Cassandra always uses 
> "tables_count_warn_threshold" when throwing the warning. 
> *Two issues in Cassandra 4.1.3:* 
> 1- There should be a way to disable this feature, either by setting the 
> guardrail parameter to -1 or by setting tables_count_warn_threshold to -1. 
> 2- The guardrail for the number of tables should override 
> tables_count_warn_threshold, because I always get the following warning when I 
> try to increase the number of tables: 
> {code:java}
> WARN  [main] 2023-11-21 18:26:16,988 YamlConfigurationLoader.java:427 - 
> [keyspace_count_warn_threshold, table_count_warn_threshold] parameters have 
> been deprecated. They have new names and/or value format; For more 
> information, please refer to NEWS.txt {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19104) Standardize tablestats formatting and data units

2023-12-06 Thread Brad Schoening (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brad Schoening updated CASSANDRA-19104:
---
Description: 
Tablestats reports output in plaintext, JSON or YAML. The human readable output 
currently has a mix of KiB and bytes with inconsistent spacing.

Simplify and defaulting output to 'human readable'. Machine readable output is 
available as an option, and the current mixed output formatting is neither 
friendly for human nor machine reading and can be replaced.

!image-2023-11-27-13-49-14-247.png!

*Not a goal now (consider a follow up Jira):*

Fix inconsistencies with KiB/MiB/GiB and KB/MB/GB formatting:
 * gcstats - uses MB
 * getcompactionthroughput - uses MB/s
 * getstreamthroughput - uses MB/s
 * info - uses MiB/GiB

  was:
Tablestats reports output in plaintext, JSON or YAML. The human readable output 
currently has a mix of KiB and bytes with inconsistent spacing.

Considering simplifying and defaulting output to 'human readable'. Machine 
readable output is available as an option, and the current mixed output 
formatting is neither friendly for human nor machine reading.

!image-2023-11-27-13-49-14-247.png!

*Not a goal now (consider a follow up Jira):*

Fix inconsistencies with KiB/MiB/GiB and KB/MB/GB formatting:
 * gcstats - uses MB
 * getcompactionthroughput - uses MB/s
 * getstreamthroughput - uses MB/s
 * info - uses MiB/GiB


> Standardize tablestats formatting and data units
> 
>
> Key: CASSANDRA-19104
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19104
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/nodetool
>Reporter: Brad Schoening
>Assignee: Leo Toff
>Priority: Normal
>
> Tablestats reports output in plaintext, JSON or YAML. The human readable 
> output currently has a mix of KiB and bytes with inconsistent spacing.
> Simplify and defaulting output to 'human readable'. Machine readable output 
> is available as an option, and the current mixed output formatting is neither 
> friendly for human nor machine reading and can be replaced.
> !image-2023-11-27-13-49-14-247.png!
> *Not a goal now (consider a follow up Jira):*
> Fix inconsistencies with KiB/MiB/GiB and KB/MB/GB formatting:
>  * gcstats - uses MB
>  * getcompactionthroughput - uses MB/s
>  * getstreamthroughput - uses MB/s
>  * info - uses MiB/GiB



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19120) local consistencies may get timeout if blocking read repair is sending the read repair mutation to other DC

2023-12-06 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793903#comment-17793903
 ] 

Stefan Miklosovic commented on CASSANDRA-19120:
---

Feel free to fix that test I started, no problem with that. I had a strong 
suspicion that I did not do it right anyway... 
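
(For reference, a self-contained toy of the latch mismatch the report below 
walks through; an assumed simplification, not the actual 
BlockingPartitionRepair code:)

{code:java}
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ReadRepairLatchToy
{
    public static void main(String[] args) throws InterruptedException
    {
        // blockFor is sized by every repair mutation sent, including the
        // "extra" target that may land in the remote DC...
        List<String> targetDcs = List.of("dc1", "dc2");
        CountDownLatch blockFor = new CountDownLatch(targetDcs.size());

        // ...but only same-DC acks count down, so the dc2 target leaves
        // the latch short by one and the wait can only time out.
        for (String dc : targetDcs)
            if (dc.equals("dc1"))
                blockFor.countDown();

        System.out.println("completed=" + blockFor.await(1, TimeUnit.SECONDS)); // false
    }
}
{code}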

> local consistencies may get timeout if blocking read repair is sending the 
> read repair mutation to other DC 
> 
>
> Key: CASSANDRA-19120
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19120
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Runtian Liu
>Priority: Normal
> Attachments: image-2023-11-29-15-26-08-056.png, signature.asc
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> For a two-DC cluster setup, when a new node is being added to DC1, a 
> blocking read repair triggered by local_quorum in DC1 will need to send the 
> read repair mutation to an extra node (1)(2). The selector for read 
> repair may select *ANY* node that has not been contacted before (3) instead of 
> selecting DC1 nodes. If a node from DC2 is selected, this will cause a 100% 
> timeout because of the bug described below:
> When we initialize the latch (4) for blocking read repair, the shouldBlockOn 
> function only returns true for local nodes (5), and the blockFor value is 
> reduced if a local node doesn't require repair (6). The blockFor is the same 
> as the number of read repair mutations sent out. But when the coordinator node 
> receives the responses from the target nodes, the latch only counts down for 
> nodes in the same DC (7). The latch will wait until timeout and the read 
> request will time out.
> This can be reproduced with a constant load on a 3 + 3 cluster while adding a 
> node, given some way to trigger blocking read repair (maybe by adding load 
> using the stress tool). If you use local_quorum consistency with a constant 
> read-after-write load in the same DC where you are adding the node, you will 
> see read timeouts from time to time because of the bug described above.
>  
> I think that when read repair selects the extra node to repair, we should 
> prefer local nodes over nodes from another DC. Also, we need to fix the latch 
> part so that even if we send the mutation to nodes in another DC, we don't 
> get a timeout.
> (1)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/locator/ReplicaPlans.java#L455]
> (2)[https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/ConsistencyLevel.java#L183]
> (3)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/locator/ReplicaPlans.java#L458]
> (4)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/service/reads/repair/BlockingPartitionRepair.java#L96]
> (5)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/service/reads/repair/BlockingPartitionRepair.java#L71]
> (6)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/service/reads/repair/BlockingPartitionRepair.java#L88]
> (7)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/service/reads/repair/BlockingPartitionRepair.java#L113]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-19175) Test Failure: dtest.sslnodetonode_test.TestNodeToNodeSSLEncryption.test_ca_mismatch

2023-12-06 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793898#comment-17793898
 ] 

Ekaterina Dimitrova edited comment on CASSANDRA-19175 at 12/6/23 7:10 PM:
--

{quote} if the machine ran out of space
{quote}
I saw more flakies from this one, but I am not sure all of them happened when 
we ran out of space.

Examples from 4.1:

[https://ci-cassandra.apache.org/job/Cassandra-4.1/441/testReport/dtest.sslnodetonode_test/TestNodeToNodeSSLEncryption/test_ca_mismatch/]

[https://ci-cassandra.apache.org/job/Cassandra-4.1/438/testReport/dtest-offheap.sslnodetonode_test/TestNodeToNodeSSLEncryption/test_ca_mismatch/]

EDIT: I moved the ticket to 5.x and 4.1.x instead of 5.1-beta, as it is 
obviously not a regression, at least.

 


was (Author: e.dimitrova):
{quote} if the machine ran out of space
{quote}
I saw more flakies from this one, but I am not sure all of them happened when 
we ran out of space.

Examples from 4.1:

[https://ci-cassandra.apache.org/job/Cassandra-4.1/441/testReport/dtest.sslnodetonode_test/TestNodeToNodeSSLEncryption/test_ca_mismatch/]

[https://ci-cassandra.apache.org/job/Cassandra-4.1/438/testReport/dtest-offheap.sslnodetonode_test/TestNodeToNodeSSLEncryption/test_ca_mismatch/]

 

> Test Failure: 
> dtest.sslnodetonode_test.TestNodeToNodeSSLEncryption.test_ca_mismatch
> ---
>
> Key: CASSANDRA-19175
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19175
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.1.x, 5.x
>
>
> [https://ci-cassandra.apache.org/job/Cassandra-trunk/1782/testReport/dtest.sslnodetonode_test/TestNodeToNodeSSLEncryption/test_ca_mismatch/]
> h3.  
> {code:java}
> Error Message
> assert False
> Stacktrace
> self = <sslnodetonode_test.TestNodeToNodeSSLEncryption object at 0x7fca5921d050>
> 
>     def test_ca_mismatch(self):
>         """CA mismatch should cause nodes to fail to connect"""
>         credNode1 = sslkeygen.generate_credentials("127.0.0.1")
>         credNode2 = sslkeygen.generate_credentials("127.0.0.2")  # mismatching CA!
>         self.setup_nodes(credNode1, credNode2)
>         self.fixture_dtest_setup.allow_log_errors = True
>         self.cluster.start(no_wait=True)
>         found = self._grep_msg(self.node1, _LOG_ERR_HANDSHAKE)
>         self.cluster.stop()
> >       assert found
> E       assert False
> 
> sslnodetonode_test.py:115: AssertionError
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19175) Test Failure: dtest.sslnodetonode_test.TestNodeToNodeSSLEncryption.test_ca_mismatch

2023-12-06 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-19175:

Fix Version/s: 5.x
   (was: 5.1-beta)

> Test Failure: 
> dtest.sslnodetonode_test.TestNodeToNodeSSLEncryption.test_ca_mismatch
> ---
>
> Key: CASSANDRA-19175
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19175
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.1.x, 5.x
>
>
> [https://ci-cassandra.apache.org/job/Cassandra-trunk/1782/testReport/dtest.sslnodetonode_test/TestNodeToNodeSSLEncryption/test_ca_mismatch/]
> h3.  
> {code:java}
> Error Message
> assert False
> Stacktrace
> self = <sslnodetonode_test.TestNodeToNodeSSLEncryption object at 0x7fca5921d050>
> 
>     def test_ca_mismatch(self):
>         """CA mismatch should cause nodes to fail to connect"""
>         credNode1 = sslkeygen.generate_credentials("127.0.0.1")
>         credNode2 = sslkeygen.generate_credentials("127.0.0.2")  # mismatching CA!
>         self.setup_nodes(credNode1, credNode2)
>         self.fixture_dtest_setup.allow_log_errors = True
>         self.cluster.start(no_wait=True)
>         found = self._grep_msg(self.node1, _LOG_ERR_HANDSHAKE)
>         self.cluster.stop()
> >       assert found
> E       assert False
> 
> sslnodetonode_test.py:115: AssertionError
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19175) Test Failure: dtest.sslnodetonode_test.TestNodeToNodeSSLEncryption.test_ca_mismatch

2023-12-06 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793898#comment-17793898
 ] 

Ekaterina Dimitrova commented on CASSANDRA-19175:
-

{quote} if the machine ran out of space
{quote}
I saw more flakies from this one, but I am not sure all of them happened when 
we ran out of space.

Examples from 4.1:

[https://ci-cassandra.apache.org/job/Cassandra-4.1/441/testReport/dtest.sslnodetonode_test/TestNodeToNodeSSLEncryption/test_ca_mismatch/]

[https://ci-cassandra.apache.org/job/Cassandra-4.1/438/testReport/dtest-offheap.sslnodetonode_test/TestNodeToNodeSSLEncryption/test_ca_mismatch/]

 

> Test Failure: 
> dtest.sslnodetonode_test.TestNodeToNodeSSLEncryption.test_ca_mismatch
> ---
>
> Key: CASSANDRA-19175
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19175
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.1.x, 5.1-beta
>
>
> [https://ci-cassandra.apache.org/job/Cassandra-trunk/1782/testReport/dtest.sslnodetonode_test/TestNodeToNodeSSLEncryption/test_ca_mismatch/]
> h3.  
> {code:java}
> Error Message
> assert False
> Stacktrace
> self = <sslnodetonode_test.TestNodeToNodeSSLEncryption object at 0x7fca5921d050>
> 
>     def test_ca_mismatch(self):
>         """CA mismatch should cause nodes to fail to connect"""
>         credNode1 = sslkeygen.generate_credentials("127.0.0.1")
>         credNode2 = sslkeygen.generate_credentials("127.0.0.2")  # mismatching CA!
>         self.setup_nodes(credNode1, credNode2)
>         self.fixture_dtest_setup.allow_log_errors = True
>         self.cluster.start(no_wait=True)
>         found = self._grep_msg(self.node1, _LOG_ERR_HANDSHAKE)
>         self.cluster.stop()
> >       assert found
> E       assert False
> 
> sslnodetonode_test.py:115: AssertionError
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19175) Test Failure: dtest.sslnodetonode_test.TestNodeToNodeSSLEncryption.test_ca_mismatch

2023-12-06 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-19175:

Fix Version/s: 4.1.x

> Test Failure: 
> dtest.sslnodetonode_test.TestNodeToNodeSSLEncryption.test_ca_mismatch
> ---
>
> Key: CASSANDRA-19175
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19175
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.1.x, 5.1-beta
>
>
> [https://ci-cassandra.apache.org/job/Cassandra-trunk/1782/testReport/dtest.sslnodetonode_test/TestNodeToNodeSSLEncryption/test_ca_mismatch/]
> h3.  
> {code:java}
> Error Message
> assert False
> Stacktrace
> self = <sslnodetonode_test.TestNodeToNodeSSLEncryption object at 0x7fca5921d050>
> 
>     def test_ca_mismatch(self):
>         """CA mismatch should cause nodes to fail to connect"""
>         credNode1 = sslkeygen.generate_credentials("127.0.0.1")
>         credNode2 = sslkeygen.generate_credentials("127.0.0.2")  # mismatching CA!
>         self.setup_nodes(credNode1, credNode2)
>         self.fixture_dtest_setup.allow_log_errors = True
>         self.cluster.start(no_wait=True)
>         found = self._grep_msg(self.node1, _LOG_ERR_HANDSHAKE)
>         self.cluster.stop()
> >       assert found
> E       assert False
> 
> sslnodetonode_test.py:115: AssertionError
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-19104) Standardize tablestats formatting and data units

2023-12-06 Thread Brad Schoening (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793879#comment-17793879
 ] 

Brad Schoening edited comment on CASSANDRA-19104 at 12/6/23 6:47 PM:
-

Ok, I suppose for storage KiB might make sense as the lowest reportable units. 
Jacek's example is ambiguous, but I'd say with a leading zero before the 
decimal point, there should be three significant digits afterwards, so 0.000 
KiB (which is the maximum resolution available since bytes are unitary)

 
{noformat}
Example
Bytes repaired: 0.00 KiB
Bytes unrepaired: 4.31 TiB
Bytes pending repair: 0.000 KiB
{noformat}
From Wikipedia [Significant Figures|https://en.wikipedia.org/wiki/Significant_figures]:
{quote}[Leading zeros|https://en.wikipedia.org/wiki/Leading_zero]. For 
instance, 013 kg has two significant figures—1 and 3—while the leading zero is 
insignificant since it does not impact the mass indication; 013 kg is 
equivalent to 13 kg, rendering the zero unnecessary. Similarly, in the case of 
0.056 m, there are two insignificant leading zeros since 0.056 m is the same as 
56 mm, thus the leading zeros do not contribute to the length indication.
{quote}
For storage units in bytes, we should probably use 0.001 KiB (one byte), 
0.000 KiB (zero bytes), and 0.01 KiB (ten bytes).
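
A minimal sketch of that convention (a hypothetical helper, not the nodetool implementation): raw byte counts rendered as KiB with three digits after the decimal point.
{code:java}
import java.util.Locale;

public final class StorageUnits
{
    private StorageUnits()
    {
    }

    // Render a raw byte count as KiB with three digits after the decimal point.
    public static String toKib(long bytes)
    {
        return String.format(Locale.ROOT, "%.3f KiB", bytes / 1024.0);
    }

    public static void main(String[] args)
    {
        System.out.println(toKib(0));  // 0.000 KiB
        System.out.println(toKib(1));  // 0.001 KiB
        System.out.println(toKib(10)); // 0.010 KiB
    }
}
{code}
Under this scheme a single byte remains distinguishable from zero, which is the resolution point made above (note that 10 bytes render as 0.010 KiB rather than 0.01 KiB).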


was (Author: bschoeni):
Ok, I suppose for storage KiB might make sense as the lowest reportable units. 
Jacek's example is ambiguous, but I'd say with a leading zero before the 
decimal point, there should be three significant digits afterwards, so 0.000 
KiB (which is the maximum resolution available since bytes are unitary)

 
{noformat}
Example
Bytes repaired: 0.00 KiB
Bytes unrepaired: 4.31 TiB
Bytes pending repair: 0.000 KiB
{noformat}
From Wikipedia, [Significant 
Figures|https://en.wikipedia.org/wiki/Significant_figures]:
{quote}[Leading zeros|https://en.wikipedia.org/wiki/Leading_zero]. For 
instance, 013 kg has two significant figures—1 and 3—while the leading zero is 
insignificant since it does not impact the mass indication; 013 kg is 
equivalent to 13 kg, rendering the zero unnecessary. Similarly, in the case of 
0.056 m, there are two insignificant leading zeros since 0.056 m is the same as 
56 mm, thus the leading zeros do not contribute to the length indication.
{quote}
For storage and bytes, we should probably use 0.001 KiB (one byte) and 0.000 
KiB (zero bytes), 0.01 KiB (10 bytes)

> Standardize tablestats formatting and data units
> 
>
> Key: CASSANDRA-19104
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19104
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/nodetool
>Reporter: Brad Schoening
>Assignee: Leo Toff
>Priority: Normal
>
> Tablestats reports output in plaintext, JSON or YAML. The human-readable 
> output currently has a mix of KiB and bytes, with inconsistent spacing.
> Considering simplifying and defaulting output to 'human readable'. Machine-readable 
> output is available as an option, and the current mixed output 
> formatting is friendly for neither human nor machine reading.
> !image-2023-11-27-13-49-14-247.png!
> *Not a goal now (consider a follow-up Jira):*
> Fix inconsistencies with KiB/MiB/GiB and KB/MB/GB formatting:
>  * gcstats - uses MB
>  * getcompactionthroughput - uses MB/s
>  * getstreamthroughput - uses MB/s
>  * info - uses MiB/GiB



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-19104) Standardize tablestats formatting and data units

2023-12-06 Thread Brad Schoening (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793879#comment-17793879
 ] 

Brad Schoening edited comment on CASSANDRA-19104 at 12/6/23 6:46 PM:
-

Ok, I suppose for storage KiB might make sense as the lowest reportable unit. 
Jacek's example is ambiguous, but I'd say that with a leading zero before the 
decimal point, there should be three digits after it, so 0.000 
KiB (which is the maximum resolution available, since bytes are unitary).

 
{noformat}
Example
Bytes repaired: 0.00 KiB
Bytes unrepaired: 4.31 TiB
Bytes pending repair: 0.000 KiB
{noformat}
From Wikipedia, [Significant 
Figures|https://en.wikipedia.org/wiki/Significant_figures]:
{quote}[Leading zeros|https://en.wikipedia.org/wiki/Leading_zero]. For 
instance, 013 kg has two significant figures—1 and 3—while the leading zero is 
insignificant since it does not impact the mass indication; 013 kg is 
equivalent to 13 kg, rendering the zero unnecessary. Similarly, in the case of 
0.056 m, there are two insignificant leading zeros since 0.056 m is the same as 
56 mm, thus the leading zeros do not contribute to the length indication.
{quote}
For storage and bytes, we should probably use 0.001 KiB (one byte), 0.000 
KiB (zero bytes), and 0.01 KiB (ten bytes).


was (Author: bschoeni):
Ok, I suppose for storage KiB might make sense as the lowest reportable units. 
Jacek's example is ambiguous, but I'd say with a leading zero before the 
decimal point, there should be three significant digits afterwards, so 0.000 
KiB (which is the maximum resolution available since bytes are unitary)

 
{noformat}
Example
Bytes repaired: 0.00 KiB
Bytes unrepaired: 4.31 TiB
Bytes pending repair: 0.000 KiB
{noformat}
From Wikipedia, [Significant 
Figures|https://en.wikipedia.org/wiki/Significant_figures]:
{quote}[Leading zeros|https://en.wikipedia.org/wiki/Leading_zero]. For 
instance, 013 kg has two significant figures—1 and 3—while the leading zero is 
insignificant since it does not impact the mass indication; 013 kg is 
equivalent to 13 kg, rendering the zero unnecessary. Similarly, in the case of 
0.056 m, there are two insignificant leading zeros since 0.056 m is the same as 
56 mm, thus the leading zeros do not contribute to the length indication.
{quote}
 

> Standardize tablestats formatting and data units
> 
>
> Key: CASSANDRA-19104
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19104
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/nodetool
>Reporter: Brad Schoening
>Assignee: Leo Toff
>Priority: Normal
>
> Tablestats reports output in plaintext, JSON or YAML. The human-readable 
> output currently has a mix of KiB and bytes, with inconsistent spacing.
> Considering simplifying and defaulting output to 'human readable'. Machine-readable 
> output is available as an option, and the current mixed output 
> formatting is friendly for neither human nor machine reading.
> !image-2023-11-27-13-49-14-247.png!
> *Not a goal now (consider a follow-up Jira):*
> Fix inconsistencies with KiB/MiB/GiB and KB/MB/GB formatting:
>  * gcstats - uses MB
>  * getcompactionthroughput - uses MB/s
>  * getstreamthroughput - uses MB/s
>  * info - uses MiB/GiB



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19104) Standardize tablestats formatting and data units

2023-12-06 Thread Brad Schoening (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793879#comment-17793879
 ] 

Brad Schoening commented on CASSANDRA-19104:


Ok, I suppose for storage KiB might make sense as the lowest reportable unit. 
Jacek's example is ambiguous, but I'd say that with a leading zero before the 
decimal point, there should be three digits after it, so 0.000 KiB.

 
{noformat}
Example
Bytes repaired: 0.00 KiB
Bytes unrepaired: 4.31 TiB
Bytes pending repair: 0.000 KiB
{noformat}
From Wikipedia, [Significant 
Figures|https://en.wikipedia.org/wiki/Significant_figures]:
{quote}[Leading zeros|https://en.wikipedia.org/wiki/Leading_zero]. For 
instance, 013 kg has two significant figures—1 and 3—while the leading zero is 
insignificant since it does not impact the mass indication; 013 kg is 
equivalent to 13 kg, rendering the zero unnecessary. Similarly, in the case of 
0.056 m, there are two insignificant leading zeros since 0.056 m is the same as 
56 mm, thus the leading zeros do not contribute to the length indication.{quote}
 

> Standardize tablestats formatting and data units
> 
>
> Key: CASSANDRA-19104
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19104
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/nodetool
>Reporter: Brad Schoening
>Assignee: Leo Toff
>Priority: Normal
>
> Tablestats reports output in plaintext, JSON or YAML. The human-readable 
> output currently has a mix of KiB and bytes, with inconsistent spacing.
> Considering simplifying and defaulting output to 'human readable'. Machine-readable 
> output is available as an option, and the current mixed output 
> formatting is friendly for neither human nor machine reading.
> !image-2023-11-27-13-49-14-247.png!
> *Not a goal now (consider a follow-up Jira):*
> Fix inconsistencies with KiB/MiB/GiB and KB/MB/GB formatting:
>  * gcstats - uses MB
>  * getcompactionthroughput - uses MB/s
>  * getstreamthroughput - uses MB/s
>  * info - uses MiB/GiB



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-19104) Standardize tablestats formatting and data units

2023-12-06 Thread Brad Schoening (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793879#comment-17793879
 ] 

Brad Schoening edited comment on CASSANDRA-19104 at 12/6/23 6:42 PM:
-

Ok, I suppose for storage KiB might make sense as the lowest reportable unit. 
Jacek's example is ambiguous, but I'd say that with a leading zero before the 
decimal point, there should be three digits after it, so 0.000 
KiB (which is the maximum resolution available, since bytes are unitary).

 
{noformat}
Example
Bytes repaired: 0.00 KiB
Bytes unrepaired: 4.31 TiB
Bytes pending repair: 0.000 KiB
{noformat}
From Wikipedia, [Significant 
Figures|https://en.wikipedia.org/wiki/Significant_figures]:
{quote}[Leading zeros|https://en.wikipedia.org/wiki/Leading_zero]. For 
instance, 013 kg has two significant figures—1 and 3—while the leading zero is 
insignificant since it does not impact the mass indication; 013 kg is 
equivalent to 13 kg, rendering the zero unnecessary. Similarly, in the case of 
0.056 m, there are two insignificant leading zeros since 0.056 m is the same as 
56 mm, thus the leading zeros do not contribute to the length indication.
{quote}
 


was (Author: bschoeni):
Ok, I suppose for storage KiB might make sense as the lowest reportable units. 
Jacek's example is ambiguous, but I'd say with a leading zero before the 
decimal point, there should be three significant digits afterwards, so 0.000 KiB

 
{noformat}
Example
Bytes repaired: 0.00 KiB
Bytes unrepaired: 4.31 TiB
Bytes pending repair: 0.000 KiB
{noformat}
From Wikipedia, [Significant 
Figures|https://en.wikipedia.org/wiki/Significant_figures]:
{quote}[Leading zeros|https://en.wikipedia.org/wiki/Leading_zero]. For 
instance, 013 kg has two significant figures—1 and 3—while the leading zero is 
insignificant since it does not impact the mass indication; 013 kg is 
equivalent to 13 kg, rendering the zero unnecessary. Similarly, in the case of 
0.056 m, there are two insignificant leading zeros since 0.056 m is the same as 
56 mm, thus the leading zeros do not contribute to the length indication.{quote}
 

> Standardize tablestats formatting and data units
> 
>
> Key: CASSANDRA-19104
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19104
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/nodetool
>Reporter: Brad Schoening
>Assignee: Leo Toff
>Priority: Normal
>
> Tablestats reports output in plaintext, JSON or YAML. The human-readable 
> output currently has a mix of KiB and bytes, with inconsistent spacing.
> Considering simplifying and defaulting output to 'human readable'. Machine-readable 
> output is available as an option, and the current mixed output 
> formatting is friendly for neither human nor machine reading.
> !image-2023-11-27-13-49-14-247.png!
> *Not a goal now (consider a follow-up Jira):*
> Fix inconsistencies with KiB/MiB/GiB and KB/MB/GB formatting:
>  * gcstats - uses MB
>  * getcompactionthroughput - uses MB/s
>  * getstreamthroughput - uses MB/s
>  * info - uses MiB/GiB



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



Re: [PR] Cassandra 18852: Make bulk writer resilient to cluster resize events [cassandra-analytics]

2023-12-06 Thread via GitHub


arjunashok commented on code in PR #17:
URL: 
https://github.com/apache/cassandra-analytics/pull/17#discussion_r1417772899


##
cassandra-analytics-integration-framework/src/main/java/org/apache/cassandra/testing/CassandraIntegrationTest.java:
##
@@ -59,6 +59,13 @@
  */
 int numDcs() default 1;
 
+/**
+ * This is only applied in the context of multi-DC tests. Returns true if the
+ * keyspace is replicated across multiple DCs. Defaults to {@code true}.
+ * @return whether the multi-DC test uses a cross-DC keyspace
+ */
+boolean useCrossDcKeyspace() default true;

Review Comment:
   Agreed. We can revisit this when we refactor these helpers to make 
them less error-prone (minimize the number of params being passed). 
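
For illustration, a hypothetical usage of the annotation under review (the test name and body are assumptions, not code from this PR):

@CassandraIntegrationTest(numDcs = 2, useCrossDcKeyspace = false)
void joiningNodeWithSingleDcKeyspace()
{
    // runs against a two-DC cluster whose test keyspace replicates to one DC only
}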



##
cassandra-analytics-integration-tests/src/test/java/org/apache/cassandra/analytics/expansion/JoiningBaseTest.java:
##
@@ -0,0 +1,163 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.analytics.expansion;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.Set;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import java.util.function.BiConsumer;
+
+import com.google.common.util.concurrent.Uninterruptibles;
+
+import com.datastax.driver.core.ConsistencyLevel;
+import o.a.c.analytics.sidecar.shaded.testing.common.data.QualifiedTableName;
+import org.apache.cassandra.analytics.ResiliencyTestBase;
+import org.apache.cassandra.analytics.TestTokenSupplier;
+import org.apache.cassandra.distributed.UpgradeableCluster;
+import org.apache.cassandra.distributed.api.Feature;
+import org.apache.cassandra.distributed.api.IUpgradeableInstance;
+import org.apache.cassandra.distributed.api.TokenSupplier;
+import org.apache.cassandra.distributed.shared.ClusterUtils;
+import org.apache.cassandra.testing.CassandraIntegrationTest;
+import org.apache.cassandra.testing.ConfigurableCassandraTestContext;
+
+import static junit.framework.TestCase.assertNotNull;
+import static org.assertj.core.api.Assertions.assertThat;
+
+public class JoiningBaseTest extends ResiliencyTestBase

Review Comment:
   In these scenario-specific base classes, we're only grouping common 
functionality rather than defining a contract for subclasses to implement, so 
this seemed appropriate.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



Re: [PR] Cassandra 18852: Make bulk writer resilient to cluster resize events [cassandra-analytics]

2023-12-06 Thread via GitHub


arjunashok commented on code in PR #17:
URL: 
https://github.com/apache/cassandra-analytics/pull/17#discussion_r1417772294


##
cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/bulkwriter/RecordWriter.java:
##
@@ -132,37 +133,32 @@ public List 
write(Iterator> sourceI
 Map valueMap = new HashMap<>();
 try
 {
-List exclusions = 
failureHandler.getFailedInstances();
 Set> newRanges = 
initialTokenRangeMapping.getRangeMap().asMapOfRanges().entrySet()

.stream()
-   
.filter(e -> !exclusions.contains(e.getValue()))

.map(Map.Entry::getKey)

.collect(Collectors.toSet());
+Range tokenRange = getTokenRange(taskContext);
+Set> subRanges = newRanges.contains(tokenRange) ?
+   
Collections.singleton(tokenRange) :
+   
getIntersectingSubRanges(newRanges, tokenRange);
 
 while (dataIterator.hasNext())
 {
 Tuple2 rowData = dataIterator.next();
-streamSession = maybeCreateStreamSession(taskContext, 
streamSession, rowData, newRanges, failureHandler);
-
-sessions.add(streamSession);
+streamSession = maybeCreateStreamSession(taskContext, 
streamSession, rowData, subRanges, failureHandler, results);
 maybeCreateTableWriter(partitionId, baseDir);
 writeRow(rowData, valueMap, partitionId, 
streamSession.getTokenRange());
 checkBatchSize(streamSession, partitionId, job);
 }
 
-// Finalize SSTable for the last StreamSession
-if (sstableWriter != null || (streamSession != null && batchSize 
!= 0))
+// Cleanup SSTable writer and schedule the last stream

Review Comment:
   Makes sense.



##
cassandra-analytics-core/src/test/java/org/apache/cassandra/spark/bulkwriter/RecordWriterTest.java:
##
@@ -346,19 +366,22 @@ void writeBuffered()
 
 private void validateSuccessfulWrite(MockBulkWriterContext writerContext,
  Iterator> data,
- String[] columnNames)
+ String[] columnNames) throws 
InterruptedException
 {
 validateSuccessfulWrite(writerContext, data, columnNames, 
UPLOADED_TABLES);
 }
 
 private void validateSuccessfulWrite(MockBulkWriterContext writerContext,
  Iterator> data,
  String[] columnNames,
- int uploadedTables)
+ int uploadedTables) throws 
InterruptedException
 {
 RecordWriter rw = new RecordWriter(writerContext, columnNames, () -> 
tc, SSTableWriter::new);
 rw.write(data);
+// Wait for uploads to finish
+Thread.sleep(500);

Review Comment:
   In general, I agree that the sleep introduces flakiness. This was added 
because when the entire test suite was executed, we did see the uploads not 
finishing before we look up the number of files uploaded. We could 
potentially use a latch in the `MockBulkWriterContext` to make this more 
deterministic. Will explore some more.
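
One possible shape for that latch, sketched as a standalone helper (the class and method names here are assumptions, not code from this PR):

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Assumed shape, not the PR's actual code: the mock counts expected uploads
// and the test awaits them instead of sleeping.
final class UploadTracker
{
    private final CountDownLatch uploadsComplete;

    UploadTracker(int expectedUploads)
    {
        this.uploadsComplete = new CountDownLatch(expectedUploads);
    }

    void onUploadFinished()
    {
        uploadsComplete.countDown(); // invoked by the mock when an upload lands
    }

    boolean awaitUploads(long timeout, TimeUnit unit) throws InterruptedException
    {
        // Deterministic replacement for Thread.sleep(500) in the test
        return uploadsComplete.await(timeout, unit);
    }
}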



##
cassandra-analytics-integration-framework/src/main/java/org/apache/cassandra/sidecar/testing/IntegrationTestModule.java:
##
@@ -90,16 +88,25 @@ public InstanceMetadata instanceFromId(int id) throws 
NoSuchElementException
  * @return instance meta information
  * @throws NoSuchElementException when the instance for {@code host} 
does not exist
  */
+@Override
 public InstanceMetadata instanceFromHost(String host) throws 
NoSuchElementException
 {
-return cassandraTestContext.instancesConfig.instanceFromHost(host);
+return 
cassandraTestContext.instancesConfig().instanceFromHost(host);
 }
 }
 
 @Provides
 @Singleton
 public SidecarConfiguration sidecarConfiguration()
 {
-return new SidecarConfigurationImpl(new 
ServiceConfigurationImpl("127.0.0.1"));
+ServiceConfiguration conf = ServiceConfigurationImpl.builder()
+.host("0.0.0.0") 
// binds to all interfaces, potential security issue if left running for long

Review Comment:
   Will defer this one to @JeetKunDoug.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, 

[jira] [Updated] (CASSANDRA-19178) Cluster upgrade 3.x -> 4.x fails with no internode encryption

2023-12-06 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19178:
-
Resolution: Invalid
Status: Resolved  (was: Triage Needed)

I don't see any debug logs here; examining the one on the other side of the 
'Connection reset by peer' may reveal something.

bq. Any idea on how to further investigate the issue?

This Jira is for the development of Apache Cassandra and, as such, makes for a 
poor vehicle for support.  We recommend contacting the community via Slack or 
the mailing list instead: https://cassandra.apache.org/_/community.html  If in the end 
you discover a bug, then please come back and file an actionable report here.

> Cluster upgrade 3.x -> 4.x fails with no internode encryption
> -
>
> Key: CASSANDRA-19178
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19178
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Aldo
>Priority: Normal
> Attachments: cassandra7.downgrade.log, cassandra7.log
>
>
> I have a Docker swarm cluster with 3 distinct Cassandra services (named 
> {_}cassandra7{_}, {_}cassandra8{_}, {_}cassandra9{_}) running on 3 different 
> servers. The 3 services are running the version 3.11.16, using the official 
> Cassandra image 3.11.16 on Docker Hub. The first service is configured just 
> with the following environment variables
> {code:java}
> CASSANDRA_LISTEN_ADDRESS="tasks.cassandra7"
> CASSANDRA_SEEDS="tasks.cassandra7,tasks.cassandra9" {code}
> which in turn, at startup, modifies the {_}cassandra.yaml{_}. So for instance 
> the _cassandra.yaml_ for the first service contains the following (and the 
> rest is the image default):
> {code:java}
> # grep tasks /etc/cassandra/cassandra.yaml
>           - seeds: "tasks.cassandra7,tasks.cassandra9"
> listen_address: tasks.cassandra7
> broadcast_address: tasks.cassandra7
> broadcast_rpc_address: tasks.cassandra7 {code}
> Other services (8 and 9) have a similar configuration, obviously with a 
> different {{CASSANDRA_LISTEN_ADDRESS }}(\{{{}tasks.cassandra8}} and 
> {{{}tasks.cassandra9{}}}).
> The cluster is running smoothly and all the nodes are perfectly able to 
> rejoin the cluster whatever event occurs, thanks to the Docker Swarm 
> {{tasks.cassandraXXX}} "hostname": I can kill a Docker container waiting for 
> Docker swarm to restart it, force update it in order to force a restart, 
> scale to 0 and then 1 the service, restart an entire server, turn off and 
> then turn on all the 3 servers. Never found an issue on this.
> I also just completed a full upgrade of the cluster from version 2.2.8 to 
> 3.11.16 (simply upgrading the Docker official image associated with the 
> services) without issues. I was also able, thanks to a 2.2.8 snapshot on each 
> server, to perform a full downgrade to 2.2.8 and back to 3.11.16 again. I 
> finally issued a {{nodetool upgradesstables}} on all nodes, so my SSTables 
> have now the {{me-*}} prefix.
>  
> The problem I'm facing right now is the upgrade from 3.11.16 to 4.x. The 
> procedure that I follow is very simple:
>  # I start from the _cassandra7_ service (which is a seed node)
>  # {{nodetool drain}}
>  # Wait for the {{DRAINING ... DRAINED}} messages to appear in the log
>  # Upgrade the Docker image of _cassandra7_ to the official 4.1.3 version
> The procedure is exactly the same I followed for the upgrade 2.2.8 --> 
> 3.11.16, obviously with a different version at step 4. Unfortunately the 
> upgrade 3.x --> 4.x is not working, the _cassandra7_ service restarts and 
> attempts to communicate with the other seed node ({_}cassandra9{_}) but the 
> log of _cassandra7_ shows the following:
> {code:java}
> INFO  [Messaging-EventLoop-3-3] 2023-12-06 17:15:04,727 
> OutboundConnectionInitiator.java:390 - Failed to connect to peer 
> tasks.cassandra9/10.0.2.196:7000(tasks.cassandra9/10.0.2.196:7000)
> io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: 
> Connection reset by peer{code}
> The relevant part of the log, related to the missing internode communication, 
> is attached in _cassandra7.log_.
> In the log of _cassandra9_ there is nothing after the abovementioned step #4. 
> So only _cassandra7_ is saying something in the logs.
> I tried with multiple versions (4.0.11 but also 4.0.0) but the outcome is 
> always the same. Of course when I follow the steps 1..3, then restore the 3.x 
> snapshot and finally perform the step #4 using the official 3.11.16 version 
> the node 7 restarts correctly and joins the cluster. I attached the relevant 
> part of the log (see {_}cassandra7.downgrade.log{_}) where you can see that 
> node 7 and 9 can communicate.
> I suspect this could be related to the port 7000 now (with Cassandra 4.x) 
> supporting 

[jira] [Updated] (CASSANDRA-19178) Cluster upgrade 3.x -> 4.x fails with no internode encryption

2023-12-06 Thread Aldo (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aldo updated CASSANDRA-19178:
-
Description: 
I have a Docker swarm cluster with 3 distinct Cassandra services (named 
{_}cassandra7{_}, {_}cassandra8{_}, {_}cassandra9{_}) running on 3 different 
servers. The 3 services are running the version 3.11.16, using the official 
Cassandra image 3.11.16 on Docker Hub. The first service is configured just 
with the following environment variables
{code:java}
CASSANDRA_LISTEN_ADDRESS="tasks.cassandra7"
CASSANDRA_SEEDS="tasks.cassandra7,tasks.cassandra9" {code}
which in turn, at startup, modifies the {_}cassandra.yaml{_}. So for instance 
the _cassandra.yaml_ for the first service contains the following (and the rest 
is the image default):
{code:java}
# grep tasks /etc/cassandra/cassandra.yaml
          - seeds: "tasks.cassandra7,tasks.cassandra9"
listen_address: tasks.cassandra7
broadcast_address: tasks.cassandra7
broadcast_rpc_address: tasks.cassandra7 {code}
Other services (8 and 9) have a similar configuration, obviously with a 
different {{CASSANDRA_LISTEN_ADDRESS }}(\{{{}tasks.cassandra8}} and 
{{{}tasks.cassandra9{}}}).

The cluster is running smoothly and all the nodes are perfectly able to rejoin 
the cluster whatever event occurs, thanks to the Docker Swarm 
{{tasks.cassandraXXX}} "hostname": I can kill a Docker container waiting for 
Docker swarm to restart it, force update it in order to force a restart, scale 
to 0 and then 1 the service, restart an entire server, turn off and then turn 
on all the 3 servers. Never found an issue on this.

I also just completed a full upgrade of the cluster from version 2.2.8 to 
3.11.16 (simply upgrading the Docker official image associated with the 
services) without issues. I was also able, thanks to a 2.2.8 snapshot on each 
server, to perform a full downgrade to 2.2.8 and back to 3.11.16 again. I 
finally issued a {{nodetool upgradesstables}} on all nodes, so my SSTables have 
now the {{me-*}} prefix.

 

The problem I'm facing right now is the upgrade from 3.11.16 to 4.x. The 
procedure that I follow is very simple:
 # I start from the _cassandra7_ service (which is a seed node)
 # {{nodetool drain}}
 # Wait for the {{DRAINING ... DRAINED}} messages to appear in the log
 # Upgrade the Docker image of _cassandra7_ to the official 4.1.3 version

The procedure is exactly the same I followed for the upgrade 2.2.8 --> 3.11.16, 
obviously with a different version at step 4. Unfortunately the upgrade 3.x --> 
4.x is not working, the _cassandra7_ service restarts and attempts to 
communicate with the other seed node ({_}cassandra9{_}) but the log of 
_cassandra7_ shows the following:
{code:java}
INFO  [Messaging-EventLoop-3-3] 2023-12-06 17:15:04,727 
OutboundConnectionInitiator.java:390 - Failed to connect to peer 
tasks.cassandra9/10.0.2.196:7000(tasks.cassandra9/10.0.2.196:7000)
io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: 
Connection reset by peer{code}
The relevant part of the log, related to the missing internode communication, 
is attached in _cassandra7.log_.

In the log of _cassandra9_ there is nothing after the abovementioned step #4. 
So only _cassandra7_ is saying something in the logs.

I tried with multiple versions (4.0.11 but also 4.0.0) but the outcome is 
always the same. Of course when I follow the steps 1..3, then restore the 3.x 
snapshot and finally perform the step #4 using the official 3.11.16 version the 
node 7 restarts correctly and joins the cluster. I attached the relevant part 
of the log (see {_}cassandra7.downgrade.log{_}) where you can see that node 7 
and 9 can communicate.

I suspect this could be related to port 7000 now (with Cassandra 4.x) 
supporting both encrypted and unencrypted traffic. As stated previously, I'm 
using the untouched official Cassandra images, so my whole cluster, inside the 
Docker Swarm, is not (and has never been) configured with encryption.

I can also add the following: if I perform the 4 steps above for the 
_cassandra9_ and _cassandra8_ services as well, the cluster works in the end. But this 
is not acceptable, because the cluster is unavailable until I finish the full 
upgrade of all nodes: I need to perform a rolling update, one node after the 
other, where only 1 node is temporarily down and the other N-1 stay up.

Any idea on how to further investigate the issue? Thanks

 

  was:
I have a Docker swarm cluster with 3 distinct Cassandra services (named 
{_}cassandra7{_}, {_}cassandra8{_}, {_}cassandra9{_}) running on 3 different 
servers. The 3 services are running the version 3.11.16, using the official 
Cassandra image 3.11.16 on Docker Hub. The first service is configured just 
with the following environment variables
{code:java}
CASSANDRA_LISTEN_ADDRESS="tasks.cassandra7"
CASSANDRA_SEEDS="tasks.cassandra7,tasks.cassandra9" {code}
which in turn, at startup, modifies 

Re: [PR] CASSANDRA-18969: source files missing from sources jars due to maven … [cassandra-java-driver]

2023-12-06 Thread via GitHub


hhughes commented on PR #1900:
URL: 
https://github.com/apache/cassandra-java-driver/pull/1900#issuecomment-1843370589

   @michaelsembwever done


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-19178) Cluster upgrade 3.x -> 4.x fails with no internode encryption

2023-12-06 Thread Aldo (Jira)
Aldo created CASSANDRA-19178:


 Summary: Cluster upgrade 3.x -> 4.x fails with no internode 
encryption
 Key: CASSANDRA-19178
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19178
 Project: Cassandra
  Issue Type: Bug
  Components: Cluster/Gossip
Reporter: Aldo
 Attachments: cassandra7.downgrade.log, cassandra7.log

I have a Docker swarm cluster with 3 distinct Cassandra services (named 
{_}cassandra7{_}, {_}cassandra8{_}, {_}cassandra9{_}) running on 3 different 
servers. The 3 services are running the version 3.11.16, using the official 
Cassandra image 3.11.16 on Docker Hub. The first service is configured just 
with the following environment variables
{code:java}
CASSANDRA_LISTEN_ADDRESS="tasks.cassandra7"
CASSANDRA_SEEDS="tasks.cassandra7,tasks.cassandra9" {code}
which in turn, at startup, modifies the {_}cassandra.yaml{_}. So for instance 
the _cassandra.yaml_ for the first service contains the following (and the rest 
is the image default):
{code:java}
# grep tasks /etc/cassandra/cassandra.yaml
          - seeds: "tasks.cassandra7,tasks.cassandra9"
listen_address: tasks.cassandra7
broadcast_address: tasks.cassandra7
broadcast_rpc_address: tasks.cassandra7 {code}
Other services (8 and 9) have a similar configuration, obviously with a 
different {{CASSANDRA_LISTEN_ADDRESS }}({{{}tasks.cassandra8{}}} and 
{{{}tasks.cassandra9{}}}).

The cluster is running smoothly and all the nodes are perfectly able to rejoin 
the cluster whatever event occurs, thanks to the Docker Swarm 
{{tasks.cassandraXXX}} "hostname": I can kill a Docker container waiting for 
Docker swarm to restart it, force update it in order to force a restart, scale 
to 0 and then 1 the service, restart an entire server, turn off and then turn 
on all the 3 servers. Never found an issue on this.

I also just completed a full upgrade of the cluster from version 2.2.8 to 
3.11.16 (simply upgrading the Docker official image associated with the 
services) without issues. I was also able, thanks to a 2.2.8 snapshot on each 
server, to perform a full downgrade to 2.2.8 and back to 3.11.16 again. I 
finally issued a {{nodetool upgradesstables}} on all nodes, so my SSTables have 
now the {{me-*}} prefix.

 

The problem I'm facing right now is the upgrade from 3.11.16 to 4.x. The 
procedure that I follow is very simple:
 # I start from the _cassandra7_ service (which is a seed node)
 # {{nodetool drain}}
 # Wait for the {{DRAINING ... DRAINED}} messages to appear in the log
 # Upgrade the Docker image of _cassandra7_ to the official 4.1.3 version

The procedure is exactly the same I followed for the upgrade 2.2.8 --> 3.11.16, 
obviously with a different version at step 4. Unfortunately the upgrade 3.x --> 
4.x is not working, the _cassandra7_ service restarts and attempts to 
communicate with the other seed node ({_}cassandra9{_}) but the log of 
_cassandra7_ shows the following:
{code:java}
INFO  [Messaging-EventLoop-3-3] 2023-12-06 17:15:04,727 
OutboundConnectionInitiator.java:390 - Failed to connect to peer 
tasks.cassandra9/10.0.2.196:7000(tasks.cassandra9/10.0.2.196:7000)
io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: 
Connection reset by peer{code}
The relevant part of the log, related to the missing internode communication, 
is attached in _cassandra7.log_.

In the log of _cassandra9_ there is nothing after the abovementioned step #4. 
So only _cassandra7_ is saying something in the logs.

I tried with multiple versions (4.0.11 but also 4.0.0) but the outcome is 
always the same. Of course when I follow the steps 1..3, then restore the 3.x 
snapshot and finally perform the step #4 using the official 3.11.16 version the 
node 7 restarts correctly and joins the cluster. I attached the relevant part 
of the log (see {_}cassandra7.downgrade.log{_}) where you can see that node 7 
and 9 can communicate.

I suspect this could be related to port 7000 now (with Cassandra 4.x) 
supporting both encrypted and unencrypted traffic. As stated previously, I'm 
using the untouched official Cassandra images, so my whole cluster, inside the 
Docker Swarm, is not (and has never been) configured with encryption.

Any idea on how to further investigate the issue? Thanks



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



Re: [PR] Cassandra 18852: Make bulk writer resilient to cluster resize events [cassandra-analytics]

2023-12-06 Thread via GitHub


yifan-c commented on code in PR #17:
URL: 
https://github.com/apache/cassandra-analytics/pull/17#discussion_r1416591381


##
cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/bulkwriter/RecordWriter.java:
##
@@ -207,46 +203,62 @@ private Set 
instancesFromMapping(Map, List rowData,
-   Set> 
newRanges,
-   
ReplicaAwareFailureHandler failureHandler) throws IOException
+   Set> 
subRanges,
+   
ReplicaAwareFailureHandler failureHandler,
+   List results)
+throws IOException, ExecutionException, InterruptedException
 {
 BigInteger token = rowData._1().getToken();
 Range tokenRange = getTokenRange(taskContext);
 
 Preconditions.checkState(tokenRange.contains(token),
  String.format("Received Token %s outside of 
expected range %s", token, tokenRange));
 
-// token range for this partition is not among the write-replica-set 
ranges
-if (!newRanges.contains(tokenRange))
+// We have split ranges likely resulting from pending nodes
+// Evaluate creating a new session if the token from current row is 
part of a sub-range
+if (subRanges.size() > 1)
 {
-Set> subRanges = 
getIntersectingSubRanges(newRanges, tokenRange);
-// We have split ranges - likely resulting from pending nodes
-if (subRanges.size() > 1)
-{
-// Create session using sub-range that contains the token from 
current row
-Range matchingRange = subRanges.stream().filter(r 
-> r.contains(token)).findFirst().get();
-Preconditions.checkState(matchingRange != null,
- String.format("Received Token %s 
outside of expected range %s", token, matchingRange));
+// Create session using sub-range that contains the token from 
current row
+Range matchingSubRange = subRanges.stream().filter(r 
-> r.contains(token)).findFirst().get();
+Preconditions.checkState(matchingSubRange != null,
+ String.format("Received Token %s outside 
of expected range %s", token, matchingSubRange));

Review Comment:
   The `checkState` will never see `matchingSubRange == null`. The reason is 
that at line#222, if the value is absent, the `get()` operation already throws 
an exception. 
   If the intent is to provide a more user-friendly error message, could you avoid 
calling `get()`, use `Optional<Range<BigInteger>> matchingSubRangeOpt` to 
capture the result, and run `checkState` on the optional?
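
A sketch of that suggestion, as a fragment against the PR's surrounding context (it assumes Guava's Preconditions and Range plus the subRanges, token and tokenRange locals from RecordWriter):

// Keep the Optional instead of unwrapping it immediately, so an absent match
// surfaces as the intended IllegalStateException rather than a
// NoSuchElementException thrown by get().
Optional<Range<BigInteger>> matchingSubRangeOpt = subRanges.stream()
                                                           .filter(r -> r.contains(token))
                                                           .findFirst();
Preconditions.checkState(matchingSubRangeOpt.isPresent(),
                         "Received Token %s outside of expected range %s", token, tokenRange);
Range<BigInteger> matchingSubRange = matchingSubRangeOpt.get();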



##
cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/bulkwriter/RecordWriter.java:
##
@@ -132,37 +133,32 @@ public List 
write(Iterator> sourceI
 Map valueMap = new HashMap<>();
 try
 {
-List exclusions = 
failureHandler.getFailedInstances();
 Set> newRanges = 
initialTokenRangeMapping.getRangeMap().asMapOfRanges().entrySet()

.stream()
-   
.filter(e -> !exclusions.contains(e.getValue()))

.map(Map.Entry::getKey)

.collect(Collectors.toSet());
+Range tokenRange = getTokenRange(taskContext);
+Set> subRanges = newRanges.contains(tokenRange) ?
+   
Collections.singleton(tokenRange) :
+   
getIntersectingSubRanges(newRanges, tokenRange);
 
 while (dataIterator.hasNext())
 {
 Tuple2 rowData = dataIterator.next();
-streamSession = maybeCreateStreamSession(taskContext, 
streamSession, rowData, newRanges, failureHandler);
-
-sessions.add(streamSession);
+streamSession = maybeCreateStreamSession(taskContext, 
streamSession, rowData, subRanges, failureHandler, results);
 maybeCreateTableWriter(partitionId, baseDir);
 writeRow(rowData, valueMap, partitionId, 
streamSession.getTokenRange());
 checkBatchSize(streamSession, partitionId, job);
 }
 
-// Finalize SSTable for the last StreamSession
-if (sstableWriter != null || (streamSession != null && batchSize 
!= 0))
+// Cleanup SSTable writer and schedule the last stream

Review Comment:
   "Cleanup SSTable writer" reads wrong to me. I would stick with "Finalize". 
The code is to flush any data to sstable by closing the writer. Cleanup leads 
me to 

[jira] [Commented] (CASSANDRA-19166) StackOverflowError on ALTER after many previous schema changes

2023-12-06 Thread Abe Ratnofsky (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793852#comment-17793852
 ] 

Abe Ratnofsky commented on CASSANDRA-19166:
---

Thank you [~jlewandowski]!

> StackOverflowError on ALTER after many previous schema changes
> --
>
> Key: CASSANDRA-19166
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19166
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Abe Ratnofsky
>Assignee: Abe Ratnofsky
>Priority: Normal
> Fix For: 4.1.4, 5.0-rc
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Since 4.1, TableMetadataRefCache re-wraps its fields in 
> Collections.unmodifiableMap on every local schema update. This causes 
> TableMetadataRefCache's Map fields to reference chains of nested 
> UnmodifiableMaps. Eventually, this leads to a StackOverflowError on get(), 
> which has to traverse lots of these maps to fetch the actual value.
> https://github.com/apache/cassandra/blob/4059faf5b948c5a285c25fb0f2e4c4288ee7c305/src/java/org/apache/cassandra/schema/TableMetadataRefCache.java#L53
> The issue goes away on restart, since TableMetadataRefCache is reloaded from 
> disk.
> See CASSANDRA-17044, when TableMetadataRefCache was introduced. This issue 
> was discovered on a real test cluster where schema changes were failing, via 
> a heap dump.
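
Below is a minimal, illustrative reproduction of the wrapping pattern described above. It is not the actual TableMetadataRefCache code, and note that newer JDKs may collapse the wrappers (JDK-6323374), so the nesting behaves as described on the JDK 8/11 builds that 4.1 targets:
{code:java}
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Each simulated "schema change" wraps the previous map again instead of
// wrapping the underlying mutable map, so get() must traverse one more
// delegate per change and eventually blows the stack.
public class NestedUnmodifiableMapDemo
{
    public static void main(String[] args)
    {
        Map<String, String> refs = new HashMap<>();
        refs.put("ks.tbl", "ref");
        for (int i = 0; i < 1_000_000; i++)
            refs = Collections.unmodifiableMap(refs);
        refs.get("ks.tbl"); // StackOverflowError once the nesting is deep enough
    }
}
{code}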



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19166) StackOverflowError on ALTER after many previous schema changes

2023-12-06 Thread Jacek Lewandowski (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacek Lewandowski updated CASSANDRA-19166:
--
  Since Version: 4.1.0
Source Control Link: 
https://github.com/apache/cassandra/commit/a443990bfa64e239810876121f2877064f2d9ae8
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

Committed, thank you for the patch [~aratnofsky] !

> StackOverflowError on ALTER after many previous schema changes
> --
>
> Key: CASSANDRA-19166
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19166
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Abe Ratnofsky
>Assignee: Abe Ratnofsky
>Priority: Normal
> Fix For: 4.1.x, 5.0-rc
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Since 4.1, TableMetadataRefCache re-wraps its fields in 
> Collections.unmodifiableMap on every local schema update. This causes 
> TableMetadataRefCache's Map fields to reference chains of nested 
> UnmodifiableMaps. Eventually, this leads to a StackOverflowError on get(), 
> which has to traverse lots of these maps to fetch the actual value.
> https://github.com/apache/cassandra/blob/4059faf5b948c5a285c25fb0f2e4c4288ee7c305/src/java/org/apache/cassandra/schema/TableMetadataRefCache.java#L53
> The issue goes away on restart, since TableMetadataRefCache is reloaded from 
> disk.
> See CASSANDRA-17044, when TableMetadataRefCache was introduced. This issue 
> was discovered on a real test cluster where schema changes were failing, via 
> a heap dump.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19166) StackOverflowError on ALTER after many previous schema changes

2023-12-06 Thread Jacek Lewandowski (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacek Lewandowski updated CASSANDRA-19166:
--
Fix Version/s: 4.1.4
   (was: 4.1.x)

> StackOverflowError on ALTER after many previous schema changes
> --
>
> Key: CASSANDRA-19166
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19166
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Abe Ratnofsky
>Assignee: Abe Ratnofsky
>Priority: Normal
> Fix For: 4.1.4, 5.0-rc
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Since 4.1, TableMetadataRefCache re-wraps its fields in 
> Collections.unmodifiableMap on every local schema update. This causes 
> TableMetadataRefCache's Map fields to reference chains of nested 
> UnmodifiableMaps. Eventually, this leads to a StackOverflowError on get(), 
> which has to traverse lots of these maps to fetch the actual value.
> https://github.com/apache/cassandra/blob/4059faf5b948c5a285c25fb0f2e4c4288ee7c305/src/java/org/apache/cassandra/schema/TableMetadataRefCache.java#L53
> The issue goes away on restart, since TableMetadataRefCache is reloaded from 
> disk.
> See CASSANDRA-17044, when TableMetadataRefCache was introduced. This issue 
> was discovered on a real test cluster where schema changes were failing, via 
> a heap dump.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



(cassandra) branch cassandra-4.1 updated (13e5956285 -> a443990bfa)

2023-12-06 Thread jlewandowski
This is an automated email from the ASF dual-hosted git repository.

jlewandowski pushed a change to branch cassandra-4.1
in repository https://gitbox.apache.org/repos/asf/cassandra.git


from 13e5956285 Merge branch 'cassandra-4.0' into cassandra-4.1
 add a443990bfa Fix StackOverflowError on ALTER after many previous schema 
changes

No new revisions were added by this update.

Summary of changes:
 CHANGES.txt  |  1 +
 .../cassandra/schema/TableMetadataRefCache.java  | 20 +---
 2 files changed, 14 insertions(+), 7 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



(cassandra) branch cassandra-5.0 updated (fdfc5e614d -> 676f7ee751)

2023-12-06 Thread jlewandowski
This is an automated email from the ASF dual-hosted git repository.

jlewandowski pushed a change to branch cassandra-5.0
in repository https://gitbox.apache.org/repos/asf/cassandra.git


from fdfc5e614d Merge branch 'cassandra-4.1' into cassandra-5.0
 add a443990bfa Fix StackOverflowError on ALTER after many previous schema 
changes
 add 676f7ee751 Merge branch 'cassandra-4.1' into cassandra-5.0

No new revisions were added by this update.

Summary of changes:
 CHANGES.txt  |  2 ++
 .../cassandra/schema/TableMetadataRefCache.java  | 20 +---
 2 files changed, 15 insertions(+), 7 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



(cassandra) 01/01: Merge branch 'cassandra-5.0' into trunk

2023-12-06 Thread jlewandowski
This is an automated email from the ASF dual-hosted git repository.

jlewandowski pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git

commit ea1f9e4504cec1849a96c4b8eac962783662fcd8
Merge: ad86c9d201 676f7ee751
Author: Jacek Lewandowski 
AuthorDate: Wed Dec 6 18:05:49 2023 +0100

Merge branch 'cassandra-5.0' into trunk

* cassandra-5.0:
  Fix StackOverflowError on ALTER after many previous schema changes

 CHANGES.txt | 1 +
 1 file changed, 1 insertion(+)

diff --cc CHANGES.txt
index 6fcb72dcf1,c69b0e1234..60879f9d64
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -302,6 -291,6 +302,7 @@@ Merged from 3.0
  
  
  4.1.4
++ * Fix StackOverflowError on ALTER after many previous schema changes 
(CASSANDRA-19166)
  Merged from 4.0:
   * Fix NTS log message when an unrecognized strategy option is passed 
(CASSANDRA-18679)
   * Fix BulkLoader ignoring cipher suites options (CASSANDRA-18582)


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



(cassandra) branch trunk updated (ad86c9d201 -> ea1f9e4504)

2023-12-06 Thread jlewandowski
This is an automated email from the ASF dual-hosted git repository.

jlewandowski pushed a change to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git


from ad86c9d201 Merge branch 'cassandra-5.0' into trunk
 add a443990bfa Fix StackOverflowError on ALTER after many previous schema 
changes
 add 676f7ee751 Merge branch 'cassandra-4.1' into cassandra-5.0
 new ea1f9e4504 Merge branch 'cassandra-5.0' into trunk

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 CHANGES.txt | 1 +
 1 file changed, 1 insertion(+)


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19120) local consistencies may get timeout if blocking read repair is sending the read repair mutation to other DC

2023-12-06 Thread Runtian Liu (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793846#comment-17793846
 ] 

Runtian Liu commented on CASSANDRA-19120:
-

The test you added is not right, or at least it is not testing the bug we are 
discussing.

The pending node should be joining the first DC instead of the second DC (1), which 
causes one more remote node to be added to the repairs map (this part is 
good in your test (2)).

If the pending node is joining the first DC, the blockFor will be 3 instead of 
2. (3)

Since no node in repairs satisfies the if condition (4), the latch will be 
initialized with 3.

So the handler.waitingOn should be 3 instead of 2. (5)

Without any change to the latch part, we will run into the timeout error 
because the ack from node4 won't count the latch down.

(1) 
[https://github.com/instaclustr/cassandra/commit/853ced996d3637109bf1e183092f0bd9cbb180ca#diff-1ddca3571de225b02568519eada4b76eb136b84c4cc25f061d5c1f806f0fe145R332]

(2) 
[https://github.com/instaclustr/cassandra/commit/853ced996d3637109bf1e183092f0bd9cbb180ca#diff-1ddca3571de225b02568519eada4b76eb136b84c4cc25f061d5c1f806f0fe145R309-R312]

(3) 
[https://github.com/apache/cassandra/blob/cassandra-5.0/src/java/org/apache/cassandra/service/reads/repair/BlockingPartitionRepair.java#L83]

(4) 
[https://github.com/apache/cassandra/blob/cassandra-5.0/src/java/org/apache/cassandra/service/reads/repair/BlockingPartitionRepair.java#L90]

(5) 
[https://github.com/instaclustr/cassandra/commit/853ced996d3637109bf1e183092f0bd9cbb180ca#diff-1ddca3571de225b02568519eada4b76eb136b84c4cc25f061d5c1f806f0fe145R343]

 

> local consistencies may get timeout if blocking read repair is sending the 
> read repair mutation to other DC 
> 
>
> Key: CASSANDRA-19120
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19120
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Runtian Liu
>Priority: Normal
> Attachments: image-2023-11-29-15-26-08-056.png, signature.asc
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Consider a two-DC cluster setup. When a new node is being added to DC1, a 
> blocking read repair triggered by local_quorum in DC1 will need to 
> send the read repair mutation to an extra node (1)(2). The selector for read 
> repair may select *ANY* node that has not been contacted before (3) instead of 
> selecting the DC1 nodes. If a node from DC2 is selected, this will cause a 100% 
> timeout because of the bug described below:
> When we initialize the latch (4) for blocking read repair, the shouldBlockOn 
> function will only return true for local nodes (5), and the blockFor value will be 
> reduced if a local node doesn't require repair (6). The blockFor is the same as 
> the number of read repair mutations sent out. But when the coordinator node 
> receives the responses from the target nodes, the latch only counts down for 
> nodes in the same DC (7). The latch will wait until timeout and the read request 
> will time out.
> This can be reproduced if you have a constant load on a 3 + 3 cluster while 
> adding a node, with some way to trigger blocking read repair (maybe by 
> adding load using the stress tool). If you use local_quorum consistency with a 
> constant read-after-write load in the same DC where you are adding the node, you 
> will see read timeout issues from time to time because of the bug described 
> above.
>  
> I think for read repair, when selecting the extra node to do repair, we should 
> prefer local nodes over nodes from another DC. Also, we need to fix the 
> latch part so that even if we send the mutation to nodes in another DC, we don't get 
> a timeout.
> (1)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/locator/ReplicaPlans.java#L455]
> (2)[https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/ConsistencyLevel.java#L183]
> (3)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/locator/ReplicaPlans.java#L458]
> (4)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/service/reads/repair/BlockingPartitionRepair.java#L96]
> (5)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/service/reads/repair/BlockingPartitionRepair.java#L71]
> (6)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/service/reads/repair/BlockingPartitionRepair.java#L88]
> (7)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/service/reads/repair/BlockingPartitionRepair.java#L113]
>  
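
A minimal sketch of the latch mismatch described above, in plain Java (illustrative only, not Cassandra's BlockingPartitionRepair; the DC names and counts are assumptions matching the 3 + 3 scenario):
{code:java}
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// The latch is sized by the number of repair mutations sent (blockFor = 3),
// but countDown() fires only for acks from the local DC, mirroring the
// described shouldBlockOn()/ack filtering. The dc2 permit is never released.
public class ReadRepairLatchDemo
{
    public static void main(String[] args) throws InterruptedException
    {
        String localDc = "dc1";
        List<String> repairTargets = Arrays.asList("dc1", "dc1", "dc2");
        CountDownLatch latch = new CountDownLatch(repairTargets.size());

        for (String dc : repairTargets)
        {
            // Every target acknowledges, but only local-DC acks count down.
            if (dc.equals(localDc))
                latch.countDown();
        }

        boolean acked = latch.await(1, TimeUnit.SECONDS);
        System.out.println("repair blocked until timeout: " + !acked); // true
    }
}
{code}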



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: 

(cassandra-website) branch asf-staging updated (3c2f600d -> 606e216c)

2023-12-06 Thread git-site-role
This is an automated email from the ASF dual-hosted git repository.

git-site-role pushed a change to branch asf-staging
in repository https://gitbox.apache.org/repos/asf/cassandra-website.git


omit 3c2f600d generate docs for afd7ef66
 new 606e216c generate docs for afd7ef66

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (3c2f600d)
\
 N -- N -- N   refs/heads/asf-staging (606e216c)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 content/search-index.js |   2 +-
 site-ui/build/ui-bundle.zip | Bin 4883726 -> 4883726 bytes
 2 files changed, 1 insertion(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19009) CEP-15: (C*/Accord) Schema based fast path reconfiguration

2023-12-06 Thread Blake Eggleston (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-19009:

Status: Review In Progress  (was: Patch Available)

> CEP-15: (C*/Accord)  Schema based fast path reconfiguration
> ---
>
> Key: CASSANDRA-19009
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19009
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Accord
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 5.0.x
>
>
> This adds availability-aware Accord fast path reconfiguration, as well as 
> user-configurable fast path settings, which are set at the keyspace level and 
> (optionally) at the table level for increased granularity.
> The major parts are:
> *Add availability information to cluster metadata*
> Accord topology in C* is not stored in cluster metadata, but is meant to 
> calculated deterministically from cluster metadata state at a given epoch. 
> This adds the availability data, as well as the failure detector / gossip 
> listener and state change deduplication to CMS.
> *Move C* accord keys/topology from keyspace prefixes to tableid prefixes*
> To support per-table fast path settings, topologies and keys need to include 
> the table id. Since accord topologies could begin to consume a lot of memory 
> in clusters with a lot of nodes and tables, topology generation has been 
> updated to reuse previously allocated shards / shard parts where possible, 
> which will only increase heap sizes when things actually change.
> *Make fast path settings configurable via schema*
> There are 2.5 strategies: Simple, Parameterized, and InheritKeyspaceSettings. 
> Simple will use as many available nodes as possible for the fast path 
> electorate, this is the default for the keyspace fast path strategy. 
> Parameterized allows you to set a target size, and preferred datacenters for 
> the FP electorate. InheritKeyspaceSettings tells topology generation to just use the 
> keyspace fast path settings, and is the default for the table fast path 
> strategy.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19009) CEP-15: (C*/Accord) Schema based fast path reconfiguration

2023-12-06 Thread Blake Eggleston (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-19009:

Status: Ready to Commit  (was: Review In Progress)

> CEP-15: (C*/Accord)  Schema based fast path reconfiguration
> ---
>
> Key: CASSANDRA-19009
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19009
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Accord
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 5.0.x
>
>
> This adds availability aware accord fast path reconfiguration, as well as 
> user configurable fast path settings, which are set at the keyspace level and 
> (optionally) at the table level for increased granularity.
> The major parts are:
> *Add availability information to cluster metadata*
> Accord topology in C* is not stored in cluster metadata, but is meant to 
> be calculated deterministically from cluster metadata state at a given epoch. 
> This adds the availability data, as well as the failure detector / gossip 
> listener and state change deduplication to CMS.
> *Move C* accord keys/topology from keyspace prefixes to tableid prefixes*
> To support per-table fast path settings, topologies and keys need to include 
> the table id. Since accord topologies could begin to consume a lot of memory 
> in clusters with a lot of nodes and tables, topology generation has been 
> updated to reuse previously allocated shards / shard parts where possible, 
> which will only increase heap sizes when things actually change.
> *Make fast path settings configurable via schema*
> There are 2.5 strategies: Simple, Parameterized, and InheritKeyspaceSettings. 
> Simple will use as many available nodes as possible for the fast path 
> electorate, this is the default for the keyspace fast path strategy. 
> Parameterized allows you to set a target size, and preferred datacenters for 
> the FP electorate. InheritKeyspace tells topology generation to just use the 
> keyspace fast path settings, and is the default for the table fast path 
> strategy.
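As a rough illustration of the Parameterized strategy described above (invented names, not the actual Accord API): pick available nodes, preferring the configured datacenters, up to the target electorate size.

{code:java}
// Hedged sketch only -- Node and its accessors are invented for illustration.
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class ParameterizedFastPathSketch
{
    record Node(String id, String dc, boolean available) {}

    static List<Node> electorate(List<Node> replicas, Set<String> preferredDcs, int targetSize)
    {
        List<Node> chosen = new ArrayList<>();
        // First pass: available nodes in the preferred datacenters.
        for (Node n : replicas)
            if (chosen.size() < targetSize && n.available() && preferredDcs.contains(n.dc()))
                chosen.add(n);
        // Second pass: top up with any other available nodes.
        for (Node n : replicas)
            if (chosen.size() < targetSize && n.available() && !chosen.contains(n))
                chosen.add(n);
        return chosen;
    }
}
{code}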



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19175) Test Failure: dtest.sslnodetonode_test.TestNodeToNodeSSLEncryption.test_ca_mismatch

2023-12-06 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793835#comment-17793835
 ] 

Brandon Williams commented on CASSANDRA-19175:
--

I'm not able to reproduce this, and it makes sense that if the machine ran out 
of space a log message may be truncated, so this could have been environmental.

> Test Failure: 
> dtest.sslnodetonode_test.TestNodeToNodeSSLEncryption.test_ca_mismatch
> ---
>
> Key: CASSANDRA-19175
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19175
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 5.1-beta
>
>
> [https://ci-cassandra.apache.org/job/Cassandra-trunk/1782/testReport/dtest.sslnodetonode_test/TestNodeToNodeSSLEncryption/test_ca_mismatch/]
> h3.  
> {code:java}
> Error Message
> assert False
> Stacktrace
> self = <sslnodetonode_test.TestNodeToNodeSSLEncryption object at 0x7fca5921d050>
>
>     def test_ca_mismatch(self):
>         """CA mismatch should cause nodes to fail to connect"""
>         credNode1 = sslkeygen.generate_credentials("127.0.0.1")
>         credNode2 = sslkeygen.generate_credentials("127.0.0.2")  # mismatching CA!
>         self.setup_nodes(credNode1, credNode2)
>         self.fixture_dtest_setup.allow_log_errors = True
>         self.cluster.start(no_wait=True)
>         found = self._grep_msg(self.node1, _LOG_ERR_HANDSHAKE)
>         self.cluster.stop()
> >       assert found
> E       assert False
>
> sslnodetonode_test.py:115: AssertionError
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19084) Test Failure: IndexStreamingFailureTest.testAvailabilityAfterFailed*EntireFileStreaming

2023-12-06 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793829#comment-17793829
 ] 

Ekaterina Dimitrova commented on CASSANDRA-19084:
-

This seems to be a known issue from the original ticket where the test class was 
introduced. Check the comments in CASSANDRA-18670 and this run:

[https://app.circleci.com/pipelines/github/adelapena/cassandra/3045/workflows/02b94d07-ba00-457c-9d2c-c41a020bda01/jobs/61452/tests]

CC [~maedhroz] , [~adelapena] and [~mikea] 
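For context, the failures originate in the test's ByteBuddy hook (see the {{ByteBuddyHelper.installErrors}} frames in the traces below). The general shape of such a hook is sketched here with a placeholder target class and method, not the test's real ones:

{code:java}
// Sketch of a ByteBuddy error-injection hook; the target class and method
// are placeholders, not what IndexStreamingFailureTest actually redefines.
import net.bytebuddy.ByteBuddy;
import net.bytebuddy.dynamic.loading.ClassLoadingStrategy;
import net.bytebuddy.implementation.ExceptionMethod;
import static net.bytebuddy.matcher.ElementMatchers.named;

final class ErrorInjectionSketch
{
    static void installErrors(ClassLoader cl) throws ClassNotFoundException
    {
        new ByteBuddy()
            .redefine(cl.loadClass("com.example.Target"))   // placeholder class
            .method(named("doWork"))                        // placeholder method
            .intercept(ExceptionMethod.throwing(RuntimeException.class))
            .make()
            .load(cl, ClassLoadingStrategy.Default.INJECTION);
    }
}
{code}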

> Test Failure: 
> IndexStreamingFailureTest.testAvailabilityAfterFailed*EntireFileStreaming
> ---
>
> Key: CASSANDRA-19084
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19084
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/SAI
>Reporter: Michael Semb Wever
>Priority: Normal
> Fix For: 5.0-rc, 5.x
>
>
> Flakies 
> https://app.circleci.com/pipelines/github/adelapena/cassandra/3329/workflows/f2124edd-fa0e-4bc5-ab03-ddfb886bf015/jobs/93097/tests
> {noformat}
> java.lang.NullPointerException
>   at 
> java.base/sun.net.www.protocol.jar.JarURLConnection.connect(JarURLConnection.java:133)
>   at 
> java.base/sun.net.www.protocol.jar.JarURLConnection.getInputStream(JarURLConnection.java:155)
>   at java.base/java.net.URL.openStream(URL.java:1165)
>   at 
> java.base/java.lang.ClassLoader.getResourceAsStream(ClassLoader.java:1739)
>   at 
> net.bytebuddy.dynamic.ClassFileLocator$ForClassLoader.locate(ClassFileLocator.java:453)
>   at 
> net.bytebuddy.dynamic.ClassFileLocator$ForClassLoader.locate(ClassFileLocator.java:434)
>   at 
> net.bytebuddy.dynamic.scaffold.TypeWriter$Default$ForInlining.create(TypeWriter.java:4009)
>   at 
> net.bytebuddy.dynamic.scaffold.TypeWriter$Default.make(TypeWriter.java:2224)
>   at 
> net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase$UsingTypeWriter.make(DynamicType.java:4050)
>   at 
> net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase.make(DynamicType.java:3734)
>   at 
> net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase$Delegator.make(DynamicType.java:3986)
>   at 
> org.apache.cassandra.distributed.test.sai.IndexStreamingFailureTest$ByteBuddyHelper.installErrors(IndexStreamingFailureTest.java:154)
>   at 
> org.apache.cassandra.distributed.shared.AbstractBuilder$1.initialise(AbstractBuilder.java:360)
>   at 
> org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.newInstance(AbstractCluster.java:312)
>   at 
> org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.delegateForStartup(AbstractCluster.java:292)
>   at 
> org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.startup(AbstractCluster.java:410)
>   at 
> org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.startup(AbstractCluster.java:383)
>   at 
> org.apache.cassandra.distributed.test.sai.IndexStreamingFailureTest.testAvailabilityAfterStreaming(IndexStreamingFailureTest.java:123)
>   at 
> org.apache.cassandra.distributed.test.sai.IndexStreamingFailureTest.testAvailabilityAfterFailedNonEntireFileStreaming(IndexStreamingFailureTest.java:79)
> {noformat}
> {noformat}
> java.lang.IllegalStateException: Can't use shutdown instances, delegate is 
> null
>   at 
> org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.delegate(AbstractCluster.java:285)
>   at 
> org.apache.cassandra.distributed.impl.DelegatingInvokableInstance.transfer(DelegatingInvokableInstance.java:49)
>   at 
> org.apache.cassandra.distributed.api.IInvokableInstance.runsOnInstance(IInvokableInstance.java:45)
>   at 
> org.apache.cassandra.distributed.api.IInvokableInstance.runOnInstance(IInvokableInstance.java:46)
>   at 
> org.apache.cassandra.distributed.test.sai.IndexStreamingFailureTest.testAvailabilityAfterFailedEntireFileStreaming(IndexStreamingFailureTest.java:85)
> {noformat}
> https://ci-cassandra.apache.org/job/Cassandra-5.0/106/testReport/org.apache.cassandra.distributed.test.sai/IndexStreamingFailureTest/testAvailabilityAfterFailedNonEntireFileStreaming__jdk11_x86_64_novnode/
> {noformat}
> java.lang.RuntimeException: The class file could not be written
>   at 
> net.bytebuddy.dynamic.scaffold.TypeWriter$Default$ForInlining.create(TypeWriter.java:4021)
>   at 
> net.bytebuddy.dynamic.scaffold.TypeWriter$Default.make(TypeWriter.java:2224)
>   at 
> net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase$UsingTypeWriter.make(DynamicType.java:4050)
>   at 
> net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase.make(DynamicType.java:3734)
>   at 
> net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase$Delegator.make(DynamicType.java:3986)
>   at 
> 

[jira] [Updated] (CASSANDRA-19072) Test failure: org.apache.cassandra.distributed.test.log.FetchLogFromPeersTest.catchupCoordinatorAheadPlacementsReadTest-_jdk11

2023-12-06 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19072:

Test and Documentation Plan: CI summary attached
 Status: Patch Available  (was: In Progress)

The issue is that the test requires a specific node not to be a replica for a
particular key and then to become one following the removal of a peer. With 16
tokens per node, this node is already a replica at the outset and so the test
fails. We could select a different key, but the number of vnodes is set
externally to the test and so could change. The thing we're testing here is
unrelated to the number of tokens, so I've just fixed a couple of specific
tests to only run in a non-vnodes env. 
Patch 
[here|https://github.com/beobal/cassandra/commit/7d6b5a5fc0ee78114ef4cfced86229bf8c59019b]
CI summary and results attached
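A minimal sketch of the kind of guard meant by "only run in a non-vnodes env", using a hypothetical system property for the externally configured token count (the real dtest framework wires this differently):

{code:java}
// Sketch only: assumed property name, not the actual dtest plumbing.
import org.junit.Assume;
import org.junit.Test;

public class NonVnodeOnlySketch
{
    @Test
    public void catchupCoordinatorAheadPlacementsReadTest()
    {
        int tokensPerNode = Integer.getInteger("cassandra.dtest.num_tokens", 1); // hypothetical property
        Assume.assumeTrue("requires single-token nodes", tokensPerNode == 1);
        // ... the placement assertions only hold with one token per node ...
    }
}
{code}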

> Test failure: 
> org.apache.cassandra.distributed.test.log.FetchLogFromPeersTest.catchupCoordinatorAheadPlacementsReadTest-_jdk11
> --
>
> Key: CASSANDRA-19072
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19072
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Alex Petrov
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.1-alpha1
>
>
> CircleCI failure: 
> https://app.circleci.com/pipelines/github/michaelsembwever/cassandra/256/workflows/c4fda8f1-a8d6-4523-be83-5e30b9de39fe/jobs/20464/tests
> Also failing on 17: Circleci Failure: 
> https://app.circleci.com/pipelines/github/michaelsembwever/cassandra/256/workflows/c4fda8f1-a8d6-4523-be83-5e30b9de39fe/jobs/20500/tests
> {code}
> junit.framework.AssertionFailedError
>   at 
> org.apache.cassandra.distributed.test.log.FetchLogFromPeersTest.catchupCoordinatorAheadPlacementsReadTest(FetchLogFromPeersTest.java:217)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19072) Test failure: org.apache.cassandra.distributed.test.log.FetchLogFromPeersTest.catchupCoordinatorAheadPlacementsReadTest-_jdk11

2023-12-06 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19072:

Attachment: ci_summary.html
result_details.tar.gz

> Test failure: 
> org.apache.cassandra.distributed.test.log.FetchLogFromPeersTest.catchupCoordinatorAheadPlacementsReadTest-_jdk11
> --
>
> Key: CASSANDRA-19072
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19072
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Alex Petrov
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.1-alpha1
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>
> CircleCI failure: 
> https://app.circleci.com/pipelines/github/michaelsembwever/cassandra/256/workflows/c4fda8f1-a8d6-4523-be83-5e30b9de39fe/jobs/20464/tests
> Also failing on 17: Circleci Failure: 
> https://app.circleci.com/pipelines/github/michaelsembwever/cassandra/256/workflows/c4fda8f1-a8d6-4523-be83-5e30b9de39fe/jobs/20500/tests
> {code}
> junit.framework.AssertionFailedError
>   at 
> org.apache.cassandra.distributed.test.log.FetchLogFromPeersTest.catchupCoordinatorAheadPlacementsReadTest(FetchLogFromPeersTest.java:217)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19104) Standardize tablestats formatting and data units

2023-12-06 Thread Leo Toff (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793826#comment-17793826
 ] 

Leo Toff commented on CASSANDRA-19104:
--

Got it, keeping KiB/MiB/GiB for now. We should probably get feedback from the 
mailing list on this as well.

Regarding "compacted partitions", the spreadsheet is updated to use byte 
quantifiers.

Regarding "zero bytes", looks like "0.00 KiB" is the preferred format according 
to the discussion in the mailing list, so I'll go with that. However, I think 
"0.00 KiB" is ambiguous since it might be interpreted as "something below 0.005 
KiB", not necessarily "zero bytes". So my personal preference is "0 B" or "0 
bytes". I'll be using "0.00 KiB" until corrected.

> Standardize tablestats formatting and data units
> 
>
> Key: CASSANDRA-19104
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19104
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/nodetool
>Reporter: Brad Schoening
>Assignee: Leo Toff
>Priority: Normal
>
> Tablestats reports output in plaintext, JSON or YAML. The human readable 
> output currently has a mix of KiB and bytes with inconsistent spacing.
> Considering simplifying and defaulting output to 'human readable'. Machine 
> readable output is available as an option and the current mixed output 
> formatting is neither friendly for human nor machine reading.
> !image-2023-11-27-13-49-14-247.png!
> *Not a goal now (consider a follow up Jira):*
> Fix inconsistencies with KiB/MiB/GiB and KB/MB/GB formatting:
>  * gcstats - uses MB
>  * getcompactionthroughput - uses MB/s
>  * getstreamthroughput - uses MB/s
>  * info - uses MiB/GiB



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19084) Test Failure: IndexStreamingFailureTest.testAvailabilityAfterFailed*EntireFileStreaming

2023-12-06 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-19084:

 Bug Category: Parent values: Correctness(12982)Level 1 values: Test 
Failure(12990)
   Complexity: Normal
Discovered By: User Report
 Severity: Normal
   Status: Open  (was: Triage Needed)

> Test Failure: 
> IndexStreamingFailureTest.testAvailabilityAfterFailed*EntireFileStreaming
> ---
>
> Key: CASSANDRA-19084
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19084
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/SAI
>Reporter: Michael Semb Wever
>Priority: Normal
> Fix For: 5.0-rc, 5.x
>
>
> Flakies 
> https://app.circleci.com/pipelines/github/adelapena/cassandra/3329/workflows/f2124edd-fa0e-4bc5-ab03-ddfb886bf015/jobs/93097/tests
> {noformat}
> java.lang.NullPointerException
>   at 
> java.base/sun.net.www.protocol.jar.JarURLConnection.connect(JarURLConnection.java:133)
>   at 
> java.base/sun.net.www.protocol.jar.JarURLConnection.getInputStream(JarURLConnection.java:155)
>   at java.base/java.net.URL.openStream(URL.java:1165)
>   at 
> java.base/java.lang.ClassLoader.getResourceAsStream(ClassLoader.java:1739)
>   at 
> net.bytebuddy.dynamic.ClassFileLocator$ForClassLoader.locate(ClassFileLocator.java:453)
>   at 
> net.bytebuddy.dynamic.ClassFileLocator$ForClassLoader.locate(ClassFileLocator.java:434)
>   at 
> net.bytebuddy.dynamic.scaffold.TypeWriter$Default$ForInlining.create(TypeWriter.java:4009)
>   at 
> net.bytebuddy.dynamic.scaffold.TypeWriter$Default.make(TypeWriter.java:2224)
>   at 
> net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase$UsingTypeWriter.make(DynamicType.java:4050)
>   at 
> net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase.make(DynamicType.java:3734)
>   at 
> net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase$Delegator.make(DynamicType.java:3986)
>   at 
> org.apache.cassandra.distributed.test.sai.IndexStreamingFailureTest$ByteBuddyHelper.installErrors(IndexStreamingFailureTest.java:154)
>   at 
> org.apache.cassandra.distributed.shared.AbstractBuilder$1.initialise(AbstractBuilder.java:360)
>   at 
> org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.newInstance(AbstractCluster.java:312)
>   at 
> org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.delegateForStartup(AbstractCluster.java:292)
>   at 
> org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.startup(AbstractCluster.java:410)
>   at 
> org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.startup(AbstractCluster.java:383)
>   at 
> org.apache.cassandra.distributed.test.sai.IndexStreamingFailureTest.testAvailabilityAfterStreaming(IndexStreamingFailureTest.java:123)
>   at 
> org.apache.cassandra.distributed.test.sai.IndexStreamingFailureTest.testAvailabilityAfterFailedNonEntireFileStreaming(IndexStreamingFailureTest.java:79)
> {noformat}
> {noformat}
> java.lang.IllegalStateException: Can't use shutdown instances, delegate is 
> null
>   at 
> org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.delegate(AbstractCluster.java:285)
>   at 
> org.apache.cassandra.distributed.impl.DelegatingInvokableInstance.transfer(DelegatingInvokableInstance.java:49)
>   at 
> org.apache.cassandra.distributed.api.IInvokableInstance.runsOnInstance(IInvokableInstance.java:45)
>   at 
> org.apache.cassandra.distributed.api.IInvokableInstance.runOnInstance(IInvokableInstance.java:46)
>   at 
> org.apache.cassandra.distributed.test.sai.IndexStreamingFailureTest.testAvailabilityAfterFailedEntireFileStreaming(IndexStreamingFailureTest.java:85)
> {noformat}
> https://ci-cassandra.apache.org/job/Cassandra-5.0/106/testReport/org.apache.cassandra.distributed.test.sai/IndexStreamingFailureTest/testAvailabilityAfterFailedNonEntireFileStreaming__jdk11_x86_64_novnode/
> {noformat}
> java.lang.RuntimeException: The class file could not be written
>   at 
> net.bytebuddy.dynamic.scaffold.TypeWriter$Default$ForInlining.create(TypeWriter.java:4021)
>   at 
> net.bytebuddy.dynamic.scaffold.TypeWriter$Default.make(TypeWriter.java:2224)
>   at 
> net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase$UsingTypeWriter.make(DynamicType.java:4050)
>   at 
> net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase.make(DynamicType.java:3734)
>   at 
> net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase$Delegator.make(DynamicType.java:3986)
>   at 
> org.apache.cassandra.distributed.test.sai.IndexStreamingFailureTest$ByteBuddyHelper.installErrors(IndexStreamingFailureTest.java:155)
>   at 
> 

[jira] [Updated] (CASSANDRA-19104) Standardize tablestats formatting and data units

2023-12-06 Thread Leo Toff (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leo Toff updated CASSANDRA-19104:
-
Description: 
Tablestats reports output in plaintext, JSON or YAML. The human readable output 
currently has a mix of KiB and bytes with inconsistent spacing.

Considering simplifying and defaulting output to 'human readable'. Machine 
readable output is available as an option and the current mixed output 
formatting is neither friendly for human nor machine reading.

!image-2023-11-27-13-49-14-247.png!

*Not a goal now (consider a follow up Jira):*

Fix inconsistencies with KiB/MiB/GiB and KB/MB/GB formatting:
 * gcstats - uses MB
 * getcompactionthroughput - uses MB/s
 * getstreamthroughput - uses MB/s
 * info - uses MiB/GiB

  was:
Tablestats reports output in plaintext, JSON or YAML. The human readable output 
currently has a mix of KiB and bytes with inconsistent spacing.

Considering simplifying and defaulting output to 'human readable'. Machine 
readable output is available as an option and the current mixed output 
formatting is neither friendly for human nor machine reading.

!image-2023-11-27-13-49-14-247.png!


> Standardize tablestats formatting and data units
> 
>
> Key: CASSANDRA-19104
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19104
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/nodetool
>Reporter: Brad Schoening
>Assignee: Leo Toff
>Priority: Normal
>
> Tablestats reports output in plaintext, JSON or YAML. The human readable 
> output currently has a mix of KiB and bytes with inconsistent spacing.
> Considering simplifying and defaulting output to 'human readable'. Machine 
> readable output is available as an option and the current mixed output 
> formatting is neither friendly for human nor machine reading.
> !image-2023-11-27-13-49-14-247.png!
> *Not a goal now (consider a follow up Jira):*
> Fix inconsistencies with KiB/MiB/GiB and KB/MB/GB formatting:
>  * gcstats - uses MB
>  * getcompactionthroughput - uses MB/s
>  * getstreamthroughput - uses MB/s
>  * info - uses MiB/GiB



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19009) CEP-15: (C*/Accord) Schema based fast path reconfiguration

2023-12-06 Thread Alex Petrov (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793822#comment-17793822
 ] 

Alex Petrov commented on CASSANDRA-19009:
-

Sounds good. If you agree with minor suggestions above, please feel free to 
make the change on commit.

+1 otherwise!

> CEP-15: (C*/Accord)  Schema based fast path reconfiguration
> ---
>
> Key: CASSANDRA-19009
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19009
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Accord
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 5.0.x
>
>
> This adds availability aware accord fast path reconfiguration, as well as 
> user configurable fast path settings, which are set at the keyspace level and 
> (optionally) at the table level for increased granularity.
> The major parts are:
> *Add availability information to cluster metadata*
> Accord topology in C* is not stored in cluster metadata, but is meant to 
> be calculated deterministically from cluster metadata state at a given epoch. 
> This adds the availability data, as well as the failure detector / gossip 
> listener and state change deduplication to CMS.
> *Move C* accord keys/topology from keyspace prefixes to tableid prefixes*
> To support per-table fast path settings, topologies and keys need to include 
> the table id. Since accord topologies could begin to consume a lot of memory 
> in clusters with a lot of nodes and tables, topology generation has been 
> updated to reuse previously allocated shards / shard parts where possible, 
> which will only increase heap sizes when things actually change.
> *Make fast path settings configurable via schema*
> There are 2.5 strategies: Simple, Parameterized, and InheritKeyspaceSettings. 
> Simple will use as many available nodes as possible for the fast path 
> electorate, this is the default for the keyspace fast path strategy. 
> Parameterized allows you to set a target size, and preferred datacenters for 
> the FP electorate. InheritKeyspace tells topology generation to just use the 
> keyspace fast path settings, and is the default for the table fast path 
> strategy.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19169) Don't NPE when initializing CFSs for local system keyspaces with UCS

2023-12-06 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793815#comment-17793815
 ] 

Sam Tunnicliffe commented on CASSANDRA-19169:
-

Trivial patch 
[here|https://github.com/beobal/cassandra/commit/cd456f7e30f6128e67631503f0e71f4b99cc]

Circle CI runs for branch with CASSANDRA-19169, CASSANDRA-19102 & 
CASSANDRA-19171:  
[J11|https://app.circleci.com/pipelines/github/michaelsembwever/cassandra/274/workflows/6843aabd-4749-4cbf-94a5-ec3a546704e6],
 
[J17|https://app.circleci.com/pipelines/github/michaelsembwever/cassandra/274/workflows/440e2e39-038e-45ba-ab91-80335a497f21]
 
Aside from some strangeness with cqlshlib tests which appears completely 
unrelated, only failures are the ones tracked in CASSANDRA-19072, 
CASSANDRA-19058 & CASSANDRA-18360

> Don't NPE when initializing CFSs for local system keyspaces with UCS
> 
>
> Key: CASSANDRA-19169
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19169
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.1-alpha1
>
>
> When UnifiedCompactionStrategy is used as the default, NPEs are thrown when
> flushing the system keyspace tables early during startup. The system keyspace 
> is
> initialised before the cluster metadata, but UCS currently tries to access the
> current epoch when initialising the shard manager, to determine whether the
> local ranges are out of date. This isn't necessary for the system keyspaces as
> they use LocalStrategy and cover the whole token space.
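In sketch form (stubbed types, not the actual patch), the fix described above amounts to a guard like this:

{code:java}
// Stubbed-out sketch of the guard; the real change is in the UCS/TCM code.
final class ShardManagerInitSketch
{
    interface ReplicationStrategy {}
    static final class LocalStrategy implements ReplicationStrategy {}

    static boolean needsEpochCheck(ReplicationStrategy strategy)
    {
        // Locally replicated keyspaces always cover the whole token space, and
        // they are flushed before cluster metadata exists, so consulting the
        // current epoch for them is both unnecessary and unsafe at startup.
        return !(strategy instanceof LocalStrategy);
    }
}
{code}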



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-19009) CEP-15: (C*/Accord) Schema based fast path reconfiguration

2023-12-06 Thread Blake Eggleston (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793813#comment-17793813
 ] 

Blake Eggleston edited comment on CASSANDRA-19009 at 12/6/23 4:08 PM:
--

> I do not understand fully is the intention for {{maintenance}} scheduled task

The maintenance scheduled task is for rapid state changes. For instance, if a 
node transitioned to DOWN, then back to UP within the update interval, the 
UNAVAILABLE update would be executed, but the NORMAL update would be rejected. 
The maintenance task is just there so we retry in cases like that.
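As a rough sketch of that retry loop (invented names; only the retry shape is taken from the comment above):

{code:java}
// Invented names; illustrates the periodic re-submission of rejected updates.
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class FastPathMaintenanceSketch
{
    private final ScheduledExecutorService executor =
        Executors.newSingleThreadScheduledExecutor();

    void start(Runnable retryRejectedUpdates, long intervalSeconds)
    {
        // An update rejected because the node flapped DOWN -> UP within one
        // interval is simply re-submitted by the next maintenance run; the
        // task is a no-op when nothing was rejected.
        executor.scheduleWithFixedDelay(retryRejectedUpdates,
                                        intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
    }
}
{code}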


was (Author: bdeggleston):
> I do not understand fully is the intention for {{maintenance}} scheduled task

> CEP-15: (C*/Accord)  Schema based fast path reconfiguration
> ---
>
> Key: CASSANDRA-19009
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19009
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Accord
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 5.0.x
>
>
> This adds availability aware accord fast path reconfiguration, as well as 
> user configurable fast path settings, which are set at the keyspace level and 
> (optionally) at the table level for increased granularity.
> The major parts are:
> *Add availability information to cluster metadata*
> Accord topology in C* is not stored in cluster metadata, but is meant to 
> be calculated deterministically from cluster metadata state at a given epoch. 
> This adds the availability data, as well as the failure detector / gossip 
> listener and state change deduplication to CMS.
> *Move C* accord keys/topology from keyspace prefixes to tableid prefixes*
> To support per-table fast path settings, topologies and keys need to include 
> the table id. Since accord topologies could begin to consume a lot of memory 
> in clusters with a lot of nodes and tables, topology generation has been 
> updated to reuse previously allocated shards / shard parts where possible, 
> which will only increase heap sizes when things actually change.
> *Make fast path settings configurable via schema*
> There are 2.5 strategies: Simple, Parameterized, and InheritKeyspaceSettings. 
> Simple will use as many available nodes as possible for the fast path 
> electorate, this is the default for the keyspace fast path strategy. 
> Parameterized allows you to set a target size, and preferred datacenters for 
> the FP electorate. InheritKeyspace tells topology generation to just use the 
> keyspace fast path settings, and is the default for the table fast path 
> strategy.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-19169) Don't NPE when initializing CFSs for local system keyspaces with UCS

2023-12-06 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793402#comment-17793402
 ] 

Sam Tunnicliffe edited comment on CASSANDRA-19169 at 12/6/23 4:08 PM:
--

https://github.com/beobal/cassandra/commits/samt/19169

Edit: added commit to a branch batching together a few small fixes


was (Author: beobal):
[-https://github.com/beobal/cassandra/commits/samt/19169-]

> Don't NPE when initializing CFSs for local system keyspaces with UCS
> 
>
> Key: CASSANDRA-19169
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19169
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.1-alpha1
>
>
> When UnifiedCompactionStrategy is used as the default, NPEs are thrown when
> flushing the system keyspace tables early during startup. The system keyspace 
> is
> initialised before the cluster metadata, but UCS currently tries to access the
> current epoch when initialising the shard manager, to determine whether the
> local ranges are out of date. This isn't necessary for the system keyspaces as
> they use LocalStrategy and cover the whole token space.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-19169) Don't NPE when initializing CFSs for local system keyspaces with UCS

2023-12-06 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793402#comment-17793402
 ] 

Sam Tunnicliffe edited comment on CASSANDRA-19169 at 12/6/23 4:07 PM:
--

[-https://github.com/beobal/cassandra/commits/samt/19169-]


was (Author: beobal):
[https://github.com/beobal/cassandra/commits/samt/19169]

> Don't NPE when initializing CFSs for local system keyspaces with UCS
> 
>
> Key: CASSANDRA-19169
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19169
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.1-alpha1
>
>
> When UnifiedCompactionStrategy is used as the default, NPEs are thrown when
> flushing the system keyspace tables early during startup. The system keyspace 
> is
> initialised before the cluster metadata, but UCS currently tries to access the
> current epoch when initialising the shard manager, to determine whether the
> local ranges are out of date. This isn't necessary for the system keyspaces as
> they use LocalStrategy and cover the whole token space.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19009) CEP-15: (C*/Accord) Schema based fast path reconfiguration

2023-12-06 Thread Blake Eggleston (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793813#comment-17793813
 ] 

Blake Eggleston commented on CASSANDRA-19009:
-

> I do not understand fully is the intention for {{maintenance}} scheduled task

> CEP-15: (C*/Accord)  Schema based fast path reconfiguration
> ---
>
> Key: CASSANDRA-19009
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19009
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Accord
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 5.0.x
>
>
> This adds availability aware accord fast path reconfiguration, as well as 
> user configurable fast path settings, which are set at the keyspace level and 
> (optionally) at the table level for increased granularity.
> The major parts are:
> *Add availability information to cluster metadata*
> Accord topology in C* is not stored in cluster metadata, but is meant to 
> be calculated deterministically from cluster metadata state at a given epoch. 
> This adds the availability data, as well as the failure detector / gossip 
> listener and state change deduplication to CMS.
> *Move C* accord keys/topology from keyspace prefixes to tableid prefixes*
> To support per-table fast path settings, topologies and keys need to include 
> the table id. Since accord topologies could begin to consume a lot of memory 
> in clusters with a lot of nodes and tables, topology generation has been 
> updated to reuse previously allocated shards / shard parts where possible, 
> which will only increase heap sizes when things actually change.
> *Make fast path settings configurable via schema*
> There are 2.5 strategies: Simple, Parameterized, and InheritKeyspaceSettings. 
> Simple will use as many available nodes as possible for the fast path 
> electorate, this is the default for the keyspace fast path strategy. 
> Parameterized allows you to set a target size, and preferred datacenters for 
> the FP electorate. InheritKeyspace tells topology generation to just use the 
> keyspace fast path settings, and is the default for the table fast path 
> strategy.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19171) Test Failure: org.apache.cassandra.locator.PropertyFileSnitchTest.configContainsRemoteConfig-cdc_jdk17_x86_64

2023-12-06 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19171:

Test and Documentation Plan: CI details in comment
 Status: Patch Available  (was: In Progress)

Trivial patch 
[here|https://github.com/beobal/cassandra/commit/6cf2271f7b806b4aecfeada1eb0575f2c646f1fd]
 
Circle CI runs for branch with CASSANDRA-19169, CASSANDRA-19102 & 
CASSANDRA-19171:  
[J11|https://app.circleci.com/pipelines/github/michaelsembwever/cassandra/274/workflows/6843aabd-4749-4cbf-94a5-ec3a546704e6],
 
[J17|https://app.circleci.com/pipelines/github/michaelsembwever/cassandra/274/workflows/440e2e39-038e-45ba-ab91-80335a497f21]
 
Aside from some strangeness with cqlshlib tests which appears completely 
unrelated, only failures are the ones tracked in CASSANDRA-19072, 
CASSANDRA-19058 & CASSANDRA-18360

> Test Failure: 
> org.apache.cassandra.locator.PropertyFileSnitchTest.configContainsRemoteConfig-cdc_jdk17_x86_64
> -
>
> Key: CASSANDRA-19171
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19171
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.1-beta
>
>
> h3.  
> {code:java}
> Error Message
> Multiple entries with same key: 127.0.0.1:7012=OTHER_DC1:OTHER_RAC1 and 
> 127.0.0.1:7012=DC1:RAC1
> Stacktrace
> java.lang.IllegalArgumentException: Multiple entries with same key: 127.0.0.1:7012=OTHER_DC1:OTHER_RAC1 and 127.0.0.1:7012=DC1:RAC1
>     at com.google.common.collect.ImmutableMap.conflictException(ImmutableMap.java:378)
>     at com.google.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:372)
>     at com.google.common.collect.RegularImmutableMap.checkNoConflictInKeyBucket(RegularImmutableMap.java:246)
>     at com.google.common.collect.RegularImmutableMap.fromEntryArrayCheckingBucketOverflow(RegularImmutableMap.java:133)
>     at com.google.common.collect.RegularImmutableMap.fromEntryArray(RegularImmutableMap.java:95)
>     at com.google.common.collect.RegularImmutableMap.fromEntries(RegularImmutableMap.java:78)
>     at com.google.common.collect.ImmutableMap.of(ImmutableMap.java:139)
>     at org.apache.cassandra.locator.PropertyFileSnitchTest.configContainsRemoteConfig(PropertyFileSnitchTest.java:121)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {code}
>  
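The exception above is Guava's standard behaviour: {{ImmutableMap}} rejects duplicate keys at build time. A minimal reproduction (the test presumably ends up registering 127.0.0.1:7012 twice):

{code:java}
// Minimal reproduction of the duplicate-key failure seen in the test.
import com.google.common.collect.ImmutableMap;

final class DuplicateKeyDemo
{
    public static void main(String[] args)
    {
        // Throws java.lang.IllegalArgumentException: Multiple entries with same key
        ImmutableMap.of("127.0.0.1:7012", "OTHER_DC1:OTHER_RAC1",
                        "127.0.0.1:7012", "DC1:RAC1");
    }
}
{code}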



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19102) Test Failure: org.apache.cassandra.distributed.test.ReadRepairTest#readRepairRTRangeMovementTest

2023-12-06 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19102:

Test and Documentation Plan: CI details in comment
 Status: Patch Available  (was: In Progress)

Trivial patch 
[here|https://github.com/beobal/cassandra/commit/940ebc97a63a4ec4e207e348c4311aec07c2cd44]
 

Circle CI runs for branch with CASSANDRA-19169, CASSANDRA-19102 & 
CASSANDRA-19171:  
[J11|https://app.circleci.com/pipelines/github/michaelsembwever/cassandra/274/workflows/6843aabd-4749-4cbf-94a5-ec3a546704e6],
 
[J17|https://app.circleci.com/pipelines/github/michaelsembwever/cassandra/274/workflows/440e2e39-038e-45ba-ab91-80335a497f21]
 
Aside from some strangeness with cqlshlib tests which appears completely 
unrelated, only failures are the ones tracked in CASSANDRA-19072, 
CASSANDRA-19058 & CASSANDRA-18360


> Test Failure: 
> org.apache.cassandra.distributed.test.ReadRepairTest#readRepairRTRangeMovementTest
> 
>
> Key: CASSANDRA-19102
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19102
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/java
>Reporter: Jacek Lewandowski
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.1-beta
>
>
> {noformat}
> java.lang.AssertionError: Expected a different error message, but got 
> Operation failed - received 2 responses and 1 failures: INVALID_ROUTING from 
> /127.0.0.2:7012
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at 
> org.apache.cassandra.distributed.test.ReadRepairTest.readRepairRTRangeMovementTest(ReadRepairTest.java:424)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
>   at 
> com.intellij.rt.junit.IdeaTestRunner$Repeater$1.execute(IdeaTestRunner.java:38)
>   at 
> com.intellij.rt.execution.junit.TestsRepeater.repeat(TestsRepeater.java:11)
>   at 
> com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:35)
>   at 
> com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:232)
>   at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:55)
> {noformat}
> Manual testing in IntelliJ / trunk. Detected during investigation of test 
> failures of CASSANDRA-18464



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19177) SAI query timeouts can cause resource leaks

2023-12-06 Thread Mike Adamson (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Adamson updated CASSANDRA-19177:
-
 Bug Category: Parent values: Degradation(12984)
   Complexity: Normal
Discovered By: Code Inspection
 Severity: Normal
 Assignee: Mike Adamson
   Status: Open  (was: Triage Needed)

> SAI query timeouts can cause resource leaks
> ---
>
> Key: CASSANDRA-19177
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19177
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/SAI
>Reporter: Mike Adamson
>Assignee: Mike Adamson
>Priority: Normal
>
> There are several places in the SAI query path where a query timeout can 
> result in a resource not being closed correctly. We need to make sure that 
> wherever QueryContext.checkpoint is called we catch the resulting exception 
> and close any open resources.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-19177) SAI query timeouts can cause resource leaks

2023-12-06 Thread Mike Adamson (Jira)
Mike Adamson created CASSANDRA-19177:


 Summary: SAI query timeouts can cause resource leaks
 Key: CASSANDRA-19177
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19177
 Project: Cassandra
  Issue Type: Bug
  Components: Feature/SAI
Reporter: Mike Adamson


There are several places in the SAI query path where a query timeout can result 
in a resource not being closed correctly. We need to make sure that wherever 
QueryContext.checkpoint is called we catch the resulting exception and close 
any open resources.
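In sketch form, with assumed names, the pattern the ticket asks for looks like this: whatever is opened before a checkpoint must be closed if the checkpoint throws.

{code:java}
// Assumed names; QueryContext.checkpoint() is taken from the description above,
// the rest is an illustrative shape rather than the actual SAI patch.
final class CheckpointGuardSketch
{
    interface QueryContext { void checkpoint(); } // throws on timeout/abort

    static <T extends AutoCloseable> T guard(QueryContext ctx, T resource) throws Exception
    {
        try
        {
            ctx.checkpoint(); // may throw when the query has timed out
            return resource;
        }
        catch (RuntimeException e)
        {
            resource.close(); // close rather than leak the resource
            throw e;
        }
    }
}
{code}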



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-18824) Backport CASSANDRA-16418: Cleanup behaviour during node decommission caused missing replica

2023-12-06 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-18824:
-
Status: Open  (was: Patch Available)

> Backport CASSANDRA-16418: Cleanup behaviour during node decommission caused 
> missing replica
> ---
>
> Key: CASSANDRA-18824
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18824
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Bootstrap and Decommission
>Reporter: Szymon Miezal
>Assignee: Szymon Miezal
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x, 4.1.x, 5.0.x, 5.x
>
>
> Node decommission triggers data transfer to other nodes. While this transfer 
> is in progress,
> receiving nodes temporarily hold token ranges in a pending state. However, 
> the cleanup process currently doesn't consider these pending ranges when 
> calculating token ownership.
> As a consequence, data that is already stored in sstables gets inadvertently 
> cleaned up.
> STR:
>  * Create two node cluster
>  * Create keyspace with RF=1
>  * Insert sample data (assert data is available when querying both nodes)
>  * Start decommission process of node 1
>  * Start running cleanup in a loop on node 2 until decommission on node 1 
> finishes
> Verify all rows are in the cluster - it will fail as the previous step 
> removed some of the rows
> It seems that the cleanup process does not take into account the pending 
> ranges, it uses only the local ranges - 
> [https://github.com/apache/cassandra/blob/caad2f24f95b494d05c6b5d86a8d25fbee58d7c2/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L466].
> There are two solutions to the problem.
> One would be to change the cleanup process in a way that it starts taking 
> pending ranges into account. Even though it might sound tempting at first, it 
> will require involved changes and a lot of testing effort.
> Alternatively we could interrupt/prevent the cleanup process from running 
> when any pending range on a node is detected. That sounds like a reasonable 
> alternative to the problem and something that is relatively easy to implement.
> The bug has been already fixed in 4.x with CASSANDRA-16418, the goal of this 
> ticket is to backport it to 3.x.
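The second option above reduces to a guard of roughly this shape (assumed names; the 4.x fix in CASSANDRA-16418 is the authoritative version):

{code:java}
// Assumed names; sketches the "refuse cleanup while ranges are pending" option.
import java.util.Collection;

final class CleanupGuardSketch
{
    static void checkSafeToCleanup(String keyspace, Collection<?> pendingRanges)
    {
        // A pending range means streaming may still be landing data whose
        // ownership cleanup would not recognise; bail out instead of deleting.
        if (!pendingRanges.isEmpty())
            throw new IllegalStateException("Cannot run cleanup on " + keyspace +
                                            " while token ranges are pending");
    }
}
{code}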



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-18824) Backport CASSANDRA-16418: Cleanup behaviour during node decommission caused missing replica

2023-12-06 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-18824:
-
Status: Patch Available  (was: Needs Committer)

> Backport CASSANDRA-16418: Cleanup behaviour during node decommission caused 
> missing replica
> ---
>
> Key: CASSANDRA-18824
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18824
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Bootstrap and Decommission
>Reporter: Szymon Miezal
>Assignee: Szymon Miezal
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x, 4.1.x, 5.0.x, 5.x
>
>
> Node decommission triggers data transfer to other nodes. While this transfer 
> is in progress,
> receiving nodes temporarily hold token ranges in a pending state. However, 
> the cleanup process currently doesn't consider these pending ranges when 
> calculating token ownership.
> As a consequence, data that is already stored in sstables gets inadvertently 
> cleaned up.
> STR:
>  * Create two node cluster
>  * Create keyspace with RF=1
>  * Insert sample data (assert data is available when querying both nodes)
>  * Start decommission process of node 1
>  * Start running cleanup in a loop on node 2 until decommission on node 1 
> finishes
> Verify all rows are in the cluster - it will fail as the previous step 
> removed some of the rows
> It seems that the cleanup process does not take into account the pending 
> ranges, it uses only the local ranges - 
> [https://github.com/apache/cassandra/blob/caad2f24f95b494d05c6b5d86a8d25fbee58d7c2/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L466].
> There are two solutions to the problem.
> One would be to change the cleanup process in a way that it starts taking 
> pending ranges into account. Even though it might sound tempting at first, it 
> will require involved changes and a lot of testing effort.
> Alternatively we could interrupt/prevent the cleanup process from running 
> when any pending range on a node is detected. That sounds like a reasonable 
> alternative to the problem and something that is relatively easy to implement.
> The bug has been already fixed in 4.x with CASSANDRA-16418, the goal of this 
> ticket is to backport it to 3.x.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16418) Unsafe to run nodetool cleanup during bootstrap or decommission

2023-12-06 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793789#comment-17793789
 ] 

Brandon Williams commented on CASSANDRA-16418:
--

bq. Feel free to create a new ticket to add it back or piggyback in some other 
ticket, I'd be glad to review.

That would be CASSANDRA-18824

> Unsafe to run nodetool cleanup during bootstrap or decommission
> ---
>
> Key: CASSANDRA-16418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16418
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Bootstrap and Decommission
>Reporter: James Baker
>Assignee: Lindsey Zurovchak
>Priority: Normal
> Fix For: 4.0.8, 4.1.1, 5.0-alpha1, 5.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> What we expected: Running a cleanup is a safe operation; the result of 
> running a query after a cleanup should be the same as the result of running a 
> query before a cleanup.
> What actually happened: We ran a cleanup during a decommission. All the 
> streamed data was silently deleted, the bootstrap did not fail, the cluster's 
> data after the decommission was very different to the state before.
> Why: Cleanups do not take into account pending ranges and so the cleanup 
> thought that all the data that had just been streamed was redundant and so 
> deleted it. We think that this is symmetric with bootstraps, though have not 
> verified.
> Not sure if this is technically a bug but it was very surprising (and 
> seemingly undocumented) behaviour.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19118) Add support of vector type to COPY command

2023-12-06 Thread Jira


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andres de la Peña updated CASSANDRA-19118:
--
Reviewers: Andres de la Peña, Maxwell Guo, Stefan Miklosovic, Andres de la 
Peña  (was: Andres de la Peña, Maxwell Guo)
   Andres de la Peña, Maxwell Guo, Stefan Miklosovic, Andres de la 
Peña  (was: Andres de la Peña, Maxwell Guo, Stefan Miklosovic)
   Status: Review In Progress  (was: Patch Available)

> Add support of vector type to COPY command
> --
>
> Key: CASSANDRA-19118
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19118
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/cqlsh
>Reporter: Szymon Miezal
>Assignee: Szymon Miezal
>Priority: Normal
> Fix For: 5.0-rc, 5.x
>
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> Currently it's not possible to import rows with vector literals via {{COPY}} 
> command.
> STR:
>  * Create a table
> {code:sql}
> CREATE TABLE testcopyfrom (id text PRIMARY KEY, embedding_vector 
> VECTOR<FLOAT, 6>)
> {code}
>  * Prepare csv file with sample data, for instance:
> {code:sql}
> 1,"[0.1, 0.2, 0.3, 0.4, 0.5, 0.6]"
> 2,"[-0.1, -0.2, -0.3, -0.4, -0.5, -0.6]" {code}
>  * in cqlsh run
> {code:sql}
> COPY ks.testcopyfrom FROM data.csv
> {code}
> It will result in getting:
> {code:sql}
> TypeError: Received an argument of invalid type for column 
> "embedding_vector". Expected: , 
> Got: ; (required argument is not a float){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19084) Test Failure: IndexStreamingFailureTest.testAvailabilityAfterFailed*EntireFileStreaming

2023-12-06 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-19084:

Fix Version/s: 5.x

> Test Failure: 
> IndexStreamingFailureTest.testAvailabilityAfterFailed*EntireFileStreaming
> ---
>
> Key: CASSANDRA-19084
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19084
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/SAI
>Reporter: Michael Semb Wever
>Priority: Normal
> Fix For: 5.0-rc, 5.x
>
>
> Flakies 
> https://app.circleci.com/pipelines/github/adelapena/cassandra/3329/workflows/f2124edd-fa0e-4bc5-ab03-ddfb886bf015/jobs/93097/tests
> {noformat}
> java.lang.NullPointerException
>   at 
> java.base/sun.net.www.protocol.jar.JarURLConnection.connect(JarURLConnection.java:133)
>   at 
> java.base/sun.net.www.protocol.jar.JarURLConnection.getInputStream(JarURLConnection.java:155)
>   at java.base/java.net.URL.openStream(URL.java:1165)
>   at 
> java.base/java.lang.ClassLoader.getResourceAsStream(ClassLoader.java:1739)
>   at 
> net.bytebuddy.dynamic.ClassFileLocator$ForClassLoader.locate(ClassFileLocator.java:453)
>   at 
> net.bytebuddy.dynamic.ClassFileLocator$ForClassLoader.locate(ClassFileLocator.java:434)
>   at 
> net.bytebuddy.dynamic.scaffold.TypeWriter$Default$ForInlining.create(TypeWriter.java:4009)
>   at 
> net.bytebuddy.dynamic.scaffold.TypeWriter$Default.make(TypeWriter.java:2224)
>   at 
> net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase$UsingTypeWriter.make(DynamicType.java:4050)
>   at 
> net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase.make(DynamicType.java:3734)
>   at 
> net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase$Delegator.make(DynamicType.java:3986)
>   at 
> org.apache.cassandra.distributed.test.sai.IndexStreamingFailureTest$ByteBuddyHelper.installErrors(IndexStreamingFailureTest.java:154)
>   at 
> org.apache.cassandra.distributed.shared.AbstractBuilder$1.initialise(AbstractBuilder.java:360)
>   at 
> org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.newInstance(AbstractCluster.java:312)
>   at 
> org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.delegateForStartup(AbstractCluster.java:292)
>   at 
> org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.startup(AbstractCluster.java:410)
>   at 
> org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.startup(AbstractCluster.java:383)
>   at 
> org.apache.cassandra.distributed.test.sai.IndexStreamingFailureTest.testAvailabilityAfterStreaming(IndexStreamingFailureTest.java:123)
>   at 
> org.apache.cassandra.distributed.test.sai.IndexStreamingFailureTest.testAvailabilityAfterFailedNonEntireFileStreaming(IndexStreamingFailureTest.java:79)
> {noformat}
> {noformat}
> java.lang.IllegalStateException: Can't use shutdown instances, delegate is 
> null
>   at 
> org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.delegate(AbstractCluster.java:285)
>   at 
> org.apache.cassandra.distributed.impl.DelegatingInvokableInstance.transfer(DelegatingInvokableInstance.java:49)
>   at 
> org.apache.cassandra.distributed.api.IInvokableInstance.runsOnInstance(IInvokableInstance.java:45)
>   at 
> org.apache.cassandra.distributed.api.IInvokableInstance.runOnInstance(IInvokableInstance.java:46)
>   at 
> org.apache.cassandra.distributed.test.sai.IndexStreamingFailureTest.testAvailabilityAfterFailedEntireFileStreaming(IndexStreamingFailureTest.java:85)
> {noformat}
> https://ci-cassandra.apache.org/job/Cassandra-5.0/106/testReport/org.apache.cassandra.distributed.test.sai/IndexStreamingFailureTest/testAvailabilityAfterFailedNonEntireFileStreaming__jdk11_x86_64_novnode/
> {noformat}
> java.lang.RuntimeException: The class file could not be written
>   at 
> net.bytebuddy.dynamic.scaffold.TypeWriter$Default$ForInlining.create(TypeWriter.java:4021)
>   at 
> net.bytebuddy.dynamic.scaffold.TypeWriter$Default.make(TypeWriter.java:2224)
>   at 
> net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase$UsingTypeWriter.make(DynamicType.java:4050)
>   at 
> net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase.make(DynamicType.java:3734)
>   at 
> net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase$Delegator.make(DynamicType.java:3986)
>   at 
> org.apache.cassandra.distributed.test.sai.IndexStreamingFailureTest$ByteBuddyHelper.installErrors(IndexStreamingFailureTest.java:155)
>   at 
> org.apache.cassandra.distributed.shared.AbstractBuilder$1.initialise(AbstractBuilder.java:360)
>   at 
> org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.newInstance(AbstractCluster.java:312)
>   at 
> 

[jira] [Commented] (CASSANDRA-16418) Unsafe to run nodetool cleanup during bootstrap or decommission

2023-12-06 Thread Paulo Motta (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793739#comment-17793739
 ] 

Paulo Motta commented on CASSANDRA-16418:
-

bq. However, from the API POV CompactionManager.performCleanup can now be 
called anytime - I think that was an important precondition for the method - 
wouldn't it be good to keep it there, just changing the condition to check 
pending ranges rather than joining status?

Good point, this was overlooked during review - I suggested removing it just 
to clean up, but looking back I think there is value in keeping it for safety 
in case this API is used elsewhere. Feel free to create a new ticket to add it 
back, or piggyback on some other ticket; I'd be glad to review.

To me it would be nice if the CompactionManager API were a dumb local API, 
unaware of token ranges and membership status, since it is just a local 
operation; but in practice these concerns are mixed across the codebase, so 
developers expect that any local API is also safe from a distributed 
standpoint.
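
As a minimal sketch of the guard being discussed - with hypothetical names, 
not Cassandra's actual API - the precondition would reject cleanup while the 
node still has pending ranges, instead of only while it is joining:

{code:java}
import java.util.Set;

// Sketch only: RingView and its method are hypothetical stand-ins for
// illustration, not Cassandra's real classes.
final class CleanupGuard
{
    interface RingView
    {
        // Token ranges still being streamed to this node for the given
        // keyspace (non-empty during bootstrap, move, or a decommission).
        Set<String> pendingRangesFor(String keyspace);
    }

    // Refuse cleanup while pending ranges exist: cleanup would treat the
    // freshly streamed, not-yet-owned data as redundant and delete it.
    static void ensureSafeToCleanup(RingView ring, String keyspace)
    {
        if (!ring.pendingRangesFor(keyspace).isEmpty())
            throw new IllegalStateException("Cannot run cleanup on keyspace '"
                                            + keyspace + "' while it has pending ranges");
    }
}
{code}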

> Unsafe to run nodetool cleanup during bootstrap or decommission
> ---
>
> Key: CASSANDRA-16418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16418
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Bootstrap and Decommission
>Reporter: James Baker
>Assignee: Lindsey Zurovchak
>Priority: Normal
> Fix For: 4.0.8, 4.1.1, 5.0-alpha1, 5.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> What we expected: Running a cleanup is a safe operation; the result of 
> running a query after a cleanup should be the same as the result of running a 
> query before a cleanup.
> What actually happened: We ran a cleanup during a decommission. All the 
> streamed data was silently deleted, the decommission did not fail, and the 
> cluster's data after the decommission was very different from its state 
> before.
> Why: Cleanups do not take pending ranges into account, so the cleanup 
> considered all the data that had just been streamed redundant and deleted 
> it. We think this is symmetric for bootstraps, though we have not verified 
> that.
> Not sure if this is technically a bug, but it was very surprising (and 
> seemingly undocumented) behaviour.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



Re: [PR] CASSANDRA-18969: source files missing from sources jars due to maven … [cassandra-java-driver]

2023-12-06 Thread via GitHub


michaelsembwever commented on PR #1900:
URL: 
https://github.com/apache/cassandra-java-driver/pull/1900#issuecomment-1842960983

   Here's the commit (PR on top) to add the required checksums to the 
distribution* artefacts
   https://github.com/hhughes/java-driver/pull/1


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19142) logback-core-1.2.12.jar vulnerability: CVE-2023-6378

2023-12-06 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19142:
-
Reviewers: Stefan Miklosovic

> logback-core-1.2.12.jar vulnerability: CVE-2023-6378
> 
>
> Key: CASSANDRA-19142
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19142
> Project: Cassandra
>  Issue Type: Bug
>  Components: Dependencies
>Reporter: Brandon Williams
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 3.0.30, 3.11.17, 4.0.12, 4.1.4, 5.0-beta2
>
>
> https://nvd.nist.gov/vuln/detail/CVE-2023-6378
> {quote}
> A serialization vulnerability in logback receiver component part of logback 
> version 1.4.11 allows an attacker to mount a Denial-Of-Service attack by 
> sending poisoned data. 
> {quote}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19142) logback-core-1.2.12.jar vulnerability: CVE-2023-6378

2023-12-06 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19142:
-
  Fix Version/s: 3.0.30
 3.11.17
 4.0.12
 4.1.4
 5.0-beta2
 (was: 3.0.x)
 (was: 3.11.x)
 (was: 5.x)
 (was: 4.0.x)
 (was: 4.1.x)
 (was: 5.0-rc)
  Since Version: NA
Source Control Link: 
https://github.com/apache/cassandra/commit/a1421ec324e4bf8ab46df2a72af298f9286e0d59
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

Thanks for the review! Committed.

> logback-core-1.2.12.jar vulnerability: CVE-2023-6378
> 
>
> Key: CASSANDRA-19142
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19142
> Project: Cassandra
>  Issue Type: Bug
>  Components: Dependencies
>Reporter: Brandon Williams
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 3.0.30, 3.11.17, 4.0.12, 4.1.4, 5.0-beta2
>
>
> https://nvd.nist.gov/vuln/detail/CVE-2023-6378
> {quote}
> A serialization vulnerability in logback receiver component part of logback 
> version 1.4.11 allows an attacker to mount a Denial-Of-Service attack by 
> sending poisoned data. 
> {quote}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



(cassandra) branch cassandra-4.0 updated (e1b0b44f9e -> 8e5fc74c9a)

2023-12-06 Thread brandonwilliams
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a change to branch cassandra-4.0
in repository https://gitbox.apache.org/repos/asf/cassandra.git


from e1b0b44f9e Fix repeated tests on CircleCI and 
long-testsome/burn-testsome targets
 new a1421ec324 Suppress CVE-2023-6378
 new 2e3d7e76f5 Merge branch 'cassandra-3.0' into cassandra-3.11
 new 8e5fc74c9a Merge branch 'cassandra-3.11' into cassandra-4.0

The 3 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .build/dependency-check-suppressions.xml | 9 +
 CHANGES.txt  | 1 +
 2 files changed, 10 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



(cassandra) 01/01: Merge branch 'cassandra-3.11' into cassandra-4.0

2023-12-06 Thread brandonwilliams
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a commit to branch cassandra-4.0
in repository https://gitbox.apache.org/repos/asf/cassandra.git

commit 8e5fc74c9a3d734bfded9bde3fff399d4b67d65a
Merge: e1b0b44f9e 2e3d7e76f5
Author: Brandon Williams 
AuthorDate: Wed Dec 6 06:32:32 2023 -0600

Merge branch 'cassandra-3.11' into cassandra-4.0

 .build/dependency-check-suppressions.xml | 9 +
 CHANGES.txt  | 1 +
 2 files changed, 10 insertions(+)

diff --cc .build/dependency-check-suppressions.xml
index d806926aaf,774e2e7886..0c32a06b17
--- a/.build/dependency-check-suppressions.xml
+++ b/.build/dependency-check-suppressions.xml
@@@ -62,6 -96,17 +62,15 @@@
        <cve>CVE-2022-42003</cve>
        <cve>CVE-2022-42004</cve>
        <cve>CVE-2023-35116</cve>
 -        <cve>CVE-2022-42003</cve>
 -        <cve>CVE-2022-42004</cve>
    </suppress>
  
+ 
+   <suppress>
+       <packageUrl regex="true">^pkg:maven/ch\.qos\.logback/logback\-core@.*$</packageUrl>
+       <cve>CVE-2023-6378</cve>
+   </suppress>
+   <suppress>
+       <packageUrl regex="true">^pkg:maven/ch\.qos\.logback/logback\-classic@.*$</packageUrl>
+       <cve>CVE-2023-6378</cve>
+   </suppress>
  </suppressions>
diff --cc CHANGES.txt
index f79af3a59b,96e34db044..771cf1f3c0
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -20,8 -2,8 +20,9 @@@ Merged from 3.11
   * Fix delayed SSTable release with unsafe_aggressive_sstable_expiration 
(CASSANDRA-18756)
   * Revert CASSANDRA-18543 (CASSANDRA-18854)
   * Fix NPE when using udfContext in UDF after a restart of a node 
(CASSANDRA-18739)
 + * Moved jflex from runtime to build dependencies (CASSANDRA-18664)
  Merged from 3.0:
+  * Suppress CVE-2023-6378 (CASSANDRA-19142) 
   * Do not set RPC_READY to false on transports shutdown in order to not fail 
counter updates for deployments with coordinator and storage nodes with 
transports turned off (CASSANDRA-18935)
   * Suppress CVE-2023-44487 (CASSANDRA-18943)
   * Fix nodetool enable/disablebinary to correctly set rpc readiness in gossip 
(CASSANDRA-18935)


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



(cassandra) 01/01: Merge branch 'cassandra-4.1' into cassandra-5.0

2023-12-06 Thread brandonwilliams
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a commit to branch cassandra-5.0
in repository https://gitbox.apache.org/repos/asf/cassandra.git

commit fdfc5e614d6d7e3e84f0200870d6ac34917f601d
Merge: fe7997884d 13e5956285
Author: Brandon Williams 
AuthorDate: Wed Dec 6 06:33:36 2023 -0600

Merge branch 'cassandra-4.1' into cassandra-5.0

 .build/dependency-check-suppressions.xml | 10 ++
 CHANGES.txt  |  5 +
 2 files changed, 15 insertions(+)

diff --cc CHANGES.txt
index 9b4900eb03,79e8ee7a84..2602ec4b23
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -64,10 -19,17 +64,15 @@@ Merged from 4.0
   * Improve performance of compactions when table does not have an index 
(CASSANDRA-18773)
   * JMH improvements - faster build and async profiler (CASSANDRA-18871)
   * Enable 3rd party JDK installations for Debian package (CASSANDRA-18844)
 - * Fix NTS log message when an unrecognized strategy option is passed 
(CASSANDRA-18679)
 - * Fix BulkLoader ignoring cipher suites options (CASSANDRA-18582)
 - * Migrate Python optparse to argparse (CASSANDRA-17914)
  Merged from 3.11:
 - * Fix delayed SSTable release with unsafe_aggressive_sstable_expiration 
(CASSANDRA-18756)
 - * Revert CASSANDRA-18543 (CASSANDRA-18854)
 - * Fix NPE when using udfContext in UDF after a restart of a node 
(CASSANDRA-18739)
  Merged from 3.0:
++<<<<<<< HEAD
++=======
+  * Suppress CVE-2023-6378 (CASSANDRA-19142) 
+  * Do not set RPC_READY to false on transports shutdown in order to not fail 
counter updates for deployments with coordinator and storage nodes with 
transports turned off (CASSANDRA-18935)
++>>>>>>> cassandra-4.1
   * Suppress CVE-2023-44487 (CASSANDRA-18943)
 + * Implement the logic in bin/stop-server (CASSANDRA-18838)
   * Fix nodetool enable/disablebinary to correctly set rpc readiness in gossip 
(CASSANDRA-18935)
   * Implement the logic in bin/stop-server (CASSANDRA-18838) 
   * Upgrade snappy-java to 1.1.10.4 (CASSANDRA-18878)


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



(cassandra) 01/01: Merge branch 'cassandra-4.0' into cassandra-4.1

2023-12-06 Thread brandonwilliams
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a commit to branch cassandra-4.1
in repository https://gitbox.apache.org/repos/asf/cassandra.git

commit 13e595628548e7cdf06dd666a3d839af8fad6655
Merge: 4059faf5b9 8e5fc74c9a
Author: Brandon Williams 
AuthorDate: Wed Dec 6 06:32:48 2023 -0600

Merge branch 'cassandra-4.0' into cassandra-4.1

 .build/dependency-check-suppressions.xml | 9 +
 CHANGES.txt  | 1 +
 2 files changed, 10 insertions(+)

diff --cc CHANGES.txt
index 10096d23f2,771cf1f3c0..79e8ee7a84
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -26,7 -20,9 +26,8 @@@ Merged from 3.11
   * Fix delayed SSTable release with unsafe_aggressive_sstable_expiration 
(CASSANDRA-18756)
   * Revert CASSANDRA-18543 (CASSANDRA-18854)
   * Fix NPE when using udfContext in UDF after a restart of a node 
(CASSANDRA-18739)
 - * Moved jflex from runtime to build dependencies (CASSANDRA-18664)
  Merged from 3.0:
+  * Suppress CVE-2023-6378 (CASSANDRA-19142) 
   * Do not set RPC_READY to false on transports shutdown in order to not fail 
counter updates for deployments with coordinator and storage nodes with 
transports turned off (CASSANDRA-18935)
   * Suppress CVE-2023-44487 (CASSANDRA-18943)
   * Fix nodetool enable/disablebinary to correctly set rpc readiness in gossip 
(CASSANDRA-18935)


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



(cassandra) branch cassandra-3.11 updated (6d7cd61412 -> 2e3d7e76f5)

2023-12-06 Thread brandonwilliams
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a change to branch cassandra-3.11
in repository https://gitbox.apache.org/repos/asf/cassandra.git


from 6d7cd61412 Merge branch 'cassandra-3.0' into cassandra-3.11
 new a1421ec324 Suppress CVE-2023-6378
 new 2e3d7e76f5 Merge branch 'cassandra-3.0' into cassandra-3.11

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .build/dependency-check-suppressions.xml | 9 +
 CHANGES.txt  | 1 +
 2 files changed, 10 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



(cassandra) branch cassandra-4.1 updated (4059faf5b9 -> 13e5956285)

2023-12-06 Thread brandonwilliams
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a change to branch cassandra-4.1
in repository https://gitbox.apache.org/repos/asf/cassandra.git


from 4059faf5b9 Merge branch 'cassandra-4.0' into cassandra-4.1
 new a1421ec324 Suppress CVE-2023-6378
 new 2e3d7e76f5 Merge branch 'cassandra-3.0' into cassandra-3.11
 new 8e5fc74c9a Merge branch 'cassandra-3.11' into cassandra-4.0
 new 13e5956285 Merge branch 'cassandra-4.0' into cassandra-4.1

The 4 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .build/dependency-check-suppressions.xml | 9 +
 CHANGES.txt  | 1 +
 2 files changed, 10 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



(cassandra) 01/01: Merge branch 'cassandra-5.0' into trunk

2023-12-06 Thread brandonwilliams
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git

commit ad86c9d201e7b17eb7cd3cbddb315e151062727f
Merge: c5a2781b22 fdfc5e614d
Author: Brandon Williams 
AuthorDate: Wed Dec 6 06:34:26 2023 -0600

Merge branch 'cassandra-5.0' into trunk

 .build/dependency-check-suppressions.xml | 10 ++
 CHANGES.txt  |  1 +
 2 files changed, 11 insertions(+)

diff --cc CHANGES.txt
index a8c465c72a,2602ec4b23..6fcb72dcf1
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -65,9 -65,12 +65,10 @@@ Merged from 4.0
   * JMH improvements - faster build and async profiler (CASSANDRA-18871)
   * Enable 3rd party JDK installations for Debian package (CASSANDRA-18844)
  Merged from 3.11:
 + * Revert CASSANDRA-18543 (CASSANDRA-18854)
  Merged from 3.0:
 -<<<<<<< HEAD
 -=======
+  * Suppress CVE-2023-6378 (CASSANDRA-19142) 
   * Do not set RPC_READY to false on transports shutdown in order to not fail 
counter updates for deployments with coordinator and storage nodes with 
transports turned off (CASSANDRA-18935)
 ->>>>>>> cassandra-4.1
   * Suppress CVE-2023-44487 (CASSANDRA-18943)
   * Implement the logic in bin/stop-server (CASSANDRA-18838)
   * Fix nodetool enable/disablebinary to correctly set rpc readiness in gossip 
(CASSANDRA-18935)


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



(cassandra) branch trunk updated (c5a2781b22 -> ad86c9d201)

2023-12-06 Thread brandonwilliams
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a change to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git


from c5a2781b22 Enable bytebuddy rule after starting nodes to fix 
DecommissionAvoidWriteTimeoutsTest
 new a1421ec324 Suppress CVE-2023-6378
 new 2e3d7e76f5 Merge branch 'cassandra-3.0' into cassandra-3.11
 new 8e5fc74c9a Merge branch 'cassandra-3.11' into cassandra-4.0
 new 13e5956285 Merge branch 'cassandra-4.0' into cassandra-4.1
 new fdfc5e614d Merge branch 'cassandra-4.1' into cassandra-5.0
 new ad86c9d201 Merge branch 'cassandra-5.0' into trunk

The 6 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .build/dependency-check-suppressions.xml | 10 ++
 CHANGES.txt  |  1 +
 2 files changed, 11 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



(cassandra) branch cassandra-5.0 updated (fe7997884d -> fdfc5e614d)

2023-12-06 Thread brandonwilliams
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a change to branch cassandra-5.0
in repository https://gitbox.apache.org/repos/asf/cassandra.git


from fe7997884d Write ccm clusters under test's TMPDIR
 new a1421ec324 Suppress CVE-2023-6378
 new 2e3d7e76f5 Merge branch 'cassandra-3.0' into cassandra-3.11
 new 8e5fc74c9a Merge branch 'cassandra-3.11' into cassandra-4.0
 new 13e5956285 Merge branch 'cassandra-4.0' into cassandra-4.1
 new fdfc5e614d Merge branch 'cassandra-4.1' into cassandra-5.0

The 5 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .build/dependency-check-suppressions.xml | 10 ++
 CHANGES.txt  |  5 +
 2 files changed, 15 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



(cassandra) branch cassandra-3.0 updated: Suppress CVE-2023-6378

2023-12-06 Thread brandonwilliams
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a commit to branch cassandra-3.0
in repository https://gitbox.apache.org/repos/asf/cassandra.git


The following commit(s) were added to refs/heads/cassandra-3.0 by this push:
 new a1421ec324 Suppress CVE-2023-6378
a1421ec324 is described below

commit a1421ec324e4bf8ab46df2a72af298f9286e0d59
Author: Brandon Williams 
AuthorDate: Fri Dec 1 08:43:51 2023 -0600

Suppress CVE-2023-6378

Patch by brandonwilliams, reviewed by smiklosovic for CASSANDRA-19142
---
 .build/dependency-check-suppressions.xml | 9 +
 CHANGES.txt  | 1 +
 2 files changed, 10 insertions(+)

diff --git a/.build/dependency-check-suppressions.xml 
b/.build/dependency-check-suppressions.xml
index 1d9fba6218..04a74bb4b2 100644
--- a/.build/dependency-check-suppressions.xml
+++ b/.build/dependency-check-suppressions.xml
@@ -107,4 +107,13 @@
         <cve>CVE-2019-17267</cve>
     </suppress>
 
+
+    <suppress>
+        <packageUrl regex="true">^pkg:maven/ch\.qos\.logback/logback\-core@.*$</packageUrl>
+        <cve>CVE-2023-6378</cve>
+    </suppress>
+    <suppress>
+        <packageUrl regex="true">^pkg:maven/ch\.qos\.logback/logback\-classic@.*$</packageUrl>
+        <cve>CVE-2023-6378</cve>
+    </suppress>
 </suppressions>
diff --git a/CHANGES.txt b/CHANGES.txt
index 10c771ae2d..b53bc55d26 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 3.0.30
+ * Suppress CVE-2023-6378 (CASSANDRA-19142) 
  * Do not set RPC_READY to false on transports shutdown in order to not fail 
counter updates for deployments with coordinator and storage nodes with 
transports turned off (CASSANDRA-18935)
  * Suppress CVE-2023-44487 (CASSANDRA-18943)
  * Fix nodetool enable/disablebinary to correctly set rpc readiness in gossip 
(CASSANDRA-18935)


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



(cassandra) 01/01: Merge branch 'cassandra-3.0' into cassandra-3.11

2023-12-06 Thread brandonwilliams
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a commit to branch cassandra-3.11
in repository https://gitbox.apache.org/repos/asf/cassandra.git

commit 2e3d7e76f5763698a3a2e161d6ec4773654d3b29
Merge: 6d7cd61412 a1421ec324
Author: Brandon Williams 
AuthorDate: Wed Dec 6 06:32:19 2023 -0600

Merge branch 'cassandra-3.0' into cassandra-3.11

 .build/dependency-check-suppressions.xml | 9 +
 CHANGES.txt  | 1 +
 2 files changed, 10 insertions(+)

diff --cc .build/dependency-check-suppressions.xml
index e3e244e62b,04a74bb4b2..774e2e7886
--- a/.build/dependency-check-suppressions.xml
+++ b/.build/dependency-check-suppressions.xml
@@@ -90,14 -89,31 +90,23 @@@
      <cve>CVE-2019-0205</cve>
  </suppress>
  
 -
 -
 +
  
 -    <packageUrl regex="true">^pkg:maven/org\.codehaus\.jackson/jackson\-mapper\-asl@.*$</packageUrl>
 -    <cve>CVE-2017-7525</cve>
 -    <cve>CVE-2017-15095</cve>
 -    <cve>CVE-2017-17485</cve>
 -    <cve>CVE-2018-5968</cve>
 -    <cve>CVE-2018-14718</cve>
 -    <cve>CVE-2018-1000873</cve>
 -    <cve>CVE-2018-7489</cve>
 -    <cve>CVE-2019-10172</cve>
 -    <cve>CVE-2019-14540</cve>
 -    <cve>CVE-2019-14893</cve>
 -    <cve>CVE-2019-16335</cve>
 -    <cve>CVE-2019-17267</cve>
 +    <packageUrl regex="true">^pkg:maven/com\.fasterxml\.jackson\.core/jackson\-databind@.*$</packageUrl>
 +    <cve>CVE-2022-42003</cve>
 +    <cve>CVE-2022-42004</cve>
 +    <cve>CVE-2023-35116</cve>
 +      <cve>CVE-2022-42003</cve>
 +      <cve>CVE-2022-42004</cve>
  </suppress>
  
+ 
+   <suppress>
+       <packageUrl regex="true">^pkg:maven/ch\.qos\.logback/logback\-core@.*$</packageUrl>
+       <cve>CVE-2023-6378</cve>
+   </suppress>
+   <suppress>
+       <packageUrl regex="true">^pkg:maven/ch\.qos\.logback/logback\-classic@.*$</packageUrl>
+       <cve>CVE-2023-6378</cve>
+   </suppress>
  </suppressions>
diff --cc CHANGES.txt
index a6cce43bd9,b53bc55d26..96e34db044
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,8 -1,5 +1,9 @@@
 -3.0.30
 +3.11.17
 + * Fix delayed SSTable release with unsafe_aggressive_sstable_expiration 
(CASSANDRA-18756)
 + * Revert CASSANDRA-18543 (CASSANDRA-18854)
 + * Fix NPE when using udfContext in UDF after a restart of a node 
(CASSANDRA-18739)
 +Merged from 3.0:
+  * Suppress CVE-2023-6378 (CASSANDRA-19142) 
   * Do not set RPC_READY to false on transports shutdown in order to not fail 
counter updates for deployments with coordinator and storage nodes with 
transports turned off (CASSANDRA-18935)
   * Suppress CVE-2023-44487 (CASSANDRA-18943)
   * Fix nodetool enable/disablebinary to correctly set rpc readiness in gossip 
(CASSANDRA-18935)


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19166) StackOverflowError on ALTER after many previous schema changes

2023-12-06 Thread Jacek Lewandowski (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacek Lewandowski updated CASSANDRA-19166:
--
Reviewers: Caleb Rackliffe, Jacek Lewandowski, Jacek Lewandowski  (was: 
Caleb Rackliffe, Jacek Lewandowski)
   Status: Review In Progress  (was: Patch Available)

> StackOverflowError on ALTER after many previous schema changes
> --
>
> Key: CASSANDRA-19166
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19166
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Abe Ratnofsky
>Assignee: Abe Ratnofsky
>Priority: Normal
> Fix For: 4.1.x, 5.0-rc
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Since 4.1, TableMetadataRefCache re-wraps its fields in 
> Collections.unmodifiableMap on every local schema update. This causes 
> TableMetadataRefCache's Map fields to reference chains of nested 
> UnmodifiableMaps. Eventually, this leads to a StackOverflowError on get(), 
> which has to traverse lots of these maps to fetch the actual value.
> https://github.com/apache/cassandra/blob/4059faf5b948c5a285c25fb0f2e4c4288ee7c305/src/java/org/apache/cassandra/schema/TableMetadataRefCache.java#L53
> The issue goes away on restart, since TableMetadataRefCache is reloaded from 
> disk.
> See CASSANDRA-17044, when TableMetadataRefCache was introduced. This issue 
> was discovered on a real test cluster where schema changes were failing, via 
> a heap dump.
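
The failure mode is easy to demonstrate outside Cassandra; here is a 
standalone sketch using only the JDK (the class name is made up for the demo, 
and this is not Cassandra code):

{code:java}
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// On JDKs where Collections.unmodifiableMap re-wraps an already-unmodifiable
// map (JDK 8/11 re-wrap unconditionally; newer JDKs may short-circuit),
// wrapping on every update builds a chain of delegating views, and a single
// get() must then traverse every link of the chain.
public class NestedUnmodifiableMapDemo
{
    public static void main(String[] args)
    {
        Map<String, String> map = new HashMap<>();
        map.put("k", "v");

        Map<String, String> view = map;
        for (int i = 0; i < 100_000; i++)
            view = Collections.unmodifiableMap(view); // one more hop per "update"

        System.out.println(view.get("k")); // deep delegation -> StackOverflowError
    }
}
{code}

The usual fix pattern is to wrap a freshly rebuilt map once per update rather 
than re-wrapping the previous unmodifiable view.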



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19166) StackOverflowError on ALTER after many previous schema changes

2023-12-06 Thread Jacek Lewandowski (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacek Lewandowski updated CASSANDRA-19166:
--
Status: Ready to Commit  (was: Review In Progress)

> StackOverflowError on ALTER after many previous schema changes
> --
>
> Key: CASSANDRA-19166
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19166
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Abe Ratnofsky
>Assignee: Abe Ratnofsky
>Priority: Normal
> Fix For: 4.1.x, 5.0-rc
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Since 4.1, TableMetadataRefCache re-wraps its fields in 
> Collections.unmodifiableMap on every local schema update. This causes 
> TableMetadataRefCache's Map fields to reference chains of nested 
> UnmodifiableMaps. Eventually, this leads to a StackOverflowError on get(), 
> which has to traverse lots of these maps to fetch the actual value.
> https://github.com/apache/cassandra/blob/4059faf5b948c5a285c25fb0f2e4c4288ee7c305/src/java/org/apache/cassandra/schema/TableMetadataRefCache.java#L53
> The issue goes away on restart, since TableMetadataRefCache is reloaded from 
> disk.
> See CASSANDRA-17044, when TableMetadataRefCache was introduced. This issue 
> was discovered on a real test cluster where schema changes were failing, via 
> a heap dump.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19166) StackOverflowError on ALTER after many previous schema changes

2023-12-06 Thread Jacek Lewandowski (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793671#comment-17793671
 ] 

Jacek Lewandowski commented on CASSANDRA-19166:
---

Tests look OK, I'm going to merge it.

> StackOverflowError on ALTER after many previous schema changes
> --
>
> Key: CASSANDRA-19166
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19166
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Abe Ratnofsky
>Assignee: Abe Ratnofsky
>Priority: Normal
> Fix For: 4.1.x, 5.0-rc
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Since 4.1, TableMetadataRefCache re-wraps its fields in 
> Collections.unmodifiableMap on every local schema update. This causes 
> TableMetadataRefCache's Map fields to reference chains of nested 
> UnmodifiableMaps. Eventually, this leads to a StackOverflowError on get(), 
> which has to traverse lots of these maps to fetch the actual value.
> https://github.com/apache/cassandra/blob/4059faf5b948c5a285c25fb0f2e4c4288ee7c305/src/java/org/apache/cassandra/schema/TableMetadataRefCache.java#L53
> The issue goes away on restart, since TableMetadataRefCache is reloaded from 
> disk.
> See CASSANDRA-17044, when TableMetadataRefCache was introduced. This issue 
> was discovered on a real test cluster where schema changes were failing, via 
> a heap dump.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-19166) StackOverflowError on ALTER after many previous schema changes

2023-12-06 Thread Jacek Lewandowski (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793582#comment-17793582
 ] 

Jacek Lewandowski edited comment on CASSANDRA-19166 at 12/6/23 11:29 AM:
-

Test links in the PRs

4.1 - https://github.com/apache/cassandra/pull/2964
5.0 - https://github.com/apache/cassandra/pull/2965 



was (Author: jlewandowski):
4.1 j8 
https://app.circleci.com/pipelines/github/jacek-lewandowski/cassandra/1149/workflows/4de1eef7-1b0a-4975-a205-c6dbb5b8f37b

> StackOverflowError on ALTER after many previous schema changes
> --
>
> Key: CASSANDRA-19166
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19166
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Abe Ratnofsky
>Assignee: Abe Ratnofsky
>Priority: Normal
> Fix For: 4.1.x, 5.0-rc
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Since 4.1, TableMetadataRefCache re-wraps its fields in 
> Collections.unmodifiableMap on every local schema update. This causes 
> TableMetadataRefCache's Map fields to reference chains of nested 
> UnmodifiableMaps. Eventually, this leads to a StackOverflowError on get(), 
> which has to traverse lots of these maps to fetch the actual value.
> https://github.com/apache/cassandra/blob/4059faf5b948c5a285c25fb0f2e4c4288ee7c305/src/java/org/apache/cassandra/schema/TableMetadataRefCache.java#L53
> The issue goes away on restart, since TableMetadataRefCache is reloaded from 
> disk.
> See CASSANDRA-17044, when TableMetadataRefCache was introduced. This issue 
> was discovered on a real test cluster where schema changes were failing, via 
> a heap dump.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org


