[jira] [Updated] (CASSANDRA-19178) Cluster upgrade 3.x -> 4.x fails due to IP change
[ https://issues.apache.org/jira/browse/CASSANDRA-19178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aldo updated CASSANDRA-19178: - Summary: Cluster upgrade 3.x -> 4.x fails due to IP change (was: Cluster upgrade 3.x -> 4.x fails with no internode encryption) > Cluster upgrade 3.x -> 4.x fails due to IP change > - > > Key: CASSANDRA-19178 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19178 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Gossip >Reporter: Aldo >Priority: Normal > Attachments: cassandra7.downgrade.log, cassandra7.log > > > I have a Docker swarm cluster with 3 distinct Cassandra services (named > {_}cassandra7{_}, {_}cassandra8{_}, {_}cassandra9{_}) running on 3 different > servers. The 3 services are running the version 3.11.16, using the official > Cassandra image 3.11.16 on Docker Hub. The first service is configured just > with the following environment variables > {code:java} > CASSANDRA_LISTEN_ADDRESS="tasks.cassandra7" > CASSANDRA_SEEDS="tasks.cassandra7,tasks.cassandra9" {code} > which in turn, at startup, modifies the {_}cassandra.yaml{_}. So for instance > the _cassandra.yaml_ for the first service contains the following (and the > rest is the image default): > {code:java} > # grep tasks /etc/cassandra/cassandra.yaml > - seeds: "tasks.cassandra7,tasks.cassandra9" > listen_address: tasks.cassandra7 > broadcast_address: tasks.cassandra7 > broadcast_rpc_address: tasks.cassandra7 {code} > Other services (8 and 9) have a similar configuration, obviously with a > different {{CASSANDRA_LISTEN_ADDRESS}} ({{tasks.cassandra8}} and > {{tasks.cassandra9}}).
> The cluster is running smoothly and all the nodes are perfectly able to > rejoin the cluster whichever event occurs, thanks to the Docker Swarm > {{tasks.cassandraXXX}} "hostname": I can kill a Docker container waiting for > Docker swarm to restart it, force update it in order to force a restart, > scale to 0 and then 1 the service, restart an entire server, turn off and > then turn on all the 3 servers. Never found an issue on this. > I also just completed a full upgrade of the cluster from version 2.2.8 to > 3.11.16 (simply upgrading the Docker official image associated with the > services) without issues. I was also able, thanks to a 2.2.8 snapshot on each > server, to perform a full downgrade to 2.2.8 and back to 3.11.16 again. I > finally issued a {{nodetool upgradesstables}} on all nodes, so my SSTables > have now the {{me-*}} prefix. > > The problem I'm facing right now is the upgrade from 3.11.16 to 4.x. The > procedure that I follow is very simple: > # I start from the _cassandra7_ service (which is a seed node) > # {{nodetool drain}} > # Wait for the {{DRAINING ... DRAINED}} messages to appear in the log > # Upgrade the Docker image of _cassandra7_ to the official 4.1.3 version > The procedure is exactly the same I followed for the upgrade 2.2.8 --> > 3.11.16, obviously with a different version at step 4. Unfortunately the > upgrade 3.x --> 4.x is not working: the _cassandra7_ service restarts and > attempts to communicate with the other seed node ({_}cassandra9{_}) but the > log of _cassandra7_ shows the following: > {code:java} > INFO [Messaging-EventLoop-3-3] 2023-12-06 17:15:04,727 > OutboundConnectionInitiator.java:390 - Failed to connect to peer > tasks.cassandra9/10.0.2.196:7000(tasks.cassandra9/10.0.2.196:7000) > io.netty.channel.unix.Errors$NativeIoException: readAddress(..) 
failed: > Connection reset by peer{code} > The relevant part of the log, related to the missing internode communication, > is attached in _cassandra7.log_ > In the log of _cassandra9_ there is nothing after the abovementioned step #4. > So only _cassandra7_ is saying something in the logs. > I tried with multiple versions (4.0.11 but also 4.0.0) but the outcome is > always the same. Of course when I follow the steps 1..3, then restore the 3.x > snapshot and finally perform the step #4 using the official 3.11.16 version > the node 7 restarts correctly and joins the cluster. I attached the relevant > part of the log (see {_}cassandra7.downgrade.log{_}) where you can see that > node 7 and 9 can communicate. > I suspect this could be related to the port 7000 now (with Cassandra 4.x) > supporting both encrypted and unencrypted traffic. As stated previously I'm > using the untouched official Cassandra images so all my cluster, inside the > Docker Swarm, is not (and has never been) configured with encryption. > I can also add the following: if I perform the 4 above steps also for the > _cassandra9_ and _cassandra8_ services, in the end the cluster works. But > this is not acceptable, because the cluster is unavailable until I finish the > full upgrade of all nodes:
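For reference, the Swarm topology described in the report could be declared along these lines. This is a hypothetical compose sketch: only the image tag, the environment variables, and the service/hostname scheme come from the ticket; the network, placement constraints, and everything else are assumptions.

```yaml
# Hypothetical docker-compose sketch of the reported topology (not from the ticket).
version: "3.8"
services:
  cassandra7:
    image: cassandra:3.11.16            # replaced with cassandra:4.1.3 at step 4
    environment:
      CASSANDRA_LISTEN_ADDRESS: "tasks.cassandra7"
      CASSANDRA_SEEDS: "tasks.cassandra7,tasks.cassandra9"
    networks: [cassandra-net]
    deploy:
      placement:
        constraints: ["node.hostname == server1"]   # assumed node pinning
networks:
  cassandra-net:
    driver: overlay
```

`tasks.cassandra7` is the Swarm-provided DNS name that resolves to the service's current task IPs, which is also why a restarted container can come back with a different IP behind the same name.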
[jira] [Updated] (CASSANDRA-19180) Support reloading certificate stores in cassandra-java-driver
[ https://issues.apache.org/jira/browse/CASSANDRA-19180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abe Ratnofsky updated CASSANDRA-19180: -- Change Category: Operability Complexity: Normal Status: Open (was: Triage Needed) > Support reloading certificate stores in cassandra-java-driver > - > > Key: CASSANDRA-19180 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19180 > Project: Cassandra > Issue Type: New Feature > Components: Client/java-driver >Reporter: Abe Ratnofsky >Assignee: Abe Ratnofsky >Priority: Normal > > Currently, apache/cassandra-java-driver does not reload SSLContext when the > underlying certificate store files change. When the DefaultSslEngineFactory > (and the other factories) are set up, they build a fixed instance of > javax.net.ssl.SSLContext that doesn't change: > https://github.com/apache/cassandra-java-driver/blob/12e3e3ea027c51c5807e5e46ba542f894edfa4e7/core/src/main/java/com/datastax/oss/driver/internal/core/ssl/DefaultSslEngineFactory.java#L74 > This fixed SSLContext is used to negotiate SSL with the cluster, and if a > keystore is reloaded on disk it isn't picked up by the driver, and future > reconnections will fail if the keystore certificates have expired by the time > they're used to handshake a new connection. > We should reload client certificates so that applications that provide them > can use short-lived certificates and not require a bounce to pick up new > certificates. This is especially relevant in a world with CASSANDRA-18554 and > broad use of mTLS. > I have a patch for this that is nearly ready. Now that the project has moved > under apache/ - who can I work with to understand how CI works now? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19180) Support reloading certificate stores in cassandra-java-driver
[ https://issues.apache.org/jira/browse/CASSANDRA-19180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abe Ratnofsky updated CASSANDRA-19180: -- Impacts: Clients (was: None) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-19180) Support reloading certificate stores in cassandra-java-driver
Abe Ratnofsky created CASSANDRA-19180: - Summary: Support reloading certificate stores in cassandra-java-driver Key: CASSANDRA-19180 URL: https://issues.apache.org/jira/browse/CASSANDRA-19180 Project: Cassandra Issue Type: New Feature Components: Client/java-driver Reporter: Abe Ratnofsky Assignee: Abe Ratnofsky Currently, apache/cassandra-java-driver does not reload SSLContext when the underlying certificate store files change. When the DefaultSslEngineFactory (and the other factories) are set up, they build a fixed instance of javax.net.ssl.SSLContext that doesn't change: https://github.com/apache/cassandra-java-driver/blob/12e3e3ea027c51c5807e5e46ba542f894edfa4e7/core/src/main/java/com/datastax/oss/driver/internal/core/ssl/DefaultSslEngineFactory.java#L74 This fixed SSLContext is used to negotiate SSL with the cluster, and if a keystore is reloaded on disk it isn't picked up by the driver, and future reconnections will fail if the keystore certificates have expired by the time they're used to handshake a new connection. We should reload client certificates so that applications that provide them can use short-lived certificates and not require a bounce to pick up new certificates. This is especially relevant in a world with CASSANDRA-18554 and broad use of mTLS. I have a patch for this that is nearly ready. Now that the project has moved under apache/ - who can I work with to understand how CI works now? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
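A reload mechanism of the kind proposed could look roughly like the following. This is a hedged sketch, not the actual driver patch: the class name, the mtime-based trigger, and the placeholder context construction are all assumptions; real code would rebuild the `SSLContext` from the reloaded keystore rather than from a default-initialized context.

```java
import javax.net.ssl.SSLContext;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

// Hypothetical holder (not the driver's API): rebuilds the SSLContext
// whenever the keystore file's modification time changes, so new
// connections handshake with freshly loaded certificates.
public class ReloadingSslContextHolder {
    private final Path keystorePath;
    private volatile FileTime lastModified;
    private volatile SSLContext context;

    public ReloadingSslContextHolder(Path keystorePath) throws Exception {
        this.keystorePath = keystorePath;
        reload();
    }

    public SSLContext get() throws Exception {
        FileTime current = Files.getLastModifiedTime(keystorePath);
        if (!current.equals(lastModified)) {
            synchronized (this) {
                if (!Files.getLastModifiedTime(keystorePath).equals(lastModified)) {
                    reload();
                }
            }
        }
        return context;
    }

    private void reload() throws Exception {
        // Real code would load the keystore and init KeyManagers here; a
        // default-initialized context stands in to keep the sketch self-contained.
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, null, null);
        this.context = ctx;
        this.lastModified = Files.getLastModifiedTime(keystorePath);
    }

    public static void main(String[] args) throws Exception {
        Path ks = Files.createTempFile("keystore", ".jks");
        ReloadingSslContextHolder holder = new ReloadingSslContextHolder(ks);
        SSLContext first = holder.get();
        // Bump the mtime to simulate certificate rotation on disk.
        Files.setLastModifiedTime(ks, FileTime.fromMillis(System.currentTimeMillis() + 5000));
        SSLContext second = holder.get();
        System.out.println(first == second ? "same" : "reloaded");
    }
}
```

A production version would likely also debounce reloads and fall back to the previous context if the new keystore fails to load.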
[jira] [Comment Edited] (CASSANDRA-19178) Cluster upgrade 3.x -> 4.x fails with no internode encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-19178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793944#comment-17793944 ] Aldo edited comment on CASSANDRA-19178 at 12/6/23 11:06 PM: I apologize in advance if reopening is not the correct behavior; please tell me if I need to open a new issue. I think I've discovered the root cause of the issue, and wonder if it's a bug or it's caused by a misconfiguration on my side. Using {{nodetool setlogginglevel org.apache.cassandra TRACE}} on both the 4.x upgraded node (cassandra7) and on the running 3.x seed node (cassandra9) I was able to isolate the relevant logs: On cassandra7: {code:java} TRACE [Messaging-EventLoop-3-3] 2023-12-06 22:16:56,410 EndpointMessagingVersions.java:67 - Assuming current protocol version for tasks.cassandra9/10.0.2.92:7000 TRACE [Messaging-EventLoop-3-3] 2023-12-06 22:16:56,410 OutboundConnectionInitiator.java:131 - creating outbound bootstrap to peer: (tasks.cassandra9/10.0.2.92:7000, tasks.cassandra9/10.0.2.92:7000), framing: CRC, encryption: unencrypted, requestVersion: 12 TRACE [Messaging-EventLoop-3-3] 2023-12-06 22:16:56,411 OutboundConnectionInitiator.java:236 - starting handshake with peer tasks.cassandra9/10.0.2.92:7000(tasks.cassandra9/10.0.2.92:7000), msg = Initiate(request: 12, min: 10, max: 12, type: URGENT_MESSAGES, framing: true, from: tasks.cassandra7/10.0.2.137:7000) INFO [Messaging-EventLoop-3-3] 2023-12-06 22:16:56,412 OutboundConnectionInitiator.java:390 - Failed to connect to peer tasks.cassandra9/10.0.2.92:7000(tasks.cassandra9/10.0.2.92:7000) io.netty.channel.unix.Errors$NativeIoException: readAddress(..) 
failed: Connection reset by peer {code} On cassandra9: {code:java} TRACE [ACCEPT-tasks.cassandra9/10.0.2.92] 2023-12-06 22:16:56,411 MessagingService.java:1315 - Connection version 12 from /10.0.2.137 TRACE [MessagingService-Incoming-/10.0.2.137] 2023-12-06 22:16:56,412 IncomingTcpConnection.java:111 - IOException reading from socket; closing java.io.IOException: Peer-used messaging version 12 is larger than max supported 11 at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:153) at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:98) TRACE [MessagingService-Incoming-/10.0.2.137] 2023-12-06 22:16:56,412 IncomingTcpConnection.java:125 - Closing socket Socket[addr=/10.0.2.137,port=45680,localport=7000] - isclosed: false {code} So it seems there is a mismatch on this {_}messaging version{_}. I'm trying to understand the behaviour of _EndpointMessagingVersions.java_ and _OutboundConnectionInitiator.java_ on the 4.1.x trunk and it seems that there are few facts: # the internal map of _EndpointMessagingVersions_ on the node just restarted (cassandra7) for sure doesn't include information about the existing node (cassandra9). This because on my network configuration cassandra7 (or more precisely the tasks.cassandra7 hostname) changed IP due to the restart. So cassandra9 (the 3.x running node) cannot send its messaging version (=11) to the newest cassandra7 until the handshake completes. 
# therefore inside _OutboundConnectionInitiator_ the messaging version for the cassandra7 --> cassandra9 handshake is assumed equal to the current one (=12) # when the 3.x node (cassandra9) detects the messaging version mismatch it throws an IOException and closes the connection # the 4.x node (cassandra7) just sees a connection reset by peer and seems incapable of downgrading the messaging version and retrying the handshake I can again state that a similar upgrade path, with different involved versions (2.2.8 --> 3.11.16), on the same exact architecture, involving the same Docker swarm services, the same IP-changing behaviour, etc. worked like a charm. So I think something changed in the source code and broke that behavior when the upgrade is 3.11.16 --> 4.1.3.
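The asymmetry described in the points above can be modeled with a toy sketch. This is illustrative pseudologic, not Cassandra's actual source: the constants mirror the versions seen in the logs (11 for 3.11, 12 for 4.x), but the method names and structure are invented.

```java
// Toy model of the messaging-version handshake described above; not Cassandra source.
public class MessagingVersionCheck {
    static final int VERSION_3X_MAX = 11; // max messaging version a 3.11 node supports
    static final int VERSION_4X = 12;     // current messaging version of a 4.x node

    // 3.x side: reject any peer version above its own maximum
    // (the "larger than max supported 11" IOException in the cassandra9 log).
    static boolean legacyAccepts(int peerVersion) {
        return peerVersion <= VERSION_3X_MAX;
    }

    // 4.x side: pick the request version from the cached peer entry if one
    // exists; with no entry (the IP changed, so the cache is cold) it assumes
    // its own current version.
    static int requestVersion(Integer cachedPeerVersion) {
        return cachedPeerVersion == null ? VERSION_4X : Math.min(cachedPeerVersion, VERSION_4X);
    }

    public static void main(String[] args) {
        int coldCache = requestVersion(null);           // no gossip entry after IP change
        System.out.println("requested=" + coldCache + " accepted=" + legacyAccepts(coldCache));
        int warmCache = requestVersion(VERSION_3X_MAX); // what a cached entry would allow
        System.out.println("requested=" + warmCache + " accepted=" + legacyAccepts(warmCache));
    }
}
```

The cold-cache path is the failing one from the logs; with a known peer version the initiator could have requested 11 and handshaken successfully.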
[jira] [Commented] (CASSANDRA-19116) History Builder API 2.0
[ https://issues.apache.org/jira/browse/CASSANDRA-19116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793948#comment-17793948 ] Caleb Rackliffe commented on CASSANDRA-19116: - +1 > History Builder API 2.0 > --- > > Key: CASSANDRA-19116 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19116 > Project: Cassandra > Issue Type: New Feature > Components: Test/fuzz >Reporter: Alex Petrov >Assignee: Alex Petrov >Priority: Urgent > > Harry History Builder 2.0 > * New History Builder API > * Add an ability to track LTS visited by partition in a visited_lts > static column > * Add a model checker that checks against a different Cluster instance > (for example, flush vs no flush, local vs nonlocal, etc) > * Add an ability to issue LTSs out-of-order -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
Re: [PR] CASSANDRA-19148: Remove unused dead code [cassandra-analytics]
frankgh commented on PR #23: URL: https://github.com/apache/cassandra-analytics/pull/23#issuecomment-1843813701 Closed via https://github.com/apache/cassandra-analytics/commit/680cc9395c55a88217f2de975f62ad588e8c95d5 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
Re: [PR] CASSANDRA-19148: Remove unused dead code [cassandra-analytics]
frankgh closed pull request #23: CASSANDRA-19148: Remove unused dead code URL: https://github.com/apache/cassandra-analytics/pull/23 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19178) Cluster upgrade 3.x -> 4.x fails with no internode encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-19178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aldo updated CASSANDRA-19178: - Resolution: (was: Invalid) Status: Open (was: Resolved)
[jira] [Commented] (CASSANDRA-19178) Cluster upgrade 3.x -> 4.x fails with no internode encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-19178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793944#comment-17793944 ] Aldo commented on CASSANDRA-19178: -- I apologize in advance if reopening is not the correct behavior; please tell me if I need to open a new issue. I think I've discovered the root cause of the issue, and wonder if it's a bug or it's caused by a misconfiguration on my side. Using {{nodetool setlogginglevel org.apache.cassandra TRACE}} on both the 4.x upgraded node (cassandra7) and on the running 3.x seed node (cassandra9) I was able to isolate the relevant logs: On cassandra7: {code:java} TRACE [Messaging-EventLoop-3-3] 2023-12-06 22:16:56,410 EndpointMessagingVersions.java:67 - Assuming current protocol version for tasks.cassandra9/10.0.2.92:7000 TRACE [Messaging-EventLoop-3-3] 2023-12-06 22:16:56,410 OutboundConnectionInitiator.java:131 - creating outbound bootstrap to peer: (tasks.cassandra9/10.0.2.92:7000, tasks.cassandra9/10.0.2.92:7000), framing: CRC, encryption: unencrypted, requestVersion: 12 TRACE [Messaging-EventLoop-3-3] 2023-12-06 22:16:56,411 OutboundConnectionInitiator.java:236 - starting handshake with peer tasks.cassandra9/10.0.2.92:7000(tasks.cassandra9/10.0.2.92:7000), msg = Initiate(request: 12, min: 10, max: 12, type: URGENT_MESSAGES, framing: true, from: tasks.cassandra7/10.0.2.137:7000) INFO [Messaging-EventLoop-3-3] 2023-12-06 22:16:56,412 OutboundConnectionInitiator.java:390 - Failed to connect to peer tasks.cassandra9/10.0.2.92:7000(tasks.cassandra9/10.0.2.92:7000) io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer {code} On cassandra9: {code:java} TRACE [ACCEPT-tasks.cassandra9/10.0.2.92] 2023-12-06 22:16:56,411 MessagingService.java:1315 - Connection version 12 from /10.0.2.137 TRACE [MessagingService-Incoming-/10.0.2.137] 2023-12-06 22:16:56,412 IncomingTcpConnection.java:111 - IOException reading from socket; closing java.io.IOException: Peer-used messaging version 12 is larger than max supported 11 at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:153) at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:98) TRACE [MessagingService-Incoming-/10.0.2.137] 2023-12-06 22:16:56,412 IncomingTcpConnection.java:125 - Closing socket Socket[addr=/10.0.2.137,port=45680,localport=7000] - isclosed: false {code} So it seems there is a mismatch on this {_}messaging version{_}.
Re: [PR] CASSANDRA-18969: source files missing from sources jars due to maven … [cassandra-java-driver]
michaelsembwever commented on PR #1900: URL: https://github.com/apache/cassandra-java-driver/pull/1900#issuecomment-1843771617 I'll try to wrap up the review tomorrow, and if it looks ok merge and cut and stage a release. That will help us get eyes on anything in the release we've missed… -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
Re: [PR] CASSANDRA-19148: Remove unused dead code [cassandra-analytics]
jberragan commented on PR #23: URL: https://github.com/apache/cassandra-analytics/pull/23#issuecomment-1843751944 +1 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19179) BLOG - Apache Cassandra 5.0 Features: Vector Search
[ https://issues.apache.org/jira/browse/CASSANDRA-19179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793938#comment-17793938 ] Paul Au commented on CASSANDRA-19179: - * [https://raw.githack.com/Paul-TT/cassandra-website/CASSANDRA-19179_generated/content/_/blog/Introducing-the-Apache-Cassandra-Catalyst-Program.html] * [https://raw.githack.com/Paul-TT/cassandra-website/CASSANDRA-19179_generated/content/_/blog.html] * https://raw.githack.com/Paul-TT/cassandra-website/CASSANDRA-19179_generated/content/_/cassandra-catalyst-program.html > BLOG - Apache Cassandra 5.0 Features: Vector Search > --- > > Key: CASSANDRA-19179 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19179 > Project: Cassandra > Issue Type: Task >Reporter: Paul Au >Priority: Normal > Attachments: blog-index.png, blog-post.png, catalyst-page.png > > > This ticket is to add a blog post to the site. The change includes: > * Adding the post: Apache Cassandra 5.0 Features: Vector Search > * updating the blog index page with the new post > * A small text change to the Apache Catalyst program page. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-19104) Standardize tablestats formatting and data units
[ https://issues.apache.org/jira/browse/CASSANDRA-19104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793912#comment-17793912 ] Brad Schoening edited comment on CASSANDRA-19104 at 12/6/23 9:22 PM: - [~zaaath] this link explains the difference, so it just depends on how it is calculating size/1000 (KB) or size/1024 (KiB). [What exactly are the storage/memory Units KiB/MiB/GiB/TiB|https://community.hpe.com/t5/hpe-primera-storage/what-exactly-are-the-storage-memory-units-kib-mib-gib-tib-in/td-p/7123450] > Standardize tablestats formatting and data units > > > Key: CASSANDRA-19104 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19104 > Project: Cassandra > Issue Type: Bug > Components: Tool/nodetool >Reporter: Brad Schoening >Assignee: Leo Toff >Priority: Normal > > Tablestats reports output in plaintext, JSON or YAML. The human readable > output currently has a mix of KiB, bytes with inconsistent spacing > Simplify and default output to 'human readable'. Machine readable output is > available as an option and the current mixed output formatting is neither > friendly for human or machine reading and can be replaced. > !image-2023-11-27-13-49-14-247.png! > *Not a goal now (consider a follow up Jira):* > Fix inconsistencies with KiB/MiB/GiB and KB/MB/GB formatting: > * gcstats - uses MB > * getcompactionthroughput - uses MB/s > * getstreamthroughput - uses MB/s > * info - uses MiB/GiB -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
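The KB-vs-KiB distinction under discussion comes down to the divisor used when formatting byte counts; a minimal sketch (the variable names are illustrative, not from the tooling):

```java
// Decimal (KB, /1000) vs binary (KiB, /1024) size units,
// the distinction behind the tablestats formatting inconsistencies.
public class SizeUnits {
    public static void main(String[] args) {
        long bytes = 1_048_576;        // exactly 1 MiB of data
        long kb = bytes / 1000;        // decimal kilobytes
        long kib = bytes / 1024;       // binary kibibytes
        System.out.println(kb + " KB vs " + kib + " KiB");
    }
}
```

The same byte count reads about 2% larger in decimal units, which is why mixing the two across nodetool subcommands is confusing.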
[jira] [Updated] (CASSANDRA-19179) BLOG - Apache Cassandra 5.0 Features: Vector Search
[ https://issues.apache.org/jira/browse/CASSANDRA-19179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Au updated CASSANDRA-19179: Attachment: blog-index.png blog-post.png catalyst-page.png > BLOG - Apache Cassandra 5.0 Features: Vector Search > --- > > Key: CASSANDRA-19179 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19179 > Project: Cassandra > Issue Type: Task >Reporter: Paul Au >Priority: Normal > Attachments: blog-index.png, blog-post.png, catalyst-page.png > > > This ticket is to add a blog post to the site. The change includes: > * Adding the post: Apache Cassandra 5.0 Features: Vector Search > * Updating the blog index page with the new post > * A small text change to the Apache Catalyst program page.
[jira] [Created] (CASSANDRA-19179) BLOG - Apache Cassandra 5.0 Features: Vector Search
Paul Au created CASSANDRA-19179: --- Summary: BLOG - Apache Cassandra 5.0 Features: Vector Search Key: CASSANDRA-19179 URL: https://issues.apache.org/jira/browse/CASSANDRA-19179 Project: Cassandra Issue Type: Task Reporter: Paul Au This ticket is to add a blog post to the site. The change includes: * Adding the post: Apache Cassandra 5.0 Features: Vector Search * Updating the blog index page with the new post * A small text change to the Apache Catalyst program page.
[jira] [Comment Edited] (CASSANDRA-18762) Repair triggers OOM with direct buffer memory
[ https://issues.apache.org/jira/browse/CASSANDRA-18762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793935#comment-17793935 ] Brad Schoening edited comment on CASSANDRA-18762 at 12/6/23 9:13 PM: - An update: we are still seeing this occur on a cluster. They have configured native_transport_max_thread = 256. A large number of repair Merkle trees precedes the OOM crash.

System: Apache Cassandra 4.0.10, a 16GB heap, 64GB RAM, 8 vCPUs, file_cache_size_in_mb = 4096, G1HeapRegionSize=16M

!image-2023-12-06-15-58-55-007.png!

The graph above is missing time ticks, but the spike occurs at 06:16:00.

!image-2023-12-06-15-29-31-491.png!

Summary of the cassandra log:
11:17:10,289 [INFO ] RepairSession.java:202 - [repair #838c24c0-935f-11ee-97ba-d79b6a12ccbe] Received merkle tree for table1 from ... [repeated 35 times]
11:17:17,155 [INFO ] [Service Thread] cluster_id=99 ip_address=10.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 694ms. G1 Eden Space: 8925478912 -> 0; G1 Old Gen: 2196360784 -> 1133473904; G1 Survivor Space: 385875968 -> 0;
11:17:17,668 [INFO ] [Service Thread] cluster_id=99 ip_address=10.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 505ms. G1 Old Gen: 1133473904 -> 1133526408;
11:17:22,420 [INFO ] [ScheduledTasks:1] cluster_id=99 ip_address=10.0.0.1 NoSpamLogger.java:92 - Some operations were slow, details available at debug level (debug.log)
11:17:22,417 [INFO ] [Service Thread] cluster_id=99 ip_address=10.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 787ms. G1 Eden Space: 16777216 -> 0; G1 Old Gen: 1133526408 -> 1133545448;
11:17:23,213 [WARN ] [Service Thread] cluster_id=99 ip_address=10.0.0.1 GCInspector.java:292 - G1 Old Generation GC in 4742ms. G1 Old Gen: 1133545448 -> 1133581144;
11:17:23,217 [INFO ] [Service Thread] cluster_id=99 ip_address=10.0.0.1 StatusLogger.java:65 [abbr]
StatusLogger.java:65 - Pool Name Active Pending Completed Blocked All Time Blocked
StatusLogger.java:69 - ReadStage 1 0 48261572 0 0
StatusLogger.java:69 - Native-Transport-Requests 1 0 395189663 0 0
StatusLogger.java:69 - ValidationExecutor 4 73 110086 0 0
StatusLogger.java:69 - AntiEntropyStage 1 0 352704 0 0
11:17:24,114 [INFO ] [Service Thread] cluster_id=99 ip_address=10.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 853ms. G1 Eden Space: 117440512 -> 0; G1 Old Gen: 1133747360 -> 1133758448;
11:17:24,564 [ERROR] [Messaging-EventLoop-3-5] cluster_id=99 ip_address=10.0.0.1 JVMStabilityInspector.java:102 - OutOfMemory error letting the JVM handle the error:
java.lang.OutOfMemoryError: Direct buffer memory
 at java.base/java.nio.Bits.reserveMemory(Bits.java:175)
 at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118)
 at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:318)
 ... etc
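One detail worth noting in the GCInspector lines above: the G1 old generation hovers near ~1.13 GB across every collection, far below the 16 GB heap, which points away from heap exhaustion and toward off-heap (direct) memory. A hypothetical helper (illustrative only, not part of Cassandra) to pull those transitions out of a log:

```python
import re

def old_gen_transitions(log_lines):
    """Extract (before, after) byte counts from 'G1 Old Gen: A -> B' log fragments."""
    pat = re.compile(r"G1 Old Gen: (\d+) -> (\d+)")
    return [(int(a), int(b)) for line in log_lines for a, b in pat.findall(line)]

# Two of the GCInspector lines quoted in the comment above:
lines = [
    "GCInspector.java:294 - G1 Old Generation GC in 505ms. G1 Old Gen: 1133473904 -> 1133526408;",
    "GCInspector.java:292 - G1 Old Generation GC in 4742ms. G1 Old Gen: 1133545448 -> 1133581144;",
]
# Each transition moves by well under one 16M G1 region: the heap is stable
# while the process still dies with a *direct* buffer OutOfMemoryError.
```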
[jira] [Updated] (CASSANDRA-18762) Repair triggers OOM with direct buffer memory
[ https://issues.apache.org/jira/browse/CASSANDRA-18762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brad Schoening updated CASSANDRA-18762: --- Resolution: (was: Cannot Reproduce) Status: Open (was: Resolved) > Repair triggers OOM with direct buffer memory > - > > Key: CASSANDRA-18762 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18762 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Brad Schoening >Priority: Normal > Labels: OutOfMemoryError > Attachments: Cluster-dm-metrics-1.PNG, > image-2023-12-06-15-28-05-459.png, image-2023-12-06-15-29-31-491.png, > image-2023-12-06-15-58-55-007.png > > > We are seeing repeated failures of nodes with 16GB of heap and the same size > (16GB) for direct memory (derived from -Xms). This seems to be related to > CASSANDRA-15202, which moved Merkle trees off-heap in 4.0. Using Cassandra > 4.0.6. > {noformat} > 2023-08-09 04:30:57,470 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e55a3b0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_a from > /169.102.200.241:7000 > 2023-08-09 04:30:57,567 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e0d2900-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from > /169.93.192.29:7000 > 2023-08-09 04:30:57,568 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e1dcad0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_c from > /169.104.171.134:7000 > 2023-08-09 04:30:57,591 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e69a0e0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from > /169.79.232.67:7000 > 2023-08-09 04:30:57,876 [INFO ] [Service Thread] cluster_id=101 > ip_address=169.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 282ms. 
> Compressed Class Space: 8444560 -> 8372152; G1 Eden Space: 7809794048 -> 0; > G1 Old Gen: 1453478400 -> 820942800; G1 Survivor Space: 419430400 -> 0; > Metaspace: 80411136 -> 80176528 > 2023-08-09 04:30:58,387 [ERROR] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 JVMStabilityInspector.java:102 - OutOfMemory error > letting the JVM handle the error: > java.lang.OutOfMemoryError: Direct buffer memory > at java.base/java.nio.Bits.reserveMemory(Bits.java:175) > at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118) > at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:318) > at org.apache.cassandra.utils.MerkleTree.allocate(MerkleTree.java:742) > at > org.apache.cassandra.utils.MerkleTree.deserializeOffHeap(MerkleTree.java:780) > at org.apache.cassandra.utils.MerkleTree.deserializeTree(MerkleTree.java:751) > at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:720) > at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:698) > at > org.apache.cassandra.utils.MerkleTrees$MerkleTreesSerializer.deserialize(MerkleTrees.java:416) > at > org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:100) > at > org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:84) > at > org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:782) > at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:642) > at > org.apache.cassandra.net.InboundMessageHandler$LargeMessage.deserialize(InboundMessageHandler.java:364) > at > org.apache.cassandra.net.InboundMessageHandler$LargeMessage.access$1100(InboundMessageHandler.java:317) > at > org.apache.cassandra.net.InboundMessageHandler$ProcessLargeMessage.provideMessage(InboundMessageHandler.java:504) > at > org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:429) > at > 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:834){noformat} > > -XX:+AlwaysPreTouch > -XX:+CrashOnOutOfMemoryError > -XX:+ExitOnOutOfMemoryError > -XX:+HeapDumpOnOutOfMemoryError > -XX:+ParallelRefProcEnabled > -XX:+PerfDisableSharedMem > -XX:+ResizeTLAB > -XX:+UseG1GC > -XX:+UseNUMA > -XX:+UseTLAB > -XX:+UseThreadPriorities >
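For context on the stack trace above: java.nio.Bits.reserveMemory enforces the JVM's direct-memory cap (-XX:MaxDirectMemorySize, or a heap-derived default), so ByteBuffer.allocateDirect fails once cumulative off-heap reservations exceed that cap, regardless of how much heap is free. A rough Python model of that accounting (purely illustrative; the class and method names are assumptions, not JVM code):

```python
class DirectMemoryPool:
    """Toy model of JVM direct-memory accounting (Bits.reserveMemory)."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes  # analogous to -XX:MaxDirectMemorySize
        self.reserved = 0

    def allocate_direct(self, size):
        # Reservation is checked against the running total, not against free heap.
        if self.reserved + size > self.max_bytes:
            raise MemoryError("Direct buffer memory")  # analogous to java.lang.OutOfMemoryError
        self.reserved += size
        return bytearray(size)  # stand-in for a DirectByteBuffer

# A 16 GiB cap, as in the cluster reported here:
pool = DirectMemoryPool(16 * 1024**3)
```

Under this model, many concurrent ValidationResponse messages each deserializing a Merkle tree off-heap can push the running total past the cap even while the heap old generation stays flat.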
[jira] [Commented] (CASSANDRA-18762) Repair triggers OOM with direct buffer memory
[ https://issues.apache.org/jira/browse/CASSANDRA-18762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793935#comment-17793935 ] Brad Schoening commented on CASSANDRA-18762: An update: we are still seeing this occur on a cluster. They have configured native_transport_max_thread = 256. A large number of repair Merkle trees precedes the OOM crash. System: Apache Cassandra 4.0.10, a 16GB heap, 64GB RAM, 8 vCPUs, file_cache_size_in_mb = 4096, G1HeapRegionSize=16M !image-2023-12-06-15-58-55-007.png! The graph above is missing time ticks, but the spike occurs at 06:16:00. !image-2023-12-06-15-29-31-491.png! Summary of the cassandra log: 11:17:10,289 [INFO ] RepairSession.java:202 - [repair #838c24c0-935f-11ee-97ba-d79b6a12ccbe] Received merkle tree for table1 from ... [repeated 35 times] 11:17:17,155 [INFO ] [Service Thread] cluster_id=99 ip_address=10.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 694ms. G1 Eden Space: 8925478912 -> 0; G1 Old Gen: 2196360784 -> 1133473904; G1 Survivor Space: 385875968 -> 0; 11:17:17,668 [INFO ] [Service Thread] cluster_id=99 ip_address=10.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 505ms. G1 Old Gen: 1133473904 -> 1133526408; 11:17:22,420 [INFO ] [ScheduledTasks:1] cluster_id=99 ip_address=10.0.0.1 NoSpamLogger.java:92 - Some operations were slow, details available at debug level (debug.log) 11:17:22,417 [INFO ] [Service Thread] cluster_id=99 ip_address=10.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 787ms. G1 Eden Space: 16777216 -> 0; G1 Old Gen: 1133526408 -> 1133545448; 11:17:23,213 [WARN ] [Service Thread] cluster_id=99 ip_address=10.0.0.1 GCInspector.java:292 - G1 Old Generation GC in 4742ms. G1 Old Gen: 1133545448 -> 1133581144; 11:17:23,217 [INFO ] [Service Thread] cluster_id=99 ip_address=10.0.0.1 StatusLogger.java:65 [elided] 11:17:24,114 [INFO ] [Service Thread] cluster_id=99 ip_address=10.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 853ms. 
G1 Eden Space: 117440512 -> 0; G1 Old Gen: 1133747360 -> 1133758448; 11:17:24,564 [ERROR] [Messaging-EventLoop-3-5] cluster_id=99 ip_address=10.0.0.1 JVMStabilityInspector.java:102 - OutOfMemory error letting the JVM handle the error: java.lang.OutOfMemoryError: Direct buffer memory at java.base/java.nio.Bits.reserveMemory(Bits.java:175) at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118) at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:318) ... etc > Repair triggers OOM with direct buffer memory
[jira] [Commented] (CASSANDRA-19178) Cluster upgrade 3.x -> 4.x fails with no internode encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-19178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793934#comment-17793934 ] Aldo commented on CASSANDRA-19178: -- Thanks, I moved the question to StackExchange [here|https://dba.stackexchange.com/questions/333799/cassandra-cluster-upgrade-3-x-4-x-fails-with-internode-encryption-none]. > Cluster upgrade 3.x -> 4.x fails with no internode encryption > - > > Key: CASSANDRA-19178 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19178 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Gossip >Reporter: Aldo >Priority: Normal > Attachments: cassandra7.downgrade.log, cassandra7.log > > > I have a Docker swarm cluster with 3 distinct Cassandra services (named > {_}cassandra7{_}, {_}cassandra8{_}, {_}cassandra9{_}) running on 3 different > servers. The 3 services are running the version 3.11.16, using the official > Cassandra image 3.11.16 on Docker Hub. The first service is configured just > with the following environment variables > {code:java} > CASSANDRA_LISTEN_ADDRESS="tasks.cassandra7" > CASSANDRA_SEEDS="tasks.cassandra7,tasks.cassandra9" {code} > which in turn, at startup, modifies the {_}cassandra.yaml{_}. So for instance > the _cassandra.yaml_ for the first service contains the following (and the > rest is the image default): > {code:java} > # grep tasks /etc/cassandra/cassandra.yaml > - seeds: "tasks.cassandra7,tasks.cassandra9" > listen_address: tasks.cassandra7 > broadcast_address: tasks.cassandra7 > broadcast_rpc_address: tasks.cassandra7 {code} > Other services (8 and 9) have a similar configuration, obviously with a > different {{CASSANDRA_LISTEN_ADDRESS}} ({{tasks.cassandra8}} and > {{tasks.cassandra9}}). 
> The cluster is running smoothly and all the nodes are perfectly able to > rejoin the cluster whichever event occurs, thanks to the Docker Swarm > {{tasks.cassandraXXX}} "hostname": I can kill a Docker container waiting for > Docker swarm to restart it, force update it in order to force a restart, > scale the service to 0 and then 1, restart an entire server, turn off and > then turn on all the 3 servers. I have never found an issue with this. > I also just completed a full upgrade of the cluster from version 2.2.8 to > 3.11.16 (simply upgrading the Docker official image associated with the > services) without issues. I was also able, thanks to a 2.2.8 snapshot on each > server, to perform a full downgrade to 2.2.8 and back to 3.11.16 again. I > finally issued a {{nodetool upgradesstables}} on all nodes, so my SSTables > now have the {{me-*}} prefix. > > The problem I'm facing right now is the upgrade from 3.11.16 to 4.x. The > procedure that I follow is very simple: > # I start from the _cassandra7_ service (which is a seed node) > # {{nodetool drain}} > # Wait for the {{DRAINING ... DRAINED}} messages to appear in the log > # Upgrade the Docker image of _cassandra7_ to the official 4.1.3 version > The procedure is exactly the same I followed for the upgrade 2.2.8 --> > 3.11.16, obviously with a different version at step 4. Unfortunately the > upgrade 3.x --> 4.x is not working: the _cassandra7_ service restarts and > attempts to communicate with the other seed node ({_}cassandra9{_}), but the > log of _cassandra7_ shows the following: > {code:java} > INFO [Messaging-EventLoop-3-3] 2023-12-06 17:15:04,727 > OutboundConnectionInitiator.java:390 - Failed to connect to peer > tasks.cassandra9/10.0.2.196:7000(tasks.cassandra9/10.0.2.196:7000) > io.netty.channel.unix.Errors$NativeIoException: readAddress(..) 
failed: > Connection reset by peer{code} > The relevant part of the log, related to the missing internode communication, > is attached in _cassandra7.log_. > In the log of _cassandra9_ there is nothing after the abovementioned step #4. > So only _cassandra7_ is saying something in the logs. > I tried with multiple versions (4.0.11 but also 4.0.0) but the outcome is > always the same. Of course, when I follow steps 1..3, then restore the 3.x > snapshot and finally perform step #4 using the official 3.11.16 version, > node 7 restarts correctly and joins the cluster. I attached the relevant > part of the log (see {_}cassandra7.downgrade.log{_}) where you can see that > nodes 7 and 9 can communicate. > I suspect this could be related to port 7000 now (with Cassandra 4.x) > supporting both encrypted and unencrypted traffic. As stated previously, I'm > using the untouched official Cassandra images, so my whole cluster, inside the > Docker Swarm, is not (and has never been) configured with encryption. > I can also add the following: if I perform the 4 above steps also for the > _cassandra9_ and _cassandra8_ services, in the end the cluster works. But > this
[jira] [Updated] (CASSANDRA-18762) Repair triggers OOM with direct buffer memory
[ https://issues.apache.org/jira/browse/CASSANDRA-18762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brad Schoening updated CASSANDRA-18762: --- Attachment: image-2023-12-06-15-58-55-007.png > Repair triggers OOM with direct buffer memory > - > > Key: CASSANDRA-18762 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18762 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Brad Schoening >Priority: Normal > Labels: OutOfMemoryError > Attachments: Cluster-dm-metrics-1.PNG, > image-2023-12-06-15-28-05-459.png, image-2023-12-06-15-29-31-491.png, > image-2023-12-06-15-58-55-007.png > > > We are seeing repeated failures of nodes with 16GB of heap and the same size > (16GB) for direct memory (derived from -Xms). This seems to be related to > CASSANDRA-15202 which moved Merkle trees off-heap in 4.0. Using Cassandra > 4.0.6. > {noformat} > 2023-08-09 04:30:57,470 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e55a3b0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_a from > /169.102.200.241:7000 > 2023-08-09 04:30:57,567 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e0d2900-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from > /169.93.192.29:7000 > 2023-08-09 04:30:57,568 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e1dcad0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_c from > /169.104.171.134:7000 > 2023-08-09 04:30:57,591 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e69a0e0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from > /169.79.232.67:7000 > 2023-08-09 04:30:57,876 [INFO ] [Service Thread] cluster_id=101 > ip_address=169.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 282ms. 
> Compressed Class Space: 8444560 -> 8372152; G1 Eden Space: 7809794048 -> 0; > G1 Old Gen: 1453478400 -> 820942800; G1 Survivor Space: 419430400 -> 0; > Metaspace: 80411136 -> 80176528 > 2023-08-09 04:30:58,387 [ERROR] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 JVMStabilityInspector.java:102 - OutOfMemory error > letting the JVM handle the error: > java.lang.OutOfMemoryError: Direct buffer memory > at java.base/java.nio.Bits.reserveMemory(Bits.java:175) > at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118) > at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:318) > at org.apache.cassandra.utils.MerkleTree.allocate(MerkleTree.java:742) > at > org.apache.cassandra.utils.MerkleTree.deserializeOffHeap(MerkleTree.java:780) > at org.apache.cassandra.utils.MerkleTree.deserializeTree(MerkleTree.java:751) > at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:720) > at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:698) > at > org.apache.cassandra.utils.MerkleTrees$MerkleTreesSerializer.deserialize(MerkleTrees.java:416) > at > org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:100) > at > org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:84) > at > org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:782) > at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:642) > at > org.apache.cassandra.net.InboundMessageHandler$LargeMessage.deserialize(InboundMessageHandler.java:364) > at > org.apache.cassandra.net.InboundMessageHandler$LargeMessage.access$1100(InboundMessageHandler.java:317) > at > org.apache.cassandra.net.InboundMessageHandler$ProcessLargeMessage.provideMessage(InboundMessageHandler.java:504) > at > org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:429) > at > 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:834){noformat} > > -XX:+AlwaysPreTouch > -XX:+CrashOnOutOfMemoryError > -XX:+ExitOnOutOfMemoryError > -XX:+HeapDumpOnOutOfMemoryError > -XX:+ParallelRefProcEnabled > -XX:+PerfDisableSharedMem > -XX:+ResizeTLAB > -XX:+UseG1GC > -XX:+UseNUMA > -XX:+UseTLAB > -XX:+UseThreadPriorities > -XX:-UseBiasedLocking >
[jira] [Updated] (CASSANDRA-18762) Repair triggers OOM with direct buffer memory
[ https://issues.apache.org/jira/browse/CASSANDRA-18762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brad Schoening updated CASSANDRA-18762: --- Attachment: image-2023-12-06-15-29-31-491.png > Repair triggers OOM with direct buffer memory > - > > Key: CASSANDRA-18762 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18762 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Brad Schoening >Priority: Normal > Labels: OutOfMemoryError > Attachments: Cluster-dm-metrics-1.PNG, > image-2023-12-06-15-28-05-459.png, image-2023-12-06-15-29-31-491.png > > > We are seeing repeated failures of nodes with 16GB of heap and the same size > (16GB) for direct memory (derived from -Xms). This seems to be related to > CASSANDRA-15202 which moved Merkle trees off-heap in 4.0. Using Cassandra > 4.0.6. > {noformat} > 2023-08-09 04:30:57,470 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e55a3b0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_a from > /169.102.200.241:7000 > 2023-08-09 04:30:57,567 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e0d2900-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from > /169.93.192.29:7000 > 2023-08-09 04:30:57,568 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e1dcad0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_c from > /169.104.171.134:7000 > 2023-08-09 04:30:57,591 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e69a0e0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from > /169.79.232.67:7000 > 2023-08-09 04:30:57,876 [INFO ] [Service Thread] cluster_id=101 > ip_address=169.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 282ms. 
> Compressed Class Space: 8444560 -> 8372152; G1 Eden Space: 7809794048 -> 0; > G1 Old Gen: 1453478400 -> 820942800; G1 Survivor Space: 419430400 -> 0; > Metaspace: 80411136 -> 80176528 > 2023-08-09 04:30:58,387 [ERROR] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 JVMStabilityInspector.java:102 - OutOfMemory error > letting the JVM handle the error: > java.lang.OutOfMemoryError: Direct buffer memory > at java.base/java.nio.Bits.reserveMemory(Bits.java:175) > at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118) > at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:318) > at org.apache.cassandra.utils.MerkleTree.allocate(MerkleTree.java:742) > at > org.apache.cassandra.utils.MerkleTree.deserializeOffHeap(MerkleTree.java:780) > at org.apache.cassandra.utils.MerkleTree.deserializeTree(MerkleTree.java:751) > at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:720) > at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:698) > at > org.apache.cassandra.utils.MerkleTrees$MerkleTreesSerializer.deserialize(MerkleTrees.java:416) > at > org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:100) > at > org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:84) > at > org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:782) > at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:642) > at > org.apache.cassandra.net.InboundMessageHandler$LargeMessage.deserialize(InboundMessageHandler.java:364) > at > org.apache.cassandra.net.InboundMessageHandler$LargeMessage.access$1100(InboundMessageHandler.java:317) > at > org.apache.cassandra.net.InboundMessageHandler$ProcessLargeMessage.provideMessage(InboundMessageHandler.java:504) > at > org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:429) > at > 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:834){noformat} > > -XX:+AlwaysPreTouch > -XX:+CrashOnOutOfMemoryError > -XX:+ExitOnOutOfMemoryError > -XX:+HeapDumpOnOutOfMemoryError > -XX:+ParallelRefProcEnabled > -XX:+PerfDisableSharedMem > -XX:+ResizeTLAB > -XX:+UseG1GC > -XX:+UseNUMA > -XX:+UseTLAB > -XX:+UseThreadPriorities > -XX:-UseBiasedLocking >
[jira] [Updated] (CASSANDRA-18762) Repair triggers OOM with direct buffer memory
[ https://issues.apache.org/jira/browse/CASSANDRA-18762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brad Schoening updated CASSANDRA-18762: --- Attachment: image-2023-12-06-15-28-05-459.png > Repair triggers OOM with direct buffer memory > - > > Key: CASSANDRA-18762 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18762 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Brad Schoening >Priority: Normal > Labels: OutOfMemoryError > Attachments: Cluster-dm-metrics-1.PNG, > image-2023-12-06-15-28-05-459.png > > > We are seeing repeated failures of nodes with 16GB of heap and the same size > (16GB) for direct memory (derived from -Xms). This seems to be related to > CASSANDRA-15202 which moved Merkle trees off-heap in 4.0. Using Cassandra > 4.0.6. > {noformat} > 2023-08-09 04:30:57,470 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e55a3b0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_a from > /169.102.200.241:7000 > 2023-08-09 04:30:57,567 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e0d2900-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from > /169.93.192.29:7000 > 2023-08-09 04:30:57,568 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e1dcad0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_c from > /169.104.171.134:7000 > 2023-08-09 04:30:57,591 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e69a0e0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from > /169.79.232.67:7000 > 2023-08-09 04:30:57,876 [INFO ] [Service Thread] cluster_id=101 > ip_address=169.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 282ms. 
> Compressed Class Space: 8444560 -> 8372152; G1 Eden Space: 7809794048 -> 0; > G1 Old Gen: 1453478400 -> 820942800; G1 Survivor Space: 419430400 -> 0; > Metaspace: 80411136 -> 80176528 > 2023-08-09 04:30:58,387 [ERROR] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 JVMStabilityInspector.java:102 - OutOfMemory error > letting the JVM handle the error: > java.lang.OutOfMemoryError: Direct buffer memory > at java.base/java.nio.Bits.reserveMemory(Bits.java:175) > at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118) > at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:318) > at org.apache.cassandra.utils.MerkleTree.allocate(MerkleTree.java:742) > at > org.apache.cassandra.utils.MerkleTree.deserializeOffHeap(MerkleTree.java:780) > at org.apache.cassandra.utils.MerkleTree.deserializeTree(MerkleTree.java:751) > at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:720) > at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:698) > at > org.apache.cassandra.utils.MerkleTrees$MerkleTreesSerializer.deserialize(MerkleTrees.java:416) > at > org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:100) > at > org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:84) > at > org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:782) > at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:642) > at > org.apache.cassandra.net.InboundMessageHandler$LargeMessage.deserialize(InboundMessageHandler.java:364) > at > org.apache.cassandra.net.InboundMessageHandler$LargeMessage.access$1100(InboundMessageHandler.java:317) > at > org.apache.cassandra.net.InboundMessageHandler$ProcessLargeMessage.provideMessage(InboundMessageHandler.java:504) > at > org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:429) > at > 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:834){noformat} > > -XX:+AlwaysPreTouch > -XX:+CrashOnOutOfMemoryError > -XX:+ExitOnOutOfMemoryError > -XX:+HeapDumpOnOutOfMemoryError > -XX:+ParallelRefProcEnabled > -XX:+PerfDisableSharedMem > -XX:+ResizeTLAB > -XX:+UseG1GC > -XX:+UseNUMA > -XX:+UseTLAB > -XX:+UseThreadPriorities > -XX:-UseBiasedLocking > -XX:CompileCommandFile=/opt/nosql/clusters/cassandra-101/conf/hotspot_compiler >
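One plausible reading of the OOM reported above: since CASSANDRA-15202, merkle trees from validation responses are deserialized off-heap via {{ByteBuffer.allocateDirect}}, and every such allocation is reserved against the JVM's fixed direct-memory cap, so enough concurrent repair sessions can exhaust it even when the heap is healthy. The accounting can be illustrated with a toy model (all sizes and session counts below are illustrative assumptions, not measurements from this ticket):

```python
class DirectMemoryPool:
    """Toy model of the JVM's direct-memory accounting
    (roughly what java.nio.Bits.reserveMemory enforces)."""

    def __init__(self, limit_bytes: int):
        self.limit = limit_bytes
        self.reserved = 0

    def allocate_direct(self, size: int) -> None:
        # Mirrors the check that raises java.lang.OutOfMemoryError:
        # "Direct buffer memory" when the cap would be exceeded.
        if self.reserved + size > self.limit:
            raise MemoryError("Direct buffer memory")
        self.reserved += size

# 16 GiB cap, matching the 16GB heap / 16GB direct pairing in the report.
pool = DirectMemoryPool(16 * 1024**3)

# Hypothetical: overlapping repair sessions each deserializing a large
# off-heap merkle tree (3 GiB each is purely illustrative).
tree_size = 3 * 1024**3
ok, failed = 0, 0
for _ in range(6):
    try:
        pool.allocate_direct(tree_size)
        ok += 1
    except MemoryError:
        failed += 1
```

Note that when {{-XX:MaxDirectMemorySize}} is not set explicitly, HotSpot typically defaults the cap to the maximum heap size, which would explain the "same size (16GB) for direct memory" observation.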
[jira] [Updated] (CASSANDRA-19104) Standardize tablestats formatting and data units
[ https://issues.apache.org/jira/browse/CASSANDRA-19104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brad Schoening updated CASSANDRA-19104: --- Description: Tablestats reports output in plaintext, JSON or YAML. The human readable output currently has a mix of KiB, bytes with inconsistent spacing Simplify and default output to 'human readable'. Machine readable output is available as an option and the current mixed output formatting is neither friendly for human or machine reading and can be replaced. !image-2023-11-27-13-49-14-247.png! *Not a goal now (consider a follow up Jira):* Fix inconsistencies with KiB/MiB/GiB and KB/MB/GB formatting: * gcstats - uses MB * getcompactionthroughput - uses MB/s * getstreamthroughput - uses MB/s * info - uses MiB/GiB was: Tablestats reports output in plaintext, JSON or YAML. The human readable output currently has a mix of KiB, bytes with inconsistent spacing Simplify and defaulting output to 'human readable'. Machine readable output is available as an option and the current mixed output formatting is neither friendly for human or machine reading and can be replaced. !image-2023-11-27-13-49-14-247.png! *Not a goal now (consider a follow up Jira):* Fix inconsistencies with KiB/MiB/GiB and KB/MB/GB formatting: * gcstats - uses MB * getcompactionthroughput - uses MB/s * getstreamthroughput - uses MB/s * info - uses MiB/GiB > Standardize tablestats formatting and data units > > > Key: CASSANDRA-19104 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19104 > Project: Cassandra > Issue Type: Bug > Components: Tool/nodetool >Reporter: Brad Schoening >Assignee: Leo Toff >Priority: Normal > > Tablestats reports output in plaintext, JSON or YAML. The human readable > output currently has a mix of KiB, bytes with inconsistent spacing > Simplify and default output to 'human readable'. 
Machine readable output is > available as an option and the current mixed output formatting is neither > friendly for human or machine reading and can be replaced. > !image-2023-11-27-13-49-14-247.png! > *Not a goal now (consider a follow up Jira):* > Fix inconsistencies with KiB/MiB/GiB and KB/MB/GB formatting: > * gcstats - uses MB > * getcompactionthroughput - uses MB/s > * getstreamthroughput - uses MB/s > * info - uses MiB/GiB -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19104) Standardize tablestats formatting and data units
[ https://issues.apache.org/jira/browse/CASSANDRA-19104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793912#comment-17793912 ] Brad Schoening commented on CASSANDRA-19104: [~zaaath] this link explains the difference, so it just depends on how it's calculated: size/1000 (KB) or size/1024 (KiB). [What exactly are the storage/memory Units KiB/MiB/GiB/TiB|https://community.hpe.com/t5/hpe-primera-storage/what-exactly-are-the-storage-memory-units-kib-mib-gib-tib-in/td-p/7123450] > Standardize tablestats formatting and data units > > > Key: CASSANDRA-19104 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19104 > Project: Cassandra > Issue Type: Bug > Components: Tool/nodetool >Reporter: Brad Schoening >Assignee: Leo Toff >Priority: Normal > > Tablestats reports output in plaintext, JSON or YAML. The human readable > output currently has a mix of KiB, bytes with inconsistent spacing > Simplify and defaulting output to 'human readable'. Machine readable output > is available as an option and the current mixed output formatting is neither > friendly for human or machine reading and can be replaced. > !image-2023-11-27-13-49-14-247.png! > *Not a goal now (consider a follow up Jira):* > Fix inconsistencies with KiB/MiB/GiB and KB/MB/GB formatting: > * gcstats - uses MB > * getcompactionthroughput - uses MB/s > * getstreamthroughput - uses MB/s > * info - uses MiB/GiB -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
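The decimal-vs-binary distinction in the comment above can be made concrete with two small hypothetical helpers (not nodetool's actual formatting code):

```python
def to_kb(n_bytes: int) -> float:
    """Decimal kilobytes (SI prefix): size / 1000."""
    return n_bytes / 1000

def to_kib(n_bytes: int) -> float:
    """Binary kibibytes (IEC prefix): size / 1024."""
    return n_bytes / 1024

# The same byte count reads differently under each convention:
one_mib = 1024 * 1024   # 1,048,576 bytes
kb = to_kb(one_mib)     # 1048.576 KB
kib = to_kib(one_mib)   # 1024.0 KiB
```

The roughly 2.4% gap per prefix step (and about 7.4% at the GiB/GB level) is why mixing the two unit families in one tool's output is confusing.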
[jira] [Commented] (CASSANDRA-18857) Allow CQL client certificate authentication to work without sending an AUTHENTICATE request
[ https://issues.apache.org/jira/browse/CASSANDRA-18857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793913#comment-17793913 ] Andy Tolbert commented on CASSANDRA-18857: -- Apologies for the late follow up here. Realized that for this to fully work [CASSANDRA-18811] is needed. I've created a [pull request|https://github.com/apache/cassandra/pull/2969] that I will update as soon as [CASSANDRA-18811] lands in trunk. > Allow CQL client certificate authentication to work without sending an > AUTHENTICATE request > --- > > Key: CASSANDRA-18857 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18857 > Project: Cassandra > Issue Type: Improvement > Components: Feature/Encryption >Reporter: Andy Tolbert >Priority: Normal > Time Spent: 50m > Remaining Estimate: 0h > > Currently when using {{MutualTlsAuthenticator}} or > {{MutualTlsWithPasswordFallbackAuthenticator}} a client is prompted with an > {{AUTHENTICATE}} message to which they must respond with an {{AUTH_RESPONSE}} > (e.g. a user name and password). This shouldn't be needed as the role can be > identified using only the certificate. > To address this, we could add the capability to authenticate early in > processing of a {{STARTUP}} message if we can determine that both the > configured authenticator supports certificate authentication and a client > certificate was provided. If the certificate can be authenticated, a > {{READY}} response is returned, otherwise an {{ERROR}} is returned. > This change can be done in a fully backwards compatible way and requires > no protocol or driver changes; I will supply a patch shortly! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-19047) Guardrail for the number of tables is not working
[ https://issues.apache.org/jira/browse/CASSANDRA-19047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793907#comment-17793907 ] Ekaterina Dimitrova edited comment on CASSANDRA-19047 at 12/6/23 7:31 PM: -- Apologize for the late response, I missed the latest notifications. {quote}Maybe we can simply remove the {{@Deprecated}} tag so {{YamlConfigurationLoader}} doesn't complain about it, and manually throw the deprecation warning if its value is anything different than the default and {{Integer#MAX_VALUE}} {quote} Only if the value is not set to the MAX_VALUE, I think. If it is set to the default - I think people should be warned. If they want to stop from emitting deprecation warning - then they should set it to MAX_VALUE and/or move to guardrails. This is a way to prepare them for the future as that property can be removed in the next major. We can add a section in the NEWS.txt to explain. was (Author: e.dimitrova): Apologize for the late response, I missed the latest notifications. {quote}Maybe we can simply remove the {{@Deprecated}} tag so {{YamlConfigurationLoader}} doesn't complain about it, and manually throw the deprecation warning if its value is anything different than the default and {{Integer#MAX_VALUE}} {quote} Only if the value is not set to the MAX_VALUE, I think. If it is set to the default - I think people should be warned. If they want to stop from emitting it - then they should set it to MAX_VALUE and/or move to guardrails. This is a way to prepare them for the future as that property can be removed in the next major. We can add a section in the NEWS.txt to explain. 
> Guardrail for the number of tables is not working > - > > Key: CASSANDRA-19047 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19047 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Schema >Reporter: Mohammad Aburadeh >Priority: Urgent > > Hi, > We installed Cassandra 4.1.3 and we got the following warning when creating > more than 150 tables: > {code:java} > WARN [Native-Transport-Requests-6] 2023-11-21 18:35:24,585 > CreateTableStatement.java:421 - Cluster already contains 161 tables in 6 > keyspaces. Having a large number of tables will significantly slow down > schema dependent cluster operations. {code} > I tried to disable "table_count_warn_threshold" by setting its value to "-1" > but that did not work. > Then I tried to set the guardrail for number of tables to "-1" to disable the > above but did not work as well. It seems there is no way to disable checking > the number of tables. > Also, I tried to set "tables_warn_threshold" to a value less than > "tables_count_warn_threshold", it seems Cassandra always uses > "tables_count_warn_threshold" when throwing the warning. > *Two issues in Cassandra 4.1.3:* > 1- There should be a way to disable this feature. Either by setting the > guardrail parameter to -1 or setting tables_count_warn_threshold to -1. > 2- The guardrail for number of tables should overwrite > tables_count_warn_threshold because I always get the following warning when I > try to increase the number of tables: > {code:java} > WARN [main] 2023-11-21 18:26:16,988 YamlConfigurationLoader.java:427 - > [keyspace_count_warn_threshold, table_count_warn_threshold] parameters have > been deprecated. They have new names and/or value format; For more > information, please refer to NEWS.txt {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19047) Guardrail for the number of tables is not working
[ https://issues.apache.org/jira/browse/CASSANDRA-19047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793907#comment-17793907 ] Ekaterina Dimitrova commented on CASSANDRA-19047: - Apologize for the late response, I missed the latest notifications. {quote}Maybe we can simply remove the {{@Deprecated}} tag so {{YamlConfigurationLoader}} doesn't complain about it, and manually throw the deprecation warning if its value is anything different than the default and {{Integer#MAX_VALUE}} {quote} Only if the value is not set to the MAX_VALUE, I think. If it is set to the default - I think people should be warned. If they want to stop from emitting it - then they should set it to MAX_VALUE and/or move to guardrails. This is a way to prepare them for the future as that property can be removed in the next major. We can add a section in the NEWS.txt to explain. > Guardrail for the number of tables is not working > - > > Key: CASSANDRA-19047 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19047 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Schema >Reporter: Mohammad Aburadeh >Priority: Urgent > > Hi, > We installed Cassandra 4.1.3 and we got the following warning when creating > more than 150 tables: > {code:java} > WARN [Native-Transport-Requests-6] 2023-11-21 18:35:24,585 > CreateTableStatement.java:421 - Cluster already contains 161 tables in 6 > keyspaces. Having a large number of tables will significantly slow down > schema dependent cluster operations. {code} > I tried to disable "table_count_warn_threshold" by setting its value to "-1" > but that did not work. > Then I tried to set the guardrail for number of tables to "-1" to disable the > above but did not work as well. It seems there is no way to disable checking > the number of tables. 
> Also, I tried to set "tables_warn_threshold" to a value less than > "tables_count_warn_threshold", it seems Cassandra always uses > "tables_count_warn_threshold" when throwing the warning. > *Two issues in Cassandra 4.1.3:* > 1- There should be a way to disable this feature. Either by setting the > guardrail parameter to -1 or setting tables_count_warn_threshold to -1. > 2- The guardrail for number of tables should overwrite > tables_count_warn_threshold because I always get the following warning when I > try to increase the number of tables: > {code:java} > WARN [main] 2023-11-21 18:26:16,988 YamlConfigurationLoader.java:427 - > [keyspace_count_warn_threshold, table_count_warn_threshold] parameters have > been deprecated. They have new names and/or value format; For more > information, please refer to NEWS.txt {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19104) Standardize tablestats formatting and data units
[ https://issues.apache.org/jira/browse/CASSANDRA-19104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brad Schoening updated CASSANDRA-19104: --- Description: Tablestats reports output in plaintext, JSON or YAML. The human readable output currently has a mix of KiB, bytes with inconsistent spacing Simplify and defaulting output to 'human readable'. Machine readable output is available as an option and the current mixed output formatting is neither friendly for human or machine reading and can be replaced. !image-2023-11-27-13-49-14-247.png! *Not a goal now (consider a follow up Jira):* Fix inconsistencies with KiB/MiB/GiB and KB/MB/GB formatting: * gcstats - uses MB * getcompactionthroughput - uses MB/s * getstreamthroughput - uses MB/s * info - uses MiB/GiB was: Tablestats reports output in plaintext, JSON or YAML. The human readable output currently has a mix of KiB, bytes with inconsistent spacing Considering simplifying and defaulting output to 'human readable'. Machine readable output is available as an option and the current mixed output formatting is neither friendly for human or machine reading. !image-2023-11-27-13-49-14-247.png! *Not a goal now (consider a follow up Jira):* Fix inconsistencies with KiB/MiB/GiB and KB/MB/GB formatting: * gcstats - uses MB * getcompactionthroughput - uses MB/s * getstreamthroughput - uses MB/s * info - uses MiB/GiB > Standardize tablestats formatting and data units > > > Key: CASSANDRA-19104 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19104 > Project: Cassandra > Issue Type: Bug > Components: Tool/nodetool >Reporter: Brad Schoening >Assignee: Leo Toff >Priority: Normal > > Tablestats reports output in plaintext, JSON or YAML. The human readable > output currently has a mix of KiB, bytes with inconsistent spacing > Simplify and defaulting output to 'human readable'. 
Machine readable output > is available as an option and the current mixed output formatting is neither > friendly for human or machine reading and can be replaced. > !image-2023-11-27-13-49-14-247.png! > *Not a goal now (consider a follow up Jira):* > Fix inconsistencies with KiB/MiB/GiB and KB/MB/GB formatting: > * gcstats - uses MB > * getcompactionthroughput - uses MB/s > * getstreamthroughput - uses MB/s > * info - uses MiB/GiB -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19120) local consistencies may get timeout if blocking read repair is sending the read repair mutation to other DC
[ https://issues.apache.org/jira/browse/CASSANDRA-19120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793903#comment-17793903 ] Stefan Miklosovic commented on CASSANDRA-19120: --- Feel free to fix that test I started, no problem with that. I had a strong suspicion that I did not do it right anyway ... > local consistencies may get timeout if blocking read repair is sending the > read repair mutation to other DC > > > Key: CASSANDRA-19120 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19120 > Project: Cassandra > Issue Type: Bug >Reporter: Runtian Liu >Priority: Normal > Attachments: image-2023-11-29-15-26-08-056.png, signature.asc > > Time Spent: 10m > Remaining Estimate: 0h > > For a two-DC cluster setup: when a new node is being added to DC1, for > blocking read repair triggered by local_quorum in DC1, it will require > sending the read repair mutation to an extra node(1)(2). The selector for read > repair may select *ANY* node that has not been contacted before(3) instead of > selecting the DC1 nodes. If a node from DC2 is selected, this will cause 100% > timeout because of the bug described below: > When we initialized the latch(4) for blocking read repair, the shouldBlockOn > function will only return true for local nodes(5), and the blockFor value will be > reduced if a local node doesn't require repair(6). The blockFor value is the same as > the number of read repair mutations sent out. But when the coordinator node > receives the responses from the target nodes, the latch only counts down for > nodes in the same DC(7). The latch will wait till timeout and the read request > will time out. > This can be reproduced if you have a constant load on a 3 + 3 cluster when > adding a node, and some way to trigger blocking read repair (maybe by > adding load using the stress tool). If you use local_quorum consistency with a > constant read-after-write load in the same DC that you are adding the node to. 
You > will see read timeout issue from time to time because of the bug described > above > > I think for read repair when selecting the extra node to do repair, we should > prefer local nodes than the nodes from other region. Also, we need to fix the > latch part so even if we send mutation to the nodes in other DC, we don't get > a timeout. > (1)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/locator/ReplicaPlans.java#L455] > (2)[https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/ConsistencyLevel.java#L183] > (3)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/locator/ReplicaPlans.java#L458] > (4)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/service/reads/repair/BlockingPartitionRepair.java#L96] > (5)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/service/reads/repair/BlockingPartitionRepair.java#L71] > (6)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/service/reads/repair/BlockingPartitionRepair.java#L88] > (7)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/service/reads/repair/BlockingPartitionRepair.java#L113] > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
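The mechanism in the report above can be illustrated with a small sketch: blockFor is sized by the number of repair mutations sent, but only local-DC acknowledgements count the latch down, so a repair target in the remote DC stalls the coordinator until the timeout. This is a simplified model with hypothetical names, not the actual BlockingPartitionRepair code:

```python
# Simplified model (hypothetical names) of the blocking read repair latch
# bug: blockFor counts every repair mutation sent, but only acks from the
# coordinator's local DC are counted down.

def blocking_repair_completes(targets, local_dc, acks):
    # blockFor is sized by the number of repair mutations sent out
    remaining = len(targets)
    for node, dc in acks:
        # Bug: responses are only counted down for same-DC replicas
        if dc == local_dc:
            remaining -= 1
    # True only if the latch reached zero before the (simulated) timeout
    return remaining == 0

# One repair target was picked from DC2; every replica replies anyway.
targets = [("node1", "dc1"), ("node4", "dc2")]
print(blocking_repair_completes(targets, "dc1", acks=targets))   # False

# With DC1-only targets the latch opens and the read succeeds.
local = [("node1", "dc1"), ("node2", "dc1")]
print(blocking_repair_completes(local, "dc1", acks=local))       # True
```

The two fix directions proposed in the ticket map onto this model directly: either make the selector prefer local nodes (keep the dc2 node out of the targets) or count remote-DC acks toward the latch.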
[jira] [Comment Edited] (CASSANDRA-19175) Test Failure: dtest.sslnodetonode_test.TestNodeToNodeSSLEncryption.test_ca_mismatch
[ https://issues.apache.org/jira/browse/CASSANDRA-19175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793898#comment-17793898 ] Ekaterina Dimitrova edited comment on CASSANDRA-19175 at 12/6/23 7:10 PM: -- {quote} if the machine ran out of space {quote} I saw more flakies from this one, but I am not sure all of them were when we ran out of space. Examples from 4.1: [https://ci-cassandra.apache.org/job/Cassandra-4.1/441/testReport/dtest.sslnodetonode_test/TestNodeToNodeSSLEncryption/test_ca_mismatch/] [https://ci-cassandra.apache.org/job/Cassandra-4.1/438/testReport/dtest-offheap.sslnodetonode_test/TestNodeToNodeSSLEncryption/test_ca_mismatch/] EDIT: I moved the ticket to 5.x and 4.1.x, instead of 5.1.beta as it is obviously not a regression at least was (Author: e.dimitrova): {quote} if the machine ran out of space {quote} I saw more flakies from this one, but I am not sure all of them were when we ran out of space. Examples from 4.1: [https://ci-cassandra.apache.org/job/Cassandra-4.1/441/testReport/dtest.sslnodetonode_test/TestNodeToNodeSSLEncryption/test_ca_mismatch/] [https://ci-cassandra.apache.org/job/Cassandra-4.1/438/testReport/dtest-offheap.sslnodetonode_test/TestNodeToNodeSSLEncryption/test_ca_mismatch/] > Test Failure: > dtest.sslnodetonode_test.TestNodeToNodeSSLEncryption.test_ca_mismatch > --- > > Key: CASSANDRA-19175 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19175 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.1.x, 5.x > > > [https://ci-cassandra.apache.org/job/Cassandra-trunk/1782/testReport/dtest.sslnodetonode_test/TestNodeToNodeSSLEncryption/test_ca_mismatch/] > h3. 
> {code:java} > Error Message > assert False > Stacktrace > self = <sslnodetonode_test.TestNodeToNodeSSLEncryption object at 0x7fca5921d050> > def test_ca_mismatch(self): """CA mismatch should cause nodes > to fail to connect""" credNode1 = sslkeygen.generate_credentials("127.0.0.1") > credNode2 = sslkeygen.generate_credentials("127.0.0.2") # mismatching CA! > self.setup_nodes(credNode1, credNode2) > self.fixture_dtest_setup.allow_log_errors = True > self.cluster.start(no_wait=True) found = self._grep_msg(self.node1, > _LOG_ERR_HANDSHAKE) self.cluster.stop() > assert found E assert False > sslnodetonode_test.py:115: AssertionError > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19175) Test Failure: dtest.sslnodetonode_test.TestNodeToNodeSSLEncryption.test_ca_mismatch
[ https://issues.apache.org/jira/browse/CASSANDRA-19175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova updated CASSANDRA-19175: Fix Version/s: 5.x (was: 5.1-beta) > Test Failure: > dtest.sslnodetonode_test.TestNodeToNodeSSLEncryption.test_ca_mismatch > --- > > Key: CASSANDRA-19175 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19175 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.1.x, 5.x > > > [https://ci-cassandra.apache.org/job/Cassandra-trunk/1782/testReport/dtest.sslnodetonode_test/TestNodeToNodeSSLEncryption/test_ca_mismatch/] > h3. > {code:java} > Error Message > assert False > Stacktrace > self = <sslnodetonode_test.TestNodeToNodeSSLEncryption object at 0x7fca5921d050> def test_ca_mismatch(self): """CA mismatch should cause nodes > to fail to connect""" credNode1 = sslkeygen.generate_credentials("127.0.0.1") > credNode2 = sslkeygen.generate_credentials("127.0.0.2") # mismatching CA! > self.setup_nodes(credNode1, credNode2) > self.fixture_dtest_setup.allow_log_errors = True > self.cluster.start(no_wait=True) found = self._grep_msg(self.node1, > _LOG_ERR_HANDSHAKE) self.cluster.stop() > assert found E assert False > sslnodetonode_test.py:115: AssertionError > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19175) Test Failure: dtest.sslnodetonode_test.TestNodeToNodeSSLEncryption.test_ca_mismatch
[ https://issues.apache.org/jira/browse/CASSANDRA-19175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793898#comment-17793898 ] Ekaterina Dimitrova commented on CASSANDRA-19175: - {quote} if the machine ran out of space {quote} I saw more flakies from this one, but I am not sure all of them were when we ran out of space. Examples from 4.1: [https://ci-cassandra.apache.org/job/Cassandra-4.1/441/testReport/dtest.sslnodetonode_test/TestNodeToNodeSSLEncryption/test_ca_mismatch/] [https://ci-cassandra.apache.org/job/Cassandra-4.1/438/testReport/dtest-offheap.sslnodetonode_test/TestNodeToNodeSSLEncryption/test_ca_mismatch/] > Test Failure: > dtest.sslnodetonode_test.TestNodeToNodeSSLEncryption.test_ca_mismatch > --- > > Key: CASSANDRA-19175 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19175 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.1.x, 5.1-beta > > > [https://ci-cassandra.apache.org/job/Cassandra-trunk/1782/testReport/dtest.sslnodetonode_test/TestNodeToNodeSSLEncryption/test_ca_mismatch/] > h3. > {code:java} > Error Message > assert False > Stacktrace > self = <sslnodetonode_test.TestNodeToNodeSSLEncryption object at 0x7fca5921d050> def test_ca_mismatch(self): """CA mismatch should cause nodes > to fail to connect""" credNode1 = sslkeygen.generate_credentials("127.0.0.1") > credNode2 = sslkeygen.generate_credentials("127.0.0.2") # mismatching CA! > self.setup_nodes(credNode1, credNode2) > self.fixture_dtest_setup.allow_log_errors = True > self.cluster.start(no_wait=True) found = self._grep_msg(self.node1, > _LOG_ERR_HANDSHAKE) self.cluster.stop() > assert found E assert False > sslnodetonode_test.py:115: AssertionError > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19175) Test Failure: dtest.sslnodetonode_test.TestNodeToNodeSSLEncryption.test_ca_mismatch
[ https://issues.apache.org/jira/browse/CASSANDRA-19175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova updated CASSANDRA-19175: Fix Version/s: 4.1.x > Test Failure: > dtest.sslnodetonode_test.TestNodeToNodeSSLEncryption.test_ca_mismatch > --- > > Key: CASSANDRA-19175 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19175 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.1.x, 5.1-beta > > > [https://ci-cassandra.apache.org/job/Cassandra-trunk/1782/testReport/dtest.sslnodetonode_test/TestNodeToNodeSSLEncryption/test_ca_mismatch/] > h3. > {code:java} > Error Message > assert False > Stacktrace > self = <sslnodetonode_test.TestNodeToNodeSSLEncryption object at 0x7fca5921d050> def test_ca_mismatch(self): """CA mismatch should cause nodes > to fail to connect""" credNode1 = sslkeygen.generate_credentials("127.0.0.1") > credNode2 = sslkeygen.generate_credentials("127.0.0.2") # mismatching CA! > self.setup_nodes(credNode1, credNode2) > self.fixture_dtest_setup.allow_log_errors = True > self.cluster.start(no_wait=True) found = self._grep_msg(self.node1, > _LOG_ERR_HANDSHAKE) self.cluster.stop() > assert found E assert False > sslnodetonode_test.py:115: AssertionError > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-19104) Standardize tablestats formatting and data units
[ https://issues.apache.org/jira/browse/CASSANDRA-19104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793879#comment-17793879 ] Brad Schoening edited comment on CASSANDRA-19104 at 12/6/23 6:47 PM: - Ok, I suppose for storage KiB might make sense as the lowest reportable units. Jacek's example is ambiguous, but I'd say with a leading zero before the decimal point, there should be three significant digits afterwards, so 0.000 KiB (which is the maximum resolution available since bytes are unitary) {noformat} Example Bytes repaired: 0.00 KiB Bytes unrepaired: 4.31 TiB Bytes pending repair: 0.000 KiB {noformat} From Wikipedia [Significant Figures|https://en.wikipedia.org/wiki/Significant_figures] {quote}[Leading zeros|https://en.wikipedia.org/wiki/Leading_zero]. For instance, 013 kg has two significant figures—1 and 3—while the leading zero is insignificant since it does not impact the mass indication; 013 kg is equivalent to 13 kg, rendering the zero unnecessary. Similarly, in the case of 0.056 m, there are two insignificant leading zeros since 0.056 m is the same as 56 mm, thus the leading zeros do not contribute to the length indication. {quote} For storage units in bytes, we should probably use 0.001 KiB (one byte) and 0.000 KiB (zero bytes), 0.01 KiB (10 bytes) was (Author: bschoeni): Ok, I suppose for storage KiB might make sense as the lowest reportable units. Jacek's example is ambiguous, but I'd say with a leading zero before the decimal point, there should be three significant digits afterwards, so 0.000 KiB (which is the maximum resolution available since bytes are unitary) {noformat} Example Bytes repaired: 0.00 KiB Bytes unrepaired: 4.31 TiB Bytes pending repair: 0.000 KiB {noformat} From Wikipedia [Significant Figures|https://en.wikipedia.org/wiki/Significant_figures] {quote}[Leading zeros|https://en.wikipedia.org/wiki/Leading_zero]. 
For instance, 013 kg has two significant figures—1 and 3—while the leading zero is insignificant since it does not impact the mass indication; 013 kg is equivalent to 13 kg, rendering the zero unnecessary. Similarly, in the case of 0.056 m, there are two insignificant leading zeros since 0.056 m is the same as 56 mm, thus the leading zeros do not contribute to the length indication. {quote} For storage in bytes, we should probably use 0.001 KiB (one byte), 0.000 KiB (zero bytes), and 0.01 KiB (10 bytes) > Standardize tablestats formatting and data units > > > Key: CASSANDRA-19104 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19104 > Project: Cassandra > Issue Type: Bug > Components: Tool/nodetool >Reporter: Brad Schoening >Assignee: Leo Toff >Priority: Normal > > Tablestats reports output in plaintext, JSON or YAML. The human readable > output currently has a mix of KiB and bytes, with inconsistent spacing. > Considering simplifying and defaulting output to 'human readable'. Machine > readable output is available as an option and the current mixed output > formatting is neither friendly for human nor machine reading. > !image-2023-11-27-13-49-14-247.png! > *Not a goal now (consider a follow up Jira):* > Fix inconsistencies with KiB/MiB/GiB and KB/MB/GB formatting: > * gcstats - uses MB > * getcompactionthroughput - uses MB/s > * getstreamthroughput - uses MB/s > * info - uses MiB/GiB -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
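The convention being converged on above (KiB as the smallest reported unit, with three decimal places below 1 KiB so that a single byte is still visible) can be sketched as follows. This is an illustration of the proposal, not nodetool's actual formatting code:

```python
def format_kib(n_bytes):
    """Format a byte count in KiB, keeping byte-level resolution below 1 KiB.

    Values under 1 KiB get three decimal places (0.001 KiB == 1 byte);
    larger values keep the usual two decimals.
    """
    kib = n_bytes / 1024
    decimals = 3 if n_bytes < 1024 else 2
    return f"{kib:.{decimals}f} KiB"

print(format_kib(0))     # 0.000 KiB
print(format_kib(1))     # 0.001 KiB
print(format_kib(10))    # 0.010 KiB
print(format_kib(4413))  # 4.31 KiB
```

Note that with this scheme zero bytes renders as "0.000 KiB" and one byte as "0.001 KiB", matching the examples in the comment; a full implementation would also step up to MiB/GiB/TiB for larger values.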
[jira] [Comment Edited] (CASSANDRA-19104) Standardize tablestats formatting and data units
[ https://issues.apache.org/jira/browse/CASSANDRA-19104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793879#comment-17793879 ] Brad Schoening edited comment on CASSANDRA-19104 at 12/6/23 6:46 PM: - Ok, I suppose for storage KiB might make sense as the lowest reportable units. Jacek's example is ambiguous, but I'd say with a leading zero before the decimal point, there should be three significant digits afterwards, so 0.000 KiB (which is the maximum resolution available since bytes are unitary) {noformat} Example Bytes repaired: 0.00 KiB Bytes unrepaired: 4.31 TiB Bytes pending repair: 0.000 KiB {noformat} From Wikipedia [Significant Figures|https://en.wikipedia.org/wiki/Significant_figures] {quote}[Leading zeros|https://en.wikipedia.org/wiki/Leading_zero]. For instance, 013 kg has two significant figures—1 and 3—while the leading zero is insignificant since it does not impact the mass indication; 013 kg is equivalent to 13 kg, rendering the zero unnecessary. Similarly, in the case of 0.056 m, there are two insignificant leading zeros since 0.056 m is the same as 56 mm, thus the leading zeros do not contribute to the length indication. {quote} For storage and bytes, we should probably use 0.001 KiB (one byte) and 0.000 KiB (zero bytes), 0.01 KiB (10 bytes) was (Author: bschoeni): Ok, I suppose for storage KiB might make sense as the lowest reportable units. Jacek's example is ambiguous, but I'd say with a leading zero before the decimal point, there should be three significant digits afterwards, so 0.000 KiB (which is the maximum resolution available since bytes are unitary) {noformat} Example Bytes repaired: 0.00 KiB Bytes unrepaired: 4.31 TiB Bytes pending repair: 0.000 KiB {noformat} From Wikipedia [Significant Figures|https://en.wikipedia.org/wiki/Significant_figures] {quote}[Leading zeros|https://en.wikipedia.org/wiki/Leading_zero]. 
For instance, 013 kg has two significant figures—1 and 3—while the leading zero is insignificant since it does not impact the mass indication; 013 kg is equivalent to 13 kg, rendering the zero unnecessary. Similarly, in the case of 0.056 m, there are two insignificant leading zeros since 0.056 m is the same as 56 mm, thus the leading zeros do not contribute to the length indication. {quote} > Standardize tablestats formatting and data units > > > Key: CASSANDRA-19104 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19104 > Project: Cassandra > Issue Type: Bug > Components: Tool/nodetool >Reporter: Brad Schoening >Assignee: Leo Toff >Priority: Normal > > Tablestats reports output in plaintext, JSON or YAML. The human readable > output currently has a mix of KiB and bytes, with inconsistent spacing. > Considering simplifying and defaulting output to 'human readable'. Machine > readable output is available as an option and the current mixed output > formatting is neither friendly for human nor machine reading. > !image-2023-11-27-13-49-14-247.png! > *Not a goal now (consider a follow up Jira):* > Fix inconsistencies with KiB/MiB/GiB and KB/MB/GB formatting: > * gcstats - uses MB > * getcompactionthroughput - uses MB/s > * getstreamthroughput - uses MB/s > * info - uses MiB/GiB -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19104) Standardize tablestats formatting and data units
[ https://issues.apache.org/jira/browse/CASSANDRA-19104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793879#comment-17793879 ] Brad Schoening commented on CASSANDRA-19104: Ok, I suppose for storage KiB might make sense as the lowest reportable units. Jacek's example is ambiguous, but I'd say with a leading zero before the decimal point, there should be three significant digits afterwards, so 0.000 KiB {noformat} Example Bytes repaired: 0.00 KiB Bytes unrepaired: 4.31 TiB Bytes pending repair: 0.000 KiB {noformat} From Wikipedia [Significant Figures|https://en.wikipedia.org/wiki/Significant_figures] {quote}[Leading zeros|https://en.wikipedia.org/wiki/Leading_zero]. For instance, 013 kg has two significant figures—1 and 3—while the leading zero is insignificant since it does not impact the mass indication; 013 kg is equivalent to 13 kg, rendering the zero unnecessary. Similarly, in the case of 0.056 m, there are two insignificant leading zeros since 0.056 m is the same as 56 mm, thus the leading zeros do not contribute to the length indication.{quote} > Standardize tablestats formatting and data units > > > Key: CASSANDRA-19104 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19104 > Project: Cassandra > Issue Type: Bug > Components: Tool/nodetool >Reporter: Brad Schoening >Assignee: Leo Toff >Priority: Normal > > Tablestats reports output in plaintext, JSON or YAML. The human readable > output currently has a mix of KiB and bytes, with inconsistent spacing. > Considering simplifying and defaulting output to 'human readable'. Machine > readable output is available as an option and the current mixed output > formatting is neither friendly for human nor machine reading. > !image-2023-11-27-13-49-14-247.png! 
> *Not a goal now (consider a follow up Jira):* > Fix inconsistencies with KiB/MiB/GiB and KB/MB/GB formatting: > * gcstats - uses MB > * getcompactionthroughput - uses MB/s > * getstreamthroughput - uses MB/s > * info - uses MiB/GiB -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-19104) Standardize tablestats formatting and data units
[ https://issues.apache.org/jira/browse/CASSANDRA-19104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793879#comment-17793879 ] Brad Schoening edited comment on CASSANDRA-19104 at 12/6/23 6:42 PM: - Ok, I suppose for storage KiB might make sense as the lowest reportable units. Jacek's example is ambiguous, but I'd say with a leading zero before the decimal point, there should be three significant digits afterwards, so 0.000 KiB (which is the maximum resolution available since bytes are unitary) {noformat} Example Bytes repaired: 0.00 KiB Bytes unrepaired: 4.31 TiB Bytes pending repair: 0.000 KiB {noformat} From Wikipedia [Significant Figures|https://en.wikipedia.org/wiki/Significant_figures] {quote}[Leading zeros|https://en.wikipedia.org/wiki/Leading_zero]. For instance, 013 kg has two significant figures—1 and 3—while the leading zero is insignificant since it does not impact the mass indication; 013 kg is equivalent to 13 kg, rendering the zero unnecessary. Similarly, in the case of 0.056 m, there are two insignificant leading zeros since 0.056 m is the same as 56 mm, thus the leading zeros do not contribute to the length indication. {quote} was (Author: bschoeni): Ok, I suppose for storage KiB might make sense as the lowest reportable units. Jacek's example is ambiguous, but I'd say with a leading zero before the decimal point, there should be three significant digits afterwards, so 0.000 KiB {noformat} Example Bytes repaired: 0.00 KiB Bytes unrepaired: 4.31 TiB Bytes pending repair: 0.000 KiB {noformat} From Wikipedia [Significant Figures|https://en.wikipedia.org/wiki/Significant_figures] {quote}[Leading zeros|https://en.wikipedia.org/wiki/Leading_zero]. For instance, 013 kg has two significant figures—1 and 3—while the leading zero is insignificant since it does not impact the mass indication; 013 kg is equivalent to 13 kg, rendering the zero unnecessary. 
Similarly, in the case of 0.056 m, there are two insignificant leading zeros since 0.056 m is the same as 56 mm, thus the leading zeros do not contribute to the length indication.{quote} > Standardize tablestats formatting and data units > > > Key: CASSANDRA-19104 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19104 > Project: Cassandra > Issue Type: Bug > Components: Tool/nodetool >Reporter: Brad Schoening >Assignee: Leo Toff >Priority: Normal > > Tablestats reports output in plaintext, JSON or YAML. The human readable > output currently has a mix of KiB and bytes, with inconsistent spacing. > Considering simplifying and defaulting output to 'human readable'. Machine > readable output is available as an option and the current mixed output > formatting is neither friendly for human nor machine reading. > !image-2023-11-27-13-49-14-247.png! > *Not a goal now (consider a follow up Jira):* > Fix inconsistencies with KiB/MiB/GiB and KB/MB/GB formatting: > * gcstats - uses MB > * getcompactionthroughput - uses MB/s > * getstreamthroughput - uses MB/s > * info - uses MiB/GiB -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
Re: [PR] Cassandra 18852: Make bulk writer resilient to cluster resize events [cassandra-analytics]
arjunashok commented on code in PR #17: URL: https://github.com/apache/cassandra-analytics/pull/17#discussion_r1417772899 ## cassandra-analytics-integration-framework/src/main/java/org/apache/cassandra/testing/CassandraIntegrationTest.java: ## @@ -59,6 +59,13 @@ */ int numDcs() default 1; +/** + * This is only applied in context of multi-DC tests. Returns true if the keyspace is replicated + * across multiple DCs. Defaults to {@code true} + * @return whether the multi-DC test uses a cross-DC keyspace + */ +boolean useCrossDcKeyspace() default true; Review Comment: Agreed. We can revisit this when we are refactoring these helpers to make them less error prone (minimize the no. params being passed). ## cassandra-analytics-integration-tests/src/test/java/org/apache/cassandra/analytics/expansion/JoiningBaseTest.java: ## @@ -0,0 +1,163 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.cassandra.analytics.expansion; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; +import java.util.Optional; +import java.util.Set; +import java.util.concurrent.CountDownLatch; +import java.util.concurrent.TimeUnit; +import java.util.function.BiConsumer; + +import com.google.common.util.concurrent.Uninterruptibles; + +import com.datastax.driver.core.ConsistencyLevel; +import o.a.c.analytics.sidecar.shaded.testing.common.data.QualifiedTableName; +import org.apache.cassandra.analytics.ResiliencyTestBase; +import org.apache.cassandra.analytics.TestTokenSupplier; +import org.apache.cassandra.distributed.UpgradeableCluster; +import org.apache.cassandra.distributed.api.Feature; +import org.apache.cassandra.distributed.api.IUpgradeableInstance; +import org.apache.cassandra.distributed.api.TokenSupplier; +import org.apache.cassandra.distributed.shared.ClusterUtils; +import org.apache.cassandra.testing.CassandraIntegrationTest; +import org.apache.cassandra.testing.ConfigurableCassandraTestContext; + +import static junit.framework.TestCase.assertNotNull; +import static org.assertj.core.api.Assertions.assertThat; + +public class JoiningBaseTest extends ResiliencyTestBase Review Comment: In these scenario specific base classes, we're only grouping common functionality instead of defining a contract for subclasses to implement, so this seemed appropriate. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
Re: [PR] Cassandra 18852: Make bulk writer resilient to cluster resize events [cassandra-analytics]
arjunashok commented on code in PR #17: URL: https://github.com/apache/cassandra-analytics/pull/17#discussion_r1417772294 ## cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/bulkwriter/RecordWriter.java: ## @@ -132,37 +133,32 @@ public List write(Iterator> sourceI Map valueMap = new HashMap<>(); try { -List exclusions = failureHandler.getFailedInstances(); Set> newRanges = initialTokenRangeMapping.getRangeMap().asMapOfRanges().entrySet() .stream() - .filter(e -> !exclusions.contains(e.getValue())) .map(Map.Entry::getKey) .collect(Collectors.toSet()); +Range tokenRange = getTokenRange(taskContext); +Set> subRanges = newRanges.contains(tokenRange) ? + Collections.singleton(tokenRange) : + getIntersectingSubRanges(newRanges, tokenRange); while (dataIterator.hasNext()) { Tuple2 rowData = dataIterator.next(); -streamSession = maybeCreateStreamSession(taskContext, streamSession, rowData, newRanges, failureHandler); - -sessions.add(streamSession); +streamSession = maybeCreateStreamSession(taskContext, streamSession, rowData, subRanges, failureHandler, results); maybeCreateTableWriter(partitionId, baseDir); writeRow(rowData, valueMap, partitionId, streamSession.getTokenRange()); checkBatchSize(streamSession, partitionId, job); } -// Finalize SSTable for the last StreamSession -if (sstableWriter != null || (streamSession != null && batchSize != 0)) +// Cleanup SSTable writer and schedule the last stream Review Comment: Makes sense. 
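The range-splitting step in the RecordWriter diff above (use the task's token range whole if the current range map still contains it, otherwise compute the intersecting sub-ranges) can be modelled with plain interval arithmetic. This is a simplified sketch using half-open integer intervals and hypothetical names, not the actual Cassandra token-range code:

```python
# Sketch of splitting a Spark task's token range against the cluster's
# current range map after a resize event. Simplified: half-open integer
# intervals stand in for Cassandra token Ranges; names are hypothetical.

def intersecting_subranges(cluster_ranges, task_range):
    """Return the pieces of task_range covered by each cluster range."""
    lo, hi = task_range
    out = []
    for c_lo, c_hi in sorted(cluster_ranges):
        s_lo, s_hi = max(lo, c_lo), min(hi, c_hi)
        if s_lo < s_hi:  # keep only non-empty intersections
            out.append((s_lo, s_hi))
    return out

# A resize event split the ring: the task's range now spans two owners,
# so its writes must be streamed as two sub-ranges.
cluster_ranges = [(0, 50), (50, 100)]
print(intersecting_subranges(cluster_ranges, (30, 70)))  # [(30, 50), (50, 70)]
```

When the task's range is still wholly contained in one cluster range, the split degenerates to the range itself, which matches the `newRanges.contains(tokenRange)` fast path in the diff.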
## cassandra-analytics-core/src/test/java/org/apache/cassandra/spark/bulkwriter/RecordWriterTest.java: ## @@ -346,19 +366,22 @@ void writeBuffered() private void validateSuccessfulWrite(MockBulkWriterContext writerContext, Iterator> data, - String[] columnNames) + String[] columnNames) throws InterruptedException { validateSuccessfulWrite(writerContext, data, columnNames, UPLOADED_TABLES); } private void validateSuccessfulWrite(MockBulkWriterContext writerContext, Iterator> data, String[] columnNames, - int uploadedTables) + int uploadedTables) throws InterruptedException { RecordWriter rw = new RecordWriter(writerContext, columnNames, () -> tc, SSTableWriter::new); rw.write(data); +// Wait for uploads to finish +Thread.sleep(500); Review Comment: In general, I agree with the flakiness introduced by sleep. This was added because when the entire test suite was executed, we did see the uploads not finishing before we look up the no. files that we uploaded. We could potentially use a latch in the `MockBulkWriterContext` to make this more deterministic. Will explore some more. 
## cassandra-analytics-integration-framework/src/main/java/org/apache/cassandra/sidecar/testing/IntegrationTestModule.java: ## @@ -90,16 +88,25 @@ public InstanceMetadata instanceFromId(int id) throws NoSuchElementException * @return instance meta information * @throws NoSuchElementException when the instance for {@code host} does not exist */ +@Override public InstanceMetadata instanceFromHost(String host) throws NoSuchElementException { -return cassandraTestContext.instancesConfig.instanceFromHost(host); +return cassandraTestContext.instancesConfig().instanceFromHost(host); } } @Provides @Singleton public SidecarConfiguration sidecarConfiguration() { -return new SidecarConfigurationImpl(new ServiceConfigurationImpl("127.0.0.1")); +ServiceConfiguration conf = ServiceConfigurationImpl.builder() +.host("0.0.0.0") // binds to all interfaces, potential security issue if left running for long Review Comment: Will defer this one to @JeetKunDoug. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe,
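The deterministic alternative discussed above for RecordWriterTest (a latch instead of `Thread.sleep(500)`) would look roughly like this. The real code is Java (e.g. `java.util.concurrent.CountDownLatch`); this is a language-neutral sketch with hypothetical names:

```python
# Sketch: replace a fixed sleep with a countdown-style latch so the test
# waits exactly until all uploads complete (or a timeout expires), rather
# than hoping 500 ms is always enough. Hypothetical names throughout.
import threading

class MockUploadTracker:
    def __init__(self, expected_uploads):
        self._remaining = expected_uploads
        self._lock = threading.Lock()
        self._done = threading.Event()

    def upload_finished(self):
        # Called by each upload worker when its upload completes.
        with self._lock:
            self._remaining -= 1
            if self._remaining == 0:
                self._done.set()

    def await_uploads(self, timeout_seconds):
        # Deterministic wait: True once every expected upload reported in.
        return self._done.wait(timeout_seconds)

tracker = MockUploadTracker(expected_uploads=3)
workers = [threading.Thread(target=tracker.upload_finished) for _ in range(3)]
for w in workers:
    w.start()
assert tracker.await_uploads(timeout_seconds=5)  # no arbitrary sleep needed
```

The timeout then only bounds the worst case; a passing run proceeds as soon as the last upload lands, which removes both the flakiness and the fixed 500 ms cost per test.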
[jira] [Updated] (CASSANDRA-19178) Cluster upgrade 3.x -> 4.x fails with no internode encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-19178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-19178: - Resolution: Invalid Status: Resolved (was: Triage Needed) I don't see any debug logs here, examining the one on the other side of the 'Connection reset by peer' may reveal something. bq. Any idea on how to further investigate the issue? This jira is for the development of Apache Cassandra and as such, makes for a poor vehicle for support. We recommend contacting the community via slack or the ML instead: https://cassandra.apache.org/_/community.html If in the end you discover a bug then please come back and file an actionable report here. > Cluster upgrade 3.x -> 4.x fails with no internode encryption > - > > Key: CASSANDRA-19178 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19178 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Gossip >Reporter: Aldo >Priority: Normal > Attachments: cassandra7.downgrade.log, cassandra7.log > > > I have a Docker swarm cluster with 3 distinct Cassandra services (named > {_}cassandra7{_}, {_}cassandra8{_}, {_}cassandra9{_}) running on 3 different > servers. The 3 services are running the version 3.11.16, using the official > Cassandra image 3.11.16 on Docker Hub. The first service is configured just > with the following environment variables > {code:java} > CASSANDRA_LISTEN_ADDRESS="tasks.cassandra7" > CASSANDRA_SEEDS="tasks.cassandra7,tasks.cassandra9" {code} > which in turn, at startup, modifies the {_}cassandra.yaml{_}. 
So for instance > the _cassandra.yaml_ for the first service contains the following (and the > rest is the image default): > {code:java} > # grep tasks /etc/cassandra/cassandra.yaml > - seeds: "tasks.cassandra7,tasks.cassandra9" > listen_address: tasks.cassandra7 > broadcast_address: tasks.cassandra7 > broadcast_rpc_address: tasks.cassandra7 {code} > Other services (8 and 9) have a similar configuration, obviously with a > different {{CASSANDRA_LISTEN_ADDRESS}} ({{tasks.cassandra8}} and > {{tasks.cassandra9}}). > The cluster is running smoothly and all the nodes are perfectly able to > rejoin the cluster whatever event occurs, thanks to the Docker Swarm > {{tasks.cassandraXXX}} "hostname": I can kill a Docker container waiting for > Docker swarm to restart it, force update it in order to force a restart, > scale the service to 0 and then 1, restart an entire server, or turn off and > then turn on all 3 servers. I have never found an issue with this. > I also just completed a full upgrade of the cluster from version 2.2.8 to > 3.11.16 (simply upgrading the Docker official image associated with the > services) without issues. I was also able, thanks to a 2.2.8 snapshot on each > server, to perform a full downgrade to 2.2.8 and back to 3.11.16 again. I > finally issued a {{nodetool upgradesstables}} on all nodes, so my SSTables > now have the {{me-*}} prefix. > > The problem I'm facing right now is the upgrade from 3.11.16 to 4.x. The > procedure that I follow is very simple: > # I start from the _cassandra7_ service (which is a seed node) > # {{nodetool drain}} > # Wait for the {{DRAINING ... DRAINED}} messages to appear in the log > # Upgrade the Docker image of _cassandra7_ to the official 4.1.3 version > The procedure is exactly the same one I followed for the upgrade 2.2.8 --> > 3.11.16, obviously with a different version at step 4. 
Unfortunately the > upgrade 3.x --> 4.x is not working, the _cassandra7_ service restarts and > attempts to communicate with the other seed node ({_}cassandra9{_}) but the > log of _cassandra7_ shows the following: > {code:java} > INFO [Messaging-EventLoop-3-3] 2023-12-06 17:15:04,727 > OutboundConnectionInitiator.java:390 - Failed to connect to peer > tasks.cassandra9/10.0.2.196:7000(tasks.cassandra9/10.0.2.196:7000) > io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: > Connection reset by peer{code} > The relevant port of the log, related to the missing internode communication, > is attached in _cassandra7.log_ > In the log of _cassandra9_ there is nothing after the abovementioned step #4. > So only _cassandra7_ is saying something in the logs. > I tried with multiple versions (4.0.11 but also 4.0.0) but the outcome is > always the same. Of course when I follow the steps 1..3, then restore the 3.x > snapshot and finally perform the step #4 using the official 3.11.16 version > the node 7 restarts correctly and joins the cluster. I attached the relevant > part of the log (see {_}cassandra7.downgrade.log{_}) where you can see that > node 7 and 9 can communicate. > I suspect this could be related to the port 7000 now (with Cassandra 4.x) > supporting
[jira] [Updated] (CASSANDRA-19178) Cluster upgrade 3.x -> 4.x fails with no internode encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-19178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aldo updated CASSANDRA-19178: - Description: I have a Docker swarm cluster with 3 distinct Cassandra services (named {_}cassandra7{_}, {_}cassandra8{_}, {_}cassandra9{_}) running on 3 different servers. The 3 services are running version 3.11.16, using the official Cassandra image 3.11.16 on Docker Hub. The first service is configured just with the following environment variables {code:java} CASSANDRA_LISTEN_ADDRESS="tasks.cassandra7" CASSANDRA_SEEDS="tasks.cassandra7,tasks.cassandra9" {code} which in turn, at startup, modifies the {_}cassandra.yaml{_}. So for instance the _cassandra.yaml_ for the first service contains the following (and the rest is the image default): {code:java} # grep tasks /etc/cassandra/cassandra.yaml - seeds: "tasks.cassandra7,tasks.cassandra9" listen_address: tasks.cassandra7 broadcast_address: tasks.cassandra7 broadcast_rpc_address: tasks.cassandra7 {code} The other services (8 and 9) have a similar configuration, obviously with a different {{CASSANDRA_LISTEN_ADDRESS}} ({{tasks.cassandra8}} and {{tasks.cassandra9}}). The cluster is running smoothly and all the nodes are perfectly able to rejoin the cluster whatever event occurs, thanks to the Docker Swarm {{tasks.cassandraXXX}} "hostname": I can kill a Docker container and wait for Docker Swarm to restart it, force an update of a service in order to force a restart, scale a service to 0 and then back to 1, restart an entire server, or turn off and then turn on all 3 servers. I have never found an issue with this. I also just completed a full upgrade of the cluster from version 2.2.8 to 3.11.16 (simply upgrading the official Docker image associated with the services) without issues. I was also able, thanks to a 2.2.8 snapshot on each server, to perform a full downgrade to 2.2.8 and back to 3.11.16 again.
I finally issued a {{nodetool upgradesstables}} on all nodes, so my SSTables now have the {{me-*}} prefix. The problem I'm facing right now is the upgrade from 3.11.16 to 4.x. The procedure that I follow is very simple: # I start from the _cassandra7_ service (which is a seed node) # {{nodetool drain}} # Wait for the {{DRAINING ... DRAINED}} messages to appear in the log # Upgrade the Docker image of _cassandra7_ to the official 4.1.3 version The procedure is exactly the same one I followed for the upgrade 2.2.8 --> 3.11.16, obviously with a different version at step 4. Unfortunately the upgrade 3.x --> 4.x is not working: the _cassandra7_ service restarts and attempts to communicate with the other seed node ({_}cassandra9{_}), but the log of _cassandra7_ shows the following: {code:java} INFO [Messaging-EventLoop-3-3] 2023-12-06 17:15:04,727 OutboundConnectionInitiator.java:390 - Failed to connect to peer tasks.cassandra9/10.0.2.196:7000(tasks.cassandra9/10.0.2.196:7000) io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer{code} The relevant part of the log, related to the missing internode communication, is attached in _cassandra7.log_. In the log of _cassandra9_ there is nothing after the above-mentioned step #4. So only _cassandra7_ is saying something in the logs. I tried with multiple versions (4.0.11 but also 4.0.0) but the outcome is always the same. Of course when I follow the steps 1..3, then restore the 3.x snapshot and finally perform step #4 using the official 3.11.16 version, node 7 restarts correctly and joins the cluster. I attached the relevant part of the log (see {_}cassandra7.downgrade.log{_}) where you can see that nodes 7 and 9 can communicate. I suspect this could be related to the port 7000 now (with Cassandra 4.x) supporting both encrypted and unencrypted traffic.
As stated previously, I'm using the untouched official Cassandra images, so my cluster, inside the Docker Swarm, is not (and has never been) configured with encryption. I can also add the following: if I perform the 4 steps above for the _cassandra9_ and _cassandra8_ services as well, in the end the cluster works. But this is not acceptable, because the cluster is unavailable until I finish the full upgrade of all nodes: I need to perform a rolling upgrade, one node after the other, where only 1 node is temporarily down and the other N-1 stay up. Any idea on how to further investigate the issue? Thanks
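Not from the thread, but a quick way to narrow down failures like the one above: a generic TCP probe (plain Java, nothing Cassandra-specific; the host and port values are placeholders) can distinguish "nothing listening / connection refused" from "listener present but resetting after the handshake", which helps decide whether the peer's storage port is reachable at all during the upgrade window.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {
    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    // A false result means refused/unreachable/timed out; a true result with a
    // later "Connection reset by peer" in the application log points at the
    // protocol layer rather than basic connectivity.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder target: the swarm service name and storage port from the report.
        System.out.println(canConnect("tasks.cassandra9", 7000, 2000));
    }
}
```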
Re: [PR] CASSANDRA-18969: source files missing from sources jars due to maven … [cassandra-java-driver]
hhughes commented on PR #1900: URL: https://github.com/apache/cassandra-java-driver/pull/1900#issuecomment-1843370589 @michaelsembwever done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (CASSANDRA-19178) Cluster upgrade 3.x -> 4.x fails with no internode encryption
Aldo created CASSANDRA-19178: Summary: Cluster upgrade 3.x -> 4.x fails with no internode encryption Key: CASSANDRA-19178 URL: https://issues.apache.org/jira/browse/CASSANDRA-19178 Project: Cassandra Issue Type: Bug Components: Cluster/Gossip Reporter: Aldo Attachments: cassandra7.downgrade.log, cassandra7.log -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] Cassandra 18852: Make bulk writer resilient to cluster resize events [cassandra-analytics]
yifan-c commented on code in PR #17: URL: https://github.com/apache/cassandra-analytics/pull/17#discussion_r1416591381 ## cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/bulkwriter/RecordWriter.java: ## @@ -207,46 +203,62 @@ private Set instancesFromMapping(Map, List rowData, - Set> newRanges, - ReplicaAwareFailureHandler failureHandler) throws IOException + Set> subRanges, + ReplicaAwareFailureHandler failureHandler, + List results) +throws IOException, ExecutionException, InterruptedException { BigInteger token = rowData._1().getToken(); Range tokenRange = getTokenRange(taskContext); Preconditions.checkState(tokenRange.contains(token), String.format("Received Token %s outside of expected range %s", token, tokenRange)); -// token range for this partition is not among the write-replica-set ranges -if (!newRanges.contains(tokenRange)) +// We have split ranges likely resulting from pending nodes +// Evaluate creating a new session if the token from current row is part of a sub-range +if (subRanges.size() > 1) { -Set> subRanges = getIntersectingSubRanges(newRanges, tokenRange); -// We have split ranges - likely resulting from pending nodes -if (subRanges.size() > 1) -{ -// Create session using sub-range that contains the token from current row -Range matchingRange = subRanges.stream().filter(r -> r.contains(token)).findFirst().get(); -Preconditions.checkState(matchingRange != null, - String.format("Received Token %s outside of expected range %s", token, matchingRange)); +// Create session using sub-range that contains the token from current row +Range matchingSubRange = subRanges.stream().filter(r -> r.contains(token)).findFirst().get(); +Preconditions.checkState(matchingSubRange != null, + String.format("Received Token %s outside of expected range %s", token, matchingSubRange)); Review Comment: The `checkState` will not ever see `matchingSubRange == null`. 
The reason is that at line#222, if the value is null, the `get()` operation throws an exception already. If the intent is to provide a more user-friendly error message, could you not call `get()`, and instead use `Optional<Range<BigInteger>> matchingSubRangeOpt` to capture the result and run `checkState` on the optional? ## cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/bulkwriter/RecordWriter.java: ## @@ -132,37 +133,32 @@ public List write(Iterator> sourceI Map valueMap = new HashMap<>(); try { -List exclusions = failureHandler.getFailedInstances(); Set> newRanges = initialTokenRangeMapping.getRangeMap().asMapOfRanges().entrySet() .stream() - .filter(e -> !exclusions.contains(e.getValue())) .map(Map.Entry::getKey) .collect(Collectors.toSet()); +Range tokenRange = getTokenRange(taskContext); +Set> subRanges = newRanges.contains(tokenRange) ? + Collections.singleton(tokenRange) : + getIntersectingSubRanges(newRanges, tokenRange); while (dataIterator.hasNext()) { Tuple2 rowData = dataIterator.next(); -streamSession = maybeCreateStreamSession(taskContext, streamSession, rowData, newRanges, failureHandler); - -sessions.add(streamSession); +streamSession = maybeCreateStreamSession(taskContext, streamSession, rowData, subRanges, failureHandler, results); maybeCreateTableWriter(partitionId, baseDir); writeRow(rowData, valueMap, partitionId, streamSession.getTokenRange()); checkBatchSize(streamSession, partitionId, job); } -// Finalize SSTable for the last StreamSession -if (sstableWriter != null || (streamSession != null && batchSize != 0)) +// Cleanup SSTable writer and schedule the last stream Review Comment: "Cleanup SSTable writer" reads wrong to me. I would stick with "Finalize". The code is there to flush any data to the sstable by closing the writer. Cleanup leads me to
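The reviewer's first point can be sketched in isolation. This is an illustrative example, not the project's code: a stand-in range type replaces the Guava `Range<BigInteger>` and Spark types, and the names are hypothetical. `Stream.findFirst().get()` throws `NoSuchElementException` before any `checkState`-style null check can run; capturing the `Optional` first lets the caller fail with a descriptive message instead.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class SubRangeLookup {
    // Stand-in for the project's token range type (half-open [start, end)).
    static final class TokenRange {
        final long start, end;
        TokenRange(long start, long end) { this.start = start; this.end = end; }
        boolean contains(long token) { return token >= start && token < end; }
        @Override public String toString() { return "[" + start + "," + end + ")"; }
    }

    // Capture the Optional instead of calling findFirst().get(), so a missing
    // match produces a descriptive IllegalStateException rather than an
    // unhelpful NoSuchElementException thrown before the state check runs.
    static TokenRange matchingSubRange(List<TokenRange> subRanges, long token) {
        Optional<TokenRange> match = subRanges.stream()
                                              .filter(r -> r.contains(token))
                                              .findFirst();
        if (!match.isPresent())
            throw new IllegalStateException(
                "Received token " + token + " outside of expected ranges " + subRanges);
        return match.get();
    }
}
```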
[jira] [Commented] (CASSANDRA-19166) StackOverflowError on ALTER after many previous schema changes
[ https://issues.apache.org/jira/browse/CASSANDRA-19166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793852#comment-17793852 ] Abe Ratnofsky commented on CASSANDRA-19166: --- Thank you [~jlewandowski]! > StackOverflowError on ALTER after many previous schema changes > -- > > Key: CASSANDRA-19166 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19166 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Schema >Reporter: Abe Ratnofsky >Assignee: Abe Ratnofsky >Priority: Normal > Fix For: 4.1.4, 5.0-rc > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Since 4.1, TableMetadataRefCache re-wraps its fields in > Collections.unmodifiableMap on every local schema update. This causes > TableMetadataRefCache's Map fields to reference chains of nested > UnmodifiableMaps. Eventually, this leads to a StackOverflowError on get(), > which has to traverse lots of these maps to fetch the actual value. > https://github.com/apache/cassandra/blob/4059faf5b948c5a285c25fb0f2e4c4288ee7c305/src/java/org/apache/cassandra/schema/TableMetadataRefCache.java#L53 > The issue goes away on restart, since TableMetadataRefCache is reloaded from > disk. > See CASSANDRA-17044, when TableMetadataRefCache was introduced. This issue > was discovered on a real test cluster where schema changes were failing, via > a heap dump. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
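To make the failure mode above concrete: re-wrapping the published map in `Collections.unmodifiableMap` on every schema change builds a chain of nested wrapper objects, and each `get()` then traverses one wrapper per past update until the stack overflows. Below is a minimal sketch of the wrap-once alternative; the names are illustrative, the actual patch in CASSANDRA-19166 may be structured differently, and real code would also have to address the thread-safety concerns a schema cache has.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Sketch: keep one private mutable map and publish a single unmodifiable
// view of it, instead of re-wrapping the published map on every update.
class RefCacheSketch {
    private final Map<String, String> refs = new HashMap<>();
    // Wrapped exactly once; the view reflects later mutations of `refs`,
    // so wrapper depth stays 1 no matter how many updates occur.
    private final Map<String, String> publishedView = Collections.unmodifiableMap(refs);

    void onSchemaChange(String tableId, String ref) {
        refs.put(tableId, ref); // mutate the backing map, never re-wrap
    }

    Map<String, String> refsById() {
        return publishedView;   // O(1) delegation on get(), regardless of update count
    }
}
```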
[jira] [Updated] (CASSANDRA-19166) StackOverflowError on ALTER after many previous schema changes
[ https://issues.apache.org/jira/browse/CASSANDRA-19166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacek Lewandowski updated CASSANDRA-19166: -- Since Version: 4.1.0 Source Control Link: https://github.com/apache/cassandra/commit/a443990bfa64e239810876121f2877064f2d9ae8 Resolution: Fixed Status: Resolved (was: Ready to Commit) Committed, thank you for the patch [~aratnofsky]!
[jira] [Updated] (CASSANDRA-19166) StackOverflowError on ALTER after many previous schema changes
[ https://issues.apache.org/jira/browse/CASSANDRA-19166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacek Lewandowski updated CASSANDRA-19166: -- Fix Version/s: 4.1.4 (was: 4.1.x)
(cassandra) branch cassandra-4.1 updated (13e5956285 -> a443990bfa)
This is an automated email from the ASF dual-hosted git repository. jlewandowski pushed a change to branch cassandra-4.1 in repository https://gitbox.apache.org/repos/asf/cassandra.git from 13e5956285 Merge branch 'cassandra-4.0' into cassandra-4.1 add a443990bfa Fix StackOverflowError on ALTER after many previous schema changes No new revisions were added by this update. Summary of changes: CHANGES.txt | 1 + .../cassandra/schema/TableMetadataRefCache.java | 20 +--- 2 files changed, 14 insertions(+), 7 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
(cassandra) branch cassandra-5.0 updated (fdfc5e614d -> 676f7ee751)
This is an automated email from the ASF dual-hosted git repository. jlewandowski pushed a change to branch cassandra-5.0 in repository https://gitbox.apache.org/repos/asf/cassandra.git from fdfc5e614d Merge branch 'cassandra-4.1' into cassandra-5.0 add a443990bfa Fix StackOverflowError on ALTER after many previous schema changes add 676f7ee751 Merge branch 'cassandra-4.1' into cassandra-5.0 No new revisions were added by this update. Summary of changes: CHANGES.txt | 2 ++ .../cassandra/schema/TableMetadataRefCache.java | 20 +--- 2 files changed, 15 insertions(+), 7 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
(cassandra) 01/01: Merge branch 'cassandra-5.0' into trunk
This is an automated email from the ASF dual-hosted git repository. jlewandowski pushed a commit to branch trunk in repository https://gitbox.apache.org/repos/asf/cassandra.git commit ea1f9e4504cec1849a96c4b8eac962783662fcd8 Merge: ad86c9d201 676f7ee751 Author: Jacek Lewandowski AuthorDate: Wed Dec 6 18:05:49 2023 +0100 Merge branch 'cassandra-5.0' into trunk * cassandra-5.0: Fix StackOverflowError on ALTER after many previous schema changes CHANGES.txt | 1 + 1 file changed, 1 insertion(+) diff --cc CHANGES.txt index 6fcb72dcf1,c69b0e1234..60879f9d64 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -302,6 -291,6 +302,7 @@@ Merged from 3.0 4.1.4 ++ * Fix StackOverflowError on ALTER after many previous schema changes (CASSANDRA-19166) Merged from 4.0: * Fix NTS log message when an unrecognized strategy option is passed (CASSANDRA-18679) * Fix BulkLoader ignoring cipher suites options (CASSANDRA-18582) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
(cassandra) branch trunk updated (ad86c9d201 -> ea1f9e4504)
This is an automated email from the ASF dual-hosted git repository. jlewandowski pushed a change to branch trunk in repository https://gitbox.apache.org/repos/asf/cassandra.git from ad86c9d201 Merge branch 'cassandra-5.0' into trunk add a443990bfa Fix StackOverflowError on ALTER after many previous schema changes add 676f7ee751 Merge branch 'cassandra-4.1' into cassandra-5.0 new ea1f9e4504 Merge branch 'cassandra-5.0' into trunk The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: CHANGES.txt | 1 + 1 file changed, 1 insertion(+) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19120) local consistencies may get timeout if blocking read repair is sending the read repair mutation to other DC
[ https://issues.apache.org/jira/browse/CASSANDRA-19120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793846#comment-17793846 ] Runtian Liu commented on CASSANDRA-19120: - The test you added is not right, or at least it is not testing the bug we are discussing. The pending node should be joining the first DC instead of the second DC (1), which causes one more remote node to be added to the repairs map (this part is good in your test (2)). If the pending node is joining the first DC, the blockFor will be 3 instead of 2 (3). Since no node in repairs satisfied the if condition (4), the latch will be initialized with 3. So handler.waitingOn should be 3 instead of 2 (5). Without any change to the latch part, we will run into the timeout error because the ack from node4 won't count the latch down. (1) [https://github.com/instaclustr/cassandra/commit/853ced996d3637109bf1e183092f0bd9cbb180ca#diff-1ddca3571de225b02568519eada4b76eb136b84c4cc25f061d5c1f806f0fe145R332] (2) [https://github.com/instaclustr/cassandra/commit/853ced996d3637109bf1e183092f0bd9cbb180ca#diff-1ddca3571de225b02568519eada4b76eb136b84c4cc25f061d5c1f806f0fe145R309-R312] (3) [https://github.com/apache/cassandra/blob/cassandra-5.0/src/java/org/apache/cassandra/service/reads/repair/BlockingPartitionRepair.java#L83] (4) [https://github.com/apache/cassandra/blob/cassandra-5.0/src/java/org/apache/cassandra/service/reads/repair/BlockingPartitionRepair.java#L90] (5) [https://github.com/instaclustr/cassandra/commit/853ced996d3637109bf1e183092f0bd9cbb180ca#diff-1ddca3571de225b02568519eada4b76eb136b84c4cc25f061d5c1f806f0fe145R343]
> local consistencies may get timeout if blocking read repair is sending the read repair mutation to other DC
>
> Key: CASSANDRA-19120
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19120
> Project: Cassandra
> Issue Type: Bug
> Reporter: Runtian Liu
> Priority: Normal
> Attachments: image-2023-11-29-15-26-08-056.png, signature.asc
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> For a two-DC cluster setup: when a new node is being added to DC1, a blocking read repair triggered by local_quorum in DC1 will require sending the read repair mutation to an extra node (1)(2). The selector for read repair may select *ANY* node that has not been contacted before (3) instead of selecting the DC1 nodes. If a node from DC2 is selected, this will cause a 100% timeout because of the bug described below:
> When we initialize the latch (4) for blocking read repair, the shouldBlockOn function will only return true for local nodes (5), and the blockFor value will be reduced if a local node doesn't require repair (6). The blockFor is the same as the number of read repair mutations sent out. But when the coordinator node receives the responses from the target nodes, the latch only counts down for nodes in the same DC (7). The latch will wait till timeout and the read request will time out.
> This can be reproduced if you have a constant load on a 3 + 3 cluster when adding a node, and some way to trigger blocking read repair (maybe by adding load using the stress tool). If you use local_quorum consistency with a constant read-after-write load in the same DC in which you are adding the node, you will see read timeout issues from time to time because of the bug described above.
> I think for read repair, when selecting the extra node to do repair, we should prefer local nodes over nodes from the other DC. Also, we need to fix the latch part so that even if we send the mutation to nodes in the other DC, we don't get a timeout.
> (1)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/locator/ReplicaPlans.java#L455]
> (2)[https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/ConsistencyLevel.java#L183]
> (3)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/locator/ReplicaPlans.java#L458]
> (4)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/service/reads/repair/BlockingPartitionRepair.java#L96]
> (5)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/service/reads/repair/BlockingPartitionRepair.java#L71]
> (6)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/service/reads/repair/BlockingPartitionRepair.java#L88]
> (7)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/service/reads/repair/BlockingPartitionRepair.java#L113]
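The latch mismatch described above can be simulated in a few lines. This is a hedged sketch, not the real `BlockingPartitionRepair`: `blockFor` counts every repair mutation sent, but the ack handler only counts down for replicas in the coordinator's DC, so a mutation sent to a remote DC can never satisfy the latch and the wait times out.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LatchMismatchDemo {
    // Simulates the bug: the latch is sized by the number of mutations sent
    // (blockFor), but count-downs only happen for acks from the local DC.
    // Any target in a remote DC therefore leaves the latch short forever.
    static boolean waitForRepairs(List<String> targetDcs, String localDc, long timeoutMs)
            throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(targetDcs.size()); // blockFor = mutations sent
        for (String dc : targetDcs)
            if (dc.equals(localDc))   // remote-DC acks are ignored by the handler
                latch.countDown();    // assume local replicas ack promptly
        return latch.await(timeoutMs, TimeUnit.MILLISECONDS); // false = timed out
    }
}
```

With targets {local, local, remote} the latch is created with 3 but only ever counted down twice, mirroring the reported 100% timeout; with all-local targets it completes immediately.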
(cassandra-website) branch asf-staging updated (3c2f600d -> 606e216c)
This is an automated email from the ASF dual-hosted git repository. git-site-role pushed a change to branch asf-staging in repository https://gitbox.apache.org/repos/asf/cassandra-website.git omit 3c2f600d generate docs for afd7ef66 new 606e216c generate docs for afd7ef66 This update added new revisions after undoing existing revisions. That is to say, some revisions that were in the old version of the branch are not in the new version. This situation occurs when a user --force pushes a change and generates a repository containing something like this: * -- * -- B -- O -- O -- O (3c2f600d) \ N -- N -- N refs/heads/asf-staging (606e216c) You should already have received notification emails for all of the O revisions, and so the following emails describe only the N revisions from the common base, B. Any revisions marked "omit" are not gone; other references still refer to them. Any revisions marked "discard" are gone forever. The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: content/search-index.js | 2 +- site-ui/build/ui-bundle.zip | Bin 4883726 -> 4883726 bytes 2 files changed, 1 insertion(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19009) CEP-15: (C*/Accord) Schema based fast path reconfiguration
[ https://issues.apache.org/jira/browse/CASSANDRA-19009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Blake Eggleston updated CASSANDRA-19009: Status: Review In Progress (was: Patch Available) > CEP-15: (C*/Accord) Schema based fast path reconfiguration > --- > > Key: CASSANDRA-19009 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19009 > Project: Cassandra > Issue Type: Improvement > Components: Accord >Reporter: Blake Eggleston >Assignee: Blake Eggleston >Priority: Normal > Fix For: 5.0.x > > > This adds availability-aware accord fast path reconfiguration, as well as > user-configurable fast path settings, which are set at the keyspace level and > (optionally) at the table level for increased granularity. > The major parts are: > *Add availability information to cluster metadata* > Accord topology in C* is not stored in cluster metadata, but is meant to be > calculated deterministically from cluster metadata state at a given epoch. > This adds the availability data, as well as the failure detector / gossip > listener and state change deduplication, to CMS. > *Move C* accord keys/topology from keyspace prefixes to table-id prefixes* > To support per-table fast path settings, topologies and keys need to include > the table id. Since accord topologies could begin to consume a lot of memory > in clusters with a lot of nodes and tables, topology generation has been > updated to reuse previously allocated shards / shard parts where possible, > which will only increase heap sizes when things actually change. > *Make fast path settings configurable via schema* > There are 2.5 strategies: Simple, Parameterized, and InheritKeyspaceSettings. > Simple will use as many available nodes as possible for the fast path > electorate; this is the default for the keyspace fast path strategy. > Parameterized allows you to set a target size, and preferred datacenters for > the FP electorate. 
InheritKeyspace tells topology generation to just use the > keyspace fast path settings, and is the default for the table fast path > strategy. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
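The three fast path strategies described in the ticket can be modeled in a few lines. The sketch below is purely illustrative and is not Cassandra's Accord code; `simple_electorate`, `parameterized_electorate`, and `effective_fast_path` are hypothetical names chosen for this example:

```python
# Toy model of the fast path strategies described above (hypothetical names,
# not Cassandra's real API).

def simple_electorate(nodes):
    """Simple: use every currently-available node as the fast path electorate."""
    return [n for n, available in nodes.items() if available]

def parameterized_electorate(nodes, target_size, preferred_dcs, dc_of):
    """Parameterized: prefer nodes in the listed datacenters, capped at target_size."""
    up = [n for n, available in nodes.items() if available]
    preferred = [n for n in up if dc_of[n] in preferred_dcs]
    rest = [n for n in up if dc_of[n] not in preferred_dcs]
    return (preferred + rest)[:target_size]

def effective_fast_path(table_setting, keyspace_setting):
    """InheritKeyspaceSettings: a table without its own setting uses the keyspace's."""
    return keyspace_setting if table_setting == "inherit" else table_setting
```

For example, with one node down, `simple_electorate` returns the two live nodes, while a parameterized target size of 1 with a preferred datacenter picks the live node in that DC first.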
[jira] [Updated] (CASSANDRA-19009) CEP-15: (C*/Accord) Schema based fast path reconfiguration
[ https://issues.apache.org/jira/browse/CASSANDRA-19009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Blake Eggleston updated CASSANDRA-19009: Status: Ready to Commit (was: Review In Progress) > CEP-15: (C*/Accord) Schema based fast path reconfiguration > --- > > Key: CASSANDRA-19009 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19009 > Project: Cassandra > Issue Type: Improvement > Components: Accord >Reporter: Blake Eggleston >Assignee: Blake Eggleston >Priority: Normal > Fix For: 5.0.x > > > This adds availability aware accord fast path reconfiguration, as well as > user configurable fast path settings, which are set at the keyspace level and > (optionally) at the table level for increased granularity. > The major parts are: > *Add availability information to cluster metadata* > Accord topology in C* is not stored in cluster metadata, but is meant to be > calculated deterministically from cluster metadata state at a given epoch. > This adds the availability data, as well as the failure detector / gossip > listener and state change deduplication to CMS. > *Move C* accord keys/topology from keyspace prefixes to tableid prefixes* > To support per-table fast path settings, topologies and keys need to include > the table id. Since accord topologies could begin to consume a lot of memory > in clusters with a lot of nodes and tables, topology generation has been > updated to reuse previously allocated shards / shard parts where possible, > which will only increase heap sizes when things actually change. > *Make fast path settings configurable via schema* > There are 2.5 strategies: Simple, Parameterized, and InheritKeyspaceSettings. > Simple will use as many available nodes as possible for the fast path > electorate; this is the default for the keyspace fast path strategy. > Parameterized allows you to set a target size, and preferred datacenters for > the FP electorate. 
InheritKeyspace tells topology generation to just use the > keyspace fast path settings, and is the default for the table fast path > strategy.
[jira] [Commented] (CASSANDRA-19175) Test Failure: dtest.sslnodetonode_test.TestNodeToNodeSSLEncryption.test_ca_mismatch
[ https://issues.apache.org/jira/browse/CASSANDRA-19175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793835#comment-17793835 ] Brandon Williams commented on CASSANDRA-19175: -- I'm not able to reproduce this, and it makes sense that if the machine ran out of space a log message may be truncated, so this could have been environmental. > Test Failure: > dtest.sslnodetonode_test.TestNodeToNodeSSLEncryption.test_ca_mismatch > --- > > Key: CASSANDRA-19175 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19175 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Ekaterina Dimitrova >Priority: Normal > Fix For: 5.1-beta > > > [https://ci-cassandra.apache.org/job/Cassandra-trunk/1782/testReport/dtest.sslnodetonode_test/TestNodeToNodeSSLEncryption/test_ca_mismatch/] > {code:java} > Error Message > assert False > Stacktrace > self = 0x7fca5921d050> def test_ca_mismatch(self): """CA mismatch should cause nodes > to fail to connect""" credNode1 = sslkeygen.generate_credentials("127.0.0.1") > credNode2 = sslkeygen.generate_credentials("127.0.0.2") # mismatching CA! > self.setup_nodes(credNode1, credNode2) > self.fixture_dtest_setup.allow_log_errors = True > self.cluster.start(no_wait=True) found = self._grep_msg(self.node1, > _LOG_ERR_HANDSHAKE) self.cluster.stop() > assert found E assert False > sslnodetonode_test.py:115: AssertionError > {code}
[jira] [Commented] (CASSANDRA-19084) Test Failure: IndexStreamingFailureTest.testAvailabilityAfterFailed*EntireFileStreaming
[ https://issues.apache.org/jira/browse/CASSANDRA-19084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793829#comment-17793829 ] Ekaterina Dimitrova commented on CASSANDRA-19084: - This seems to be a known issue from the original ticket where the test class was introduced. Check the comments in CASSANDRA-18670 and this run: [https://app.circleci.com/pipelines/github/adelapena/cassandra/3045/workflows/02b94d07-ba00-457c-9d2c-c41a020bda01/jobs/61452/tests] CC [~maedhroz] , [~adelapena] and [~mikea] > Test Failure: > IndexStreamingFailureTest.testAvailabilityAfterFailed*EntireFileStreaming > --- > > Key: CASSANDRA-19084 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19084 > Project: Cassandra > Issue Type: Bug > Components: Feature/SAI >Reporter: Michael Semb Wever >Priority: Normal > Fix For: 5.0-rc, 5.x > > > Flakies > https://app.circleci.com/pipelines/github/adelapena/cassandra/3329/workflows/f2124edd-fa0e-4bc5-ab03-ddfb886bf015/jobs/93097/tests > {noformat} > java.lang.NullPointerException > at > java.base/sun.net.www.protocol.jar.JarURLConnection.connect(JarURLConnection.java:133) > at > java.base/sun.net.www.protocol.jar.JarURLConnection.getInputStream(JarURLConnection.java:155) > at java.base/java.net.URL.openStream(URL.java:1165) > at > java.base/java.lang.ClassLoader.getResourceAsStream(ClassLoader.java:1739) > at > net.bytebuddy.dynamic.ClassFileLocator$ForClassLoader.locate(ClassFileLocator.java:453) > at > net.bytebuddy.dynamic.ClassFileLocator$ForClassLoader.locate(ClassFileLocator.java:434) > at > net.bytebuddy.dynamic.scaffold.TypeWriter$Default$ForInlining.create(TypeWriter.java:4009) > at > net.bytebuddy.dynamic.scaffold.TypeWriter$Default.make(TypeWriter.java:2224) > at > net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase$UsingTypeWriter.make(DynamicType.java:4050) > at > net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase.make(DynamicType.java:3734) > at > 
net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase$Delegator.make(DynamicType.java:3986) > at > org.apache.cassandra.distributed.test.sai.IndexStreamingFailureTest$ByteBuddyHelper.installErrors(IndexStreamingFailureTest.java:154) > at > org.apache.cassandra.distributed.shared.AbstractBuilder$1.initialise(AbstractBuilder.java:360) > at > org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.newInstance(AbstractCluster.java:312) > at > org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.delegateForStartup(AbstractCluster.java:292) > at > org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.startup(AbstractCluster.java:410) > at > org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.startup(AbstractCluster.java:383) > at > org.apache.cassandra.distributed.test.sai.IndexStreamingFailureTest.testAvailabilityAfterStreaming(IndexStreamingFailureTest.java:123) > at > org.apache.cassandra.distributed.test.sai.IndexStreamingFailureTest.testAvailabilityAfterFailedNonEntireFileStreaming(IndexStreamingFailureTest.java:79) > {noformat} > {noformat} > java.lang.IllegalStateException: Can't use shutdown instances, delegate is > null > at > org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.delegate(AbstractCluster.java:285) > at > org.apache.cassandra.distributed.impl.DelegatingInvokableInstance.transfer(DelegatingInvokableInstance.java:49) > at > org.apache.cassandra.distributed.api.IInvokableInstance.runsOnInstance(IInvokableInstance.java:45) > at > org.apache.cassandra.distributed.api.IInvokableInstance.runOnInstance(IInvokableInstance.java:46) > at > org.apache.cassandra.distributed.test.sai.IndexStreamingFailureTest.testAvailabilityAfterFailedEntireFileStreaming(IndexStreamingFailureTest.java:85) > {noformat} > https://ci-cassandra.apache.org/job/Cassandra-5.0/106/testReport/org.apache.cassandra.distributed.test.sai/IndexStreamingFailureTest/testAvailabilityAfterFailedNonEntireFileStreaming__jdk11_x86_64_novnode/ > {noformat} 
> java.lang.RuntimeException: The class file could not be written > at > net.bytebuddy.dynamic.scaffold.TypeWriter$Default$ForInlining.create(TypeWriter.java:4021) > at > net.bytebuddy.dynamic.scaffold.TypeWriter$Default.make(TypeWriter.java:2224) > at > net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase$UsingTypeWriter.make(DynamicType.java:4050) > at > net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase.make(DynamicType.java:3734) > at > net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase$Delegator.make(DynamicType.java:3986) > at >
[jira] [Updated] (CASSANDRA-19072) Test failure: org.apache.cassandra.distributed.test.log.FetchLogFromPeersTest.catchupCoordinatorAheadPlacementsReadTest-_jdk11
[ https://issues.apache.org/jira/browse/CASSANDRA-19072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19072: Test and Documentation Plan: CI summary attached Status: Patch Available (was: In Progress) The issue is that the test requires a specific node not to be a replica for a particular key then to become one following the removal of a peer. With 16 tokens per node, this node is already a replica at the outset and so the test fails. We could select a different key, but the number of vnodes is set externally to the test and so could change. The thing we're testing here is unrelated to the number of tokens, so I've just fixed a couple of specific tests to only run in a non-vnodes env. Patch [here|https://github.com/beobal/cassandra/commit/7d6b5a5fc0ee78114ef4cfced86229bf8c59019b] CI summary and results attached > Test failure: > org.apache.cassandra.distributed.test.log.FetchLogFromPeersTest.catchupCoordinatorAheadPlacementsReadTest-_jdk11 > -- > > Key: CASSANDRA-19072 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19072 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Alex Petrov >Assignee: Sam Tunnicliffe >Priority: Normal > Fix For: 5.1-alpha1 > > > CircleCI failure: > https://app.circleci.com/pipelines/github/michaelsembwever/cassandra/256/workflows/c4fda8f1-a8d6-4523-be83-5e30b9de39fe/jobs/20464/tests > Also failing on 17: Circleci Failure: > https://app.circleci.com/pipelines/github/michaelsembwever/cassandra/256/workflows/c4fda8f1-a8d6-4523-be83-5e30b9de39fe/jobs/20500/tests > {code} > junit.framework.AssertionFailedError > at > org.apache.cassandra.distributed.test.log.FetchLogFromPeersTest.catchupCoordinatorAheadPlacementsReadTest(FetchLogFromPeersTest.java:217) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > {code}
[jira] [Updated] (CASSANDRA-19072) Test failure: org.apache.cassandra.distributed.test.log.FetchLogFromPeersTest.catchupCoordinatorAheadPlacementsReadTest-_jdk11
[ https://issues.apache.org/jira/browse/CASSANDRA-19072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19072: Attachment: ci_summary.html result_details.tar.gz > Test failure: > org.apache.cassandra.distributed.test.log.FetchLogFromPeersTest.catchupCoordinatorAheadPlacementsReadTest-_jdk11 > -- > > Key: CASSANDRA-19072 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19072 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Alex Petrov >Assignee: Sam Tunnicliffe >Priority: Normal > Fix For: 5.1-alpha1 > > Attachments: ci_summary.html, result_details.tar.gz > > > CircleCI failure: > https://app.circleci.com/pipelines/github/michaelsembwever/cassandra/256/workflows/c4fda8f1-a8d6-4523-be83-5e30b9de39fe/jobs/20464/tests > Also failing on 17: Circleci Failure: > https://app.circleci.com/pipelines/github/michaelsembwever/cassandra/256/workflows/c4fda8f1-a8d6-4523-be83-5e30b9de39fe/jobs/20500/tests > {code} > junit.framework.AssertionFailedError > at > org.apache.cassandra.distributed.test.log.FetchLogFromPeersTest.catchupCoordinatorAheadPlacementsReadTest(FetchLogFromPeersTest.java:217) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > {code}
[jira] [Commented] (CASSANDRA-19104) Standardize tablestats formatting and data units
[ https://issues.apache.org/jira/browse/CASSANDRA-19104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793826#comment-17793826 ] Leo Toff commented on CASSANDRA-19104: -- Got it, keeping KiB/MiB/GiB for now. We should probably get feedback from the mailing list on this as well. Regarding "compacted partitions", the spreadsheet is updated to use byte quantifiers. Regarding "zero bytes", looks like "0.00 KiB" is the preferred format according to the discussion in the mailing list, so I'll go with that. However, I think "0.00 KiB" is ambiguous since it might be interpreted as "something below 0.005 KiB", not necessarily "zero bytes". So my personal preference is "0 B" or "0 bytes". I'll be using "0.00 KiB" until corrected. > Standardize tablestats formatting and data units > > > Key: CASSANDRA-19104 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19104 > Project: Cassandra > Issue Type: Bug > Components: Tool/nodetool >Reporter: Brad Schoening >Assignee: Leo Toff >Priority: Normal > > Tablestats reports output in plaintext, JSON or YAML. The human readable > output currently has a mix of KiB, bytes with inconsistent spacing > Considering simplifying and defaulting output to 'human readable'. Machine > readable output is available as an option and the current mixed output > formatting is neither friendly for human or machine reading. > !image-2023-11-27-13-49-14-247.png! > *Not a goal now (consider a follow up Jira):* > Fix inconsistencies with KiB/MiB/GiB and KB/MB/GB formatting: > * gcstats - uses MB > * getcompactionthroughput - uses MB/s > * getstreamthroughput - uses MB/s > * info - uses MiB/GiB
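To make the "0.00 KiB" vs "0 B" trade-off discussed above concrete, here is a hedged sketch of a human-readable binary-unit formatter. `format_size` and its `zero_as_bytes` flag are illustrative names for this example, not the actual tablestats code:

```python
def format_size(n_bytes, zero_as_bytes=False):
    """Format a byte count in binary units (B/KiB/MiB/GiB/TiB).

    With zero_as_bytes=True, an exact zero renders as "0 B", avoiding the
    ambiguity of "0.00 KiB" (which could mean "anything below 0.005 KiB").
    """
    if n_bytes == 0 and zero_as_bytes:
        return "0 B"
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    size = float(n_bytes)
    i = 0
    while size >= 1024 and i < len(units) - 1:
        size /= 1024
        i += 1
    return f"{size:.2f} {units[i]}"
```

For example, 1536 bytes formats as "1.50 KiB", while zero can be shown either way depending on the flag.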
[jira] [Updated] (CASSANDRA-19084) Test Failure: IndexStreamingFailureTest.testAvailabilityAfterFailed*EntireFileStreaming
[ https://issues.apache.org/jira/browse/CASSANDRA-19084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova updated CASSANDRA-19084: Bug Category: Parent values: Correctness(12982)Level 1 values: Test Failure(12990) Complexity: Normal Discovered By: User Report Severity: Normal Status: Open (was: Triage Needed) > Test Failure: > IndexStreamingFailureTest.testAvailabilityAfterFailed*EntireFileStreaming > --- > > Key: CASSANDRA-19084 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19084 > Project: Cassandra > Issue Type: Bug > Components: Feature/SAI >Reporter: Michael Semb Wever >Priority: Normal > Fix For: 5.0-rc, 5.x > > > Flakies > https://app.circleci.com/pipelines/github/adelapena/cassandra/3329/workflows/f2124edd-fa0e-4bc5-ab03-ddfb886bf015/jobs/93097/tests > {noformat} > java.lang.NullPointerException > at > java.base/sun.net.www.protocol.jar.JarURLConnection.connect(JarURLConnection.java:133) > at > java.base/sun.net.www.protocol.jar.JarURLConnection.getInputStream(JarURLConnection.java:155) > at java.base/java.net.URL.openStream(URL.java:1165) > at > java.base/java.lang.ClassLoader.getResourceAsStream(ClassLoader.java:1739) > at > net.bytebuddy.dynamic.ClassFileLocator$ForClassLoader.locate(ClassFileLocator.java:453) > at > net.bytebuddy.dynamic.ClassFileLocator$ForClassLoader.locate(ClassFileLocator.java:434) > at > net.bytebuddy.dynamic.scaffold.TypeWriter$Default$ForInlining.create(TypeWriter.java:4009) > at > net.bytebuddy.dynamic.scaffold.TypeWriter$Default.make(TypeWriter.java:2224) > at > net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase$UsingTypeWriter.make(DynamicType.java:4050) > at > net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase.make(DynamicType.java:3734) > at > net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase$Delegator.make(DynamicType.java:3986) > at > 
org.apache.cassandra.distributed.test.sai.IndexStreamingFailureTest$ByteBuddyHelper.installErrors(IndexStreamingFailureTest.java:154) > at > org.apache.cassandra.distributed.shared.AbstractBuilder$1.initialise(AbstractBuilder.java:360) > at > org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.newInstance(AbstractCluster.java:312) > at > org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.delegateForStartup(AbstractCluster.java:292) > at > org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.startup(AbstractCluster.java:410) > at > org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.startup(AbstractCluster.java:383) > at > org.apache.cassandra.distributed.test.sai.IndexStreamingFailureTest.testAvailabilityAfterStreaming(IndexStreamingFailureTest.java:123) > at > org.apache.cassandra.distributed.test.sai.IndexStreamingFailureTest.testAvailabilityAfterFailedNonEntireFileStreaming(IndexStreamingFailureTest.java:79) > {noformat} > {noformat} > java.lang.IllegalStateException: Can't use shutdown instances, delegate is > null > at > org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.delegate(AbstractCluster.java:285) > at > org.apache.cassandra.distributed.impl.DelegatingInvokableInstance.transfer(DelegatingInvokableInstance.java:49) > at > org.apache.cassandra.distributed.api.IInvokableInstance.runsOnInstance(IInvokableInstance.java:45) > at > org.apache.cassandra.distributed.api.IInvokableInstance.runOnInstance(IInvokableInstance.java:46) > at > org.apache.cassandra.distributed.test.sai.IndexStreamingFailureTest.testAvailabilityAfterFailedEntireFileStreaming(IndexStreamingFailureTest.java:85) > {noformat} > https://ci-cassandra.apache.org/job/Cassandra-5.0/106/testReport/org.apache.cassandra.distributed.test.sai/IndexStreamingFailureTest/testAvailabilityAfterFailedNonEntireFileStreaming__jdk11_x86_64_novnode/ > {noformat} > java.lang.RuntimeException: The class file could not be written > at > 
net.bytebuddy.dynamic.scaffold.TypeWriter$Default$ForInlining.create(TypeWriter.java:4021) > at > net.bytebuddy.dynamic.scaffold.TypeWriter$Default.make(TypeWriter.java:2224) > at > net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase$UsingTypeWriter.make(DynamicType.java:4050) > at > net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase.make(DynamicType.java:3734) > at > net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase$Delegator.make(DynamicType.java:3986) > at > org.apache.cassandra.distributed.test.sai.IndexStreamingFailureTest$ByteBuddyHelper.installErrors(IndexStreamingFailureTest.java:155) > at >
[jira] [Updated] (CASSANDRA-19104) Standardize tablestats formatting and data units
[ https://issues.apache.org/jira/browse/CASSANDRA-19104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leo Toff updated CASSANDRA-19104: - Description: Tablestats reports output in plaintext, JSON or YAML. The human readable output currently has a mix of KiB, bytes with inconsistent spacing Considering simplifying and defaulting output to 'human readable'. Machine readable output is available as an option and the current mixed output formatting is neither friendly for human or machine reading. !image-2023-11-27-13-49-14-247.png! *Not a goal now (consider a follow up Jira):* Fix inconsistencies with KiB/MiB/GiB and KB/MB/GB formatting: * gcstats - uses MB * getcompactionthroughput - uses MB/s * getstreamthroughput - uses MB/s * info - uses MiB/GiB was: Tablestats reports output in plaintext, JSON or YAML. The human readable output currently has a mix of KiB, bytes with inconsistent spacing Considering simplifying and defaulting output to 'human readable'. Machine readable output is available as an option and the current mixed output formatting is neither friendly for human or machine reading. !image-2023-11-27-13-49-14-247.png! > Standardize tablestats formatting and data units > > > Key: CASSANDRA-19104 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19104 > Project: Cassandra > Issue Type: Bug > Components: Tool/nodetool >Reporter: Brad Schoening >Assignee: Leo Toff >Priority: Normal > > Tablestats reports output in plaintext, JSON or YAML. The human readable > output currently has a mix of KiB, bytes with inconsistent spacing > Considering simplifying and defaulting output to 'human readable'. Machine > readable output is available as an option and the current mixed output > formatting is neither friendly for human or machine reading. > !image-2023-11-27-13-49-14-247.png! 
> *Not a goal now (consider a follow up Jira):* > Fix inconsistencies with KiB/MiB/GiB and KB/MB/GB formatting: > * gcstats - uses MB > * getcompactionthroughput - uses MB/s > * getstreamthroughput - uses MB/s > * info - uses MiB/GiB
[jira] [Commented] (CASSANDRA-19009) CEP-15: (C*/Accord) Schema based fast path reconfiguration
[ https://issues.apache.org/jira/browse/CASSANDRA-19009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793822#comment-17793822 ] Alex Petrov commented on CASSANDRA-19009: - Sounds good. If you agree with minor suggestions above, please feel free to make the change on commit. +1 otherwise! > CEP-15: (C*/Accord) Schema based fast path reconfiguration > --- > > Key: CASSANDRA-19009 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19009 > Project: Cassandra > Issue Type: Improvement > Components: Accord >Reporter: Blake Eggleston >Assignee: Blake Eggleston >Priority: Normal > Fix For: 5.0.x > > > This adds availability aware accord fast path reconfiguration, as well as > user configurable fast path settings, which are set at the keyspace level and > (optionally) at the table level for increased granularity. > The major parts are: > *Add availability information to cluster metadata* > Accord topology in C* is not stored in cluster metadata, but is meant to be > calculated deterministically from cluster metadata state at a given epoch. > This adds the availability data, as well as the failure detector / gossip > listener and state change deduplication to CMS. > *Move C* accord keys/topology from keyspace prefixes to tableid prefixes* > To support per-table fast path settings, topologies and keys need to include > the table id. Since accord topologies could begin to consume a lot of memory > in clusters with a lot of nodes and tables, topology generation has been > updated to reuse previously allocated shards / shard parts where possible, > which will only increase heap sizes when things actually change. > *Make fast path settings configurable via schema* > There are 2.5 strategies: Simple, Parameterized, and InheritKeyspaceSettings. > Simple will use as many available nodes as possible for the fast path > electorate; this is the default for the keyspace fast path strategy. 
> Parameterized allows you to set a target size, and preferred datacenters for > the FP electorate. InheritKeyspace tells topology generation to just use the > keyspace fast path settings, and is the default for the table fast path > strategy.
[jira] [Commented] (CASSANDRA-19169) Don't NPE when initializing CFSs for local system keyspaces with UCS
[ https://issues.apache.org/jira/browse/CASSANDRA-19169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793815#comment-17793815 ] Sam Tunnicliffe commented on CASSANDRA-19169: - Trivial patch [here|https://github.com/beobal/cassandra/commit/cd456f7e30f6128e67631503f0e71f4b99cc] Circle CI runs for branch with CASSANDRA-19169, CASSANDRA-19102 & CASSANDRA-19171: [J11|https://app.circleci.com/pipelines/github/michaelsembwever/cassandra/274/workflows/6843aabd-4749-4cbf-94a5-ec3a546704e6], [J17|https://app.circleci.com/pipelines/github/michaelsembwever/cassandra/274/workflows/440e2e39-038e-45ba-ab91-80335a497f21] Aside from some strangeness with cqlshlib tests which appears completely unrelated, the only failures are the ones tracked in CASSANDRA-19072, CASSANDRA-19058 & CASSANDRA-18360 > Don't NPE when initializing CFSs for local system keyspaces with UCS > > > Key: CASSANDRA-19169 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19169 > Project: Cassandra > Issue Type: Bug > Components: Transactional Cluster Metadata >Reporter: Sam Tunnicliffe >Assignee: Sam Tunnicliffe >Priority: Normal > Fix For: 5.1-alpha1 > > > When UnifiedCompactionStrategy is used as the default, NPEs are thrown when > flushing the system keyspace tables early during startup. The system keyspace > is > initialised before the cluster metadata, but UCS currently tries to access the > current epoch when initialising the shard manager, to determine whether the > local ranges are out of date. This isn't necessary for the system keyspaces as > they use LocalStrategy and cover the whole token space.
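The fix described in this ticket amounts to short-circuiting the epoch comparison for LocalStrategy keyspaces so that uninitialised cluster metadata is never dereferenced. A rough sketch of such a guard, with hypothetical names (`local_ranges_out_of_date`, `Keyspace`) rather than the actual UCS code:

```python
from dataclasses import dataclass

@dataclass
class Keyspace:
    uses_local_strategy: bool   # LocalStrategy keyspaces own the full token space
    ranges_epoch: int = 0       # epoch at which local ranges were last computed

def local_ranges_out_of_date(keyspace, current_epoch):
    """Illustrative guard: skip the epoch check when it cannot apply."""
    if keyspace.uses_local_strategy:
        return False            # whole token range, nothing to compare
    if current_epoch is None:   # cluster metadata not initialised yet at startup
        return False            # treat as up to date; re-checked once metadata exists
    return keyspace.ranges_epoch < current_epoch
```

The key design point is ordering: the LocalStrategy and null-metadata checks run before any epoch access, so early system-keyspace flushes never touch cluster metadata.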
[jira] [Comment Edited] (CASSANDRA-19009) CEP-15: (C*/Accord) Schema based fast path reconfiguration
[ https://issues.apache.org/jira/browse/CASSANDRA-19009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793813#comment-17793813 ] Blake Eggleston edited comment on CASSANDRA-19009 at 12/6/23 4:08 PM: -- > I do not understand fully is the intention for {{maintenance}} scheduled task The maintenance scheduled task is for rapid state changes. For instance, if a node transitioned to DOWN, then back to UP within the update interval, the UNAVAILABLE update would be executed, but the NORMAL update would be rejected. The maintenance task is just so we retry in cases like that. was (Author: bdeggleston): > I do not understand fully is the intention for {{maintenance}} scheduled task > CEP-15: (C*/Accord) Schema based fast path reconfiguration > --- > > Key: CASSANDRA-19009 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19009 > Project: Cassandra > Issue Type: Improvement > Components: Accord >Reporter: Blake Eggleston >Assignee: Blake Eggleston >Priority: Normal > Fix For: 5.0.x > > > This adds availability aware accord fast path reconfiguration, as well as > user configurable fast path settings, which are set at the keyspace level and > (optionally) at the table level for increased granularity. > The major parts are: > *Add availability information to cluster metadata* > Accord topology in C* is not stored in cluster metadata, but is meant to be > calculated deterministically from cluster metadata state at a given epoch. > This adds the availability data, as well as the failure detector / gossip > listener and state change deduplication to CMS. > *Move C* accord keys/topology from keyspace prefixes to tableid prefixes* > To support per-table fast path settings, topologies and keys need to include > the table id. 
Since accord topologies could begin to consume a lot of memory > in clusters with a lot of nodes and tables, topology generation has been > updated to reuse previously allocated shards / shard parts where possible, > which will only increase heap sizes when things actually change. > *Make fast path settings configurable via schema* > There are 2.5 strategies: Simple, Parameterized, and InheritKeyspaceSettings. > Simple will use as many available nodes as possible for the fast path > electorate; this is the default for the keyspace fast path strategy. > Parameterized allows you to set a target size, and preferred datacenters for > the FP electorate. InheritKeyspace tells topology generation to just use the > keyspace fast path settings, and is the default for the table fast path > strategy.
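The dedup-plus-retry behavior Blake describes (a DOWN-then-UP flap within the update interval leaves the rejected NORMAL update to be picked up later) can be sketched as a toy model. `AvailabilityTracker` is a hypothetical name for this illustration, not the CMS implementation:

```python
class AvailabilityTracker:
    """Toy model of state-change deduplication plus a maintenance retry.

    Updates arriving within `interval` of the last applied update are
    rejected; a periodic maintenance pass re-applies the latest observed
    state, so a rapid DOWN -> UP flap is eventually reconciled.
    """
    def __init__(self, interval):
        self.interval = interval
        self.applied = None               # state recorded in cluster metadata
        self.observed = None              # latest state seen locally
        self.last_applied_at = float("-inf")

    def on_state_change(self, state, now):
        self.observed = state
        if now - self.last_applied_at < self.interval:
            return False                  # deduplicated: update rejected
        self.applied = state
        self.last_applied_at = now
        return True

    def maintenance(self, now):
        # Scheduled retry: reconcile metadata with the latest observation.
        if self.observed != self.applied:
            self.applied = self.observed
            self.last_applied_at = now
```

In this model, the UNAVAILABLE update at t=0 is applied, the NORMAL update at t=5 (inside a 10s interval) is rejected, and the next maintenance pass applies NORMAL.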
[jira] [Comment Edited] (CASSANDRA-19169) Don't NPE when initializing CFSs for local system keyspaces with UCS
[ https://issues.apache.org/jira/browse/CASSANDRA-19169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793402#comment-17793402 ] Sam Tunnicliffe edited comment on CASSANDRA-19169 at 12/6/23 4:08 PM: -- https://github.com/beobal/cassandra/commits/samt/19169 Edit: added commit to a branch batching together a few small fixes was (Author: beobal): [-https://github.com/beobal/cassandra/commits/samt/19169-] > Don't NPE when initializing CFSs for local system keyspaces with UCS > > > Key: CASSANDRA-19169 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19169 > Project: Cassandra > Issue Type: Bug > Components: Transactional Cluster Metadata >Reporter: Sam Tunnicliffe >Assignee: Sam Tunnicliffe >Priority: Normal > Fix For: 5.1-alpha1 > > > When UnifiedCompactionStrategy is used as the default, NPEs are thrown when > flushing the system keyspace tables early during startup. The system keyspace > is > initialised before the cluster metadata, but UCS currently tries to access the > current epoch when initialising the shard manager, to determine whether the > local ranges are out of date. This isn't necessary for the system keyspaces as > they use LocalStrategy and cover the whole token space.
[jira] [Comment Edited] (CASSANDRA-19169) Don't NPE when initializing CFSs for local system keyspaces with UCS
[ https://issues.apache.org/jira/browse/CASSANDRA-19169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793402#comment-17793402 ] Sam Tunnicliffe edited comment on CASSANDRA-19169 at 12/6/23 4:07 PM: -- [-https://github.com/beobal/cassandra/commits/samt/19169-] was (Author: beobal): [https://github.com/beobal/cassandra/commits/samt/19169] > Don't NPE when initializing CFSs for local system keyspaces with UCS > > > Key: CASSANDRA-19169 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19169 > Project: Cassandra > Issue Type: Bug > Components: Transactional Cluster Metadata >Reporter: Sam Tunnicliffe >Assignee: Sam Tunnicliffe >Priority: Normal > Fix For: 5.1-alpha1 > > > When UnifiedCompactionStrategy is used as the default, NPEs are thrown when > flushing the system keyspace tables early during startup. The system keyspace > is > initialised before the cluster metadata, but UCS currently tries to access the > current epoch when initialising the shard manager, to determine whether the > local ranges are out of date. This isn't necessary for the system keyspaces as > they use LocalStrategy and cover the whole token space.
[jira] [Commented] (CASSANDRA-19009) CEP-15: (C*/Accord) Schema based fast path reconfiguration
[ https://issues.apache.org/jira/browse/CASSANDRA-19009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793813#comment-17793813 ] Blake Eggleston commented on CASSANDRA-19009: - > I do not understand fully is the intention for {{maintenance}} scheduled task > CEP-15: (C*/Accord) Schema based fast path reconfiguration > --- > > Key: CASSANDRA-19009 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19009 > Project: Cassandra > Issue Type: Improvement > Components: Accord >Reporter: Blake Eggleston >Assignee: Blake Eggleston >Priority: Normal > Fix For: 5.0.x > > > This adds availability aware accord fast path reconfiguration, as well as > user configurable fast path settings, which are set at the keyspace level and > (optionally) at the table level for increased granularity. > The major parts are: > *Add availability information to cluster metadata* > Accord topology in C* is not stored in cluster metadata, but is meant to be > calculated deterministically from cluster metadata state at a given epoch. > This adds the availability data, as well as the failure detector / gossip > listener and state change deduplication to CMS. > *Move C* accord keys/topology from keyspace prefixes to tableid prefixes* > To support per-table fast path settings, topologies and keys need to include > the table id. Since accord topologies could begin to consume a lot of memory > in clusters with a lot of nodes and tables, topology generation has been > updated to reuse previously allocated shards / shard parts where possible, > which will only increase heap sizes when things actually change. > *Make fast path settings configurable via schema* > There are 2.5 strategies: Simple, Parameterized, and InheritKeyspaceSettings. > Simple will use as many available nodes as possible for the fast path > electorate, this is the default for the keyspace fast path strategy. > Parameterized allows you to set a target size, and preferred datacenters for > the FP electorate. 
InheritKeyspace tells topology generation to just use the > keyspace fast path settings, and is the default for the table fast path > strategy.
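The strategy descriptions above can be illustrated with a small sketch of how a Parameterized fast path strategy might choose an electorate. This is not Accord's actual implementation — the function name, node representation, and parameters are all hypothetical — it only models the stated behaviour: prefer the configured datacenters, up to a target size.

```python
def pick_electorate(available_nodes, target_size, preferred_dcs):
    """Hypothetical model of a Parameterized fast-path strategy:
    take nodes from the preferred datacenters first, then fill any
    remaining slots from other available nodes, up to target_size."""
    preferred = [n for n in available_nodes if n["dc"] in preferred_dcs]
    others = [n for n in available_nodes if n["dc"] not in preferred_dcs]
    return (preferred + others)[:target_size]

nodes = [
    {"name": "n1", "dc": "dc1"},
    {"name": "n2", "dc": "dc2"},
    {"name": "n3", "dc": "dc1"},
    {"name": "n4", "dc": "dc3"},
]
electorate = pick_electorate(nodes, target_size=3, preferred_dcs={"dc1"})
print([n["name"] for n in electorate])  # → ['n1', 'n3', 'n2']
```

A Simple strategy would correspond to `target_size = len(available_nodes)` with no preferred datacenters; InheritKeyspaceSettings would simply call this with the keyspace-level arguments.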
[jira] [Updated] (CASSANDRA-19171) Test Failure: org.apache.cassandra.locator.PropertyFileSnitchTest.configContainsRemoteConfig-cdc_jdk17_x86_64
[ https://issues.apache.org/jira/browse/CASSANDRA-19171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19171: Test and Documentation Plan: CI details in comment Status: Patch Available (was: In Progress) Trivial patch [here|https://github.com/beobal/cassandra/commit/6cf2271f7b806b4aecfeada1eb0575f2c646f1fd] Circle CI runs for branch with CASSANDRA-19169, CASSANDRA-19102 & CASSANDRA-19171: [J11|https://app.circleci.com/pipelines/github/michaelsembwever/cassandra/274/workflows/6843aabd-4749-4cbf-94a5-ec3a546704e6], [J17|https://app.circleci.com/pipelines/github/michaelsembwever/cassandra/274/workflows/440e2e39-038e-45ba-ab91-80335a497f21] Aside from some strangeness with cqlshlib tests which appears completely unrelated, only failures are the ones tracked in CASSANDRA-19072, CASSANDRA-19058 & CASSANDRA-18360 > Test Failure: > org.apache.cassandra.locator.PropertyFileSnitchTest.configContainsRemoteConfig-cdc_jdk17_x86_64 > - > > Key: CASSANDRA-19171 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19171 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Ekaterina Dimitrova >Assignee: Sam Tunnicliffe >Priority: Normal > Fix For: 5.1-beta > > > h3. 
> {code:java} > Error Message > Multiple entries with same key: 127.0.0.1:7012=OTHER_DC1:OTHER_RAC1 and > 127.0.0.1:7012=DC1:RAC1 > Stacktrace > java.lang.IllegalArgumentException: Multiple entries with same key: > 127.0.0.1:7012=OTHER_DC1:OTHER_RAC1 and 127.0.0.1:7012=DC1:RAC1 at > com.google.common.collect.ImmutableMap.conflictException(ImmutableMap.java:378) > at > com.google.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:372) > at > com.google.common.collect.RegularImmutableMap.checkNoConflictInKeyBucket(RegularImmutableMap.java:246) > at > com.google.common.collect.RegularImmutableMap.fromEntryArrayCheckingBucketOverflow(RegularImmutableMap.java:133) > at > com.google.common.collect.RegularImmutableMap.fromEntryArray(RegularImmutableMap.java:95) > at > com.google.common.collect.RegularImmutableMap.fromEntries(RegularImmutableMap.java:78) > at com.google.common.collect.ImmutableMap.of(ImmutableMap.java:139) at > org.apache.cassandra.locator.PropertyFileSnitchTest.configContainsRemoteConfig(PropertyFileSnitchTest.java:121) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > {code}
[jira] [Updated] (CASSANDRA-19102) Test Failure: org.apache.cassandra.distributed.test.ReadRepairTest#readRepairRTRangeMovementTest
[ https://issues.apache.org/jira/browse/CASSANDRA-19102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19102: Test and Documentation Plan: CI details in comment Status: Patch Available (was: In Progress) Trivial patch [here|https://github.com/beobal/cassandra/commit/940ebc97a63a4ec4e207e348c4311aec07c2cd44] Circle CI runs for branch with CASSANDRA-19169, CASSANDRA-19102 & CASSANDRA-19171: [J11|https://app.circleci.com/pipelines/github/michaelsembwever/cassandra/274/workflows/6843aabd-4749-4cbf-94a5-ec3a546704e6], [J17|https://app.circleci.com/pipelines/github/michaelsembwever/cassandra/274/workflows/440e2e39-038e-45ba-ab91-80335a497f21] Aside from some strangeness with cqlshlib tests which appears completely unrelated, only failures are the ones tracked in CASSANDRA-19072, CASSANDRA-19058 & CASSANDRA-18360 > Test Failure: > org.apache.cassandra.distributed.test.ReadRepairTest#readRepairRTRangeMovementTest > > > Key: CASSANDRA-19102 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19102 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/java >Reporter: Jacek Lewandowski >Assignee: Sam Tunnicliffe >Priority: Normal > Fix For: 5.1-beta > > > {noformat} > java.lang.AssertionError: Expected a different error message, but got > Operation failed - received 2 responses and 1 failures: INVALID_ROUTING from > /127.0.0.2:7012 > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at > org.apache.cassandra.distributed.test.ReadRepairTest.readRepairRTRangeMovementTest(ReadRepairTest.java:424) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at org.junit.runner.JUnitCore.run(JUnitCore.java:137) > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69) > at > com.intellij.rt.junit.IdeaTestRunner$Repeater$1.execute(IdeaTestRunner.java:38) > at > com.intellij.rt.execution.junit.TestsRepeater.repeat(TestsRepeater.java:11) > at > com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:35) > at > com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:232) > at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:55) > {noformat} > Manual testing in IntelliJ / trunk. 
Detected during investigation of test > failures of CASSANDRA-18464
[jira] [Updated] (CASSANDRA-19177) SAI query timeouts can cause resource leaks
[ https://issues.apache.org/jira/browse/CASSANDRA-19177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Adamson updated CASSANDRA-19177: - Bug Category: Parent values: Degradation(12984) Complexity: Normal Discovered By: Code Inspection Severity: Normal Assignee: Mike Adamson Status: Open (was: Triage Needed) > SAI query timeouts can cause resource leaks > --- > > Key: CASSANDRA-19177 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19177 > Project: Cassandra > Issue Type: Bug > Components: Feature/SAI >Reporter: Mike Adamson >Assignee: Mike Adamson >Priority: Normal > > There are several places in the SAI query path where a query timeout can > result in a resource not being closed correctly. We need to make sure that > wherever QueryContext.checkpoint is called we catch the resulting exception > and close any open resources.
[jira] [Created] (CASSANDRA-19177) SAI query timeouts can cause resource leaks
Mike Adamson created CASSANDRA-19177: Summary: SAI query timeouts can cause resource leaks Key: CASSANDRA-19177 URL: https://issues.apache.org/jira/browse/CASSANDRA-19177 Project: Cassandra Issue Type: Bug Components: Feature/SAI Reporter: Mike Adamson There are several places in the SAI query path where a query timeout can result in a resource not being closed correctly. We need to make sure that wherever QueryContext.checkpoint is called we catch the resulting exception and close any open resources.
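The fix pattern the ticket describes — catch the exception a checkpoint-style timeout check throws and close whatever was opened before propagating it — can be sketched generically. The names below (`QueryTimeout`, `checkpoint`, `PostingsIterator`, `search`) are stand-ins, not Cassandra's actual SAI classes:

```python
class QueryTimeout(Exception):
    """Stand-in for the exception raised when a query exceeds its time budget."""

class PostingsIterator:
    """Stand-in for an index resource that must be released."""
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True

opened = []  # track every resource so leaks are observable

def checkpoint(deadline_passed):
    # Models QueryContext.checkpoint(): abort once the budget is spent.
    if deadline_passed:
        raise QueryTimeout()

def search(deadline_passed):
    it = PostingsIterator()
    opened.append(it)
    try:
        checkpoint(deadline_passed)  # may raise mid-query
        return it
    except QueryTimeout:
        it.close()  # release instead of leaking, then propagate
        raise
```

Without the `except` block, a timeout raised inside `search` would unwind past the iterator and leak it — which is the failure mode the ticket is closing off.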
[jira] [Updated] (CASSANDRA-18824) Backport CASSANDRA-16418: Cleanup behaviour during node decommission caused missing replica
[ https://issues.apache.org/jira/browse/CASSANDRA-18824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-18824: - Status: Open (was: Patch Available) > Backport CASSANDRA-16418: Cleanup behaviour during node decommission caused > missing replica > --- > > Key: CASSANDRA-18824 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18824 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Bootstrap and Decommission >Reporter: Szymon Miezal >Assignee: Szymon Miezal >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.0.x, 4.1.x, 5.0.x, 5.x > > > Node decommission triggers data transfer to other nodes. While this transfer > is in progress, > receiving nodes temporarily hold token ranges in a pending state. However, > the cleanup process currently doesn't consider these pending ranges when > calculating token ownership. > As a consequence, data that is already stored in sstables gets inadvertently > cleaned up. > STR: > * Create two node cluster > * Create keyspace with RF=1 > * Insert sample data (assert data is available when querying both nodes) > * Start decommission process of node 1 > * Start running cleanup in a loop on node 2 until decommission on node 1 > finishes > * Verify all rows are in the cluster - it will fail as the previous step > removed some of the rows > It seems that the cleanup process does not take into account the pending > ranges, it uses only the local ranges - > [https://github.com/apache/cassandra/blob/caad2f24f95b494d05c6b5d86a8d25fbee58d7c2/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L466]. > There are two solutions to the problem. > One would be to change the cleanup process in a way that it starts taking > pending ranges into account. Even though it might sound tempting at first, it > will require involved changes and a lot of testing effort. 
> Alternatively we could interrupt/prevent the cleanup process from running > when any pending range on a node is detected. That sounds like a reasonable > alternative to the problem and something that is relatively easy to implement. > The bug has been already fixed in 4.x with CASSANDRA-16418, the goal of this > ticket is to backport it to 3.x.
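The second, simpler option — refuse to run cleanup while the node holds any pending range — is easy to model. A hedged sketch (the function and exception names are hypothetical, not the actual CompactionManager API):

```python
class CleanupRejected(Exception):
    """Raised instead of running a cleanup that could delete streamed data."""

def perform_cleanup(local_ranges, pending_ranges):
    """Guarded cleanup: data streamed for pending ranges is not yet in
    the local ownership set, so running cleanup now would wrongly treat
    it as redundant and delete it. Abort instead."""
    if pending_ranges:
        raise CleanupRejected("node has pending ranges; retry once they resolve")
    # Safe to proceed: only data outside local_ranges is discarded.
    return {"kept_ranges": sorted(local_ranges)}
```

This mirrors the precondition style discussed in CASSANDRA-16418's comments: the check lives at the API entry point, so any caller gets the same safety guarantee.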
[jira] [Updated] (CASSANDRA-18824) Backport CASSANDRA-16418: Cleanup behaviour during node decommission caused missing replica
[ https://issues.apache.org/jira/browse/CASSANDRA-18824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-18824: - Status: Patch Available (was: Needs Committer) > Backport CASSANDRA-16418: Cleanup behaviour during node decommission caused > missing replica > --- > > Key: CASSANDRA-18824 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18824 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Bootstrap and Decommission >Reporter: Szymon Miezal >Assignee: Szymon Miezal >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.0.x, 4.1.x, 5.0.x, 5.x > > > Node decommission triggers data transfer to other nodes. While this transfer > is in progress, > receiving nodes temporarily hold token ranges in a pending state. However, > the cleanup process currently doesn't consider these pending ranges when > calculating token ownership. > As a consequence, data that is already stored in sstables gets inadvertently > cleaned up. > STR: > * Create two node cluster > * Create keyspace with RF=1 > * Insert sample data (assert data is available when querying both nodes) > * Start decommission process of node 1 > * Start running cleanup in a loop on node 2 until decommission on node 1 > finishes > * Verify all rows are in the cluster - it will fail as the previous step > removed some of the rows > It seems that the cleanup process does not take into account the pending > ranges, it uses only the local ranges - > [https://github.com/apache/cassandra/blob/caad2f24f95b494d05c6b5d86a8d25fbee58d7c2/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L466]. > There are two solutions to the problem. > One would be to change the cleanup process in a way that it starts taking > pending ranges into account. Even though it might sound tempting at first, it > will require involved changes and a lot of testing effort. 
> Alternatively we could interrupt/prevent the cleanup process from running > when any pending range on a node is detected. That sounds like a reasonable > alternative to the problem and something that is relatively easy to implement. > The bug has been already fixed in 4.x with CASSANDRA-16418, the goal of this > ticket is to backport it to 3.x.
[jira] [Commented] (CASSANDRA-16418) Unsafe to run nodetool cleanup during bootstrap or decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793789#comment-17793789 ] Brandon Williams commented on CASSANDRA-16418: -- bq. Feel free to create a new ticket to add it back or piggyback in some other ticket, I'd be glad to review. That would be CASSANDRA-18824 > Unsafe to run nodetool cleanup during bootstrap or decommission > --- > > Key: CASSANDRA-16418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16418 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Bootstrap and Decommission >Reporter: James Baker >Assignee: Lindsey Zurovchak >Priority: Normal > Fix For: 4.0.8, 4.1.1, 5.0-alpha1, 5.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > What we expected: Running a cleanup is a safe operation; the result of > running a query after a cleanup should be the same as the result of running a > query before a cleanup. > What actually happened: We ran a cleanup during a decommission. All the > streamed data was silently deleted, the bootstrap did not fail, the cluster's > data after the decommission was very different to the state before. > Why: Cleanups do not take into account pending ranges and so the cleanup > thought that all the data that had just been streamed was redundant and so > deleted it. We think that this is symmetric with bootstraps, though have not > verified. > Not sure if this is technically a bug but it was very surprising (and > seemingly undocumented) behaviour.
[jira] [Updated] (CASSANDRA-19118) Add support of vector type to COPY command
[ https://issues.apache.org/jira/browse/CASSANDRA-19118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andres de la Peña updated CASSANDRA-19118: -- Reviewers: Andres de la Peña, Maxwell Guo, Stefan Miklosovic (was: Andres de la Peña, Maxwell Guo) Status: Review In Progress (was: Patch Available) > Add support of vector type to COPY command > -- > > Key: CASSANDRA-19118 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19118 > Project: Cassandra > Issue Type: Bug > Components: Tool/cqlsh >Reporter: Szymon Miezal >Assignee: Szymon Miezal >Priority: Normal > Fix For: 5.0-rc, 5.x > > Time Spent: 7h 10m > Remaining Estimate: 0h > > Currently it's not possible to import rows with vector literals via {{COPY}} > command. > STR: > * Create a table > {code:sql} > CREATE TABLE testcopyfrom (id text PRIMARY KEY, embedding_vector > VECTOR > {code} > * Prepare csv file with sample data, for instance: > {code:sql} > 1,"[0.1, 0.2, 0.3, 0.4, 0.5, 0.6]" > 2,"[-0.1, -0.2, -0.3, -0.4, -0.5, -0.6]" {code} > * in cqlsh run > {code:sql} > COPY ks.testcopyfrom FROM data.csv > {code} > It will result in getting: > {code:sql} > TypeError: Received an argument of invalid type for column > "embedding_vector". Expected: , > Got: ; (required argument is not a float){code}
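The failing conversion is easy to reproduce outside cqlsh: the CSV cell arrives as the string "[0.1, 0.2, ...]" and must be turned into a list of floats before it can be bound to a vector column. A minimal sketch of such a parser — illustrative only, not the actual cqlsh copyutil code:

```python
def parse_vector_literal(cell):
    """Turn a CSV cell like "[0.1, 0.2, 0.3]" into a list of floats,
    the shape a vector column expects instead of a raw string."""
    text = cell.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("not a vector literal: %r" % cell)
    return [float(part) for part in text[1:-1].split(",")]

print(parse_vector_literal("[0.1, 0.2, 0.3]"))  # → [0.1, 0.2, 0.3]
```

Passing the raw string through unparsed is exactly what produces the "required argument is not a float" TypeError shown above.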
[jira] [Updated] (CASSANDRA-19084) Test Failure: IndexStreamingFailureTest.testAvailabilityAfterFailed*EntireFileStreaming
[ https://issues.apache.org/jira/browse/CASSANDRA-19084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova updated CASSANDRA-19084: Fix Version/s: 5.x > Test Failure: > IndexStreamingFailureTest.testAvailabilityAfterFailed*EntireFileStreaming > --- > > Key: CASSANDRA-19084 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19084 > Project: Cassandra > Issue Type: Bug > Components: Feature/SAI >Reporter: Michael Semb Wever >Priority: Normal > Fix For: 5.0-rc, 5.x > > > Flakies > https://app.circleci.com/pipelines/github/adelapena/cassandra/3329/workflows/f2124edd-fa0e-4bc5-ab03-ddfb886bf015/jobs/93097/tests > {noformat} > java.lang.NullPointerException > at > java.base/sun.net.www.protocol.jar.JarURLConnection.connect(JarURLConnection.java:133) > at > java.base/sun.net.www.protocol.jar.JarURLConnection.getInputStream(JarURLConnection.java:155) > at java.base/java.net.URL.openStream(URL.java:1165) > at > java.base/java.lang.ClassLoader.getResourceAsStream(ClassLoader.java:1739) > at > net.bytebuddy.dynamic.ClassFileLocator$ForClassLoader.locate(ClassFileLocator.java:453) > at > net.bytebuddy.dynamic.ClassFileLocator$ForClassLoader.locate(ClassFileLocator.java:434) > at > net.bytebuddy.dynamic.scaffold.TypeWriter$Default$ForInlining.create(TypeWriter.java:4009) > at > net.bytebuddy.dynamic.scaffold.TypeWriter$Default.make(TypeWriter.java:2224) > at > net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase$UsingTypeWriter.make(DynamicType.java:4050) > at > net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase.make(DynamicType.java:3734) > at > net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase$Delegator.make(DynamicType.java:3986) > at > org.apache.cassandra.distributed.test.sai.IndexStreamingFailureTest$ByteBuddyHelper.installErrors(IndexStreamingFailureTest.java:154) > at > org.apache.cassandra.distributed.shared.AbstractBuilder$1.initialise(AbstractBuilder.java:360) > at > 
org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.newInstance(AbstractCluster.java:312) > at > org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.delegateForStartup(AbstractCluster.java:292) > at > org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.startup(AbstractCluster.java:410) > at > org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.startup(AbstractCluster.java:383) > at > org.apache.cassandra.distributed.test.sai.IndexStreamingFailureTest.testAvailabilityAfterStreaming(IndexStreamingFailureTest.java:123) > at > org.apache.cassandra.distributed.test.sai.IndexStreamingFailureTest.testAvailabilityAfterFailedNonEntireFileStreaming(IndexStreamingFailureTest.java:79) > {noformat} > {noformat} > java.lang.IllegalStateException: Can't use shutdown instances, delegate is > null > at > org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.delegate(AbstractCluster.java:285) > at > org.apache.cassandra.distributed.impl.DelegatingInvokableInstance.transfer(DelegatingInvokableInstance.java:49) > at > org.apache.cassandra.distributed.api.IInvokableInstance.runsOnInstance(IInvokableInstance.java:45) > at > org.apache.cassandra.distributed.api.IInvokableInstance.runOnInstance(IInvokableInstance.java:46) > at > org.apache.cassandra.distributed.test.sai.IndexStreamingFailureTest.testAvailabilityAfterFailedEntireFileStreaming(IndexStreamingFailureTest.java:85) > {noformat} > https://ci-cassandra.apache.org/job/Cassandra-5.0/106/testReport/org.apache.cassandra.distributed.test.sai/IndexStreamingFailureTest/testAvailabilityAfterFailedNonEntireFileStreaming__jdk11_x86_64_novnode/ > {noformat} > java.lang.RuntimeException: The class file could not be written > at > net.bytebuddy.dynamic.scaffold.TypeWriter$Default$ForInlining.create(TypeWriter.java:4021) > at > net.bytebuddy.dynamic.scaffold.TypeWriter$Default.make(TypeWriter.java:2224) > at > 
net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase$UsingTypeWriter.make(DynamicType.java:4050) > at > net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase.make(DynamicType.java:3734) > at > net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase$Delegator.make(DynamicType.java:3986) > at > org.apache.cassandra.distributed.test.sai.IndexStreamingFailureTest$ByteBuddyHelper.installErrors(IndexStreamingFailureTest.java:155) > at > org.apache.cassandra.distributed.shared.AbstractBuilder$1.initialise(AbstractBuilder.java:360) > at > org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.newInstance(AbstractCluster.java:312) > at >
[jira] [Commented] (CASSANDRA-16418) Unsafe to run nodetool cleanup during bootstrap or decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793739#comment-17793739 ] Paulo Motta commented on CASSANDRA-16418: - bq. However, from the API pov CompactionManager.performCleanup can be now called anytime - I think it was important precondition for that method - wouldn't be good to keep it there, just changing the condition to check pending ranges rather than joining status? Good point, this was overlooked during review - I suggested removing that just to cleanup but looking back I think there is value in keeping it for safety if this API is used elsewhere. Feel free to create a new ticket to add it back or piggyback in some other ticket, I'd be glad to review. To me it'd be nice that CompactionManager API is a dumb local API unaware of token ranges/membership status since it's just a local operation, but practically these concerns are mixed across the codebase so developers expect that any local API is safe from a distributed standpoint. > Unsafe to run nodetool cleanup during bootstrap or decommission > --- > > Key: CASSANDRA-16418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16418 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Bootstrap and Decommission >Reporter: James Baker >Assignee: Lindsey Zurovchak >Priority: Normal > Fix For: 4.0.8, 4.1.1, 5.0-alpha1, 5.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > What we expected: Running a cleanup is a safe operation; the result of > running a query after a cleanup should be the same as the result of running a > query before a cleanup. > What actually happened: We ran a cleanup during a decommission. All the > streamed data was silently deleted, the bootstrap did not fail, the cluster's > data after the decommission was very different to the state before. 
> Why: Cleanups do not take into account pending ranges and so the cleanup > thought that all the data that had just been streamed was redundant and so > deleted it. We think that this is symmetric with bootstraps, though have not > verified. > Not sure if this is technically a bug but it was very surprising (and > seemingly undocumented) behaviour.
Re: [PR] CASSANDRA-18969: source files missing from sources jars due to maven … [cassandra-java-driver]
michaelsembwever commented on PR #1900: URL: https://github.com/apache/cassandra-java-driver/pull/1900#issuecomment-1842960983 Here's the commit (PR on top) to add the required checksums to the distribution* artefacts https://github.com/hhughes/java-driver/pull/1 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (CASSANDRA-19142) logback-core-1.2.12.jar vulnerability: CVE-2023-6378
[ https://issues.apache.org/jira/browse/CASSANDRA-19142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-19142: - Reviewers: Stefan Miklosovic > logback-core-1.2.12.jar vulnerability: CVE-2023-6378 > > > Key: CASSANDRA-19142 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19142 > Project: Cassandra > Issue Type: Bug > Components: Dependencies >Reporter: Brandon Williams >Assignee: Brandon Williams >Priority: Normal > Fix For: 3.0.30, 3.11.17, 4.0.12, 4.1.4, 5.0-beta2 > > > https://nvd.nist.gov/vuln/detail/CVE-2023-6378 > {quote} > A serialization vulnerability in logback receiver component part of logback > version 1.4.11 allows an attacker to mount a Denial-Of-Service attack by > sending poisoned data. > {quote}
[jira] [Updated] (CASSANDRA-19142) logback-core-1.2.12.jar vulnerability: CVE-2023-6378
[ https://issues.apache.org/jira/browse/CASSANDRA-19142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-19142: - Fix Version/s: 3.0.30 3.11.17 4.0.12 4.1.4 5.0-beta2 (was: 3.0.x) (was: 3.11.x) (was: 5.x) (was: 4.0.x) (was: 4.1.x) (was: 5.0-rc) Since Version: NA Source Control Link: https://github.com/apache/cassandra/commit/a1421ec324e4bf8ab46df2a72af298f9286e0d59 Resolution: Fixed Status: Resolved (was: Ready to Commit) Thanks for the review! Committed. > logback-core-1.2.12.jar vulnerability: CVE-2023-6378 > > > Key: CASSANDRA-19142 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19142 > Project: Cassandra > Issue Type: Bug > Components: Dependencies >Reporter: Brandon Williams >Assignee: Brandon Williams >Priority: Normal > Fix For: 3.0.30, 3.11.17, 4.0.12, 4.1.4, 5.0-beta2 > > > https://nvd.nist.gov/vuln/detail/CVE-2023-6378 > {quote} > A serialization vulnerability in logback receiver component part of logback > version 1.4.11 allows an attacker to mount a Denial-Of-Service attack by > sending poisoned data. > {quote}
(cassandra) branch cassandra-4.0 updated (e1b0b44f9e -> 8e5fc74c9a)
This is an automated email from the ASF dual-hosted git repository. brandonwilliams pushed a change to branch cassandra-4.0 in repository https://gitbox.apache.org/repos/asf/cassandra.git from e1b0b44f9e Fix repeated tests on CircleCI and long-testsome/burn-testsome targets new a1421ec324 Suppress CVE-2023-6378 new 2e3d7e76f5 Merge branch 'cassandra-3.0' into cassandra-3.11 new 8e5fc74c9a Merge branch 'cassandra-3.11' into cassandra-4.0 The 3 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: .build/dependency-check-suppressions.xml | 9 + CHANGES.txt | 1 + 2 files changed, 10 insertions(+)
(cassandra) 01/01: Merge branch 'cassandra-3.11' into cassandra-4.0
This is an automated email from the ASF dual-hosted git repository. brandonwilliams pushed a commit to branch cassandra-4.0 in repository https://gitbox.apache.org/repos/asf/cassandra.git commit 8e5fc74c9a3d734bfded9bde3fff399d4b67d65a Merge: e1b0b44f9e 2e3d7e76f5 Author: Brandon Williams AuthorDate: Wed Dec 6 06:32:32 2023 -0600 Merge branch 'cassandra-3.11' into cassandra-4.0 .build/dependency-check-suppressions.xml | 9 + CHANGES.txt | 1 + 2 files changed, 10 insertions(+) diff --cc .build/dependency-check-suppressions.xml index d806926aaf,774e2e7886..0c32a06b17 --- a/.build/dependency-check-suppressions.xml +++ b/.build/dependency-check-suppressions.xml @@@ -62,6 -96,17 +62,15 @@@ CVE-2022-42003 CVE-2022-42004 CVE-2023-35116 - CVE-2022-42003 - CVE-2022-42004 + + + ^pkg:maven/ch\.qos\.logback/logback\-core@.*$ + CVE-2023-6378 + + + ^pkg:maven/ch\.qos\.logback/logback\-classic@.*$ + CVE-2023-6378 + diff --cc CHANGES.txt index f79af3a59b,96e34db044..771cf1f3c0 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -20,8 -2,8 +20,9 @@@ Merged from 3.11 * Fix delayed SSTable release with unsafe_aggressive_sstable_expiration (CASSANDRA-18756) * Revert CASSANDRA-18543 (CASSANDRA-18854) * Fix NPE when using udfContext in UDF after a restart of a node (CASSANDRA-18739) + * Moved jflex from runtime to build dependencies (CASSANDRA-18664) Merged from 3.0: + * Suppress CVE-2023-6378 (CASSANDRA-19142) * Do not set RPC_READY to false on transports shutdown in order to not fail counter updates for deployments with coordinator and storage nodes with transports turned off (CASSANDRA-18935) * Suppress CVE-2023-44487 (CASSANDRA-18943) * Fix nodetool enable/disablebinary to correctly set rpc readiness in gossip (CASSANDRA-18935)
(cassandra) 01/01: Merge branch 'cassandra-4.1' into cassandra-5.0
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a commit to branch cassandra-5.0
in repository https://gitbox.apache.org/repos/asf/cassandra.git

commit fdfc5e614d6d7e3e84f0200870d6ac34917f601d
Merge: fe7997884d 13e5956285
Author: Brandon Williams
AuthorDate: Wed Dec 6 06:33:36 2023 -0600

    Merge branch 'cassandra-4.1' into cassandra-5.0

 .build/dependency-check-suppressions.xml | 10 ++
 CHANGES.txt                              |  5 +
 2 files changed, 15 insertions(+)

diff --cc CHANGES.txt
index 9b4900eb03,79e8ee7a84..2602ec4b23
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -64,10 -19,17 +64,15 @@@
 Merged from 4.0
  * Improve performance of compactions when table does not have an index (CASSANDRA-18773)
  * JMH improvements - faster build and async profiler (CASSANDRA-18871)
  * Enable 3rd party JDK installations for Debian package (CASSANDRA-18844)
- * Fix NTS log message when an unrecognized strategy option is passed (CASSANDRA-18679)
- * Fix BulkLoader ignoring cipher suites options (CASSANDRA-18582)
- * Migrate Python optparse to argparse (CASSANDRA-17914)
 Merged from 3.11:
- * Fix delayed SSTable release with unsafe_aggressive_sstable_expiration (CASSANDRA-18756)
- * Revert CASSANDRA-18543 (CASSANDRA-18854)
- * Fix NPE when using udfContext in UDF after a restart of a node (CASSANDRA-18739)
 Merged from 3.0:
++<<<<<<< HEAD
++=======
+ * Suppress CVE-2023-6378 (CASSANDRA-19142)
+ * Do not set RPC_READY to false on transports shutdown in order to not fail counter updates for deployments with coordinator and storage nodes with transports turned off (CASSANDRA-18935)
++>>>>>>> cassandra-4.1
  * Suppress CVE-2023-44487 (CASSANDRA-18943)
+ * Implement the logic in bin/stop-server (CASSANDRA-18838)
  * Fix nodetool enable/disablebinary to correctly set rpc readiness in gossip (CASSANDRA-18935)
  * Implement the logic in bin/stop-server (CASSANDRA-18838)
  * Upgrade snappy-java to 1.1.10.4 (CASSANDRA-18878)
(cassandra) 01/01: Merge branch 'cassandra-4.0' into cassandra-4.1
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a commit to branch cassandra-4.1
in repository https://gitbox.apache.org/repos/asf/cassandra.git

commit 13e595628548e7cdf06dd666a3d839af8fad6655
Merge: 4059faf5b9 8e5fc74c9a
Author: Brandon Williams
AuthorDate: Wed Dec 6 06:32:48 2023 -0600

    Merge branch 'cassandra-4.0' into cassandra-4.1

 .build/dependency-check-suppressions.xml | 9 +
 CHANGES.txt                              | 1 +
 2 files changed, 10 insertions(+)

diff --cc CHANGES.txt
index 10096d23f2,771cf1f3c0..79e8ee7a84
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -26,7 -20,9 +26,8 @@@
 Merged from 3.11
  * Fix delayed SSTable release with unsafe_aggressive_sstable_expiration (CASSANDRA-18756)
  * Revert CASSANDRA-18543 (CASSANDRA-18854)
  * Fix NPE when using udfContext in UDF after a restart of a node (CASSANDRA-18739)
- * Moved jflex from runtime to build dependencies (CASSANDRA-18664)
 Merged from 3.0:
+ * Suppress CVE-2023-6378 (CASSANDRA-19142)
  * Do not set RPC_READY to false on transports shutdown in order to not fail counter updates for deployments with coordinator and storage nodes with transports turned off (CASSANDRA-18935)
  * Suppress CVE-2023-44487 (CASSANDRA-18943)
  * Fix nodetool enable/disablebinary to correctly set rpc readiness in gossip (CASSANDRA-18935)
(cassandra) branch cassandra-3.11 updated (6d7cd61412 -> 2e3d7e76f5)
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a change to branch cassandra-3.11
in repository https://gitbox.apache.org/repos/asf/cassandra.git

    from 6d7cd61412 Merge branch 'cassandra-3.0' into cassandra-3.11
     new a1421ec324 Suppress CVE-2023-6378
     new 2e3d7e76f5 Merge branch 'cassandra-3.0' into cassandra-3.11

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.

Summary of changes:
 .build/dependency-check-suppressions.xml | 9 +
 CHANGES.txt                              | 1 +
 2 files changed, 10 insertions(+)
(cassandra) branch cassandra-4.1 updated (4059faf5b9 -> 13e5956285)
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a change to branch cassandra-4.1
in repository https://gitbox.apache.org/repos/asf/cassandra.git

    from 4059faf5b9 Merge branch 'cassandra-4.0' into cassandra-4.1
     new a1421ec324 Suppress CVE-2023-6378
     new 2e3d7e76f5 Merge branch 'cassandra-3.0' into cassandra-3.11
     new 8e5fc74c9a Merge branch 'cassandra-3.11' into cassandra-4.0
     new 13e5956285 Merge branch 'cassandra-4.0' into cassandra-4.1

The 4 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.

Summary of changes:
 .build/dependency-check-suppressions.xml | 9 +
 CHANGES.txt                              | 1 +
 2 files changed, 10 insertions(+)
(cassandra) 01/01: Merge branch 'cassandra-5.0' into trunk
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git

commit ad86c9d201e7b17eb7cd3cbddb315e151062727f
Merge: c5a2781b22 fdfc5e614d
Author: Brandon Williams
AuthorDate: Wed Dec 6 06:34:26 2023 -0600

    Merge branch 'cassandra-5.0' into trunk

 .build/dependency-check-suppressions.xml | 10 ++
 CHANGES.txt                              |  1 +
 2 files changed, 11 insertions(+)

diff --cc CHANGES.txt
index a8c465c72a,2602ec4b23..6fcb72dcf1
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -65,9 -65,12 +65,10 @@@
 Merged from 4.0
  * JMH improvements - faster build and async profiler (CASSANDRA-18871)
  * Enable 3rd party JDK installations for Debian package (CASSANDRA-18844)
 Merged from 3.11:
+ * Revert CASSANDRA-18543 (CASSANDRA-18854)
 Merged from 3.0:
-<<<<<<< HEAD
-=======
+ * Suppress CVE-2023-6378 (CASSANDRA-19142)
  * Do not set RPC_READY to false on transports shutdown in order to not fail counter updates for deployments with coordinator and storage nodes with transports turned off (CASSANDRA-18935)
->>>>>>> cassandra-4.1
  * Suppress CVE-2023-44487 (CASSANDRA-18943)
  * Implement the logic in bin/stop-server (CASSANDRA-18838)
  * Fix nodetool enable/disablebinary to correctly set rpc readiness in gossip (CASSANDRA-18935)
(cassandra) branch trunk updated (c5a2781b22 -> ad86c9d201)
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a change to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git

    from c5a2781b22 Enable bytebuddy rule after starting nodes to fix DecommissionAvoidWriteTimeoutsTest
     new a1421ec324 Suppress CVE-2023-6378
     new 2e3d7e76f5 Merge branch 'cassandra-3.0' into cassandra-3.11
     new 8e5fc74c9a Merge branch 'cassandra-3.11' into cassandra-4.0
     new 13e5956285 Merge branch 'cassandra-4.0' into cassandra-4.1
     new fdfc5e614d Merge branch 'cassandra-4.1' into cassandra-5.0
     new ad86c9d201 Merge branch 'cassandra-5.0' into trunk

The 6 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.

Summary of changes:
 .build/dependency-check-suppressions.xml | 10 ++
 CHANGES.txt                              |  1 +
 2 files changed, 11 insertions(+)
(cassandra) branch cassandra-5.0 updated (fe7997884d -> fdfc5e614d)
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a change to branch cassandra-5.0
in repository https://gitbox.apache.org/repos/asf/cassandra.git

    from fe7997884d Write ccm clusters under test's TMPDIR
     new a1421ec324 Suppress CVE-2023-6378
     new 2e3d7e76f5 Merge branch 'cassandra-3.0' into cassandra-3.11
     new 8e5fc74c9a Merge branch 'cassandra-3.11' into cassandra-4.0
     new 13e5956285 Merge branch 'cassandra-4.0' into cassandra-4.1
     new fdfc5e614d Merge branch 'cassandra-4.1' into cassandra-5.0

The 5 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.

Summary of changes:
 .build/dependency-check-suppressions.xml | 10 ++
 CHANGES.txt                              |  5 +
 2 files changed, 15 insertions(+)
(cassandra) branch cassandra-3.0 updated: Suppress CVE-2023-6378
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a commit to branch cassandra-3.0
in repository https://gitbox.apache.org/repos/asf/cassandra.git

The following commit(s) were added to refs/heads/cassandra-3.0 by this push:
     new a1421ec324 Suppress CVE-2023-6378
a1421ec324 is described below

commit a1421ec324e4bf8ab46df2a72af298f9286e0d59
Author: Brandon Williams
AuthorDate: Fri Dec 1 08:43:51 2023 -0600

    Suppress CVE-2023-6378

    Patch by brandonwilliams, reviewed by smiklosovic for CASSANDRA-19142
---
 .build/dependency-check-suppressions.xml | 9 +
 CHANGES.txt                              | 1 +
 2 files changed, 10 insertions(+)

diff --git a/.build/dependency-check-suppressions.xml b/.build/dependency-check-suppressions.xml
index 1d9fba6218..04a74bb4b2 100644
--- a/.build/dependency-check-suppressions.xml
+++ b/.build/dependency-check-suppressions.xml
@@ -107,4 +107,13 @@
 CVE-2019-17267
+
+
+^pkg:maven/ch\.qos\.logback/logback\-core@.*$
+CVE-2023-6378
+
+
+^pkg:maven/ch\.qos\.logback/logback\-classic@.*$
+CVE-2023-6378
+

diff --git a/CHANGES.txt b/CHANGES.txt
index 10c771ae2d..b53bc55d26 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 3.0.30
+ * Suppress CVE-2023-6378 (CASSANDRA-19142)
 * Do not set RPC_READY to false on transports shutdown in order to not fail counter updates for deployments with coordinator and storage nodes with transports turned off (CASSANDRA-18935)
 * Suppress CVE-2023-44487 (CASSANDRA-18943)
 * Fix nodetool enable/disablebinary to correctly set rpc readiness in gossip (CASSANDRA-18935)
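The entries added by this commit follow OWASP Dependency-Check's suppression-file format; the archive rendering has stripped the XML tags from the diff above, leaving only the regexes and CVE ids. As a sketch, such an entry typically looks like the following (element names per the Dependency-Check suppression schema; the notes text is illustrative, not taken from the commit):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<suppressions xmlns="https://jeremylong.github.io/DependencyCheck/dependency-suppression.1.3.xsd">
    <suppress>
        <!-- illustrative rationale; the actual notes live in the committed file -->
        <notes><![CDATA[ false positive / not applicable to this deployment ]]></notes>
        <packageUrl regex="true">^pkg:maven/ch\.qos\.logback/logback\-core@.*$</packageUrl>
        <cve>CVE-2023-6378</cve>
    </suppress>
    <suppress>
        <notes><![CDATA[ same suppression for the classic module ]]></notes>
        <packageUrl regex="true">^pkg:maven/ch\.qos\.logback/logback\-classic@.*$</packageUrl>
        <cve>CVE-2023-6378</cve>
    </suppress>
</suppressions>
```

The `packageUrl` regex scopes the suppression to the matching Maven artifact, so the CVE stays visible if it is ever reported against a different dependency.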
(cassandra) 01/01: Merge branch 'cassandra-3.0' into cassandra-3.11
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a commit to branch cassandra-3.11
in repository https://gitbox.apache.org/repos/asf/cassandra.git

commit 2e3d7e76f5763698a3a2e161d6ec4773654d3b29
Merge: 6d7cd61412 a1421ec324
Author: Brandon Williams
AuthorDate: Wed Dec 6 06:32:19 2023 -0600

    Merge branch 'cassandra-3.0' into cassandra-3.11

 .build/dependency-check-suppressions.xml | 9 +
 CHANGES.txt                              | 1 +
 2 files changed, 10 insertions(+)

diff --cc .build/dependency-check-suppressions.xml
index e3e244e62b,04a74bb4b2..774e2e7886
--- a/.build/dependency-check-suppressions.xml
+++ b/.build/dependency-check-suppressions.xml
@@@ -90,14 -89,31 +90,23 @@@
  CVE-2019-0205
-
-
+
-^pkg:maven/org\.codehaus\.jackson/jackson\-mapper\-asl@.*$
-CVE-2017-7525
-CVE-2017-15095
-CVE-2017-17485
-CVE-2018-5968
-CVE-2018-14718
-CVE-2018-1000873
-CVE-2018-7489
-CVE-2019-10172
-CVE-2019-14540
-CVE-2019-14893
-CVE-2019-16335
-CVE-2019-17267
+^pkg:maven/com\.fasterxml\.jackson\.core/jackson\-databind@.*$
+CVE-2022-42003
+CVE-2022-42004
+CVE-2023-35116
+ CVE-2022-42003
+ CVE-2022-42004
+
+
+ ^pkg:maven/ch\.qos\.logback/logback\-core@.*$
+ CVE-2023-6378
+
+
+ ^pkg:maven/ch\.qos\.logback/logback\-classic@.*$
+ CVE-2023-6378
+

diff --cc CHANGES.txt
index a6cce43bd9,b53bc55d26..96e34db044
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,8 -1,5 +1,9 @@@
-3.0.30
+3.11.17
+ * Fix delayed SSTable release with unsafe_aggressive_sstable_expiration (CASSANDRA-18756)
+ * Revert CASSANDRA-18543 (CASSANDRA-18854)
+ * Fix NPE when using udfContext in UDF after a restart of a node (CASSANDRA-18739)
+Merged from 3.0:
+ * Suppress CVE-2023-6378 (CASSANDRA-19142)
  * Do not set RPC_READY to false on transports shutdown in order to not fail counter updates for deployments with coordinator and storage nodes with transports turned off (CASSANDRA-18935)
  * Suppress CVE-2023-44487 (CASSANDRA-18943)
  * Fix nodetool enable/disablebinary to correctly set rpc readiness in gossip (CASSANDRA-18935)
[jira] [Updated] (CASSANDRA-19166) StackOverflowError on ALTER after many previous schema changes
[ https://issues.apache.org/jira/browse/CASSANDRA-19166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacek Lewandowski updated CASSANDRA-19166:
------------------------------------------
    Reviewers: Caleb Rackliffe, Jacek Lewandowski, Jacek Lewandowski  (was: Caleb Rackliffe, Jacek Lewandowski)
       Status: Review In Progress  (was: Patch Available)

> StackOverflowError on ALTER after many previous schema changes
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-19166
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19166
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Schema
>            Reporter: Abe Ratnofsky
>            Assignee: Abe Ratnofsky
>            Priority: Normal
>             Fix For: 4.1.x, 5.0-rc
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Since 4.1, TableMetadataRefCache re-wraps its fields in Collections.unmodifiableMap on every local schema update. This causes TableMetadataRefCache's Map fields to reference chains of nested UnmodifiableMaps. Eventually, this leads to a StackOverflowError on get(), which has to traverse lots of these maps to fetch the actual value.
> https://github.com/apache/cassandra/blob/4059faf5b948c5a285c25fb0f2e4c4288ee7c305/src/java/org/apache/cassandra/schema/TableMetadataRefCache.java#L53
> The issue goes away on restart, since TableMetadataRefCache is reloaded from disk.
> See CASSANDRA-17044, when TableMetadataRefCache was introduced. This issue was discovered on a real test cluster where schema changes were failing, via a heap dump.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
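The failure mode described in the ticket is easy to reproduce in isolation. The sketch below uses hypothetical names (`NestedWrapDemo`, `ReadOnlyView`); `ReadOnlyView` stands in for an unmodifiable-view wrapper that, like `Collections.unmodifiableMap` on older JDKs, does not detect an already-wrapped argument. Re-wrapping the previous view on every update, instead of the backing map, builds a delegation chain whose `get()` eventually overflows the stack:

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class NestedWrapDemo {
    // Simplified stand-in for an unmodifiable-view wrapper: each instance adds
    // one delegation level, so get() costs one stack frame per wrapper.
    static final class ReadOnlyView<K, V> extends AbstractMap<K, V> {
        private final Map<K, V> delegate;
        ReadOnlyView(Map<K, V> delegate) { this.delegate = delegate; }
        @Override public V get(Object key) { return delegate.get(key); }
        @Override public Set<Entry<K, V>> entrySet() { return delegate.entrySet(); }
    }

    // Applies the buggy pattern: wraps `depth` times around the *previous view*
    // rather than the backing map, then attempts one lookup.
    static String lookupThroughWrappers(int depth) {
        Map<String, Integer> backing = new HashMap<>();
        backing.put("t1", 1);
        Map<String, Integer> view = backing;
        for (int i = 0; i < depth; i++) {
            view = new ReadOnlyView<>(view);
        }
        try {
            return view.get("t1") == 1 ? "ok" : "wrong value";
        } catch (StackOverflowError e) {
            return "overflow";
        }
    }

    public static void main(String[] args) {
        System.out.println(lookupThroughWrappers(1));         // single wrapper: lookup succeeds
        System.out.println(lookupThroughWrappers(1_000_000)); // deep chain: get() recurses once per layer and overflows
    }
}
```

The fix direction implied by the ticket is the usual one for such caches: wrap the mutable backing map once per read, or keep a single immutable snapshot, instead of layering a new view over the previous view on each schema change.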
[jira] [Updated] (CASSANDRA-19166) StackOverflowError on ALTER after many previous schema changes
[ https://issues.apache.org/jira/browse/CASSANDRA-19166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacek Lewandowski updated CASSANDRA-19166:
------------------------------------------
    Status: Ready to Commit  (was: Review In Progress)
[jira] [Commented] (CASSANDRA-19166) StackOverflowError on ALTER after many previous schema changes
[ https://issues.apache.org/jira/browse/CASSANDRA-19166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793671#comment-17793671 ]

Jacek Lewandowski commented on CASSANDRA-19166:
-----------------------------------------------

tests looks ok, I'm going to merge it
[jira] [Comment Edited] (CASSANDRA-19166) StackOverflowError on ALTER after many previous schema changes
[ https://issues.apache.org/jira/browse/CASSANDRA-19166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793582#comment-17793582 ]

Jacek Lewandowski edited comment on CASSANDRA-19166 at 12/6/23 11:29 AM:
-------------------------------------------------------------------------

Test links in the PRs
4.1 - https://github.com/apache/cassandra/pull/2964
5.0 - https://github.com/apache/cassandra/pull/2965

was (Author: jlewandowski):
4.1 j8 https://app.circleci.com/pipelines/github/jacek-lewandowski/cassandra/1149/workflows/4de1eef7-1b0a-4975-a205-c6dbb5b8f37b