[jira] [Commented] (CASSANDRA-18635) Test failure: org.apache.cassandra.distributed.test.UpgradeSSTablesTest
[ https://issues.apache.org/jira/browse/CASSANDRA-18635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817593#comment-17817593 ] Berenguer Blasi commented on CASSANDRA-18635: - ^ Ah, so if I understand correctly, this might be something different: Andres bisected this ticket's failure to CASSANDRA-17851. Thanks. > Test failure: org.apache.cassandra.distributed.test.UpgradeSSTablesTest > --- > > Key: CASSANDRA-18635 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18635 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/java >Reporter: Brandon Williams >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 5.0-rc, 5.x > > > Seen here: > https://app.circleci.com/pipelines/github/driftx/cassandra/1095/workflows/6114e2e3-8dcc-4bb0-b664-ae7d82c3349f/jobs/33405/tests > {noformat} > junit.framework.AssertionFailedError: expected:<0> but was:<2> > at > org.apache.cassandra.distributed.test.UpgradeSSTablesTest.upgradeSSTablesInterruptsOngoingCompaction(UpgradeSSTablesTest.java:86) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-18762) Repair triggers OOM with direct buffer memory
[ https://issues.apache.org/jira/browse/CASSANDRA-18762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817590#comment-17817590 ] Manish Khandelwal commented on CASSANDRA-18762: --- We are also getting the same issue on a multi-DC setup. In a single DC things run fine for 11 nodes, but once another DC is added it starts to fail pretty quickly, with the same error as mentioned in this issue. Running repair table by table is successful most of the time, but keyspace-level repairs always fail for one of the keyspaces. This keyspace has three tables, all STCS, with one table having almost no data. Tried setting *-XX:MaxDirectMemorySize* but the results are the same, i.e., getting out of memory. We are on Java 8 and Cassandra 4.0.10. I think this should be easy to reproduce with multi-DC. > Repair triggers OOM with direct buffer memory > - > > Key: CASSANDRA-18762 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18762 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Brad Schoening >Priority: Normal > Labels: OutOfMemoryError > Attachments: Cluster-dm-metrics-1.PNG, > image-2023-12-06-15-28-05-459.png, image-2023-12-06-15-29-31-491.png, > image-2023-12-06-15-58-55-007.png > > > We are seeing repeated failures of nodes with 16GB of heap on a VM with 32GB > of physical RAM due to direct memory. This seems to be related to > CASSANDRA-15202 which moved Merkle trees off-heap in 4.0. Using Cassandra > 4.0.6 with Java 11. 
> {noformat} > 2023-08-09 04:30:57,470 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e55a3b0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_a from > /169.102.200.241:7000 > 2023-08-09 04:30:57,567 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e0d2900-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from > /169.93.192.29:7000 > 2023-08-09 04:30:57,568 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e1dcad0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_c from > /169.104.171.134:7000 > 2023-08-09 04:30:57,591 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e69a0e0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from > /169.79.232.67:7000 > 2023-08-09 04:30:57,876 [INFO ] [Service Thread] cluster_id=101 > ip_address=169.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 282ms. 
> Compressed Class Space: 8444560 -> 8372152; G1 Eden Space: 7809794048 -> 0; > G1 Old Gen: 1453478400 -> 820942800; G1 Survivor Space: 419430400 -> 0; > Metaspace: 80411136 -> 80176528 > 2023-08-09 04:30:58,387 [ERROR] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 JVMStabilityInspector.java:102 - OutOfMemory error > letting the JVM handle the error: > java.lang.OutOfMemoryError: Direct buffer memory > at java.base/java.nio.Bits.reserveMemory(Bits.java:175) > at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118) > at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:318) > at org.apache.cassandra.utils.MerkleTree.allocate(MerkleTree.java:742) > at > org.apache.cassandra.utils.MerkleTree.deserializeOffHeap(MerkleTree.java:780) > at org.apache.cassandra.utils.MerkleTree.deserializeTree(MerkleTree.java:751) > at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:720) > at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:698) > at > org.apache.cassandra.utils.MerkleTrees$MerkleTreesSerializer.deserialize(MerkleTrees.java:416) > at > org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:100) > at > org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:84) > at > org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:782) > at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:642) > at > org.apache.cassandra.net.InboundMessageHandler$LargeMessage.deserialize(InboundMessageHandler.java:364) > at > org.apache.cassandra.net.InboundMessageHandler$LargeMessage.access$1100(InboundMessageHandler.java:317) > at > org.apache.cassandra.net.InboundMessageHandler$ProcessLargeMessage.provideMessage(InboundMessageHandler.java:504) > at > org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:429) > at > 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at >
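[Editor's sketch, not part of the original thread: the stack trace above shows the direct-buffer OOM coming from off-heap Merkle tree deserialization during repair. Two knobs that bound this memory are sketched below; the values are illustrative assumptions, not tested recommendations.]

```yaml
# cassandra.yaml (4.0+): caps the memory used for Merkle trees per repair
# session; the default is derived from the heap size. A smaller value trades
# repair precision (more overstreaming) for lower memory pressure.
repair_session_space_in_mb: 128
```

The `-XX:MaxDirectMemorySize` JVM flag (in the `conf/jvm*-server.options` files) bounds total direct memory, but as the comment above observes, raising or capping it alone just moves where the allocation fails; reducing the Merkle-tree budget, or repairing table by table as Manish notes, reduces the actual demand.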
[jira] [Commented] (CASSANDRA-19120) local consistencies may get timeout if blocking read repair is sending the read repair mutation to other DC
[ https://issues.apache.org/jira/browse/CASSANDRA-19120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817582#comment-17817582 ] Runtian Liu commented on CASSANDRA-19120: - Updated the four PRs: 4.0: https://github.com/apache/cassandra/pull/2981 4.1: [https://github.com/apache/cassandra/pull/3019] 5.0: [https://github.com/apache/cassandra/pull/3020] trunk: [https://github.com/apache/cassandra/pull/3021] > local consistencies may get timeout if blocking read repair is sending the > read repair mutation to other DC > > > Key: CASSANDRA-19120 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19120 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Runtian Liu >Assignee: Runtian Liu >Priority: Normal > Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x > > Attachments: image-2023-11-29-15-26-08-056.png, signature.asc > > Time Spent: 10m > Remaining Estimate: 0h > > For a cluster with two DCs: when a new node is being added to DC1, a > blocking read repair triggered by local_quorum in DC1 needs to > send the read repair mutation to an extra node(1)(2). The selector for read > repair may select *ANY* node that has not been contacted before(3) instead of > selecting the DC1 nodes. If a node from DC2 is selected, this will cause a 100% > timeout because of the bug described below: > When we initialize the latch(4) for blocking read repair, the shouldBlockOn > function will only return true for local nodes(5), and the blockFor value will be > reduced if a local node doesn't require repair(6). blockFor is the same as > the number of read repair mutations sent out. But when the coordinator node > receives the responses from the target nodes, the latch only counts down for > nodes in the same DC(7). The latch will wait until timeout and the read request > will time out. > This can be reproduced if you have a constant load on a 3 + 3 cluster when > adding a node. If you have some way to trigger blocking read repair (maybe by > adding load using the stress tool) and use local_quorum consistency with a > constant read-after-write load in the same DC where you are adding the node, you > will see read timeouts from time to time because of the bug described > above. > > I think that when selecting the extra node for read repair, we should > prefer local nodes over nodes from the other DC. Also, we need to fix the > latch handling so that even if we send the mutation to nodes in another DC, we don't get > a timeout. > (1)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/locator/ReplicaPlans.java#L455] > (2)[https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/ConsistencyLevel.java#L183] > (3)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/locator/ReplicaPlans.java#L458] > (4)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/service/reads/repair/BlockingPartitionRepair.java#L96] > (5)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/service/reads/repair/BlockingPartitionRepair.java#L71] > (6)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/service/reads/repair/BlockingPartitionRepair.java#L88] > (7)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/service/reads/repair/BlockingPartitionRepair.java#L113] > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
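[Editor's sketch, not part of the original thread: the latch mismatch the ticket describes can be modeled in a few lines. The names below are illustrative, not Cassandra's actual BlockingPartitionRepair code: blockFor counts every repair mutation sent, but only local-DC acks count down, so a mutation sent to a DC2 node leaves the latch stuck and the query times out.]

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Illustrative model of the bug: the latch is sized from all mutations sent
// (blockFor), but acks are only counted for replicas in the coordinator's DC,
// mirroring the same-DC check referenced at (7) in the ticket.
public class BlockingRepairLatchSketch {
    record Replica(String endpoint, String dc) {}

    static final String LOCAL_DC = "dc1";

    // Only local-DC acks count down, as in the shouldBlockOn-style check.
    static boolean shouldCountDown(Replica from) {
        return LOCAL_DC.equals(from.dc());
    }

    // Size the latch from every mutation sent, then apply acks from all
    // targets; returns how many counts remain outstanding.
    static long remainingAfterAcks(List<Replica> targets) {
        CountDownLatch latch = new CountDownLatch(targets.size()); // blockFor
        for (Replica r : targets)
            if (shouldCountDown(r))   // a dc2 ack is silently ignored
                latch.countDown();
        return latch.getCount();
    }

    public static void main(String[] args) throws InterruptedException {
        // Three repair mutations sent; the "ANY" selector picked a dc2 node.
        List<Replica> targets = List.of(
                new Replica("10.0.0.1", "dc1"),
                new Replica("10.0.0.2", "dc1"),
                new Replica("10.1.0.1", "dc2")); // wrongly selected extra node

        CountDownLatch latch = new CountDownLatch(targets.size());
        for (Replica r : targets)
            if (shouldCountDown(r))
                latch.countDown();

        // Latch is stuck at 1, so the local_quorum read repair times out.
        System.out.println("remaining = " + latch.getCount());          // 1
        boolean done = latch.await(100, TimeUnit.MILLISECONDS);
        System.out.println("completed = " + done);                      // false
    }
}
```

Either fix in the ticket resolves the mismatch: prefer local nodes when picking the extra target, or count down on remote-DC acks too, so the latch size and the ack predicate agree.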
[jira] [Updated] (CASSANDRA-19400) IndexStatusManager needs to prioritize SUCCESS over UNKNOWN states to maximize availability
[ https://issues.apache.org/jira/browse/CASSANDRA-19400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Rackliffe updated CASSANDRA-19400: Bug Category: Parent values: Availability(12983)Level 1 values: Unavailable(12994) Discovered By: Fuzz Test Severity: Low > IndexStatusManager needs to prioritize SUCCESS over UNKNOWN states to > maximize availability > --- > > Key: CASSANDRA-19400 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19400 > Project: Cassandra > Issue Type: Bug > Components: Feature/SAI >Reporter: Caleb Rackliffe >Priority: Low > Fix For: 5.0.x, 5.x > > > {{IndexStatusManager}} is responsible for knowing what SAI indexes are > queryable across the ring, endpoint by endpoint. There are two statuses that > SAI treats as queryable, but it should not treat them equally. > {{BUILD_SUCCEEDED}} means the index is definitely available and should be > able to serve queries without issue. {{UNKNOWN}} indicates that the status of > the index hasn’t propagated yet to this coordinator. It may be just fine, or > it may not be. If it isn’t a query will not return incorrect results, but it > will fail. If there are enough {{BUILD_SUCCEEDED}} replicas, we should ignore > {{UNKNOWN}} replicas and maximize availability. If the UNKNOWN replica is > going to become {{BUILD_SUCCEEDED}} shortly, it will happily start taking > requests at that point and spread the load. If not, we’ll avoid futile > attempts to query it too early. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19400) IndexStatusManager needs to prioritize SUCCESS over UNKNOWN states to maximize availability
[ https://issues.apache.org/jira/browse/CASSANDRA-19400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Rackliffe updated CASSANDRA-19400: Workflow: Copy of Cassandra Bug Workflow (was: Copy of Cassandra Default Workflow) Issue Type: Bug (was: Improvement) > IndexStatusManager needs to prioritize SUCCESS over UNKNOWN states to > maximize availability > --- > > Key: CASSANDRA-19400 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19400 > Project: Cassandra > Issue Type: Bug > Components: Feature/SAI >Reporter: Caleb Rackliffe >Priority: Normal > Fix For: 5.0.x, 5.x > > > {{IndexStatusManager}} is responsible for knowing what SAI indexes are > queryable across the ring, endpoint by endpoint. There are two statuses that > SAI treats as queryable, but it should not treat them equally. > {{BUILD_SUCCEEDED}} means the index is definitely available and should be > able to serve queries without issue. {{UNKNOWN}} indicates that the status of > the index hasn’t propagated yet to this coordinator. It may be just fine, or > it may not be. If it isn’t a query will not return incorrect results, but it > will fail. If there are enough {{BUILD_SUCCEEDED}} replicas, we should ignore > {{UNKNOWN}} replicas and maximize availability. If the UNKNOWN replica is > going to become {{BUILD_SUCCEEDED}} shortly, it will happily start taking > requests at that point and spread the load. If not, we’ll avoid futile > attempts to query it too early. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19400) IndexStatusManager needs to prioritize SUCCESS over UNKNOWN states to maximize availability
[ https://issues.apache.org/jira/browse/CASSANDRA-19400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Rackliffe updated CASSANDRA-19400: Change Category: Operability Complexity: Normal Fix Version/s: 5.0.x 5.x Status: Open (was: Triage Needed) > IndexStatusManager needs to prioritize SUCCESS over UNKNOWN states to > maximize availability > --- > > Key: CASSANDRA-19400 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19400 > Project: Cassandra > Issue Type: Improvement > Components: Feature/SAI >Reporter: Caleb Rackliffe >Priority: Normal > Fix For: 5.0.x, 5.x > > > {{IndexStatusManager}} is responsible for knowing what SAI indexes are > queryable across the ring, endpoint by endpoint. There are two statuses that > SAI treats as queryable, but it should not treat them equally. > {{BUILD_SUCCEEDED}} means the index is definitely available and should be > able to serve queries without issue. {{UNKNOWN}} indicates that the status of > the index hasn’t propagated yet to this coordinator. It may be just fine, or > it may not be. If it isn’t a query will not return incorrect results, but it > will fail. If there are enough {{BUILD_SUCCEEDED}} replicas, we should ignore > {{UNKNOWN}} replicas and maximize availability. If the UNKNOWN replica is > going to become {{BUILD_SUCCEEDED}} shortly, it will happily start taking > requests at that point and spread the load. If not, we’ll avoid futile > attempts to query it too early. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-19400) IndexStatusManager needs to prioritize SUCCESS over UNKNOWN states to maximize availability
Caleb Rackliffe created CASSANDRA-19400: --- Summary: IndexStatusManager needs to prioritize SUCCESS over UNKNOWN states to maximize availability Key: CASSANDRA-19400 URL: https://issues.apache.org/jira/browse/CASSANDRA-19400 Project: Cassandra Issue Type: Improvement Components: Feature/SAI Reporter: Caleb Rackliffe {{IndexStatusManager}} is responsible for knowing what SAI indexes are queryable across the ring, endpoint by endpoint. There are two statuses that SAI treats as queryable, but it should not treat them equally. {{BUILD_SUCCEEDED}} means the index is definitely available and should be able to serve queries without issue. {{UNKNOWN}} indicates that the status of the index hasn’t propagated yet to this coordinator. It may be just fine, or it may not be. If it isn’t, a query will not return incorrect results, but it will fail. If there are enough {{BUILD_SUCCEEDED}} replicas, we should ignore {{UNKNOWN}} replicas and maximize availability. If the {{UNKNOWN}} replica is going to become {{BUILD_SUCCEEDED}} shortly, it will happily start taking requests at that point and spread the load. If not, we’ll avoid futile attempts to query it too early. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
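[Editor's sketch, not part of the original thread: the proposed prioritization can be illustrated with a small selection routine. The types below are hypothetical stand-ins, not the real IndexStatusManager API: prefer `BUILD_SUCCEEDED` replicas outright, and fall back to `UNKNOWN` ones only when there aren't enough confirmed-queryable replicas to satisfy the query.]

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative replica selection: SUCCESS first, UNKNOWN only as a fallback.
public class IndexReplicaSelectionSketch {
    enum Status { BUILD_SUCCEEDED, UNKNOWN, BUILD_FAILED }

    static List<String> selectQueryable(Map<String, Status> statusByEndpoint, int required) {
        List<String> succeeded = new ArrayList<>();
        List<String> unknown = new ArrayList<>();
        for (Map.Entry<String, Status> e : statusByEndpoint.entrySet()) {
            if (e.getValue() == Status.BUILD_SUCCEEDED) succeeded.add(e.getKey());
            else if (e.getValue() == Status.UNKNOWN) unknown.add(e.getKey());
        }
        // Enough definitely-available replicas: ignore UNKNOWN entirely.
        if (succeeded.size() >= required)
            return succeeded;
        // Otherwise top up with UNKNOWN replicas, but only as far as needed.
        List<String> result = new ArrayList<>(succeeded);
        for (String ep : unknown) {
            if (result.size() >= required) break;
            result.add(ep);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Status> status = new LinkedHashMap<>();
        status.put("n1", Status.BUILD_SUCCEEDED);
        status.put("n2", Status.UNKNOWN);
        status.put("n3", Status.BUILD_SUCCEEDED);
        System.out.println(selectQueryable(status, 2)); // [n1, n3] -- n2 skipped
        System.out.println(selectQueryable(status, 3)); // [n1, n3, n2]
    }
}
```

This captures the ticket's availability argument: an `UNKNOWN` replica that later reports `BUILD_SUCCEEDED` joins the candidate set naturally, while one that was actually unqueryable never causes a failed query it didn't have to.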
[jira] [Updated] (CASSANDRA-18667) Add multi-threaded SAI read and write fuzz test
[ https://issues.apache.org/jira/browse/CASSANDRA-18667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Rackliffe updated CASSANDRA-18667: Epic Link: CASSANDRA-19224 (was: CASSANDRA-18473) > Add multi-threaded SAI read and write fuzz test > --- > > Key: CASSANDRA-18667 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18667 > Project: Cassandra > Issue Type: Improvement > Components: Feature/SAI >Reporter: Mike Adamson >Priority: Normal > > We currently don't have a basic unit test that does multi-threaded reads and > writes to the index. We should add one to avoid potential basic concurrency > errors. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-18940) SAI post-filtering reads don't update local table latency metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-18940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Rackliffe updated CASSANDRA-18940: Epic Link: CASSANDRA-19224 (was: CASSANDRA-18473) > SAI post-filtering reads don't update local table latency metrics > - > > Key: CASSANDRA-18940 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18940 > Project: Cassandra > Issue Type: Bug > Components: Feature/2i Index, Feature/SAI, Observability/Metrics >Reporter: Caleb Rackliffe >Assignee: Mike Adamson >Priority: Normal > Fix For: 5.0.x, 5.x > > Attachments: > draft_fix_for_SAI_post-filtering_reads_not_updating_local_table_metrics.patch > > > Once an SAI index finds matches (primary keys), it reads the associated rows > and post-filters them to incorporate partial writes, tombstones, etc. > However, those reads are not currently updating the local table latency > metrics. It should be simple enough to attach a metrics recording > transformation to the iterator produced by querying local storage. (I've > attached a patch that should apply cleanly to trunk, but there may be a > better way...) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRASC-107) Improve logging for slice restore task
Yifan Cai created CASSANDRASC-107: - Summary: Improve logging for slice restore task Key: CASSANDRASC-107 URL: https://issues.apache.org/jira/browse/CASSANDRASC-107 Project: Sidecar for Apache Cassandra Issue Type: Improvement Reporter: Yifan Cai I want to propose logging improvements: add more logs to the individual steps during the restore task, i.e. RestoreSliceTask and StorageClient. In other places, such as retrying a poll for the object's existence, the stack trace can be omitted, as it provides no additional information beyond the object not being found. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRASC-106) Add restore task watcher to report long running tasks
Yifan Cai created CASSANDRASC-106: - Summary: Add restore task watcher to report long running tasks Key: CASSANDRASC-106 URL: https://issues.apache.org/jira/browse/CASSANDRASC-106 Project: Sidecar for Apache Cassandra Issue Type: Improvement Reporter: Yifan Cai Having a watcher to report the long running restore slice task can provide better insights. The watcher can live inside the RestoreProcessor and periodically examine the futures of the running tasks. Ideally, it signals the task to log the current stack trace. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRASC-105) RestoreSliceTask could be stuck due to missing exception handling
Yifan Cai created CASSANDRASC-105: - Summary: RestoreSliceTask could be stuck due to missing exception handling Key: CASSANDRASC-105 URL: https://issues.apache.org/jira/browse/CASSANDRASC-105 Project: Sidecar for Apache Cassandra Issue Type: Bug Components: Rest API Reporter: Yifan Cai In RestoreSliceTask, there are a few places that can throw exceptions but lack exception handling at the call sites. As a result, RestoreSliceTask never fulfills its promise, i.e. the task is stuck. For example, downloadObjectIfAbsent could throw instead of returning a future; in that case, the task will never fail or complete. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
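[Editor's sketch, not part of the original thread: the failure mode and fix can be shown with plain `CompletableFuture` standing in for the task's promise. `downloadObjectIfAbsent` here is an illustrative stand-in, not the Sidecar's real method: a helper documented to return a future may instead throw synchronously, and if the worker runner swallows that throw, nothing ever completes the promise.]

```java
import java.util.concurrent.CompletableFuture;

// Illustrative model: a synchronous throw at the call site leaves the
// task's promise incomplete forever unless the call site catches it.
public class RestoreTaskPromiseSketch {
    // Stand-in helper: documented to return a future, but may throw instead.
    static CompletableFuture<String> downloadObjectIfAbsent(boolean throwSynchronously) {
        if (throwSynchronously)
            throw new IllegalStateException("thrown before any future was created");
        return CompletableFuture.completedFuture("downloaded");
    }

    // Models a worker pool that logs and drops uncaught throwables.
    static void runSwallowingErrors(Runnable taskBody) {
        try { taskBody.run(); } catch (Throwable t) { /* logged and dropped */ }
    }

    // Buggy shape: the synchronous throw escapes the task body, the runner
    // swallows it, and the returned promise is never completed -- stuck.
    static CompletableFuture<String> buggyRun(boolean fail) {
        CompletableFuture<String> promise = new CompletableFuture<>();
        runSwallowingErrors(() ->
            downloadObjectIfAbsent(fail).whenComplete((v, t) -> {
                if (t != null) promise.completeExceptionally(t);
                else promise.complete(v);
            }));
        return promise;
    }

    // Fixed shape: the call site catches the synchronous throw and fails the
    // promise, so the task terminates instead of hanging.
    static CompletableFuture<String> fixedRun(boolean fail) {
        CompletableFuture<String> promise = new CompletableFuture<>();
        runSwallowingErrors(() -> {
            try {
                downloadObjectIfAbsent(fail).whenComplete((v, t) -> {
                    if (t != null) promise.completeExceptionally(t);
                    else promise.complete(v);
                });
            } catch (Throwable t) {
                promise.completeExceptionally(t); // fail fast, don't hang
            }
        });
        return promise;
    }
}
```

With the fix, `fixedRun(true)` yields a promise that completes exceptionally, while the buggy shape returns a promise that never completes at all, which is exactly the "stuck task" the ticket describes.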
(cassandra) 01/01: Merge branch 'cassandra-5.0' into trunk
This is an automated email from the ASF dual-hosted git repository. smiklosovic pushed a commit to branch trunk in repository https://gitbox.apache.org/repos/asf/cassandra.git commit 48607b83952ab923c401fd7886d76a5a5a5b3c78 Merge: 8bdf2615bc a04dc83cfc Author: Stefan Miklosovic AuthorDate: Wed Feb 14 21:19:10 2024 +0100 Merge branch 'cassandra-5.0' into trunk - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
(cassandra) branch cassandra-5.0 updated (8b037a6c84 -> a04dc83cfc)
This is an automated email from the ASF dual-hosted git repository. smiklosovic pushed a change to branch cassandra-5.0 in repository https://gitbox.apache.org/repos/asf/cassandra.git from 8b037a6c84 Deprecate native_transport_port_ssl add a9a7dd0caf increment version to 4.1.5 add a04dc83cfc Merge branch 'cassandra-4.1' into cassandra-5.0 No new revisions were added by this update. Summary of changes: - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
(cassandra) branch trunk updated (8bdf2615bc -> 48607b8395)
This is an automated email from the ASF dual-hosted git repository. smiklosovic pushed a change to branch trunk in repository https://gitbox.apache.org/repos/asf/cassandra.git from 8bdf2615bc Merge branch 'cassandra-5.0' into trunk add a9a7dd0caf increment version to 4.1.5 add a04dc83cfc Merge branch 'cassandra-4.1' into cassandra-5.0 new 48607b8395 Merge branch 'cassandra-5.0' into trunk The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
(cassandra) branch cassandra-4.1 updated (89a8155916 -> a9a7dd0caf)
This is an automated email from the ASF dual-hosted git repository. smiklosovic pushed a change to branch cassandra-4.1 in repository https://gitbox.apache.org/repos/asf/cassandra.git from 89a8155916 Merge branch 'cassandra-4.0' into cassandra-4.1 add a9a7dd0caf increment version to 4.1.5 No new revisions were added by this update. Summary of changes:
 CHANGES.txt      | 6 ++
 build.xml        | 2 +-
 debian/changelog | 6 ++
 3 files changed, 13 insertions(+), 1 deletion(-)
- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19018) An SAI-specific mechanism to ensure consistency isn't violated for multi-column (i.e. AND) queries at CL > ONE
[ https://issues.apache.org/jira/browse/CASSANDRA-19018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817515#comment-17817515 ] Caleb Rackliffe commented on CASSANDRA-19018: - I've finally narrowed in on a concrete repro for the range tombstone problems... {noformat} @Test public void testPartialUpdatesWithDeleteBetween() { CLUSTER.schemaChange(withKeyspace("CREATE TABLE %s.partial_updates (k int, c int, a int, b int, PRIMARY KEY (k, c)) WITH read_repair = 'NONE'")); CLUSTER.schemaChange(withKeyspace("CREATE INDEX ON %s.partial_updates(a) USING 'sai'")); CLUSTER.schemaChange(withKeyspace("CREATE INDEX ON %s.partial_updates(b) USING 'sai'")); SAIUtil.waitForIndexQueryable(CLUSTER, KEYSPACE); // insert a split row w/ a range tombstone sandwiched in the middle temporally CLUSTER.get(1).executeInternal(withKeyspace("INSERT INTO %s.partial_updates(k, c, a) VALUES (0, 1, 1) USING TIMESTAMP 1")); CLUSTER.get(2).executeInternal(withKeyspace("DELETE FROM %s.partial_updates USING TIMESTAMP 2 WHERE k = 0 AND c > 0")); CLUSTER.get(2).executeInternal(withKeyspace("INSERT INTO %s.partial_updates(k, c, b) VALUES (0, 1, 2) USING TIMESTAMP 3")); String select = withKeyspace("SELECT * FROM %s.partial_updates WHERE a = 1 AND b = 2"); Object[][] initialRows = CLUSTER.coordinator(1).execute(select, ConsistencyLevel.ALL); assertRows(initialRows); <-- This returns a row when it shouldn't! } {noformat} tl;dr Because we can degrade intersections to unions inside SAI on unrepaired data, RFP no longer implicitly covers all delete cases without sending range tombstones to the coordinator or identifying silent replicas purely at the row level. In the case above, RFP could be made to work if it identified "silent" columns rather than entire rows. (i.e. It would notice that "a" from node 1 has no corresponding value from node 2, so the response from node 2 needs to be protected. 
Assuming data isn't always horrifically out of date, this is likely better than trying to send mostly unnecessary RTs.) > An SAI-specific mechanism to ensure consistency isn't violated for > multi-column (i.e. AND) queries at CL > ONE > -- > > Key: CASSANDRA-19018 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19018 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Coordination, Feature/SAI >Reporter: Caleb Rackliffe >Assignee: Caleb Rackliffe >Priority: Normal > Fix For: 5.0-rc, 5.x > > Attachments: ci_summary-1.html, ci_summary.html, > result_details.tar-1.gz, result_details.tar.gz > > Time Spent: 8h 50m > Remaining Estimate: 0h > > CASSANDRA-19007 is going to be where we add a guardrail around > filtering/index queries that use intersection/AND over partially updated > non-key columns. (ex. Restricting one clustering column and one normal column > does not cause a consistency problem, as primary keys cannot be partially > updated.) This issue exists to attempt to fix this specifically for SAI in > 5.0.x, as Accord will (last I checked) not be available until the 5.1 release. > The SAI-specific version of the originally reported issue is this: > {noformat} > try (Cluster cluster = init(Cluster.build(2).withConfig(config -> > config.with(GOSSIP).with(NETWORK)).start())) > { > cluster.schemaChange(withKeyspace("CREATE TABLE %s.t (k int > PRIMARY KEY, a int, b int)")); > cluster.schemaChange(withKeyspace("CREATE INDEX ON %s.t(a) USING > 'sai'")); > cluster.schemaChange(withKeyspace("CREATE INDEX ON %s.t(b) USING > 'sai'")); > // insert a split row > cluster.get(1).executeInternal(withKeyspace("INSERT INTO %s.t(k, > a) VALUES (0, 1)")); > cluster.get(2).executeInternal(withKeyspace("INSERT INTO %s.t(k, > b) VALUES (0, 2)")); > // Uncomment this line and test succeeds w/ partial writes > completed... 
> //cluster.get(1).nodetoolResult("repair", > KEYSPACE).asserts().success(); > String select = withKeyspace("SELECT * FROM %s.t WHERE a = 1 AND > b = 2"); > Object[][] initialRows = cluster.coordinator(1).execute(select, > ConsistencyLevel.ALL); > assertRows(initialRows, row(0, 1, 2)); // not found!! > } > {noformat} > To make a long story short, the local SAI indexes are hiding local partial > matches from the coordinator that would combine there to form full matches. > Simple non-index filtering queries also suffer from this problem, but they > hide the partial matches in a different way. I'll outline a possible solution > for this in the comments that takes advantage of replica filtering protection > and
[jira] [Comment Edited] (CASSANDRA-19168) Test Failure: VectorUpdateDeleteTest fails with heap_buffers
[ https://issues.apache.org/jira/browse/CASSANDRA-19168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817046#comment-17817046 ] Ekaterina Dimitrova edited comment on CASSANDRA-19168 at 2/14/24 8:07 PM: -- Thanks! The patch was squashed and updated with your suggestion (new branch for 5.0 so we can easily compare with the PR): 5.0 - [https://github.com/ekaterinadimitrova2/cassandra/tree/C-19168-5.0-final] trunk - [https://github.com/ekaterinadimitrova2/cassandra/tree/C-19168-trunk] Running CI at the moment: 5.0 - [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=C-19168-5.0-final] trunk - [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=C-19168-trunk] I also ran the updateTest with all possible options for memtable_allocation_type locally on both branches; it completed successfully. *5.0 CI failures:* truncateWhileUpgrading-_jdk17 - -possibly related to CASSANDRA-18635, checking with Berenguer- Different one - CASSANDRA-19398 *trunk CI failures:* - test_consistent_range_movement_true_with_replica_down_should_fail - seems unrelated, I will check and open a ticket - testOptionalMtlsModeDoNotAllowNonSSLConnections-cassandra.testtag_IS_UNDEFINED - known from CASSANDRA-19239 - test_move_single_node_localhost - known from CASSANDRA-19226 - test_authorization_handle_unavailable - known from CASSANDRA-19217 - org.apache.cassandra.simulator.test.HarrySimulatorTest - known from CASSANDRA-19279 - test_stop_failure_policy - known from CASSANDRA-19100 - optionalTlsConnectionAllowedToRegularPortTest-cassandra.testtag_IS_UNDEFINED and testOptionalMtlsModeDoNotAllowNonSSLConnections-cassandra.testtag_IS_UNDEFINED - known from CASSANDRA-19239 was (Author: e.dimitrova): Thanks! 
The patch was squashed and updated with your suggestion (new branch for 5.0 so we easily compare with the PR): 5.0 - [https://github.com/ekaterinadimitrova2/cassandra/tree/C-19168-5.0-final] trunk - [https://github.com/ekaterinadimitrova2/cassandra/tree/C-19168-trunk] Running CI at the moment: 5.0 - [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=C-19168-5.0-final] trunk - [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=C-19168-trunk] I also ran the updateTest with all possible options for memtable_allocation_type locally on both branches; it completed successfully. *5.0 CI failures:* truncateWhileUpgrading-_jdk17 - possible related to CASSANDRA-18635, checking with Berenguer *trunk CI failures:* - test_consistent_range_movement_true_with_replica_down_should_fail - seems unrelated, I will check and open a ticket - testOptionalMtlsModeDoNotAllowNonSSLConnections-cassandra.testtag_IS_UNDEFINED - known from CASSANDRA-19239 - test_move_single_node_localhost - known from CASSANDRA-19226 - test_authorization_handle_unavailable - known from CASSANDRA-19217 - org.apache.cassandra.simulator.test.HarrySimulatorTest - known from CASSANDRA-19279 - test_stop_failure_policy - known from CASSANDRA-19100 - optionalTlsConnectionAllowedToRegularPortTest-cassandra.testtag_IS_UNDEFINED and testOptionalMtlsModeDoNotAllowNonSSLConnections-cassandra.testtag_IS_UNDEFINED - known from CASSANDRA-19239 > Test Failure: VectorUpdateDeleteTest fails with heap_buffers > > > Key: CASSANDRA-19168 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19168 > Project: Cassandra > Issue Type: Bug > Components: Feature/Vector Search >Reporter: Branimir Lambov >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 5.0-rc, 5.x > > > When {{memtable_allocation_type}} is set to {{heap_buffers}}, {{updateTest}} > fails with > {code} > junit.framework.AssertionFailedError: Result set does not contain a row with > pk = 0 > at > 
org.apache.cassandra.index.sai.cql.VectorTypeTest.assertContainsInt(VectorTypeTest.java:133) > at > org.apache.cassandra.index.sai.cql.VectorUpdateDeleteTest.updateTest(VectorUpdateDeleteTest.java:308) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-18635) Test failure: org.apache.cassandra.distributed.test.UpgradeSSTablesTest
[ https://issues.apache.org/jira/browse/CASSANDRA-18635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817513#comment-17817513 ] Ekaterina Dimitrova commented on CASSANDRA-18635: - Thanks, I opened CASSANDRA-19398 My testing shows that the test was not failing when introduced, but it failed on the current 5.0. > Test failure: org.apache.cassandra.distributed.test.UpgradeSSTablesTest > --- > > Key: CASSANDRA-18635 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18635 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/java >Reporter: Brandon Williams >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 5.0-rc, 5.x > > > Seen here: > https://app.circleci.com/pipelines/github/driftx/cassandra/1095/workflows/6114e2e3-8dcc-4bb0-b664-ae7d82c3349f/jobs/33405/tests > {noformat} > junit.framework.AssertionFailedError: expected:<0> but was:<2> > at > org.apache.cassandra.distributed.test.UpgradeSSTablesTest.upgradeSSTablesInterruptsOngoingCompaction(UpgradeSSTablesTest.java:86) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19398) Test Failure: org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading
[ https://issues.apache.org/jira/browse/CASSANDRA-19398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova updated CASSANDRA-19398: Fix Version/s: 5.x > Test Failure: > org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading > -- > > Key: CASSANDRA-19398 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19398 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Ekaterina Dimitrova >Priority: Normal > Fix For: 5.0-rc, 5.x > > > [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2646/workflows/bc2bba74-9e56-4bea-8de7-4ff840c4f450/jobs/56028/tests#failed-test-0] > {code:java} > junit.framework.AssertionFailedError at > org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading(UpgradeSSTablesTest.java:220) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
(cassandra-builds) branch trunk updated: Ninja fix the ninja fix
This is an automated email from the ASF dual-hosted git repository. brandonwilliams pushed a commit to branch trunk in repository https://gitbox.apache.org/repos/asf/cassandra-builds.git The following commit(s) were added to refs/heads/trunk by this push: new d995eb4 Ninja fix the ninja fix d995eb4 is described below commit d995eb4d9440c9752c93ea4692ea8ac0d42d46b5 Author: Brandon Williams AuthorDate: Wed Feb 14 14:04:32 2024 -0600 Ninja fix the ninja fix --- cassandra-release/finish_release.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/cassandra-release/finish_release.sh b/cassandra-release/finish_release.sh index 143a7ea..7a777d2 100755 --- a/cassandra-release/finish_release.sh +++ b/cassandra-release/finish_release.sh @@ -278,6 +278,6 @@ echo ' 7) update #cassandra topic on slack' echo ' 8) tweet from @cassandra' echo ' 9) release version in JIRA' echo ' 10) remove old version (eg: `svn rm https://dist.apache.org/repos/dist/release/cassandra/`)' -echo ' 11) increment build.xml (base.version), CHANGES.txt, and ubuntu2004_test.docker (ccm's installed) for the next release' +echo ' 11) increment build.xml (base.version), CHANGES.txt, and ubuntu2004_test.docker (ccm installed) for the next release' echo ' 12) Add release in https://reporter.apache.org/addrelease.html?cassandra (same as instructions in email you will receive from the \"Apache Reporter Service\")' echo ' 13) update current_ version in cassandra-dtest/upgrade_tests/upgrade_manifest.py' - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19398) Test Failure: org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading
[ https://issues.apache.org/jira/browse/CASSANDRA-19398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817510#comment-17817510 ] Ekaterina Dimitrova commented on CASSANDRA-19398: - Not reproduced on the commit that introduced the test: {code:java} .circleci/generate.sh -h \ -e REPEATED_UTEST_TARGET=test-jvm-dtest-some \ -e REPEATED_UTEST_CLASS=org.apache.cassandra.distributed.test.UpgradeSSTablesTest \ -e REPEATED_UTEST_METHODS=truncateWhileUpgrading \ -e REPEATED_UTEST_COUNT=2000 {code} [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2649/workflows/b9524d18-a142-4394-b6ee-f41fef6da93d] Reproduced on current 5.0: {code:java} .circleci/generate.sh -ps {code} {code:java} -e REPEATED_JVM_DTESTS=org.apache.cassandra.distributed.test.UpgradeSSTablesTest#truncateWhileUpgrading -e REPEATED_JVM_DTESTS_COUNT=2000{code} https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=19398-5.0 > Test Failure: > org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading > -- > > Key: CASSANDRA-19398 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19398 > Project: Cassandra > Issue Type: Bug >Reporter: Ekaterina Dimitrova >Priority: Normal > Fix For: 5.0.x > > > [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2646/workflows/bc2bba74-9e56-4bea-8de7-4ff840c4f450/jobs/56028/tests#failed-test-0] > {code:java} > junit.framework.AssertionFailedError at > org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading(UpgradeSSTablesTest.java:220) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: 
commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19398) Test Failure: org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading
[ https://issues.apache.org/jira/browse/CASSANDRA-19398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova updated CASSANDRA-19398: Bug Category: Parent values: Correctness(12982)Level 1 values: Test Failure(12990) Complexity: Normal Component/s: CI Discovered By: User Report Fix Version/s: 5.0-rc (was: 5.0.x) Severity: Normal Status: Open (was: Triage Needed) > Test Failure: > org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading > -- > > Key: CASSANDRA-19398 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19398 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Ekaterina Dimitrova >Priority: Normal > Fix For: 5.0-rc > > > [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2646/workflows/bc2bba74-9e56-4bea-8de7-4ff840c4f450/jobs/56028/tests#failed-test-0] > {code:java} > junit.framework.AssertionFailedError at > org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading(UpgradeSSTablesTest.java:220) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19180) Support reloading certificate stores in cassandra-java-driver
[ https://issues.apache.org/jira/browse/CASSANDRA-19180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817502#comment-17817502 ] Bret McGuire commented on CASSANDRA-19180: -- Thanks [~brandon.williams] ! With your +1 we have two approvals from committers so we're all set! > Support reloading certificate stores in cassandra-java-driver > - > > Key: CASSANDRA-19180 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19180 > Project: Cassandra > Issue Type: New Feature > Components: Client/java-driver >Reporter: Abe Ratnofsky >Assignee: Abe Ratnofsky >Priority: Normal > Time Spent: 2.5h > Remaining Estimate: 0h > > Currently, apache/cassandra-java-driver does not reload SSLContext when the > underlying certificate store files change. When the DefaultSslEngineFactory > (and the other factories) are set up, they build a fixed instance of > javax.net.ssl.SSLContext that doesn't change: > https://github.com/apache/cassandra-java-driver/blob/12e3e3ea027c51c5807e5e46ba542f894edfa4e7/core/src/main/java/com/datastax/oss/driver/internal/core/ssl/DefaultSslEngineFactory.java#L74 > This fixed SSLContext is used to negotiate SSL with the cluster, and if a > keystore is reloaded on disk it isn't picked up by the driver, and future > reconnections will fail if the keystore certificates have expired by the time > they're used to handshake a new connection. > We should reload client certificates so that applications that provide them > can use short-lived certificates and not require a bounce to pick up new > certificates. This is especially relevant in a world with CASSANDRA-18554 and > broad use of mTLS. > I have a patch for this that is nearly ready. Now that the project has moved > under apache/ - who can I work with to understand how CI works now? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
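The ticket above describes the core problem: the factory builds one fixed SSLContext, so new connections keep handshaking with stale key material after the keystore file is replaced on disk. The general fix is to hot-swap the loaded material behind a stable reference and refresh it on a schedule. A minimal, generic sketch of that pattern follows; the class and method names here are illustrative only, not the driver's actual API.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical sketch: hot-swap a key-manager-like delegate on a schedule.
// In the real factory the loader would re-read the keystore file and
// re-initialize a KeyManagerFactory; here it is abstracted as a Supplier.
final class ReloadingDelegate<T> implements AutoCloseable {
    private final AtomicReference<T> current = new AtomicReference<>();
    private final Supplier<T> loader;
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "keystore-reload");
            t.setDaemon(true);
            return t;
        });

    ReloadingDelegate(Supplier<T> loader, long intervalMillis) {
        this.loader = loader;
        // Load once synchronously so a missing/unreadable keystore fails fast.
        reload();
        if (intervalMillis > 0) {
            scheduler.scheduleWithFixedDelay(
                this::reload, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
        }
    }

    void reload() {
        try {
            current.set(loader.get());
        } catch (RuntimeException e) {
            // Keep serving the previously loaded delegate if a reload fails.
        }
    }

    T get() {
        return current.get();
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Connections opened after a reload see the new delegate via `get()`, while existing connections are untouched — which matches the ticket's goal of picking up short-lived certificates without a bounce.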
(cassandra-java-driver) 01/03: CASSANDRA-19180: Support reloading keystore in cassandra-java-driver
This is an automated email from the ASF dual-hosted git repository. absurdfarce pushed a commit to branch 4.x in repository https://gitbox.apache.org/repos/asf/cassandra-java-driver.git commit 8e73232102d6275b4f13de9d089d3a9b224c9727 Author: Abe Ratnofsky AuthorDate: Thu Jan 18 14:20:44 2024 -0500 CASSANDRA-19180: Support reloading keystore in cassandra-java-driver --- .../api/core/config/DefaultDriverOption.java | 6 + .../driver/api/core/config/TypedDriverOption.java | 6 + .../internal/core/ssl/DefaultSslEngineFactory.java | 35 +-- .../core/ssl/ReloadingKeyManagerFactory.java | 257 +++ core/src/main/resources/reference.conf | 7 + .../core/ssl/ReloadingKeyManagerFactoryTest.java | 272 + .../ReloadingKeyManagerFactoryTest/README.md | 39 +++ .../certs/client-alternate.keystore| Bin 0 -> 2467 bytes .../certs/client-original.keystore | Bin 0 -> 2457 bytes .../certs/client.truststore| Bin 0 -> 1002 bytes .../certs/server.keystore | Bin 0 -> 2407 bytes .../certs/server.truststore| Bin 0 -> 1890 bytes manual/core/ssl/README.md | 10 +- upgrade_guide/README.md| 11 + 14 files changed, 627 insertions(+), 16 deletions(-) diff --git a/core/src/main/java/com/datastax/oss/driver/api/core/config/DefaultDriverOption.java b/core/src/main/java/com/datastax/oss/driver/api/core/config/DefaultDriverOption.java index 4c0668570..c10a8237c 100644 --- a/core/src/main/java/com/datastax/oss/driver/api/core/config/DefaultDriverOption.java +++ b/core/src/main/java/com/datastax/oss/driver/api/core/config/DefaultDriverOption.java @@ -255,6 +255,12 @@ public enum DefaultDriverOption implements DriverOption { * Value-type: {@link String} */ SSL_KEYSTORE_PASSWORD("advanced.ssl-engine-factory.keystore-password"), + /** + * The duration between attempts to reload the keystore. + * + * Value-type: {@link java.time.Duration} + */ + SSL_KEYSTORE_RELOAD_INTERVAL("advanced.ssl-engine-factory.keystore-reload-interval"), /** * The location of the truststore file. 
* diff --git a/core/src/main/java/com/datastax/oss/driver/api/core/config/TypedDriverOption.java b/core/src/main/java/com/datastax/oss/driver/api/core/config/TypedDriverOption.java index ec3607973..88c012fa3 100644 --- a/core/src/main/java/com/datastax/oss/driver/api/core/config/TypedDriverOption.java +++ b/core/src/main/java/com/datastax/oss/driver/api/core/config/TypedDriverOption.java @@ -235,6 +235,12 @@ public class TypedDriverOption { /** The keystore password. */ public static final TypedDriverOption SSL_KEYSTORE_PASSWORD = new TypedDriverOption<>(DefaultDriverOption.SSL_KEYSTORE_PASSWORD, GenericType.STRING); + + /** The duration between attempts to reload the keystore. */ + public static final TypedDriverOption SSL_KEYSTORE_RELOAD_INTERVAL = + new TypedDriverOption<>( + DefaultDriverOption.SSL_KEYSTORE_RELOAD_INTERVAL, GenericType.DURATION); + /** The location of the truststore file. */ public static final TypedDriverOption SSL_TRUSTSTORE_PATH = new TypedDriverOption<>(DefaultDriverOption.SSL_TRUSTSTORE_PATH, GenericType.STRING); diff --git a/core/src/main/java/com/datastax/oss/driver/internal/core/ssl/DefaultSslEngineFactory.java b/core/src/main/java/com/datastax/oss/driver/internal/core/ssl/DefaultSslEngineFactory.java index 085b36dc5..55a6e9c7d 100644 --- a/core/src/main/java/com/datastax/oss/driver/internal/core/ssl/DefaultSslEngineFactory.java +++ b/core/src/main/java/com/datastax/oss/driver/internal/core/ssl/DefaultSslEngineFactory.java @@ -27,11 +27,12 @@ import java.io.InputStream; import java.net.InetSocketAddress; import java.net.SocketAddress; import java.nio.file.Files; +import java.nio.file.Path; import java.nio.file.Paths; import java.security.KeyStore; import java.security.SecureRandom; +import java.time.Duration; import java.util.List; -import javax.net.ssl.KeyManagerFactory; import javax.net.ssl.SSLContext; import javax.net.ssl.SSLEngine; import javax.net.ssl.SSLParameters; @@ -54,6 +55,7 @@ import net.jcip.annotations.ThreadSafe; * 
truststore-password = password123 * keystore-path = /path/to/client.keystore * keystore-password = password123 + * keystore-reload-interval = 30 minutes * } * } * @@ -66,6 +68,7 @@ public class DefaultSslEngineFactory implements SslEngineFactory { private final SSLContext sslContext; private final String[] cipherSuites; private final boolean requireHostnameValidation; + private ReloadingKeyManagerFactory kmf; /** Builds a new instance from the driver configuration. */ public DefaultSslEngineFactory(DriverContext driverContext) { @@ -132,20 +135,8 @@ public class DefaultSslEngineFactory implements SslEngineFactory { }
(cassandra-java-driver) 03/03: Address PR feedback: reload-interval to use Optional internally and null in config, rather than using sentinel Duration.ZERO
This is an automated email from the ASF dual-hosted git repository. absurdfarce pushed a commit to branch 4.x in repository https://gitbox.apache.org/repos/asf/cassandra-java-driver.git commit ea2e475185b5863ef6eed347f57286d6a3bfd8a9 Author: Abe Ratnofsky AuthorDate: Fri Feb 2 14:56:22 2024 -0500 Address PR feedback: reload-interval to use Optional internally and null in config, rather than using sentinel Duration.ZERO --- .../internal/core/ssl/DefaultSslEngineFactory.java | 14 +-- .../core/ssl/ReloadingKeyManagerFactory.java | 29 +++--- .../core/ssl/ReloadingKeyManagerFactoryTest.java | 4 +-- 3 files changed, 27 insertions(+), 20 deletions(-) diff --git a/core/src/main/java/com/datastax/oss/driver/internal/core/ssl/DefaultSslEngineFactory.java b/core/src/main/java/com/datastax/oss/driver/internal/core/ssl/DefaultSslEngineFactory.java index adf23f8e8..bb95dc738 100644 --- a/core/src/main/java/com/datastax/oss/driver/internal/core/ssl/DefaultSslEngineFactory.java +++ b/core/src/main/java/com/datastax/oss/driver/internal/core/ssl/DefaultSslEngineFactory.java @@ -33,6 +33,7 @@ import java.security.KeyStore; import java.security.SecureRandom; import java.time.Duration; import java.util.List; +import java.util.Optional; import javax.net.ssl.SSLContext; import javax.net.ssl.SSLEngine; import javax.net.ssl.SSLParameters; @@ -153,14 +154,11 @@ public class DefaultSslEngineFactory implements SslEngineFactory { private ReloadingKeyManagerFactory buildReloadingKeyManagerFactory(DriverExecutionProfile config) throws Exception { Path keystorePath = Paths.get(config.getString(DefaultDriverOption.SSL_KEYSTORE_PATH)); -String password = -config.isDefined(DefaultDriverOption.SSL_KEYSTORE_PASSWORD) -? config.getString(DefaultDriverOption.SSL_KEYSTORE_PASSWORD) -: null; -Duration reloadInterval = -config.isDefined(DefaultDriverOption.SSL_KEYSTORE_RELOAD_INTERVAL) -? 
config.getDuration(DefaultDriverOption.SSL_KEYSTORE_RELOAD_INTERVAL) -: Duration.ZERO; +String password = config.getString(DefaultDriverOption.SSL_KEYSTORE_PASSWORD, null); +Optional reloadInterval = +Optional.ofNullable( + config.getDuration(DefaultDriverOption.SSL_KEYSTORE_RELOAD_INTERVAL, null)); + return ReloadingKeyManagerFactory.create(keystorePath, password, reloadInterval); } diff --git a/core/src/main/java/com/datastax/oss/driver/internal/core/ssl/ReloadingKeyManagerFactory.java b/core/src/main/java/com/datastax/oss/driver/internal/core/ssl/ReloadingKeyManagerFactory.java index 540ddfd79..8a9e11bb2 100644 --- a/core/src/main/java/com/datastax/oss/driver/internal/core/ssl/ReloadingKeyManagerFactory.java +++ b/core/src/main/java/com/datastax/oss/driver/internal/core/ssl/ReloadingKeyManagerFactory.java @@ -36,6 +36,7 @@ import java.security.cert.CertificateException; import java.security.cert.X509Certificate; import java.time.Duration; import java.util.Arrays; +import java.util.Optional; import java.util.concurrent.Executors; import java.util.concurrent.ScheduledExecutorService; import java.util.concurrent.TimeUnit; @@ -68,12 +69,12 @@ public class ReloadingKeyManagerFactory extends KeyManagerFactory implements Aut * * @param keystorePath the keystore file to reload * @param keystorePassword the keystore password - * @param reloadInterval the duration between reload attempts. Set to {@link - * java.time.Duration#ZERO} to disable scheduled reloading. + * @param reloadInterval the duration between reload attempts. Set to {@link Optional#empty()} to + * disable scheduled reloading. 
* @return */ - public static ReloadingKeyManagerFactory create( - Path keystorePath, String keystorePassword, Duration reloadInterval) + static ReloadingKeyManagerFactory create( + Path keystorePath, String keystorePassword, Optional reloadInterval) throws UnrecoverableKeyException, KeyStoreException, NoSuchAlgorithmException, CertificateException, IOException { KeyManagerFactory kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm()); @@ -103,14 +104,24 @@ public class ReloadingKeyManagerFactory extends KeyManagerFactory implements Aut this.spi = spi; } - private void start(Path keystorePath, String keystorePassword, Duration reloadInterval) { + private void start( + Path keystorePath, String keystorePassword, Optional reloadInterval) { this.keystorePath = keystorePath; this.keystorePassword = keystorePassword; // Ensure that reload is called once synchronously, to make sure the file exists etc. reload(); -if (!reloadInterval.isZero()) { +if (!reloadInterval.isPresent() || reloadInterval.get().isZero()) { + final String msg = + "KeyStore reloading is disabled. If your Cassandra cluster requires client certificates, " +
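The commit above replaces a `Duration.ZERO` sentinel with an `Optional`-valued reload interval. A small sketch of the resulting semantics, with hypothetical helper names rather than the driver's API: an unset config value maps to `Optional.empty()`, and either an empty or a zero interval disables scheduled reloading.

```java
import java.time.Duration;
import java.util.Optional;

// Hypothetical helpers mirroring the semantics described in the commit.
final class ReloadIntervalSemantics {
    // Unset config (null) becomes Optional.empty() instead of a sentinel value.
    static Optional<Duration> fromConfig(Duration configuredOrNull) {
        return Optional.ofNullable(configuredOrNull);
    }

    // Matches the patched start(): reloading runs only for a present, nonzero interval.
    static boolean reloadingEnabled(Optional<Duration> interval) {
        return interval.isPresent() && !interval.get().isZero();
    }
}
```

Using `Optional` makes "not configured" and "explicitly zero" distinguishable at the type level, even though both disable reloading here.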
(cassandra-java-driver) 02/03: PR feedback: avoid extra exception wrapping, provide thread naming, improve error messages, etc.
This is an automated email from the ASF dual-hosted git repository. absurdfarce pushed a commit to branch 4.x in repository https://gitbox.apache.org/repos/asf/cassandra-java-driver.git commit c7719aed14705b735571ecbfbda23d3b8506eb11 Author: Abe Ratnofsky AuthorDate: Tue Jan 23 16:09:35 2024 -0500 PR feedback: avoid extra exception wrapping, provide thread naming, improve error messages, etc. --- .../api/core/config/DefaultDriverOption.java | 12 +++--- .../internal/core/ssl/DefaultSslEngineFactory.java | 4 +- .../core/ssl/ReloadingKeyManagerFactory.java | 44 ++ 3 files changed, 28 insertions(+), 32 deletions(-) diff --git a/core/src/main/java/com/datastax/oss/driver/api/core/config/DefaultDriverOption.java b/core/src/main/java/com/datastax/oss/driver/api/core/config/DefaultDriverOption.java index c10a8237c..afe16e968 100644 --- a/core/src/main/java/com/datastax/oss/driver/api/core/config/DefaultDriverOption.java +++ b/core/src/main/java/com/datastax/oss/driver/api/core/config/DefaultDriverOption.java @@ -255,12 +255,6 @@ public enum DefaultDriverOption implements DriverOption { * Value-type: {@link String} */ SSL_KEYSTORE_PASSWORD("advanced.ssl-engine-factory.keystore-password"), - /** - * The duration between attempts to reload the keystore. - * - * Value-type: {@link java.time.Duration} - */ - SSL_KEYSTORE_RELOAD_INTERVAL("advanced.ssl-engine-factory.keystore-reload-interval"), /** * The location of the truststore file. * @@ -982,6 +976,12 @@ public enum DefaultDriverOption implements DriverOption { * Value-type: boolean */ METRICS_GENERATE_AGGREGABLE_HISTOGRAMS("advanced.metrics.histograms.generate-aggregable"), + /** + * The duration between attempts to reload the keystore. 
+ * + * Value-type: {@link java.time.Duration} + */ + SSL_KEYSTORE_RELOAD_INTERVAL("advanced.ssl-engine-factory.keystore-reload-interval"), ; private final String path; diff --git a/core/src/main/java/com/datastax/oss/driver/internal/core/ssl/DefaultSslEngineFactory.java b/core/src/main/java/com/datastax/oss/driver/internal/core/ssl/DefaultSslEngineFactory.java index 55a6e9c7d..adf23f8e8 100644 --- a/core/src/main/java/com/datastax/oss/driver/internal/core/ssl/DefaultSslEngineFactory.java +++ b/core/src/main/java/com/datastax/oss/driver/internal/core/ssl/DefaultSslEngineFactory.java @@ -150,8 +150,8 @@ public class DefaultSslEngineFactory implements SslEngineFactory { } } - private ReloadingKeyManagerFactory buildReloadingKeyManagerFactory( - DriverExecutionProfile config) { + private ReloadingKeyManagerFactory buildReloadingKeyManagerFactory(DriverExecutionProfile config) + throws Exception { Path keystorePath = Paths.get(config.getString(DefaultDriverOption.SSL_KEYSTORE_PATH)); String password = config.isDefined(DefaultDriverOption.SSL_KEYSTORE_PASSWORD) diff --git a/core/src/main/java/com/datastax/oss/driver/internal/core/ssl/ReloadingKeyManagerFactory.java b/core/src/main/java/com/datastax/oss/driver/internal/core/ssl/ReloadingKeyManagerFactory.java index 9aaee7011..540ddfd79 100644 --- a/core/src/main/java/com/datastax/oss/driver/internal/core/ssl/ReloadingKeyManagerFactory.java +++ b/core/src/main/java/com/datastax/oss/driver/internal/core/ssl/ReloadingKeyManagerFactory.java @@ -73,26 +73,17 @@ public class ReloadingKeyManagerFactory extends KeyManagerFactory implements Aut * @return */ public static ReloadingKeyManagerFactory create( - Path keystorePath, String keystorePassword, Duration reloadInterval) { -KeyManagerFactory kmf; -try { - kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm()); -} catch (NoSuchAlgorithmException e) { - throw new RuntimeException(e); -} + Path keystorePath, String keystorePassword, Duration reloadInterval) 
+ throws UnrecoverableKeyException, KeyStoreException, NoSuchAlgorithmException, + CertificateException, IOException { +KeyManagerFactory kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm()); KeyStore ks; try (InputStream ksf = Files.newInputStream(keystorePath)) { ks = KeyStore.getInstance(KEYSTORE_TYPE); ks.load(ksf, keystorePassword.toCharArray()); -} catch (IOException | CertificateException | KeyStoreException | NoSuchAlgorithmException e) { - throw new RuntimeException(e); -} -try { - kmf.init(ks, keystorePassword.toCharArray()); -} catch (KeyStoreException | NoSuchAlgorithmException | UnrecoverableKeyException e) { - throw new RuntimeException(e); } +kmf.init(ks, keystorePassword.toCharArray()); ReloadingKeyManagerFactory reloadingKeyManagerFactory = new ReloadingKeyManagerFactory(kmf); reloadingKeyManagerFactory.start(keystorePath, keystorePassword, reloadInterval); @@ -115,24 +106,26 @@ public class
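The diff above drops the per-call `RuntimeException` wrapping in favor of declaring the checked exceptions on the method signature, so callers see the real failure type. A stripped-down sketch of that style (hypothetical class name; the keystore-loading calls are standard `java.security` API):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.KeyStore;
import java.security.KeyStoreException;
import java.security.NoSuchAlgorithmException;
import java.security.cert.CertificateException;

// Hypothetical sketch: declare the checked exceptions instead of catching
// each one and rethrowing RuntimeException, as the commit's feedback suggests.
final class KeystoreLoader {
    static KeyStore load(Path path, char[] password)
            throws IOException, KeyStoreException, NoSuchAlgorithmException,
                   CertificateException {
        try (InputStream in = Files.newInputStream(path)) {
            KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
            ks.load(in, password);
            return ks;
        }
    }
}
```

A caller that cannot proceed without the keystore can still wrap once at the boundary; the point of the change is that the library method itself no longer hides which of the four failure modes occurred.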
(cassandra-java-driver) branch 4.x updated (8d5849cb3 -> ea2e47518)
This is an automated email from the ASF dual-hosted git repository. absurdfarce pushed a change to branch 4.x in repository https://gitbox.apache.org/repos/asf/cassandra-java-driver.git from 8d5849cb3 Remove ASL header from test resource files (that was breaking integration tests) new 8e7323210 CASSANDRA-19180: Support reloading keystore in cassandra-java-driver new c7719aed1 PR feedback: avoid extra exception wrapping, provide thread naming, improve error messages, etc. new ea2e47518 Address PR feedback: reload-interval to use Optional internally and null in config, rather than using sentinel Duration.ZERO The 3 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: .../api/core/config/DefaultDriverOption.java | 6 + .../driver/api/core/config/TypedDriverOption.java | 6 + .../internal/core/ssl/DefaultSslEngineFactory.java | 33 +-- .../core/ssl/ReloadingKeyManagerFactory.java | 264 core/src/main/resources/reference.conf | 7 + .../core/ssl/ReloadingKeyManagerFactoryTest.java | 270 + .../ReloadingKeyManagerFactoryTest/README.md | 39 +++ .../certs/client-alternate.keystore| Bin 0 -> 2467 bytes .../certs/client-original.keystore | Bin 0 -> 2457 bytes .../certs/client.truststore| Bin 0 -> 1002 bytes .../certs/server.keystore | Bin 0 -> 2407 bytes .../certs/server.truststore| Bin 0 -> 1890 bytes manual/core/ssl/README.md | 10 +- upgrade_guide/README.md| 11 + 14 files changed, 630 insertions(+), 16 deletions(-) create mode 100644 core/src/main/java/com/datastax/oss/driver/internal/core/ssl/ReloadingKeyManagerFactory.java create mode 100644 core/src/test/java/com/datastax/oss/driver/internal/core/ssl/ReloadingKeyManagerFactoryTest.java create mode 100644 core/src/test/resources/ReloadingKeyManagerFactoryTest/README.md create mode 100644 
core/src/test/resources/ReloadingKeyManagerFactoryTest/certs/client-alternate.keystore create mode 100644 core/src/test/resources/ReloadingKeyManagerFactoryTest/certs/client-original.keystore create mode 100644 core/src/test/resources/ReloadingKeyManagerFactoryTest/certs/client.truststore create mode 100644 core/src/test/resources/ReloadingKeyManagerFactoryTest/certs/server.keystore create mode 100644 core/src/test/resources/ReloadingKeyManagerFactoryTest/certs/server.truststore - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
Re: [PR] CASSANDRA-19180: Support reloading keystore in cassandra-java-driver [cassandra-java-driver]
absurdfarce merged PR #1907: URL: https://github.com/apache/cassandra-java-driver/pull/1907 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19399) Zombie repair session blocks further incremental repairs due to SSTable lock
[ https://issues.apache.org/jira/browse/CASSANDRA-19399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817479#comment-17817479 ] Sebastian Marsching commented on CASSANDRA-19399: - I considered that it might be the same as CASSANDRA-19182, but I ran {{sstablemetadata}} for all SSTables in the affected keyspace and none of them had the pending repair flag set, so it seems to be something else. > Zombie repair session blocks further incremental repairs due to SSTable lock > > > Key: CASSANDRA-19399 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19399 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Sebastian Marsching >Priority: Normal > Fix For: 4.1.x > > Attachments: system.log.txt > > > We have experienced the following bug in C* 4.1.3 at least twice: > Sometimes, a failed incremental repair session keeps future incremental repair > sessions from running. These future sessions fail with the following message > in the log file: > {code:java} > PendingAntiCompaction.java:210 - Prepare phase for incremental repair session > c8b65260-cb53-11ee-a219-3d5d7e5cdec7 has failed because it encountered > intersecting sstables belonging to another incremental repair session > (02d7c1a0-cb3a-11ee-aa89-a1b2ad548382). This is caused by starting an > incremental repair session before a previous one has completed. Check > nodetool repair_admin for hung sessions and fix them. {code} > This happens, even though there are no active repair sessions on any node > ({{{}nodetool repair_admin list{}}} prints {{{}no sessions{}}}). 
> When running {{{}nodetool repair_admin list --all{}}}, the offending session > is listed as failed: > {code:java} > id | state | last activity | > coordinator | participants > > > > > > > | participants_wp > > > > > > > > > > > 02d7c1a0-cb3a-11ee-aa89-a1b2ad548382 | FAILED | 5454 (s) | > /192.168.108.235:7000 | > 192.168.108.224,192.168.108.96,192.168.108.97,192.168.108.225,192.168.108.226,192.168.108.98,192.168.108.99,192.168.108.227,192.168.108.100,192.168.108.228,192.168.108.229,192.168.108.101,192.168.108.230,192.168.108.102,192.168.108.103,192.168.108.231,192.168.108.221,192.168.108.94,192.168.108.222,192.168.108.95,192.168.108.223,192.168.108.241,192.168.108.242,192.168.108.243,192.168.108.244,192.168.108.104,192.168.108.105,192.168.108.235 > > {code} > This still happens after canceling the repair session, regardless of whether > it is canceled on the coordinator node or on all nodes (using > {{{}--force{}}}). > I attached all lines from the C* system log that refer to the offending > session. It seems like another repair session was started while this session > was still running (possibly due to a bug in Cassandra Reaper), but the > session was failed right after that but still seems to hold a lock on some of > the SSTables. > The problem can be resolved by restarting the nodes affected by this (which > typically means doing a rolling restart of the whole cluster), but this is > obviously not ideal... -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19399) Zombie repair session blocks further incremental repairs due to SSTable lock
[ https://issues.apache.org/jira/browse/CASSANDRA-19399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817476#comment-17817476 ] Andy Tolbert commented on CASSANDRA-19399: -- Could this be the same as [CASSANDRA-19182]? > Zombie repair session blocks further incremental repairs due to SSTable lock > > > Key: CASSANDRA-19399 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19399 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Sebastian Marsching >Priority: Normal > Fix For: 4.1.x > > Attachments: system.log.txt
[jira] [Commented] (CASSANDRA-19399) Zombie repair session blocks further incremental repairs due to SSTable lock
[ https://issues.apache.org/jira/browse/CASSANDRA-19399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817468#comment-17817468 ] Brandon Williams commented on CASSANDRA-19399: -- /cc [~dcapwell] > Zombie repair session blocks further incremental repairs due to SSTable lock > > > Key: CASSANDRA-19399 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19399 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Sebastian Marsching >Priority: Normal > Fix For: 4.1.x > > Attachments: system.log.txt
[jira] [Updated] (CASSANDRA-19399) Zombie repair session blocks further incremental repairs due to SSTable lock
[ https://issues.apache.org/jira/browse/CASSANDRA-19399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-19399: - Bug Category: Parent values: Degradation(12984) Level 1 values: Resource Management(12995) Complexity: Normal Discovered By: User Report Fix Version/s: 4.1.x Severity: Normal Status: Open (was: Triage Needed) > Zombie repair session blocks further incremental repairs due to SSTable lock > > > Key: CASSANDRA-19399 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19399 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Sebastian Marsching >Priority: Normal > Fix For: 4.1.x > > Attachments: system.log.txt
[jira] [Created] (CASSANDRA-19399) Zombie repair session blocks further incremental repairs due to SSTable lock
Sebastian Marsching created CASSANDRA-19399: --- Summary: Zombie repair session blocks further incremental repairs due to SSTable lock Key: CASSANDRA-19399 URL: https://issues.apache.org/jira/browse/CASSANDRA-19399 Project: Cassandra Issue Type: Bug Components: Consistency/Repair Reporter: Sebastian Marsching Attachments: system.log.txt We have experienced the following bug in C* 4.1.3 at least twice: Sometimes, a failed incremental repair session keeps future incremental repair sessions from running. These future sessions fail with the following message in the log file: {code:java} PendingAntiCompaction.java:210 - Prepare phase for incremental repair session c8b65260-cb53-11ee-a219-3d5d7e5cdec7 has failed because it encountered intersecting sstables belonging to another incremental repair session (02d7c1a0-cb3a-11ee-aa89-a1b2ad548382). This is caused by starting an incremental repair session before a previous one has completed. Check nodetool repair_admin for hung sessions and fix them. {code} This happens, even though there are no active repair sessions on any node ({{{}nodetool repair_admin list{}}} prints {{{}no sessions{}}}). 
When running {{{}nodetool repair_admin list --all{}}}, the offending session is listed as failed:
{code:java}
id                                   | state  | last activity | coordinator           | participants | participants_wp
02d7c1a0-cb3a-11ee-aa89-a1b2ad548382 | FAILED | 5454 (s)      | /192.168.108.235:7000 | 192.168.108.224,192.168.108.96,192.168.108.97,192.168.108.225,192.168.108.226,192.168.108.98,192.168.108.99,192.168.108.227,192.168.108.100,192.168.108.228,192.168.108.229,192.168.108.101,192.168.108.230,192.168.108.102,192.168.108.103,192.168.108.231,192.168.108.221,192.168.108.94,192.168.108.222,192.168.108.95,192.168.108.223,192.168.108.241,192.168.108.242,192.168.108.243,192.168.108.244,192.168.108.104,192.168.108.105,192.168.108.235 |
{code}
This still happens after canceling the repair session, regardless of whether it is canceled on the coordinator node or on all nodes (using {{{}--force{}}}). I attached all lines from the C* system log that refer to the offending session. It seems like another repair session was started while this session was still running (possibly due to a bug in Cassandra Reaper); the session failed right after that but still seems to hold a lock on some of the SSTables. The problem can be resolved by restarting the nodes affected by this (which typically means doing a rolling restart of the whole cluster), but this is obviously not ideal... -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
(cassandra-website) branch asf-site updated (2d778def9 -> 95d5d1e87)
This is an automated email from the ASF dual-hosted git repository. brandonwilliams pushed a change to branch asf-site in repository https://gitbox.apache.org/repos/asf/cassandra-website.git discard 2d778def9 generate docs for aa8a03c7 add c4b35db18 Minor release 4.1.4 add 95d5d1e87 generate docs for c4b35db1 This update added new revisions after undoing existing revisions. That is to say, some revisions that were in the old version of the branch are not in the new version. This situation occurs when a user --force pushes a change and generates a repository containing something like this: * -- * -- B -- O -- O -- O (2d778def9) \ N -- N -- N refs/heads/asf-site (95d5d1e87) You should already have received notification emails for all of the O revisions, and so the following emails describe only the N revisions from the common base, B. Any revisions marked "omit" are not gone; other references still refer to them. Any revisions marked "discard" are gone forever. No new revisions were added by this update. Summary of changes: content/_/download.html| 8 +- .../managing/configuration/cass_yaml_file.html | 3 +- .../managing/configuration/cass_yaml_file.html | 3 +- .../5.1/cassandra/managing/operating/metrics.html | 180 - .../managing/tools/nodetool/clientstats.html | 8 +- .../managing/tools/nodetool/reconfigurecms.html| 11 +- .../managing/configuration/cass_yaml_file.html | 3 +- .../managing/configuration/cass_yaml_file.html | 3 +- .../cassandra/managing/operating/metrics.html | 180 - .../managing/tools/nodetool/clientstats.html | 8 +- .../managing/tools/nodetool/reconfigurecms.html| 11 +- content/search-index.js| 2 +- .../source/modules/ROOT/pages/download.adoc| 8 +- site-ui/build/ui-bundle.zip| Bin 4883646 -> 4883646 bytes 14 files changed, 387 insertions(+), 41 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19397) Remove all code around native_transport_port_ssl
[ https://issues.apache.org/jira/browse/CASSANDRA-19397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Miklosovic updated CASSANDRA-19397: -- Test and Documentation Plan: CI Status: Patch Available (was: In Progress) > Remove all code around native_transport_port_ssl > > > Key: CASSANDRA-19397 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19397 > Project: Cassandra > Issue Type: Task > Components: Legacy/Core >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 5.x > > Time Spent: 10m > Remaining Estimate: 0h > > We deprecated native_transport_port_ssl in CASSANDRA-19392 and said we would > remove it next. This ticket is about that removal. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-19391) Flush metadata snapshot table on every write
[ https://issues.apache.org/jira/browse/CASSANDRA-19391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817452#comment-17817452 ] Marcus Eriksson edited comment on CASSANDRA-19391 at 2/14/24 5:09 PM: -- flushing showed that we couldn't really read the metadata_snapshots sstables due to the reversed longtoken localpartitioner we added in CASSANDRA-19189, so here we add a reverse ordered partitioner (for long keys) which calculates tokens by Long.MAX_VALUE - key. CI a bit shaky, but looks like unrelated failures, will rerun (includes both CASSANDRA-19390 and CASSANDRA-19391) https://github.com/apache/cassandra/pull/3104 was (Author: krummas): flushing showed that we couldn't really read the metadata_snapshots sstables due to the reversed longtoken localpartitioner we added in CASSANDRA-19189, so here we add a reverse ordered partitioner (for long keys) which calculates tokens by Long.MAX_VALUE - key. CI a bit shaky, but looks like unrelated failures, will rerun (includes both CASSANDRA-19390 and CASSANDRA-19391) > Flush metadata snapshot table on every write > > > Key: CASSANDRA-19391 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19391 > Project: Cassandra > Issue Type: Improvement > Components: Transactional Cluster Metadata >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Low > Fix For: 5.x > > Attachments: ci_summary.html, result_details.tar.gz > > > We depend on the latest snapshot when starting up, flushing avoids gaps > between latest snapshot and the most recent local log entry -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-19391) Flush metadata snapshot table on every write
[ https://issues.apache.org/jira/browse/CASSANDRA-19391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817452#comment-17817452 ] Marcus Eriksson edited comment on CASSANDRA-19391 at 2/14/24 5:06 PM: -- flushing showed that we couldn't really read the metadata_snapshots sstables due to the reversed longtoken localpartitioner we added in CASSANDRA-19189, so here we add a reverse ordered partitioner (for long keys) which calculates tokens by Long.MAX_VALUE - key. CI a bit shaky, but looks like unrelated failures, will rerun (includes both CASSANDRA-19390 and CASSANDRA-19391) was (Author: krummas): flushing showed that we couldn't really read the metadata_snapshots sstables due to the reversed longtoken partitioner we added in CASSANDRA-19189, so here we add a reverse ordered partitioner (for long keys) which calculates tokens by Long.MAX_VALUE - key. CI a bit shaky, but looks like unrelated failures, will rerun (includes both CASSANDRA-19390 and CASSANDRA-19391) > Flush metadata snapshot table on every write > > > Key: CASSANDRA-19391 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19391 > Project: Cassandra > Issue Type: Improvement > Components: Transactional Cluster Metadata >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Low > Fix For: 5.x > > Attachments: ci_summary.html, result_details.tar.gz > > > We depend on the latest snapshot when starting up, flushing avoids gaps > between latest snapshot and the most recent local log entry -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19391) Flush metadata snapshot table on every write
[ https://issues.apache.org/jira/browse/CASSANDRA-19391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817452#comment-17817452 ] Marcus Eriksson commented on CASSANDRA-19391: - flushing showed that we couldn't really read the metadata_snapshots sstables due to the reversed longtoken partitioner we added in CASSANDRA-19189, so here we add a reverse ordered partitioner (for long keys) which calculates tokens by Long.MAX_VALUE - key. CI a bit shaky, but looks like unrelated failures, will rerun (includes both CASSANDRA-19390 and CASSANDRA-19391) > Flush metadata snapshot table on every write > > > Key: CASSANDRA-19391 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19391 > Project: Cassandra > Issue Type: Improvement > Components: Transactional Cluster Metadata >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Low > Fix For: 5.x > > Attachments: ci_summary.html, result_details.tar.gz > > > We depend on the latest snapshot when starting up, flushing avoids gaps > between latest snapshot and the most recent local log entry -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
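The token computation described in the comment above (a reverse-ordered partitioner that derives tokens as Long.MAX_VALUE - key) can be sketched in isolation. This illustrates only the arithmetic, not Cassandra's partitioner code; shell arithmetic is signed 64-bit, matching Java's long:

```shell
# token(key) = Long.MAX_VALUE - key: larger keys get smaller tokens,
# so they sort first in token order. For the metadata snapshot table,
# where keys grow over time, that puts the newest entry first.
LONG_MAX=9223372036854775807

token() {
  echo $(( LONG_MAX - $1 ))
}

token 0    # Long.MAX_VALUE itself
token 5    # five less than Long.MAX_VALUE, i.e. sorts before token(3)
```

Under such an ordering the highest key always sorts first, which is the useful property when the reader only cares about the latest snapshot.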
[jira] [Updated] (CASSANDRA-19391) Flush metadata snapshot table on every write
[ https://issues.apache.org/jira/browse/CASSANDRA-19391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson updated CASSANDRA-19391: Attachment: ci_summary.html result_details.tar.gz > Flush metadata snapshot table on every write > > > Key: CASSANDRA-19391 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19391 > Project: Cassandra > Issue Type: Improvement > Components: Transactional Cluster Metadata >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Low > Fix For: 5.x > > Attachments: ci_summary.html, result_details.tar.gz > > > We depend on the latest snapshot when starting up, flushing avoids gaps > between latest snapshot and the most recent local log entry -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRASC-104) Relocate Sidecar common classes in vertx-client-shaded
[ https://issues.apache.org/jira/browse/CASSANDRASC-104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francisco Guerrero updated CASSANDRASC-104: --- Fix Version/s: 1.0 Source Control Link: https://github.com/apache/cassandra-sidecar/commit/b5570109c19acaf91281fd7901041c0c2b1f3b6c Resolution: Fixed Status: Resolved (was: Ready to Commit) > Relocate Sidecar common classes in vertx-client-shaded > -- > > Key: CASSANDRASC-104 > URL: https://issues.apache.org/jira/browse/CASSANDRASC-104 > Project: Sidecar for Apache Cassandra > Issue Type: Improvement > Components: Configuration >Reporter: Francisco Guerrero >Assignee: Francisco Guerrero >Priority: Normal > Labels: pull-request-available > Fix For: 1.0 > > > It is desirable to relocate the common classes > {{org.apache.cassandra.sidecar.common.*}} in the {{vertx-client-shaded}} > subproject. The benefits are the following: > - Better isolation of the shared classes when loading them in downstream > projects (i.e., Analytics) > - Avoids having two classes loaded in the same classpath, but with different > internal definitions (for example when annotations are relocated but the class > itself is not) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
(cassandra-sidecar) branch trunk updated: CASSANDRASC-104 Relocate Sidecar common classes in vertx-client-shaded
This is an automated email from the ASF dual-hosted git repository. frankgh pushed a commit to branch trunk in repository https://gitbox.apache.org/repos/asf/cassandra-sidecar.git The following commit(s) were added to refs/heads/trunk by this push: new b557010 CASSANDRASC-104 Relocate Sidecar common classes in vertx-client-shaded b557010 is described below commit b5570109c19acaf91281fd7901041c0c2b1f3b6c Author: Francisco Guerrero AuthorDate: Mon Feb 12 21:13:23 2024 -0800 CASSANDRASC-104 Relocate Sidecar common classes in vertx-client-shaded Patch by Francisco Guerrero; Reviewed by Yifan Cai for CASSANDRASC-104 --- CHANGES.txt | 1 + vertx-client-shaded/build.gradle | 1 + 2 files changed, 2 insertions(+) diff --git a/CHANGES.txt b/CHANGES.txt index c12d3ca..e1ac034 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,5 +1,6 @@ 1.0.0 - + * Relocate Sidecar common classes in vertx-client-shaded (CASSANDRASC-104) * Automated yaml type binding for deserialization (CASSANDRASC-103) * Upgrade Vert.x version in Sidecar to 4.5 (CASSANDRASC-101) * Break restore job into stage and import phases and persist restore slice status on phase completion (CASSANDRASC-99) diff --git a/vertx-client-shaded/build.gradle b/vertx-client-shaded/build.gradle index 189a82a..24519e8 100644 --- a/vertx-client-shaded/build.gradle +++ b/vertx-client-shaded/build.gradle @@ -69,6 +69,7 @@ shadowJar { archiveClassifier.set('') // Our use of Jackson should be an implementation detail - shade everything so no matter what // version of Jackson is available in the classpath we don't break consumers of the client +relocate 'org.apache.cassandra.sidecar.common', 'o.a.c.sidecar.client.shaded.common' relocate 'com.fasterxml.jackson', 'o.a.c.sidecar.client.shaded.com.fasterxml.jackson' relocate 'io.netty', 'o.a.c.sidecar.client.shaded.io.netty' relocate 'io.vertx', 'o.a.c.sidecar.client.shaded.io.vertx' - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: 
commits-h...@cassandra.apache.org
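The effect of the new relocate rule in the commit above can be illustrated as a plain package-prefix rewrite on a fully qualified class name. This is only a sketch: the real relocation is performed by the Shadow plugin on bytecode at build time, not with sed, and the class name used here is a hypothetical example:

```shell
# Show how the shaded jar's class names are rewritten so they can never
# collide with an unshaded org.apache.cassandra.sidecar.common copy on
# a consumer's classpath (e.g. in the Analytics project).
original='org.apache.cassandra.sidecar.common.data.ExampleRequest'  # hypothetical class

echo "$original" \
  | sed 's/^org\.apache\.cassandra\.sidecar\.common/o.a.c.sidecar.client.shaded.common/'
# -> o.a.c.sidecar.client.shaded.common.data.ExampleRequest
```

Because annotations and the classes that carry them are now relocated together, a downstream project sees one consistent shaded namespace instead of a mix of relocated and unrelocated definitions.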
[jira] [Commented] (CASSANDRASC-104) Relocate Sidecar common classes in vertx-client-shaded
[ https://issues.apache.org/jira/browse/CASSANDRASC-104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817451#comment-17817451 ] ASF subversion and git services commented on CASSANDRASC-104: - Commit b5570109c19acaf91281fd7901041c0c2b1f3b6c in cassandra-sidecar's branch refs/heads/trunk from Francisco Guerrero [ https://gitbox.apache.org/repos/asf?p=cassandra-sidecar.git;h=b557010 ] CASSANDRASC-104 Relocate Sidecar common classes in vertx-client-shaded Patch by Francisco Guerrero; Reviewed by Yifan Cai for CASSANDRASC-104 > Relocate Sidecar common classes in vertx-client-shaded > -- > > Key: CASSANDRASC-104 > URL: https://issues.apache.org/jira/browse/CASSANDRASC-104 > Project: Sidecar for Apache Cassandra > Issue Type: Improvement > Components: Configuration >Reporter: Francisco Guerrero >Assignee: Francisco Guerrero >Priority: Normal > Labels: pull-request-available
[jira] [Commented] (CASSANDRA-19390) Transformation.Kind should contain an explicit integer id
[ https://issues.apache.org/jira/browse/CASSANDRA-19390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817450#comment-17817450 ] Marcus Eriksson commented on CASSANDRA-19390: - a bit shaky ci results, a few timeouts etc, don't think any are related, but will rerun, both 19390+19391 in this run > Transformation.Kind should contain an explicit integer id > - > > Key: CASSANDRA-19390 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19390 > Project: Cassandra > Issue Type: Improvement > Components: Transactional Cluster Metadata >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Low > Fix For: 5.x > > Attachments: ci_summary.html, result_details.tar.gz > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19390) Transformation.Kind should contain an explicit integer id
[ https://issues.apache.org/jira/browse/CASSANDRA-19390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson updated CASSANDRA-19390: Attachment: ci_summary.html result_details.tar.gz > Transformation.Kind should contain an explicit integer id > - > > Key: CASSANDRA-19390 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19390 > Project: Cassandra > Issue Type: Improvement > Components: Transactional Cluster Metadata >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Low > Fix For: 5.x > > Attachments: ci_summary.html, result_details.tar.gz > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
(cassandra-website) branch asf-staging updated (367c839bb -> 95d5d1e87)
This is an automated email from the ASF dual-hosted git repository. git-site-role pushed a change to branch asf-staging in repository https://gitbox.apache.org/repos/asf/cassandra-website.git discard 367c839bb generate docs for aa8a03c7 add c4b35db18 Minor release 4.1.4 new 95d5d1e87 generate docs for c4b35db1 This update added new revisions after undoing existing revisions. That is to say, some revisions that were in the old version of the branch are not in the new version. This situation occurs when a user --force pushes a change and generates a repository containing something like this: * -- * -- B -- O -- O -- O (367c839bb) \ N -- N -- N refs/heads/asf-staging (95d5d1e87) You should already have received notification emails for all of the O revisions, and so the following emails describe only the N revisions from the common base, B. Any revisions marked "omit" are not gone; other references still refer to them. Any revisions marked "discard" are gone forever. The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: content/_/download.html| 8 .../managing/configuration/cass_yaml_file.html | 3 ++- .../managing/configuration/cass_yaml_file.html | 3 ++- .../managing/configuration/cass_yaml_file.html | 3 ++- .../managing/configuration/cass_yaml_file.html | 3 ++- content/search-index.js| 2 +- .../source/modules/ROOT/pages/download.adoc| 8 site-ui/build/ui-bundle.zip| Bin 4883646 -> 4883646 bytes 8 files changed, 17 insertions(+), 13 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
Re: [PR] CASSANDRA-19285 Fix flaky Host replacement tests and shrink tests [cassandra-analytics]
yifan-c closed pull request #39: CASSANDRA-19285 Fix flaky Host replacement tests and shrink tests URL: https://github.com/apache/cassandra-analytics/pull/39 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
(cassandra-website) branch trunk updated: Minor release 4.1.4
This is an automated email from the ASF dual-hosted git repository. brandonwilliams pushed a commit to branch trunk in repository https://gitbox.apache.org/repos/asf/cassandra-website.git The following commit(s) were added to refs/heads/trunk by this push: new c4b35db18 Minor release 4.1.4 c4b35db18 is described below commit c4b35db1813a7f6b2e6e7021c42e2dde44e66b3b Author: Brandon Williams AuthorDate: Wed Feb 14 10:29:13 2024 -0600 Minor release 4.1.4 ref: https://lists.apache.org/thread/r42ksoxt4kqfoxcok9r0pjy11w1lmd3l --- site-content/source/modules/ROOT/pages/download.adoc | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/site-content/source/modules/ROOT/pages/download.adoc b/site-content/source/modules/ROOT/pages/download.adoc index b7d35ee2d..17ebab8fb 100644 --- a/site-content/source/modules/ROOT/pages/download.adoc +++ b/site-content/source/modules/ROOT/pages/download.adoc @@ -36,15 +36,15 @@ https://www.apache.org/dyn/closer.lua/cassandra/5.0-beta1/apache-cassandra-5.0-b [discrete] Apache Cassandra 4.1 [discrete] - Latest release on 2023-07-24 + Latest release on 2024-02-14 [discrete] Maintained until 5.2.0 release (~July 2025) [.btn.btn--alt] -https://www.apache.org/dyn/closer.lua/cassandra/4.1.3/apache-cassandra-4.1.3-bin.tar.gz[4.1.3,window=blank] +https://www.apache.org/dyn/closer.lua/cassandra/4.1.4/apache-cassandra-4.1.4-bin.tar.gz[4.1.4,window=blank] -(https://downloads.apache.org/cassandra/4.1.3/apache-cassandra-4.1.3-bin.tar.gz.asc[pgp,window=blank], https://downloads.apache.org/cassandra/4.1.3/apache-cassandra-4.1.3-bin.tar.gz.sha256[sha256,window=blank], https://downloads.apache.org/cassandra/4.1.3/apache-cassandra-4.1.3-bin.tar.gz.sha512[sha512,window=blank]) + -(https://www.apache.org/dyn/closer.lua/cassandra/4.1.3/apache-cassandra-4.1.3-src.tar.gz[source,window=blank]: https://downloads.apache.org/cassandra/4.1.3/apache-cassandra-4.1.3-src.tar.gz.asc[pgp,window=blank], 
https://downloads.apache.org/cassandra/4.1.3/apache-cassandra-4.1.3-src.tar.gz.sha256[sha256,window=blank], https://downloads.apache.org/cassandra/4.1.3/apache-cassandra-4.1.3-src.tar.gz.sha512[sha512,window=blank]) +(https://downloads.apache.org/cassandra/4.1.4/apache-cassandra-4.1.4-bin.tar.gz.asc[pgp,window=blank], https://downloads.apache.org/cassandra/4.1.4/apache-cassandra-4.1.4-bin.tar.gz.sha256[sha256,window=blank], https://downloads.apache.org/cassandra/4.1.4/apache-cassandra-4.1.4-bin.tar.gz.sha512[sha512,window=blank]) + +(https://www.apache.org/dyn/closer.lua/cassandra/4.1.4/apache-cassandra-4.1.4-src.tar.gz[source,window=blank]: https://downloads.apache.org/cassandra/4.1.4/apache-cassandra-4.1.4-src.tar.gz.asc[pgp,window=blank], https://downloads.apache.org/cassandra/4.1.4/apache-cassandra-4.1.4-src.tar.gz.sha256[sha256,window=blank], https://downloads.apache.org/cassandra/4.1.4/apache-cassandra-4.1.4-src.tar.gz.sha512[sha512,window=blank]) -- [openblock, inline50 inline-top] - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
svn commit: r67347 - /release/cassandra/4.1.4/redhat/
Author: brandonwilliams Date: Wed Feb 14 16:27:11 2024 New Revision: 67347 Log: Apache Cassandra 4.1.4 redhat artifacts Removed: release/cassandra/4.1.4/redhat/
svn commit: r67346 - /release/cassandra/4.1.4/debian/
Author: brandonwilliams Date: Wed Feb 14 16:23:48 2024 New Revision: 67346 Log: Apache Cassandra 4.1.4 debian artifacts Removed: release/cassandra/4.1.4/debian/
svn commit: r67344 - /dev/cassandra/4.1.4/ /release/cassandra/4.1.4/
Author: brandonwilliams Date: Wed Feb 14 16:20:38 2024 New Revision: 67344 Log: Apache Cassandra 4.1.4 release Added: release/cassandra/4.1.4/ - copied from r67343, dev/cassandra/4.1.4/ Removed: dev/cassandra/4.1.4/
(cassandra) tag 4.1.4-tentative deleted (was 99d9faeef5)
This is an automated email from the ASF dual-hosted git repository. brandonwilliams pushed a change to tag 4.1.4-tentative in repository https://gitbox.apache.org/repos/asf/cassandra.git *** WARNING: tag 4.1.4-tentative was deleted! *** was 99d9faeef5 Prepare debian changelog for 4.1.4 The revisions that were on this tag are still contained in other references; therefore, this change does not discard any commits from the repository.
(cassandra) annotated tag cassandra-4.1.4 created (now e7c2a5c1cb)
This is an automated email from the ASF dual-hosted git repository. brandonwilliams pushed a change to annotated tag cassandra-4.1.4 in repository https://gitbox.apache.org/repos/asf/cassandra.git at e7c2a5c1cb (tag) tagging 99d9faeef57c9cf5240d11eac9db5b283e45a4f9 (commit) replaces cassandra-4.0.12 by Brandon Williams on Wed Feb 14 10:20:25 2024 -0600 - Log - Apache Cassandra 4.1.4 release --- No new revisions were added by this update.
[jira] [Updated] (CASSANDRA-19398) Test Failure: org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading
[ https://issues.apache.org/jira/browse/CASSANDRA-19398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova updated CASSANDRA-19398: Fix Version/s: 5.0.x > Test Failure: > org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading > -- > > Key: CASSANDRA-19398 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19398 > Project: Cassandra > Issue Type: Bug >Reporter: Ekaterina Dimitrova >Priority: Normal > Fix For: 5.0.x > > > [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2646/workflows/bc2bba74-9e56-4bea-8de7-4ff840c4f450/jobs/56028/tests#failed-test-0] > {code:java} > junit.framework.AssertionFailedError at > org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading(UpgradeSSTablesTest.java:220) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-19398) Test Failure: org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading
Ekaterina Dimitrova created CASSANDRA-19398: --- Summary: Test Failure: org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading Key: CASSANDRA-19398 URL: https://issues.apache.org/jira/browse/CASSANDRA-19398 Project: Cassandra Issue Type: Bug Reporter: Ekaterina Dimitrova [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2646/workflows/bc2bba74-9e56-4bea-8de7-4ff840c4f450/jobs/56028/tests#failed-test-0] {code:java} junit.framework.AssertionFailedError at org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading(UpgradeSSTablesTest.java:220) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-19395) Warn when native_transport_port_ssl is set
[ https://issues.apache.org/jira/browse/CASSANDRA-19395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Miklosovic reassigned CASSANDRA-19395: - Assignee: Stefan Miklosovic > Warn when native_transport_port_ssl is set > -- > > Key: CASSANDRA-19395 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19395 > Project: Cassandra > Issue Type: Bug > Components: Legacy/CQL >Reporter: Brandon Williams >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 4.0.x, 4.1.x, 5.0.x > > > In CASSANDRA-19392 this was deprecated, however Stefan notes that if you set > this it will work in a single node cluster because the peers table isn't > needed to distribute the information. This sounds like a recipe for "this > worked when we tested in development, but not in production" so it would be > good to warn users when this is set to avoid future confusion. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19397) Remove all code around native_transport_port_ssl
[ https://issues.apache.org/jira/browse/CASSANDRA-19397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Miklosovic updated CASSANDRA-19397: -- Change Category: Code Clarity Complexity: Normal Fix Version/s: 5.x Status: Open (was: Triage Needed) > Remove all code around native_transport_port_ssl > > > Key: CASSANDRA-19397 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19397 > Project: Cassandra > Issue Type: Task > Components: Legacy/Core >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 5.x > > > We deprecated native_transport_port_ssl in CASSANDRA-19392 and said we would remove it next. This ticket is about that removal. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (CASSANDRA-19397) Remove all code around native_transport_port_ssl
Stefan Miklosovic created CASSANDRA-19397: - Summary: Remove all code around native_transport_port_ssl Key: CASSANDRA-19397 URL: https://issues.apache.org/jira/browse/CASSANDRA-19397 Project: Cassandra Issue Type: Task Components: Legacy/Core Reporter: Stefan Miklosovic Assignee: Stefan Miklosovic We deprecated native_transport_port_ssl in CASSANDRA-19392 and said we would remove it next. This ticket is about that removal. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (CASSANDRA-18762) Repair triggers OOM with direct buffer memory
[ https://issues.apache.org/jira/browse/CASSANDRA-18762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817421#comment-17817421 ] Brad Schoening edited comment on CASSANDRA-18762 at 2/14/24 3:38 PM: - It seems setting -XX:MaxDirectMemorySize might be useful to prevent this. In [Java 17|https://docs.oracle.com/en/java/javase/17/docs/specs/man/java.html], the JVM picks something based on some opaque heuristic: {quote}By default, the size (MaxDirectMemorySize) is set to 0, meaning that the JVM chooses the size for NIO direct-buffer allocations automatically. {quote} was (Author: bschoeni): It seems setting -XX:MaxDirectMemorySize might be useful to prevent this. > Repair triggers OOM with direct buffer memory > - > > Key: CASSANDRA-18762 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18762 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Brad Schoening >Priority: Normal > Labels: OutOfMemoryError > Attachments: Cluster-dm-metrics-1.PNG, > image-2023-12-06-15-28-05-459.png, image-2023-12-06-15-29-31-491.png, > image-2023-12-06-15-58-55-007.png > > > We are seeing repeated failures of nodes with 16GB of heap on a VM with 32GB > of physical RAM due to direct memory. This seems to be related to > CASSANDRA-15202 which moved Merkel trees off-heap in 4.0. Using Cassandra > 4.0.6 with Java 11. 
> {noformat} > 2023-08-09 04:30:57,470 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e55a3b0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_a from > /169.102.200.241:7000 > 2023-08-09 04:30:57,567 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e0d2900-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from > /169.93.192.29:7000 > 2023-08-09 04:30:57,568 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e1dcad0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_c from > /169.104.171.134:7000 > 2023-08-09 04:30:57,591 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e69a0e0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from > /169.79.232.67:7000 > 2023-08-09 04:30:57,876 [INFO ] [Service Thread] cluster_id=101 > ip_address=169.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 282ms. 
> Compressed Class Space: 8444560 -> 8372152; G1 Eden Space: 7809794048 -> 0; > G1 Old Gen: 1453478400 -> 820942800; G1 Survivor Space: 419430400 -> 0; > Metaspace: 80411136 -> 80176528 > 2023-08-09 04:30:58,387 [ERROR] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 JVMStabilityInspector.java:102 - OutOfMemory error > letting the JVM handle the error: > java.lang.OutOfMemoryError: Direct buffer memory > at java.base/java.nio.Bits.reserveMemory(Bits.java:175) > at java.base/java.nio.DirectByteBuffer.(DirectByteBuffer.java:118) > at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:318) > at org.apache.cassandra.utils.MerkleTree.allocate(MerkleTree.java:742) > at > org.apache.cassandra.utils.MerkleTree.deserializeOffHeap(MerkleTree.java:780) > at org.apache.cassandra.utils.MerkleTree.deserializeTree(MerkleTree.java:751) > at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:720) > at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:698) > at > org.apache.cassandra.utils.MerkleTrees$MerkleTreesSerializer.deserialize(MerkleTrees.java:416) > at > org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:100) > at > org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:84) > at > org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:782) > at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:642) > at > org.apache.cassandra.net.InboundMessageHandler$LargeMessage.deserialize(InboundMessageHandler.java:364) > at > org.apache.cassandra.net.InboundMessageHandler$LargeMessage.access$1100(InboundMessageHandler.java:317) > at > org.apache.cassandra.net.InboundMessageHandler$ProcessLargeMessage.provideMessage(InboundMessageHandler.java:504) > at > org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:429) > at > 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at >
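The stack trace above fails inside MerkleTree.allocate, which calls ByteBuffer.allocateDirect and is therefore capped by -XX:MaxDirectMemorySize (when the flag is unset or 0, the JVM chooses the cap itself, per the quoted Java docs). The following is an illustrative sketch only, not Cassandra's actual code: it mimics an off-heap allocation and one conceivable mitigation, falling back to an on-heap buffer when the direct-memory pool is exhausted.

```java
import java.nio.ByteBuffer;

public class OffHeapAllocation {
    // Hypothetical helper: try the off-heap path first (the path that throws
    // "OutOfMemoryError: Direct buffer memory" in the repair log above), and
    // fall back to a heap buffer if the -XX:MaxDirectMemorySize pool is full.
    public static ByteBuffer allocate(int size) {
        try {
            return ByteBuffer.allocateDirect(size);
        } catch (OutOfMemoryError e) {
            // Direct pool exhausted; a heap buffer pressures the GC instead.
            return ByteBuffer.allocate(size);
        }
    }

    public static void main(String[] args) {
        ByteBuffer buf = allocate(1024);
        // Under normal conditions the direct allocation succeeds.
        System.out.println("direct=" + buf.isDirect() + " capacity=" + buf.capacity());
    }
}
```

Whether a heap fallback is acceptable depends on why the trees were moved off-heap in CASSANDRA-15202 in the first place; sizing -XX:MaxDirectMemorySize explicitly, as discussed in the comment, only changes where the limit sits, not the failure mode.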
[jira] [Comment Edited] (CASSANDRA-19394) Rethink dumping of cluster metadata via CMSOperationsMBean
[ https://issues.apache.org/jira/browse/CASSANDRA-19394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817423#comment-17817423 ] Stefan Miklosovic edited comment on CASSANDRA-19394 at 2/14/24 3:25 PM: What I am afraid of is that we just have assumptions how this is going to be used and if you guys think that "this is escape hatch not meant to be abused", well, good for you, but this is going to be misused / abused. People forget stuff etc ... these dumps will be just rotting there until that node is restarted again. was (Author: smiklosovic): What I am afraid of is that we just have assumptions how this is going to be used and if you guys think that "this is escape hatch not meant to be abused", well, good for you, but this is going to be misused / abused. People forget stuff etc ... these dump will be just rotting there until that node is restarted again. > Rethink dumping of cluster metadata via CMSOperationsMBean > -- > > Key: CASSANDRA-19394 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19394 > Project: Cassandra > Issue Type: Improvement > Components: Tool/nodetool, Transactional Cluster Metadata >Reporter: Stefan Miklosovic >Priority: Normal > > I think there are two problems in the implementation of dumping > ClusterMetadata in CMSOperationsMBean > 1) A dump is saved in a file and dumpClusterMetadata methods will return just > a file name where that dump is. However, nodetool / JMX call to MBean (or any > place this method is invoked from, we would like to offer a command in > nodetool which returns the dump) is meant to be used from anywhere, remotely, > so what happens when we execute nodetool or call these methods on a machine > different from a machine a node runs on? E.g. admins can just have some > jumpbox to a cluster they manage, they do not necessarily have access to > nodes themselves. So they would not be able to read it. 
> 2) It creates temp file which is not deleted so /tmp will be populated with > these dumps until node is turned off which might take a lot of time and can > consume a lot of disk space if dumps are done frequently and they are big. An > adversary might just dump cluster metadata until no disk space is left. > What I propose is that we would return all dump string, not just a filename > where we save it. We can also format the output on the client or we can tell > server what format we want the dump to be returned in. > If there is a concern about size of data to be returned, we might optionally > allow dumps to be returned as compressed by simple zipping on server and > unzipping on client where "zipper" is a standard java.util.zip so it > basically doesn't matter what jvm runs on client and server. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
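The comment above proposes returning the dump string directly and, if size is a concern, compressing it with the standard java.util.zip so that the client and server JVM versions do not matter. A rough sketch of that round trip using GZIP streams (the class and method names are hypothetical, not part of any proposed patch):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class DumpCompression {
    // Server side: compress the metadata dump before returning it over JMX.
    public static byte[] compress(String dump) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(dump.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }

    // Client side (e.g. nodetool): restore the original dump text.
    public static String decompress(byte[] zipped) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(zipped))) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // A repetitive stand-in for a real cluster metadata dump.
        String dump = "{\"epoch\": 42, \"members\": []}".repeat(100);
        byte[] zipped = compress(dump);
        System.out.println("original=" + dump.length() + " compressed=" + zipped.length);
        System.out.println("roundtrip ok: " + decompress(zipped).equals(dump));
    }
}
```

Because GZIP is a fixed format implemented by the JDK itself, this sidesteps both problems the ticket raises: nothing is written to /tmp on the server, and the admin's jumpbox needs no access to the node's filesystem.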
[jira] [Commented] (CASSANDRA-19394) Rethink dumping of cluster metadata via CMSOperationsMBean
[ https://issues.apache.org/jira/browse/CASSANDRA-19394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817423#comment-17817423 ] Stefan Miklosovic commented on CASSANDRA-19394: --- What I am afraid of is that we just have assumptions how this is going to be used and if you guys think that "this is escape hatch not meant to be abused", well, good for you, but this is going to be misused / abused. People forget stuff etc ... these dump will be just rotting there until that node is restarted again. > Rethink dumping of cluster metadata via CMSOperationsMBean > -- > > Key: CASSANDRA-19394 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19394 > Project: Cassandra > Issue Type: Improvement > Components: Tool/nodetool, Transactional Cluster Metadata >Reporter: Stefan Miklosovic >Priority: Normal > > I think there are two problems in the implementation of dumping > ClusterMetadata in CMSOperationsMBean > 1) A dump is saved in a file and dumpClusterMetadata methods will return just > a file name where that dump is. However, nodetool / JMX call to MBean (or any > place this method is invoked from, we would like to offer a command in > nodetool which returns the dump) is meant to be used from anywhere, remotely, > so what happens when we execute nodetool or call these methods on a machine > different from a machine a node runs on? E.g. admins can just have some > jumpbox to a cluster they manage, they do not necessarily have access to > nodes themselves. So they would not be able to read it. > 2) It creates temp file which is not deleted so /tmp will be populated with > these dumps until node is turned off which might take a lot of time and can > consume a lot of disk space if dumps are done frequently and they are big. An > adversary might just dump cluster metadata until no disk space is left. > What I propose is that we would return all dump string, not just a filename > where we save it. 
We can also format the output on the client or we can tell > server what format we want the dump to be returned in. > If there is a concern about size of data to be returned, we might optionally > allow dumps to be returned as compressed by simple zipping on server and > unzipping on client where "zipper" is a standard java.util.zip so it > basically doesn't matter what jvm runs on client and server. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19394) Rethink dumping of cluster metadata via CMSOperationsMBean
[ https://issues.apache.org/jira/browse/CASSANDRA-19394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817422#comment-17817422 ] Sam Tunnicliffe commented on CASSANDRA-19394: - A simple visual representation of the current metadata, without any ability or expectation to be able to parse, roundtrip, or pipe it into tooling would be very useful right now for debugging and development, but that's a totally different use case than what the current \{{CMSOperations::dumpClusterMetadata}} is for. > Rethink dumping of cluster metadata via CMSOperationsMBean > -- > > Key: CASSANDRA-19394 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19394 > Project: Cassandra > Issue Type: Improvement > Components: Tool/nodetool, Transactional Cluster Metadata >Reporter: Stefan Miklosovic >Priority: Normal > > I think there are two problems in the implementation of dumping > ClusterMetadata in CMSOperationsMBean > 1) A dump is saved in a file and dumpClusterMetadata methods will return just > a file name where that dump is. However, nodetool / JMX call to MBean (or any > place this method is invoked from, we would like to offer a command in > nodetool which returns the dump) is meant to be used from anywhere, remotely, > so what happens when we execute nodetool or call these methods on a machine > different from a machine a node runs on? E.g. admins can just have some > jumpbox to a cluster they manage, they do not necessarily have access to > nodes themselves. So they would not be able to read it. > 2) It creates temp file which is not deleted so /tmp will be populated with > these dumps until node is turned off which might take a lot of time and can > consume a lot of disk space if dumps are done frequently and they are big. An > adversary might just dump cluster metadata until no disk space is left. > What I propose is that we would return all dump string, not just a filename > where we save it. 
We can also format the output on the client or we can tell > server what format we want the dump to be returned in. > If there is a concern about size of data to be returned, we might optionally > allow dumps to be returned as compressed by simple zipping on server and > unzipping on client where "zipper" is a standard java.util.zip so it > basically doesn't matter what jvm runs on client and server. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-19393) nodetool: group CMS-related commands into one command
[ https://issues.apache.org/jira/browse/CASSANDRA-19393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe reassigned CASSANDRA-19393: --- Assignee: Sam Tunnicliffe (was: n.v.harikrishna) > nodetool: group CMS-related commands into one command > - > > Key: CASSANDRA-19393 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19393 > Project: Cassandra > Issue Type: Improvement > Components: Tool/nodetool, Transactional Cluster Metadata >Reporter: n.v.harikrishna >Assignee: Sam Tunnicliffe >Priority: Normal > Fix For: 5.x > > Time Spent: 10m > Remaining Estimate: 0h > > The purpose of this ticket is to group all CMS-related commands under one > "nodetool cms" command where existing command would be subcommands of it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-19393) nodetool: group CMS-related commands into one command
[ https://issues.apache.org/jira/browse/CASSANDRA-19393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe reassigned CASSANDRA-19393: --- Assignee: n.v.harikrishna (was: Sam Tunnicliffe) > nodetool: group CMS-related commands into one command > - > > Key: CASSANDRA-19393 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19393 > Project: Cassandra > Issue Type: Improvement > Components: Tool/nodetool, Transactional Cluster Metadata >Reporter: n.v.harikrishna >Assignee: n.v.harikrishna >Priority: Normal > Fix For: 5.x > > Time Spent: 10m > Remaining Estimate: 0h > > The purpose of this ticket is to group all CMS-related commands under one > "nodetool cms" command where existing command would be subcommands of it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19392) deprecate dual ports support (native_transport_port_ssl)
[ https://issues.apache.org/jira/browse/CASSANDRA-19392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Miklosovic updated CASSANDRA-19392: -- Status: Review In Progress (was: Needs Committer) > deprecate dual ports support (native_transport_port_ssl) > - > > Key: CASSANDRA-19392 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19392 > Project: Cassandra > Issue Type: Task > Components: Legacy/Core >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 5.0-beta > > Time Spent: 1h 40m > Remaining Estimate: 0h > > We decided (1) to deprecate dual ports support in 5.0 (and eventually remove > it in trunk). This ticket will track the work towards the deprecation for 5.0. > (1) https://lists.apache.org/thread/dow196gspwgp2og576zh3lotvt6mc3lv -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19392) deprecate dual ports support (native_transport_port_ssl)
[ https://issues.apache.org/jira/browse/CASSANDRA-19392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Miklosovic updated CASSANDRA-19392: -- Fix Version/s: 5.0-beta2 5.1 (was: 5.0-beta) Source Control Link: https://github.com/apache/cassandra/commit/8b037a6c846402296a2984eb1fbbdd441bdece19 Resolution: Fixed Status: Resolved (was: Ready to Commit) > deprecate dual ports support (native_transport_port_ssl) > - > > Key: CASSANDRA-19392 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19392 > Project: Cassandra > Issue Type: Task > Components: Legacy/Core >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 5.0-beta2, 5.1 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > We decided (1) to deprecate dual ports support in 5.0 (and eventually remove > it in trunk). This ticket will track the work towards the deprecation for 5.0. > (1) https://lists.apache.org/thread/dow196gspwgp2og576zh3lotvt6mc3lv -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-18762) Repair triggers OOM with direct buffer memory
[ https://issues.apache.org/jira/browse/CASSANDRA-18762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817421#comment-17817421 ] Brad Schoening commented on CASSANDRA-18762: It seems setting -XX:MaxDirectMemorySize might be useful to prevent this. > Repair triggers OOM with direct buffer memory > - > > Key: CASSANDRA-18762 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18762 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Brad Schoening >Priority: Normal > Labels: OutOfMemoryError > Attachments: Cluster-dm-metrics-1.PNG, > image-2023-12-06-15-28-05-459.png, image-2023-12-06-15-29-31-491.png, > image-2023-12-06-15-58-55-007.png > > > We are seeing repeated failures of nodes with 16GB of heap on a VM with 32GB > of physical RAM due to direct memory. This seems to be related to > CASSANDRA-15202 which moved Merkel trees off-heap in 4.0. Using Cassandra > 4.0.6 with Java 11. > {noformat} > 2023-08-09 04:30:57,470 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e55a3b0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_a from > /169.102.200.241:7000 > 2023-08-09 04:30:57,567 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e0d2900-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from > /169.93.192.29:7000 > 2023-08-09 04:30:57,568 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e1dcad0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_c from > /169.104.171.134:7000 > 2023-08-09 04:30:57,591 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e69a0e0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from > /169.79.232.67:7000 > 2023-08-09 04:30:57,876 [INFO ] [Service Thread] cluster_id=101 > ip_address=169.0.0.1 GCInspector.java:294 - G1 Old 
Generation GC in 282ms. > Compressed Class Space: 8444560 -> 8372152; G1 Eden Space: 7809794048 -> 0; > G1 Old Gen: 1453478400 -> 820942800; G1 Survivor Space: 419430400 -> 0; > Metaspace: 80411136 -> 80176528 > 2023-08-09 04:30:58,387 [ERROR] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 JVMStabilityInspector.java:102 - OutOfMemory error > letting the JVM handle the error: > java.lang.OutOfMemoryError: Direct buffer memory > at java.base/java.nio.Bits.reserveMemory(Bits.java:175) > at java.base/java.nio.DirectByteBuffer.(DirectByteBuffer.java:118) > at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:318) > at org.apache.cassandra.utils.MerkleTree.allocate(MerkleTree.java:742) > at > org.apache.cassandra.utils.MerkleTree.deserializeOffHeap(MerkleTree.java:780) > at org.apache.cassandra.utils.MerkleTree.deserializeTree(MerkleTree.java:751) > at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:720) > at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:698) > at > org.apache.cassandra.utils.MerkleTrees$MerkleTreesSerializer.deserialize(MerkleTrees.java:416) > at > org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:100) > at > org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:84) > at > org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:782) > at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:642) > at > org.apache.cassandra.net.InboundMessageHandler$LargeMessage.deserialize(InboundMessageHandler.java:364) > at > org.apache.cassandra.net.InboundMessageHandler$LargeMessage.access$1100(InboundMessageHandler.java:317) > at > org.apache.cassandra.net.InboundMessageHandler$ProcessLargeMessage.provideMessage(InboundMessageHandler.java:504) > at > org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:429) > at > 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:834)no* further _formatting_ is > done here{noformat} > > -XX:+AlwaysPreTouch > -XX:+CrashOnOutOfMemoryError > -XX:+ExitOnOutOfMemoryError > -XX:+HeapDumpOnOutOfMemoryError > -XX:+ParallelRefProcEnabled > -XX:+PerfDisableSharedMem > -XX:+ResizeTLAB > -XX:+UseG1GC > -XX:+UseNUMA > -XX:+UseTLAB >
[jira] [Updated] (CASSANDRA-19392) deprecate dual ports support (native_transport_port_ssl)
[ https://issues.apache.org/jira/browse/CASSANDRA-19392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Miklosovic updated CASSANDRA-19392: -- Status: Ready to Commit (was: Review In Progress) > deprecate dual ports support (native_transport_port_ssl) > - > > Key: CASSANDRA-19392 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19392 > Project: Cassandra > Issue Type: Task > Components: Legacy/Core >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 5.0-beta > > Time Spent: 1h 40m > Remaining Estimate: 0h > > We decided (1) to deprecate dual ports support in 5.0 (and eventually remove > it in trunk). This ticket will track the work towards the deprecation for 5.0. > (1) https://lists.apache.org/thread/dow196gspwgp2og576zh3lotvt6mc3lv -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
(cassandra) branch cassandra-5.0 updated (69f735d61f -> 8b037a6c84)
This is an automated email from the ASF dual-hosted git repository. smiklosovic pushed a change to branch cassandra-5.0 in repository https://gitbox.apache.org/repos/asf/cassandra.git from 69f735d61f Update packaging shell includes for j17 add 8b037a6c84 Deprecate native_transport_port_ssl No new revisions were added by this update. Summary of changes: CHANGES.txt | 1 + NEWS.txt | 4 conf/cassandra.yaml | 1 + src/java/org/apache/cassandra/config/Config.java | 2 ++ .../org/apache/cassandra/config/DatabaseDescriptor.java | 16 5 files changed, 20 insertions(+), 4 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
(cassandra) 01/01: Merge branch 'cassandra-5.0' into trunk
This is an automated email from the ASF dual-hosted git repository. smiklosovic pushed a commit to branch trunk in repository https://gitbox.apache.org/repos/asf/cassandra.git commit 8bdf2615bcca6eacc7fd9debc7a68a917048df83 Merge: 3acec3c28e 8b037a6c84 Author: Stefan Miklosovic AuthorDate: Wed Feb 14 15:53:38 2024 +0100 Merge branch 'cassandra-5.0' into trunk CHANGES.txt | 1 + NEWS.txt | 6 ++ conf/cassandra.yaml | 1 + src/java/org/apache/cassandra/config/Config.java | 2 ++ .../org/apache/cassandra/config/DatabaseDescriptor.java | 16 5 files changed, 22 insertions(+), 4 deletions(-) diff --cc CHANGES.txt index d73539808e,30413804a5..d470d8f813 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,22 -1,5 +1,23 @@@ -5.0-beta2 +5.1 + * Make nodetool reconfigurecms sync by default and add --cancel to be able to cancel ongoing reconfigurations (CASSANDRA-19216) + * Expose auth mode in system_views.clients, nodetool clientstats, metrics (CASSANDRA-19366) + * Remove sealed_periods and last_sealed_period tables (CASSANDRA-19189) + * Improve setup and initialisation of LocalLog/LogSpec (CASSANDRA-19271) + * Refactor structure of caching metrics and expose auth cache metrics via JMX (CASSANDRA-17062) + * Allow CQL client certificate authentication to work without sending an AUTHENTICATE request (CASSANDRA-18857) + * Extend nodetool tpstats and system_views.thread_pools with detailed pool parameters (CASSANDRA-19289) + * Remove dependency on Sigar in favor of OSHI (CASSANDRA-16565) + * Simplify the bind marker and Term logic (CASSANDRA-18813) + * Limit cassandra startup to supported JDKs, allow higher JDKs by setting CASSANDRA_JDK_UNSUPPORTED (CASSANDRA-18688) + * Standardize nodetool tablestats formatting of data units (CASSANDRA-19104) + * Make nodetool tablestats use number of significant digits for time and average values consistently (CASSANDRA-19015) + * Upgrade jackson to 2.15.3 and snakeyaml to 2.1 (CASSANDRA-18875) + * Transactional Cluster Metadata [CEP-21] (CASSANDRA-18330) + 
* Add ELAPSED command to cqlsh (CASSANDRA-18861) + * Add the ability to disable bulk loading of SSTables (CASSANDRA-18781) + * Clean up obsolete functions and simplify cql_version handling in cqlsh (CASSANDRA-18787) +Merged from 5.0: + * Deprecate native_transport_port_ssl (CASSANDRA-19392) * Update packaging shell includes (CASSANDRA-19283) * Fix data corruption in VectorCodec when using heap buffers (CASSANDRA-19167) * Avoid over-skipping of key iterators from static column indexes during mixed intersections (CASSANDRA-19278) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
(cassandra) branch trunk updated (3acec3c28e -> 8bdf2615bc)
This is an automated email from the ASF dual-hosted git repository. smiklosovic pushed a change to branch trunk in repository https://gitbox.apache.org/repos/asf/cassandra.git from 3acec3c28e Make nodetool reconfigurecms sync by default and add --cancel to be able to cancel ongoing reconfigurations add 8b037a6c84 Deprecate native_transport_port_ssl new 8bdf2615bc Merge branch 'cassandra-5.0' into trunk The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: CHANGES.txt | 1 + NEWS.txt | 6 ++ conf/cassandra.yaml | 1 + src/java/org/apache/cassandra/config/Config.java | 2 ++ .../org/apache/cassandra/config/DatabaseDescriptor.java | 16 5 files changed, 22 insertions(+), 4 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-18762) Repair triggers OOM with direct buffer memory
[ https://issues.apache.org/jira/browse/CASSANDRA-18762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brad Schoening updated CASSANDRA-18762: --- Description: We are seeing repeated failures of nodes with 16GB of heap on a VM with 32GB of physical RAM due to direct memory with Java 11. This seems to be related to CASSANDRA-15202 which moved Merkle trees off-heap in 4.0. Using Cassandra 4.0.6. {noformat} 2023-08-09 04:30:57,470 [INFO ] [AntiEntropyStage:1] cluster_id=101 ip_address=169.0.0.1 RepairSession.java:202 - [repair #5e55a3b0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_a from /169.102.200.241:7000 2023-08-09 04:30:57,567 [INFO ] [AntiEntropyStage:1] cluster_id=101 ip_address=169.0.0.1 RepairSession.java:202 - [repair #5e0d2900-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from /169.93.192.29:7000 2023-08-09 04:30:57,568 [INFO ] [AntiEntropyStage:1] cluster_id=101 ip_address=169.0.0.1 RepairSession.java:202 - [repair #5e1dcad0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_c from /169.104.171.134:7000 2023-08-09 04:30:57,591 [INFO ] [AntiEntropyStage:1] cluster_id=101 ip_address=169.0.0.1 RepairSession.java:202 - [repair #5e69a0e0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from /169.79.232.67:7000 2023-08-09 04:30:57,876 [INFO ] [Service Thread] cluster_id=101 ip_address=169.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 282ms. 
Compressed Class Space: 8444560 -> 8372152; G1 Eden Space: 7809794048 -> 0; G1 Old Gen: 1453478400 -> 820942800; G1 Survivor Space: 419430400 -> 0; Metaspace: 80411136 -> 80176528 2023-08-09 04:30:58,387 [ERROR] [AntiEntropyStage:1] cluster_id=101 ip_address=169.0.0.1 JVMStabilityInspector.java:102 - OutOfMemory error letting the JVM handle the error: java.lang.OutOfMemoryError: Direct buffer memory at java.base/java.nio.Bits.reserveMemory(Bits.java:175) at java.base/java.nio.DirectByteBuffer.(DirectByteBuffer.java:118) at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:318) at org.apache.cassandra.utils.MerkleTree.allocate(MerkleTree.java:742) at org.apache.cassandra.utils.MerkleTree.deserializeOffHeap(MerkleTree.java:780) at org.apache.cassandra.utils.MerkleTree.deserializeTree(MerkleTree.java:751) at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:720) at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:698) at org.apache.cassandra.utils.MerkleTrees$MerkleTreesSerializer.deserialize(MerkleTrees.java:416) at org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:100) at org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:84) at org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:782) at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:642) at org.apache.cassandra.net.InboundMessageHandler$LargeMessage.deserialize(InboundMessageHandler.java:364) at org.apache.cassandra.net.InboundMessageHandler$LargeMessage.access$1100(InboundMessageHandler.java:317) at org.apache.cassandra.net.InboundMessageHandler$ProcessLargeMessage.provideMessage(InboundMessageHandler.java:504) at org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:429) at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) at 
java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.base/java.lang.Thread.run(Thread.java:834)no* further _formatting_ is done here{noformat} -XX:+AlwaysPreTouch -XX:+CrashOnOutOfMemoryError -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:+ParallelRefProcEnabled -XX:+PerfDisableSharedMem -XX:+ResizeTLAB -XX:+UseG1GC -XX:+UseNUMA -XX:+UseTLAB -XX:+UseThreadPriorities -XX:-UseBiasedLocking -XX:CompileCommandFile=/opt/nosql/clusters/cassandra-101/conf/hotspot_compiler -XX:G1RSetUpdatingPauseTimePercent=5 -XX:G1ReservePercent=20 -XX:HeapDumpPath=/opt/nosql/data/cluster_101/cassandra-1691623098-pid2804737.hprof -XX:InitiatingHeapOccupancyPercent=70 -XX:MaxGCPauseMillis=200 -XX:StringTableSize=60013 -Xlog:gc*:file=/opt/nosql/clusters/cassandra-101/logs/gc.log:time,uptime:filecount=10,filesize=10485760 -Xms16G -Xmx16G -Xss256k From our Prometheus metrics, the behavior shows the direct buffer memory ramping up until it reaches the max and then causes an OOM. It would appear that direct memory is never being released by the JVM until it's exhausted. !Cluster-dm-metrics.PNG! An Eclipse Memory Analyzer Class Histogram: ||Class Name||Objects||Shallow Heap||Retained Heap||
[jira] [Updated] (CASSANDRA-18762) Repair triggers OOM with direct buffer memory
[ https://issues.apache.org/jira/browse/CASSANDRA-18762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brad Schoening updated CASSANDRA-18762: --- Description: We are seeing repeated failures of nodes with 16GB of heap on a VM with 32GB of physical RAM due to direct memory. This seems to be related to CASSANDRA-15202 which moved Merkle trees off-heap in 4.0. Using Cassandra 4.0.6 with Java 11. {noformat} 2023-08-09 04:30:57,470 [INFO ] [AntiEntropyStage:1] cluster_id=101 ip_address=169.0.0.1 RepairSession.java:202 - [repair #5e55a3b0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_a from /169.102.200.241:7000 2023-08-09 04:30:57,567 [INFO ] [AntiEntropyStage:1] cluster_id=101 ip_address=169.0.0.1 RepairSession.java:202 - [repair #5e0d2900-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from /169.93.192.29:7000 2023-08-09 04:30:57,568 [INFO ] [AntiEntropyStage:1] cluster_id=101 ip_address=169.0.0.1 RepairSession.java:202 - [repair #5e1dcad0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_c from /169.104.171.134:7000 2023-08-09 04:30:57,591 [INFO ] [AntiEntropyStage:1] cluster_id=101 ip_address=169.0.0.1 RepairSession.java:202 - [repair #5e69a0e0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from /169.79.232.67:7000 2023-08-09 04:30:57,876 [INFO ] [Service Thread] cluster_id=101 ip_address=169.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 282ms. 
Compressed Class Space: 8444560 -> 8372152; G1 Eden Space: 7809794048 -> 0; G1 Old Gen: 1453478400 -> 820942800; G1 Survivor Space: 419430400 -> 0; Metaspace: 80411136 -> 80176528 2023-08-09 04:30:58,387 [ERROR] [AntiEntropyStage:1] cluster_id=101 ip_address=169.0.0.1 JVMStabilityInspector.java:102 - OutOfMemory error letting the JVM handle the error: java.lang.OutOfMemoryError: Direct buffer memory at java.base/java.nio.Bits.reserveMemory(Bits.java:175) at java.base/java.nio.DirectByteBuffer.(DirectByteBuffer.java:118) at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:318) at org.apache.cassandra.utils.MerkleTree.allocate(MerkleTree.java:742) at org.apache.cassandra.utils.MerkleTree.deserializeOffHeap(MerkleTree.java:780) at org.apache.cassandra.utils.MerkleTree.deserializeTree(MerkleTree.java:751) at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:720) at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:698) at org.apache.cassandra.utils.MerkleTrees$MerkleTreesSerializer.deserialize(MerkleTrees.java:416) at org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:100) at org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:84) at org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:782) at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:642) at org.apache.cassandra.net.InboundMessageHandler$LargeMessage.deserialize(InboundMessageHandler.java:364) at org.apache.cassandra.net.InboundMessageHandler$LargeMessage.access$1100(InboundMessageHandler.java:317) at org.apache.cassandra.net.InboundMessageHandler$ProcessLargeMessage.provideMessage(InboundMessageHandler.java:504) at org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:429) at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) at 
java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.base/java.lang.Thread.run(Thread.java:834)no* further _formatting_ is done here{noformat} -XX:+AlwaysPreTouch -XX:+CrashOnOutOfMemoryError -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:+ParallelRefProcEnabled -XX:+PerfDisableSharedMem -XX:+ResizeTLAB -XX:+UseG1GC -XX:+UseNUMA -XX:+UseTLAB -XX:+UseThreadPriorities -XX:-UseBiasedLocking -XX:CompileCommandFile=/opt/nosql/clusters/cassandra-101/conf/hotspot_compiler -XX:G1RSetUpdatingPauseTimePercent=5 -XX:G1ReservePercent=20 -XX:HeapDumpPath=/opt/nosql/data/cluster_101/cassandra-1691623098-pid2804737.hprof -XX:InitiatingHeapOccupancyPercent=70 -XX:MaxGCPauseMillis=200 -XX:StringTableSize=60013 -Xlog:gc*:file=/opt/nosql/clusters/cassandra-101/logs/gc.log:time,uptime:filecount=10,filesize=10485760 -Xms16G -Xmx16G -Xss256k From our Prometheus metrics, the behavior shows the direct buffer memory ramping up until it reaches the max and then causes an OOM. It would appear that direct memory is never being released by the JVM until it's exhausted. !Cluster-dm-metrics.PNG! An Eclipse Memory Analyzer Class Histogram: ||Class Name||Objects||Shallow Heap||Retained Heap||
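The stack trace above fails inside ByteBuffer.allocateDirect while deserializing off-heap Merkle trees, and the Prometheus graphs suggest the direct pool grows until it hits its cap. When chasing this kind of issue it can help to sample the JVM's built-in direct buffer pool counters alongside the heap metrics. A minimal sketch using the standard java.lang.management API (the class name DirectMemoryProbe is illustrative, not part of Cassandra):

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

public class DirectMemoryProbe {
    public static void main(String[] args) {
        // The JVM exposes "direct" and "mapped" buffer pools over JMX;
        // the "direct" pool is the one capped by -XX:MaxDirectMemorySize
        // (which defaults to the max heap size when unset).
        List<BufferPoolMXBean> pools =
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            System.out.printf("%s: count=%d used=%d bytes capacity=%d bytes%n",
                    pool.getName(), pool.getCount(),
                    pool.getMemoryUsed(), pool.getTotalCapacity());
        }
    }
}
```

Polling these counters from the same exporter that feeds Prometheus would show whether the "direct" pool keeps ratcheting upward across repair sessions, which is what the description above suggests.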
[jira] [Commented] (CASSANDRA-19394) Rethink dumping of cluster metadata via CMSOperationsMBean
[ https://issues.apache.org/jira/browse/CASSANDRA-19394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817419#comment-17817419 ] Stefan Miklosovic commented on CASSANDRA-19394: --- Oh yeah, that is wrong too :D > Rethink dumping of cluster metadata via CMSOperationsMBean > -- > > Key: CASSANDRA-19394 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19394 > Project: Cassandra > Issue Type: Improvement > Components: Tool/nodetool, Transactional Cluster Metadata >Reporter: Stefan Miklosovic >Priority: Normal > > I think there are two problems in the implementation of dumping > ClusterMetadata in CMSOperationsMBean > 1) A dump is saved in a file and dumpClusterMetadata methods will return just > a file name where that dump is. However, nodetool / JMX call to MBean (or any > place this method is invoked from, we would like to offer a command in > nodetool which returns the dump) is meant to be used from anywhere, remotely, > so what happens when we execute nodetool or call these methods on a machine > different from a machine a node runs on? E.g. admins can just have some > jumpbox to a cluster they manage, they do not necessarily have access to > nodes themselves. So they would not be able to read it. > 2) It creates temp file which is not deleted so /tmp will be populated with > these dumps until node is turned off which might take a lot of time and can > consume a lot of disk space if dumps are done frequently and they are big. An > adversary might just dump cluster metadata until no disk space is left. > What I propose is that we would return all dump string, not just a filename > where we save it. We can also format the output on the client or we can tell > server what format we want the dump to be returned in. 
> If there is a concern about size of data to be returned, we might optionally > allow dumps to be returned as compressed by simple zipping on server and > unzipping on client where "zipper" is a standard java.util.zip so it > basically doesn't matter what jvm runs on client and server. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
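The compression idea proposed above relies only on java.util.zip, so the client and server JVMs do not need to match. A rough sketch of the server-side compress / client-side decompress round trip (class and method names are hypothetical, not an existing Cassandra API):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class DumpZip {
    // Server side: gzip the dump string before returning it over JMX.
    static byte[] compress(String dump) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(dump.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }

    // Client side (e.g. nodetool): unzip with the same standard library.
    static String decompress(byte[] data) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = gz.read(buf)) != -1)
                bos.write(buf, 0, n);
            return new String(bos.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        String dump = "{\"epoch\": 42, \"members\": []}"; // placeholder payload
        byte[] zipped = compress(dump);
        System.out.println("compressed " + dump.length() + " chars to " + zipped.length + " bytes");
        System.out.println("round trip ok: " + decompress(zipped).equals(dump));
    }
}
```

Because GZIP framing is part of the JDK on both ends, this avoids any version coupling between the node and the tool reading the dump, which is the portability point made in the proposal.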
[jira] [Commented] (CASSANDRA-19394) Rethink dumping of cluster metadata via CMSOperationsMBean
[ https://issues.apache.org/jira/browse/CASSANDRA-19394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817415#comment-17817415 ] Stefan Miklosovic commented on CASSANDRA-19394: --- [~samt] what's wrong with nodetool cms dump > /tmp/dump.txt executed locally? The benefit is that we can also inspect and display it remotely for diagnostic purposes etc. > Rethink dumping of cluster metadata via CMSOperationsMBean > -- > > Key: CASSANDRA-19394 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19394 > Project: Cassandra > Issue Type: Improvement > Components: Tool/nodetool, Transactional Cluster Metadata >Reporter: Stefan Miklosovic >Priority: Normal > > I think there are two problems in the implementation of dumping > ClusterMetadata in CMSOperationsMBean > 1) A dump is saved in a file and dumpClusterMetadata methods will return just > a file name where that dump is. However, nodetool / JMX call to MBean (or any > place this method is invoked from, we would like to offer a command in > nodetool which returns the dump) is meant to be used from anywhere, remotely, > so what happens when we execute nodetool or call these methods on a machine > different from a machine a node runs on? E.g. admins can just have some > jumpbox to a cluster they manage, they do not necessarily have access to > nodes themselves. So they would not be able to read it. > 2) It creates temp file which is not deleted so /tmp will be populated with > these dumps until node is turned off which might take a lot of time and can > consume a lot of disk space if dumps are done frequently and they are big. An > adversary might just dump cluster metadata until no disk space is left. > What I propose is that we would return all dump string, not just a filename > where we save it. We can also format the output on the client or we can tell > server what format we want the dump to be returned in. 
> If there is a concern about size of data to be returned, we might optionally > allow dumps to be returned as compressed by simple zipping on server and > unzipping on client where "zipper" is a standard java.util.zip so it > basically doesn't matter what jvm runs on client and server. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19394) Rethink dumping of cluster metadata via CMSOperationsMBean
[ https://issues.apache.org/jira/browse/CASSANDRA-19394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817416#comment-17817416 ] Brandon Williams commented on CASSANDRA-19394: -- I'll just note there is precedent (for better or worse) for JMX dumping to local files: https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/gms/FailureDetector.java#L272 > Rethink dumping of cluster metadata via CMSOperationsMBean > -- > > Key: CASSANDRA-19394 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19394 > Project: Cassandra > Issue Type: Improvement > Components: Tool/nodetool, Transactional Cluster Metadata >Reporter: Stefan Miklosovic >Priority: Normal > > I think there are two problems in the implementation of dumping > ClusterMetadata in CMSOperationsMBean > 1) A dump is saved in a file and dumpClusterMetadata methods will return just > a file name where that dump is. However, nodetool / JMX call to MBean (or any > place this method is invoked from, we would like to offer a command in > nodetool which returns the dump) is meant to be used from anywhere, remotely, > so what happens when we execute nodetool or call these methods on a machine > different from a machine a node runs on? E.g. admins can just have some > jumpbox to a cluster they manage, they do not necessarily have access to > nodes themselves. So they would not be able to read it. > 2) It creates temp file which is not deleted so /tmp will be populated with > these dumps until node is turned off which might take a lot of time and can > consume a lot of disk space if dumps are done frequently and they are big. An > adversary might just dump cluster metadata until no disk space is left. > What I propose is that we would return all dump string, not just a filename > where we save it. We can also format the output on the client or we can tell > server what format we want the dump to be returned in. 
> If there is a concern about size of data to be returned, we might optionally > allow dumps to be returned as compressed by simple zipping on server and > unzipping on client where "zipper" is a standard java.util.zip so it > basically doesn't matter what jvm runs on client and server. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19394) Rethink dumping of cluster metadata via CMSOperationsMBean
[ https://issues.apache.org/jira/browse/CASSANDRA-19394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817414#comment-17817414 ] Sam Tunnicliffe commented on CASSANDRA-19394: - Part of the benefit of dumping only to a binary format is precisely that it is opaque and has a very limited set of uses. For now these include reloading a binary dump into a new or existing cluster (e.g. for DR, debugging or cloning purposes), or writing low level custom code to explore and modify the metadata. Like Marcus said, this is really intended as an escape hatch for when (if) things go catastrophically wrong and I agree with him that we should not change this yet. {quote}consume a lot of disk space if dumps are done frequently and they are big. {quote} Dump files are currently pretty tiny, even for clusters with many members and large schema. {quote}An adversary might just dump cluster metadata until no disk space is left. {quote} Nodetool / JMX should be properly secured to prevent this. An adversary could simply run {{nodetool assassinate}} if they had access. > Rethink dumping of cluster metadata via CMSOperationsMBean > -- > > Key: CASSANDRA-19394 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19394 > Project: Cassandra > Issue Type: Improvement > Components: Tool/nodetool, Transactional Cluster Metadata >Reporter: Stefan Miklosovic >Priority: Normal > > I think there are two problems in the implementation of dumping > ClusterMetadata in CMSOperationsMBean > 1) A dump is saved in a file and dumpClusterMetadata methods will return just > a file name where that dump is. However, nodetool / JMX call to MBean (or any > place this method is invoked from, we would like to offer a command in > nodetool which returns the dump) is meant to be used from anywhere, remotely, > so what happens when we execute nodetool or call these methods on a machine > different from a machine a node runs on? E.g. 
admins can just have some > jumpbox to a cluster they manage, they do not necessarily have access to > nodes themselves. So they would not be able to read it. > 2) It creates temp file which is not deleted so /tmp will be populated with > these dumps until node is turned off which might take a lot of time and can > consume a lot of disk space if dumps are done frequently and they are big. An > adversary might just dump cluster metadata until no disk space is left. > What I propose is that we would return all dump string, not just a filename > where we save it. We can also format the output on the client or we can tell > server what format we want the dump to be returned in. > If there is a concern about size of data to be returned, we might optionally > allow dumps to be returned as compressed by simple zipping on server and > unzipping on client where "zipper" is a standard java.util.zip so it > basically doesn't matter what jvm runs on client and server. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19394) Rethink dumping of cluster metadata via CMSOperationsMBean
[ https://issues.apache.org/jira/browse/CASSANDRA-19394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817413#comment-17817413 ] Stefan Miklosovic commented on CASSANDRA-19394: --- Also, if it is meant to be run locally, an operator can just do nodetool cms dump > /tmp/dump.txt on the very same machine a node runs at? I just dont see why it has to be persisted into /tmp by Cassandra. > Rethink dumping of cluster metadata via CMSOperationsMBean > -- > > Key: CASSANDRA-19394 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19394 > Project: Cassandra > Issue Type: Improvement > Components: Tool/nodetool, Transactional Cluster Metadata >Reporter: Stefan Miklosovic >Priority: Normal > > I think there are two problems in the implementation of dumping > ClusterMetadata in CMSOperationsMBean > 1) A dump is saved in a file and dumpClusterMetadata methods will return just > a file name where that dump is. However, nodetool / JMX call to MBean (or any > place this method is invoked from, we would like to offer a command in > nodetool which returns the dump) is meant to be used from anywhere, remotely, > so what happens when we execute nodetool or call these methods on a machine > different from a machine a node runs on? E.g. admins can just have some > jumpbox to a cluster they manage, they do not necessarily have access to > nodes themselves. So they would not be able to read it. > 2) It creates temp file which is not deleted so /tmp will be populated with > these dumps until node is turned off which might take a lot of time and can > consume a lot of disk space if dumps are done frequently and they are big. An > adversary might just dump cluster metadata until no disk space is left. > What I propose is that we would return all dump string, not just a filename > where we save it. We can also format the output on the client or we can tell > server what format we want the dump to be returned in. 
> If there is a concern about size of data to be returned, we might optionally > allow dumps to be returned as compressed by simple zipping on server and > unzipping on client where "zipper" is a standard java.util.zip so it > basically doesn't matter what jvm runs on client and server. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19394) Rethink dumping of cluster metadata via CMSOperationsMBean
[ https://issues.apache.org/jira/browse/CASSANDRA-19394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817411#comment-17817411 ] Stefan Miklosovic commented on CASSANDRA-19394: --- The existence of nodetool cms dump command is not necessary then? I do not like that we would make exceptions like "well but here nodetool just has to run on the same machine where your node runs". Could we at least provide some way how these files are cleaned up? Like they would be removed after 1 hour? Given how busy operators are with other stuff, they will most probably just forget to remove it. Sure, you say that "well but node will be restarted so it will be removed" but in real life they might just inspect the file and never restart. We are clearly making assumptions around how this is going to be used and I think that it is safer to do it as bullet-proof as possible. > Rethink dumping of cluster metadata via CMSOperationsMBean > -- > > Key: CASSANDRA-19394 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19394 > Project: Cassandra > Issue Type: Improvement > Components: Tool/nodetool, Transactional Cluster Metadata >Reporter: Stefan Miklosovic >Priority: Normal > > I think there are two problems in the implementation of dumping > ClusterMetadata in CMSOperationsMBean > 1) A dump is saved in a file and dumpClusterMetadata methods will return just > a file name where that dump is. However, nodetool / JMX call to MBean (or any > place this method is invoked from, we would like to offer a command in > nodetool which returns the dump) is meant to be used from anywhere, remotely, > so what happens when we execute nodetool or call these methods on a machine > different from a machine a node runs on? E.g. admins can just have some > jumpbox to a cluster they manage, they do not necessarily have access to > nodes themselves. So they would not be able to read it. 
> 2) It creates temp file which is not deleted so /tmp will be populated with > these dumps until node is turned off which might take a lot of time and can > consume a lot of disk space if dumps are done frequently and they are big. An > adversary might just dump cluster metadata until no disk space is left. > What I propose is that we would return all dump string, not just a filename > where we save it. We can also format the output on the client or we can tell > server what format we want the dump to be returned in. > If there is a concern about size of data to be returned, we might optionally > allow dumps to be returned as compressed by simple zipping on server and > unzipping on client where "zipper" is a standard java.util.zip so it > basically doesn't matter what jvm runs on client and server. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19396) Fix Contributing Code Changes page
[ https://issues.apache.org/jira/browse/CASSANDRA-19396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova updated CASSANDRA-19396: Change Category: Operability Complexity: Low Hanging Fruit Component/s: Legacy/Documentation and Website Fix Version/s: 4.0.x 4.1.x 5.0.x 5.x Priority: Low (was: Normal) Status: Open (was: Triage Needed) > Fix Contributing Code Changes page > -- > > Key: CASSANDRA-19396 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19396 > Project: Cassandra > Issue Type: Task > Components: Legacy/Documentation and Website >Reporter: Ekaterina Dimitrova >Priority: Low > Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x > > > Fix "Choosing the Right Branches to Work on" section. Currently it says 2.1 > and 2.2 critical bug fixes when the community does not maintain at this point > those versions. Also, we already release 4.0 and 4.1, the code freeze should > move to 5.0. > I would like to suggest we update the page by saying - any version which is > not EOL and it is already GA - critical bug fixes, any post-alpha version > which is not GA yet - code freeze, no new features or improvements; > stabilization period. > We can also mention this info can be inferred from the Downloads page as that > one is updated on every release. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-19396) Fix Contributing Code Changes page
Ekaterina Dimitrova created CASSANDRA-19396: --- Summary: Fix Contributing Code Changes page Key: CASSANDRA-19396 URL: https://issues.apache.org/jira/browse/CASSANDRA-19396 Project: Cassandra Issue Type: Task Reporter: Ekaterina Dimitrova Fix the "Choosing the Right Branches to Work on" section. Currently it says 2.1 and 2.2 receive critical bug fixes, though the community no longer maintains those versions. Also, since 4.0 and 4.1 have already been released, the code freeze should move to 5.0. I would like to suggest we update the page to say: any version which is not EOL and is already GA - critical bug fixes; any post-alpha version which is not yet GA - code freeze, no new features or improvements; stabilization period. We can also mention that this info can be inferred from the Downloads page, as that one is updated on every release. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19392) deprecate dual ports support (native_transport_port_ssl)
[ https://issues.apache.org/jira/browse/CASSANDRA-19392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817406#comment-17817406 ] Brandon Williams commented on CASSANDRA-19392: -- I created CASSANDRA-19395 to add the warning. > deprecate dual ports support (native_transport_port_ssl) > - > > Key: CASSANDRA-19392 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19392 > Project: Cassandra > Issue Type: Task > Components: Legacy/Core >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 5.0-beta > > Time Spent: 1h 40m > Remaining Estimate: 0h > > We decided (1) to deprecate dual ports support in 5.0 (and eventually remove > it in trunk). This ticket will track the work towards the deprecation for 5.0. > (1) https://lists.apache.org/thread/dow196gspwgp2og576zh3lotvt6mc3lv -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19395) Warn when native_transport_port_ssl is set
[ https://issues.apache.org/jira/browse/CASSANDRA-19395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-19395: - Bug Category: Parent values: Correctness(12982)Level 1 values: API / Semantic Definition(13162) Complexity: Normal Component/s: Legacy/CQL Discovered By: User Report Fix Version/s: 4.0.x 4.1.x 5.0.x Severity: Normal Status: Open (was: Triage Needed) > Warn when native_transport_port_ssl is set > -- > > Key: CASSANDRA-19395 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19395 > Project: Cassandra > Issue Type: Bug > Components: Legacy/CQL >Reporter: Brandon Williams >Priority: Normal > Fix For: 4.0.x, 4.1.x, 5.0.x > > > In CASSANDRA-19392 this was deprecated, however Stefan notes that if you set > this it will work in a single node cluster because the peers table isn't > needed to distribute the information. This sounds like a recipe for "this > worked when we tested in development, but not in production" so it would be > good to warn users when this is set to avoid future confusion. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-19395) Warn when native_transport_port_ssl is set
Brandon Williams created CASSANDRA-19395: Summary: Warn when native_transport_port_ssl is set Key: CASSANDRA-19395 URL: https://issues.apache.org/jira/browse/CASSANDRA-19395 Project: Cassandra Issue Type: Bug Reporter: Brandon Williams In CASSANDRA-19392 this was deprecated, however Stefan notes that if you set this it will work in a single node cluster because the peers table isn't needed to distribute the information. This sounds like a recipe for "this worked when we tested in development, but not in production" so it would be good to warn users when this is set to avoid future confusion. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
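The warning proposed above could take the shape of a simple startup check. The sketch below is purely illustrative — the class, method, and parameter names are hypothetical, not the actual patch or Cassandra's real Config/StartupChecks API:

```java
// Hypothetical startup check illustrating the proposed warning only.
// In the real codebase the SSL port lives in Config and warnings go
// through the logging/ClientWarn machinery; names here are invented.
final class DualPortCheck
{
    // Returns a warning message when the deprecated dual-port setup is
    // in effect, or null when there is nothing to warn about.
    static String checkDualNativePort(Integer nativeTransportPortSsl, int nativeTransportPort)
    {
        if (nativeTransportPortSsl != null && nativeTransportPortSsl != nativeTransportPort)
            return "native_transport_port_ssl is deprecated and only works reliably on a "
                 + "single-node cluster; prefer client encryption on native_transport_port";
        return null; // setting unset or identical to the plain port
    }
}
```

The point of emitting the warning at startup is exactly the scenario in the description: the setting appears to work on a one-node development cluster, so the misconfiguration only surfaces in production.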
[jira] [Commented] (CASSANDRA-19394) Rethink dumping of cluster metadata via CMSOperationsMBean
[ https://issues.apache.org/jira/browse/CASSANDRA-19394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817404#comment-17817404 ] Marcus Eriksson commented on CASSANDRA-19394: - This is used for emergencies - you dump a metadata, modify it and then boot an instance with it - it requires local access to the machine to be able to start with the modified cluster metadata. Don't think we should change this. We should at some point add a way to dump the cluster metadata in a human readable format though > Rethink dumping of cluster metadata via CMSOperationsMBean > -- > > Key: CASSANDRA-19394 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19394 > Project: Cassandra > Issue Type: Improvement > Components: Tool/nodetool, Transactional Cluster Metadata >Reporter: Stefan Miklosovic >Priority: Normal > > I think there are two problems in the implementation of dumping > ClusterMetadata in CMSOperationsMBean > 1) A dump is saved in a file and dumpClusterMetadata methods will return just > a file name where that dump is. However, nodetool / JMX call to MBean (or any > place this method is invoked from, we would like to offer a command in > nodetool which returns the dump) is meant to be used from anywhere, remotely, > so what happens when we execute nodetool or call these methods on a machine > different from a machine a node runs on? E.g. admins can just have some > jumpbox to a cluster they manage, they do not necessarily have access to > nodes themselves. So they would not be able to read it. > 2) It creates temp file which is not deleted so /tmp will be populated with > these dumps until node is turned off which might take a lot of time and can > consume a lot of disk space if dumps are done frequently and they are big. An > adversary might just dump cluster metadata until no disk space is left. > What I propose is that we would return all dump string, not just a filename > where we save it. 
We can also format the output on the client or we can tell > server what format we want the dump to be returned in. > If there is a concern about size of data to be returned, we might optionally > allow dumps to be returned as compressed by simple zipping on server and > unzipping on client where "zipper" is a standard java.util.zip so it > basically doesn't matter what jvm runs on client and server. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
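The compression idea quoted above relies only on the JDK's java.util.zip, so the JVMs on client and server interoperate freely. A minimal sketch of such a round trip — the class and method names are hypothetical, not part of CMSOperationsMBean:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: compress a cluster-metadata dump on the server,
// decompress it on the client. Both ends use only java.util.zip, so the
// JVM vendor/version on either side does not matter.
final class DumpCodec
{
    static byte[] compress(String dump)
    {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes))
        {
            gzip.write(dump.getBytes(StandardCharsets.UTF_8));
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static String decompress(byte[] compressed)
    {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed)))
        {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
    }
}
```

Cluster metadata dumps are highly repetitive text, so GZIP typically shrinks them substantially, which addresses the concern about response size over JMX.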
[jira] [Commented] (CASSANDRA-19392) deprecate dual ports support (native_transport_port_ssl)
[ https://issues.apache.org/jira/browse/CASSANDRA-19392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817402#comment-17817402 ] Brandon Williams commented on CASSANDRA-19392: -- +1 here too just in case, heh > deprecate dual ports support (native_transport_port_ssl) > - > > Key: CASSANDRA-19392 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19392 > Project: Cassandra > Issue Type: Task > Components: Legacy/Core >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 5.0-beta > > Time Spent: 1h 40m > Remaining Estimate: 0h > > We decided (1) to deprecate dual ports support in 5.0 (and eventually remove > it in trunk). This ticket will track the work towards the deprecation for 5.0. > (1) https://lists.apache.org/thread/dow196gspwgp2og576zh3lotvt6mc3lv -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19365) invalid EstimatedHistogramReservoirSnapshot::getValue values due to race condition in DecayingEstimatedHistogramReservoir
[ https://issues.apache.org/jira/browse/CASSANDRA-19365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakub Zytka updated CASSANDRA-19365: Description: `DecayingEstimatedHistogramReservoir` has a race condition between `update` and `rescaleIfNeeded`. A sample which ends up (`update`) in an already scaled decayingBucket (`rescaleIfNeeded`) may still use a non-scaled weight because `decayLandmark` has not been updated yet at the moment of `update`. The observed consequence was flooding of the cluster with speculative retries (we happened to hit low-percentile buckets with overweight samples, which drove p99 below true p50 for a long time). Please note that despite the manifestation being similar to CASSANDRA-19330, these are two distinct bugs in their own right. This bug affects versions 4.0+ On 3.11 there's locking in DEHR. I did not check earlier versions. was: `DecayingEstimatedHistogramReservoir` has a race condition between `update` and `rescaleIfNeeded`. A sample which ends up (`update`) in an already scaled decayingBucket (`rescaleIfNeeded`) may still use a non-scaled weight because `decayLandmark` has not been updated yet at the moment of `update`. The observed consequence was flooding of the cluster with speculative retries (we happened to hit low-percentile buckets with overweight samples, which drove p99 below true p50 for a long time). Please note that despite the manifestation being similar to CASSANDRA-19330, these are two distinct bugs in their own right. 
This bug affects > invalid EstimatedHistogramReservoirSnapshot::getValue values due to race > condition in DecayingEstimatedHistogramReservoir > - > > Key: CASSANDRA-19365 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19365 > Project: Cassandra > Issue Type: Bug > Components: Observability/Metrics >Reporter: Jakub Zytka >Assignee: Jakub Zytka >Priority: Normal > Fix For: 5.x > > Time Spent: 20m > Remaining Estimate: 0h > > `DecayingEstimatedHistogramReservoir` has a race condition between `update` > and `rescaleIfNeeded`. > A sample which ends up (`update`) in an already scaled decayingBucket > (`rescaleIfNeeded`) may still use a non-scaled weight because `decayLandmark` > has not been updated yet at the moment of `update`. > > The observed consequence was flooding of the cluster with speculative retries > (we happened to hit low-percentile buckets with overweight samples, which > drove p99 below true p50 for a long time). > Please note that despite the manifestation being similar to CASSANDRA-19330, > these are two distinct bugs in their own right. > This bug affects versions 4.0+ > On 3.11 there's locking in DEHR. I did not check earlier versions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
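The race described above can be pictured with a stripped-down model of the reservoir. This is an illustrative toy, not the actual DecayingEstimatedHistogramReservoir code: `update` reads the landmark to compute the sample's weight, while rescaling first divides the buckets and only afterwards moves the landmark, so an update landing between the two steps deposits a non-rescaled (overweight) value into already-rescaled buckets:

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Toy model of the race. Weight = 2^((now - landmark)/halfLife).
// rescaleStep1 divides the buckets; rescaleStep2 moves the landmark.
// An update() executed between the two steps still uses the stale
// landmark, so its contribution is too heavy by the rescale factor.
final class RacyReservoir
{
    static final long HALF_LIFE_MS = 1000;
    volatile long decayLandmark = 0;
    final AtomicLongArray buckets = new AtomicLongArray(1);

    long weight(long nowMs) { return 1L << ((nowMs - decayLandmark) / HALF_LIFE_MS); }

    void update(long nowMs) { buckets.addAndGet(0, weight(nowMs)); }

    void rescaleStep1(long nowMs)                 // divide decaying buckets
    {
        buckets.set(0, buckets.get(0) / weight(nowMs));
    }

    void rescaleStep2(long nowMs) { decayLandmark = nowMs; }  // move landmark
}
```

In the model, an update that slips between the two rescale steps at t=4s lands with weight 16 instead of 1 — exactly the kind of overweight low-percentile sample that dragged p99 below the true p50.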
[jira] [Updated] (CASSANDRA-19365) invalid EstimatedHistogramReservoirSnapshot::getValue values due to race condition in DecayingEstimatedHistogramReservoir
[ https://issues.apache.org/jira/browse/CASSANDRA-19365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakub Zytka updated CASSANDRA-19365: Fix Version/s: 5.x (was: 5.1) > invalid EstimatedHistogramReservoirSnapshot::getValue values due to race > condition in DecayingEstimatedHistogramReservoir > - > > Key: CASSANDRA-19365 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19365 > Project: Cassandra > Issue Type: Bug > Components: Observability/Metrics >Reporter: Jakub Zytka >Assignee: Jakub Zytka >Priority: Normal > Fix For: 5.x > > Time Spent: 20m > Remaining Estimate: 0h > > `DecayingEstimatedHistogramReservoir` has a race condition between `update` > and `rescaleIfNeeded`. > A sample which ends up (`update`) in an already scaled decayingBucket > (`rescaleIfNeeded`) may still use a non-scaled weight because `decayLandmark` > has not been updated yet at the moment of `update`. > > The observed consequence was flooding of the cluster with speculative retries > (we happened to hit low-percentile buckets with overweight samples, which > drove p99 below true p50 for a long time). > Please note that despite the manifestation being similar to CASSANDRA-19330, > these are two distinct bugs in their own right. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19392) deprecate dual ports support (native_transport_port_ssl)
[ https://issues.apache.org/jira/browse/CASSANDRA-19392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817399#comment-17817399 ] Stefan Miklosovic commented on CASSANDRA-19392: --- Brandon +1ed privately. I am going to send it. > deprecate dual ports support (native_transport_port_ssl) > - > > Key: CASSANDRA-19392 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19392 > Project: Cassandra > Issue Type: Task > Components: Legacy/Core >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 5.0-beta > > Time Spent: 1h 40m > Remaining Estimate: 0h > > We decided (1) to deprecate dual ports support in 5.0 (and eventually remove > it in trunk). This ticket will track the work towards the deprecation for 5.0. > (1) https://lists.apache.org/thread/dow196gspwgp2og576zh3lotvt6mc3lv -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19365) invalid EstimatedHistogramReservoirSnapshot::getValue values due to race condition in DecayingEstimatedHistogramReservoir
[ https://issues.apache.org/jira/browse/CASSANDRA-19365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakub Zytka updated CASSANDRA-19365: Fix Version/s: 5.1 > invalid EstimatedHistogramReservoirSnapshot::getValue values due to race > condition in DecayingEstimatedHistogramReservoir > - > > Key: CASSANDRA-19365 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19365 > Project: Cassandra > Issue Type: Bug > Components: Observability/Metrics >Reporter: Jakub Zytka >Assignee: Jakub Zytka >Priority: Normal > Fix For: 5.1 > > Time Spent: 20m > Remaining Estimate: 0h > > `DecayingEstimatedHistogramReservoir` has a race condition between `update` > and `rescaleIfNeeded`. > A sample which ends up (`update`) in an already scaled decayingBucket > (`rescaleIfNeeded`) may still use a non-scaled weight because `decayLandmark` > has not been updated yet at the moment of `update`. > > The observed consequence was flooding of the cluster with speculative retries > (we happened to hit low-percentile buckets with overweight samples, which > drove p99 below true p50 for a long time). > Please note that despite the manifestation being similar to CASSANDRA-19330, > these are two distinct bugs in their own right. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-19394) Rethink dumping of cluster metadata via CMSOperationsMBean
Stefan Miklosovic created CASSANDRA-19394: - Summary: Rethink dumping of cluster metadata via CMSOperationsMBean Key: CASSANDRA-19394 URL: https://issues.apache.org/jira/browse/CASSANDRA-19394 Project: Cassandra Issue Type: Improvement Components: Tool/nodetool, Transactional Cluster Metadata Reporter: Stefan Miklosovic I think there are two problems in the implementation of dumping ClusterMetadata in CMSOperationsMBean. 1) A dump is saved to a file, and the dumpClusterMetadata methods return just the name of the file where that dump is. However, a nodetool / JMX call to the MBean (or any place this method is invoked from; we would like to offer a nodetool command which returns the dump) is meant to be usable from anywhere, remotely, so what happens when we execute nodetool or call these methods on a machine different from the one the node runs on? E.g. admins may only have a jumpbox to the cluster they manage and not necessarily access to the nodes themselves, so they would not be able to read the file. 2) It creates a temp file which is never deleted, so /tmp fills up with these dumps until the node is shut down, which might take a long time and can consume a lot of disk space if dumps are frequent and large. An adversary might just dump cluster metadata until no disk space is left. What I propose is that we return the whole dump as a string, not just the filename where we saved it. We can also format the output on the client, or tell the server what format we want the dump returned in. If there is a concern about the size of the data to be returned, we might optionally allow dumps to be returned compressed, by simply zipping on the server and unzipping on the client, where the "zipper" is the standard java.util.zip, so it basically doesn't matter which JVM runs on client and server. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
(cassandra-website) branch asf-staging updated (8a641fca7 -> 367c839bb)
This is an automated email from the ASF dual-hosted git repository. git-site-role pushed a change to branch asf-staging in repository https://gitbox.apache.org/repos/asf/cassandra-website.git discard 8a641fca7 generate docs for aa8a03c7 new 367c839bb generate docs for aa8a03c7 This update added new revisions after undoing existing revisions. That is to say, some revisions that were in the old version of the branch are not in the new version. This situation occurs when a user --force pushes a change and generates a repository containing something like this: * -- * -- B -- O -- O -- O (8a641fca7) \ N -- N -- N refs/heads/asf-staging (367c839bb) You should already have received notification emails for all of the O revisions, and so the following emails describe only the N revisions from the common base, B. Any revisions marked "omit" are not gone; other references still refer to them. Any revisions marked "discard" are gone forever. The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: .../managing/tools/nodetool/reconfigurecms.html| 11 +-- .../managing/tools/nodetool/reconfigurecms.html| 11 +-- site-ui/build/ui-bundle.zip| Bin 4883646 -> 4883646 bytes 3 files changed, 10 insertions(+), 12 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-19365) invalid EstimatedHistogramReservoirSnapshot::getValue values due to race condition in DecayingEstimatedHistogramReservoir
[ https://issues.apache.org/jira/browse/CASSANDRA-19365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817370#comment-17817370 ] Jakub Zytka edited comment on CASSANDRA-19365 at 2/14/24 1:04 PM: -- [https://github.com/apache/cassandra/pull/3102/files] The proposed PR keeps changes to almost a minimum to stay consistent with the current state of DEHR. The solution doesn't introduce synchronization between updates and rescales for the sake of update performance. Instead, it introduces an atomic change of decay landmark and decaying buckets together. This lets us keep updates non-synchronized at the price of letting some updates be missed during rescale. It also prevents the creation of snapshots that are half-rescaled. was (Author: jakubzytka): [https://github.com/apache/cassandra/pull/3102/files] The proposed PR keeps changes to almost a minimum to stay consistent with the current state of DEHR. The solution doesn't introduce synchronization between updates and rescales for the sake of update performance. Instead, it introduces an atomic change of decay landmark and decaying buckets together. This lets us keep updates non-synchronized at the price of letting some updates be missed during rescale. It also allows the creation of snapshots that are not half-rescaled. > invalid EstimatedHistogramReservoirSnapshot::getValue values due to race > condition in DecayingEstimatedHistogramReservoir > - > > Key: CASSANDRA-19365 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19365 > Project: Cassandra > Issue Type: Bug > Components: Observability/Metrics >Reporter: Jakub Zytka >Assignee: Jakub Zytka >Priority: Normal > Time Spent: 20m > Remaining Estimate: 0h > > `DecayingEstimatedHistogramReservoir` has a race condition between `update` > and `rescaleIfNeeded`. 
> A sample which ends up (`update`) in an already scaled decayingBucket > (`rescaleIfNeeded`) may still use a non-scaled weight because `decayLandmark` > has not been updated yet at the moment of `update`. > > The observed consequence was flooding of the cluster with speculative retries > (we happened to hit low-percentile buckets with overweight samples, which > drove p99 below true p50 for a long time). > Please note that despite the manifestation being similar to CASSANDRA-19330, > these are two distinct bugs in their own right. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
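The fix direction described above — publishing the decay landmark and the decaying buckets together as one atomic swap, accepting that a few concurrent updates may be lost during a rescale — can be sketched as follows. This is an illustrative model of the approach, not the actual patch in the linked PR:

```java
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.concurrent.atomic.AtomicReference;

// Sketch of the fix: landmark and buckets live in one immutable State,
// swapped atomically on rescale, so no reader ever observes a new
// landmark paired with not-yet-rescaled buckets (or vice versa).
final class PairedReservoir
{
    static final long HALF_LIFE_MS = 1000;

    static final class State
    {
        final long decayLandmark;
        final AtomicLongArray buckets;
        State(long landmark, AtomicLongArray buckets)
        {
            this.decayLandmark = landmark;
            this.buckets = buckets;
        }
    }

    final AtomicReference<State> state =
        new AtomicReference<>(new State(0, new AtomicLongArray(1)));

    void update(long nowMs)
    {
        State s = state.get(); // landmark and buckets read together
        long weight = 1L << ((nowMs - s.decayLandmark) / HALF_LIFE_MS);
        // An update against a stale State may be dropped when a concurrent
        // rescale swaps in a new State -- the accepted trade-off for
        // keeping the update path unsynchronized.
        s.buckets.addAndGet(0, weight);
    }

    void rescale(long nowMs)
    {
        State old = state.get();
        long factor = 1L << ((nowMs - old.decayLandmark) / HALF_LIFE_MS);
        AtomicLongArray rescaled = new AtomicLongArray(1);
        rescaled.set(0, old.buckets.get(0) / factor);
        state.compareAndSet(old, new State(nowMs, rescaled));
    }
}
```

Because the pair changes in a single reference swap, a snapshot taken at any moment sees a consistent landmark/bucket combination — which is what prevents the half-rescaled snapshots the comment mentions.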
[jira] [Comment Edited] (CASSANDRA-19365) invalid EstimatedHistogramReservoirSnapshot::getValue values due to race condition in DecayingEstimatedHistogramReservoir
[ https://issues.apache.org/jira/browse/CASSANDRA-19365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817370#comment-17817370 ] Jakub Zytka edited comment on CASSANDRA-19365 at 2/14/24 1:02 PM: -- [https://github.com/apache/cassandra/pull/3102/files] The proposed PR keeps changes to almost a minimum to stay consistent with the current state of DEHR. The solution doesn't introduce synchronization between updates and rescales for the sake of update performance. Instead, it introduces an atomic change of decay landmark and decaying buckets together. This lets us keep updates non-synchronized at the price of letting some updates be missed during rescale. It also allows the creation of snapshots that are not half-rescaled. was (Author: jakubzytka): [https://github.com/apache/cassandra/pull/3102/files] The proposed PR keeps changes to almost a minimum to stay consistent with the current state of DEHR. > invalid EstimatedHistogramReservoirSnapshot::getValue values due to race > condition in DecayingEstimatedHistogramReservoir > - > > Key: CASSANDRA-19365 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19365 > Project: Cassandra > Issue Type: Bug > Components: Observability/Metrics >Reporter: Jakub Zytka >Assignee: Jakub Zytka >Priority: Normal > Time Spent: 20m > Remaining Estimate: 0h > > `DecayingEstimatedHistogramReservoir` has a race condition between `update` > and `rescaleIfNeeded`. > A sample which ends up (`update`) in an already scaled decayingBucket > (`rescaleIfNeeded`) may still use a non-scaled weight because `decayLandmark` > has not been updated yet at the moment of `update`. > > The observed consequence was flooding of the cluster with speculative retries > (we happened to hit low-percentile buckets with overweight samples, which > drove p99 below true p50 for a long time). > Please note that despite the manifestation being similar to CASSANDRA-19330, > these are two distinct bugs in their own right. 
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-19365) invalid EstimatedHistogramReservoirSnapshot::getValue values due to race condition in DecayingEstimatedHistogramReservoir
[ https://issues.apache.org/jira/browse/CASSANDRA-19365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817370#comment-17817370 ] Jakub Zytka edited comment on CASSANDRA-19365 at 2/14/24 12:58 PM: --- [https://github.com/apache/cassandra/pull/3102/files] The proposed PR keeps changes to almost a minimum to stay consistent with the current state of DEHR. was (Author: jakubzytka): https://github.com/apache/cassandra/pull/3102/files > invalid EstimatedHistogramReservoirSnapshot::getValue values due to race > condition in DecayingEstimatedHistogramReservoir > - > > Key: CASSANDRA-19365 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19365 > Project: Cassandra > Issue Type: Bug > Components: Observability/Metrics >Reporter: Jakub Zytka >Assignee: Jakub Zytka >Priority: Normal > Time Spent: 20m > Remaining Estimate: 0h > > `DecayingEstimatedHistogramReservoir` has a race condition between `update` > and `rescaleIfNeeded`. > A sample which ends up (`update`) in an already scaled decayingBucket > (`rescaleIfNeeded`) may still use a non-scaled weight because `decayLandmark` > has not been updated yet at the moment of `update`. > > The observed consequence was flooding of the cluster with speculative retries > (we happened to hit low-percentile buckets with overweight samples, which > drove p99 below true p50 for a long time). > Please note that despite the manifestation being similar to CASSANDRA-19330, > these are two distinct bugs in their own right. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19365) invalid EstimatedHistogramReservoirSnapshot::getValue values due to race condition in DecayingEstimatedHistogramReservoir
[ https://issues.apache.org/jira/browse/CASSANDRA-19365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakub Zytka updated CASSANDRA-19365: Bug Category: Parent values: Correctness(12982) Complexity: Normal Component/s: Observability/Metrics Discovered By: Adhoc Test Severity: Normal Status: Open (was: Triage Needed) > invalid EstimatedHistogramReservoirSnapshot::getValue values due to race > condition in DecayingEstimatedHistogramReservoir > - > > Key: CASSANDRA-19365 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19365 > Project: Cassandra > Issue Type: Bug > Components: Observability/Metrics >Reporter: Jakub Zytka >Assignee: Jakub Zytka >Priority: Normal > Time Spent: 20m > Remaining Estimate: 0h > > `DecayingEstimatedHistogramReservoir` has a race condition between `update` > and `rescaleIfNeeded`. > A sample which ends up (`update`) in an already scaled decayingBucket > (`rescaleIfNeeded`) may still use a non-scaled weight because `decayLandmark` > has not been updated yet at the moment of `update`. > > The observed consequence was flooding of the cluster with speculative retries > (we happened to hit low-percentile buckets with overweight samples, which > drove p99 below true p50 for a long time). > Please note that despite the manifestation being similar to CASSANDRA-19330, > these are two distinct bugs in their own right. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19365) invalid EstimatedHistogramReservoirSnapshot::getValue values due to race condition in DecayingEstimatedHistogramReservoir
[ https://issues.apache.org/jira/browse/CASSANDRA-19365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakub Zytka updated CASSANDRA-19365: Test and Documentation Plan: unit test Status: Patch Available (was: Open) https://github.com/apache/cassandra/pull/3102/files > invalid EstimatedHistogramReservoirSnapshot::getValue values due to race > condition in DecayingEstimatedHistogramReservoir > - > > Key: CASSANDRA-19365 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19365 > Project: Cassandra > Issue Type: Bug > Components: Observability/Metrics >Reporter: Jakub Zytka >Assignee: Jakub Zytka >Priority: Normal > Time Spent: 20m > Remaining Estimate: 0h > > `DecayingEstimatedHistogramReservoir` has a race condition between `update` > and `rescaleIfNeeded`. > A sample which ends up (`update`) in an already scaled decayingBucket > (`rescaleIfNeeded`) may still use a non-scaled weight because `decayLandmark` > has not been updated yet at the moment of `update`. > > The observed consequence was flooding of the cluster with speculative retries > (we happened to hit low-percentile buckets with overweight samples, which > drove p99 below true p50 for a long time). > Please note that despite the manifestation being similar to CASSANDRA-19330, > these are two distinct bugs in their own right. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19393) nodetool: group CMS-related commands into one command
[ https://issues.apache.org/jira/browse/CASSANDRA-19393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817363#comment-17817363 ] n.v.harikrishna commented on CASSANDRA-19393: - Thank you all for the inputs! I will update the PR as per the discussion. > nodetool: group CMS-related commands into one command > - > > Key: CASSANDRA-19393 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19393 > Project: Cassandra > Issue Type: Improvement > Components: Tool/nodetool, Transactional Cluster Metadata >Reporter: n.v.harikrishna >Assignee: n.v.harikrishna >Priority: Normal > Fix For: 5.x > > Time Spent: 10m > Remaining Estimate: 0h > > The purpose of this ticket is to group all CMS-related commands under one > "nodetool cms" command where the existing commands would become subcommands of it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19393) nodetool: group CMS-related commands into one command
[ https://issues.apache.org/jira/browse/CASSANDRA-19393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817360#comment-17817360 ] Brandon Williams commented on CASSANDRA-19393: -- bq. While it does make sense to group this, there is some kind of a habit in nodetool that each command is standalone. I agree with Marcus, we have repair_admin already and I think going to git-style subcommands is the inevitable evolution as the number of commands we have grows. > nodetool: group CMS-related commands into one command > - > > Key: CASSANDRA-19393 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19393 > Project: Cassandra > Issue Type: Improvement > Components: Tool/nodetool, Transactional Cluster Metadata >Reporter: n.v.harikrishna >Assignee: n.v.harikrishna >Priority: Normal > Fix For: 5.x > > Time Spent: 10m > Remaining Estimate: 0h > > The purpose of this ticket is to group all CMS-related commands under one > "nodetool cms" command where existing command would be subcommands of it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-19335) Default nodetool tablestats to Human-Readable Output
[ https://issues.apache.org/jira/browse/CASSANDRA-19335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817357#comment-17817357 ] Brandon Williams edited comment on CASSANDRA-19335 at 2/14/24 12:09 PM: The dtests are broken as a consequence of CCM being broken. I thought we were on the same page there earlier when you indicated your preference was to use the JSON output in CCM, but in any case CCM's [data_size|https://github.com/riptano/ccm/blob/master/ccmlib/node.py#L1536] function being broken is the crux of the problems here, which trickles down into the dtests. > Default nodetool tablestats to Human-Readable Output > > > Key: CASSANDRA-19335 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19335 > Project: Cassandra > Issue Type: Improvement > Components: Tool/nodetool >Reporter: Leo Toff >Assignee: Leo Toff >Priority: Low > Fix For: 5.x > > Time Spent: 50m > Remaining Estimate: 0h > > *Current Behavior* > The current implementation of nodetool tablestats in Apache Cassandra outputs > statistics in a format that is not immediately human-readable. This output > primarily includes raw byte counts, which require additional calculation or > conversion to be easily understood by users. This can be inefficient and > time-consuming, especially for users who frequently monitor these statistics > for performance tuning or maintenance purposes. > *Proposed Change* > We propose that nodetool tablestats should, by default, provide its output in > a human-readable format. This change would involve converting byte counts > into more understandable units (KiB, MiB, GiB). 
The tool could still retain > the option to display raw data for those who need it, perhaps through a flag > such as --no-human-readable or --raw. > *Considerations* > The change should maintain backward compatibility, ensuring that scripts or > tools relying on the current output format can continue to function correctly. > We should provide adequate documentation and examples of both the new default > output and how to access the raw data format, if needed. > *Alignment* > Discussion in the dev mailing list: > [https://lists.apache.org/thread/mlp715kxho5b6f1ql9omlzmmnh4qfby9] > *Related work* > Previous work in the series: > # https://issues.apache.org/jira/browse/CASSANDRA-19015 > # https://issues.apache.org/jira/browse/CASSANDRA-19104
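The byte-to-unit conversion proposed in the ticket can be sketched as follows. This is a minimal illustration of the idea (binary units, two decimal places), not the actual formatting code used by nodetool, whose exact rounding and unit choices may differ:

```python
def human_readable(num_bytes: int) -> str:
    # Render a raw byte count in binary units (KiB, MiB, GiB, TiB),
    # the kind of conversion the ticket proposes as the default
    # tablestats output. Plain byte counts stay as integers.
    units = ["bytes", "KiB", "MiB", "GiB", "TiB"]
    value = float(num_bytes)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            if unit == "bytes":
                return f"{int(value)} {unit}"
            return f"{value:.2f} {unit}"
        value /= 1024

print(human_readable(123))            # 123 bytes
print(human_readable(4096))           # 4.00 KiB
print(human_readable(5 * 1024 ** 3))  # 5.00 GiB
```

A raw mode (the proposed --no-human-readable or --raw flag) would simply bypass this conversion and emit the integer byte count, which is what backward-compatible scripts would consume.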