[jira] [Updated] (CASSANDRA-19221) CMS: Nodes can restart with new ipaddress already defined in the cluster

2024-04-25 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19221:

Status: Ready to Commit  (was: Review In Progress)

+1. Agreed about splitting {{MetadataChangeSimulationTest}}; CASSANDRA-19344 
added another dimension to the tests, so it's not super surprising if that 
sometimes pushes the run time over the timeout.

> CMS: Nodes can restart with new ipaddress already defined in the cluster
> 
>
> Key: CASSANDRA-19221
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19221
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Paul Chandler
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 5.1-alpha1
>
> Attachments: ci_summary-1.html, ci_summary.html
>
>
> I am simulating running a cluster in Kubernetes and testing what happens when 
> several pods go down and IP addresses are swapped between nodes. In 4.0 this 
> is blocked and the node cannot be restarted.
> To simulate this I create a 3 node cluster on a local machine using 3 
> loopback addresses
> {code}
> 127.0.0.1
> 127.0.0.2
> 127.0.0.3
> {code}
> The nodes are created correctly and the first node is assigned as a CMS node 
> as shown:
> {code}
> bin/nodetool -p 7199 describecms
> {code}
> Cluster Metadata Service:
> {code}
> Members: /127.0.0.1:7000
> Is Member: true
> Service State: LOCAL
> {code}
> At this point I bring down the nodes 127.0.0.2 and 127.0.0.3 and swap the IP 
> addresses for the rpc_address and listen_address.
>  
> The nodes come back up as normal, but the node ID has now been swapped against 
> the IP address:
> Before:
> {code}
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load       Tokens  Owns (effective)  Host ID                       Rack
> UN  127.0.0.3  75.2 KiB   16      76.0%             6d194555-f6eb-41d0-c000-0003  rack1
> UN  127.0.0.2  86.77 KiB  16      59.3%             6d194555-f6eb-41d0-c000-0002  rack1
> UN  127.0.0.1  80.88 KiB  16      64.7%             6d194555-f6eb-41d0-c000-0001  rack1
> {code}
> After:
> {code}
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load        Tokens  Owns (effective)  Host ID                       Rack
> UN  127.0.0.3  149.62 KiB  16      76.0%             6d194555-f6eb-41d0-c000-0003  rack1
> UN  127.0.0.2  155.48 KiB  16      59.3%             6d194555-f6eb-41d0-c000-0002  rack1
> UN  127.0.0.1  75.74 KiB   16      64.7%             6d194555-f6eb-41d0-c000-0001  rack1
> {code}
> In previous tests of this I created a table with a replication factor of 1 and 
> inserted some data before the swap. After the swap, the data on nodes 2 and 3 
> is missing.
> One theory I have is that, because I am using different port numbers for the 
> different nodes and I am only swapping the IP addresses and not the port 
> numbers, each ip:port combination still looks unique,
> i.e. 127.0.0.2:9043 becomes 127.0.0.2:9044
> and 127.0.0.3:9044 becomes 127.0.0.3:9043
>  
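
As an illustration of that last point (an editorial sketch using only the JDK, 
not code from the ticket or from Cassandra itself): endpoints that share an IP 
but differ in port never compare as equal, so swapping only the IPs while 
keeping the per-node ports leaves every ip:port pair looking unique.

{code:java}
import java.net.InetSocketAddress;

public class EndpointIdentitySketch
{
    public static void main(String[] args)
    {
        // Same IP, different ports: these are two distinct endpoints.
        InetSocketAddress before = new InetSocketAddress("127.0.0.2", 9043);
        InetSocketAddress after = new InetSocketAddress("127.0.0.2", 9044);
        System.out.println(before.equals(after)); // prints false
    }
}
{code}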






[jira] [Updated] (CASSANDRA-19587) Remove leftover period column from system.metadata_snapshots

2024-04-25 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19587:

Status: Review In Progress  (was: Patch Available)

> Remove leftover period column from system.metadata_snapshots
> 
>
> Key: CASSANDRA-19587
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19587
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Attachments: ci_summary.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Seems we left a period column in metadata_snapshots in 
> CASSANDRA-19189/CASSANDRA-19482 - it should be removed






[jira] [Updated] (CASSANDRA-19587) Remove leftover period column from system.metadata_snapshots

2024-04-25 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19587:

Status: Ready to Commit  (was: Review In Progress)

+1

> Remove leftover period column from system.metadata_snapshots
> 
>
> Key: CASSANDRA-19587
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19587
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Attachments: ci_summary.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Seems we left a period column in metadata_snapshots in 
> CASSANDRA-19189/CASSANDRA-19482 - it should be removed






[jira] [Updated] (CASSANDRA-19191) Optimisations to PlacementForRange, improve lookup on r/w path

2024-04-22 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19191:

Status: Needs Committer  (was: Patch Available)

> Optimisations to PlacementForRange, improve lookup on r/w path
> --
>
> Key: CASSANDRA-19191
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19191
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.1-alpha1
>
> Attachments: ci_summary-1.html, ci_summary.html, result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The lookup used when selecting the appropriate replica group for a range or 
> token while performing reads and writes is extremely simplistic and 
> inefficient. There is plenty of scope to improve {{PlacementsForRange}} by 
> replacing the current naive iteration with a more efficient lookup.
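
To make the proposed direction concrete, here is an editorial sketch of the 
general technique (not the actual {{PlacementForRange}} code, and assuming 
nothing about the eventual patch): keying ranges in a sorted map turns the 
per-token lookup into a logarithmic search instead of a linear scan.

{code:java}
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative only: replica groups are represented as plain strings.
public class RangeLookupSketch
{
    // Ranges keyed by their start token; the value stands in for the replica
    // group owning the range that begins at that token.
    private final NavigableMap<Long, String> replicasByRangeStart = new TreeMap<>();

    void addRange(long rangeStart, String replicaGroup)
    {
        replicasByRangeStart.put(rangeStart, replicaGroup);
    }

    // O(log n) lookup instead of iterating every range until one contains the token.
    String replicasFor(long token)
    {
        Map.Entry<Long, String> entry = replicasByRangeStart.floorEntry(token);
        // Tokens below the smallest start belong to the wrap-around range
        // (assumes at least one range has been added).
        return entry != null ? entry.getValue() : replicasByRangeStart.lastEntry().getValue();
    }
}
{code}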






[jira] [Updated] (CASSANDRA-19191) Optimisations to PlacementForRange, improve lookup on r/w path

2024-04-22 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19191:

Status: Review In Progress  (was: Needs Committer)

> Optimisations to PlacementForRange, improve lookup on r/w path
> --
>
> Key: CASSANDRA-19191
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19191
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.1-alpha1
>
> Attachments: ci_summary-1.html, ci_summary.html, result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The lookup used when selecting the appropriate replica group for a range or 
> token while performing reads and writes is extremely simplistic and 
> inefficient. There is plenty of scope to improve {{PlacementsForRange}} by 
> replacing the current naive iteration with a more efficient lookup.






[jira] [Updated] (CASSANDRA-19191) Optimisations to PlacementForRange, improve lookup on r/w path

2024-04-22 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19191:

Status: Ready to Commit  (was: Review In Progress)

+1 

> Optimisations to PlacementForRange, improve lookup on r/w path
> --
>
> Key: CASSANDRA-19191
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19191
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.1-alpha1
>
> Attachments: ci_summary-1.html, ci_summary.html, result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The lookup used when selecting the appropriate replica group for a range or 
> token while performing reads and writes is extremely simplistic and 
> inefficient. There is plenty of scope to improve {{PlacementsForRange}} by 
> replacing the current naive iteration with a more efficient lookup.






[jira] [Updated] (CASSANDRA-19132) Update use of transition plan in PrepareReplace

2024-04-19 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19132:

Attachment: ci_summary.html

> Update use of transition plan in PrepareReplace
> ---
>
> Key: CASSANDRA-19132
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19132
> Project: Cassandra
>  Issue Type: Task
>  Components: Cluster/Membership
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.1-alpha1
>
> Attachments: ci_summary.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When PlacementTransitionPlan was reworked to make its use more consistent 
> across join and leave operations, PrepareReplace was not updated. This could 
> now be simplified in line with the other operations.






[jira] [Updated] (CASSANDRA-19132) Update use of transition plan in PrepareReplace

2024-04-19 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19132:

Status: Ready to Commit  (was: Review In Progress)

+1 LGTM. I rebased and added a second commit with a slight tweak to 
{{PlacementTransitionPlan}}. CI looks reasonable: 2 previously known failures + 
1 {{Port already in use}}, which I believe is an infra problem.

> Update use of transition plan in PrepareReplace
> ---
>
> Key: CASSANDRA-19132
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19132
> Project: Cassandra
>  Issue Type: Task
>  Components: Cluster/Membership
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.1-alpha1
>
> Attachments: ci_summary.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When PlacementTransitionPlan was reworked to make its use more consistent 
> across join and leave operations, PrepareReplace was not updated. This could 
> now be simplified in line with the other operations.






[jira] [Updated] (CASSANDRA-19344) Range movements involving transient replicas must safely enact changes to read and write replica sets

2024-04-19 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19344:

  Fix Version/s: 5.1
 (was: 5.x)
  Since Version: NA
Source Control Link: 
https://github.com/apache/cassandra/commit/dabcb175527d3c2daef54c6ce029b3c3054b2a77
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

Committed, thanks!

> Range movements involving transient replicas must safely enact changes to 
> read and write replica sets
> -
>
> Key: CASSANDRA-19344
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19344
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.1
>
> Attachments: ci_summary-1.html, ci_summary.html, 
> remove-n4-post-19344.txt, remove-n4-pre-19344.txt, result_details.tar.gz
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> (edit) This was originally opened due to a flaky test 
> {{org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode-_jdk17}}
> The test can fail in two different ways:
> {code:java}
> junit.framework.AssertionFailedError: NOT IN CURRENT: 31 -- [(00,20), 
> (31,50)] at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.assertAllContained(TransientRangeMovementTest.java:203)
>  at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode(TransientRangeMovementTest.java:183)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code}
> as in here - 
> [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2639/workflows/32b92ce7-5e9d-4efb-8362-d200d2414597/jobs/55139/tests#failed-test-0]
> and
> {code:java}
> junit.framework.AssertionFailedError: nodetool command [removenode, 
> 6d194555-f6eb-41d0-c000-0003, --force] was not successful stdout: 
> stderr: error: Node /127.0.0.4:7012 is alive and owns this ID. Use 
> decommission command to remove it from the ring -- StackTrace -- 
> java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and 
> owns this ID. Use decommission command to remove it from the ring at 
> org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
>  at 
> org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
>  at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at 
> org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388)
>  at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at 
> org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at 
> org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
>  at 
> org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038)
>  at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>  at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  at java.base/java.lang.Thread.run(Thread.java:833) Notifications: Error: 
> java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and 
> owns this ID. Use decommission command to remove it from the ring at 
> org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
>  at 
> org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
>  at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at 
> org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388)
>  at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at 
> org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at 
> org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
>  at 
> org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038)
>  at 

[jira] [Updated] (CASSANDRA-19344) Range movements involving transient replicas must safely enact changes to read and write replica sets

2024-04-19 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19344:

Status: Ready to Commit  (was: Review In Progress)

> Range movements involving transient replicas must safely enact changes to 
> read and write replica sets
> -
>
> Key: CASSANDRA-19344
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19344
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary-1.html, ci_summary.html, 
> remove-n4-post-19344.txt, remove-n4-pre-19344.txt, result_details.tar.gz
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> (edit) This was originally opened due to a flaky test 
> {{org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode-_jdk17}}
> The test can fail in two different ways:
> {code:java}
> junit.framework.AssertionFailedError: NOT IN CURRENT: 31 -- [(00,20), 
> (31,50)] at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.assertAllContained(TransientRangeMovementTest.java:203)
>  at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode(TransientRangeMovementTest.java:183)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code}
> as in here - 
> [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2639/workflows/32b92ce7-5e9d-4efb-8362-d200d2414597/jobs/55139/tests#failed-test-0]
> and
> {code:java}
> junit.framework.AssertionFailedError: nodetool command [removenode, 
> 6d194555-f6eb-41d0-c000-0003, --force] was not successful stdout: 
> stderr: error: Node /127.0.0.4:7012 is alive and owns this ID. Use 
> decommission command to remove it from the ring -- StackTrace -- 
> java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and 
> owns this ID. Use decommission command to remove it from the ring at 
> org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
>  at 
> org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
>  at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at 
> org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388)
>  at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at 
> org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at 
> org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
>  at 
> org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038)
>  at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>  at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  at java.base/java.lang.Thread.run(Thread.java:833) Notifications: Error: 
> java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and 
> owns this ID. Use decommission command to remove it from the ring at 
> org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
>  at 
> org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
>  at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at 
> org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388)
>  at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at 
> org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at 
> org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
>  at 
> org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038)
>  at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>  at 
> 

[jira] [Updated] (CASSANDRA-19132) Update use of transition plan in PrepareReplace

2024-04-19 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19132:

Status: Review In Progress  (was: Patch Available)

> Update use of transition plan in PrepareReplace
> ---
>
> Key: CASSANDRA-19132
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19132
> Project: Cassandra
>  Issue Type: Task
>  Components: Cluster/Membership
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.1-alpha1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When PlacementTransitionPlan was reworked to make its use more consistent 
> across join and leave operations, PrepareReplace was not updated. This could 
> now be simplified in line with the other operations.






[jira] [Updated] (CASSANDRA-19221) CMS: Nodes can restart with new ipaddress already defined in the cluster

2024-04-18 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19221:

Reviewers: Sam Tunnicliffe, Sam Tunnicliffe  (was: Sam Tunnicliffe)
   Status: Review In Progress  (was: Patch Available)

> CMS: Nodes can restart with new ipaddress already defined in the cluster
> 
>
> Key: CASSANDRA-19221
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19221
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Paul Chandler
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 5.1-alpha1
>
> Attachments: ci_summary.html
>
>
> I am simulating running a cluster in Kubernetes and testing what happens when 
> several pods go down and IP addresses are swapped between nodes. In 4.0 this 
> is blocked and the node cannot be restarted.
> To simulate this I create a 3 node cluster on a local machine using 3 
> loopback addresses
> {code}
> 127.0.0.1
> 127.0.0.2
> 127.0.0.3
> {code}
> The nodes are created correctly and the first node is assigned as a CMS node 
> as shown:
> {code}
> bin/nodetool -p 7199 describecms
> {code}
> Cluster Metadata Service:
> {code}
> Members: /127.0.0.1:7000
> Is Member: true
> Service State: LOCAL
> {code}
> At this point I bring down the nodes 127.0.0.2 and 127.0.0.3 and swap the IP 
> addresses for the rpc_address and listen_address.
>  
> The nodes come back up as normal, but the node ID has now been swapped against 
> the IP address:
> Before:
> {code}
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load       Tokens  Owns (effective)  Host ID                       Rack
> UN  127.0.0.3  75.2 KiB   16      76.0%             6d194555-f6eb-41d0-c000-0003  rack1
> UN  127.0.0.2  86.77 KiB  16      59.3%             6d194555-f6eb-41d0-c000-0002  rack1
> UN  127.0.0.1  80.88 KiB  16      64.7%             6d194555-f6eb-41d0-c000-0001  rack1
> {code}
> After:
> {code}
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load        Tokens  Owns (effective)  Host ID                       Rack
> UN  127.0.0.3  149.62 KiB  16      76.0%             6d194555-f6eb-41d0-c000-0003  rack1
> UN  127.0.0.2  155.48 KiB  16      59.3%             6d194555-f6eb-41d0-c000-0002  rack1
> UN  127.0.0.1  75.74 KiB   16      64.7%             6d194555-f6eb-41d0-c000-0001  rack1
> {code}
> In previous tests of this I created a table with a replication factor of 1 and 
> inserted some data before the swap. After the swap, the data on nodes 2 and 3 
> is missing.
> One theory I have is that, because I am using different port numbers for the 
> different nodes and I am only swapping the IP addresses and not the port 
> numbers, each ip:port combination still looks unique,
> i.e. 127.0.0.2:9043 becomes 127.0.0.2:9044
> and 127.0.0.3:9044 becomes 127.0.0.3:9043
>  






[jira] [Commented] (CASSANDRA-19221) CMS: Nodes can restart with new ipaddress already defined in the cluster

2024-04-18 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838761#comment-17838761
 ] 

Sam Tunnicliffe commented on CASSANDRA-19221:
-

+1. I left a couple of minor suggestions on the PR; feel free to accept or 
ignore them.

> CMS: Nodes can restart with new ipaddress already defined in the cluster
> 
>
> Key: CASSANDRA-19221
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19221
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Paul Chandler
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 5.1-alpha1
>
> Attachments: ci_summary.html
>
>
> I am simulating running a cluster in Kubernetes and testing what happens when 
> several pods go down and IP addresses are swapped between nodes. In 4.0 this 
> is blocked and the node cannot be restarted.
> To simulate this I create a 3 node cluster on a local machine using 3 
> loopback addresses
> {code}
> 127.0.0.1
> 127.0.0.2
> 127.0.0.3
> {code}
> The nodes are created correctly and the first node is assigned as a CMS node 
> as shown:
> {code}
> bin/nodetool -p 7199 describecms
> {code}
> Cluster Metadata Service:
> {code}
> Members: /127.0.0.1:7000
> Is Member: true
> Service State: LOCAL
> {code}
> At this point I bring down the nodes 127.0.0.2 and 127.0.0.3 and swap the IP 
> addresses for the rpc_address and listen_address.
>  
> The nodes come back up as normal, but the node ID has now been swapped against 
> the IP address:
> Before:
> {code}
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load       Tokens  Owns (effective)  Host ID                       Rack
> UN  127.0.0.3  75.2 KiB   16      76.0%             6d194555-f6eb-41d0-c000-0003  rack1
> UN  127.0.0.2  86.77 KiB  16      59.3%             6d194555-f6eb-41d0-c000-0002  rack1
> UN  127.0.0.1  80.88 KiB  16      64.7%             6d194555-f6eb-41d0-c000-0001  rack1
> {code}
> After:
> {code}
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load        Tokens  Owns (effective)  Host ID                       Rack
> UN  127.0.0.3  149.62 KiB  16      76.0%             6d194555-f6eb-41d0-c000-0003  rack1
> UN  127.0.0.2  155.48 KiB  16      59.3%             6d194555-f6eb-41d0-c000-0002  rack1
> UN  127.0.0.1  75.74 KiB   16      64.7%             6d194555-f6eb-41d0-c000-0001  rack1
> {code}
> In previous tests of this I created a table with a replication factor of 1 and 
> inserted some data before the swap. After the swap, the data on nodes 2 and 3 
> is missing.
> One theory I have is that, because I am using different port numbers for the 
> different nodes and I am only swapping the IP addresses and not the port 
> numbers, each ip:port combination still looks unique,
> i.e. 127.0.0.2:9043 becomes 127.0.0.2:9044
> and 127.0.0.3:9044 becomes 127.0.0.3:9043
>  






[jira] [Commented] (CASSANDRA-19344) Range movements involving transient replicas must safely enact changes to read and write replica sets

2024-04-18 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838739#comment-17838739
 ] 

Sam Tunnicliffe commented on CASSANDRA-19344:
-

Rebased and attached an updated {{ci_summary-1.html}}

> Range movements involving transient replicas must safely enact changes to 
> read and write replica sets
> -
>
> Key: CASSANDRA-19344
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19344
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary-1.html, ci_summary.html, 
> remove-n4-post-19344.txt, remove-n4-pre-19344.txt, result_details.tar.gz
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> (edit) This was originally opened due to a flaky test 
> {{org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode-_jdk17}}
> The test can fail in two different ways:
> {code:java}
> junit.framework.AssertionFailedError: NOT IN CURRENT: 31 -- [(00,20), 
> (31,50)] at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.assertAllContained(TransientRangeMovementTest.java:203)
>  at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode(TransientRangeMovementTest.java:183)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code}
> as in here - 
> [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2639/workflows/32b92ce7-5e9d-4efb-8362-d200d2414597/jobs/55139/tests#failed-test-0]
> and
> {code:java}
> junit.framework.AssertionFailedError: nodetool command [removenode, 
> 6d194555-f6eb-41d0-c000-0003, --force] was not successful stdout: 
> stderr: error: Node /127.0.0.4:7012 is alive and owns this ID. Use 
> decommission command to remove it from the ring -- StackTrace -- 
> java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and 
> owns this ID. Use decommission command to remove it from the ring at 
> org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
>  at 
> org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
>  at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at 
> org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388)
>  at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at 
> org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at 
> org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
>  at 
> org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038)
>  at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>  at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  at java.base/java.lang.Thread.run(Thread.java:833) Notifications: Error: 
> java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and 
> owns this ID. Use decommission command to remove it from the ring at 
> org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
>  at 
> org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
>  at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at 
> org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388)
>  at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at 
> org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at 
> org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
>  at 
> org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038)
>  at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)

[jira] [Updated] (CASSANDRA-19344) Range movements involving transient replicas must safely enact changes to read and write replica sets

2024-04-18 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19344:

Attachment: ci_summary-1.html

> Range movements involving transient replicas must safely enact changes to 
> read and write replica sets
> -
>
> Key: CASSANDRA-19344
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19344
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary-1.html, ci_summary.html, 
> remove-n4-post-19344.txt, remove-n4-pre-19344.txt, result_details.tar.gz
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> (edit) This was originally opened due to a flaky test 
> {{org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode-_jdk17}}
> The test can fail in two different ways:
> {code:java}
> junit.framework.AssertionFailedError: NOT IN CURRENT: 31 -- [(00,20), 
> (31,50)] at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.assertAllContained(TransientRangeMovementTest.java:203)
>  at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode(TransientRangeMovementTest.java:183)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code}
> as in here - 
> [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2639/workflows/32b92ce7-5e9d-4efb-8362-d200d2414597/jobs/55139/tests#failed-test-0]
> and
> {code:java}
> junit.framework.AssertionFailedError: nodetool command [removenode, 
> 6d194555-f6eb-41d0-c000-0003, --force] was not successful stdout: 
> stderr: error: Node /127.0.0.4:7012 is alive and owns this ID. Use 
> decommission command to remove it from the ring -- StackTrace -- 
> java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and 
> owns this ID. Use decommission command to remove it from the ring at 
> org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
>  at 
> org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
>  at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at 
> org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388)
>  at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at 
> org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at 
> org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
>  at 
> org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038)
>  at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>  at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  at java.base/java.lang.Thread.run(Thread.java:833) Notifications: Error: 
> java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and 
> owns this ID. Use decommission command to remove it from the ring at 
> org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
>  at 
> org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
>  at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at 
> org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388)
>  at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at 
> org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at 
> org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
>  at 
> org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038)
>  at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>  at 
> 

[jira] [Updated] (CASSANDRA-19514) When jvm-dtest is shutting down an instance TCM retries block the shutdown causing the test to fail

2024-04-18 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19514:

Source Control Link: 
https://github.com/apache/cassandra/commit/cbf4dcb3345c7e2f42f6a897c66b6460b7acc2ca
  (was: 
https://github.com/apache/cassandra/commit/a5b8c06bb925905719261b1f449fffb049f54d1b)
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

Committed, thanks!

> When jvm-dtest is shutting down an instance TCM retries block the shutdown 
> causing the test to fail
> ---
>
> Key: CASSANDRA-19514
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19514
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership, Test/dtest/java
>Reporter: David Capwell
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.1
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest#testRequestingPeerWatermarks
> {code}
> java.lang.RuntimeException: java.util.concurrent.TimeoutException
>org.apache.cassandra.utils.Throwables.maybeFail(Throwables.java:79)
>
> org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:540)
>
> org.apache.cassandra.distributed.impl.AbstractCluster.close(AbstractCluster.java:1098)
>
> org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest.testRequestingPeerWatermarks(RequestCurrentEpochTest.java:77)
>java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  Caused by: java.util.concurrent.TimeoutException
>
> org.apache.cassandra.utils.concurrent.AbstractFuture.get(AbstractFuture.java:253)
>
> org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:532) 
> Suppressed: java.util.concurrent.TimeoutException
> {code}
> In the debugger I found the blocked future: it was 
> src/java/org/apache/cassandra/tcm/EpochAwareDebounce.java waiting on 
> src/java/org/apache/cassandra/tcm/RemoteProcessor.java retries.
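
The general shape of the problem, as an editorial sketch rather than the actual 
{{EpochAwareDebounce}}/{{RemoteProcessor}} code: a retry loop that never 
observes shutdown leaves its future incomplete, so a close() that waits on 
outstanding futures blocks until it times out. Letting the loop bail out and 
complete the future once shutdown begins avoids the hang.

{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class RetryShutdownSketch
{
    private final AtomicBoolean shuttingDown = new AtomicBoolean(false);
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    CompletableFuture<String> fetchWithRetries()
    {
        CompletableFuture<String> result = new CompletableFuture<>();
        executor.submit(() -> {
            while (!result.isDone())
            {
                // Without this check the loop keeps retrying after the peers are
                // gone, the future never completes, and anything waiting on it at
                // shutdown times out.
                if (shuttingDown.get())
                {
                    result.completeExceptionally(new IllegalStateException("node is shutting down"));
                    return;
                }
                String response = attemptRemoteCall(); // stand-in for the real remote request
                if (response != null)
                {
                    result.complete(response);
                    return;
                }
                try
                {
                    Thread.sleep(100); // crude backoff between retries
                }
                catch (InterruptedException e)
                {
                    Thread.currentThread().interrupt();
                    result.completeExceptionally(e);
                    return;
                }
            }
        });
        return result;
    }

    void shutdown() throws InterruptedException
    {
        shuttingDown.set(true); // lets in-flight retry loops bail out and complete their futures
        executor.shutdown();
        executor.awaitTermination(10, TimeUnit.SECONDS);
    }

    private String attemptRemoteCall()
    {
        return null; // this sketch's "remote call" always fails, forcing retries
    }
}
{code}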






[jira] [Updated] (CASSANDRA-19514) When jvm-dtest is shutting down an instance TCM retries block the shutdown causing the test to fail

2024-04-18 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19514:

Reviewers: Alex Petrov, Blake Eggleston, Marcus Eriksson, Sam Tunnicliffe  
(was: Alex Petrov, Blake Eggleston, Marcus Eriksson)
   Status: Review In Progress  (was: Patch Available)

> When jvm-dtest is shutting down an instance TCM retries block the shutdown 
> causing the test to fail
> ---
>
> Key: CASSANDRA-19514
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19514
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership, Test/dtest/java
>Reporter: David Capwell
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.1
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest#testRequestingPeerWatermarks
> {code}
> java.lang.RuntimeException: java.util.concurrent.TimeoutException
>org.apache.cassandra.utils.Throwables.maybeFail(Throwables.java:79)
>
> org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:540)
>
> org.apache.cassandra.distributed.impl.AbstractCluster.close(AbstractCluster.java:1098)
>
> org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest.testRequestingPeerWatermarks(RequestCurrentEpochTest.java:77)
>java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  Caused by: java.util.concurrent.TimeoutException
>
> org.apache.cassandra.utils.concurrent.AbstractFuture.get(AbstractFuture.java:253)
>
> org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:532) 
> Suppressed: java.util.concurrent.TimeoutException
> {code}
> In the debugger I found the blocked future: it was 
> src/java/org/apache/cassandra/tcm/EpochAwareDebounce.java waiting on 
> src/java/org/apache/cassandra/tcm/RemoteProcessor.java retries.






[jira] [Updated] (CASSANDRA-19514) When jvm-dtest is shutting down an instance TCM retries block the shutdown causing the test to fail

2024-04-18 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19514:

Status: Ready to Commit  (was: Review In Progress)

> When jvm-dtest is shutting down an instance TCM retries block the shutdown 
> causing the test to fail
> ---
>
> Key: CASSANDRA-19514
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19514
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership, Test/dtest/java
>Reporter: David Capwell
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.1
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest#testRequestingPeerWatermarks
> {code}
> java.lang.RuntimeException: java.util.concurrent.TimeoutException
>org.apache.cassandra.utils.Throwables.maybeFail(Throwables.java:79)
>
> org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:540)
>
> org.apache.cassandra.distributed.impl.AbstractCluster.close(AbstractCluster.java:1098)
>
> org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest.testRequestingPeerWatermarks(RequestCurrentEpochTest.java:77)
>java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  Caused by: java.util.concurrent.TimeoutException
>
> org.apache.cassandra.utils.concurrent.AbstractFuture.get(AbstractFuture.java:253)
>
> org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:532) 
> Suppressed: java.util.concurrent.TimeoutException
> {code}
> In the debugger I found the blocked future: it was 
> src/java/org/apache/cassandra/tcm/EpochAwareDebounce.java waiting on 
> src/java/org/apache/cassandra/tcm/RemoteProcessor.java retries.






[jira] [Updated] (CASSANDRA-19538) Test Failure: test_assassinate_valid_node

2024-04-18 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19538:

  Fix Version/s: 5.1
 (was: 5.x)
  Since Version: NA
Source Control Link: 
https://github.com/apache/cassandra/commit/80971709b983566a3f2dbfc189dfa1c5367d69bb
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

Merged to trunk (with ninja followup because *someone* forgot to add 
{{CHANGES.txt}})

> Test Failure: test_assassinate_valid_node
> -
>
> Key: CASSANDRA-19538
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19538
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI, Test/dtest/python
>Reporter: Ekaterina Dimitrova
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.1
>
> Attachments: ci_summary-1.html, ci_summary.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Failing consistently on trunk:
> {code:java}
> ccmlib.node.TimeoutError: 03 Apr 2024 19:39:32 [node1] after 120.11/120 
> seconds Missing: ['127.0.0.4:7000.* is now UP'] not found in system.log:
>  Head: INFO  [Messaging-EventLoop-3-1] 2024-04-03 19:37:3
>  Tail: ... some nodes were not ready
> INFO  [OptionalTasks:1] 2024-04-03 19:39:30,454 CassandraRoleManager.java:484 
> - Setup task failed with error, rescheduling
> self = 
> def test_assassinate_valid_node(self):
> """
> @jira_ticket CASSANDRA-16588
> Test that after taking two non-seed nodes down and assassinating
> one of them, the other can come back up.
> """
> cluster = self.cluster
> 
> cluster.populate(5).start()
> node1 = cluster.nodelist()[0]
> node3 = cluster.nodelist()[2]
> 
> self.cluster.set_configuration_options({
> 'seed_provider': [{'class_name': 
> 'org.apache.cassandra.locator.SimpleSeedProvider',
>'parameters': [{'seeds': node1.address()}]
>   }]
> })
> 
> non_seed_nodes = cluster.nodelist()[-2:]
> for node in non_seed_nodes:
> node.stop()
> 
> assassination_target = non_seed_nodes[0]
> logger.debug("Assassinating non-seed node 
> {}".format(assassination_target.address()))
> out, err, _ = node1.nodetool("assassinate 
> {}".format(assassination_target.address()))
> assert_stderr_clean(err)
> 
> logger.debug("Starting non-seed nodes")
> for node in non_seed_nodes:
> >   node.start()
> gossip_test.py:78: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:915: in start
> node.watch_log_for_alive(self, from_mark=mark)
> ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:684: in 
> watch_log_for_alive
> self.watch_log_for(tofind, from_mark=from_mark, timeout=timeout, 
> filename=filename)
> ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:608: in watch_log_for
> TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name,
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> start = 1712173052.8186479, timeout = 120
> msg = "Missing: ['127.0.0.4:7000.* is now UP'] not found in system.log:\n 
> Head: INFO  [Messaging-EventLoop-3-1] 2024-04-03 1...[OptionalTasks:1] 
> 2024-04-03 19:39:30,454 CassandraRoleManager.java:484 - Setup task failed 
> with error, rescheduling\n"
> node = 'node1'
> @staticmethod
> def raise_if_passed(start, timeout, msg, node=None):
> if start + timeout < time.time():
> >   raise TimeoutError.create(start, timeout, msg, node)
> E   ccmlib.node.TimeoutError: 03 Apr 2024 19:39:32 [node1] after 
> 120.11/120 seconds Missing: ['127.0.0.4:7000.* is now UP'] not found in 
> system.log:
> EHead: INFO  [Messaging-EventLoop-3-1] 2024-04-03 19:37:3
> ETail: ... some nodes were not ready
> E   INFO  [OptionalTasks:1] 2024-04-03 19:39:30,454 
> CassandraRoleManager.java:484 - Setup task failed with error, rescheduling
> ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:56: TimeoutError
> {code}
> https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2680/workflows/8b1c0d0a-7458-4b43-9bba-ac96b9bfe64f/jobs/58929/tests#failed-test-0
> https://ci-cassandra.apache.org/job/Cassandra-trunk/1859/#showFailuresLink






[jira] [Commented] (CASSANDRA-19538) Test Failure: test_assassinate_valid_node

2024-04-18 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838624#comment-17838624
 ] 

Sam Tunnicliffe commented on CASSANDRA-19538:
-

+1 to the followup commit too

> Test Failure: test_assassinate_valid_node
> -
>
> Key: CASSANDRA-19538
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19538
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI, Test/dtest/python
>Reporter: Ekaterina Dimitrova
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Failing consistently on trunk:
> {code:java}
> ccmlib.node.TimeoutError: 03 Apr 2024 19:39:32 [node1] after 120.11/120 
> seconds Missing: ['127.0.0.4:7000.* is now UP'] not found in system.log:
>  Head: INFO  [Messaging-EventLoop-3-1] 2024-04-03 19:37:3
>  Tail: ... some nodes were not ready
> INFO  [OptionalTasks:1] 2024-04-03 19:39:30,454 CassandraRoleManager.java:484 
> - Setup task failed with error, rescheduling
> self = 
> def test_assassinate_valid_node(self):
> """
> @jira_ticket CASSANDRA-16588
> Test that after taking two non-seed nodes down and assassinating
> one of them, the other can come back up.
> """
> cluster = self.cluster
> 
> cluster.populate(5).start()
> node1 = cluster.nodelist()[0]
> node3 = cluster.nodelist()[2]
> 
> self.cluster.set_configuration_options({
> 'seed_provider': [{'class_name': 
> 'org.apache.cassandra.locator.SimpleSeedProvider',
>'parameters': [{'seeds': node1.address()}]
>   }]
> })
> 
> non_seed_nodes = cluster.nodelist()[-2:]
> for node in non_seed_nodes:
> node.stop()
> 
> assassination_target = non_seed_nodes[0]
> logger.debug("Assassinating non-seed node 
> {}".format(assassination_target.address()))
> out, err, _ = node1.nodetool("assassinate 
> {}".format(assassination_target.address()))
> assert_stderr_clean(err)
> 
> logger.debug("Starting non-seed nodes")
> for node in non_seed_nodes:
> >   node.start()
> gossip_test.py:78: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:915: in start
> node.watch_log_for_alive(self, from_mark=mark)
> ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:684: in 
> watch_log_for_alive
> self.watch_log_for(tofind, from_mark=from_mark, timeout=timeout, 
> filename=filename)
> ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:608: in watch_log_for
> TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name,
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> start = 1712173052.8186479, timeout = 120
> msg = "Missing: ['127.0.0.4:7000.* is now UP'] not found in system.log:\n 
> Head: INFO  [Messaging-EventLoop-3-1] 2024-04-03 1...[OptionalTasks:1] 
> 2024-04-03 19:39:30,454 CassandraRoleManager.java:484 - Setup task failed 
> with error, rescheduling\n"
> node = 'node1'
> @staticmethod
> def raise_if_passed(start, timeout, msg, node=None):
> if start + timeout < time.time():
> >   raise TimeoutError.create(start, timeout, msg, node)
> E   ccmlib.node.TimeoutError: 03 Apr 2024 19:39:32 [node1] after 
> 120.11/120 seconds Missing: ['127.0.0.4:7000.* is now UP'] not found in 
> system.log:
> EHead: INFO  [Messaging-EventLoop-3-1] 2024-04-03 19:37:3
> ETail: ... some nodes were not ready
> E   INFO  [OptionalTasks:1] 2024-04-03 19:39:30,454 
> CassandraRoleManager.java:484 - Setup task failed with error, rescheduling
> ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:56: TimeoutError
> {code}
> https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2680/workflows/8b1c0d0a-7458-4b43-9bba-ac96b9bfe64f/jobs/58929/tests#failed-test-0
> https://ci-cassandra.apache.org/job/Cassandra-trunk/1859/#showFailuresLink






[jira] [Commented] (CASSANDRA-19567) Minimize the heap consumption when registering metrics

2024-04-18 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838546#comment-17838546
 ] 

Sam Tunnicliffe commented on CASSANDRA-19567:
-

bq. The problem is only reproducible on an x86 machine; it is not reproducible 
on arm64.

We've observed the increased heap usage on Apple silicon, so I don't believe 
this is entirely true.

> Minimize the heap consumption when registering metrics
> --
>
> Key: CASSANDRA-19567
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19567
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Metrics
>Reporter: Maxim Muzafarov
>Assignee: Maxim Muzafarov
>Priority: Normal
> Fix For: 5.x
>
>
> The problem is only reproducible on an x86 machine; it is not reproducible on 
> arm64. A quick analysis showed a lot of MetricName objects stored in the heap; 
> although the real cause could be related to something else, the MetricName 
> object requires extra attention.
> To reproduce, run the following command locally:
> {code}
> ant test-jvm-dtest-some 
> -Dtest.name=org.apache.cassandra.distributed.test.ReadRepairTest
> {code}
> The error:
> {code:java}
> [junit-timeout] Exception in thread "main" java.lang.OutOfMemoryError: Java 
> heap space
> [junit-timeout]     at 
> java.base/java.lang.StringLatin1.newString(StringLatin1.java:769)
> [junit-timeout]     at 
> java.base/java.lang.StringBuffer.toString(StringBuffer.java:716)
> [junit-timeout]     at 
> org.apache.cassandra.CassandraBriefJUnitResultFormatter.endTestSuite(CassandraBriefJUnitResultFormatter.java:191)
> [junit-timeout]     at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.fireEndTestSuite(JUnitTestRunner.java:854)
> [junit-timeout]     at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:578)
> [junit-timeout]     at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1197)
> [junit-timeout]     at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1042)
> [junit-timeout] Testsuite: 
> org.apache.cassandra.distributed.test.ReadRepairTest-cassandra.testtag_IS_UNDEFINED
> [junit-timeout] Testsuite: 
> org.apache.cassandra.distributed.test.ReadRepairTest-cassandra.testtag_IS_UNDEFINED
>  Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0 sec
> [junit-timeout] 
> [junit-timeout] Testcase: 
> org.apache.cassandra.distributed.test.ReadRepairTest:readRepairRTRangeMovementTest-cassandra.testtag_IS_UNDEFINED:
>     Caused an ERROR
> [junit-timeout] Forked Java VM exited abnormally. Please note the time in the 
> report does not reflect the time until the VM exit.
> [junit-timeout] junit.framework.AssertionFailedError: Forked Java VM exited 
> abnormally. Please note the time in the report does not reflect the time 
> until the VM exit.
> [junit-timeout]     at 
> jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
> [junit-timeout]     at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> [junit-timeout]     at java.base/java.util.Vector.forEach(Vector.java:1365)
> [junit-timeout]     at 
> jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
> [junit-timeout]     at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> [junit-timeout]     at 
> jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
> [junit-timeout]     at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> [junit-timeout]     at java.base/java.util.Vector.forEach(Vector.java:1365)
> [junit-timeout]     at 
> jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
> [junit-timeout]     at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> [junit-timeout]     at 
> jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
> [junit-timeout]     at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> [junit-timeout] 
> [junit-timeout] 
> [junit-timeout] Test org.apache.cassandra.distributed.test.ReadRepairTest 
> FAILED (crashed)BUILD FAILED
>  {code}
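
The description above singles out {{MetricName}} objects, while noting that the
real cause may lie elsewhere. Purely as an illustration of how per-metric name
strings can add up (the class and field names below are invented for the sketch,
not the actual implementation), keeping shared references to the name components
and only flattening on demand retains much less than eagerly building one long
String per metric:

{code:java}
// Purely illustrative (not the Cassandra MetricName class): eagerly building one
// long String per metric duplicates the keyspace/table scope for every metric,
// while keeping shared references and flattening on demand retains far less.
public class MetricNameSketch
{
    static final class LazyName
    {
        final String group, type, scope, name;   // scope is shared between metrics

        LazyName(String group, String type, String scope, String name)
        {
            this.group = group; this.type = type; this.scope = scope; this.name = name;
        }

        String flatten() { return group + '.' + type + '.' + scope + '.' + name; }
    }

    public static void main(String[] args)
    {
        String scope = "keyspace1.some_very_long_table_name";  // one shared instance
        LazyName read  = new LazyName("org.apache.cassandra.metrics", "Table", scope, "ReadLatency");
        LazyName write = new LazyName("org.apache.cassandra.metrics", "Table", scope, "WriteLatency");
        // both names share the scope String; only the final segment differs,
        // whereas eager concatenation would retain two full copies of it
        System.out.println(read.flatten());
        System.out.println(write.flatten());
    }
}
{code}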



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19538) Test Failure: test_assassinate_valid_node

2024-04-17 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19538:

Status: Ready to Commit  (was: Review In Progress)

+1 LGTM. 

Using an incorrect {{lastModified}} was causing updates to gossip state not to 
fire, because it looked to the listener like nothing had changed.
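
For context, a minimal model of the mechanism described above (illustrative names
only, not the actual gossip listener code): an update whose {{lastModified}} has
not advanced is treated as a no-op, so the "is now UP" notification the dtest
waits for never fires.

{code:java}
// Minimal model, assuming a listener that uses a lastModified timestamp to
// decide whether an endpoint's gossip state changed; names are illustrative.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class GossipUpdateModel
{
    static final class EndpointState
    {
        final String status;
        final long lastModified;   // if this is not bumped, listeners see "no change"

        EndpointState(String status, long lastModified)
        {
            this.status = status;
            this.lastModified = lastModified;
        }
    }

    static final Map<String, EndpointState> LAST_SEEN = new ConcurrentHashMap<>();

    // the listener only fires when the timestamp moves forward
    static boolean notifyIfChanged(String endpoint, EndpointState update)
    {
        EndpointState prev = LAST_SEEN.get(endpoint);
        if (prev != null && update.lastModified <= prev.lastModified)
            return false;               // looks unchanged, so "is now UP" is never logged
        LAST_SEEN.put(endpoint, update);
        return true;
    }

    public static void main(String[] args)
    {
        notifyIfChanged("127.0.0.4:7000", new EndpointState("shutdown", 100));
        // bug scenario: the state really changed but lastModified was not advanced
        boolean fired = notifyIfChanged("127.0.0.4:7000", new EndpointState("NORMAL", 100));
        System.out.println("listener fired: " + fired); // false, so the dtest times out waiting for UP
    }
}
{code}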

> Test Failure: test_assassinate_valid_node
> -
>
> Key: CASSANDRA-19538
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19538
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI, Test/dtest/python
>Reporter: Ekaterina Dimitrova
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Failing consistently on trunk:
> {code:java}
> ccmlib.node.TimeoutError: 03 Apr 2024 19:39:32 [node1] after 120.11/120 
> seconds Missing: ['127.0.0.4:7000.* is now UP'] not found in system.log:
>  Head: INFO  [Messaging-EventLoop-3-1] 2024-04-03 19:37:3
>  Tail: ... some nodes were not ready
> INFO  [OptionalTasks:1] 2024-04-03 19:39:30,454 CassandraRoleManager.java:484 
> - Setup task failed with error, rescheduling
> self = 
> def test_assassinate_valid_node(self):
> """
> @jira_ticket CASSANDRA-16588
> Test that after taking two non-seed nodes down and assassinating
> one of them, the other can come back up.
> """
> cluster = self.cluster
> 
> cluster.populate(5).start()
> node1 = cluster.nodelist()[0]
> node3 = cluster.nodelist()[2]
> 
> self.cluster.set_configuration_options({
> 'seed_provider': [{'class_name': 
> 'org.apache.cassandra.locator.SimpleSeedProvider',
>'parameters': [{'seeds': node1.address()}]
>   }]
> })
> 
> non_seed_nodes = cluster.nodelist()[-2:]
> for node in non_seed_nodes:
> node.stop()
> 
> assassination_target = non_seed_nodes[0]
> logger.debug("Assassinating non-seed node 
> {}".format(assassination_target.address()))
> out, err, _ = node1.nodetool("assassinate 
> {}".format(assassination_target.address()))
> assert_stderr_clean(err)
> 
> logger.debug("Starting non-seed nodes")
> for node in non_seed_nodes:
> >   node.start()
> gossip_test.py:78: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:915: in start
> node.watch_log_for_alive(self, from_mark=mark)
> ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:684: in 
> watch_log_for_alive
> self.watch_log_for(tofind, from_mark=from_mark, timeout=timeout, 
> filename=filename)
> ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:608: in watch_log_for
> TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name,
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> start = 1712173052.8186479, timeout = 120
> msg = "Missing: ['127.0.0.4:7000.* is now UP'] not found in system.log:\n 
> Head: INFO  [Messaging-EventLoop-3-1] 2024-04-03 1...[OptionalTasks:1] 
> 2024-04-03 19:39:30,454 CassandraRoleManager.java:484 - Setup task failed 
> with error, rescheduling\n"
> node = 'node1'
> @staticmethod
> def raise_if_passed(start, timeout, msg, node=None):
> if start + timeout < time.time():
> >   raise TimeoutError.create(start, timeout, msg, node)
> E   ccmlib.node.TimeoutError: 03 Apr 2024 19:39:32 [node1] after 
> 120.11/120 seconds Missing: ['127.0.0.4:7000.* is now UP'] not found in 
> system.log:
> EHead: INFO  [Messaging-EventLoop-3-1] 2024-04-03 19:37:3
> ETail: ... some nodes were not ready
> E   INFO  [OptionalTasks:1] 2024-04-03 19:39:30,454 
> CassandraRoleManager.java:484 - Setup task failed with error, rescheduling
> ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:56: TimeoutError
> {code}
> https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2680/workflows/8b1c0d0a-7458-4b43-9bba-ac96b9bfe64f/jobs/58929/tests#failed-test-0
> https://ci-cassandra.apache.org/job/Cassandra-trunk/1859/#showFailuresLink



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19538) Test Failure: test_assassinate_valid_node

2024-04-17 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19538:

Reviewers: Sam Tunnicliffe  (was: Sam Tunnicliffe)
   Status: Review In Progress  (was: Patch Available)

> Test Failure: test_assassinate_valid_node
> -
>
> Key: CASSANDRA-19538
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19538
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI, Test/dtest/python
>Reporter: Ekaterina Dimitrova
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Failing consistently on trunk:
> {code:java}
> ccmlib.node.TimeoutError: 03 Apr 2024 19:39:32 [node1] after 120.11/120 
> seconds Missing: ['127.0.0.4:7000.* is now UP'] not found in system.log:
>  Head: INFO  [Messaging-EventLoop-3-1] 2024-04-03 19:37:3
>  Tail: ... some nodes were not ready
> INFO  [OptionalTasks:1] 2024-04-03 19:39:30,454 CassandraRoleManager.java:484 
> - Setup task failed with error, rescheduling
> self = 
> def test_assassinate_valid_node(self):
> """
> @jira_ticket CASSANDRA-16588
> Test that after taking two non-seed nodes down and assassinating
> one of them, the other can come back up.
> """
> cluster = self.cluster
> 
> cluster.populate(5).start()
> node1 = cluster.nodelist()[0]
> node3 = cluster.nodelist()[2]
> 
> self.cluster.set_configuration_options({
> 'seed_provider': [{'class_name': 
> 'org.apache.cassandra.locator.SimpleSeedProvider',
>'parameters': [{'seeds': node1.address()}]
>   }]
> })
> 
> non_seed_nodes = cluster.nodelist()[-2:]
> for node in non_seed_nodes:
> node.stop()
> 
> assassination_target = non_seed_nodes[0]
> logger.debug("Assassinating non-seed node 
> {}".format(assassination_target.address()))
> out, err, _ = node1.nodetool("assassinate 
> {}".format(assassination_target.address()))
> assert_stderr_clean(err)
> 
> logger.debug("Starting non-seed nodes")
> for node in non_seed_nodes:
> >   node.start()
> gossip_test.py:78: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:915: in start
> node.watch_log_for_alive(self, from_mark=mark)
> ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:684: in 
> watch_log_for_alive
> self.watch_log_for(tofind, from_mark=from_mark, timeout=timeout, 
> filename=filename)
> ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:608: in watch_log_for
> TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name,
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> start = 1712173052.8186479, timeout = 120
> msg = "Missing: ['127.0.0.4:7000.* is now UP'] not found in system.log:\n 
> Head: INFO  [Messaging-EventLoop-3-1] 2024-04-03 1...[OptionalTasks:1] 
> 2024-04-03 19:39:30,454 CassandraRoleManager.java:484 - Setup task failed 
> with error, rescheduling\n"
> node = 'node1'
> @staticmethod
> def raise_if_passed(start, timeout, msg, node=None):
> if start + timeout < time.time():
> >   raise TimeoutError.create(start, timeout, msg, node)
> E   ccmlib.node.TimeoutError: 03 Apr 2024 19:39:32 [node1] after 
> 120.11/120 seconds Missing: ['127.0.0.4:7000.* is now UP'] not found in 
> system.log:
> EHead: INFO  [Messaging-EventLoop-3-1] 2024-04-03 19:37:3
> ETail: ... some nodes were not ready
> E   INFO  [OptionalTasks:1] 2024-04-03 19:39:30,454 
> CassandraRoleManager.java:484 - Setup task failed with error, rescheduling
> ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:56: TimeoutError
> {code}
> https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2680/workflows/8b1c0d0a-7458-4b43-9bba-ac96b9bfe64f/jobs/58929/tests#failed-test-0
> https://ci-cassandra.apache.org/job/Cassandra-trunk/1859/#showFailuresLink



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19514) When jvm-dtest is shutting down an instance TCM retries block the shutdown causing the test to fail

2024-04-17 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19514:

Reviewers: Alex Petrov, Blake Eggleston, Marcus Eriksson  (was: Blake 
Eggleston)

> When jvm-dtest is shutting down an instance TCM retries block the shutdown 
> causing the test to fail
> ---
>
> Key: CASSANDRA-19514
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19514
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership, Test/dtest/java
>Reporter: David Capwell
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.1
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest#testRequestingPeerWatermarks
> {code}
> java.lang.RuntimeException: java.util.concurrent.TimeoutException
>org.apache.cassandra.utils.Throwables.maybeFail(Throwables.java:79)
>
> org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:540)
>
> org.apache.cassandra.distributed.impl.AbstractCluster.close(AbstractCluster.java:1098)
>
> org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest.testRequestingPeerWatermarks(RequestCurrentEpochTest.java:77)
>java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  Caused by: java.util.concurrent.TimeoutException
>
> org.apache.cassandra.utils.concurrent.AbstractFuture.get(AbstractFuture.java:253)
>
> org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:532) 
> Suppressed: java.util.concurrent.TimeoutException
> {code}
> In debugger I found the blocked future and it was 
> src/java/org/apache/cassandra/tcm/EpochAwareDebounce.java waiting on 
> src/java/org/apache/cassandra/tcm/RemoteProcessor.java retries
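
As a rough illustration of the failure mode (a toy model, not the actual
{{EpochAwareDebounce}}/{{RemoteProcessor}} code): a future that is kept alive by
an endless retry loop never completes, so a bounded wait on outstanding futures
during instance shutdown times out.

{code:java}
// Toy model: a request future that is retried forever never completes, so a
// bounded waitOnFutures-style call at shutdown times out.
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class RetryBlocksShutdown
{
    static final ScheduledExecutorService RETRIES = Executors.newSingleThreadScheduledExecutor();

    // keeps rescheduling itself while the (unreachable) peer never answers
    static CompletableFuture<String> fetchWithRetries(CompletableFuture<String> result)
    {
        RETRIES.schedule(() -> { fetchWithRetries(result); }, 100, TimeUnit.MILLISECONDS);
        return result; // never completed in this model
    }

    public static void main(String[] args) throws Exception
    {
        CompletableFuture<String> inFlight = fetchWithRetries(new CompletableFuture<>());
        try
        {
            // analogous to waiting on outstanding futures while closing the instance
            inFlight.get(1, TimeUnit.SECONDS);
        }
        catch (TimeoutException e)
        {
            System.out.println("shutdown blocked by in-flight retries: " + e);
        }
        finally
        {
            // a fix has to cancel or complete in-flight retries when shutting down
            inFlight.cancel(true);
            RETRIES.shutdownNow();
        }
    }
}
{code}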



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19514) When jvm-dtest is shutting down an instance TCM retries block the shutdown causing the test to fail

2024-04-17 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838035#comment-17838035
 ] 

Sam Tunnicliffe commented on CASSANDRA-19514:
-

Trunk PR and CI results attached. The python dtest failure is CASSANDRA-19538; 
of the 2 python upgrade test failures, one looks like CASSANDRA-19520 and the 
other may be timeout related.

> When jvm-dtest is shutting down an instance TCM retries block the shutdown 
> causing the test to fail
> ---
>
> Key: CASSANDRA-19514
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19514
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership, Test/dtest/java
>Reporter: David Capwell
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.1
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest#testRequestingPeerWatermarks
> {code}
> java.lang.RuntimeException: java.util.concurrent.TimeoutException
>org.apache.cassandra.utils.Throwables.maybeFail(Throwables.java:79)
>
> org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:540)
>
> org.apache.cassandra.distributed.impl.AbstractCluster.close(AbstractCluster.java:1098)
>
> org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest.testRequestingPeerWatermarks(RequestCurrentEpochTest.java:77)
>java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  Caused by: java.util.concurrent.TimeoutException
>
> org.apache.cassandra.utils.concurrent.AbstractFuture.get(AbstractFuture.java:253)
>
> org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:532) 
> Suppressed: java.util.concurrent.TimeoutException
> {code}
> In debugger I found the blocked future and it was 
> src/java/org/apache/cassandra/tcm/EpochAwareDebounce.java waiting on 
> src/java/org/apache/cassandra/tcm/RemoteProcessor.java retries



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19514) When jvm-dtest is shutting down an instance TCM retries block the shutdown causing the test to fail

2024-04-17 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19514:

Attachment: ci_summary.html
result_details.tar.gz

> When jvm-dtest is shutting down an instance TCM retries block the shutdown 
> causing the test to fail
> ---
>
> Key: CASSANDRA-19514
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19514
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership, Test/dtest/java
>Reporter: David Capwell
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.1
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest#testRequestingPeerWatermarks
> {code}
> java.lang.RuntimeException: java.util.concurrent.TimeoutException
>org.apache.cassandra.utils.Throwables.maybeFail(Throwables.java:79)
>
> org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:540)
>
> org.apache.cassandra.distributed.impl.AbstractCluster.close(AbstractCluster.java:1098)
>
> org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest.testRequestingPeerWatermarks(RequestCurrentEpochTest.java:77)
>java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  Caused by: java.util.concurrent.TimeoutException
>
> org.apache.cassandra.utils.concurrent.AbstractFuture.get(AbstractFuture.java:253)
>
> org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:532) 
> Suppressed: java.util.concurrent.TimeoutException
> {code}
> In debugger I found the blocked future and it was 
> src/java/org/apache/cassandra/tcm/EpochAwareDebounce.java waiting on 
> src/java/org/apache/cassandra/tcm/RemoteProcessor.java retries



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19514) When jvm-dtest is shutting down an instance TCM retries block the shutdown causing the test to fail

2024-04-17 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19514:

Status: Patch Available  (was: In Progress)

> When jvm-dtest is shutting down an instance TCM retries block the shutdown 
> causing the test to fail
> ---
>
> Key: CASSANDRA-19514
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19514
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership, Test/dtest/java
>Reporter: David Capwell
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest#testRequestingPeerWatermarks
> {code}
> java.lang.RuntimeException: java.util.concurrent.TimeoutException
>org.apache.cassandra.utils.Throwables.maybeFail(Throwables.java:79)
>
> org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:540)
>
> org.apache.cassandra.distributed.impl.AbstractCluster.close(AbstractCluster.java:1098)
>
> org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest.testRequestingPeerWatermarks(RequestCurrentEpochTest.java:77)
>java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  Caused by: java.util.concurrent.TimeoutException
>
> org.apache.cassandra.utils.concurrent.AbstractFuture.get(AbstractFuture.java:253)
>
> org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:532) 
> Suppressed: java.util.concurrent.TimeoutException
> {code}
> In debugger I found the blocked future and it was 
> src/java/org/apache/cassandra/tcm/EpochAwareDebounce.java waiting on 
> src/java/org/apache/cassandra/tcm/RemoteProcessor.java retries



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2024-04-15 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837201#comment-17837201
 ] 

Sam Tunnicliffe commented on CASSANDRA-12937:
-

So you're suggesting that we could have different default values in yaml across 
the cluster, but that all nodes actually apply the same value regardless of 
their own configured default? Which specific value makes it into schema would 
just depend on which instance acts as the coordinator for a given DDL statement? 
It seems like if we actually want these to be cluster-wide values and not 
configurable on a per-node basis, the defaults themselves should be in TCM, 
independently of the schema transformations. In the past, we've used per-node 
configuration like this to experiment with new compression algorithms on a 
per-node basis, and I can imagine potentially wanting to do the same with things 
like compaction, so I'm not entirely convinced that this assumption is correct. 

As far as the serialization format goes, schema transformations have to be 
round-trippable via CQL for the purposes of recreating a schema from a 
snapshot. So I don't think that using the CQL itself as the format is 
inherently flawed, and it does have a couple of big positives, namely that it's 
great for visibility (for operators or when debugging) and that it doesn't 
invent a new format that we have to version and manage as new 
flags/features/defaults are added. 
   
We would just need to fully resolve and expand a DDL statement before serializing 
it in {{SchemaAlteringStatement}}, which would be entirely possible, but I 
remain unconvinced that just picking the defaults from whatever node happens to 
be coordinating is the right way to go.   

> Default setting (yaml) for SSTable compression
> --
>
> Key: CASSANDRA-12937
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12937
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Config
>Reporter: Michael Semb Wever
>Assignee: Stefan Miklosovic
>Priority: Low
>  Labels: AdventCalendar2021
> Fix For: 5.x
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> In many situations the choice of compression for sstables is more relevant to 
> the disks attached than to the schema and data.
> This issue is to add to cassandra.yaml a default value for sstable 
> compression that new tables will inherit (instead of the defaults found in 
> {{CompressionParams.DEFAULT}}).
> Examples where this can be relevant are filesystems that do on-the-fly 
> compression (btrfs, zfs) or specific disk configurations or even specific C* 
> versions (see CASSANDRA-10995 ).
> +Additional information for newcomers+
> Some new fields need to be added to {{cassandra.yaml}} to allow specifying 
> the field required for defining the default compression parameters. In 
> {{DatabaseDescriptor}} a new {{CompressionParams}} field should be added for 
> the default compression. This field should be initialized in 
> {{DatabaseDescriptor.applySimpleConfig()}}. At the different places where 
> {{CompressionParams.DEFAULT}} was used the code should call 
> {{DatabaseDescriptor#getDefaultCompressionParams}} that should return some 
> copy of configured {{CompressionParams}}.
> Some unit test using {{OverrideConfigurationLoader}} should be used to test 
> that the table schema uses the new default when a new table is created (see 
> CreateTest for an example).
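
A minimal sketch of the wiring described in the paragraph above, using stand-in
classes rather than the real {{cassandra.yaml}}/{{DatabaseDescriptor}} code (the
field and method names are illustrative, the final patch may differ):

{code:java}
// Illustrative sketch only: models the described wiring with stand-in classes.
import java.util.LinkedHashMap;
import java.util.Map;

public class DefaultCompressionSketch
{
    static final class CompressionParams
    {
        static final CompressionParams DEFAULT = new CompressionParams("LZ4Compressor", Map.of());

        final String className;
        final Map<String, String> options;

        CompressionParams(String className, Map<String, String> options)
        {
            this.className = className;
            this.options = new LinkedHashMap<>(options);
        }

        CompressionParams copy()
        {
            return new CompressionParams(className, options);
        }
    }

    // stand-in for the new cassandra.yaml entries
    static final class Config
    {
        String sstable_compression_class;                        // e.g. "ZstdCompressor"
        Map<String, String> sstable_compression_options = Map.of();
    }

    // stand-in for DatabaseDescriptor: resolved once when the config is applied
    static CompressionParams defaultCompression = CompressionParams.DEFAULT;

    static void applySimpleConfig(Config conf)
    {
        if (conf.sstable_compression_class != null)
            defaultCompression = new CompressionParams(conf.sstable_compression_class,
                                                       conf.sstable_compression_options);
    }

    // callers that previously used CompressionParams.DEFAULT ask for a copy instead
    static CompressionParams getDefaultCompressionParams()
    {
        return defaultCompression.copy();
    }

    public static void main(String[] args)
    {
        Config conf = new Config();
        conf.sstable_compression_class = "ZstdCompressor";
        applySimpleConfig(conf);
        System.out.println(getDefaultCompressionParams().className); // ZstdCompressor
    }
}
{code}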



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2024-04-15 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837179#comment-17837179
 ] 

Sam Tunnicliffe edited comment on CASSANDRA-12937 at 4/15/24 11:11 AM:
---

The problem with that is that the defaults may be different on every instance, 
so what exactly should be stored in the TCM log? Ideally we should store the 
value that is actually resolved during initial execution on each node so that 
it can be re-used if/when the transformation is reapplied. That should probably 
be in a parallel local datastructure though, not in the node's local log table 
as we don't want to ship those local defaults to peers when providing log 
catchup (because they should use their own defaults).  


was (Author: beobal):
The problem with that is that the defaults may be different on every instance, 
so what exactly should be stored in the TCM log? Ideally we should store the 
value that is actually resolved during initial execution on each node to be 
persisted locally so that it can be re-used if/when the transformation is 
reapplied. That should probably be in a parallel local datastructure though, 
not in the node's local log table as we don't want to ship those local defaults 
to peers when providing log catchup (because they should use their own 
defaults).  

> Default setting (yaml) for SSTable compression
> --
>
> Key: CASSANDRA-12937
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12937
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Config
>Reporter: Michael Semb Wever
>Assignee: Stefan Miklosovic
>Priority: Low
>  Labels: AdventCalendar2021
> Fix For: 5.x
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> In many situations the choice of compression for sstables is more relevant to 
> the disks attached than to the schema and data.
> This issue is to add to cassandra.yaml a default value for sstable 
> compression that new tables will inherit (instead of the defaults found in 
> {{CompressionParams.DEFAULT}}).
> Examples where this can be relevant are filesystems that do on-the-fly 
> compression (btrfs, zfs) or specific disk configurations or even specific C* 
> versions (see CASSANDRA-10995 ).
> +Additional information for newcomers+
> Some new fields need to be added to {{cassandra.yaml}} to allow specifying 
> the field required for defining the default compression parameters. In 
> {{DatabaseDescriptor}} a new {{CompressionParams}} field should be added for 
> the default compression. This field should be initialized in 
> {{DatabaseDescriptor.applySimpleConfig()}}. At the different places where 
> {{CompressionParams.DEFAULT}} was used the code should call 
> {{DatabaseDescriptor#getDefaultCompressionParams}} that should return some 
> copy of configured {{CompressionParams}}.
> Some unit test using {{OverrideConfigurationLoader}} should be used to test 
> that the table schema uses the new default when a new table is created (see 
> CreateTest for an example).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2024-04-15 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837179#comment-17837179
 ] 

Sam Tunnicliffe commented on CASSANDRA-12937:
-

The problem with that is that the defaults may be different on every instance, 
so what exactly should be stored in the TCM log? Ideally we should store the 
value that is actually resolved during initial execution on each node to be 
persisted locally so that it can be re-used if/when the transformation is 
reapplied. That should probably be in a parallel local datastructure though, 
not in the node's local log table as we don't want to ship those local defaults 
to peers when providing log catchup (because they should use their own 
defaults).  

> Default setting (yaml) for SSTable compression
> --
>
> Key: CASSANDRA-12937
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12937
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Config
>Reporter: Michael Semb Wever
>Assignee: Stefan Miklosovic
>Priority: Low
>  Labels: AdventCalendar2021
> Fix For: 5.x
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> In many situations the choice of compression for sstables is more relevant to 
> the disks attached than to the schema and data.
> This issue is to add to cassandra.yaml a default value for sstable 
> compression that new tables will inherit (instead of the defaults found in 
> {{CompressionParams.DEFAULT}}).
> Examples where this can be relevant are filesystems that do on-the-fly 
> compression (btrfs, zfs) or specific disk configurations or even specific C* 
> versions (see CASSANDRA-10995 ).
> +Additional information for newcomers+
> Some new fields need to be added to {{cassandra.yaml}} to allow specifying 
> the field required for defining the default compression parameters. In 
> {{DatabaseDescriptor}} a new {{CompressionParams}} field should be added for 
> the default compression. This field should be initialized in 
> {{DatabaseDescriptor.applySimpleConfig()}}. At the different places where 
> {{CompressionParams.DEFAULT}} was used the code should call 
> {{DatabaseDescriptor#getDefaultCompressionParams}} that should return some 
> copy of configured {{CompressionParams}}.
> Some unit test using {{OverrideConfigurationLoader}} should be used to test 
> that the table schema uses the new default when a new table is created (see 
> CreateTest for an example).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18954) Transformations should be pure so that replaying them results in the same outcome regardless of the node state or configuration

2024-04-15 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837177#comment-17837177
 ] 

Sam Tunnicliffe commented on CASSANDRA-18954:
-

[~jlewandowski] 
bq.  CASSANDRA-12937 problem is caused by the fact that the transformations are 
not pure. It is not enough that they are side-effects-free, they also cannot 
depend on any external properties other than the current cluster state and the 
store transformation data.

Yes, this is exactly what I mean, and we're looking into a proper fix for this 
now.
 

> Transformations should be pure so that replaying them results in the same 
> outcome regardless of the node state or configuration
> ---
>
> Key: CASSANDRA-18954
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18954
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
>
> Discussed on Slack



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-19514) When jvm-dtest is shutting down an instance TCM retries block the shutdown causing the test to fail

2024-04-12 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe reassigned CASSANDRA-19514:
---

Assignee: Sam Tunnicliffe  (was: David Capwell)

> When jvm-dtest is shutting down an instance TCM retries block the shutdown 
> causing the test to fail
> ---
>
> Key: CASSANDRA-19514
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19514
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership, Test/dtest/java
>Reporter: David Capwell
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest#testRequestingPeerWatermarks
> {code}
> java.lang.RuntimeException: java.util.concurrent.TimeoutException
>org.apache.cassandra.utils.Throwables.maybeFail(Throwables.java:79)
>
> org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:540)
>
> org.apache.cassandra.distributed.impl.AbstractCluster.close(AbstractCluster.java:1098)
>
> org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest.testRequestingPeerWatermarks(RequestCurrentEpochTest.java:77)
>java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  Caused by: java.util.concurrent.TimeoutException
>
> org.apache.cassandra.utils.concurrent.AbstractFuture.get(AbstractFuture.java:253)
>
> org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:532) 
> Suppressed: java.util.concurrent.TimeoutException
> {code}
> In debugger I found the blocked future and it was 
> src/java/org/apache/cassandra/tcm/EpochAwareDebounce.java waiting on 
> src/java/org/apache/cassandra/tcm/RemoteProcessor.java retries



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19344) Range movements involving transient replicas must safely enact changes to read and write replica sets

2024-04-11 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19344:

Attachment: remove-n4-pre-19344.txt
remove-n4-post-19344.txt

> Range movements involving transient replicas must safely enact changes to 
> read and write replica sets
> -
>
> Key: CASSANDRA-19344
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19344
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html, remove-n4-post-19344.txt, 
> remove-n4-pre-19344.txt, result_details.tar.gz
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> (edit) This was originally opened due to a flaky test 
> {{org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode-_jdk17}}
> The test can fail in two different ways:
> {code:java}
> junit.framework.AssertionFailedError: NOT IN CURRENT: 31 -- [(00,20), 
> (31,50)] at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.assertAllContained(TransientRangeMovementTest.java:203)
>  at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode(TransientRangeMovementTest.java:183)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code}
> as in here - 
> [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2639/workflows/32b92ce7-5e9d-4efb-8362-d200d2414597/jobs/55139/tests#failed-test-0]
> and
> {code:java}
> junit.framework.AssertionFailedError: nodetool command [removenode, 
> 6d194555-f6eb-41d0-c000-0003, --force] was not successful stdout: 
> stderr: error: Node /127.0.0.4:7012 is alive and owns this ID. Use 
> decommission command to remove it from the ring -- StackTrace -- 
> java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and 
> owns this ID. Use decommission command to remove it from the ring at 
> org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
>  at 
> org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
>  at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at 
> org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388)
>  at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at 
> org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at 
> org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
>  at 
> org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038)
>  at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>  at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  at java.base/java.lang.Thread.run(Thread.java:833) Notifications: Error: 
> java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and 
> owns this ID. Use decommission command to remove it from the ring at 
> org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
>  at 
> org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
>  at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at 
> org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388)
>  at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at 
> org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at 
> org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
>  at 
> org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038)
>  at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>  at 
> 

[jira] [Commented] (CASSANDRA-19344) Range movements involving transient replicas must safely enact changes to read and write replica sets

2024-04-11 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836302#comment-17836302
 ] 

Sam Tunnicliffe commented on CASSANDRA-19344:
-

The actual cause was that the way we construct placement deltas for a 
PlacementTransitionPlan did not properly consider transientness. Multi-step 
operations always follow the pattern:

* add new write replicas
* add new read replicas/remove old read replicas
* remove old write replicas

So when an operation causes a replica to transition from TRANSIENT to FULL for 
the same range (or part of a range), it could become a FULL read replica before 
becoming a FULL write replica.
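
Before the concrete example below, here is a minimal model of the safety rule at
stake (illustrative code, not the TCM implementation): for any range, a node
should already be a FULL write replica by the time it is promoted to a FULL read
replica.

{code:java}
// Simplified model of the safety rule: promote writes to FULL before (or
// together with) promoting reads to FULL for the same range.
public class PlacementOrderingSketch
{
    enum Replication { NONE, TRANSIENT, FULL }

    static final class RangePlacement
    {
        Replication writes = Replication.TRANSIENT;
        Replication reads = Replication.TRANSIENT;

        // the invariant violated before the fix:
        // never FULL for reads without already being FULL for writes
        void set(Replication newWrites, Replication newReads)
        {
            if (newReads == Replication.FULL && newWrites != Replication.FULL)
                throw new IllegalStateException("range would be read as FULL before it receives all writes");
            writes = newWrites;
            reads = newReads;
        }
    }

    public static void main(String[] args)
    {
        // N2's placement for (30,40] while N4 is being removed
        RangePlacement range30to40 = new RangePlacement();
        range30to40.set(Replication.FULL, Replication.TRANSIENT); // START_LEAVE(N4): writes first
        range30to40.set(Replication.FULL, Replication.FULL);      // MID_LEAVE(N4): reads follow
        System.out.println("safe ordering applied");

        // the pre-fix ordering (reads promoted before writes) would throw here:
        // new RangePlacement().set(Replication.TRANSIENT, Replication.FULL);
    }
}
{code}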
Consider this simplified example where we remove N4 and the effect on N2:

{code}
RF=3/1
At START

      10        20        30        40
 +----+---------+---------+---------+----+
      N1        N2        N3        N4

N2 replicates:
  (10,20]       - FULL (Primary Range)
  (,10] + (40,] - FULL
  (30,40]       - TRANSIENT

After FINISH

      10        20        30
 +----+---------+---------+--------------+
      N1        N2        N3

N2 replicates:
  (10,20]       - FULL (Primary Range)
  (,10] + (30,] - FULL
  (20,30]       - TRANSIENT

In removing N4, N2 gains (20,30] TRANSIENT and (30,40] TRANSIENT -> FULL

Potential problem ->
  for READS  N2 becomes FULL(30,40] after MID_LEAVE
  for WRITES N2 only becomes FULL(30,40] after FINISH_LEAVE
  so between the 2 events, coordinators will not send writes to N2 unless one
  of the other replicas is unresponsive.
  Coordinators will send reads to N2 during this window though.
  If cleanup is run before N2 becomes a FULL replica for (30,40], any data
  for that range (including that which was just streamed to it) will be purged.
{code}

Below is an illustration of the ranges replicated by N2 at each step:

{code}
+-----+------------------------+------------------------------------------------------------------------+
|EPOCH| STATE                  | RANGES REPLICATED BY N2                                                |
|-----+------------------------+------------------------------------------------------------------------|
|  0  | START STATE            | WRITES -> FULL: [(40,], (,10], (10,20]] TRANSIENT: [(30,40]]           |
|     |                        |  READS -> FULL: [(40,], (,10], (10,20]] TRANSIENT: [(30,40]]           |
|-----+------------------------+------------------------------------------------------------------------|
|  1  | ENACT START_LEAVE(N4)  | WRITES -> FULL: [(40,], (,10], (10,20]] TRANSIENT: [(20,30], (30,40]]  |
|     |                        |  READS -> FULL: [(40,], (,10], (10,20]] TRANSIENT: [(30,40]]           |
|-----+------------------------+------------------------------------------------------------------------|
|  2  | ENACT MID_LEAVE(N4)    | WRITES -> FULL: [(40,], (,10], (10,20]] TRANSIENT: [(20,30], (30,40]]  |
|     |                        |  READS -> FULL: [(40,], (,10], (10,20], (30,40]] TRANSIENT: [(20,30]]  |
|-----+------------------------+------------------------------------------------------------------------|
|  3  | ENACT FINISH_LEAVE(N4) | WRITES -> FULL: [(30,], (,10], (10,20]] TRANSIENT: [(20,30]]           |
|     |                        |  READS -> FULL: [(30,], (,10], (10,20]] TRANSIENT: [(20,30]]           |
+-----+------------------------+------------------------------------------------------------------------+
{code}

After applying the fix here, the transition of {{(30,40]}} from {{TRANSIENT}} to 
{{FULL}} for writes becomes part of enacting {{START_LEAVE(N4)}} in epoch 1, i.e. 
before N2 becomes a FULL replica for reads of {{(30,40]}} when {{MID_LEAVE(N4)}} 
is enacted in epoch 2. 

{code}
+-----+------------------------+------------------------------------------------------------------------+
|EPOCH| STATE                  | RANGES REPLICATED BY N2                                                |
|-----+------------------------+------------------------------------------------------------------------|
|  0  | START STATE            | WRITES -> FULL: [(40,], (,10], (10,20]] TRANSIENT: [(30,40]]           |
|     |                        |  READS -> FULL: [(40,], (,10], (10,20]] TRANSIENT: [(30,40]]           |
|-----+------------------------+------------------------------------------------------------------------|
|  1  | ENACT START_LEAVE(N4)  | WRITES -> FULL: [(40,], (,10], (10,20], (30,40]] TRANSIENT: [(20,30]]  |
|     |                        |  READS -> FULL: [(40,], (,10], (10,20]] TRANSIENT: [(30,40]]           |

[jira] [Commented] (CASSANDRA-19516) Use Transformation.Kind.id in local and distributed log tables

2024-04-11 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836280#comment-17836280
 ] 

Sam Tunnicliffe commented on CASSANDRA-19516:
-

+1

> Use Transformation.Kind.id in local and distributed log tables
> --
>
> Key: CASSANDRA-19516
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19516
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Attachments: ci_summary.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We should store {{Kind.id}}, added in CASSANDRA-19390, in the local and 
> distributed log tables. The virtual table will still do the id -> string lookup 
> for easier reading.
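
For illustration only (the enum values below are placeholders, not the actual
{{Transformation.Kind}} definition), the stable-id pattern being described looks
roughly like this:

{code:java}
// Illustrative only: a stable numeric id persisted in the log tables, with an
// id -> kind lookup used when rendering the virtual table.
import java.util.HashMap;
import java.util.Map;

public enum KindSketch
{
    // placeholder values, not the real Transformation.Kind entries
    PRE_INITIALIZE_CMS(0),
    INITIALIZE_CMS(1),
    ALTER_SCHEMA(2);

    private final int id;

    KindSketch(int id) { this.id = id; }

    public int id() { return id; }

    private static final Map<Integer, KindSketch> BY_ID = new HashMap<>();
    static
    {
        for (KindSketch k : values())
            BY_ID.put(k.id(), k);
    }

    // log tables persist the int; the virtual table maps it back to a name
    public static String nameForId(int id)
    {
        KindSketch k = BY_ID.get(id);
        return k == null ? "UNKNOWN(" + id + ")" : k.name();
    }

    public static void main(String[] args)
    {
        System.out.println(nameForId(2)); // ALTER_SCHEMA
    }
}
{code}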



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19516) Use Transformation.Kind.id in local and distributed log tables

2024-04-11 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19516:

Status: Review In Progress  (was: Patch Available)

> Use Transformation.Kind.id in local and distributed log tables
> --
>
> Key: CASSANDRA-19516
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19516
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Attachments: ci_summary.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We should store {{Kind.id}}, added in CASSANDRA-19390, in the local and 
> distributed log tables. The virtual table will still do the id -> string lookup 
> for easier reading.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19482) Simplify metadata log implementation using custom partitioner

2024-04-11 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19482:

Source Control Link: 
https://github.com/apache/cassandra/commit/728b9ec4c604f6939facf62a261ca795ef6dbf0c
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> Simplify metadata log implementation using custom partitioner
> -
>
> Key: CASSANDRA-19482
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19482
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The distributed metadata log table can be simplified by leveraging the fact 
> that replicas are all responsible for the entire token range. Given this 
> assumption, we can then use {{ReversedLongLocalPartitioner}} introduced in 
> CASSANDRA-19391 to make it much easier to append to/read from the tail of the 
> log, effectively removing the need for the {{Period}} construct. This will 
> also apply to the local metadata log used at startup.  
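
A sketch of the general idea (the key arithmetic is an assumption for
illustration, not the exact {{ReversedLongLocalPartitioner}} implementation):
mapping epoch {{e}} to key {{Long.MAX_VALUE - e}} means the newest entries sort
first, so reading the tail of the log becomes a scan from the head of the natural
ordering, with no {{Period}} bookkeeping.

{code:java}
// Sketch of the idea only: newest epochs get the smallest keys, so "read the
// tail of the log" is a scan from the head of the natural key ordering.
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.stream.Collectors;

public class ReversedLogSketch
{
    // larger epoch -> smaller key, so the newest entry sorts first
    static long keyFor(long epoch)
    {
        return Long.MAX_VALUE - epoch;
    }

    private final ConcurrentSkipListMap<Long, String> log = new ConcurrentSkipListMap<>();

    void append(long epoch, String entry)
    {
        log.put(keyFor(epoch), entry);
    }

    // the tail of the log is simply the first n rows in natural key order
    List<String> tail(int n)
    {
        return log.values().stream().limit(n).collect(Collectors.toList());
    }

    public static void main(String[] args)
    {
        ReversedLogSketch sketch = new ReversedLogSketch();
        for (long epoch = 1; epoch <= 5; epoch++)
            sketch.append(epoch, "entry-" + epoch);
        System.out.println(sketch.tail(2)); // [entry-5, entry-4]
    }
}
{code}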



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19482) Simplify metadata log implementation using custom partitioner

2024-04-11 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19482:

Fix Version/s: 5.1-alpha1
   (was: 5.x)

> Simplify metadata log implementation using custom partitioner
> -
>
> Key: CASSANDRA-19482
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19482
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.1-alpha1
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The distributed metadata log table can be simplified by leveraging the fact 
> that replicas are all responsible for the entire token range. Given this 
> assumption, we can then use {{ReversedLongLocalPartitioner}} introduced in 
> CASSANDRA-19391 to make it much easier to append to/read from the tail of the 
> log, effectively removing the need for the {{Period}} construct. This will 
> also apply to the local metadata log used at startup.  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19482) Simplify metadata log implementation using custom partitioner

2024-04-11 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19482:

Status: Ready to Commit  (was: Review In Progress)

+1 from Alex on the PR

> Simplify metadata log implementation using custom partitioner
> -
>
> Key: CASSANDRA-19482
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19482
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The distributed metadata log table can be simplified by leveraging the fact 
> that replicas are all responsible for the entire token range. Given this 
> assumption, we can then use {{ReversedLongLocalPartitioner}} introduced in 
> CASSANDRA-19391 to make it much easier to append to/read from the tail of the 
> log, effectively removing the need for the {{Period}} construct. This will 
> also apply to the local metadata log used at startup.  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19482) Simplify metadata log implementation using custom partitioner

2024-04-11 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19482:

Status: Needs Committer  (was: Patch Available)

> Simplify metadata log implementation using custom partitioner
> -
>
> Key: CASSANDRA-19482
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19482
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The distributed metadata log table can be simplified by leveraging the fact 
> that replicas are all responsible for the entire token range. Given this 
> assumption, we can then use {{ReversedLongLocalPartitioner}} introduced in 
> CASSANDRA-19391 to make it much easier to append to/read from the tail of the 
> log, effectively removing the need for the {{Period}} construct. This will 
> also apply to the local metadata log used at startup.  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19482) Simplify metadata log implementation using custom partitioner

2024-04-11 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19482:

Reviewers: Alex Petrov
   Status: Review In Progress  (was: Needs Committer)

> Simplify metadata log implementation using custom partitioner
> -
>
> Key: CASSANDRA-19482
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19482
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The distributed metadata log table can be simplified by leveraging the fact 
> that replicas are all responsible for the entire token range. Given this 
> assumption, we can then use {{ReversedLongLocalPartitioner}} introduced in 
> CASSANDRA-19391 to make it much easier to append to/read from the tail of the 
> log, effectively removing the need for the {{Period}} construct. This will 
> also apply to the local metadata log used at startup.  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18954) Transformations should be pure so that replaying them results in the same outcome regardless of the node state or configuration

2024-04-10 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835805#comment-17835805
 ] 

Sam Tunnicliffe commented on CASSANDRA-18954:
-

Yep, fair enough. I wouldn't expect you to have followed those things; my 
intention was really to give a heads-up that I don't think this is an issue 
anymore and that I'd probably close it soon. That was the thinking when I made 
the original comment, before CASSANDRA-12937 highlighted the issue with 
mutating local config between restarts. I'll leave this alone until we resolve 
that, which we're working towards now.

> Transformations should be pure so that replaying them results in the same 
> outcome regardless of the node state or configuration
> ---
>
> Key: CASSANDRA-18954
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18954
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
>
> Discussed on Slack



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19482) Simplify metadata log implementation using custom partitioner

2024-04-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19482:

Attachment: ci_summary.html
result_details.tar.gz

> Simplify metadata log implementation using custom partitioner
> -
>
> Key: CASSANDRA-19482
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19482
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The distributed metadata log table can be simplified by leveraging the fact 
> that replicas are all responsible for the entire token range. Given this 
> assumption, we can then use {{ReversedLongLocalPartitioner}} introduced in 
> CASSANDRA-19391 to make it much easier to append to/read from the tail of the 
> log, effectively removing the need for the {{Period}} construct. This will 
> also apply to the local metadata log used at startup.  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19482) Simplify metadata log implementation using custom partitioner

2024-04-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19482:

Attachment: (was: result_details.tar.gz)

> Simplify metadata log implementation using custom partitioner
> -
>
> Key: CASSANDRA-19482
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19482
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The distributed metadata log table can be simplified by leveraging the fact 
> that replicas are all responsible for the entire token range. Given this 
> assumption, we can then use {{ReversedLongLocalPartitioner}} introduced in 
> CASSANDRA-19391 to make it much easier to append to/read from the tail of the 
> log, effectively removing the need for the {{Period}} construct. This will 
> also apply to the local metadata log used at startup.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19482) Simplify metadata log implementation using custom partitioner

2024-04-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19482:

Attachment: (was: ci_summary.html)

> Simplify metadata log implementation using custom partitioner
> -
>
> Key: CASSANDRA-19482
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19482
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The distributed metadata log table can be simplified by leveraging the fact 
> that replicas are all responsible for the entire token range. Given this 
> assumption, we can then use {{ReversedLongLocalPartitioner}} introduced in 
> CASSANDRA-19391 to make it much easier to append to/read from the tail of the 
> log, effectively removing the need for the {{Period}} construct. This will 
> also apply to the local metadata log used at startup.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18954) Transformations should be pure so that replaying them results in the same outcome regardless of the node state or configuration

2024-04-09 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835304#comment-17835304
 ] 

Sam Tunnicliffe commented on CASSANDRA-18954:
-

Actually, scratch that just for now - I think we need to address 
CASSANDRA-12937 first.

> Transformations should be pure so that replaying them results in the same 
> outcome regardless of the node state or configuration
> ---
>
> Key: CASSANDRA-18954
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18954
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
>
> Discussed on Slack



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19514) When jvm-dtest is shutting down an instance TCM retries block the shutdown causing the test to fail

2024-04-03 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19514:

Resolution: (was: Fixed)
Status: Open  (was: Resolved)

Reopening this as the problem is entirely relevant to trunk, so we should apply 
the patch there too.

> When jvm-dtest is shutting down an instance TCM retries block the shutdown 
> causing the test to fail
> ---
>
> Key: CASSANDRA-19514
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19514
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership, Test/dtest/java
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
> Fix For: 5.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest#testRequestingPeerWatermarks
> {code}
> java.lang.RuntimeException: java.util.concurrent.TimeoutException
>org.apache.cassandra.utils.Throwables.maybeFail(Throwables.java:79)
>
> org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:540)
>
> org.apache.cassandra.distributed.impl.AbstractCluster.close(AbstractCluster.java:1098)
>
> org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest.testRequestingPeerWatermarks(RequestCurrentEpochTest.java:77)
>java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  Caused by: java.util.concurrent.TimeoutException
>
> org.apache.cassandra.utils.concurrent.AbstractFuture.get(AbstractFuture.java:253)
>
> org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:532) 
> Suppressed: java.util.concurrent.TimeoutException
> {code}
> In the debugger I found the blocked future; it was 
> src/java/org/apache/cassandra/tcm/EpochAwareDebounce.java waiting on 
> src/java/org/apache/cassandra/tcm/RemoteProcessor.java retries.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18954) Transformations should be pure so that replaying them results in the same outcome regardless of the node state or configuration

2024-04-03 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833530#comment-17833530
 ] 

Sam Tunnicliffe commented on CASSANDRA-18954:
-

[~jlewandowski] I think that this might be obsolete now, following a few 
changes that landed before {{cep-21-tcm}} was merged, plus CASSANDRA-19271 and 
CASSANDRA-19384. Log entries replayed during startup no longer enact any side 
effects. Would you mind if I closed this?

> Transformations should be pure so that replaying them results in the same 
> outcome regardless of the node state or configuration
> ---
>
> Key: CASSANDRA-18954
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18954
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
>
> Discussed on Slack



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19271) Improve setup and initialisation of LocalLog/LogSpec

2024-04-03 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19271:

Epic Link: CASSANDRA-19055

> Improve setup and initialisation of LocalLog/LogSpec
> 
>
> Key: CASSANDRA-19271
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19271
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Urgent
> Fix For: 5.1
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-13855) Implement Http Seed provider

2024-03-22 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829941#comment-17829941
 ] 

Sam Tunnicliffe edited comment on CASSANDRA-13855 at 3/22/24 5:27 PM:
--

Not completely relevant to the discussion here, but slightly adjacent... the 
roles of both snitches and seeds have changed slightly in trunk. See 
CASSANDRA-19488, which I finally remembered to file, for more detail on 
snitches. Seeds are now really only necessary as initial contact points for 
joining nodes. Technically, they do still perform the same function regarding 
gossip convergence, but that is far less important/relevant now as we don't 
rely on gossip state for correctness. Of course, this doesn't make the goal of 
this JIRA any less valid; I just thought I should mention it. 



was (Author: beobal):
Not completely relevant to the discussion here, but slightly adjacent... the 
roles of both snitches and seeds have changed slightly in trunk. See 
CASSANDRA-XXX which I finally remembered to file for more detail on snitches. 
Seeds are now really only necessary as initial contact points for joining 
nodes. Technically, they do still perform the same function regarding gossip 
convergence, but that it is way less important/relevant now as we don't rely on 
gossip state for correctness. Of course, this doesn't make the goal of this 
JIRA any less valid, I just thought I should mention it. 


> Implement Http Seed provider
> 
>
> Key: CASSANDRA-13855
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13855
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Coordination, Legacy/Core
>Reporter: Jon Haddad
>Assignee: Claude Warren
>Priority: Low
>  Labels: lhf
> Fix For: 5.x
>
> Attachments: 0001-Add-URL-Seed-Provider-trunk.txt, signature.asc, 
> signature.asc, signature.asc
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Seems like including a dead simple seed provider that can fetch from a URL, 1 
> line per seed, would be useful.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13855) Implement Http Seed provider

2024-03-22 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829941#comment-17829941
 ] 

Sam Tunnicliffe commented on CASSANDRA-13855:
-

Not completely relevant to the discussion here, but slightly adjacent... the 
roles of both snitches and seeds have changed slightly in trunk. See 
CASSANDRA-XXX, which I finally remembered to file, for more detail on snitches. 
Seeds are now really only necessary as initial contact points for joining 
nodes. Technically, they do still perform the same function regarding gossip 
convergence, but that is far less important/relevant now as we don't rely on 
gossip state for correctness. Of course, this doesn't make the goal of this 
JIRA any less valid; I just thought I should mention it. 


> Implement Http Seed provider
> 
>
> Key: CASSANDRA-13855
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13855
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Coordination, Legacy/Core
>Reporter: Jon Haddad
>Assignee: Claude Warren
>Priority: Low
>  Labels: lhf
> Fix For: 5.x
>
> Attachments: 0001-Add-URL-Seed-Provider-trunk.txt, signature.asc, 
> signature.asc, signature.asc
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Seems like including a dead simple seed provider that can fetch from a URL, 1 
> line per seed, would be useful.
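
For illustration only, here is a minimal Java sketch of what such a provider 
could do: fetch a text document over HTTP and treat each non-empty, 
non-comment line as a seed address. The class and method names are assumptions 
and this deliberately does not implement Cassandra's real {{SeedProvider}} 
interface.

{code:java}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.InetAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: fetch a document and treat each non-empty,
// non-comment line as the address of a seed node.
public class UrlSeedFetcher
{
    public static List<InetAddress> fetchSeeds(String url) throws IOException
    {
        List<InetAddress> seeds = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                 new InputStreamReader(new URL(url).openStream(), StandardCharsets.UTF_8)))
        {
            String line;
            while ((line = in.readLine()) != null)
            {
                line = line.trim();
                if (line.isEmpty() || line.startsWith("#"))
                    continue;
                seeds.add(InetAddress.getByName(line));
            }
        }
        return seeds;
    }
}
{code}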



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19488) Ensure snitches always defer to ClusterMetadata

2024-03-22 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19488:

Change Category: Operability
 Complexity: Normal
  Fix Version/s: 5.x
 Status: Open  (was: Triage Needed)

> Ensure snitches always defer to ClusterMetadata
> ---
>
> Key: CASSANDRA-19488
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19488
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Cluster/Membership, Messaging/Internode, Transactional 
> Cluster Metadata
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
>
> Internally, C* always uses {{ClusterMetadata}} as the source of topology 
> information when calculating data placements, replica plans etc., and as such 
> the role of the snitch has been somewhat reduced. 
> Sorting and comparison functions as provided by specialisations like 
> {{DynamicEndpointSnitch}} are still used, but the snitch should only be 
> responsible for providing the DC and rack for a new node when it first joins 
> a cluster.
> Aside from initial startup and registration, snitch implementations should 
> always defer to {{ClusterMetadata}} for DC and rack, otherwise there is a 
> risk that the snitch config drifts out of sync with TCM and output from tools 
> like {{nodetool ring}} and {{gossipinfo}} becomes incorrect.
> A complication is that topology is used when opening connections to peers as 
> certain internode connection settings are variable at the DC level, so at the 
> time of connecting we want to check the location of the remote peer. Usually, 
> this is available from {{ClusterMetadata}}, but in the case of a brand 
> new node joining the cluster nothing is known a priori. The current 
> implementation assumes that the snitch will know the location of the new node 
> ahead of time, but in practice this is often not the case (though with 
> variants of {{PropertyFileSnitch}} it _should_ be), and the remote node is 
> temporarily assigned a default DC. This is problematic as it can cause the 
> internode connection settings which depend on DC to be incorrectly set. 
> Internode connections are long lived and any established while the DC is 
> unknown (potentially with incorrect config) will persist indefinitely. This 
> particular issue is not directly related to TCM and is present in earlier 
> versions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-19488) Ensure snitches always defer to ClusterMetadata

2024-03-22 Thread Sam Tunnicliffe (Jira)
Sam Tunnicliffe created CASSANDRA-19488:
---

 Summary: Ensure snitches always defer to ClusterMetadata
 Key: CASSANDRA-19488
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19488
 Project: Cassandra
  Issue Type: Improvement
  Components: Cluster/Membership, Messaging/Internode, Transactional 
Cluster Metadata
Reporter: Sam Tunnicliffe
Assignee: Sam Tunnicliffe


Internally, C* always uses {{ClusterMetadata}} as the source of topology 
information when calculating data placements, replica plans etc., and as such the 
role of the snitch has been somewhat reduced. 

Sorting and comparison functions as provided by specialisations like 
{{DynamicEndpointSnitch}} are still used, but the snitch should only be 
responsible for providing the DC and rack for a new node when it first joins a 
cluster.

Aside from initial startup and registration, snitch implementations should 
always defer to {{ClusterMetadata}} for DC and rack, otherwise there is a 
risk that the snitch config drifts out of sync with TCM and output from tools 
like {{nodetool ring}} and {{gossipinfo}} becomes incorrect.

A complication is that topology is used when opening connections to peers as 
certain internode connection settings are variable at the DC level, so at the 
time of connecting we want to check the location of the remote peer. Usually, 
this is available from {{ClusterMetadata}}, but in the case of a brand new 
node joining the cluster nothing is known a priori. The current implementation 
assumes that the snitch will know the location of the new node ahead of time, 
but in practice this is often not the case (though with variants of 
{{PropertyFileSnitch}} it _should_ be), and the remote node is temporarily 
assigned a default DC. This is problematic as it can cause the internode 
connection settings which depend on DC to be incorrectly set. Internode 
connections are long lived and any established while the DC is unknown 
(potentially with incorrect config) will persist indefinitely. This particular 
issue is not directly related to TCM and is present in earlier versions.
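
As a rough illustration of the intended lookup order only (the type and method 
names below are assumptions, not the real snitch or TCM API): prefer the 
location registered in cluster metadata, and fall back to locally configured 
values only for peers that are not yet registered, such as a brand new joining 
node.

{code:java}
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the desired behaviour: cluster metadata is the source
// of truth for DC/rack; locally configured values are only a fallback for
// peers that are not yet registered (e.g. a brand new joining node).
public class LocationResolverSketch
{
    record Location(String datacenter, String rack) {}

    private final Map<String, Location> clusterMetadata; // keyed by peer address
    private final Location localFallback;                // from snitch config

    LocationResolverSketch(Map<String, Location> clusterMetadata, Location localFallback)
    {
        this.clusterMetadata = clusterMetadata;
        this.localFallback = localFallback;
    }

    Location locationOf(String peer)
    {
        // Defer to cluster metadata whenever the peer is registered there, so
        // tool output and DC-dependent internode settings never drift from TCM.
        return Optional.ofNullable(clusterMetadata.get(peer)).orElse(localFallback);
    }
}
{code}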



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19130) Implement transactional table truncation

2024-03-22 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829898#comment-17829898
 ] 

Sam Tunnicliffe commented on CASSANDRA-19130:
-

The way truncation works is that it writes a timestamp into a system table on 
each node, associated with the table being truncated (and a commitlog 
position). Then, when local reads and writes are done against that table, any 
cells with a timestamp earlier than the truncation are essentially discarded. 
If any node misses that message and so doesn't write the timestamp, it won't do 
this filtering and so data can be resurrected. This is strictly a one-time 
operation and there's no way for a node which does miss such a message to catch 
up later, which is why {{TRUNCATE}} currently requires all nodes to be up.

With TCM, we can improve this by having an entry in the log which contains the 
truncation timestamp. Then it can be distributed to peers the same way as any 
other log entry, allowing them to catch up if they miss it. Replicas and 
coordinators participating in a read already check that they're all up to date 
with each other and attempt to catch up if not.

We shouldn't have to change how truncation works at the local level, just have 
{{TruncateStatement}} work by committing a new transform to the CMS. The 
trickiest bit will be to make sure that the {{execute}} method itself is 
side-effect free (i.e. it only produces a new ClusterMetadata). The way to do 
that is with a {{ChangeListener}} which implements a post-commit event to do 
the work of {{CFS::truncateBlocking}}.
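
A very rough sketch of that shape, with all names hypothetical (this is not 
the actual TCM API): the execute step is pure and only records the truncation 
timestamp in new metadata, while a post-commit listener performs the local, 
side-effecting truncation.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: "execute" is pure (it only returns new metadata
// containing the truncation timestamp); local side effects such as the
// equivalent of CFS::truncateBlocking happen in a post-commit listener.
public class TruncateSketch
{
    record Metadata(Map<String, Long> truncationTimes) {}

    interface Transformation { Metadata execute(Metadata current); }

    interface ChangeListener { void postCommit(Metadata prev, Metadata next); }

    static Transformation truncate(String table, long timestampMicros)
    {
        return current -> {
            Map<String, Long> times = new HashMap<>(current.truncationTimes());
            times.put(table, timestampMicros);      // pure: just new metadata
            return new Metadata(times);
        };
    }

    static ChangeListener localTruncation(String table)
    {
        return (prev, next) -> {
            Long ts = next.truncationTimes().get(table);
            if (ts != null && !ts.equals(prev.truncationTimes().get(table)))
                System.out.println("would truncate " + table + " locally at " + ts);
        };
    }
}
{code}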

> Implement transactional table truncation
> 
>
> Key: CASSANDRA-19130
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19130
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Consistency/Coordination
>Reporter: Marcus Eriksson
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> TRUNCATE table should leverage cluster metadata to ensure consistent 
> truncation timestamps across all replicas. The current implementation depends 
> on all nodes being available, but this could be reimplemented as a 
> {{Transformation}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19255) StorageService.getRangeToEndpointMap() MBean operation is running into NPE for LocalStrategy keyspaces

2024-03-22 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19255:

Attachment: 19482_patch_for_19255.diff

> StorageService.getRangeToEndpointMap() MBean operation is running into NPE 
> for LocalStrategy keyspaces
> --
>
> Key: CASSANDRA-19255
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19255
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership
>Reporter: n.v.harikrishna
>Assignee: n.v.harikrishna
>Priority: Normal
> Fix For: 5.x
>
> Attachments: 19482_patch_for_19255.diff, ci_summary.html, 
> result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When the StorageService's MBean operation getRangeToEndpointMap is called for 
> LocalStrategy keyspaces, it runs into an NPE. It works in the earlier major 
> version but fails in trunk. It can be reproduced locally using JConsole or a 
> tool like `jmxterm` (unfortunately these tools do not give the full 
> stacktrace). The same behavior is observed with the 
> getRangeToEndpointWithPortMap operation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19255) StorageService.getRangeToEndpointMap() MBean operation is running into NPE for LocalStrategy keyspaces

2024-03-22 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19255:

Status: Needs Committer  (was: Patch Available)

> StorageService.getRangeToEndpointMap() MBean operation is running into NPE 
> for LocalStrategy keyspaces
> --
>
> Key: CASSANDRA-19255
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19255
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership
>Reporter: n.v.harikrishna
>Assignee: n.v.harikrishna
>Priority: Normal
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When the StorageService's MBean operation getRangeToEndpointMap is called for 
> LocalStrategy keyspaces, it runs into an NPE. It works in the earlier major 
> version but fails in trunk. It can be reproduced locally using JConsole or a 
> tool like `jmxterm` (unfortunately these tools do not give the full 
> stacktrace). The same behavior is observed with the 
> getRangeToEndpointWithPortMap operation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19255) StorageService.getRangeToEndpointMap() MBean operation is running into NPE for LocalStrategy keyspaces

2024-03-22 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19255:

  Fix Version/s: 5.x
  Since Version: 5.x
Source Control Link: 
https://github.com/apache/cassandra/commit/a69c8657d75de627fb1fe518bfe1d657add11740
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

LGTM too, committed as {{a69c8657d75de627fb1fe518bfe1d657add11740}} (with a 
couple of extremely minor changes).
One thing to note is that this will need slight readjustment if CASSANDRA-19482 
is committed, as that changes the way we handle the {{MetaStrategy}} keyspace. 
The attached diff will fix that if/when necessary.


> StorageService.getRangeToEndpointMap() MBean operation is running into NPE 
> for LocalStrategy keyspaces
> --
>
> Key: CASSANDRA-19255
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19255
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership
>Reporter: n.v.harikrishna
>Assignee: n.v.harikrishna
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When the StorageService's MBean operation getRangeToEndpointMap is called for 
> LocalStrategy keyspaces, it runs into an NPE. It works in the earlier major 
> version but fails in trunk. It can be reproduced locally using JConsole or a 
> tool like `jmxterm` (unfortunately these tools do not give the full 
> stacktrace). The same behavior is observed with the 
> getRangeToEndpointWithPortMap operation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19255) StorageService.getRangeToEndpointMap() MBean operation is running into NPE for LocalStrategy keyspaces

2024-03-22 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19255:

Status: Ready to Commit  (was: Review In Progress)

> StorageService.getRangeToEndpointMap() MBean operation is running into NPE 
> for LocalStrategy keyspaces
> --
>
> Key: CASSANDRA-19255
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19255
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership
>Reporter: n.v.harikrishna
>Assignee: n.v.harikrishna
>Priority: Normal
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When the StorageService's MBean operation getRangeToEndpointMap is called for 
> LocalStrategy keyspaces, it runs into an NPE. It works in the earlier major 
> version but fails in trunk. It can be reproduced locally using JConsole or a 
> tool like `jmxterm` (unfortunately these tools do not give the full 
> stacktrace). The same behavior is observed with the 
> getRangeToEndpointWithPortMap operation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19255) StorageService.getRangeToEndpointMap() MBean operation is running into NPE for LocalStrategy keyspaces

2024-03-22 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19255:

Status: Review In Progress  (was: Needs Committer)

> StorageService.getRangeToEndpointMap() MBean operation is running into NPE 
> for LocalStrategy keyspaces
> --
>
> Key: CASSANDRA-19255
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19255
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership
>Reporter: n.v.harikrishna
>Assignee: n.v.harikrishna
>Priority: Normal
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When the StorageService's MBean operation getRangeToEndpointMap is called for 
> LocalStrategy keyspaces, it runs into an NPE. It works in the earlier major 
> version but fails in trunk. It can be reproduced locally using JConsole or a 
> tool like `jmxterm` (unfortunately these tools do not give the full 
> stacktrace). The same behavior is observed with the 
> getRangeToEndpointWithPortMap operation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19344) Range movements involving transient replicas must safely enact changes to read and write replica sets

2024-03-22 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829794#comment-17829794
 ] 

Sam Tunnicliffe commented on CASSANDRA-19344:
-

The linked PR modifies a number of existing tests to make the original failure 
mode deterministic. It also adds support for transient replication to 
{{PlacementSimulator}}, {{MetadataChangeSimulationTest}} and the associated 
{{TokenPlacementModel}}. Finally, it modifies the way the 
{{PlacementTransitionPlan}} is prepared for operations involving range 
movements, to ensure that any transition from a transient to a full replica 
happens safely.

> Range movements involving transient replicas must safely enact changes to 
> read and write replica sets
> -
>
> Key: CASSANDRA-19344
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19344
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> (edit) This was originally opened due to a flaky test 
> {{org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode-_jdk17}}
> The test can fail in two different ways:
> {code:java}
> junit.framework.AssertionFailedError: NOT IN CURRENT: 31 -- [(00,20), 
> (31,50)] at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.assertAllContained(TransientRangeMovementTest.java:203)
>  at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode(TransientRangeMovementTest.java:183)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code}
> as in here - 
> [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2639/workflows/32b92ce7-5e9d-4efb-8362-d200d2414597/jobs/55139/tests#failed-test-0]
> and
> {code:java}
> junit.framework.AssertionFailedError: nodetool command [removenode, 
> 6d194555-f6eb-41d0-c000-0003, --force] was not successful stdout: 
> stderr: error: Node /127.0.0.4:7012 is alive and owns this ID. Use 
> decommission command to remove it from the ring -- StackTrace -- 
> java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and 
> owns this ID. Use decommission command to remove it from the ring at 
> org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
>  at 
> org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
>  at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at 
> org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388)
>  at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at 
> org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at 
> org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
>  at 
> org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038)
>  at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>  at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  at java.base/java.lang.Thread.run(Thread.java:833) Notifications: Error: 
> java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and 
> owns this ID. Use decommission command to remove it from the ring at 
> org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
>  at 
> org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
>  at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at 
> org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388)
>  at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at 
> org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at 
> org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
>  at 
> 

[jira] [Updated] (CASSANDRA-19344) Range movements involving transient replicas must safely enact changes to read and write replica sets

2024-03-22 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19344:

Description: 
(edit) This was originally opened due to a flaky test 
{{org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode-_jdk17}}

The test can fail in two different ways:
{code:java}
junit.framework.AssertionFailedError: NOT IN CURRENT: 31 -- [(00,20), (31,50)] 
at 
org.apache.cassandra.distributed.test.TransientRangeMovementTest.assertAllContained(TransientRangeMovementTest.java:203)
 at 
org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode(TransientRangeMovementTest.java:183)
 at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method) at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
 at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code}
as in here - 
[https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2639/workflows/32b92ce7-5e9d-4efb-8362-d200d2414597/jobs/55139/tests#failed-test-0]
and
{code:java}
junit.framework.AssertionFailedError: nodetool command [removenode, 
6d194555-f6eb-41d0-c000-0003, --force] was not successful stdout: 
stderr: error: Node /127.0.0.4:7012 is alive and owns this ID. Use decommission 
command to remove it from the ring -- StackTrace -- 
java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and owns 
this ID. Use decommission command to remove it from the ring at 
org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
 at 
org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
 at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at 
org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at 
org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388) 
at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at 
org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at 
org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
 at 
org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038)
 at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at 
org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
 at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
 at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
 at java.base/java.lang.Thread.run(Thread.java:833) Notifications: Error: 
java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and owns 
this ID. Use decommission command to remove it from the ring at 
org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
 at 
org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
 at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at 
org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at 
org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388) 
at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at 
org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at 
org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
 at 
org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038)
 at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at 
org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
 at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
 at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
 at java.base/java.lang.Thread.run(Thread.java:833) at 
org.apache.cassandra.distributed.api.NodeToolResult$Asserts.fail(NodeToolResult.java:214)
 at 
org.apache.cassandra.distributed.api.NodeToolResult$Asserts.success(NodeToolResult.java:97)
 at 
org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode(TransientRangeMovementTest.java:173)
 at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method) at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
 at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code}
as in here - 

[jira] [Updated] (CASSANDRA-19344) Range movements involving transient replicas must safely enact changes to read and write replica sets

2024-03-22 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19344:

Summary: Range movements involving transient replicas must safely enact 
changes to read and write replica sets  (was: Test Failure: 
org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode-_jdk17)

> Range movements involving transient replicas must safely enact changes to 
> read and write replica sets
> -
>
> Key: CASSANDRA-19344
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19344
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The test can fail in two different ways:
> {code:java}
> junit.framework.AssertionFailedError: NOT IN CURRENT: 31 -- [(00,20), 
> (31,50)] at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.assertAllContained(TransientRangeMovementTest.java:203)
>  at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode(TransientRangeMovementTest.java:183)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code}
> as in here - 
> [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2639/workflows/32b92ce7-5e9d-4efb-8362-d200d2414597/jobs/55139/tests#failed-test-0]
> and
> {code:java}
> junit.framework.AssertionFailedError: nodetool command [removenode, 
> 6d194555-f6eb-41d0-c000-0003, --force] was not successful stdout: 
> stderr: error: Node /127.0.0.4:7012 is alive and owns this ID. Use 
> decommission command to remove it from the ring -- StackTrace -- 
> java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and 
> owns this ID. Use decommission command to remove it from the ring at 
> org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
>  at 
> org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
>  at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at 
> org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388)
>  at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at 
> org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at 
> org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
>  at 
> org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038)
>  at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>  at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  at java.base/java.lang.Thread.run(Thread.java:833) Notifications: Error: 
> java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and 
> owns this ID. Use decommission command to remove it from the ring at 
> org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
>  at 
> org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
>  at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at 
> org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388)
>  at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at 
> org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at 
> org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
>  at 
> org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038)
>  at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>  at 
> 

[jira] [Updated] (CASSANDRA-19482) Simplify metadata log implementation using custom partitioner

2024-03-22 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19482:

Attachment: ci_summary.html
result_details.tar.gz

> Simplify metadata log implementation using custom partitioner
> -
>
> Key: CASSANDRA-19482
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19482
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The distributed metadata log table can be simplified by leveraging the fact 
> that replicas are all responsible for the entire token range. Given this 
> assumption, we can then use {{ReversedLongLocalPartitioner}} introduced in 
> CASSANDRA-19391 to make it much easier to append to/read from the tail of the 
> log, effectively removing the need for the {{Period}} construct. This will 
> also apply to the local metadata log used at startup.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19482) Simplify metadata log implementation using custom partitioner

2024-03-22 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19482:

Authors: Marcus Eriksson, Sam Tunnicliffe  (was: Sam 
Tunnicliffe)
Test and Documentation Plan: 
New and existing tests in attached CI results.
The few failures/errors in the attached results are known (CASSANDRA-19343) or 
look to be infra-related.
 Status: Patch Available  (was: Open)

> Simplify metadata log implementation using custom partitioner
> -
>
> Key: CASSANDRA-19482
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19482
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The distributed metadata log table can be simplified by leveraging the fact 
> that replicas are all responsible for the entire token range. Given this 
> assumption, we can then use {{ReversedLongLocalPartitioner}} introduced in 
> CASSANDRA-19391 to make it much easier to append to/read from the tail of the 
> log, effectively removing the need for the {{Period}} construct. This will 
> also apply to the local metadata log used at startup.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19482) Simplify metadata log implementation using custom partitioner

2024-03-22 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19482:

Change Category: Code Clarity
 Complexity: Normal
  Fix Version/s: 5.x
 Status: Open  (was: Triage Needed)

> Simplify metadata log implementation using custom partitioner
> -
>
> Key: CASSANDRA-19482
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19482
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The distributed metadata log table can be simplified by leveraging the fact 
> that replicas are all responsible for the entire token range. Given this 
> assumption, we can then use {{ReversedLongLocalPartitioner}} introduced in 
> CASSANDRA-19391 to make it much easier to append to/read from the tail of the 
> log, effectively removing the need for the {{Period}} construct. This will 
> also apply to the local metadata log used at startup.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19344) Test Failure: org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode-_jdk17

2024-03-22 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19344:

Authors: Marcus Eriksson, Sam Tunnicliffe  (was: Sam 
Tunnicliffe)
Test and Documentation Plan: 
New and existing tests in attached CI results.
The few failures/errors in the attached results are known (CASSANDRA-19343) or 
look to be infra-related.

 Status: Patch Available  (was: In Progress)

When an instance moves from being a transient to a full replica for a given 
range, it must begin acting as a full replica for writes before it does so for 
reads. Otherwise, consistency can be violated: data streamed to the instance 
early in the operation can be removed by cleanup if cleanup runs before the 
instance assumes responsibility for full writes. Also, coordinators will route 
read requests to instances which may not have received all preceding writes, 
causing unnecessary read repair or potentially inconsistent results.

The root cause of the flaky test failure which originally produced this issue 
was that, in {{TransientRangeMovementTest::testRemoveNode}}, cleanup happened to 
run on one instance before it had enacted the final step of the removal 
operation, leading it to remove more data than it should have.
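
The invariant can be stated as a simple ordering constraint. Below is a minimal 
sketch with hypothetical names (not the actual {{PlacementTransitionPlan}} 
API), showing the promoted replica joining the full write set before the full 
read set, with cleanup only after the final step.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the ordering invariant: when a replica is promoted
// from transient to full, it must join the full *write* set before it joins
// the full *read* set, so it has every write by the time reads are routed to it.
public class PromotionOrderSketch
{
    enum Step { ADD_TO_FULL_WRITES, STREAM_DATA, ADD_TO_FULL_READS, RUN_CLEANUP }

    static List<Step> promotionPlan()
    {
        List<Step> plan = new ArrayList<>();
        plan.add(Step.ADD_TO_FULL_WRITES); // start receiving all writes first
        plan.add(Step.STREAM_DATA);        // catch up on pre-existing data
        plan.add(Step.ADD_TO_FULL_READS);  // only now serve full reads
        plan.add(Step.RUN_CLEANUP);        // cleanup is only safe after the final step
        return plan;
    }

    public static void main(String[] args)
    {
        assert promotionPlan().indexOf(Step.ADD_TO_FULL_WRITES)
             < promotionPlan().indexOf(Step.ADD_TO_FULL_READS);
        System.out.println(promotionPlan());
    }
}
{code}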

> Test Failure: 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode-_jdk17
> 
>
> Key: CASSANDRA-19344
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19344
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The test can fail in two different ways:
> {code:java}
> junit.framework.AssertionFailedError: NOT IN CURRENT: 31 -- [(00,20), 
> (31,50)] at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.assertAllContained(TransientRangeMovementTest.java:203)
>  at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode(TransientRangeMovementTest.java:183)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code}
> as in here - 
> [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2639/workflows/32b92ce7-5e9d-4efb-8362-d200d2414597/jobs/55139/tests#failed-test-0]
> and
> {code:java}
> junit.framework.AssertionFailedError: nodetool command [removenode, 
> 6d194555-f6eb-41d0-c000-0003, --force] was not successful stdout: 
> stderr: error: Node /127.0.0.4:7012 is alive and owns this ID. Use 
> decommission command to remove it from the ring -- StackTrace -- 
> java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and 
> owns this ID. Use decommission command to remove it from the ring at 
> org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
>  at 
> org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
>  at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at 
> org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388)
>  at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at 
> org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at 
> org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
>  at 
> org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038)
>  at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>  at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  at java.base/java.lang.Thread.run(Thread.java:833) Notifications: Error: 
> java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and 
> owns this ID. Use decommission command to remove it from the ring at 
> org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
>  at 
> org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
>  at 

[jira] [Updated] (CASSANDRA-19344) Test Failure: org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode-_jdk17

2024-03-22 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19344:

Attachment: ci_summary.html
result_details.tar.gz

> Test Failure: 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode-_jdk17
> 
>
> Key: CASSANDRA-19344
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19344
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The test can fail in two different ways:
> {code:java}
> junit.framework.AssertionFailedError: NOT IN CURRENT: 31 -- [(00,20), 
> (31,50)] at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.assertAllContained(TransientRangeMovementTest.java:203)
>  at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode(TransientRangeMovementTest.java:183)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code}
> as in here - 
> [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2639/workflows/32b92ce7-5e9d-4efb-8362-d200d2414597/jobs/55139/tests#failed-test-0]
> and
> {code:java}
> junit.framework.AssertionFailedError: nodetool command [removenode, 
> 6d194555-f6eb-41d0-c000-0003, --force] was not successful stdout: 
> stderr: error: Node /127.0.0.4:7012 is alive and owns this ID. Use 
> decommission command to remove it from the ring -- StackTrace -- 
> java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and 
> owns this ID. Use decommission command to remove it from the ring at 
> org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
>  at 
> org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
>  at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at 
> org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388)
>  at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at 
> org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at 
> org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
>  at 
> org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038)
>  at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>  at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  at java.base/java.lang.Thread.run(Thread.java:833) Notifications: Error: 
> java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and 
> owns this ID. Use decommission command to remove it from the ring at 
> org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
>  at 
> org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
>  at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at 
> org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388)
>  at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at 
> org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at 
> org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
>  at 
> org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038)
>  at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>  at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  at java.base/java.lang.Thread.run(Thread.java:833) at 
> 

[jira] [Created] (CASSANDRA-19482) Simplify metadata log implementation using custom partitioner

2024-03-20 Thread Sam Tunnicliffe (Jira)
Sam Tunnicliffe created CASSANDRA-19482:
---

 Summary: Simplify metadata log implementation using custom 
partitioner
 Key: CASSANDRA-19482
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19482
 Project: Cassandra
  Issue Type: Improvement
  Components: Transactional Cluster Metadata
Reporter: Sam Tunnicliffe
Assignee: Sam Tunnicliffe


The distributed metadata log table can be simplified by leveraging the fact 
that replicas are all responsible for the entire token range. Given this 
assumption, we can then use {{ReversedLongLocalPartitioner}} introduced in 
CASSANDRA-19391 to make it much easier to append to/read from the tail of the 
log, effectively removing the need for the {{Period}} construct. This will also 
apply to the local metadata log used at startup.
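To illustrate the idea (a hedged sketch only; the class and token function below are hypothetical and not the actual {{ReversedLongLocalPartitioner}} or log table code): if an epoch's token is its reversed long value, the newest epochs sort first, so appending to and reading the tail of the log becomes a plain forward read with no Period-style bucketing.

{code:java}
import java.util.TreeMap;

public class ReversedLogSketch
{
    // Hypothetical token function: higher epochs get smaller tokens, so the
    // most recent entries sort first.
    static long token(long epoch)
    {
        return Long.MAX_VALUE - epoch;
    }

    public static void main(String[] args)
    {
        TreeMap<Long, String> log = new TreeMap<>(); // ordered by token
        for (long epoch = 1; epoch <= 5; epoch++)
            log.put(token(epoch), "entry for epoch " + epoch);

        // Reading "everything since epoch 3" is now a simple head scan:
        // epochs 5, 4, 3 are printed, newest first.
        log.headMap(token(3), true)
           .forEach((t, entry) -> System.out.println(entry));
    }
}
{code}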



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19374) CompactionStress tool write data cannot work because of new TCM feature

2024-03-14 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19374:

 Bug Category: Parent values: Code(13163)Level 1 values: Bug - Unclear 
Impact(13164)
   Complexity: Normal
Discovered By: User Report
 Severity: Normal
   Status: Open  (was: Triage Needed)

> CompactionStress tool write data cannot work because of new TCM feature
> ---
>
> Key: CASSANDRA-19374
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19374
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/stress, Transactional Cluster Metadata
>Reporter: Ling Mao
>Assignee: Ling Mao
>Priority: Normal
> Fix For: 5.1
>
> Attachments: compactstree_tcm_code_diff.txt
>
>
> The CompactionStress tool can no longer write data because of the new TCM 
> feature:
> {code:java}
> ./compaction-stress write -d /tmp/compaction -g 5 -p 
> ../cqlstress-example.yaml -t 4 
> Cannot write anything to the disk{code}
> This was introduced by the CEP-21 patch; after applying it, the tool no 
> longer works:
> {code:java}
> Implementation of Transactional Cluster Metadata as described in CEP-21
> Hash: ae084237
> {code}
> I have attached a diff of some key related code for further debugging.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19348) Fix serialization version check in InProgressSequences

2024-03-08 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19348:

Status: Ready to Commit  (was: Review In Progress)

+1 thanks!

> Fix serialization version check in InProgressSequences
> --
>
> Key: CASSANDRA-19348
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19348
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19391) Flush metadata snapshot table on every write

2024-03-06 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17823906#comment-17823906
 ] 

Sam Tunnicliffe commented on CASSANDRA-19391:
-

+1 new CI looks good

> Flush metadata snapshot table on every write
> 
>
> Key: CASSANDRA-19391
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19391
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Low
> Fix For: 5.x
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>
> We depend on the latest snapshot when starting up; flushing avoids gaps 
> between the latest snapshot and the most recent local log entry.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19390) Transformation.Kind should contain an explicit integer id

2024-03-06 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17823905#comment-17823905
 ] 

Sam Tunnicliffe commented on CASSANDRA-19390:
-

+1 new CI looks good

> Transformation.Kind should contain an explicit integer id
> -
>
> Key: CASSANDRA-19390
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19390
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Low
> Fix For: 5.x
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
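As context for the title, the general pattern is an enum that carries an explicit, stable integer id used for (de)serialization instead of ordinal(), so reordering or inserting constants cannot silently change the wire format. A minimal sketch (illustrative names, not the actual Transformation.Kind code):

{code:java}
import java.util.HashMap;
import java.util.Map;

public enum Kind
{
    PREPARE_JOIN(1),
    START_JOIN(2),
    MID_JOIN(3),
    FINISH_JOIN(4);

    private final int id;
    private static final Map<Integer, Kind> BY_ID = new HashMap<>();

    static
    {
        for (Kind k : values())
            if (BY_ID.put(k.id, k) != null)
                throw new IllegalStateException("Duplicate id " + k.id);
    }

    Kind(int id) { this.id = id; }

    public int id() { return id; }

    // Deserialization resolves the constant by its explicit id, never by ordinal.
    public static Kind fromId(int id)
    {
        Kind k = BY_ID.get(id);
        if (k == null)
            throw new IllegalArgumentException("Unknown Kind id " + id);
        return k;
    }
}
{code}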




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19384) Avoid exposing intermediate node state during startup

2024-03-04 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19384:

Status: Ready to Commit  (was: Changes Suggested)

+1

> Avoid exposing intermediate node state during startup
> -
>
> Key: CASSANDRA-19384
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19384
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Attachments: ci_summary-1.html, ci_summary.html, 
> result_details.tar-1.gz, result_details.tar.gz
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> During startup we replay the local log, during this time we might expose 
> intermediate node states (via JMX for example).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19417) LIST SUPERUSERS cql command

2024-02-27 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821047#comment-17821047
 ] 

Sam Tunnicliffe commented on CASSANDRA-19417:
-

Sure, I don't have any objection to this in principle

> LIST SUPERUSERS cql command
> ---
>
> Key: CASSANDRA-19417
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19417
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tool/cqlsh
>Reporter: Shailaja Koppu
>Assignee: Shailaja Koppu
>Priority: Normal
>  Labels: CQL
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Developing a new CQL command, LIST SUPERUSERS, to return the list of roles with 
> superuser privilege. This includes roles that acquired superuser privilege 
> through the role hierarchy. 
> Context: the LIST ROLES cql command lists roles and their membership details, and 
> displays super=true for immediate superusers. But there can be roles that 
> acquired superuser privilege due to a grant. The LIST ROLES command won't display 
> super=true for such roles, and the only way to recognize them is to look 
> for at least one row with super=true in the output of the LIST ROLES OF <role 
> name> command. While this works to check whether a given role has superuser 
> privilege, there may be services (for example, Sidecar) working with C* that 
> need to maintain the list of roles with superuser privilege. There is no 
> existing command/tool to retrieve such role details, hence this new 
> command, which returns all roles having superuser privilege.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19426) Fix Double Type issues in the Gossiper#maybeGossipToCMS

2024-02-27 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821046#comment-17821046
 ] 

Sam Tunnicliffe commented on CASSANDRA-19426:
-

It looks like the logic was largely copied from {{maybeGossipToSeed}}, so maybe 
adding an {{if (liveEndpoints.size() == 0)}} clause, as that method does, would be 
the way to go. 
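For illustration, the shape of that guard (simplified, with hypothetical field and method names rather than the actual Gossiper code):

{code:java}
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

class GossipSketch
{
    final Set<String> liveEndpoints = new HashSet<>();
    final Set<String> unreachableEndpoints = new HashSet<>();
    final Set<String> cmsEndpoints = Set.of("cms-1");
    final Random random = new Random();

    void maybeGossipToCMS(String message)
    {
        // Clause analogous to maybeGossipToSeed: with no live endpoints,
        // gossip to the CMS unconditionally instead of computing a
        // probability whose denominator would be zero (-> Infinity).
        if (liveEndpoints.isEmpty())
        {
            sendGossip(message, cmsEndpoints);
            return;
        }

        double probability = cmsEndpoints.size() /
                             (double) (liveEndpoints.size() + unreachableEndpoints.size());
        if (random.nextDouble() < probability) // strict '<', no equality comparison on doubles
            sendGossip(message, cmsEndpoints);
    }

    void sendGossip(String message, Set<String> to)
    {
        System.out.println("would send '" + message + "' to " + to);
    }
}
{code}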


> Fix Double Type issues in the Gossiper#maybeGossipToCMS
> ---
>
> Key: CASSANDRA-19426
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19426
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Cluster/Gossip, Transactional Cluster Metadata
>Reporter: Ling Mao
>Assignee: Ling Mao
>Priority: Low
> Fix For: 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> _*issue-1:*_
> if liveEndpoints.size() and unreachableEndpoints.size() are both 0, probability will be 
> {*}_Infinity_{*}.
> randDbl <= probability will then always be true, so we always sendGossip.
> _*issue-2:*_ 
> comparing two doubles with *<* or {*}>{*} is safe. However, accuracy is 
> lost if we compare two doubles for equality the intuitive way 
> ({*}=={*}). For example:
> {code:java}
> double probability = 0.1;
> double randDbl = 0.1000000000000000001; // mathematically greater than probability, but rounds to the same double as 0.1
> if (randDbl <= probability)
> {
>     System.out.println("randDbl <= probability(always here)");
> }
> else
> {
>     System.out.println("randDbl > probability");
> }
> {code}
> A good example from: _*Gossiper#maybeGossipToUnreachableMember*_
> {code:java}
> if (randDbl < prob)
> {
> sendGossip(message, Sets.filter(unreachableEndpoints.keySet(),
>                                 ep -> !isDeadState(getEndpointStateMap().get(ep))));
> }{code}
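As an aside, when two doubles genuinely do need to be compared for "equality", the usual workaround is a tolerance rather than ==. A small, generic example:

{code:java}
public class DoubleCompare
{
    static final double EPSILON = 1e-9;

    // Compare with a tolerance instead of '==', which is fragile because many
    // decimal values (e.g. 0.1) have no exact binary representation.
    static boolean approxEquals(double a, double b)
    {
        return Math.abs(a - b) <= EPSILON;
    }

    public static void main(String[] args)
    {
        double sum = 0.1 + 0.2;                      // 0.30000000000000004 as a double
        System.out.println(sum == 0.3);              // false
        System.out.println(approxEquals(sum, 0.3));  // true
    }
}
{code}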



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19344) Test Failure: org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode-_jdk17

2024-02-21 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19344:

Epic Link: CASSANDRA-19055

> Test Failure: 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode-_jdk17
> 
>
> Key: CASSANDRA-19344
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19344
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
>
> The test can fail in two different ways:
> {code:java}
> junit.framework.AssertionFailedError: NOT IN CURRENT: 31 -- [(00,20), 
> (31,50)] at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.assertAllContained(TransientRangeMovementTest.java:203)
>  at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode(TransientRangeMovementTest.java:183)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code}
> as in here - 
> [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2639/workflows/32b92ce7-5e9d-4efb-8362-d200d2414597/jobs/55139/tests#failed-test-0]
> and
> {code:java}
> junit.framework.AssertionFailedError: nodetool command [removenode, 
> 6d194555-f6eb-41d0-c000-0003, --force] was not successful stdout: 
> stderr: error: Node /127.0.0.4:7012 is alive and owns this ID. Use 
> decommission command to remove it from the ring -- StackTrace -- 
> java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and 
> owns this ID. Use decommission command to remove it from the ring at 
> org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
>  at 
> org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
>  at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at 
> org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388)
>  at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at 
> org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at 
> org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
>  at 
> org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038)
>  at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>  at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  at java.base/java.lang.Thread.run(Thread.java:833) Notifications: Error: 
> java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and 
> owns this ID. Use decommission command to remove it from the ring at 
> org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
>  at 
> org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
>  at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at 
> org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388)
>  at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at 
> org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at 
> org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
>  at 
> org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038)
>  at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>  at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  at java.base/java.lang.Thread.run(Thread.java:833) at 
> org.apache.cassandra.distributed.api.NodeToolResult$Asserts.fail(NodeToolResult.java:214)
>  at 
> 

[jira] [Commented] (CASSANDRA-19394) Rethink dumping of cluster metadata via CMSOperationsMBean

2024-02-14 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817422#comment-17817422
 ] 

Sam Tunnicliffe commented on CASSANDRA-19394:
-

A simple visual representation of the current metadata, without any ability or 
expectation to be able to parse, roundtrip, or pipe it into tooling would be 
very useful right now for debugging and development, but that's a totally 
different use case than what the current 
{{CMSOperations::dumpClusterMetadata}} is for. 

> Rethink dumping of cluster metadata via CMSOperationsMBean
> --
>
> Key: CASSANDRA-19394
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19394
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tool/nodetool, Transactional Cluster Metadata
>Reporter: Stefan Miklosovic
>Priority: Normal
>
> I think there are two problems in the implementation of dumping 
> ClusterMetadata in CMSOperationsMBean
> 1) A dump is saved in a file and the dumpClusterMetadata methods return just 
> the name of the file containing that dump. However, a nodetool / JMX call to the 
> MBean (or any other place this method is invoked from; we would like to offer a 
> nodetool command which returns the dump) is meant to be usable from anywhere, 
> remotely, so what happens when we execute nodetool or call these methods on a 
> machine different from the one the node runs on? E.g. admins may just have a 
> jumpbox to the cluster they manage and do not necessarily have access to the 
> nodes themselves, so they would not be able to read the dump.
> 2) It creates a temp file which is never deleted, so /tmp will be populated with 
> these dumps until the node is turned off, which might take a long time and can 
> consume a lot of disk space if dumps are taken frequently and they are big. An 
> adversary might just dump cluster metadata until no disk space is left.
> What I propose is that we return the whole dump as a string, not just the 
> filename where we saved it. We can also format the output on the client, or we 
> can tell the server what format we want the dump to be returned in. 
> If there is a concern about the size of the data to be returned, we could 
> optionally allow dumps to be returned compressed, by simply zipping on the 
> server and unzipping on the client, where the "zipper" is the standard 
> java.util.zip, so it basically doesn't matter what JVM runs on the client and server.
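For reference, the compression idea above needs nothing beyond the standard library; a minimal sketch (class and method names are illustrative, not the actual CMSOperations API):

{code:java}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public final class DumpCompression
{
    // Server side: gzip the textual dump before returning it over JMX.
    public static byte[] compress(String dump) throws IOException
    {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes))
        {
            gzip.write(dump.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    // Client side (e.g. nodetool): unzip the returned bytes back into text.
    public static String decompress(byte[] compressed) throws IOException
    {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed)))
        {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
{code}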



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-19393) nodetool: group CMS-related commands into one command

2024-02-14 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe reassigned CASSANDRA-19393:
---

Assignee: Sam Tunnicliffe  (was: n.v.harikrishna)

> nodetool: group CMS-related commands into one command
> -
>
> Key: CASSANDRA-19393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19393
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tool/nodetool, Transactional Cluster Metadata
>Reporter: n.v.harikrishna
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The purpose of this ticket is to group all CMS-related commands under one 
> "nodetool cms" command where existing command would be subcommands of it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-19393) nodetool: group CMS-related commands into one command

2024-02-14 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe reassigned CASSANDRA-19393:
---

Assignee: n.v.harikrishna  (was: Sam Tunnicliffe)

> nodetool: group CMS-related commands into one command
> -
>
> Key: CASSANDRA-19393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19393
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tool/nodetool, Transactional Cluster Metadata
>Reporter: n.v.harikrishna
>Assignee: n.v.harikrishna
>Priority: Normal
> Fix For: 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The purpose of this ticket is to group all CMS-related commands under one 
> "nodetool cms" command where existing command would be subcommands of it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19394) Rethink dumping of cluster metadata via CMSOperationsMBean

2024-02-14 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817414#comment-17817414
 ] 

Sam Tunnicliffe commented on CASSANDRA-19394:
-

Part of the benefit of dumping only to a binary format is precisely that it is 
opaque and has a very limited set of uses. For now these include reloading a 
binary dump into a new or existing cluster (e.g. for DR, debugging or cloning 
purposes), or writing low level custom code to explore and modify the metadata. 
Like Marcus said, this is really intended as an escape hatch for when (if) 
things go catastrophically wrong and I agree with him that we should not change 
this yet.
{quote}consume a lot of disk space if dumps are done frequently and they are 
big.
{quote}
Dump files are currently pretty tiny, even for clusters with many members and 
a large schema.
{quote}An adversary might just dump cluster metadata until no disk space is 
left.
{quote}
Nodetool / JMX should be properly secured to prevent this. An adversary could 
simply run {{nodetool assassinate}} if they had access.

> Rethink dumping of cluster metadata via CMSOperationsMBean
> --
>
> Key: CASSANDRA-19394
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19394
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tool/nodetool, Transactional Cluster Metadata
>Reporter: Stefan Miklosovic
>Priority: Normal
>
> I think there are two problems in the implementation of dumping 
> ClusterMetadata in CMSOperationsMBean
> 1) A dump is saved in a file and the dumpClusterMetadata methods return just 
> the name of the file containing that dump. However, a nodetool / JMX call to the 
> MBean (or any other place this method is invoked from; we would like to offer a 
> nodetool command which returns the dump) is meant to be usable from anywhere, 
> remotely, so what happens when we execute nodetool or call these methods on a 
> machine different from the one the node runs on? E.g. admins may just have a 
> jumpbox to the cluster they manage and do not necessarily have access to the 
> nodes themselves, so they would not be able to read the dump.
> 2) It creates a temp file which is never deleted, so /tmp will be populated with 
> these dumps until the node is turned off, which might take a long time and can 
> consume a lot of disk space if dumps are taken frequently and they are big. An 
> adversary might just dump cluster metadata until no disk space is left.
> What I propose is that we return the whole dump as a string, not just the 
> filename where we saved it. We can also format the output on the client, or we 
> can tell the server what format we want the dump to be returned in. 
> If there is a concern about the size of the data to be returned, we could 
> optionally allow dumps to be returned compressed, by simply zipping on the 
> server and unzipping on the client, where the "zipper" is the standard 
> java.util.zip, so it basically doesn't matter what JVM runs on the client and server.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19353) Cancel Signal when unused in Local Log

2024-02-13 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817048#comment-17817048
 ] 

Sam Tunnicliffe commented on CASSANDRA-19353:
-

+1

> Cancel Signal when unused in Local Log
> --
>
> Key: CASSANDRA-19353
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19353
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Attachments: ci_summary.html, result_details.tar.gz
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19384) Avoid exposing intermediate node state during startup

2024-02-13 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19384:

Status: Changes Suggested  (was: Review In Progress)

left a few minor comments on the PR

> Avoid exposing intermediate node state during startup
> -
>
> Key: CASSANDRA-19384
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19384
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> During startup we replay the local log, during this time we might expose 
> intermediate node states (via JMX for example).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19384) Avoid exposing intermediate node state during startup

2024-02-13 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19384:

Reviewers: Alex Petrov, Sam Tunnicliffe, Sam Tunnicliffe  (was: Alex 
Petrov, Sam Tunnicliffe)
   Alex Petrov, Sam Tunnicliffe, Sam Tunnicliffe  (was: Alex 
Petrov, Sam Tunnicliffe)
   Status: Review In Progress  (was: Patch Available)

> Avoid exposing intermediate node state during startup
> -
>
> Key: CASSANDRA-19384
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19384
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> During startup we replay the local log, during this time we might expose 
> intermediate node states (via JMX for example).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16999) add native_port_ssl to system_views.peers table

2024-02-13 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816980#comment-17816980
 ] 

Sam Tunnicliffe commented on CASSANDRA-16999:
-

bq. We should not leave features (dual port) broken even single port is 
preferable.

So let's deprecate it before 5.0 comes out and remove it in 5.x? 

> add native_port_ssl to system_views.peers table
> ---
>
> Key: CASSANDRA-16999
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16999
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Steve Lacerda
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 5.x
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> system.peers_v2 includes a “native_port” but has no notion of 
> native_transport_port vs. native_transport_port_ssl.  Given this limited 
> information, there’s no clear way for the driver to know that different ports 
> are being used for SSL vs. non-SSL or which of those two ports is identified 
> by “native_port”.
>  
> The issue we ran into is that since the java driver has no notion of the 
> SSL transport port, it was only using the contact points and was not 
> load balancing.
>  
> The customer had both set:
> native_transport_port: 9042
> native_transport_port_ssl: 9142
>  
> They were attempting to connect to 9142, but that was failing. They could 
> only use 9042, and so their application's load balancing was failing. We found 
> that any node that was a contact point was connecting, but the other nodes 
> were never acting as coordinators.
>  
> There are still issues in the driver, for which I have created JAVA-2967, 
> which also refers to JAVA-2638, but the system.peers and system.peers_v2 
> tables should both contain native_transport_port and 
> native_transport_port_ssl.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-19344) Test Failure: org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode-_jdk17

2024-02-13 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe reassigned CASSANDRA-19344:
---

Assignee: Sam Tunnicliffe

> Test Failure: 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode-_jdk17
> 
>
> Key: CASSANDRA-19344
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19344
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
>
> The test can fail in two different ways:
> {code:java}
> junit.framework.AssertionFailedError: NOT IN CURRENT: 31 -- [(00,20), 
> (31,50)] at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.assertAllContained(TransientRangeMovementTest.java:203)
>  at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode(TransientRangeMovementTest.java:183)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code}
> as in here - 
> [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2639/workflows/32b92ce7-5e9d-4efb-8362-d200d2414597/jobs/55139/tests#failed-test-0]
> and
> {code:java}
> junit.framework.AssertionFailedError: nodetool command [removenode, 
> 6d194555-f6eb-41d0-c000-0003, --force] was not successful stdout: 
> stderr: error: Node /127.0.0.4:7012 is alive and owns this ID. Use 
> decommission command to remove it from the ring -- StackTrace -- 
> java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and 
> owns this ID. Use decommission command to remove it from the ring at 
> org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
>  at 
> org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
>  at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at 
> org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388)
>  at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at 
> org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at 
> org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
>  at 
> org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038)
>  at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>  at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  at java.base/java.lang.Thread.run(Thread.java:833) Notifications: Error: 
> java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and 
> owns this ID. Use decommission command to remove it from the ring at 
> org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
>  at 
> org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
>  at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at 
> org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388)
>  at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at 
> org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at 
> org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
>  at 
> org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038)
>  at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>  at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  at java.base/java.lang.Thread.run(Thread.java:833) at 
> org.apache.cassandra.distributed.api.NodeToolResult$Asserts.fail(NodeToolResult.java:214)
>  at 
> 

[jira] [Updated] (CASSANDRA-19348) Fix serialization version check in InProgressSequences

2024-02-13 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19348:

Reviewers: Alex Petrov, Sam Tunnicliffe, Sam Tunnicliffe  (was: Alex 
Petrov, Sam Tunnicliffe)
   Alex Petrov, Sam Tunnicliffe, Sam Tunnicliffe  (was: Alex 
Petrov, Sam Tunnicliffe)
   Status: Review In Progress  (was: Patch Available)

+1 pending CI

> Fix serialization version check in InProgressSequences
> --
>
> Key: CASSANDRA-19348
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19348
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.x
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19348) Fix serialization version check in InProgressSequences

2024-02-13 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19348:

Test and Documentation Plan: CI
 Status: Patch Available  (was: In Progress)

> Fix serialization version check in InProgressSequences
> --
>
> Key: CASSANDRA-19348
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19348
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.x
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19390) Transformation.Kind should contain an explicit integer id

2024-02-13 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19390:

Status: Ready to Commit  (was: Review In Progress)

+1 pending CI

> Transformation.Kind should contain an explicit integer id
> -
>
> Key: CASSANDRA-19390
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19390
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Low
> Fix For: 5.x
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19391) Flush metadata snapshot table on every write

2024-02-13 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19391:

Status: Ready to Commit  (was: Review In Progress)

> Flush metadata snapshot table on every write
> 
>
> Key: CASSANDRA-19391
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19391
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Low
> Fix For: 5.x
>
>
> We depend on the latest snapshot when starting up; flushing avoids gaps 
> between the latest snapshot and the most recent local log entry.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19391) Flush metadata snapshot table on every write

2024-02-13 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19391:

Status: Review In Progress  (was: Patch Available)

+1 pending CI

> Flush metadata snapshot table on every write
> 
>
> Key: CASSANDRA-19391
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19391
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Low
> Fix For: 5.x
>
>
> We depend on the latest snapshot when starting up; flushing avoids gaps 
> between the latest snapshot and the most recent local log entry.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19390) Transformation.Kind should contain an explicit integer id

2024-02-13 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19390:

Status: Review In Progress  (was: Patch Available)

> Transformation.Kind should contain an explicit integer id
> -
>
> Key: CASSANDRA-19390
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19390
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Low
> Fix For: 5.x
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19216) CMS: Additional nodes are not added to CMS

2024-02-13 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816920#comment-17816920
 ] 

Sam Tunnicliffe commented on CASSANDRA-19216:
-

Sorry for taking so long to review. The patch LGTM and as far as I can see, any 
failures in the CI results appear unrelated. Might be worth rebasing and 
re-running CI though before commit. 
+1 from me if that looks ok.

> CMS: Additional nodes are not added to CMS
> --
>
> Key: CASSANDRA-19216
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19216
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Paul Chandler
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.1-alpha1
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>
> When creating a 3 node cluster on a local machine using 3 loopback addresses
> 127.0.0.1
> 127.0.0.2
> 127.0.0.3
> The nodes are created correctly and the first node is assigned as a CMS node 
> as shown:
> {code}
> bin/nodetool -p 7199 status
> 
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load       Tokens  Owns (effective)  Host ID                       Rack
> UN  127.0.0.3  75.55 KiB  16      76.0%             6d194555-f6eb-41d0-c000-0003  rack1
> UN  127.0.0.2  67.97 KiB  16      59.3%             6d194555-f6eb-41d0-c000-0002  rack1
> UN  127.0.0.1  81 KiB     16      64.7%             6d194555-f6eb-41d0-c000-0001  rack1
> {code}
> {code}
> bin/nodetool -p 7199 describecms
> 
> Cluster Metadata Service:
> Members: /127.0.0.1:7000
> Is Member: true
> Service State: LOCAL
> Is Migrating: false
> Epoch: 14
> Local Pending Count: 0
> Commits Paused: false
> Replication factor: ReplicationParams{class=org.apache.cassandra.locator.MetaStrategy, datacenter1=1}
> {code}
> However, after doing a reconfigurecms to set a replication factor of 3, it 
> seems that there is still only one member of the CMS.
> {code}
> bin/nodetool -p 7199 reconfigurecms datacenter1:3
> bin/nodetool -p 7199 describecms
> 
> Cluster Metadata Service:
> Members: /127.0.0.1:7000
> Is Member: true
> Service State: LOCAL
> Is Migrating: false
> Epoch: 16
> Local Pending Count: 0
> Commits Paused: false
> Replication factor: ReplicationParams{class=org.apache.cassandra.locator.MetaStrategy, datacenter1=3}
> {code}
> Is this correct? Should all 3 nodes be shown in the Members section?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19247) Minor corrections to TCM Implementation documentation

2024-02-12 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19247:

Fix Version/s: 5.x
   (was: NA)

> Minor corrections to TCM Implementation documentation
> -
>
> Key: CASSANDRA-19247
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19247
> Project: Cassandra
>  Issue Type: Bug
>  Components: Documentation, Transactional Cluster Metadata
>Reporter: n.v.harikrishna
>Assignee: n.v.harikrishna
>Priority: Normal
> Fix For: 5.x
>
>
> In the "[Bootstrap: InProgressSequence, PrepareJoin, 
> BootstrapAndJoin|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/tcm/TCM_implementation.md#bootstrap-inprogresssequence-preparejoin-bootstrapandjoin]”
>  section of TCM Implementation doc, InProgressSequence transformations are 
> mentioned as "{_}InProgressSequence`, holding the three transformations 
> (`PrepareJoin`, `MinJoin` `FinishJoin`){_}”. As per 
> [PrepareJoin|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/tcm/transformations/PrepareJoin.java#L154],
>  these are {*}StartJoin{*}, Mi{*}d{*}Join & FinishJoin. Doc needs to be 
> updated.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19342) Test Failure: org.apache.cassandra.distributed.test.RemoveNodeTest.testAbort

2024-02-12 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19342:

  Since Version: NA
Source Control Link: 
https://github.com/apache/cassandra/commit/61aabdfb4426296e9924e2959baa14fe744fb362
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

Committed, thanks!

> Test Failure: org.apache.cassandra.distributed.test.RemoveNodeTest.testAbort
> 
>
> Key: CASSANDRA-19342
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19342
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code:java}
> junit.framework.AssertionFailedError: "RemovalStatus: No removals in 
> progress. " does not contain MID_LEAVE at 
> org.apache.cassandra.distributed.test.RemoveNodeTest.testAbort(RemoveNodeTest.java:54)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code}
> https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=testAbort-trunk



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19247) Minor corrections to TCM Implementation documentation

2024-02-12 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19247:

  Fix Version/s: NA
  Since Version: NA
Source Control Link: 
https://github.com/apache/cassandra/commit/bec6bfde1f3b6a782f123f9f9ff18072a97e379f
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

committed, thanks!

> Minor corrections to TCM Implementation documentation
> -
>
> Key: CASSANDRA-19247
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19247
> Project: Cassandra
>  Issue Type: Bug
>  Components: Documentation, Transactional Cluster Metadata
>Reporter: n.v.harikrishna
>Assignee: n.v.harikrishna
>Priority: Normal
> Fix For: NA
>
>
> In the "[Bootstrap: InProgressSequence, PrepareJoin, 
> BootstrapAndJoin|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/tcm/TCM_implementation.md#bootstrap-inprogresssequence-preparejoin-bootstrapandjoin]”
>  section of TCM Implementation doc, InProgressSequence transformations are 
> mentioned as "{_}InProgressSequence`, holding the three transformations 
> (`PrepareJoin`, `MinJoin` `FinishJoin`){_}”. As per 
> [PrepareJoin|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/tcm/transformations/PrepareJoin.java#L154],
>  these are {*}StartJoin{*}, Mi{*}d{*}Join & FinishJoin. Doc needs to be 
> updated.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19247) Minor corrections to TCM Implementation documentation

2024-02-12 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19247:

Status: Ready to Commit  (was: Review In Progress)

+1

> Minor corrections to TCM Implementation documentation
> -
>
> Key: CASSANDRA-19247
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19247
> Project: Cassandra
>  Issue Type: Bug
>  Components: Documentation, Transactional Cluster Metadata
>Reporter: n.v.harikrishna
>Assignee: n.v.harikrishna
>Priority: Normal
>
> In the "[Bootstrap: InProgressSequence, PrepareJoin, 
> BootstrapAndJoin|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/tcm/TCM_implementation.md#bootstrap-inprogresssequence-preparejoin-bootstrapandjoin]”
>  section of TCM Implementation doc, InProgressSequence transformations are 
> mentioned as "{_}InProgressSequence`, holding the three transformations 
> (`PrepareJoin`, `MinJoin` `FinishJoin`){_}”. As per 
> [PrepareJoin|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/tcm/transformations/PrepareJoin.java#L154],
>  these are {*}StartJoin{*}, Mi{*}d{*}Join & FinishJoin. Doc needs to be 
> updated.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19247) Minor corrections to TCM Implementation documentation

2024-02-12 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19247:

Reviewers: Sam Tunnicliffe, Sam Tunnicliffe  (was: Sam Tunnicliffe)
   Sam Tunnicliffe, Sam Tunnicliffe  (was: Sam Tunnicliffe)
   Status: Review In Progress  (was: Patch Available)

> Minor corrections to TCM Implementation documentation
> -
>
> Key: CASSANDRA-19247
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19247
> Project: Cassandra
>  Issue Type: Bug
>  Components: Documentation, Transactional Cluster Metadata
>Reporter: n.v.harikrishna
>Assignee: n.v.harikrishna
>Priority: Normal
>
> In the "[Bootstrap: InProgressSequence, PrepareJoin, 
> BootstrapAndJoin|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/tcm/TCM_implementation.md#bootstrap-inprogresssequence-preparejoin-bootstrapandjoin]”
>  section of TCM Implementation doc, InProgressSequence transformations are 
> mentioned as "{_}InProgressSequence`, holding the three transformations 
> (`PrepareJoin`, `MinJoin` `FinishJoin`){_}”. As per 
> [PrepareJoin|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/tcm/transformations/PrepareJoin.java#L154],
>  these are {*}StartJoin{*}, Mi{*}d{*}Join & FinishJoin. Doc needs to be 
> updated.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19347) Improve Could not perform commit after 4089/10 tries log message

2024-02-12 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816536#comment-17816536
 ] 

Sam Tunnicliffe commented on CASSANDRA-19347:
-

+1

> Improve Could not perform commit after 4089/10 tries log message
> 
>
> Key: CASSANDRA-19347
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19347
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 5.1
>
>
> Improve retry logging for the local processor when using deadline and 
> indefinite strategies:
> Currently, it is sometimes possible to get into the situation where we get 
> the following log message:
> {code}
> java.lang.IllegalStateException: Can not commit transformation: 
> "SERVER_ERROR"(Could not perform commit after 4089/10 tries. Time remaining: 
> 0ms). 
> at 
> org.apache.cassandra.tcm.ClusterMetadataService.lambda$commit$6(ClusterMetadataService.java:470)
>  
> at 
> org.apache.cassandra.tcm.ClusterMetadataService.commit(ClusterMetadataService.java:514)
>  
> at 
> org.apache.cassandra.tcm.ClusterMetadataService.commit(ClusterMetadataService.java:467)
>  
> at 
> org.apache.cassandra.tcm.Startup.initializeAsFirstCMSNode(Startup.java:139) 
> at org.apache.cassandra.tcm.migration.Election.finish(Election.java:141) 
> at org.apache.cassandra.tcm.migration.Election.nominateSelf(Election.java:94) 
> at 
> org.apache.cassandra.tcm.ClusterMetadataService.upgradeFromGossip(ClusterMetadataService.java:345)
>  
> at 
> org.apache.cassandra.tcm.CMSOperations.initializeCMS(CMSOperations.java:65) 
> at 
> org.apache.cassandra.tools.nodetool.InitializeCMS.execute(InitializeCMS.java:37)
>  
> {code}
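For illustration, one way the failure message could be made clearer is to report the attempt count together with whichever budget (attempts or time) actually bounded the retries, rather than mixing the two (e.g. "4089/10 tries"). A hedged sketch with hypothetical names, not the actual retry/processor code:

{code:java}
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class RetrySketch
{
    // Retry until a deadline; on failure, report attempts and elapsed time
    // plus the reason we stopped.
    static <T> T withDeadline(Supplier<T> action, long deadlineNanos)
    {
        int attempts = 0;
        long start = System.nanoTime();
        while (System.nanoTime() < deadlineNanos)
        {
            attempts++;
            try
            {
                return action.get();
            }
            catch (RuntimeException e)
            {
                // keep retrying until the deadline expires
            }
        }
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        throw new IllegalStateException(
            String.format("Could not perform commit after %d tries over %dms (deadline reached)",
                          attempts, elapsedMs));
    }
}
{code}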



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19342) Test Failure: org.apache.cassandra.distributed.test.RemoveNodeTest.testAbort

2024-02-09 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17815966#comment-17815966
 ] 

Sam Tunnicliffe commented on CASSANDRA-19342:
-

bq. we might want to run it again 1000 times, not only 200. To be on the safe 
side.

Yep, that would be great. Unfortunately, I could only manage to get 200 
iterations using the circle free tier config as the jobs time out after an 
hour. 


> Test Failure: org.apache.cassandra.distributed.test.RemoveNodeTest.testAbort
> 
>
> Key: CASSANDRA-19342
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19342
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code:java}
> junit.framework.AssertionFailedError: "RemovalStatus: No removals in 
> progress. " does not contain MID_LEAVE at 
> org.apache.cassandra.distributed.test.RemoveNodeTest.testAbort(RemoveNodeTest.java:54)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code}
> https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=testAbort-trunk



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19342) Test Failure: org.apache.cassandra.distributed.test.RemoveNodeTest.testAbort

2024-02-08 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19342:

Test and Documentation Plan: Test-only change, repeated runs in circle
 Status: Patch Available  (was: Open)

This seems to be due to the test not controlling execution of the steps in the 
removal process closely enough. I've rewritten it to be more deterministic and 
see no failures in 200 repeated runs in circle.

https://app.circleci.com/pipelines/github/beobal/cassandra/444/workflows/9c47e2bf-e95a-46c4-80dc-71912d372d00/jobs/7858
  

[~e.dimitrova] any chance you would have time to review the test changes 
please? 
https://github.com/apache/cassandra/pull/3093



> Test Failure: org.apache.cassandra.distributed.test.RemoveNodeTest.testAbort
> 
>
> Key: CASSANDRA-19342
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19342
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code:java}
> junit.framework.AssertionFailedError: "RemovalStatus: No removals in 
> progress. " does not contain MID_LEAVE at 
> org.apache.cassandra.distributed.test.RemoveNodeTest.testAbort(RemoveNodeTest.java:54)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code}
> https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=testAbort-trunk



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org


