[jira] [Updated] (CASSANDRA-17667) Text value containing "/*" interpreted as multiline comment in cqlsh

2024-09-20 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-17667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-17667:
-
Reviewers: Brandon Williams  (was: Brandon Williams)
   Status: Review In Progress  (was: Patch Available)

> Text value containing "/*" interpreted as multiline comment in cqlsh
> 
>
> Key: CASSANDRA-17667
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17667
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL/Interpreter
>Reporter: ANOOP THOMAS
>Assignee: Brad Schoening
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>
> I use the CQLSH command-line utility to load some DDLs. The version of the
> utility is:
> {noformat}
> [cqlsh 6.0.0 | Cassandra 4.0.0.47 | CQL spec 3.4.5 | Native protocol 
> v5]{noformat}
> Command that loads DDL.cql:
> {noformat}
> cqlsh -u username -p password cassandra.example.com 65503 --ssl -f DDL.cql
> {noformat}
> I have a line in the CQL script that breaks the syntax.
> {noformat}
> INSERT into tablename (key,columnname1,columnname2) VALUES 
> ('keyName','value1','/value2/*/value3');{noformat}
> {{/*}} here is interpreted as the start of a multi-line comment. It used to
> work on older versions of cqlsh. The error I see looks like this:
> {noformat}
> SyntaxException: line 4:2 mismatched input 'Update' expecting ')' 
> (...,'value1','/value2INSERT into tablename(INSERT into tablename 
> (key,columnname1,columnname2)) VALUES ('[Update]-...) SyntaxException: line 
> 1:0 no viable alternative at input '(' ([(]...)
> {noformat}
> The same behavior occurs in interactive mode. {{/*}} inside a CQL string
> literal should not be interpreted as the start of a multi-line comment.
> With schema:
> {code:java}
> CREATE TABLE tablename ( key text primary key, columnname1 text, columnname2 
> text);{code}
>  
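
Editor's note: the underlying problem is a comment scanner that does not track
string-literal state. cqlsh itself is implemented in Python, so the Java below
is only an illustrative sketch, under the assumption that comment stripping is
a single left-to-right pass: "/*" may only open a comment when the scanner is
outside a single-quoted literal (where '' escapes a quote).

{code:java}
public final class CommentScanner
{
    /** Strips block comments from a CQL fragment while respecting
     *  single-quoted string literals ('' escapes a quote inside them). */
    public static String stripBlockComments(String cql)
    {
        StringBuilder out = new StringBuilder();
        boolean inString = false;
        boolean inComment = false;
        for (int i = 0; i < cql.length(); i++)
        {
            char c = cql.charAt(i);
            char next = i + 1 < cql.length() ? cql.charAt(i + 1) : '\0';
            if (inComment)
            {
                if (c == '*' && next == '/') { inComment = false; i++; }
                continue; // drop commented-out characters
            }
            if (inString)
            {
                if (c == '\'' && next == '\'') { out.append("''"); i++; continue; }
                if (c == '\'') inString = false;
                out.append(c);
                continue;
            }
            if (c == '\'') { inString = true; out.append(c); continue; }
            if (c == '/' && next == '*') { inComment = true; i++; continue; }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args)
    {
        // The literal from the report survives; only the real comment is removed.
        System.out.println(stripBlockComments(
            "INSERT INTO t (k, v) VALUES ('a', '/value2/*/value3'); /* real comment */"));
    }
}
{code}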





[jira] [Updated] (CASSANDRA-17667) Text value containing "/*" interpreted as multiline comment in cqlsh

2024-09-20 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-17667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-17667:
-
Status: Ready to Commit  (was: Review In Progress)



[jira] [Commented] (CASSANDRA-17667) Text value containing "/*" interpreted as multiline comment in cqlsh

2024-09-20 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17883391#comment-17883391
 ] 

Brandon Williams commented on CASSANDRA-17667:
--

4.0 looks like it has codestyle problems.




[jira] [Commented] (CASSANDRA-19925) should show error when altering table with Non-frozen UDTs with nested non-frozen collections

2024-09-17 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882456#comment-17882456
 ] 

Brandon Williams commented on CASSANDRA-19925:
--

This also appears to affect 4.0 and 4.1.

> should show error when altering table with Non-frozen UDTs with nested 
> non-frozen collections
> -
>
> Key: CASSANDRA-19925
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19925
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Soheil Rahsaz
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 5.0.x, 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Cassandra version: 5.0.0 GA
> Given this type
> {code:java}
> CREATE TYPE testType
> (
> userids SET<INT>
> );
> {code}
> If I try to create this table
> {code:java}
> CREATE TABLE test
> (
> id INT PRIMARY KEY,
> myType testType
> );
> {code}
> It shows an error: "Non-frozen UDTs with nested non-frozen collections are 
> not supported"
> But if I create the table without the type and then alter the table:
> {code:java}
> CREATE TABLE test
> (
> id INT PRIMARY KEY
> );
> ALTER TABLE test ADD myType testType;
> {code}
> It does not show that error, which it should.
> ---
> Also the output of `describe test;` shows that it successfully altered the 
> table.
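
Editor's note: the report boils down to a validation check that runs on CREATE
TABLE but is skipped on the ALTER TABLE ... ADD path. A minimal sketch of that
check follows, using hypothetical stand-in types rather than Cassandra's real
schema classes (the actual code lives in the CQL statement/schema layer and
differs in structure).

{code:java}
import java.util.List;

public final class UdtValidation
{
    // Hypothetical stand-in for a column type: a collection, a UDT, or a
    // scalar (empty subTypes). Cassandra's real AbstractType hierarchy differs.
    record Type(String name, boolean isCollection, boolean frozen, List<Type> subTypes) {}

    /** The guard CREATE TABLE applies; per the report, the
     *  ALTER TABLE ... ADD path skips it. */
    static void validateColumnType(Type t)
    {
        if (!t.isCollection() && !t.frozen() && containsNonFrozenCollection(t))
            throw new IllegalArgumentException(
                "Non-frozen UDTs with nested non-frozen collections are not supported");
    }

    static boolean containsNonFrozenCollection(Type t)
    {
        for (Type sub : t.subTypes())
            if ((sub.isCollection() && !sub.frozen()) || containsNonFrozenCollection(sub))
                return true;
        return false;
    }

    public static void main(String[] args)
    {
        Type set = new Type("set<int>", true, false, List.of());
        Type udt = new Type("testType", false, false, List.of(set));
        try { validateColumnType(udt); } // should throw on the ALTER path too
        catch (IllegalArgumentException e) { System.out.println(e.getMessage()); }
    }
}
{code}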





[jira] [Updated] (CASSANDRA-17667) Text value containing "/*" interpreted as multiline comment in cqlsh

2024-09-17 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-17667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-17667:
-
Reviewers: Brandon Williams




[jira] [Updated] (CASSANDRA-19785) Possible memory leak in BTree.FastBuilder

2024-09-16 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19785:
-
  Fix Version/s: 4.0.14
 4.1.7
 5.0.1
 5.1-alpha1
 5.1
 (was: 5.x)
 (was: 4.0.x)
 (was: 4.1.x)
 (was: 5.0.x)
  Since Version: NA
Source Control Link: 
https://github.com/apache/cassandra/commit/2842c01ce7eaeb6e72a63435075cc282a1cdd2db
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

Committed, thanks everyone.

> Possible memory leak in BTree.FastBuilder 
> --
>
> Key: CASSANDRA-19785
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19785
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core
>Reporter: Paul Chandler
>Assignee: Benedict Elliott Smith
>Priority: Normal
> Fix For: 4.0.14, 4.1.7, 5.0.1, 5.1-alpha1, 5.1
>
> Attachments: image-2024-07-19-08-44-56-714.png, 
> image-2024-07-19-08-45-17-289.png, image-2024-07-19-08-45-33-933.png, 
> image-2024-07-19-08-45-50-383.png, image-2024-07-19-08-46-06-919.png, 
> image-2024-07-19-08-46-42-979.png, image-2024-07-19-08-46-56-594.png, 
> image-2024-07-19-08-47-19-517.png, image-2024-07-19-08-47-34-582.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We are having a problem with the heap growing in size. This is a large
> cluster of more than 1,000 nodes across a large number of DCs, running
> version 4.0.11.
>  
> Each node has a 32GB heap, and the amount used continues to grow until it
> reaches 30GB, at which point it struggles with multiple Full GC pauses, as
> can be seen here:
> !image-2024-07-19-08-44-56-714.png!
> We took 2 heap dumps on one node a few days after it was restarted, and the
> heap had grown by 2.7GB.
>  
> 9{^}th{^} July
> !image-2024-07-19-08-45-17-289.png!
> 11{^}th{^} July
> !image-2024-07-19-08-45-33-933.png!
> This can be seen as mainly an increase of memory used by 
> FastThreadLocalThread, increasing from 5.92GB to 8.53GB
> !image-2024-07-19-08-45-50-383.png!
> !image-2024-07-19-08-46-06-919.png!
> Looking deeper into this, it can be seen that the growing heap is contained
> within the threads for the MutationStage, Native-Transport-Requests,
> ReadStage, etc. We would expect the memory used within these threads to be
> short-lived, and not to grow as time goes on. We recently increased the size
> of these threadpools, and that has increased the size of the problem.
>  
> Top memory usage for FastThreadLocalThread
> 9{^}th{^} July
> !image-2024-07-19-08-46-42-979.png!
> 11{^}th{^} July
> !image-2024-07-19-08-46-56-594.png!
> This has led us to investigate whether there could be a memory leak, and we
> have found the following issues within the retained references in
> BTree.FastBuilder objects. The issue appears to stem from the reset() method,
> which does not properly clear all buffers. We are not really sure how
> BTree.FastBuilder works, but this is our analysis of where a leak might
> occur.
>  
> Specifically:
> Leaf Buffer Not Being Cleared:
> When leaf().count is 0, the statement Arrays.fill(leaf().buffer, 0, 
> leaf().count, null); does not clear the buffer because the end index is 0. 
> This leaves the buffer with references to potentially large objects, 
> preventing garbage collection and increasing heap usage.
> Branch inUse Property:
> If the inUse property of the branch is set to false elsewhere in the code, 
> the while loop while (branch != null && branch.inUse) does not execute, 
> resulting in uncleared branch buffers and retained references.
>  
> This is based on the following observations:
>     Heap Dumps: Analysis of heap dumps shows that leaf().count is often 0, 
> and as a result, the buffer is not being cleared, leading to high heap 
> utilization.
> !image-2024-07-19-08-47-19-517.png!
>     Remote Debugging: Debugging sessions indicate that the drain() method 
> sets count to 0, and the inUse flag for the parent branch is set to false, 
> preventing the while loop in reset() from clearing the branch buffers.
> !image-2024-07-19-08-47-34-582.png!
>  
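
Editor's note: the leaf-clearing observation above is easy to reproduce in
isolation. A minimal sketch, assuming a builder that tracks only count (the
live slots of the current batch) rather than a high-water mark of slots ever
written: Arrays.fill(buffer, 0, count, null) with count == 0 clears nothing,
so references written by an earlier batch survive reset(). The real
FastBuilder is more involved and the actual fix may differ.

{code:java}
import java.util.Arrays;

public final class LeafResetDemo
{
    static final Object[] buffer = new Object[8];
    static int count;       // live slots in the current batch
    static int maxWritten;  // high-water mark of slots ever written

    static void add(Object o)
    {
        buffer[count++] = o;
        maxWritten = Math.max(maxWritten, count);
    }

    // Buggy reset, per the report: when count is 0 (e.g. after drain()),
    // the fill is a no-op and stale references survive.
    static void buggyReset()
    {
        Arrays.fill(buffer, 0, count, null);
        count = 0;
    }

    // Fixed reset: clear everything ever written, not just the live slots.
    static void fixedReset()
    {
        Arrays.fill(buffer, 0, maxWritten, null);
        count = 0;
        maxWritten = 0;
    }

    public static void main(String[] args)
    {
        add(new byte[1 << 20]); // a "large object" now referenced by the buffer
        count = 0;              // what drain() effectively does, per the report
        buggyReset();
        System.out.println("leaks after buggy reset: " + (buffer[0] != null)); // true
        fixedReset();
        System.out.println("leaks after fixed reset: " + (buffer[0] != null)); // false
    }
}
{code}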





[jira] [Updated] (CASSANDRA-19785) Possible memory leak in BTree.FastBuilder

2024-09-16 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19785:
-
Status: Review In Progress  (was: Needs Committer)




[jira] [Updated] (CASSANDRA-19785) Possible memory leak in BTree.FastBuilder

2024-09-16 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19785:
-
Status: Ready to Commit  (was: Review In Progress)




[jira] [Updated] (CASSANDRA-19785) Possible memory leak in BTree.FastBuilder

2024-09-16 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19785:
-
Reviewers: Branimir Lambov




[jira] [Updated] (CASSANDRA-19785) Possible memory leak in BTree.FastBuilder

2024-09-16 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19785:
-
Status: Needs Committer  (was: Patch Available)




[jira] [Assigned] (CASSANDRA-19785) Possible memory leak in BTree.FastBuilder

2024-09-16 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams reassigned CASSANDRA-19785:


Assignee: Benedict Elliott Smith




[jira] [Commented] (CASSANDRA-19785) Possible memory leak in BTree.FastBuilder

2024-09-16 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882060#comment-17882060
 ] 

Brandon Williams commented on CASSANDRA-19785:
--

Thanks, I will try to get this committed soon.




[jira] [Updated] (CASSANDRA-19925) should show error when altering table with Non-frozen UDTs with nested non-frozen collections

2024-09-16 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19925:
-
 Bug Category: Parent values: Correctness(12982)
   Complexity: Normal
  Component/s: Cluster/Schema
Discovered By: User Report
Fix Version/s: 5.0.x
 Severity: Normal
   Status: Open  (was: Triage Needed)




[jira] [Updated] (CASSANDRA-19907) Add jeetkundoug's gpg key to project's KEYS file

2024-09-10 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19907:
-
Resolution: Fixed
Status: Resolved  (was: Open)

Added to https://dist.apache.org/repos/dist/release/cassandra/KEYS in r71446

> Add jeetkundoug's gpg key to project's KEYS file
> 
>
> Key: CASSANDRA-19907
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19907
> Project: Cassandra
>  Issue Type: Task
>  Components: Packaging
>Reporter: Doug Rohrer
>Assignee: Brandon Williams
>Priority: Normal
> Attachments: KEYS.patch
>
>
> This patch adds my gpg public key to the KEYS file.
> My gpg public key has the fingerprint 9A648E3DEDA36EE374C4277B602ED2C52277





[jira] [Updated] (CASSANDRA-19907) Add jeetkundoug's gpg key to project's KEYS file

2024-09-10 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19907:
-
Change Category: Operability
 Complexity: Normal
 Status: Open  (was: Triage Needed)




[jira] [Commented] (CASSANDRA-15891) provide a configuration option such as endpoint_verification_method

2024-09-04 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17879388#comment-17879388
 ] 

Brandon Williams commented on CASSANDRA-15891:
--

It may be better to use 
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-34%3A+mTLS+based+client+and+internode+authenticators
 instead at this point.

> provide a configuration option such as endpoint_verification_method
> ---
>
> Key: CASSANDRA-15891
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15891
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Messaging/Internode
>Reporter: Thanh
>Priority: Normal
> Fix For: 5.x
>
>
> With CASSANDRA-9220, it's possible to configure endpoint/hostname
> verification when enabling internode encryption. However, you don't have any
> control over which endpoint is used for the verification; instead, Cassandra
> automatically uses the node IP (not the node hostname), so if your node
> certificates don't include the IP in the SSL certificate's SAN list, you'll
> get an error like:
> {code:java}
> ERROR [MessagingService-Outgoing-/10.10.88.194-Gossip] 2018-11-13 
> 10:20:26,903 OutboundTcpConnection.java:606 - SSL handshake error for 
> outbound connection to 50cc97c1[SSL_NULL_WITH_NULL_NULL: 
> Socket[addr=/,port=7001,localport=47684]] 
> javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: 
> No subject alternative names matching IP address  found 
> at sun.security.ssl.Alerts.getSSLException(Alerts.java:192) {code}
> From what I've seen, most orgs will not have node IPs in their certs.
> So it would be best if Cassandra provided another configuration option,
> such as *{{endpoint_verification_method}}*, which could be set to "ip",
> "fqdn", or something else (e.g. "hostname_alias" if for whatever reason the
> org doesn't want to use the FQDN for endpoint verification).
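
Editor's note: for context, JSSE matches whatever peer identity the SSLEngine
was created with against the certificate's SANs, so the "ip vs fqdn" choice
reduces to which identity string is passed at engine-creation time. A sketch
of what the proposed endpoint_verification_method option could select between;
the option itself is this ticket's proposal, not an existing Cassandra
setting, and the helper below is hypothetical.

{code:java}
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLParameters;
import java.net.InetAddress;

public final class EndpointIdentity
{
    /** Builds an engine whose hostname verification matches either the peer's
     *  IP or its FQDN against the certificate SANs, per the chosen method. */
    static SSLEngine engineFor(SSLContext ctx, InetAddress peer, int port, String method)
    {
        String identity = "fqdn".equals(method)
                        ? peer.getCanonicalHostName()   // reverse-resolved FQDN
                        : peer.getHostAddress();        // raw IP, today's behavior
        SSLEngine engine = ctx.createSSLEngine(identity, port);
        engine.setUseClientMode(true);
        SSLParameters params = engine.getSSLParameters();
        // "HTTPS" enables RFC 2818-style SAN matching against the identity above.
        params.setEndpointIdentificationAlgorithm("HTTPS");
        engine.setSSLParameters(params);
        return engine;
    }

    public static void main(String[] args) throws Exception
    {
        SSLEngine e = engineFor(SSLContext.getDefault(),
                                InetAddress.getByName("127.0.0.1"), 7001, "fqdn");
        System.out.println("verifying against: " + e.getPeerHost());
    }
}
{code}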





[jira] [Updated] (CASSANDRA-19894) Vectors of counter as subtype do not work as expected

2024-09-04 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19894:
-
Fix Version/s: 5.x

> Vectors of counter as subtype do not work as expected
> -
>
> Key: CASSANDRA-19894
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19894
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Vector Search
>Reporter: Jane He
>Priority: Normal
> Fix For: 5.0.x, 5.x
>
>
> Here is the current behavior:
> {code:java}
> cqlsh:default> create table counter_table ( i int PRIMARY KEY, j
> vector<counter, 2> );
> cqlsh:default> insert into counter_table (i, j) values (0, [0,0]);
> cqlsh:default> select * from counter_table;
>  i | j
> ---+
>  0 | [576460752303423488, 2251799813685248]
> (1 rows) {code}
> We may want vectors to handle the `counter` subtype as other collections do,
> i.e. by throwing an error:
> {code:java}
> cqlsh:cycling> create table upcoming_calendar ( year int, month int, events
> list<counter>, primary key (year, month));
> InvalidRequest: Error from server: code=2200 [Invalid query] 
> message="Counters are not allowed inside collections: list"
> cqlsh:cycling> create table upcoming_calendar ( year int, month int, events
> map<int, counter>, primary key (year, month));
> InvalidRequest: Error from server: code=2200 [Invalid query] 
> message="Counters are not allowed inside collections: map" 
> {code}
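
Editor's note: the desired behavior is the guard collections already have,
extended to vector element types. A tiny sketch with hypothetical names; the
real check would live in Cassandra's type-validation code, not in a
standalone class like this.

{code:java}
public final class VectorValidation
{
    /** Hypothetical guard mirroring the existing collection rule. */
    static void validateVectorElementType(String elementType, int dimension)
    {
        if ("counter".equals(elementType))
            throw new IllegalArgumentException(
                "Counters are not allowed inside vectors: vector<counter, " + dimension + ">");
    }

    public static void main(String[] args)
    {
        try { validateVectorElementType("counter", 2); }
        catch (IllegalArgumentException e) { System.out.println(e.getMessage()); }
    }
}
{code}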





[jira] [Commented] (CASSANDRA-19894) Vectors of counter as subtype do not work as expected

2024-09-04 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17879240#comment-17879240
 ] 

Brandon Williams commented on CASSANDRA-19894:
--

I think this was found during driver development; there's no use case, it's
just something to clean up.




[jira] [Comment Edited] (CASSANDRA-19894) Vectors of counter as subtype do not work as expected

2024-09-04 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17879194#comment-17879194
 ] 

Brandon Williams edited comment on CASSANDRA-19894 at 9/4/24 11:29 AM:
---

Yes, it looks like we should throw an error here since a vector of counters 
doesn't make any sense.


was (Author: brandon.williams):
Yes, it looks like we should throw and error here since a vector of counters 
doesn't make any sense.




[jira] [Commented] (CASSANDRA-18750) Make unit tests compatible with the tmp.dir property

2024-09-03 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878994#comment-17878994
 ] 

Brandon Williams commented on CASSANDRA-18750:
--

[~dcapwell] that was actually added in CASSANDRA-16855

> Make unit tests compatible with the tmp.dir property
> 
>
> Key: CASSANDRA-18750
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18750
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Build
>Reporter: Derek Chen-Becker
>Assignee: Derek Chen-Becker
>Priority: Normal
> Fix For: 4.1.4, 5.0-alpha1, 5.0, 5.1
>
> Attachments: signature.asc
>
>
> Several unit tests hard-code file paths under the "/tmp" directory, which 
> means they do not honor the {{tmp.dir}} ant build property. These should be 
> updated so that when the user specifies {{tmp.dir}}, they can be certain that 
> any files or directories created by the unit tests will be in their specified 
> location.
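
Editor's note: the fix pattern here is small: resolve the base temp directory
from the tmp.dir system property (which the ant build can set) instead of
hard-coding /tmp, falling back to java.io.tmpdir. A sketch of the idea,
assuming the property is propagated to the test JVM; the class and method
names are illustrative, not Cassandra's.

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public final class TestTempDir
{
    /** Base directory for test scratch files: honors -Dtmp.dir,
     *  falling back to the JVM's default temp location. */
    static Path baseTempDir()
    {
        String configured = System.getProperty("tmp.dir");
        return Paths.get(configured != null ? configured
                                            : System.getProperty("java.io.tmpdir"));
    }

    public static void main(String[] args) throws IOException
    {
        // Instead of new File("/tmp/mytest"), derive from the configured base.
        Path scratch = Files.createTempDirectory(baseTempDir(), "mytest");
        System.out.println("scratch dir: " + scratch);
    }
}
{code}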





[jira] (CASSANDRA-18750) Make unit tests compatible with the tmp.dir property

2024-09-03 Thread Brandon Williams (Jira)


[ https://issues.apache.org/jira/browse/CASSANDRA-18750 ]


Brandon Williams deleted comment on CASSANDRA-18750:
--

was (Author: brandon.williams):
[~dcapwell] that was actually added in CASSANDRA-16855




[jira] [Updated] (CASSANDRA-19888) 2i created in C* 2.2 with COMPACT STORAGE are not readable from 4.x

2024-09-03 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19888:
-
 Bug Category: Parent values: Availability(12983)Level 1 values: 
Unavailable(12994)
   Complexity: Normal
  Component/s: Feature/2i Index
Discovered By: User Report
Fix Version/s: 4.0.x
 Severity: Normal
   Status: Open  (was: Triage Needed)

> 2i created in C* 2.2 with COMPACT STORAGE are not readable from 4.x
> ---
>
> Key: CASSANDRA-19888
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19888
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/2i Index
>Reporter: Romain Anselin
>Priority: Normal
> Fix For: 4.0.x
>
> Attachments: docker_cass_upgrade.sh
>
>
> While attempting to read a historical secondary index created in Cassandra
> 2.2 and upgraded to 4.0, the index isn't readable.
> The issue can be circumvented by dropping and recreating the 2i, but
> considering COMPACT STORAGE is still covered in
> Sample schema:
> {code:java}
> CREATE KEYSPACE testkeyspace WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 1};
> USE testkeyspace;
> CREATE TABLE repro01 (id text, c2 text, version text, primary key(id)) WITH 
> COMPACT STORAGE;
> INSERT INTO repro01 (id,c2,version) values ('a','b','v01');
> CREATE INDEX idx_repro01 ON repro01 (c2);
> {code}
>  
> Sample output in 2.2 based on script:
> {code:java}
> Waiting for Cassandra 2.2 to start...
> TEST WITH COMPACT STORAGE
>  id | c2 | version
> ++-
>   a |  b | v01
> (1 rows) {code}
>  
> Output in 4.0
> {code:java}
> Waiting for Cassandra 4.0 to start...
> TEST WITH COMPACT STORAGE
>  id | c2 | version
> ++- 
> (0 rows){code}
>  
> Error extracted from log
> {code:java}
> ERROR [SSTableBatchOpen:1] 2024-09-03 14:19:52,118 CassandraDaemon.java:577 - 
> Exception in thread Thread[SSTableBatchOpen:1,5,main]
> java.lang.IllegalStateException: 
> org.apache.cassandra.exceptions.UnknownColumnException: Unknown column value 
> during deserialization
>     at 
> org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:505)
>     at 
> org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:372)
>     at 
> org.apache.cassandra.io.sstable.format.SSTableReader$2.run(SSTableReader.java:540)
>     at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown 
> Source)
>     at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown 
> Source)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown 
> Source)
>     at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>     at java.base/java.lang.Thread.run(Unknown Source)
> Caused by: org.apache.cassandra.exceptions.UnknownColumnException: Unknown 
> column value during deserialization
>     at 
> org.apache.cassandra.db.SerializationHeader$Component.toHeader(SerializationHeader.java:337)
>     at 
> org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:501)
>     ... 8 common frames omitted{code}
> A sample script was created with Docker to reproduce the upgrade process; 
> the shell script is attached as *_docker_cass_upgrade.sh_*.
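> A minimal sketch of the drop-and-recreate workaround mentioned above, 
> assuming the sample schema from this description (the index name is the one 
> created there):
> {code:java}
> -- Rebuilds the 2i from scratch against the current on-disk format.
> DROP INDEX testkeyspace.idx_repro01;
> CREATE INDEX idx_repro01 ON testkeyspace.repro01 (c2);
> {code}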



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18712) Update Chronicle bytes

2024-08-31 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878391#comment-17878391
 ] 

Brandon Williams commented on CASSANDRA-18712:
--

EA versions are the least of the problems; I would rank concrete things like 
the phoning home (CASSANDRA-18538, CASSANDRA-19656) and flaky tests 
(CASSANDRA-16526) that aren't our fault and are still flaky (CASSANDRA-18274) 
as larger concerns.  But really it's the phoning home: I don't want to have to 
keep checking on software I ultimately can't trust to be a good actor.

> Update Chronicle bytes
> --
>
> Key: CASSANDRA-18712
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18712
> Project: Cassandra
>  Issue Type: Task
>  Components: Dependencies
>Reporter: Nayana Thorat
>Assignee: Nayana Thorat
>Priority: Normal
> Fix For: 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> [https://github.com/OpenHFT/Chronicle-Bytes/pull/485] fixes test failures 
> of Cassandra on s390x.
> This patch is merged and available in chronicle-bytes-2.24ea7 and later 
> releases.
> Is it possible to update Chronicle Bytes and related package versions in 
> Cassandra 
> (https://github.com/apache/cassandra/blob/trunk/.build/parent-pom-template.xml)?
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18712) Update Chronicle bytes

2024-08-31 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878334#comment-17878334
 ] 

Brandon Williams commented on CASSANDRA-18712:
--

That's the next step to figure out.

> Update Chronicle bytes
> --
>
> Key: CASSANDRA-18712
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18712
> Project: Cassandra
>  Issue Type: Task
>  Components: Dependencies
>Reporter: Nayana Thorat
>Assignee: Nayana Thorat
>Priority: Normal
> Fix For: 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> [https://github.com/OpenHFT/Chronicle-Bytes/pull/485] fixes test failures 
> of Cassandra on s390x.
> This patch is merged and available in chronicle-bytes-2.24ea7 and later 
> releases.
> Is it possible to update Chronicle Bytes and related package versions in 
> Cassandra 
> (https://github.com/apache/cassandra/blob/trunk/.build/parent-pom-template.xml)?
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18712) Update Chronicle bytes

2024-08-31 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878331#comment-17878331
 ] 

Brandon Williams commented on CASSANDRA-18712:
--

For me this is just more fuel for the fire to remove it, as expressed in 
CASSANDRA-19656.

> Update Chronicle bytes
> --
>
> Key: CASSANDRA-18712
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18712
> Project: Cassandra
>  Issue Type: Task
>  Components: Dependencies
>Reporter: Nayana Thorat
>Assignee: Nayana Thorat
>Priority: Normal
> Fix For: 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> [https://github.com/OpenHFT/Chronicle-Bytes/pull/485] fixes test failures 
> of Cassandra on s390x.
> This patch is merged and available in chronicle-bytes-2.24ea7 and later 
> releases.
> Is it possible to update Chronicle Bytes and related package versions in 
> Cassandra 
> (https://github.com/apache/cassandra/blob/trunk/.build/parent-pom-template.xml)?
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19448) CommitlogArchiver only has granularity to seconds for restore_point_in_time

2024-08-30 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19448:
-
Status: Needs Committer  (was: Patch Available)

> CommitlogArchiver only has granularity to seconds for restore_point_in_time
> ---
>
> Key: CASSANDRA-19448
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19448
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jeremy Hanna
>Assignee: Maxwell Guo
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Commitlog archiver allows users to back up commitlog files for the purpose 
> of doing point in time restores.  The [configuration 
> file|https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties]
>  gives an example down to the seconds granularity but then asks whether the 
> timestamps are microseconds or milliseconds - defaulting to microseconds.  
> Because the [CommitLogArchiver uses a second-based date 
> format|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java#L52],
>  if a user specifies a restore point at a lower granularity like 
> milliseconds or microseconds, it will truncate everything after the second 
> and restore to that second.  So say you specify a restore_point_in_time 
> like this:
> restore_point_in_time=2024:01:18 17:01:01.623392
> it will silently truncate everything after the 01 seconds, so effectively, 
> to the user, updates between 01 and 01.623392 are missing.
> This appears to be a bug in the intent.  We should allow users to specify 
> down to the millisecond or even microsecond level. If we allow them to 
> specify down to microseconds for the restore point in time, then the 
> internal representation may need to change from a long.
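> A minimal standalone sketch of the silent truncation, assuming the 
> second-based pattern at the linked CommitLogArchiver line (the class name 
> here is illustrative only):
> {code:java}
> import java.text.SimpleDateFormat;
> import java.util.Date;
> 
> public class RestorePointTruncationDemo
> {
>     public static void main(String[] args) throws Exception
>     {
>         // Second-based pattern, per the CommitLogArchiver line linked above.
>         SimpleDateFormat format = new SimpleDateFormat("yyyy:MM:dd HH:mm:ss");
> 
>         // parse() consumes only what the pattern matches; the trailing
>         // ".623392" is silently ignored rather than rejected.
>         Date restorePoint = format.parse("2024:01:18 17:01:01.623392");
> 
>         // Prints "2024:01:18 17:01:01" -- the sub-second part is gone.
>         System.out.println(format.format(restorePoint));
>     }
> }
> {code}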



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19448) CommitlogArchiver only has granularity to seconds for restore_point_in_time

2024-08-30 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19448:
-
Status: Patch Available  (was: Open)

> CommitlogArchiver only has granularity to seconds for restore_point_in_time
> ---
>
> Key: CASSANDRA-19448
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19448
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jeremy Hanna
>Assignee: Maxwell Guo
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Commitlog archiver allows users to back up commitlog files for the purpose 
> of doing point in time restores.  The [configuration 
> file|https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties]
>  gives an example down to the seconds granularity but then asks whether the 
> timestamps are microseconds or milliseconds - defaulting to microseconds.  
> Because the [CommitLogArchiver uses a second-based date 
> format|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java#L52],
>  if a user specifies a restore point at a lower granularity like 
> milliseconds or microseconds, it will truncate everything after the second 
> and restore to that second.  So say you specify a restore_point_in_time 
> like this:
> restore_point_in_time=2024:01:18 17:01:01.623392
> it will silently truncate everything after the 01 seconds, so effectively, 
> to the user, updates between 01 and 01.623392 are missing.
> This appears to be a bug in the intent.  We should allow users to specify 
> down to the millisecond or even microsecond level. If we allow them to 
> specify down to microseconds for the restore point in time, then the 
> internal representation may need to change from a long.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19448) CommitlogArchiver only has granularity to seconds for restore_point_in_time

2024-08-30 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878183#comment-17878183
 ] 

Brandon Williams commented on CASSANDRA-19448:
--

Ah, I missed that ticket, thank you!  Okay, I think we are good here then. +1 
from me.

> CommitlogArchiver only has granularity to seconds for restore_point_in_time
> ---
>
> Key: CASSANDRA-19448
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19448
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jeremy Hanna
>Assignee: Maxwell Guo
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Commitlog archiver allows users to back up commitlog files for the purpose 
> of doing point in time restores.  The [configuration 
> file|https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties]
>  gives an example down to the seconds granularity but then asks whether the 
> timestamps are microseconds or milliseconds - defaulting to microseconds.  
> Because the [CommitLogArchiver uses a second-based date 
> format|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java#L52],
>  if a user specifies a restore point at a lower granularity like 
> milliseconds or microseconds, it will truncate everything after the second 
> and restore to that second.  So say you specify a restore_point_in_time 
> like this:
> restore_point_in_time=2024:01:18 17:01:01.623392
> it will silently truncate everything after the 01 seconds, so effectively, 
> to the user, updates between 01 and 01.623392 are missing.
> This appears to be a bug in the intent.  We should allow users to specify 
> down to the millisecond or even microsecond level. If we allow them to 
> specify down to microseconds for the restore point in time, then the 
> internal representation may need to change from a long.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19448) CommitlogArchiver only has granularity to seconds for restore_point_in_time

2024-08-30 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878175#comment-17878175
 ] 

Brandon Williams commented on CASSANDRA-19448:
--

So close... there are some unrelated failures we don't need to be concerned 
with, but the CommitlogShutdownTest failure is one that we do need to address.

> CommitlogArchiver only has granularity to seconds for restore_point_in_time
> ---
>
> Key: CASSANDRA-19448
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19448
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jeremy Hanna
>Assignee: Maxwell Guo
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Commitlog archiver allows users to back up commitlog files for the purpose 
> of doing point in time restores.  The [configuration 
> file|https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties]
>  gives an example down to the seconds granularity but then asks whether the 
> timestamps are microseconds or milliseconds - defaulting to microseconds.  
> Because the [CommitLogArchiver uses a second-based date 
> format|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java#L52],
>  if a user specifies a restore point at a lower granularity like 
> milliseconds or microseconds, it will truncate everything after the second 
> and restore to that second.  So say you specify a restore_point_in_time 
> like this:
> restore_point_in_time=2024:01:18 17:01:01.623392
> it will silently truncate everything after the 01 seconds, so effectively, 
> to the user, updates between 01 and 01.623392 are missing.
> This appears to be a bug in the intent.  We should allow users to specify 
> down to the millisecond or even microsecond level. If we allow them to 
> specify down to microseconds for the restore point in time, then the 
> internal representation may need to change from a long.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19448) CommitlogArchiver only has granularity to seconds for restore_point_in_time

2024-08-30 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878063#comment-17878063
 ] 

Brandon Williams commented on CASSANDRA-19448:
--

Thanks for rebasing, let's give this one more shot:

||Branch||CI||
|[trunk|https://github.com/driftx/cassandra/tree/CASSANDRA-19448-trunk]|[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1729/workflows/e6405434-2093-493a-9117-fa09bacbcece],
 
[j17|https://app.circleci.com/pipelines/github/driftx/cassandra/1729/workflows/2a5f9d70-c97e-4d5c-9bad-aef7ae145ed2]|


> CommitlogArchiver only has granularity to seconds for restore_point_in_time
> ---
>
> Key: CASSANDRA-19448
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19448
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jeremy Hanna
>Assignee: Maxwell Guo
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Commitlog archiver allows users to back up commitlog files for the purpose 
> of doing point in time restores.  The [configuration 
> file|https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties]
>  gives an example down to the seconds granularity but then asks whether the 
> timestamps are microseconds or milliseconds - defaulting to microseconds.  
> Because the [CommitLogArchiver uses a second-based date 
> format|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java#L52],
>  if a user specifies a restore point at a lower granularity like 
> milliseconds or microseconds, it will truncate everything after the second 
> and restore to that second.  So say you specify a restore_point_in_time 
> like this:
> restore_point_in_time=2024:01:18 17:01:01.623392
> it will silently truncate everything after the 01 seconds, so effectively, 
> to the user, updates between 01 and 01.623392 are missing.
> This appears to be a bug in the intent.  We should allow users to specify 
> down to the millisecond or even microsecond level. If we allow them to 
> specify down to microseconds for the restore point in time, then the 
> internal representation may need to change from a long.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19448) CommitlogArchiver only has granularity to seconds for restore_point_in_time

2024-08-29 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17877870#comment-17877870
 ] 

Brandon Williams commented on CASSANDRA-19448:
--

I see CommitLogArchiverTest failing again :(

> CommitlogArchiver only has granularity to seconds for restore_point_in_time
> ---
>
> Key: CASSANDRA-19448
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19448
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jeremy Hanna
>Assignee: Maxwell Guo
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Commitlog archiver allows users to back up commitlog files for the purpose 
> of doing point in time restores.  The [configuration 
> file|https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties]
>  gives an example down to the seconds granularity but then asks whether the 
> timestamps are microseconds or milliseconds - defaulting to microseconds.  
> Because the [CommitLogArchiver uses a second-based date 
> format|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java#L52],
>  if a user specifies a restore point at a lower granularity like 
> milliseconds or microseconds, it will truncate everything after the second 
> and restore to that second.  So say you specify a restore_point_in_time 
> like this:
> restore_point_in_time=2024:01:18 17:01:01.623392
> it will silently truncate everything after the 01 seconds, so effectively, 
> to the user, updates between 01 and 01.623392 are missing.
> This appears to be a bug in the intent.  We should allow users to specify 
> down to the millisecond or even microsecond level. If we allow them to 
> specify down to microseconds for the restore point in time, then the 
> internal representation may need to change from a long.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-19448) CommitlogArchiver only has granularity to seconds for restore_point_in_time

2024-08-29 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17877730#comment-17877730
 ] 

Brandon Williams edited comment on CASSANDRA-19448 at 8/29/24 6:26 PM:
---

Alright, let's see:

||Branch||CI||
|[5.0|https://github.com/driftx/cassandra/tree/CASSANDRA-19448-5.0]|[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1727/workflows/43cb9261-3a3f-4080-aa5a-ab07796c8c56],
 
[j17|https://app.circleci.com/pipelines/github/driftx/cassandra/1727/workflows/307314f1-b3f1-4880-a5e8-65db74fd3e60]|
|[trunk|https://github.com/driftx/cassandra/tree/CASSANDRA-19448-trunk]|[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1728/workflows/3d758072-de95-4216-943c-7af5a559352b],
 
[j17|https://app.circleci.com/pipelines/github/driftx/cassandra/1728/workflows/e7dd3eb6-1d3b-42d2-bc08-95fa1a6a6efd]|



was (Author: brandon.williams):
Alright, let's see:

||Branch||CI||
|[5.0|https://github.com/driftx/cassandra/tree/CASSANDRA-19448-5.0]|[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1725/workflows/fa68f2b7-0a37-4e8b-a0a8-8e344020db8b],
 
[j17|https://app.circleci.com/pipelines/github/driftx/cassandra/1725/workflows/281db2b0-dce0-43b2-9897-58282452f06a]|
|[trunk|https://github.com/driftx/cassandra/tree/CASSANDRA-19448-trunk]|[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1726/workflows/a5a32b2b-0261-4e48-b48b-04910c1e2814],
 
[j17|https://app.circleci.com/pipelines/github/driftx/cassandra/1726/workflows/5eb31062-16eb-4cf1-9390-abe28dbfd78e]|


> CommitlogArchiver only has granularity to seconds for restore_point_in_time
> ---
>
> Key: CASSANDRA-19448
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19448
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jeremy Hanna
>Assignee: Maxwell Guo
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Commitlog archiver allows users to back up commitlog files for the purpose 
> of doing point in time restores.  The [configuration 
> file|https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties]
>  gives an example down to the seconds granularity but then asks whether the 
> timestamps are microseconds or milliseconds - defaulting to microseconds.  
> Because the [CommitLogArchiver uses a second-based date 
> format|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java#L52],
>  if a user specifies a restore point at a lower granularity like 
> milliseconds or microseconds, it will truncate everything after the second 
> and restore to that second.  So say you specify a restore_point_in_time 
> like this:
> restore_point_in_time=2024:01:18 17:01:01.623392
> it will silently truncate everything after the 01 seconds, so effectively, 
> to the user, updates between 01 and 01.623392 are missing.
> This appears to be a bug in the intent.  We should allow users to specify 
> down to the millisecond or even microsecond level. If we allow them to 
> specify down to microseconds for the restore point in time, then the 
> internal representation may need to change from a long.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19448) CommitlogArchiver only has granularity to seconds for restore_point_in_time

2024-08-29 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17877730#comment-17877730
 ] 

Brandon Williams commented on CASSANDRA-19448:
--

Alright, let's see:

||Branch||CI||
|[5.0|https://github.com/driftx/cassandra/tree/CASSANDRA-19448-5.0]|[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1725/workflows/fa68f2b7-0a37-4e8b-a0a8-8e344020db8b],
 
[j17|https://app.circleci.com/pipelines/github/driftx/cassandra/1725/workflows/281db2b0-dce0-43b2-9897-58282452f06a]|
|[trunk|https://github.com/driftx/cassandra/tree/CASSANDRA-19448-trunk]|[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1726/workflows/a5a32b2b-0261-4e48-b48b-04910c1e2814],
 
[j17|https://app.circleci.com/pipelines/github/driftx/cassandra/1726/workflows/5eb31062-16eb-4cf1-9390-abe28dbfd78e]|


> CommitlogArchiver only has granularity to seconds for restore_point_in_time
> ---
>
> Key: CASSANDRA-19448
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19448
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jeremy Hanna
>Assignee: Maxwell Guo
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Commitlog archiver allows users to back up commitlog files for the purpose 
> of doing point in time restores.  The [configuration 
> file|https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties]
>  gives an example down to the seconds granularity but then asks whether the 
> timestamps are microseconds or milliseconds - defaulting to microseconds.  
> Because the [CommitLogArchiver uses a second-based date 
> format|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java#L52],
>  if a user specifies a restore point at a lower granularity like 
> milliseconds or microseconds, it will truncate everything after the second 
> and restore to that second.  So say you specify a restore_point_in_time 
> like this:
> restore_point_in_time=2024:01:18 17:01:01.623392
> it will silently truncate everything after the 01 seconds, so effectively, 
> to the user, updates between 01 and 01.623392 are missing.
> This appears to be a bug in the intent.  We should allow users to specify 
> down to the millisecond or even microsecond level. If we allow them to 
> specify down to microseconds for the restore point in time, then the 
> internal representation may need to change from a long.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-19448) CommitlogArchiver only has granularity to seconds for restore_point_in_time

2024-08-27 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17877109#comment-17877109
 ] 

Brandon Williams edited comment on CASSANDRA-19448 at 8/27/24 7:13 PM:
---

Rebasing fixed the cqlsh tests, but now we have the CommitLogArchiverTest 
failing :(


was (Author: brandon.williams):
Rebasing fixed the cqlsh tests, but now we are have the CommitLogArchiverTest 
failing :( [sic; this is the pre-edit text]

> CommitlogArchiver only has granularity to seconds for restore_point_in_time
> ---
>
> Key: CASSANDRA-19448
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19448
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jeremy Hanna
>Assignee: Maxwell Guo
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Commitlog archiver allows users to back up commitlog files for the purpose 
> of doing point in time restores.  The [configuration 
> file|https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties]
>  gives an example down to the seconds granularity but then asks whether the 
> timestamps are microseconds or milliseconds - defaulting to microseconds.  
> Because the [CommitLogArchiver uses a second-based date 
> format|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java#L52],
>  if a user specifies a restore point at a lower granularity like 
> milliseconds or microseconds, it will truncate everything after the second 
> and restore to that second.  So say you specify a restore_point_in_time 
> like this:
> restore_point_in_time=2024:01:18 17:01:01.623392
> it will silently truncate everything after the 01 seconds, so effectively, 
> to the user, updates between 01 and 01.623392 are missing.
> This appears to be a bug in the intent.  We should allow users to specify 
> down to the millisecond or even microsecond level. If we allow them to 
> specify down to microseconds for the restore point in time, then the 
> internal representation may need to change from a long.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19448) CommitlogArchiver only has granularity to seconds for restore_point_in_time

2024-08-27 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17877109#comment-17877109
 ] 

Brandon Williams commented on CASSANDRA-19448:
--

Rebasing fixed the cqlsh tests, but now we have the CommitLogArchiverTest 
failing :(

> CommitlogArchiver only has granularity to seconds for restore_point_in_time
> ---
>
> Key: CASSANDRA-19448
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19448
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jeremy Hanna
>Assignee: Maxwell Guo
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Commitlog archiver allows users to back up commitlog files for the purpose 
> of doing point in time restores.  The [configuration 
> file|https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties]
>  gives an example down to the seconds granularity but then asks whether the 
> timestamps are microseconds or milliseconds - defaulting to microseconds.  
> Because the [CommitLogArchiver uses a second-based date 
> format|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java#L52],
>  if a user specifies a restore point at a lower granularity like 
> milliseconds or microseconds, it will truncate everything after the second 
> and restore to that second.  So say you specify a restore_point_in_time 
> like this:
> restore_point_in_time=2024:01:18 17:01:01.623392
> it will silently truncate everything after the 01 seconds, so effectively, 
> to the user, updates between 01 and 01.623392 are missing.
> This appears to be a bug in the intent.  We should allow users to specify 
> down to the millisecond or even microsecond level. If we allow them to 
> specify down to microseconds for the restore point in time, then the 
> internal representation may need to change from a long.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19448) CommitlogArchiver only has granularity to seconds for restore_point_in_time

2024-08-27 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17877090#comment-17877090
 ] 

Brandon Williams commented on CASSANDRA-19448:
--

TopPartitionsTest is CASSANDRA-17798 now, previously it was CASSANDRA-17455.  
The rest appears correct to me, but I'm not sure what's going on with the 
cqlsh failures.  I've rebased 5.0 and trunk, let's see if they persist:

||Branch||CI||
|[5.0|https://github.com/driftx/cassandra/tree/CASSANDRA-19448-5.0]|[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1721/workflows/0ef7ab43-5606-43f9-8427-9547d2e594e5],
 
[j17|https://app.circleci.com/pipelines/github/driftx/cassandra/1721/workflows/c065040c-6641-429b-9519-74a642f1cebd]|
|[trunk|https://github.com/driftx/cassandra/tree/CASSANDRA-19448-trunk]|[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1722/workflows/981a522b-076f-4da6-95d7-023b741b9633],
 
[j17|https://app.circleci.com/pipelines/github/driftx/cassandra/1722/workflows/48bb150d-dd68-4e41-ae0c-008c6bef0e24]|



> CommitlogArchiver only has granularity to seconds for restore_point_in_time
> ---
>
> Key: CASSANDRA-19448
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19448
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jeremy Hanna
>Assignee: Maxwell Guo
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Commitlog archiver allows users to back up commitlog files for the purpose 
> of doing point in time restores.  The [configuration 
> file|https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties]
>  gives an example down to the seconds granularity but then asks whether the 
> timestamps are microseconds or milliseconds - defaulting to microseconds.  
> Because the [CommitLogArchiver uses a second-based date 
> format|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java#L52],
>  if a user specifies a restore point at a lower granularity like 
> milliseconds or microseconds, it will truncate everything after the second 
> and restore to that second.  So say you specify a restore_point_in_time 
> like this:
> restore_point_in_time=2024:01:18 17:01:01.623392
> it will silently truncate everything after the 01 seconds, so effectively, 
> to the user, updates between 01 and 01.623392 are missing.
> This appears to be a bug in the intent.  We should allow users to specify 
> down to the millisecond or even microsecond level. If we allow them to 
> specify down to microseconds for the restore point in time, then the 
> internal representation may need to change from a long.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19448) CommitlogArchiver only has granularity to seconds for restore_point_in_time

2024-08-26 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17876849#comment-17876849
 ] 

Brandon Williams commented on CASSANDRA-19448:
--

We don't need to worry about those; we just need to make sure there aren't new 
failures from this patch.

||Branch||CI||
|[4.0|https://github.com/driftx/cassandra/tree/CASSANDRA-19448-4.0]|[j8|https://app.circleci.com/pipelines/github/driftx/cassandra/1718/workflows/8c905611-0986-4c44-b925-4cab1eb817d1],
 
[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1718/workflows/2c4077f0-f123-41c4-aa5e-6f61c1632fa1]|
|[4.1|https://github.com/driftx/cassandra/tree/CASSANDRA-19448-4.1]|[j8|https://app.circleci.com/pipelines/github/driftx/cassandra/1719/workflows/216c35aa-53cb-45ec-975b-954170943371],
 
[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1719/workflows/ae11aa63-a55b-4e7f-816d-f73e80d6b455]|
|[5.0|https://github.com/driftx/cassandra/tree/CASSANDRA-19448-5.0]|[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1720/workflows/d66fee74-b544-4fe7-beb3-fc92710b4502],
 
[j17|https://app.circleci.com/pipelines/github/driftx/cassandra/1720/workflows/1bd8307b-1e65-4912-9da7-3509d49ab19a]|
|[trunk|https://github.com/driftx/cassandra/tree/CASSANDRA-19448-trunk]|[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1717/workflows/76f2613e-e207-48ee-a7f6-9a9f58c9bc74],
 
[j17|https://app.circleci.com/pipelines/github/driftx/cassandra/1717/workflows/b84a1ea3-810f-4fc4-a38c-b76573039490]|


> CommitlogArchiver only has granularity to seconds for restore_point_in_time
> ---
>
> Key: CASSANDRA-19448
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19448
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jeremy Hanna
>Assignee: Maxwell Guo
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Commitlog archiver allows users to back up commitlog files for the purpose 
> of doing point in time restores.  The [configuration 
> file|https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties]
>  gives an example down to the seconds granularity but then asks whether the 
> timestamps are microseconds or milliseconds - defaulting to microseconds.  
> Because the [CommitLogArchiver uses a second-based date 
> format|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java#L52],
>  if a user specifies a restore point at a lower granularity like 
> milliseconds or microseconds, it will truncate everything after the second 
> and restore to that second.  So say you specify a restore_point_in_time 
> like this:
> restore_point_in_time=2024:01:18 17:01:01.623392
> it will silently truncate everything after the 01 seconds, so effectively, 
> to the user, updates between 01 and 01.623392 are missing.
> This appears to be a bug in the intent.  We should allow users to specify 
> down to the millisecond or even microsecond level. If we allow them to 
> specify down to microseconds for the restore point in time, then the 
> internal representation may need to change from a long.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19448) CommitlogArchiver only has granularity to seconds for restore_point_in_time

2024-08-19 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874980#comment-17874980
 ] 

Brandon Williams commented on CASSANDRA-19448:
--

Since all branches changed, let's run the full gamut again:

||Branch||CI||
|[4.0|https://github.com/driftx/cassandra/tree/CASSANDRA-19448-4.0]|[j8|https://app.circleci.com/pipelines/github/driftx/cassandra/1715/workflows/bdb19a99-6390-4f7f-a250-cd84efbc0627],
 
[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1715/workflows/6d1904de-1ea0-4c3a-bac0-bb4ac0a8c146]|
|[4.1|https://github.com/driftx/cassandra/tree/CASSANDRA-19448-4.1]|[j8|https://app.circleci.com/pipelines/github/driftx/cassandra/1716/workflows/08926280-235e-4cab-a694-e55be2f955f8],
 
[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1716/workflows/a1231bb9-9fbe-42f9-adf2-1b400ca65e28]|
|[5.0|https://github.com/driftx/cassandra/tree/CASSANDRA-19448-5.0]|[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1713/workflows/c40f50a1-0772-4ecb-a934-76d0ae102792],
 
[j17|https://app.circleci.com/pipelines/github/driftx/cassandra/1713/workflows/262b6e3c-0fa5-4b9b-8fdb-cb6e5991c684]|
|[trunk|https://github.com/driftx/cassandra/tree/CASSANDRA-19448-trunk]|[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1714/workflows/f7bfb8ad-d113-41f8-80b0-fa851d28b0ff],
 
[j17|https://app.circleci.com/pipelines/github/driftx/cassandra/1714/workflows/33b35768-e97d-42b4-8135-751b72d1ee23]|


> CommitlogArchiver only has granularity to seconds for restore_point_in_time
> ---
>
> Key: CASSANDRA-19448
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19448
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jeremy Hanna
>Assignee: Maxwell Guo
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Commitlog archiver allows users to back up commitlog files for the purpose 
> of doing point in time restores.  The [configuration 
> file|https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties]
>  gives an example down to the seconds granularity but then asks whether the 
> timestamps are microseconds or milliseconds - defaulting to microseconds.  
> Because the [CommitLogArchiver uses a second-based date 
> format|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java#L52],
>  if a user specifies a restore point at a lower granularity like 
> milliseconds or microseconds, it will truncate everything after the second 
> and restore to that second.  So say you specify a restore_point_in_time 
> like this:
> restore_point_in_time=2024:01:18 17:01:01.623392
> it will silently truncate everything after the 01 seconds, so effectively, 
> to the user, updates between 01 and 01.623392 are missing.
> This appears to be a bug in the intent.  We should allow users to specify 
> down to the millisecond or even microsecond level. If we allow them to 
> specify down to microseconds for the restore point in time, then the 
> internal representation may need to change from a long.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-18766) high speculative retries on v4.1.3

2024-08-16 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-18766:
-
  Fix Version/s: 4.1.6
 (was: 4.1.x)
  Since Version: 4.1.3
Source Control Link: 
https://github.com/apache/cassandra-dtest/commit/f14da069295aa66e6d1dd638a73ce136e759b1bf
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

Thanks for the review! Committed.

> high speculative retries on v4.1.3
> --
>
> Key: CASSANDRA-18766
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18766
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination
>Reporter: Ivans Novikovs
>Assignee: Brandon Williams
>Priority: Urgent
> Fix For: 4.1.6
>
> Attachments: screenshot-1.png, screenshot-2.png, signature.asc, 
> signature.asc, v4.0.png, v4.1.png
>
>
> There are up to 10+ times higher speculative retries for reads on 4.1.3 
> compared to 4.0.7 and 4.1.2 when using QUORUM and the default setting of 99p.
> On 4.1.3 after the upgrade I see speculative retries for up to 35% of all 
> reads for a specific table. Latency for reads is stable around 500 
> microseconds.
> Java 1.8.0_382 is used.
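> For reference, a sketch of the per-table option the report refers to (the 
> keyspace and table names here are hypothetical):
> {code:java}
> -- '99p' is the default setting being compared across versions above.
> ALTER TABLE ks.tbl WITH speculative_retry = '99p';
> {code}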



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-18766) high speculative retries on v4.1.3

2024-08-16 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-18766:
-
Reviewers: Stefan Miklosovic  (was: Brandon Williams, Stefan Miklosovic)

> high speculative retries on v4.1.3
> --
>
> Key: CASSANDRA-18766
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18766
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination
>Reporter: Ivans Novikovs
>Assignee: Brandon Williams
>Priority: Urgent
> Fix For: 4.1.x
>
> Attachments: screenshot-1.png, screenshot-2.png, signature.asc, 
> signature.asc, v4.0.png, v4.1.png
>
>
> There are up to 10+ times higher speculative retries for reads on 4.1.3 
> compared to 4.0.7 and 4.1.2 when using QUORUM and the default setting of 99p.
> On 4.1.3 after the upgrade I see speculative retries for up to 35% of all 
> reads for a specific table. Latency for reads is stable around 500 
> microseconds.
> Java 1.8.0_382 is used.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18766) high speculative retries on v4.1.3

2024-08-16 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874343#comment-17874343
 ] 

Brandon Williams commented on CASSANDRA-18766:
--

Here is CI for the dtest, which also confirms the bug is not present in any of 
the branches.

||Branch||CI||
|4.0|[dtest 
repeat|https://app.circleci.com/pipelines/github/driftx/cassandra/1709/workflows/1601727d-0abc-4028-9db1-8cc67f36382c/jobs/98520]|
|4.1|[dtest 
repeat|https://app.circleci.com/pipelines/github/driftx/cassandra/1711/workflows/f758aeb2-26e5-46f0-a1cf-a2e2925d040c/jobs/98522]|
|5.0|[dtest 
repeat|https://app.circleci.com/pipelines/github/driftx/cassandra/1712/workflows/0aaf16bc-48b1-4a24-b056-a3f1aa3e03d9/jobs/98523/]|
|trunk|[dtest 
repeat|https://app.circleci.com/pipelines/github/driftx/cassandra/1710/workflows/27c6e82b-7a62-4917-b4a0-1c5fcaaa222b/jobs/98525]|


> high speculative retries on v4.1.3
> --
>
> Key: CASSANDRA-18766
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18766
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination
>Reporter: Ivans Novikovs
>Assignee: Brandon Williams
>Priority: Urgent
> Fix For: 4.1.x
>
> Attachments: screenshot-1.png, screenshot-2.png, signature.asc, 
> signature.asc, v4.0.png, v4.1.png
>
>
> There are up to 10+ times higher speculative retries for reads on 4.1.3 
> compared to 4.0.7 and 4.1.2 when using QUORUM and the default setting of 99p.
> On 4.1.3 after the upgrade I see speculative retries for up to 35% of all 
> reads for a specific table. Latency for reads is stable around 500 
> microseconds.
> Java 1.8.0_382 is used.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-18766) high speculative retries on v4.1.3

2024-08-16 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874343#comment-17874343
 ] 

Brandon Williams edited comment on CASSANDRA-18766 at 8/16/24 6:02 PM:
---

Here is CI for the dtest, which also confirms the bug is not present in any of 
the branches.

||Branch||CI||
|4.0|[dtest 
repeat|https://app.circleci.com/pipelines/github/driftx/cassandra/1709/workflows/1601727d-0abc-4028-9db1-8cc67f36382c/jobs/98520]|
|4.1|[dtest 
repeat|https://app.circleci.com/pipelines/github/driftx/cassandra/1711/workflows/f758aeb2-26e5-46f0-a1cf-a2e2925d040c/jobs/98522]|
|5.0|[dtest 
repeat|https://app.circleci.com/pipelines/github/driftx/cassandra/1712/workflows/0aaf16bc-48b1-4a24-b056-a3f1aa3e03d9/jobs/98523/]|
|trunk|[dtest 
repeat|https://app.circleci.com/pipelines/github/driftx/cassandra/1710/workflows/27c6e82b-7a62-4917-b4a0-1c5fcaaa222b/jobs/98525]|



was (Author: brandon.williams):
Here is CI for the dtest, which also confirms the bug is not present in any of 
the branches.

||Branch||CI||
|[4.0|[dtest 
repeat|https://app.circleci.com/pipelines/github/driftx/cassandra/1709/workflows/1601727d-0abc-4028-9db1-8cc67f36382c/jobs/98520]|
|[4.1|[dtest 
repeat|https://app.circleci.com/pipelines/github/driftx/cassandra/1711/workflows/f758aeb2-26e5-46f0-a1cf-a2e2925d040c/jobs/98522]|
|[5.0|[dtest 
repeat|https://app.circleci.com/pipelines/github/driftx/cassandra/1712/workflows/0aaf16bc-48b1-4a24-b056-a3f1aa3e03d9/jobs/98523/]|
|[trunk|[dtest 
repeat|https://app.circleci.com/pipelines/github/driftx/cassandra/1710/workflows/27c6e82b-7a62-4917-b4a0-1c5fcaaa222b/jobs/98525]|


> high speculative retries on v4.1.3
> --
>
> Key: CASSANDRA-18766
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18766
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination
>Reporter: Ivans Novikovs
>Assignee: Brandon Williams
>Priority: Urgent
> Fix For: 4.1.x
>
> Attachments: screenshot-1.png, screenshot-2.png, signature.asc, 
> signature.asc, v4.0.png, v4.1.png
>
>
> There are up to 10+ times higher speculative retries for reads on 4.1.3 
> compared to 4.0.7 and 4.1.2 when using QUORUM and the default setting of 99p.
> On 4.1.3 after the upgrade I see speculative retries for up to 35% of all 
> reads for a specific table. Latency for reads is stable around 500 
> microseconds.
> Java 1.8.0_382 is used.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19785) Possible memory leak in BTree.FastBuilder

2024-08-16 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19785:
-
Test and Documentation Plan: run CI
 Status: Patch Available  (was: Open)

> Possible memory leak in BTree.FastBuilder 
> --
>
> Key: CASSANDRA-19785
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19785
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core
>Reporter: Paul Chandler
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
> Attachments: image-2024-07-19-08-44-56-714.png, 
> image-2024-07-19-08-45-17-289.png, image-2024-07-19-08-45-33-933.png, 
> image-2024-07-19-08-45-50-383.png, image-2024-07-19-08-46-06-919.png, 
> image-2024-07-19-08-46-42-979.png, image-2024-07-19-08-46-56-594.png, 
> image-2024-07-19-08-47-19-517.png, image-2024-07-19-08-47-34-582.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We are having a problem with the heap growing in size. This is a large 
> cluster of > 1,000 nodes across a large number of DCs, running version 
> 4.0.11.
>  
> Each node has a 32GB heap, and the amount used continues to grow until it 
> reaches 30GB; it then struggles with multiple Full GC pauses, as can be seen 
> here:
> !image-2024-07-19-08-44-56-714.png!
> We took 2 heap dumps on one node a few days after it was restarted, and the 
> heap had grown by 2.7GB
>  
> 9{^}th{^} July
> !image-2024-07-19-08-45-17-289.png!
> 11{^}th{^} July
> !image-2024-07-19-08-45-33-933.png!
> This can be seen as mainly an increase of memory used by 
> FastThreadLocalThread, increasing from 5.92GB to 8.53GB
> !image-2024-07-19-08-45-50-383.png!
> !image-2024-07-19-08-46-06-919.png!
> Looking deeper into this it can be seen that the growing heap is contained 
> within the threads for the MutationStage, Native-transport-Requests, 
> ReadStage etc. We would expect the memory used within these threads to be 
> short-lived, and not grow as time goes on.  We recently increased the size 
> of these thread pools, and that has increased the size of the problem.
>  
> Top memory usage for FastThreadLocalThread
> 9{^}th{^} July
> !image-2024-07-19-08-46-42-979.png!
> 11{^}th{^} July
> !image-2024-07-19-08-46-56-594.png!
> This has led us to investigate whether there could be a memory leak, and we 
> have found the following issues within the retained references in 
> BTree.FastBuilder objects. The issue appears to stem from the reset() method, 
> which does not properly clear all buffers.  We are not really sure how the 
> BTree.FastBuilder works, but this is our analysis of where a leak might 
> occur.
>  
> Specifically:
> Leaf Buffer Not Being Cleared:
> When leaf().count is 0, the statement Arrays.fill(leaf().buffer, 0, 
> leaf().count, null); does not clear the buffer because the end index is 0. 
> This leaves the buffer with references to potentially large objects, 
> preventing garbage collection and increasing heap usage.
> Branch inUse Property:
> If the inUse property of the branch is set to false elsewhere in the code, 
> the while loop while (branch != null && branch.inUse) does not execute, 
> resulting in uncleared branch buffers and retained references.
>  
> This is based on the following observations:
>     Heap Dumps: Analysis of heap dumps shows that leaf().count is often 0, 
> and as a result, the buffer is not being cleared, leading to high heap 
> utilization.
> !image-2024-07-19-08-47-19-517.png!
>     Remote Debugging: Debugging sessions indicate that the drain() method 
> sets count to 0, and the inUse flag for the parent branch is set to false, 
> preventing the while loop in reset() from clearing the branch buffers.
> !image-2024-07-19-08-47-34-582.png!
>  
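> A minimal standalone sketch of the "Leaf Buffer Not Being Cleared" point 
> above (the buffer contents and class name are illustrative only):
> {code:java}
> import java.util.Arrays;
> 
> public class LeafResetSketch
> {
>     public static void main(String[] args)
>     {
>         // Stand-in for leaf().buffer after drain() has set count back to 0.
>         Object[] buffer = { new byte[1024], new byte[1024], null };
>         int count = 0;
> 
>         // The reset() pattern described above clears [0, count) == [0, 0),
>         // i.e. nothing, so the byte arrays stay strongly referenced.
>         Arrays.fill(buffer, 0, count, null);
> 
>         System.out.println(buffer[0] != null); // true: reference retained
>     }
> }
> {code}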



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19785) Possible memory leak in BTree.FastBuilder

2024-08-16 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19785:
-
Fix Version/s: 4.1.x
   5.0.x
   5.x

> Possible memory leak in BTree.FastBuilder 
> --
>
> Key: CASSANDRA-19785
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19785
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core
>Reporter: Paul Chandler
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
> Attachments: image-2024-07-19-08-44-56-714.png, 
> image-2024-07-19-08-45-17-289.png, image-2024-07-19-08-45-33-933.png, 
> image-2024-07-19-08-45-50-383.png, image-2024-07-19-08-46-06-919.png, 
> image-2024-07-19-08-46-42-979.png, image-2024-07-19-08-46-56-594.png, 
> image-2024-07-19-08-47-19-517.png, image-2024-07-19-08-47-34-582.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We are having a problem with the heap growing in size. This is a large 
> cluster of > 1,000 nodes across a large number of DCs, running version 
> 4.0.11.
>  
> Each node has a 32GB heap, and the amount used continues to grow until it 
> reaches 30GB; it then struggles with multiple Full GC pauses, as can be seen 
> here:
> !image-2024-07-19-08-44-56-714.png!
> We took 2 heap dumps on one node a few days after it was restarted, and the 
> heap had grown by 2.7GB
>  
> 9{^}th{^} July
> !image-2024-07-19-08-45-17-289.png!
> 11{^}th{^} July
> !image-2024-07-19-08-45-33-933.png!
> This can be seen as mainly an increase of memory used by 
> FastThreadLocalThread, increasing from 5.92GB to 8.53GB
> !image-2024-07-19-08-45-50-383.png!
> !image-2024-07-19-08-46-06-919.png!
> Looking deeper into this it can be seen that the growing heap is contained 
> within the threads for the MutationStage, Native-transport-Requests, 
> ReadStage etc. We would expect the memory used within these threads to be 
> short-lived, and not grow as time goes on.  We recently increased the size 
> of these thread pools, and that has increased the size of the problem.
>  
> Top memory usage for FastThreadLocalThread
> 9{^}th{^} July
> !image-2024-07-19-08-46-42-979.png!
> 11{^}th{^} July
> !image-2024-07-19-08-46-56-594.png!
> This has led us to investigate whether there could be a memory leak, and we 
> have found the following issues within the retained references in 
> BTree.FastBuilder objects. The issue appears to stem from the reset() method, 
> which does not properly clear all buffers.  We are not really sure how the 
> BTree.FastBuilder works, but this is our analysis of where a leak might 
> occur.
>  
> Specifically:
> Leaf Buffer Not Being Cleared:
> When leaf().count is 0, the statement Arrays.fill(leaf().buffer, 0, 
> leaf().count, null); does not clear the buffer because the end index is 0. 
> This leaves the buffer with references to potentially large objects, 
> preventing garbage collection and increasing heap usage.
> Branch inUse Property:
> If the inUse property of the branch is set to false elsewhere in the code, 
> the while loop while (branch != null && branch.inUse) does not execute, 
> resulting in uncleared branch buffers and retained references.
>  
> This is based on the following observations:
>     Heap Dumps: Analysis of heap dumps shows that leaf().count is often 0, 
> and as a result, the buffer is not being cleared, leading to high heap 
> utilization.
> !image-2024-07-19-08-47-19-517.png!
>     Remote Debugging: Debugging sessions indicate that the drain() method 
> sets count to 0, and the inUse flag for the parent branch is set to false, 
> preventing the while loop in reset() from clearing the branch buffers.
> !image-2024-07-19-08-47-34-582.png!
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19785) Possible memory leak in BTree.FastBuilder

2024-08-16 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874255#comment-17874255
 ] 

Brandon Williams commented on CASSANDRA-19785:
--

CI looks good, all failures are known.  As for a reviewer... maybe [~blerer], 
[~bdeggleston], or [~blambov] can help?

> Possible memory leak in BTree.FastBuilder 
> --
>
> Key: CASSANDRA-19785
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19785
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core
>Reporter: Paul Chandler
>Priority: Normal
> Fix For: 4.0.x
>
> Attachments: image-2024-07-19-08-44-56-714.png, 
> image-2024-07-19-08-45-17-289.png, image-2024-07-19-08-45-33-933.png, 
> image-2024-07-19-08-45-50-383.png, image-2024-07-19-08-46-06-919.png, 
> image-2024-07-19-08-46-42-979.png, image-2024-07-19-08-46-56-594.png, 
> image-2024-07-19-08-47-19-517.png, image-2024-07-19-08-47-34-582.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We are having a problem with the heap growing in size. This is a large 
> cluster of > 1,000 nodes across a large number of DCs, running version 
> 4.0.11.
>  
> Each node has a 32GB heap, and the amount used continues to grow until it 
> reaches 30GB, at which point it struggles with multiple Full GC pauses, as can 
> be seen here:
> !image-2024-07-19-08-44-56-714.png!
> We took 2 heap dumps on one node a few days after it was restarted, and the 
> heap had grown by 2.7GB
>  
> 9{^}th{^} July
> !image-2024-07-19-08-45-17-289.png!
> 11{^}th{^} July
> !image-2024-07-19-08-45-33-933.png!
> This can be seen mainly as an increase in memory used by 
> FastThreadLocalThread, from 5.92GB to 8.53GB
> !image-2024-07-19-08-45-50-383.png!
> !image-2024-07-19-08-46-06-919.png!
> Looking deeper into this, it can be seen that the growing heap is contained 
> within the threads for the MutationStage, Native-transport-Requests, 
> ReadStage, etc. We would expect the memory used within these threads to be 
> short-lived, and not grow as time goes on.  We recently increased the size of 
> these threadpools, and that has increased the size of the problem.
>  
> Top memory usage for FastThreadLocalThread
> 9{^}th{^} July
> !image-2024-07-19-08-46-42-979.png!
> 11{^}th{^} July
> !image-2024-07-19-08-46-56-594.png!
> This has led us to investigate whether there could be a memory leak, and we 
> have found the following issues with retained references in 
> BTree.FastBuilder objects. The issue appears to stem from the reset() method, 
> which does not properly clear all buffers.  We are not really sure how 
> BTree.FastBuilder works, but this is our analysis of where a leak might 
> occur.
>  
> Specifically:
> Leaf Buffer Not Being Cleared:
> When leaf().count is 0, the statement Arrays.fill(leaf().buffer, 0, 
> leaf().count, null); does not clear the buffer because the end index is 0. 
> This leaves the buffer with references to potentially large objects, 
> preventing garbage collection and increasing heap usage.
> Branch inUse Property:
> If the inUse property of the branch is set to false elsewhere in the code, 
> the while loop while (branch != null && branch.inUse) does not execute, 
> resulting in uncleared branch buffers and retained references.
>  
> This is based on the following observations:
>     Heap Dumps: Analysis of heap dumps shows that leaf().count is often 0, 
> and as a result, the buffer is not being cleared, leading to high heap 
> utilization.
> !image-2024-07-19-08-47-19-517.png!
>     Remote Debugging: Debugging sessions indicate that the drain() method 
> sets count to 0, and the inUse flag for the parent branch is set to false, 
> preventing the while loop in reset() from clearing the branch buffers.
> !image-2024-07-19-08-47-34-582.png!
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19785) Possible memory leak in BTree.FastBuilder

2024-08-16 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874184#comment-17874184
 ] 

Brandon Williams commented on CASSANDRA-19785:
--

Thanks for the patch, Benedict! I'll do what I can to move this along.  Here's 
CI:

||Branch||CI||
|[4.0|https://github.com/driftx/cassandra/tree/CASSANDRA-19785-4.0]|[j8|https://app.circleci.com/pipelines/github/driftx/cassandra/1707/workflows/8f339da4-78ee-4024-81c9-7b26c940a93d],
 
[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1707/workflows/22f1cd83-e563-48c7-9877-febb5c617067]|
|[4.1|https://github.com/driftx/cassandra/tree/CASSANDRA-19785-4.1]|[j8|https://app.circleci.com/pipelines/github/driftx/cassandra/1706/workflows/fb5cb598-f96f-4785-a22b-0a808bde7814],
 
[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1706/workflows/7c086e14-dfac-47c5-97ce-af9d3216149a]|
|[5.0|https://github.com/driftx/cassandra/tree/CASSANDRA-19785-5.0]|[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1705/workflows/e0ecc857-0157-447a-8851-27f65e6a2f9c],
 
[j17|https://app.circleci.com/pipelines/github/driftx/cassandra/1705/workflows/f3be8609-490a-459e-b080-2a3b11481a2d]|
|[trunk|https://github.com/driftx/cassandra/tree/CASSANDRA-19785-trunk]|[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1708/workflows/c3067071-82a3-4122-9af3-bb448ed481fb],
 
[j17|https://app.circleci.com/pipelines/github/driftx/cassandra/1708/workflows/f825016f-0342-45b7-ad17-db745e51c341]|


> Possible memory leak in BTree.FastBuilder 
> --
>
> Key: CASSANDRA-19785
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19785
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core
>Reporter: Paul Chandler
>Priority: Normal
> Fix For: 4.0.x
>
> Attachments: image-2024-07-19-08-44-56-714.png, 
> image-2024-07-19-08-45-17-289.png, image-2024-07-19-08-45-33-933.png, 
> image-2024-07-19-08-45-50-383.png, image-2024-07-19-08-46-06-919.png, 
> image-2024-07-19-08-46-42-979.png, image-2024-07-19-08-46-56-594.png, 
> image-2024-07-19-08-47-19-517.png, image-2024-07-19-08-47-34-582.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We are having a problem with the heap growing in size. This is a large 
> cluster of > 1,000 nodes across a large number of DCs, running version 
> 4.0.11.
>  
> Each node has a 32GB heap, and the amount used continues to grow until it 
> reaches 30GB, at which point it struggles with multiple Full GC pauses, as can 
> be seen here:
> !image-2024-07-19-08-44-56-714.png!
> We took 2 heap dumps on one node a few days after it was restarted, and the 
> heap had grown by 2.7GB
>  
> 9{^}th{^} July
> !image-2024-07-19-08-45-17-289.png!
> 11{^}th{^} July
> !image-2024-07-19-08-45-33-933.png!
> This can be seen mainly as an increase in memory used by 
> FastThreadLocalThread, from 5.92GB to 8.53GB
> !image-2024-07-19-08-45-50-383.png!
> !image-2024-07-19-08-46-06-919.png!
> Looking deeper into this, it can be seen that the growing heap is contained 
> within the threads for the MutationStage, Native-transport-Requests, 
> ReadStage, etc. We would expect the memory used within these threads to be 
> short-lived, and not grow as time goes on.  We recently increased the size of 
> these threadpools, and that has increased the size of the problem.
>  
> Top memory usage for FastThreadLocalThread
> 9{^}th{^} July
> !image-2024-07-19-08-46-42-979.png!
> 11{^}th{^} July
> !image-2024-07-19-08-46-56-594.png!
> This has led us to investigate whether there could be a memory leak, and we 
> have found the following issues with retained references in 
> BTree.FastBuilder objects. The issue appears to stem from the reset() method, 
> which does not properly clear all buffers.  We are not really sure how 
> BTree.FastBuilder works, but this is our analysis of where a leak might 
> occur.
>  
> Specifically:
> Leaf Buffer Not Being Cleared:
> When leaf().count is 0, the statement Arrays.fill(leaf().buffer, 0, 
> leaf().count, null); does not clear the buffer because the end index is 0. 
> This leaves the buffer with references to potentially large objects, 
> preventing garbage collection and increasing heap usage.
> Branch inUse Property:
> If the inUse property of the branch is set to false elsewhere in the code, 
> the while loop while (branch != null && branch.inUse) does not execute, 
> resulting in uncleared branch buffers and retained references.
>  
> This is based on the following observations:
>     Heap Dumps: Analysis of heap dumps shows that leaf().count is often 0, 
> and as a result, the buffer is not being cleared, leading to high heap 
> utilization.
> !image-2024-07-19-08-47-19-517.png!
>     Remote Debugging: Debugging sessions indicate that the drain() method 
> sets count to 0, and the inUse flag for the parent branch is set to false, 
> preventing the while loop in reset() from clearing the branch buffers.
> !image-2024-07-19-08-47-34-582.png!

[jira] [Commented] (CASSANDRA-19812) We should throw exception when commitlog 's DiskAccessMode is direct but direct io is not support

2024-08-15 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17873972#comment-17873972
 ] 

Brandon Williams commented on CASSANDRA-19812:
--

I don't have a strong feeling either way.

> We should throw exception when commitlog 's DiskAccessMode is direct but 
> direct io is not support
> -
>
> Key: CASSANDRA-19812
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19812
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Commit Log
>Reporter: Maxwell Guo
>Assignee: Maxwell Guo
>Priority: Normal
> Fix For: 5.0.x, 5.x
>
>
> Looking into the code below:
> {code:java}
> private static DiskAccessMode 
> resolveCommitLogWriteDiskAccessMode(DiskAccessMode providedDiskAccessMode)
> {
> boolean compressOrEncrypt = getCommitLogCompression() != null || 
> (getEncryptionContext() != null && getEncryptionContext().isEnabled());
> boolean directIOSupported = false;
> try
> {
> directIOSupported = FileUtils.getBlockSize(new 
> File(getCommitLogLocation())) > 0;
> }
> catch (RuntimeException e)
> {
> logger.warn("Unable to determine block size for commit log 
> directory: {}", e.getMessage());
> }
> if (providedDiskAccessMode == DiskAccessMode.auto)
> {
> if (compressOrEncrypt)
> providedDiskAccessMode = DiskAccessMode.legacy;
> else
> {
> providedDiskAccessMode = directIOSupported && 
> conf.disk_optimization_strategy == Config.DiskOptimizationStrategy.ssd 
> ? DiskAccessMode.direct 
> : DiskAccessMode.legacy;
> }
> }
> if (providedDiskAccessMode == DiskAccessMode.legacy)
> {
> providedDiskAccessMode = compressOrEncrypt ? 
> DiskAccessMode.standard : DiskAccessMode.mmap;
> }
> return providedDiskAccessMode;
> }
> {code}
> We should throw an exception when the user sets DiskAccessMode to direct for 
> the commitlog but directIOSupported returns false after the check 
> "FileUtils.getBlockSize(new File(getCommitLogLocation())) > 0;", instead of 
> waiting for the system to start and accept reads and writes.
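> A hypothetical sketch of the kind of guard this suggests, placed inside 
> resolveCommitLogWriteDiskAccessMode (assuming Cassandra's 
> ConfigurationException; this is not the actual patch):
> {code:java}
> // Fail fast if direct IO was explicitly requested but is unavailable.
> if (providedDiskAccessMode == DiskAccessMode.direct && !directIOSupported)
>     throw new ConfigurationException("commitlog_disk_access_mode is set to direct, " +
>                                      "but direct IO is not supported by the commit log location");
> {code}
> Failing fast at startup surfaces the misconfiguration immediately instead of 
> after the node is already serving traffic.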



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19812) We should throw exception when commitlog 's DiskAccessMode is direct but direct io is not support

2024-08-15 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17873952#comment-17873952
 ] 

Brandon Williams commented on CASSANDRA-19812:
--

If it's going to fail to start I think we should put it in 5.0.0

> We should throw exception when commitlog 's DiskAccessMode is direct but 
> direct io is not support
> -
>
> Key: CASSANDRA-19812
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19812
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Commit Log
>Reporter: Maxwell Guo
>Assignee: Maxwell Guo
>Priority: Normal
> Fix For: 5.0.x, 5.x
>
>
> Looking into the code below:
> {code:java}
> private static DiskAccessMode 
> resolveCommitLogWriteDiskAccessMode(DiskAccessMode providedDiskAccessMode)
> {
> boolean compressOrEncrypt = getCommitLogCompression() != null || 
> (getEncryptionContext() != null && getEncryptionContext().isEnabled());
> boolean directIOSupported = false;
> try
> {
> directIOSupported = FileUtils.getBlockSize(new 
> File(getCommitLogLocation())) > 0;
> }
> catch (RuntimeException e)
> {
> logger.warn("Unable to determine block size for commit log 
> directory: {}", e.getMessage());
> }
> if (providedDiskAccessMode == DiskAccessMode.auto)
> {
> if (compressOrEncrypt)
> providedDiskAccessMode = DiskAccessMode.legacy;
> else
> {
> providedDiskAccessMode = directIOSupported && 
> conf.disk_optimization_strategy == Config.DiskOptimizationStrategy.ssd 
> ? DiskAccessMode.direct 
> : DiskAccessMode.legacy;
> }
> }
> if (providedDiskAccessMode == DiskAccessMode.legacy)
> {
> providedDiskAccessMode = compressOrEncrypt ? 
> DiskAccessMode.standard : DiskAccessMode.mmap;
> }
> return providedDiskAccessMode;
> }
> {code}
> We should throw an exception when the user sets DiskAccessMode to direct for 
> the commitlog but directIOSupported returns false after the check 
> "FileUtils.getBlockSize(new File(getCommitLogLocation())) > 0;", instead of 
> waiting for the system to start and accept reads and writes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-19812) We should throw exception when commitlog 's DiskAccessMode is direct but direct io is not support

2024-08-15 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17873940#comment-17873940
 ] 

Brandon Williams edited comment on CASSANDRA-19812 at 8/15/24 12:50 PM:


That would seem like a bug, and since this is a follow up to CASSANDRA-19779 I 
think it's fine to put in 5.0.0 if we want, but should at least go to 5.0.1.


was (Author: brandon.williams):
That would seem like a bug, and since this is a follow up to CASSANDRA-19779 I 
think it's fine to put in 5.0.0 if we want.

> We should throw exception when commitlog 's DiskAccessMode is direct but 
> direct io is not support
> -
>
> Key: CASSANDRA-19812
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19812
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Commit Log
>Reporter: Maxwell Guo
>Assignee: Maxwell Guo
>Priority: Normal
> Fix For: 5.0.x, 5.x
>
>
> Looking into the code below:
> {code:java}
> private static DiskAccessMode 
> resolveCommitLogWriteDiskAccessMode(DiskAccessMode providedDiskAccessMode)
> {
> boolean compressOrEncrypt = getCommitLogCompression() != null || 
> (getEncryptionContext() != null && getEncryptionContext().isEnabled());
> boolean directIOSupported = false;
> try
> {
> directIOSupported = FileUtils.getBlockSize(new 
> File(getCommitLogLocation())) > 0;
> }
> catch (RuntimeException e)
> {
> logger.warn("Unable to determine block size for commit log 
> directory: {}", e.getMessage());
> }
> if (providedDiskAccessMode == DiskAccessMode.auto)
> {
> if (compressOrEncrypt)
> providedDiskAccessMode = DiskAccessMode.legacy;
> else
> {
> providedDiskAccessMode = directIOSupported && 
> conf.disk_optimization_strategy == Config.DiskOptimizationStrategy.ssd 
> ? DiskAccessMode.direct 
> : DiskAccessMode.legacy;
> }
> }
> if (providedDiskAccessMode == DiskAccessMode.legacy)
> {
> providedDiskAccessMode = compressOrEncrypt ? 
> DiskAccessMode.standard : DiskAccessMode.mmap;
> }
> return providedDiskAccessMode;
> }
> {code}
> We should throw an exception when the user sets DiskAccessMode to direct for 
> the commitlog but directIOSupported returns false after the check 
> "FileUtils.getBlockSize(new File(getCommitLogLocation())) > 0;", instead of 
> waiting for the system to start and accept reads and writes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19812) We should throw exception when commitlog 's DiskAccessMode is direct but direct io is not support

2024-08-15 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19812:
-
Fix Version/s: 5.0.x

> We should throw exception when commitlog 's DiskAccessMode is direct but 
> direct io is not support
> -
>
> Key: CASSANDRA-19812
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19812
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Commit Log
>Reporter: Maxwell Guo
>Assignee: Maxwell Guo
>Priority: Normal
> Fix For: 5.0.x, 5.x
>
>
> Looking into the code below:
> {code:java}
> private static DiskAccessMode 
> resolveCommitLogWriteDiskAccessMode(DiskAccessMode providedDiskAccessMode)
> {
> boolean compressOrEncrypt = getCommitLogCompression() != null || 
> (getEncryptionContext() != null && getEncryptionContext().isEnabled());
> boolean directIOSupported = false;
> try
> {
> directIOSupported = FileUtils.getBlockSize(new 
> File(getCommitLogLocation())) > 0;
> }
> catch (RuntimeException e)
> {
> logger.warn("Unable to determine block size for commit log 
> directory: {}", e.getMessage());
> }
> if (providedDiskAccessMode == DiskAccessMode.auto)
> {
> if (compressOrEncrypt)
> providedDiskAccessMode = DiskAccessMode.legacy;
> else
> {
> providedDiskAccessMode = directIOSupported && 
> conf.disk_optimization_strategy == Config.DiskOptimizationStrategy.ssd 
> ? DiskAccessMode.direct 
> : DiskAccessMode.legacy;
> }
> }
> if (providedDiskAccessMode == DiskAccessMode.legacy)
> {
> providedDiskAccessMode = compressOrEncrypt ? 
> DiskAccessMode.standard : DiskAccessMode.mmap;
> }
> return providedDiskAccessMode;
> }
> {code}
> We should throw an exception when the user sets DiskAccessMode to direct for 
> the commitlog but directIOSupported returns false after the check 
> "FileUtils.getBlockSize(new File(getCommitLogLocation())) > 0;", instead of 
> waiting for the system to start and accept reads and writes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19812) We should throw exception when commitlog 's DiskAccessMode is direct but direct io is not support

2024-08-15 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17873940#comment-17873940
 ] 

Brandon Williams commented on CASSANDRA-19812:
--

That would seem like a bug, and since this is a follow up to CASSANDRA-19779 I 
think it's fine to put in 5.0.0 if we want.

> We should throw exception when commitlog 's DiskAccessMode is direct but 
> direct io is not support
> -
>
> Key: CASSANDRA-19812
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19812
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Commit Log
>Reporter: Maxwell Guo
>Assignee: Maxwell Guo
>Priority: Normal
> Fix For: 5.x
>
>
> Looking into the code below:
> {code:java}
> private static DiskAccessMode 
> resolveCommitLogWriteDiskAccessMode(DiskAccessMode providedDiskAccessMode)
> {
> boolean compressOrEncrypt = getCommitLogCompression() != null || 
> (getEncryptionContext() != null && getEncryptionContext().isEnabled());
> boolean directIOSupported = false;
> try
> {
> directIOSupported = FileUtils.getBlockSize(new 
> File(getCommitLogLocation())) > 0;
> }
> catch (RuntimeException e)
> {
> logger.warn("Unable to determine block size for commit log 
> directory: {}", e.getMessage());
> }
> if (providedDiskAccessMode == DiskAccessMode.auto)
> {
> if (compressOrEncrypt)
> providedDiskAccessMode = DiskAccessMode.legacy;
> else
> {
> providedDiskAccessMode = directIOSupported && 
> conf.disk_optimization_strategy == Config.DiskOptimizationStrategy.ssd 
> ? DiskAccessMode.direct 
> : DiskAccessMode.legacy;
> }
> }
> if (providedDiskAccessMode == DiskAccessMode.legacy)
> {
> providedDiskAccessMode = compressOrEncrypt ? 
> DiskAccessMode.standard : DiskAccessMode.mmap;
> }
> return providedDiskAccessMode;
> }
> {code}
> We should throw an exception when the user sets DiskAccessMode to direct for 
> the commitlog but directIOSupported returns false after the check 
> "FileUtils.getBlockSize(new File(getCommitLogLocation())) > 0;", instead of 
> waiting for the system to start and accept reads and writes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19835) Memtable allocation type unslabbed_heap_buffers_logged will cause an assertion error for TrieMemtables and SegmentedTrieMemtables

2024-08-14 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17873670#comment-17873670
 ] 

Brandon Williams commented on CASSANDRA-19835:
--

+1

> Memtable allocation type unslabbed_heap_buffers_logged will cause an 
> assertion error for TrieMemtables and SegmentedTrieMemtables
> -
>
> Key: CASSANDRA-19835
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19835
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Memtable
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
> Fix For: 5.0-rc2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Config used
> {code}
>   ---
>   partitioner: Murmur3Partitioner
>   commitlog_sync: periodic
>   commitlog_sync_period: 9000ms
>   commitlog_disk_access_mode: legacy
>   memtable_allocation_type: unslabbed_heap_buffers_logged
>   sstable:
> selected_format: big
>   disk_access_mode: standard
> {code}
> Error
> {code}
> Caused by: java.lang.AssertionError: null
>   at 
> org.apache.cassandra.config.Config$MemtableAllocationType.toBufferType(Config.java:1206)
>   at 
> org.apache.cassandra.index.sai.disk.v1.segment.SegmentTrieBuffer.<init>(SegmentTrieBuffer.java:48)
>   at 
> org.apache.cassandra.index.sai.disk.v1.segment.SegmentBuilder$TrieSegmentBuilder.<init>(SegmentBuilder.java:83)
>   at 
> org.apache.cassandra.index.sai.disk.v1.SSTableIndexWriter.newSegmentBuilder(SSTableIndexWriter.java:311)
>   at 
> org.apache.cassandra.index.sai.disk.v1.SSTableIndexWriter.addTerm(SSTableIndexWriter.java:195)
>   at 
> org.apache.cassandra.index.sai.disk.v1.SSTableIndexWriter.addRow(SSTableIndexWriter.java:99)
>   at 
> org.apache.cassandra.index.sai.disk.StorageAttachedIndexWriter.addRow(StorageAttachedIndexWriter.java:257)
>   at 
> org.apache.cassandra.index.sai.disk.StorageAttachedIndexWriter.nextUnfilteredCluster(StorageAttachedIndexWriter.java:131)
> {code}
> This was found by CASSANDRA-19833
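> For illustration, a minimal hypothetical sketch of this failure mode (not 
> the real Config$MemtableAllocationType): a mapping that omits one constant 
> throws exactly this kind of AssertionError when SAI asks for the buffer 
> type of the unhandled value:
> {code:java}
> // Illustrative only; not the real Config$MemtableAllocationType.
> enum AllocationTypeSketch
> {
>     unslabbed_heap_buffers, unslabbed_heap_buffers_logged, heap_buffers, offheap_buffers;
> 
>     String toBufferType()
>     {
>         switch (this)
>         {
>             case unslabbed_heap_buffers:
>             case heap_buffers:
>                 return "ON_HEAP";
>             case offheap_buffers:
>                 return "OFF_HEAP";
>             default:
>                 // The "logged" variant has no mapping and lands here,
>                 // producing the AssertionError seen in the stack trace.
>                 throw new AssertionError();
>         }
>     }
> }
> {code}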



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-18766) high speculative retries on v4.1.3

2024-08-14 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-18766:
-
Test and Documentation Plan: add test
 Status: Patch Available  (was: Open)

> high speculative retries on v4.1.3
> --
>
> Key: CASSANDRA-18766
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18766
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination
>Reporter: Ivans Novikovs
>Assignee: Brandon Williams
>Priority: Urgent
> Fix For: 4.1.x
>
> Attachments: screenshot-1.png, screenshot-2.png, signature.asc, 
> signature.asc, v4.0.png, v4.1.png
>
>
> There are up to 10+ times more speculative retries for reads on 4.1.3 
> compared to 4.0.7 and 4.1.2 when using QUORUM and the default setting of 99p.
> On 4.1.3 after upgrade I see speculative retries for up to 35% of all reads 
> for a specific table. Read latency is stable around 500 microseconds.
> java 1.8.0_382 is used



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18766) high speculative retries on v4.1.3

2024-08-14 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17873600#comment-17873600
 ] 

Brandon Williams commented on CASSANDRA-18766:
--

This bug was introduced in CASSANDRA-15241 which explains why it started in 
4.1.3.  However, CASSANDRA-19534 happened to fix it, so no code changes are 
needed, just a release.  /cc [~maedhroz] since you were involved in both of 
those!

I've pared down the 
[dtest|https://github.com/driftx/cassandra-dtest/tree/CASSANDRA-18766] to not 
run so long but still reproduce (under 2m), and confirmed it still bisects 
correctly, so I think it's suitable to commit now.  

I'll work on getting 4.1.6 released for this.

> high speculative retries on v4.1.3
> --
>
> Key: CASSANDRA-18766
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18766
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination
>Reporter: Ivans Novikovs
>Assignee: Brandon Williams
>Priority: Urgent
> Fix For: 4.1.x
>
> Attachments: screenshot-1.png, screenshot-2.png, signature.asc, 
> signature.asc, v4.0.png, v4.1.png
>
>
> There are up to 10+ times more speculative retries for reads on 4.1.3 
> compared to 4.0.7 and 4.1.2 when using QUORUM and the default setting of 99p.
> On 4.1.3 after upgrade I see speculative retries for up to 35% of all reads 
> for a specific table. Read latency is stable around 500 microseconds.
> java 1.8.0_382 is used



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19448) CommitlogArchiver only has granularity to seconds for restore_point_in_time

2024-08-14 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17873572#comment-17873572
 ] 

Brandon Williams commented on CASSANDRA-19448:
--

Unfortunately it looks like there are still a handful of 
[failures|https://app.circleci.com/pipelines/github/driftx/cassandra/1704/workflows/bd8b0614-0b2a-4231-aca7-22688e0a06b8/jobs/97629/tests].

> CommitlogArchiver only has granularity to seconds for restore_point_in_time
> ---
>
> Key: CASSANDRA-19448
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19448
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jeremy Hanna
>Assignee: Maxwell Guo
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Commitlog archiver allows users to back up commitlog files for the purpose of 
> doing point in time restores.  The [configuration 
> file|https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties]
>  gives an example down to the seconds granularity but then asks whether 
> the timestamps are microseconds or milliseconds - defaulting to 
> microseconds.  Because the [CommitLogArchiver uses a second-based date 
> format|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java#L52],
>  if a user specifies a restore point at a lower granularity like 
> milliseconds or microseconds, it will truncate everything 
> after the second and restore to that second.  So say you specify a 
> restore_point_in_time like this:
> restore_point_in_time=2024:01:18 17:01:01.623392
> it will silently truncate everything after the 01 seconds.  So effectively, to 
> the user, updates between 01 and 01.623392 are missing.
> This appears to be a bug in the intent.  We should allow users to specify 
> down to the millisecond or even microsecond level. If we allow them to 
> specify down to microseconds for the restore point in time, then it may 
> internally need to change from a long.
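> A small sketch of the truncation (assuming the second-based pattern the 
> class uses; this is not the actual CommitLogArchiver code):
> {code:java}
> import java.text.SimpleDateFormat;
> 
> public class RestorePointTruncation
> {
>     public static void main(String[] args) throws Exception
>     {
>         SimpleDateFormat format = new SimpleDateFormat("yyyy:MM:dd HH:mm:ss");
>         // parse(String) stops once the pattern is satisfied, so the
>         // fractional part ".623392" is silently ignored.
>         long withMicros = format.parse("2024:01:18 17:01:01.623392").getTime();
>         long withoutMicros = format.parse("2024:01:18 17:01:01").getTime();
>         System.out.println(withMicros == withoutMicros); // prints true
>     }
> }
> {code}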



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-18766) high speculative retries on v4.1.3

2024-08-13 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17873298#comment-17873298
 ] 

Brandon Williams edited comment on CASSANDRA-18766 at 8/13/24 9:47 PM:
---

I was hoping this would provide a smoking gun that I could use to bisect and 
finally find the culprit, so I converted this to a dtest 
[here|https://github.com/driftx/cassandra-dtest/tree/CASSANDRA-18766].  
Unfortunately however, this doesn't reproduce for me, on any of 4.0, 4.1, or 
5.0 when run on isolated hardware.  I also checked the raw SR numbers from each 
and they are comparable across the branches.

-[~aswinkarthik] was this run on isolated hardware and does this consistently 
reproduce for you?-

edit: I modified the test to check SR on all the nodes, and now I'm seeing 
failures.


was (Author: brandon.williams):
I was hoping this would provide a smoking gun that I could use to bisect and 
finally find the culprit, so I converted this to a dtest 
[here|https://github.com/driftx/cassandra-dtest/commit/68cb71920c9242aeb0e0cc3eb7e998175d9b5ccb].
  Unfortunately however, this doesn't reproduce for me, on any of 4.0, 4.1, or 
5.0 when run on isolated hardware.  I also checked the raw SR numbers from each 
and they are comparable across the branches.

-[~aswinkarthik] was this run on isolated hardware and does this consistently 
reproduce for you?-

edit: I modified the test to check SR on all the nodes, and now I'm seeing 
failures.

> high speculative retries on v4.1.3
> --
>
> Key: CASSANDRA-18766
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18766
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination
>Reporter: Ivans Novikovs
>Assignee: Brandon Williams
>Priority: Urgent
> Fix For: 4.1.x
>
> Attachments: screenshot-1.png, screenshot-2.png, signature.asc, 
> signature.asc, v4.0.png, v4.1.png
>
>
> There are up to 10+ times more speculative retries for reads on 4.1.3 
> compared to 4.0.7 and 4.1.2 when using QUORUM and the default setting of 99p.
> On 4.1.3 after upgrade I see speculative retries for up to 35% of all reads 
> for a specific table. Read latency is stable around 500 microseconds.
> java 1.8.0_382 is used



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-18766) high speculative retries on v4.1.3

2024-08-13 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17873298#comment-17873298
 ] 

Brandon Williams edited comment on CASSANDRA-18766 at 8/13/24 9:45 PM:
---

I was hoping this would provide a smoking gun that I could use to bisect and 
finally find the culprit, so I converted this to a dtest 
[here|https://github.com/driftx/cassandra-dtest/commit/68cb71920c9242aeb0e0cc3eb7e998175d9b5ccb].
  Unfortunately however, this doesn't reproduce for me, on any of 4.0, 4.1, or 
5.0 when run on isolated hardware.  I also checked the raw SR numbers from each 
and they are comparable across the branches.

-[~aswinkarthik] was this run on isolated hardware and does this consistently 
reproduce for you?-

edit: I modified the test to check SR on all the nodes, and now I'm seeing 
failures.


was (Author: brandon.williams):
I was hoping this would provide a smoking gun that I could use to bisect and 
finally find the culprit, so I converted this to a dtest 
[here|https://github.com/driftx/cassandra-dtest/commit/68cb71920c9242aeb0e0cc3eb7e998175d9b5ccb].
  Unfortunately however, this doesn't reproduce for me, on any of 4.0, 4.1, or 
5.0 when run on isolated hardware.  I also checked the raw SR numbers from each 
and they are comparable across the branches.

[~aswinkarthik] was this run on isolated hardware and does this consistently 
reproduce for you?

> high speculative retries on v4.1.3
> --
>
> Key: CASSANDRA-18766
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18766
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination
>Reporter: Ivans Novikovs
>Assignee: Brandon Williams
>Priority: Urgent
> Fix For: 4.1.x
>
> Attachments: screenshot-1.png, screenshot-2.png, signature.asc, 
> signature.asc, v4.0.png, v4.1.png
>
>
> There are up to 10+ times more speculative retries for reads on 4.1.3 
> compared to 4.0.7 and 4.1.2 when using QUORUM and the default setting of 99p.
> On 4.1.3 after upgrade I see speculative retries for up to 35% of all reads 
> for a specific table. Read latency is stable around 500 microseconds.
> java 1.8.0_382 is used



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18766) high speculative retries on v4.1.3

2024-08-13 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17873298#comment-17873298
 ] 

Brandon Williams commented on CASSANDRA-18766:
--

I was hoping this would provide a smoking gun that I could use to bisect and 
finally find the culprit, so I converted this to a dtest 
[here|https://github.com/driftx/cassandra-dtest/commit/68cb71920c9242aeb0e0cc3eb7e998175d9b5ccb].
  Unfortunately however, this doesn't reproduce for me, on any of 4.0, 4.1, or 
5.0 when run on isolated hardware.  I also checked the raw SR numbers from each 
and they are comparable across the branches.

[~aswinkarthik] was this run on isolated hardware and does this consistently 
reproduce for you?

> high speculative retries on v4.1.3
> --
>
> Key: CASSANDRA-18766
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18766
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination
>Reporter: Ivans Novikovs
>Assignee: Brandon Williams
>Priority: Urgent
> Fix For: 4.1.x
>
> Attachments: screenshot-1.png, screenshot-2.png, signature.asc, 
> signature.asc, v4.0.png, v4.1.png
>
>
> There are up to 10+ times more speculative retries for reads on 4.1.3 
> compared to 4.0.7 and 4.1.2 when using QUORUM and the default setting of 99p.
> On 4.1.3 after upgrade I see speculative retries for up to 35% of all reads 
> for a specific table. Read latency is stable around 500 microseconds.
> java 1.8.0_382 is used



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19824) Cassandra 4.1.5 - Curl timeout on receiving metadata for RedHat repo

2024-08-12 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17872933#comment-17872933
 ] 

Brandon Williams commented on CASSANDRA-19824:
--

I opened INFRA-26034 for this.

> Cassandra 4.1.5 - Curl timeout on receiving metadata for RedHat repo
> 
>
> Key: CASSANDRA-19824
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19824
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Aditya Krishnakumar
>Priority: Normal
>
> Hi team,
> On trying to install and download Cassandra 4.1.5 for RedHat we are getting 
> the following timeout message:
> {code:java}
> 120.5   - Curl error (28): Timeout was reached for 
> https://redhat.cassandra.apache.org/41x/repodata/repomd.xml [Operation timed 
> out after 3 milliseconds with 0 out of 0 bytes received]
> 120.5 Error: Failed to download metadata for repo 'cassandra': Cannot 
> download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were 
> tried {code}
> Downloading the metadata for the repo is currently failing, and we are 
> unable to install Cassandra 4.1.x on our RedHat systems. Please help 
> us address this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-19810) Add snapshot repo aliasing to build.properties.default and build-resolver.xml

2024-08-07 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871700#comment-17871700
 ] 

Brandon Williams edited comment on CASSANDRA-19810 at 8/7/24 6:02 PM:
--

That looks good to me, but

bq. Ideally I'd like to see this enforced by 
cassandra-release/prepare_release.sh in one of its pre-condition checks.

maybe we should do this too? 
[Here|https://github.com/driftx/cassandra-builds/commit/25c3e8bbd8fb0fd6bfe9c68cb3d5eccad6c7d2e3]
 is a simple patch to do that.

branch-wise I think 4.0 and up is fine.


was (Author: brandon.williams):
That looks good to me, but

bq. Ideally I'd like to see this enforced by 
cassandra-release/prepare_release.sh in one of its pre-condition checks.

maybe we should do this first?

branch-wise I think 4.0 and up is fine.

> Add snapshot repo aliasing to build.properties.default and build-resolver.xml
> -
>
> Key: CASSANDRA-19810
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19810
> Project: Cassandra
>  Issue Type: Task
>  Components: Build
>Reporter: Josh McKenzie
>Assignee: Josh McKenzie
>Priority: Normal
> Attachments: fix_snapshot_repo.diff
>
>
> Currently {{resolver-apache-snapshot}} is hard-coded in 
> {{build-resolver.xml}} and overriding it isn't supported in 
> {{build.properties.default}}. Further, the comment on that repository in 
> {{build-resolver.xml}} is... rather confusing, since it states we don't 
> support it and to uncomment it if you need it, but then it's already 
> uncommented.
> We should just support aliasing the snapshot repo the way we do the others.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-19810) Add snapshot repo aliasing to build.properties.default and build-resolver.xml

2024-08-07 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871700#comment-17871700
 ] 

Brandon Williams edited comment on CASSANDRA-19810 at 8/7/24 3:15 PM:
--

That looks good to me, but

bq. Ideally I'd like to see this enforced by 
cassandra-release/prepare_release.sh in one of its pre-condition checks.

maybe we should do this first?

branch-wise I think 4.0 and up is fine.


was (Author: brandon.williams):
That looks good to me, but

bq. Ideally I'd like to see this enforced by 
cassandra-release/prepare_release.sh in one of its pre-condition checks.

maybe we should do this first?

> Add snapshot repo aliasing to build.properties.default and build-resolver.xml
> -
>
> Key: CASSANDRA-19810
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19810
> Project: Cassandra
>  Issue Type: Task
>  Components: Build
>Reporter: Josh McKenzie
>Assignee: Josh McKenzie
>Priority: Normal
> Attachments: fix_snapshot_repo.diff
>
>
> Currently {{resolver-apache-snapshot}} is hard-coded in 
> {{build-resolver.xml}} and overriding it isn't supported in 
> {{build.properties.default}}. Further, the comment on that repository in 
> {{build-resolver.xml}} is... rather confusing, since it states we don't 
> support it and to uncomment it if you need it, but then it's already 
> uncommented.
> We should just support aliasing the snapshot repo the way we do the others.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19810) Add snapshot repo aliasing to build.properties.default and build-resolver.xml

2024-08-07 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871700#comment-17871700
 ] 

Brandon Williams commented on CASSANDRA-19810:
--

That looks good to me, but

bq. Ideally I'd like to see this enforced by 
cassandra-release/prepare_release.sh in one of its pre-condition checks.

maybe we should do this first?

> Add snapshot repo aliasing to build.properties.default and build-resolver.xml
> -
>
> Key: CASSANDRA-19810
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19810
> Project: Cassandra
>  Issue Type: Task
>  Components: Build
>Reporter: Josh McKenzie
>Assignee: Josh McKenzie
>Priority: Normal
> Attachments: fix_snapshot_repo.diff
>
>
> Currently {{resolver-apache-snapshot}} is hard-coded in 
> {{build-resolver.xml}} and overriding it isn't supported in 
> {{build.properties.default}}. Further, the comment on that repository in 
> {{build-resolver.xml}} is... rather confusing, since it states we don't 
> support it and to uncomment it if you need it, but then it's already 
> uncommented.
> We should just support aliasing the snapshot repo the way we do the others.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19813) timeout on upgrade_tests/upgrade_through_versions_test.py::TestProtoV3Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD::test_parallel_upgrade

2024-08-05 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19813:
-
Status: Ready to Commit  (was: Review In Progress)

+1

> timeout on  
> upgrade_tests/upgrade_through_versions_test.py::TestProtoV3Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD::test_parallel_upgrade
> -
>
> Key: CASSANDRA-19813
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19813
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Michael Semb Wever
>Assignee: Michael Semb Wever
>Priority: Normal
> Fix For: 5.0-rc, 5.x
>
> Attachments: Cassandra-devbranch-5_40_ci_summary.html, 
> Cassandra-devbranch-5_40_results_details.tar.xz, 
> test_rolling_upgrade.122.log, test_rolling_upgrade.123-1.log, 
> test_rolling_upgrade.123-2.log, test_rolling_upgrade.134.log, 
> test_rolling_upgrade.257.good.log, test_rolling_upgrade.261-1.log, 
> test_rolling_upgrade.261-2.log
>
>
>  The 
> upgrade_tests/upgrade_through_versions_test.py::TestProtoV3Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD::test_parallel_upgrade
>   test is taking longer than usual.
> CI in 5.0 is now always timing out on the dtest-upgrade-large 59/64 split.
> Our 5.0-rc testing results are tainted because these timeouts always abort 
> the 5.0 pipeline runs.
> It looks like that split is taking 16x as long now… 
> This appears to be caused by one or both of
> - 
> https://github.com/apache/cassandra/commit/08e1fecf36507397cf3122d77f84aa23150da588
> - 
> https://github.com/apache/cassandra-dtest/commit/2b17c1293056068bb3e94c332d6fb99df6a0b0fa
> Example of good run.
>  [^test_rolling_upgrade.257.good.log] (ci-cassandra.a.o)
> Examples of bad runs.
>  [^test_rolling_upgrade.122.log],  [^test_rolling_upgrade.123-2.log],  
> [^test_rolling_upgrade.123-1.log],  [^test_rolling_upgrade.134.log]  
> (internal ci)
>  [^test_rolling_upgrade.261-1.log] ,  [^test_rolling_upgrade.261-2.log]  
> (ci-cassandra.a.o)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19813) timeout on upgrade_tests/upgrade_through_versions_test.py::TestProtoV3Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD::test_parallel_upgrade

2024-08-05 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19813:
-
Reviewers: Brandon Williams, Brandon Williams
   Brandon Williams, Brandon Williams  (was: Brandon Williams)
   Status: Review In Progress  (was: Patch Available)

> timeout on  
> upgrade_tests/upgrade_through_versions_test.py::TestProtoV3Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD::test_parallel_upgrade
> -
>
> Key: CASSANDRA-19813
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19813
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Michael Semb Wever
>Assignee: Michael Semb Wever
>Priority: Normal
> Fix For: 5.0-rc, 5.x
>
> Attachments: Cassandra-devbranch-5_40_ci_summary.html, 
> Cassandra-devbranch-5_40_results_details.tar.xz, 
> test_rolling_upgrade.122.log, test_rolling_upgrade.123-1.log, 
> test_rolling_upgrade.123-2.log, test_rolling_upgrade.134.log, 
> test_rolling_upgrade.257.good.log, test_rolling_upgrade.261-1.log, 
> test_rolling_upgrade.261-2.log
>
>
>  The 
> upgrade_tests/upgrade_through_versions_test.py::TestProtoV3Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD::test_parallel_upgrade
>   test is taking longer than usual.
> CI in 5.0 is now always timing out on the dtest-upgrade-large 59/64 split.
> Our 5.0-rc testing results are tainted because these timeouts always abort 
> the 5.0 pipeline runs.
> It looks like that split is taking 16x as long now… 
> This appears to be caused by one or both of
> - 
> https://github.com/apache/cassandra/commit/08e1fecf36507397cf3122d77f84aa23150da588
> - 
> https://github.com/apache/cassandra-dtest/commit/2b17c1293056068bb3e94c332d6fb99df6a0b0fa
> Example of good run.
>  [^test_rolling_upgrade.257.good.log] (ci-cassandra.a.o)
> Examples of bad runs.
>  [^test_rolling_upgrade.122.log],  [^test_rolling_upgrade.123-2.log],  
> [^test_rolling_upgrade.123-1.log],  [^test_rolling_upgrade.134.log]  
> (internal ci)
>  [^test_rolling_upgrade.261-1.log] ,  [^test_rolling_upgrade.261-2.log]  
> (ci-cassandra.a.o)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19448) CommitlogArchiver only has granularity to seconds for restore_point_in_time

2024-08-02 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870469#comment-17870469
 ] 

Brandon Williams commented on CASSANDRA-19448:
--

No problem, next time I'll just multiplex that test to help speed things along.

> CommitlogArchiver only has granularity to seconds for restore_point_in_time
> ---
>
> Key: CASSANDRA-19448
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19448
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jeremy Hanna
>Assignee: Maxwell Guo
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Commitlog archiver allows users to back up commitlog files for the purpose of 
> doing point in time restores.  The [configuration 
> file|https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties]
>  gives an example down to the seconds granularity but then asks whether 
> the timestamps are microseconds or milliseconds - defaulting to 
> microseconds.  Because the [CommitLogArchiver uses a second-based date 
> format|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java#L52],
>  if a user specifies a restore point at a lower granularity like 
> milliseconds or microseconds, it will truncate everything 
> after the second and restore to that second.  So say you specify a 
> restore_point_in_time like this:
> restore_point_in_time=2024:01:18 17:01:01.623392
> it will silently truncate everything after the 01 seconds.  So effectively, to 
> the user, updates between 01 and 01.623392 are missing.
> This appears to be a bug in the intent.  We should allow users to specify 
> down to the millisecond or even microsecond level. If we allow them to 
> specify down to microseconds for the restore point in time, then it may 
> internally need to change from a long.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19448) CommitlogArchiver only has granularity to seconds for restore_point_in_time

2024-08-01 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19448:
-
Status: Open  (was: Patch Available)

> CommitlogArchiver only has granularity to seconds for restore_point_in_time
> ---
>
> Key: CASSANDRA-19448
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19448
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jeremy Hanna
>Assignee: Maxwell Guo
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Commitlog archiver allows users to back up commitlog files for the purpose of 
> doing point in time restores.  The [configuration 
> file|https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties]
>  gives an example down to the seconds granularity but then asks whether 
> the timestamps are microseconds or milliseconds - defaulting to 
> microseconds.  Because the [CommitLogArchiver uses a second-based date 
> format|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java#L52],
>  if a user specifies a restore point at a lower granularity like 
> milliseconds or microseconds, it will truncate everything 
> after the second and restore to that second.  So say you specify a 
> restore_point_in_time like this:
> restore_point_in_time=2024:01:18 17:01:01.623392
> it will silently truncate everything after the 01 seconds.  So effectively, to 
> the user, updates between 01 and 01.623392 are missing.
> This appears to be a bug in the intent.  We should allow users to specify 
> down to the millisecond or even microsecond level. If we allow them to 
> specify down to microseconds for the restore point in time, then it may 
> internally need to change from a long.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19448) CommitlogArchiver only has granularity to seconds for restore_point_in_time

2024-08-01 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870228#comment-17870228
 ] 

Brandon Williams commented on CASSANDRA-19448:
--

Looks like the Map.of usage in j8 is back, and the CommitLogArchiverTest in 
trunk is still 
[flakey|https://app.circleci.com/pipelines/github/driftx/cassandra/1700/workflows/83aba4d6-df61-434e-9a87-c23a86c5e834/jobs/97143/tests].

> CommitlogArchiver only has granularity to seconds for restore_point_in_time
> ---
>
> Key: CASSANDRA-19448
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19448
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jeremy Hanna
>Assignee: Maxwell Guo
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Commitlog archiver allows users to back up commitlog files for the purpose of 
> doing point in time restores.  The [configuration 
> file|https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties]
>  gives an example down to the seconds granularity but then asks whether 
> the timestamps are microseconds or milliseconds - defaulting to 
> microseconds.  Because the [CommitLogArchiver uses a second-based date 
> format|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java#L52],
>  if a user specifies a restore point at a lower granularity like 
> milliseconds or microseconds, it will truncate everything 
> after the second and restore to that second.  So say you specify a 
> restore_point_in_time like this:
> restore_point_in_time=2024:01:18 17:01:01.623392
> it will silently truncate everything after the 01 seconds.  So effectively, to 
> the user, updates between 01 and 01.623392 are missing.
> This appears to be a bug in the intent.  We should allow users to specify 
> down to the millisecond or even microsecond level. If we allow them to 
> specify down to microseconds for the restore point in time, then it may 
> internally need to change from a long.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-19448) CommitlogArchiver only has granularity to seconds for restore_point_in_time

2024-08-01 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870195#comment-17870195
 ] 

Brandon Williams edited comment on CASSANDRA-19448 at 8/1/24 1:36 PM:
--

||Branch||CI||
|[4.0|https://github.com/driftx/cassandra/tree/CASSANDRA-19448-4.0]|[j8|https://app.circleci.com/pipelines/github/driftx/cassandra/1697/workflows/50c00fe4-f2d8-452e-bc7d-16900d973987],
 
[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1697/workflows/7aec5100-4489-4a13-8150-45007a87dd43]|
|[4.1|https://github.com/driftx/cassandra/tree/CASSANDRA-19448-4.1]|[j8|https://app.circleci.com/pipelines/github/driftx/cassandra/1699/workflows/9477f095-357e-497b-ab9c-d826c55266a0],
 
[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1699/workflows/631b7063-06c3-4b1a-a115-9e77da553119]|
|[5.0|https://github.com/driftx/cassandra/tree/CASSANDRA-19448-5.0]|[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1698/workflows/c0213025-523a-4804-91dd-6bd676c36481],
 
[j17|https://app.circleci.com/pipelines/github/driftx/cassandra/1698/workflows/92071150-52df-419a-960f-ad82bd41f482]|
|[trunk|https://github.com/driftx/cassandra/tree/CASSANDRA-19448-trunk]|[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1700/workflows/a65570df-fe2d-4422-978f-51ebee68a2b6],
 
[j17|https://app.circleci.com/pipelines/github/driftx/cassandra/1700/workflows/83aba4d6-df61-434e-9a87-c23a86c5e834]|

Circle is currently down for maintenance; I'll start these when it's back.


was (Author: brandon.williams):
||Branch||CI||
|[4.0|https://github.com/driftx/cassandra/tree/CASSANDRA-19448-4.0]|[j8|https://app.circleci.com/pipelines/github/driftx/cassandra/1697/workflows/50c00fe4-f2d8-452e-bc7d-16900d973987],
 
[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1697/workflows/7aec5100-4489-4a13-8150-45007a87dd43]|
|[4.1|https://github.com/driftx/cassandra/tree/CASSANDRA-19448-4.1]|[j8|https://app.circleci.com/pipelines/github/driftx/cassandra/1699/workflows/9477f095-357e-497b-ab9c-d826c55266a0],
 
[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1699/workflows/631b7063-06c3-4b1a-a115-9e77da553119]|
|[5.0|https://github.com/driftx/cassandra/tree/CASSANDRA-19448-5.0]|[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1698/workflows/c0213025-523a-4804-91dd-6bd676c36481],
 
[j17|https://app.circleci.com/pipelines/github/driftx/cassandra/1698/workflows/92071150-52df-419a-960f-ad82bd41f482]|
|[trunk|https://github.com/driftx/cassandra/tree/CASSANDRA-19448-trunk]|[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1700/workflows/a65570df-fe2d-4422-978f-51ebee68a2b6],
 
[j17|https://app.circleci.com/pipelines/github/driftx/cassandra/1700/workflows/83aba4d6-df61-434e-9a87-c23a86c5e834]|

> CommitlogArchiver only has granularity to seconds for restore_point_in_time
> ---
>
> Key: CASSANDRA-19448
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19448
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jeremy Hanna
>Assignee: Maxwell Guo
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Commitlog archiver allows users to backup commitlog files for the purpose of 
> doing point in time restores.  The [configuration 
> file|https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties]
>  gives an example down to the seconds granularity, but then asks 
> whether the timestamps are microseconds or milliseconds - defaulting to 
> microseconds.  Because the [CommitLogArchiver uses a second-based date 
> format|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java#L52],
>  if a user specifies a restore point at a finer granularity, such as 
> milliseconds or microseconds, it will truncate everything after the second 
> and restore to that second.  So say you specify a 
> restore_point_in_time like this:
> restore_point_in_time=2024:01:18 17:01:01.623392
> it will silently truncate everything after the 01 seconds, so from the 
> user's perspective, updates between 01 and 01.623392 are missing.
> This appears to be a bug in the intent.  We should allow users to specify 
> down to the millisecond or even microsecond level. If we allow them to 
> specify down to microseconds for the restore point in time, then the 
> internal representation may need to change from a long.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Commented] (CASSANDRA-19448) CommitlogArchiver only has granularity to seconds for restore_point_in_time

2024-08-01 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870195#comment-17870195
 ] 

Brandon Williams commented on CASSANDRA-19448:
--

||Branch||CI||
|[4.0|https://github.com/driftx/cassandra/tree/CASSANDRA-19448-4.0]|[j8|https://app.circleci.com/pipelines/github/driftx/cassandra/1697/workflows/50c00fe4-f2d8-452e-bc7d-16900d973987],
 
[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1697/workflows/7aec5100-4489-4a13-8150-45007a87dd43]|
|[4.1|https://github.com/driftx/cassandra/tree/CASSANDRA-19448-4.1]|[j8|https://app.circleci.com/pipelines/github/driftx/cassandra/1699/workflows/9477f095-357e-497b-ab9c-d826c55266a0],
 
[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1699/workflows/631b7063-06c3-4b1a-a115-9e77da553119]|
|[5.0|https://github.com/driftx/cassandra/tree/CASSANDRA-19448-5.0]|[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1698/workflows/c0213025-523a-4804-91dd-6bd676c36481],
 
[j17|https://app.circleci.com/pipelines/github/driftx/cassandra/1698/workflows/92071150-52df-419a-960f-ad82bd41f482]|
|[trunk|https://github.com/driftx/cassandra/tree/CASSANDRA-19448-trunk]|[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1700/workflows/a65570df-fe2d-4422-978f-51ebee68a2b6],
 
[j17|https://app.circleci.com/pipelines/github/driftx/cassandra/1700/workflows/83aba4d6-df61-434e-9a87-c23a86c5e834]|

> CommitlogArchiver only has granularity to seconds for restore_point_in_time
> ---
>
> Key: CASSANDRA-19448
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19448
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jeremy Hanna
>Assignee: Maxwell Guo
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Commitlog archiver allows users to backup commitlog files for the purpose of 
> doing point in time restores.  The [configuration 
> file|https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties]
>  gives an example down to the seconds granularity, but then asks 
> whether the timestamps are microseconds or milliseconds - defaulting to 
> microseconds.  Because the [CommitLogArchiver uses a second-based date 
> format|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java#L52],
>  if a user specifies a restore point at a finer granularity, such as 
> milliseconds or microseconds, it will truncate everything after the second 
> and restore to that second.  So say you specify a 
> restore_point_in_time like this:
> restore_point_in_time=2024:01:18 17:01:01.623392
> it will silently truncate everything after the 01 seconds, so from the 
> user's perspective, updates between 01 and 01.623392 are missing.
> This appears to be a bug in the intent.  We should allow users to specify 
> down to the millisecond or even microsecond level. If we allow them to 
> specify down to microseconds for the restore point in time, then the 
> internal representation may need to change from a long.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19810) Add snapshot repo aliasing to build.properties.default and build-resolver.xml

2024-07-31 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870010#comment-17870010
 ] 

Brandon Williams commented on CASSANDRA-19810:
--

I don't think ccm/dtest knows about any of this.  I'm less sure about the 
entire CI stack, but I don't see why it would need to care about the string 
itself either.

> Add snapshot repo aliasing to build.properties.default and build-resolver.xml
> -
>
> Key: CASSANDRA-19810
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19810
> Project: Cassandra
>  Issue Type: Task
>  Components: Build
>Reporter: Josh McKenzie
>Assignee: Josh McKenzie
>Priority: Normal
> Attachments: fix_snapshot_repo.diff
>
>
> Currently {{resolver-apache-snapshot}} is hard-coded in 
> {{build-resolver.xml}} and overriding it isn't supported in 
> {{build.properties.default}}. Further, the comment on that repository in 
> {{build-resolver.xml}} is... rather confusing, since it states we don't 
> support it and to uncomment it out if you need it, but then it's already 
> uncommented out.
> We should just support aliasing the snapshot repo the way we do the others.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19808) Gossiper (micro) optimizations

2024-07-31 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870007#comment-17870007
 ] 

Brandon Williams commented on CASSANDRA-19808:
--

# Unfortunately, no.  You kind of have to build that knowledge from developing 
or operating C* (or both).
 # I think all CEPs have to be created with thousands of nodes in mind out of 
necessity, since there are installations at that scale today.

> Gossiper (micro) optimizations
> -
>
> Key: CASSANDRA-19808
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19808
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Cesar Andres Stuardo Moraga
>Priority: Normal
>
> I am wondering about optimizations that can be done in the gossiper class, 
> especially targeting performance at larger scales. 
>  # getLiveMembers: Creates a hashmap every time this is called. Aside from 
> creating some trash, it's linear in the number of peers.
>  # getMaxEndpointStateVersion: I feel this can be calculated and cached 
> rather than searched for (see the sketch after this list). It is also 
> linear in the number of peers.
>  # getGossipStatus: Linear in the number of peers, and also a source of 
> garbage at larger scales.
> It seems optimizing these methods could contribute to a cleaner, more 
> performant membership protocol. Are there any plans for these types of 
> optimizations?
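
On item 2 in the quoted list, the caching idea is straightforward to sketch. 
The following is a hypothetical illustration only (the method name mirrors 
the list above; this is not Cassandra's actual Gossiper code): track the 
maximum version incrementally as endpoint states change, making reads O(1) 
instead of a scan over all peers.

{code:java}
import java.util.concurrent.atomic.AtomicInteger;

public class CachedMaxVersion
{
    private final AtomicInteger maxVersion = new AtomicInteger(0);

    // Invoked whenever any endpoint's state version is bumped.
    public void onVersionUpdate(int newVersion)
    {
        maxVersion.accumulateAndGet(newVersion, Math::max);
    }

    // O(1) read: no per-call scan over all peers and no garbage.
    public int getMaxEndpointStateVersion()
    {
        return maxVersion.get();
    }
}
{code}

Note this assumes versions only ever increase; if endpoint state can be 
removed, the cached value would also need invalidation, which is part of the 
added complexity weighed in the comments below.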



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19808) Gossiper (micro) optimizations

2024-07-31 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19808:
-
Resolution: Won't Fix
Status: Resolved  (was: Triage Needed)

> Gossiper (micro) optimizations
> -
>
> Key: CASSANDRA-19808
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19808
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Cesar Andres Stuardo Moraga
>Priority: Normal
>
> I am wondering about optimizations that can be done in the gossiper class, 
> especially targeting performance at larger scales. 
>  # getLiveMembers: Creates a hashmap every time this is called. Aside from 
> creating some trash, it's linear in the number of peers.
>  # getMaxEndpointStateVersion: I feel this can be calculated and cached 
> rather than searched for. It is also linear in the number of peers.
>  # getGossipStatus: Linear in the number of peers, and also a source of 
> garbage at larger scales.
> It seems optimizing these methods could contribute to a cleaner, more 
> performant membership protocol. Are there any plans for these types of 
> optimizations?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19808) Gossiper (micro) optimizations

2024-07-31 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17869866#comment-17869866
 ] 

Brandon Williams commented on CASSANDRA-19808:
--

bq.  Are there any plans for these types of optimizations?

These methods are generally not called often enough to be worth the added 
complexity.  Also, gossip will largely be obviated and play a much lesser role 
after 
[TCM|https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-21%3A+Transactional+Cluster+Metadata]
 lands in 5.1.

> Gossiper (micro) optimizations
> -
>
> Key: CASSANDRA-19808
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19808
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Cesar Andres Stuardo Moraga
>Priority: Normal
>
> I am wondering about optimizations that can be done in the gossiper class, 
> especially targeting performance at larger scales. 
>  # getLiveMembers: Creates a hashmap every time this is called. Aside from 
> creating some trash, it's linear in the number of peers.
>  # getMaxEndpointStateVersion: I feel this can be calculated and cached 
> rather than searched for. It is also linear in the number of peers.
>  # getGossipStatus: Linear in the number of peers, and also a source of 
> garbage at larger scales.
> It seems optimizing these methods could contribute to a cleaner, more 
> performant membership protocol. Are there any plans for these types of 
> optimizations?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19780) Illegal access warning logs visible on bin/tools invocations

2024-07-23 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17868034#comment-17868034
 ] 

Brandon Williams commented on CASSANDRA-19780:
--

We have the 5.0.0 branch so just go ahead and commit to 5.0.

> Illegal access warning logs visible on bin/tools invocations
> 
>
> Key: CASSANDRA-19780
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19780
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Tools
>Reporter: Stefan Miklosovic
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There is a discrepancy between the opens we have for tools under Java 17 and 
> those we have under Java 11.
> For example, this does not emit any warnings when we are on Java 17:
> {code:java}
> ./tools/bin/auditlogviewer /tmp/diagnostics -f {code}
> But it does emit these warnings when we are on Java 11
> {code:java}
> WARNING: An illegal reflective access operation has occurred
> WARNING: Illegal reflective access using Lookup on 
> net.openhft.chronicle.core.Jvm 
> (file:/tmp/apache-test/apache-cassandra-5.0-rc1-SNAPSHOT/lib/chronicle-core-2.23.36.jar)
>  to class java.lang.reflect.AccessibleObject
> WARNING: Please consider reporting this to the maintainers of 
> net.openhft.chronicle.core.Jvm
> WARNING: Use --illegal-access=warn to enable warnings of further illegal 
> reflective access operations
> WARNING: All illegal access operations will be denied in a future 
> release{code}
> When I compared what that tool runs with on Java 17 and Java 11, I see this:
>  
> Java 17
> {code:java}
> --add-exports java.base/jdk.internal.misc=ALL-UNNAMED 
> --add-exports java.management.rmi/com.sun.jmx.remote.internal.rmi=ALL-UNNAMED 
> --add-exports java.rmi/sun.rmi.registry=ALL-UNNAMED 
> --add-exports java.rmi/sun.rmi.server=ALL-UNNAMED 
> --add-exports java.sql/java.sql=ALL-UNNAMED 
> --add-exports java.base/java.lang.ref=ALL-UNNAMED 
> --add-exports jdk.unsupported/sun.misc=ALL-UNNAMED 
> --add-opens java.base/java.lang.module=ALL-UNNAMED 
> --add-opens java.base/jdk.internal.loader=ALL-UNNAMED 
> --add-opens java.base/jdk.internal.ref=ALL-UNNAMED 
> --add-opens java.base/jdk.internal.reflect=ALL-UNNAMED 
> --add-opens java.base/jdk.internal.math=ALL-UNNAMED 
> --add-opens java.base/jdk.internal.module=ALL-UNNAMED 
> --add-opens java.base/jdk.internal.util.jar=ALL-UNNAMED 
> --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED 
> --add-opens java.base/sun.nio.ch=ALL-UNNAMED 
> --add-opens java.base/java.io=ALL-UNNAMED 
> --add-opens java.base/java.lang=ALL-UNNAMED 
> --add-opens java.base/java.lang.reflect=ALL-UNNAMED 
> --add-opens java.base/java.util=ALL-UNNAMED 
> --add-opens java.base/java.nio=ALL-UNNAMED 
> --add-exports jdk.attach/sun.tools.attach=ALL-UNNAMED 
> --add-exports jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED 
> --add-opens jdk.compiler/com.sun.tools.javac=ALL-UNNAMED {code}
> For Java 11
> {code:java}
> --add-exports java.base/jdk.internal.misc=ALL-UNNAMED 
> --add-exports java.base/jdk.internal.ref=ALL-UNNAMED 
> --add-exports java.base/sun.nio.ch=ALL-UNNAMED 
> --add-exports java.management.rmi/com.sun.jmx.remote.internal.rmi=ALL-UNNAMED 
> --add-exports java.rmi/sun.rmi.registry=ALL-UNNAMED 
> --add-exports java.rmi/sun.rmi.server=ALL-UNNAMED 
> --add-exports java.sql/java.sql=ALL-UNNAMED 
> --add-opens java.base/java.lang.module=ALL-UNNAMED 
> --add-opens java.base/jdk.internal.loader=ALL-UNNAMED 
> --add-opens java.base/jdk.internal.ref=ALL-UNNAMED 
> --add-opens java.base/jdk.internal.reflect=ALL-UNNAMED 
> --add-opens java.base/jdk.internal.math=ALL-UNNAMED 
> --add-opens java.base/jdk.internal.module=ALL-UNNAMED 
> --add-opens java.base/jdk.internal.util.jar=ALL-UNNAMED 
> --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED {code}
> The difference is that we are not adding these opens for Java 11:
> {code:java}
> --add-opens java.base/sun.nio.ch=ALL-UNNAMED 
> --add-opens java.base/java.io=ALL-UNNAMED 
> --add-opens java.base/java.lang=ALL-UNNAMED 
> --add-opens java.base/java.lang.reflect=ALL-UNNAMED 
> --add-opens java.base/java.util=ALL-UNNAMED 
> --add-opens java.base/java.nio=ALL-UNNAMED 
> --add-opens java.base/java.lang.reflect=ALL-UNNAMED {code}
> For Java 17, we explicitly add only these which are not applicable for 11 
> (check the end of tools/bin/cassandra.in.sh)
> {code:java}
> --add-exports jdk.attach/sun.tools.attach=ALL-UNNAMED
> --add-exports jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED
> --add-opens jdk.compiler/com.sun.tools.javac=ALL-UNNAMED  {code}
>  
> So, what I propose is that we add the missing opens to cassandra.in.sh for 
> Java 11.
> Even better, I would add this to

[jira] [Commented] (CASSANDRA-19785) Possible memory leak in BTree.FastBuilder

2024-07-19 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867328#comment-17867328
 ] 

Brandon Williams commented on CASSANDRA-19785:
--

I would recommend seeing if this persists on 4.0.13, but I would also be 
looking at anything that might make this deployment different from the norm 
since 4.0.11 has been out for a year and nobody has seen this before.

> Possible memory leak in BTree.FastBuilder 
> --
>
> Key: CASSANDRA-19785
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19785
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core
>Reporter: Paul Chandler
>Priority: Normal
> Fix For: 4.0.x
>
> Attachments: image-2024-07-19-08-44-56-714.png, 
> image-2024-07-19-08-45-17-289.png, image-2024-07-19-08-45-33-933.png, 
> image-2024-07-19-08-45-50-383.png, image-2024-07-19-08-46-06-919.png, 
> image-2024-07-19-08-46-42-979.png, image-2024-07-19-08-46-56-594.png, 
> image-2024-07-19-08-47-19-517.png, image-2024-07-19-08-47-34-582.png
>
>
> We are having a problem with the heap growing in size. This is a large 
> cluster (> 1,000 nodes across a large number of DCs) running version 
> 4.0.11.
>  
> Each node has a 32GB heap, and the amount used continues to grow until it 
> reaches 30GB; the node then struggles with multiple Full GC pauses, as can 
> be seen here:
> !image-2024-07-19-08-44-56-714.png!
> We took 2 heap dumps on one node a few days after it was restarted, and the 
> heap had grown by 2.7GB
>  
> 9{^}th{^} July
> !image-2024-07-19-08-45-17-289.png!
> 11{^}th{^} July
> !image-2024-07-19-08-45-33-933.png!
> This can be seen as mainly an increase of memory used by 
> FastThreadLocalThread, increasing from 5.92GB to 8.53GB
> !image-2024-07-19-08-45-50-383.png!
> !image-2024-07-19-08-46-06-919.png!
> Looking deeper into this, it can be seen that the growing heap is contained 
> within the threads for the MutationStage, Native-transport-Requests, 
> ReadStage, etc. We would expect the memory used within these threads to be 
> short-lived, and not grow as time goes on.  We recently increased the size of 
> these thread pools, and that has increased the size of the problem.
>  
> Top memory usage for FastThreadLocalThread
> 9{^}th{^} July
> !image-2024-07-19-08-46-42-979.png!
> 11{^}th{^} July
> !image-2024-07-19-08-46-56-594.png!
> This has led us to investigate whether there could be a memory leak, and we 
> have found the following issues within the retained references in 
> BTree.FastBuilder objects. The issue appears to stem from the reset() method, 
> which does not properly clear all buffers.  We are not really sure how 
> BTree.FastBuilder works, but this is our analysis of where a leak might 
> occur.
>  
> Specifically:
> Leaf Buffer Not Being Cleared:
> When leaf().count is 0, the statement Arrays.fill(leaf().buffer, 0, 
> leaf().count, null); does not clear the buffer because the end index is 0. 
> This leaves the buffer with references to potentially large objects, 
> preventing garbage collection and increasing heap usage.
> Branch inUse Property:
> If the inUse property of the branch is set to false elsewhere in the code, 
> the while loop while (branch != null && branch.inUse) does not execute, 
> resulting in uncleared branch buffers and retained references.
>  
> This is based on the following observations:
>     Heap Dumps: Analysis of heap dumps shows that leaf().count is often 0, 
> and as a result, the buffer is not being cleared, leading to high heap 
> utilization.
> !image-2024-07-19-08-47-19-517.png!
>     Remote Debugging: Debugging sessions indicate that the drain() method 
> sets count to 0, and the inUse flag for the parent branch is set to false, 
> preventing the while loop in reset() from clearing the branch buffers.
> !image-2024-07-19-08-47-34-582.png!
>  
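
The leaf-buffer observation is easy to demonstrate in isolation. A minimal 
standalone sketch, assuming only the field semantics described above (this 
is not the actual BTree.FastBuilder code): when count is 0, Arrays.fill over 
the range [0, count) is a no-op, so stale references in the buffer survive 
the reset.

{code:java}
import java.util.Arrays;

public class LeafResetSketch
{
    public static void main(String[] args)
    {
        Object[] buffer = new Object[8];
        int count = 0;

        buffer[3] = new byte[1024 * 1024];   // stale reference from earlier use
        Arrays.fill(buffer, 0, count, null); // fills [0, 0) -- clears nothing
        System.out.println(buffer[3] != null); // true: the megabyte is retained
    }
}
{code}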



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19785) Possible memory leak in BTree.FastBuilder

2024-07-19 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19785:
-
 Bug Category: Parent values: Degradation(12984)Level 1 values: Resource 
Management(12995)
   Complexity: Normal
  Component/s: Legacy/Core
Discovered By: User Report
Fix Version/s: 4.0.x
 Severity: Normal
   Status: Open  (was: Triage Needed)

> Possible memory leak in BTree.FastBuilder 
> --
>
> Key: CASSANDRA-19785
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19785
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core
>Reporter: Paul Chandler
>Priority: Normal
> Fix For: 4.0.x
>
> Attachments: image-2024-07-19-08-44-56-714.png, 
> image-2024-07-19-08-45-17-289.png, image-2024-07-19-08-45-33-933.png, 
> image-2024-07-19-08-45-50-383.png, image-2024-07-19-08-46-06-919.png, 
> image-2024-07-19-08-46-42-979.png, image-2024-07-19-08-46-56-594.png, 
> image-2024-07-19-08-47-19-517.png, image-2024-07-19-08-47-34-582.png
>
>
> We are having a problem with the heap growing in size. This is a large 
> cluster (> 1,000 nodes across a large number of DCs) running version 
> 4.0.11.
>  
> Each node has a 32GB heap, and the amount used continues to grow until it 
> reaches 30GB; the node then struggles with multiple Full GC pauses, as can 
> be seen here:
> !image-2024-07-19-08-44-56-714.png!
> We took 2 heap dumps on one node a few days after it was restarted, and the 
> heap had grown by 2.7GB
>  
> 9{^}th{^} July
> !image-2024-07-19-08-45-17-289.png!
> 11{^}th{^} July
> !image-2024-07-19-08-45-33-933.png!
> This can be seen as mainly an increase of memory used by 
> FastThreadLocalThread, increasing from 5.92GB to 8.53GB
> !image-2024-07-19-08-45-50-383.png!
> !image-2024-07-19-08-46-06-919.png!
> Looking deeper into this, it can be seen that the growing heap is contained 
> within the threads for the MutationStage, Native-transport-Requests, 
> ReadStage, etc. We would expect the memory used within these threads to be 
> short-lived, and not grow as time goes on.  We recently increased the size of 
> these thread pools, and that has increased the size of the problem.
>  
> Top memory usage for FastThreadLocalThread
> 9{^}th{^} July
> !image-2024-07-19-08-46-42-979.png!
> 11{^}th{^} July
> !image-2024-07-19-08-46-56-594.png!
> This has led us to investigate whether there could be a memory leak, and we 
> have found the following issues within the retained references in 
> BTree.FastBuilder objects. The issue appears to stem from the reset() method, 
> which does not properly clear all buffers.  We are not really sure how 
> BTree.FastBuilder works, but this is our analysis of where a leak might 
> occur.
>  
> Specifically:
> Leaf Buffer Not Being Cleared:
> When leaf().count is 0, the statement Arrays.fill(leaf().buffer, 0, 
> leaf().count, null); does not clear the buffer because the end index is 0. 
> This leaves the buffer with references to potentially large objects, 
> preventing garbage collection and increasing heap usage.
> Branch inUse Property:
> If the inUse property of the branch is set to false elsewhere in the code, 
> the while loop while (branch != null && branch.inUse) does not execute, 
> resulting in uncleared branch buffers and retained references.
>  
> This is based on the following observations:
>     Heap Dumps: Analysis of heap dumps shows that leaf().count is often 0, 
> and as a result, the buffer is not being cleared, leading to high heap 
> utilization.
> !image-2024-07-19-08-47-19-517.png!
>     Remote Debugging: Debugging sessions indicate that the drain() method 
> sets count to 0, and the inUse flag for the parent branch is set to false, 
> preventing the while loop in reset() from clearing the branch buffers.
> !image-2024-07-19-08-47-34-582.png!
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13428) Security: provide keystore_password_file and truststore_password_file options

2024-07-19 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867319#comment-17867319
 ] 

Brandon Williams commented on CASSANDRA-13428:
--

I think that all makes sense, especially avoiding the class bloat.  Thanks for 
taking this on!

> Security: provide keystore_password_file and truststore_password_file options
> -
>
> Key: CASSANDRA-13428
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13428
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Feature/Encryption, Local/Config
>Reporter: Bas van Dijk
>Assignee: Maulin Vasavada
>Priority: Normal
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>
> Currently passwords are stored in plaintext in the configuration file as in:
> {code}
> server_encryption_options:
>   keystore_password: secret
>   truststore_password: secret
> client_encryption_options:
>   keystore_password: secret
> {code}
> This has the disadvantage that, in order to protect the secrets, the whole 
> configuration file needs to have restricted ownership and permissions. This 
> is problematic in operating systems like NixOS where configuration files are 
> usually stored in world-readable locations.
> A secure option would be to store secrets in files (with restricted ownership 
> and permissions) and reference those files from the unrestricted 
> configuration file, as in this example:
> {code}
> server_encryption_options:
>   keystore_password_file: /run/keys/keystore-password
>   truststore_password_file: /run/keys/truststore-password
> client_encryption_options:
>   keystore_password_file: /run/keys/keystore-password
> {code}
> This is trivial to implement and provides a big gain in security.
> So in summary, I'm proposing to add the {{keystore_password_file}} and 
> {{truststore_password_file}} options alongside the existing 
> {{keystore_password}} and {{truststore_password}} options. The former will 
> take precedence over the latter.
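
The proposed precedence is simple to express. A minimal sketch under the 
assumptions of the proposal above (illustrative only, not an actual patch): 
if the file option is set, read the secret from the file; otherwise fall 
back to the inline password.

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class PasswordResolver
{
    // Returns the password, preferring the file-based option when present.
    static String resolvePassword(String passwordFile, String inlinePassword) throws IOException
    {
        if (passwordFile != null)
            return Files.readString(Paths.get(passwordFile)).trim();
        return inlinePassword;
    }
}
{code}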



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19784) Commitlog leak leads to multi-node outage

2024-07-18 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19784:
-
 Bug Category: Parent values: Availability(12983)Level 1 values: Cluster 
Crash(12993)
   Complexity: Normal
  Component/s: Local/Commit Log
Discovered By: User Report
Fix Version/s: 4.1.x
   5.0.x
 Severity: Normal
   Status: Open  (was: Triage Needed)

> Commitlog leak leads to multi-node outage
> -
>
> Key: CASSANDRA-19784
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19784
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Dan Sarisky
>Priority: Normal
> Fix For: 4.1.x, 5.0.x
>
>
> After days of sustained write traffic, our nodes get into a state where 
> hundreds of the commitlog errors below occur each second.  The node 
> becomes basically unwritable.  This happens to multiple nodes at a time, 
> effectively rendering the entire cluster unusable.
> java.lang.Error: Maximum permit count exceeded
>     at 
> java.base/java.util.concurrent.Semaphore$Sync.tryReleaseShared(Semaphore.java:198)
>     at 
> java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.releaseShared(AbstractQueuedSynchronizer.java:1382)
>     at java.base/java.util.concurrent.Semaphore.release(Semaphore.java:619)
>     at 
> org.apache.cassandra.db.commitlog.AbstractCommitLogService.requestExtraSync(AbstractCommitLogService.java:297)
>     at 
> org.apache.cassandra.db.commitlog.BatchCommitLogService.maybeWaitForSync(BatchCommitLogService.java:40)
>     at 
> org.apache.cassandra.db.commitlog.AbstractCommitLogService.finishWriteFor(AbstractCommitLogService.java:284)
>     at org.apache.cassandra.db.commitlog.CommitLog.add(CommitLog.java:330)
>     at 
> org.apache.cassandra.db.CassandraKeyspaceWriteHandler.addToCommitLog(CassandraKeyspaceWriteHandler.java:100)
>     at 
> org.apache.cassandra.db.CassandraKeyspaceWriteHandler.beginWrite(CassandraKeyspaceWriteHandler.java:54)
>     at org.apache.cassandra.db.Keyspace.applyInternal(Keyspace.java:641)
>     at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:525)
>     at org.apache.cassandra.db.Mutation.apply(Mutation.java:228)
>     at org.apache.cassandra.db.Mutation.apply(Mutation.java:248)
>     at 
> org.apache.cassandra.service.StorageProxy$4.runMayThrow(StorageProxy.java:1652)
>     at 
> org.apache.cassandra.service.StorageProxy$LocalMutationRunnable.run(StorageProxy.java:2611)
>     at 
> org.apache.cassandra.concurrent.ExecutionFailure$2.run(ExecutionFailure.java:163)
>     at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:142)
>     at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>     at java.base/java.lang.Thread.run(Thread.java:829)
> This happens on large 80-core machines with the following yaml settings.
> commitlog_sync: batch
> concurrent_writes: 640
> native_transport_max_threads: 640
> This happens on 4.1.5 and 5.0beta1.  I have not tested other branches.
> I am NOT able to reproduce it with smaller 12 core machines.
> I am NOT able to reproduce it with commitlog_sync: periodic
> I added some small instrumentation to periodically print out the value of 
> org.apache.cassandra.db.commitlog.AbstractCommitLogService.haveWork.permits().
>   Under sustained write traffic, this value continually grows (by a million 
> or so a minute using my cassandra-stress workload) until it hits MAXINT and 
> the error starts occurring.  If write traffic slows, the value of 
> haveWork.permits() drops.
>  
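
The "Maximum permit count exceeded" error falls directly out of 
java.util.concurrent.Semaphore once the permit count overflows an int. A 
minimal standalone reproduction of just that overflow, unrelated to the 
commitlog code itself:

{code:java}
import java.util.concurrent.Semaphore;

public class PermitOverflow
{
    public static void main(String[] args)
    {
        // Start one permit below the ceiling so the overflow is immediate.
        Semaphore haveWork = new Semaphore(Integer.MAX_VALUE - 1);
        haveWork.release(); // now exactly Integer.MAX_VALUE permits
        haveWork.release(); // throws java.lang.Error: Maximum permit count exceeded
    }
}
{code}

This matches the reporter's observation: if releases outpace acquires under 
sustained writes, the permit count climbs until it hits the int ceiling.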



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19767) Fix storage_compatibility_mode and startup_checks documentation

2024-07-12 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865415#comment-17865415
 ] 

Brandon Williams commented on CASSANDRA-19767:
--

LGTM +1

> Fix storage_compatibility_mode and startup_checks documentation
> ---
>
> Key: CASSANDRA-19767
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19767
> Project: Cassandra
>  Issue Type: Task
>  Components: Documentation
>Reporter: Jackson Fleming
>Assignee: Jackson Fleming
>Priority: Normal
>  Labels: Documentation
> Fix For: 4.1.x, 5.0.x, 5.x
>
> Attachments: image-2024-07-12-09-38-16-284.png
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The documentation for storage_compatibility_mode 
> ([https://cassandra.apache.org/doc/latest/cassandra/managing/configuration/cass_yaml_file.html#storage_compatibility_mode])
>  is very difficult to read. The below highlighted text seems incorrect.
> !image-2024-07-12-09-38-16-284.png|width=505,height=487!
>  
> It appears that the entry for the YAML option above it is causing entries to 
> get clobbered together (startup_checks)
>  
> This is actually a very useful and important feature for people upgrading to 
> Cassandra 5 to understand and use properly - it would be good for it to be 
> easier to read, and we should be encouraging use of the safest possible 
> upgrade path, which from my understanding would be:
> {{CASSANDRA_4 -> UPGRADING -> NONE}}
>  
>  
> Update - it seems the startup_checks docs are also missing from the 4.1 
> pages; I'll fix that as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19767) Fix storage_compatibility_mode and startup_checks documentation

2024-07-12 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19767:
-
Change Category: Semantic
 Complexity: Normal
  Fix Version/s: 4.1.x
 5.0.x
 5.x
 Status: Open  (was: Triage Needed)

> Fix storage_compatibility_mode and startup_checks documentation
> ---
>
> Key: CASSANDRA-19767
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19767
> Project: Cassandra
>  Issue Type: Task
>  Components: Documentation
>Reporter: Jackson Fleming
>Assignee: Jackson Fleming
>Priority: Normal
>  Labels: Documentation
> Fix For: 4.1.x, 5.0.x, 5.x
>
> Attachments: image-2024-07-12-09-38-16-284.png
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The documentation for storage_compatibility_mode 
> ([https://cassandra.apache.org/doc/latest/cassandra/managing/configuration/cass_yaml_file.html#storage_compatibility_mode])
>  is very difficult to read. The below highlighted text seems incorrect.
> !image-2024-07-12-09-38-16-284.png|width=505,height=487!
>  
> It appears that the entry for the YAML option above it is causing entries to 
> get clobbered together (startup_checks)
>  
> This is actually a very useful and important feature for people upgrading to 
> Cassandra 5 to understand and use properly - it would be good for it to be 
> easier to read, and we should be encouraging use of the safest possible 
> upgrade path, which from my understanding would be:
> {{CASSANDRA_4 -> UPGRADING -> NONE}}
>  
>  
> Update - it seems the startup_checks docs are also missing from the 4.1 
> pages; I'll fix that as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19770) Incorrect latency metrics reported by metric-reporter

2024-07-12 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19770:
-
 Bug Category: Parent values: Correctness(12982)
   Complexity: Normal
Discovered By: User Report
Fix Version/s: 4.1.x
   5.0.x
   5.x
 Severity: Normal
   Status: Open  (was: Triage Needed)

[~yifanc] can you take a look?

> Incorrect latency metrics reported by metric-reporter
> -
>
> Key: CASSANDRA-19770
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19770
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Metrics
>Reporter: Aswin Karthik
>Priority: Normal
> Fix For: 4.1.x, 5.0.x, 5.x
>
>
> Cassandra version: 4.1.5
> Since [CASSANDRA-16760|https://issues.apache.org/jira/browse/CASSANDRA-16760] 
> and [these 
> changes|https://github.com/apache/cassandra/pull/1091/files#diff-07f330b65d5335967ea96f80674b25415c70994d99b97795ed4db696c92b3ff5L532],
>  the metric reporter divides the microsecond metrics by 10^6 and reports 
> them in milliseconds (it should divide by 10^3). This means an extra 
> division by 10^3 happens, making the reported metrics wrong.
> Neither the sample configuration nor the documentation explains how to 
> configure the metrics reporter to report this correctly.
> Steps to reproduce:
> Contents of metrics-reporter-config-sample.yaml
> {noformat}
> console:
>   -
> outfile: '/tmp/metrics.out'
> period: 10
> timeunit: 'SECONDS'
> predicate:
>   color: "white"
>   useQualifiedName: true
>   patterns:
> - "^org.apache.cassandra.metrics.ClientRequest.+" # includes 
> ClientRequestMetrics
> {noformat}
> Cassandra started with flag
> {noformat}
> -Dcassandra.metricsReporterConfigFile=metrics-reporter-config-sample.yaml
> {noformat}
> Run cassandra-stress to generate load
> {noformat}
> tools/bin/cassandra-stress write duration=1m cl=ONE -rate threads=1000
> {noformat}
> Post that
> If you check via nodetool
> {noformat}
> bin/nodetool sjk mxdump -q 
> org.apache.cassandra.metrics:type=ClientRequest,scope=Write-ONE,name=Latency
> {
>   "beans" : [ {
> "name" : 
> "org.apache.cassandra.metrics:type=ClientRequest,scope=Write-ONE,name=Latency",
> "modelerType" : 
> "org.apache.cassandra.metrics.CassandraMetricsRegistry$JmxTimer",
> "Max" : 654949.0,
> "999thPercentile" : 11864.0,
> "DurationUnit" : "microseconds",
> 
>   } ]
> }
> {noformat}
> The max is 654949.0 micros, which is ~654 millis.
> However, the metric reporter emits 0.65 millis because of the additional 
> division by 10^3.
> {noformat}
> ❯ tail -n100 /tmp/metrics.out | grep -A 20 Latency.Write-ONE
> org.apache.cassandra.metrics.ClientRequest.Latency.Write-ONE
> count = 17053398
> max = 0.65 milliseconds
> 99.9% <= 0.01 milliseconds
> ...
> {noformat}
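
The arithmetic is easy to verify. A minimal sketch of the two conversions 
(illustrative only, not the reporter's actual code path):

{code:java}
public class LatencyUnits
{
    public static void main(String[] args)
    {
        double micros = 654949.0;
        System.out.println(micros / 1e3); // 654.949 ms  -- correct conversion
        System.out.println(micros / 1e6); // 0.654949 ms -- what gets reported
    }
}
{code}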



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-18321) distutils Version classes are deprecated. Use packaging.version instead.

2024-07-08 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-18321:
-
Status: Patch Available  (was: Needs Committer)

> distutils Version classes are deprecated. Use packaging.version instead.
> 
>
> Key: CASSANDRA-18321
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18321
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Dhanush Ananthkar
>Priority: Low
> Fix For: 5.x
>
>
> Lately I have been seeing the below warning a lot in the Python DTests:
> {code:java}
> DeprecationWarning: distutils Version classes are deprecated. Use 
> packaging.version instead.{code}
> Example from running  
> auditlog_test.py::TestAuditlog::test_archive_on_shutdown with trunk:
>  
> {code:java}
> dtest.py:48
>   /cassandra-dtest/dtest.py:48: DeprecationWarning: distutils Version classes 
> are deprecated. Use packaging.version instead.
>     MAJOR_VERSION_4 = LooseVersion('4.0')
>  
> ../../dtest/lib/python3.8/site-packages/ccmlib/common.py:773
> auditlog_test.py::TestAuditlog::test_archive_on_shutdown
> auditlog_test.py::TestAuditlog::test_archive_on_shutdown
> auditlog_test.py::TestAuditlog::test_archive_on_shutdown
> auditlog_test.py::TestAuditlog::test_archive_on_shutdown
> auditlog_test.py::TestAuditlog::test_archive_on_shutdown
>   /dtest/lib/python3.8/site-packages/ccmlib/common.py:773: 
> DeprecationWarning: distutils Version classes are deprecated. Use 
> packaging.version instead.
>     return LooseVersion(match.group(1))
>  
> auditlog_test.py: 42 warnings
>   /dtest/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 
> DeprecationWarning: distutils Version classes are deprecated. Use 
> packaging.version instead.
>     other = LooseVersion(other)
>  
> auditlog_test.py::TestAuditlog::test_archive_on_shutdown
>   /cassandra-dtest/conftest.py:437: DeprecationWarning: distutils Version 
> classes are deprecated. Use packaging.version instead.
>     since = LooseVersion(since_str_or_list)
>  
> auditlog_test.py::TestAuditlog::test_archive_on_shutdown
> auditlog_test.py::TestAuditlog::test_archive_on_shutdown
> auditlog_test.py::TestAuditlog::test_archive_on_shutdown
> auditlog_test.py::TestAuditlog::test_archive_on_shutdown
> auditlog_test.py::TestAuditlog::test_archive_on_shutdown
> auditlog_test.py::TestAuditlog::test_archive_on_shutdown
> auditlog_test.py::TestAuditlog::test_archive_on_shutdown
>   /dtest/lib/python3.8/site-packages/ccmlib/common.py:481: 
> DeprecationWarning: distutils Version classes are deprecated. Use 
> packaging.version instead.
>     version = LooseVersion(str(version))
>  
> -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
> ===Flaky Test Report===
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19749) ALTER USER | ROLE IF EXISTS creates a user / role if it does not exist

2024-07-08 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17863791#comment-17863791
 ] 

Brandon Williams commented on CASSANDRA-19749:
--

+1

> ALTER USER | ROLE IF EXISTS creates a user / role if it does not exist
> --
>
> Key: CASSANDRA-19749
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19749
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/CQL
>Reporter: Stefan Miklosovic
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Let's have:
> {code}
> authenticator:
>   class_name : org.apache.cassandra.auth.PasswordAuthenticator
> authorizer: CassandraAuthorizer
> role_manager: CassandraRoleManager
> {code}
> and do this:
> {code}
> cassandra@cqlsh> select * from system_auth.roles;
>  role  | can_login | is_superuser | member_of | salted_hash
> ---+---+--+---+--
>  cassandra |  True | True |  null | 
> $2a$10$sFCKeluid5MlW/Z0CU1ygO1U5qpLW4Rgivmu8rZNmNNQ8WeC2y92S
> {code}
> Then 
> {code}
> cassandra@cqlsh> ALTER USER IF EXISTS this_does_not_exist SUPERUSER ;
> cassandra@cqlsh> select * from system_auth.roles where role = 
> 'this_does_not_exist';
>  role| can_login | is_superuser | member_of | salted_hash
> -+---+--+---+-
>  this_does_not_exist |  null | True |  null |null
> {code}
> It seems to be the same behaviour for ALTER ROLE too.
> {code}
> cassandra@cqlsh> ALTER ROLE IF EXISTS this_role_is_not_there WITH SUPERUSER = 
> true ;
> cassandra@cqlsh> select * from system_auth.roles where role = 
> 'this_role_is_not_there';
>  role   | can_login | is_superuser | member_of | salted_hash
> +---+--+---+-
>  this_role_is_not_there |  null | True |  null |null
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19750) /etc/cassandra/cassandra-jaas.config does not exist in Packages install

2024-07-04 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19750:
-
 Bug Category: Parent values: Packaging(13660)
   Complexity: Normal
  Component/s: Packaging
Discovered By: User Report
Fix Version/s: 3.11.x
   4.0.x
   4.1.x
   5.0.x
   5.x
 Severity: Normal
   Status: Open  (was: Triage Needed)

> /etc/cassandra/cassandra-jaas.config does not exist in Packages install
> ---
>
> Key: CASSANDRA-19750
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19750
> Project: Cassandra
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Johnny Miller
>Priority: Normal
> Fix For: 3.11.x, 4.0.x, 4.1.x, 5.0.x, 5.x
>
>
> When installing Cassandra 4.1.5 with the Debian package installer, the file 
> cassandra-jaas.config does not exist.
> When enabling JMX authentication in cassandra-env.sh, you uncomment the 
> line JVM_OPTS="$JVM_OPTS 
> -Djava.security.auth.login.config=$CASSANDRA_CONF/cassandra-jaas.config" 
> It expects this file to be in this location (which maps to 
> /etc/cassandra/), and it is not.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19750) /etc/cassandra/cassandra-jaas.config does not exist in Packages install

2024-07-04 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17863047#comment-17863047
 ] 

Brandon Williams commented on CASSANDRA-19750:
--

This isn't in the redhat packaging either.

> /etc/cassandra/cassandra-jaas.config does not exist in Packages install
> ---
>
> Key: CASSANDRA-19750
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19750
> Project: Cassandra
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Johnny Miller
>Priority: Normal
> Fix For: 3.11.x, 4.0.x, 4.1.x, 5.0.x, 5.x
>
>
> When installing Cassandra 4.1.5 with the Debian package installer, the file 
> cassandra-jaas.config does not exist.
> When enabling JMX authentication in cassandra-env.sh, you uncomment the 
> line JVM_OPTS="$JVM_OPTS 
> -Djava.security.auth.login.config=$CASSANDRA_CONF/cassandra-jaas.config" 
> It expects this file to be in this location (which maps to 
> /etc/cassandra/), and it is not.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19750) /etc/cassandra/cassandra-jaas.config does not exist in Packages install

2024-07-04 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19750:
-
Summary: /etc/cassandra/cassandra-jaas.config does not exist in Packages 
install  (was: /etc/cassandra/cassandra-jaas.config does not exist in 
debian/ubuntu Packages install for 4.1.5)

> /etc/cassandra/cassandra-jaas.config does not exist in Packages install
> ---
>
> Key: CASSANDRA-19750
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19750
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Johnny Miller
>Priority: Normal
>
> When installing Cassandra 4.1.5 with the Debian package installer, the file 
> cassandra-jaas.config does not exist.
> When enabling JMX authentication in cassandra-env.sh, you uncomment the 
> line JVM_OPTS="$JVM_OPTS 
> -Djava.security.auth.login.config=$CASSANDRA_CONF/cassandra-jaas.config" 
> It expects this file to be in this location (which maps to 
> /etc/cassandra/), and it is not.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19708) Remove sid from bullseye docker images

2024-07-03 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17862819#comment-17862819
 ] 

Brandon Williams commented on CASSANDRA-19708:
--

+1

> Remove sid from bullseye docker images
> --
>
> Key: CASSANDRA-19708
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19708
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Michael Semb Wever
>Assignee: Michael Semb Wever
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x, 4.1.x
>
> Attachments: CASSANDRA-19708_50_105_ci_summary.html, 
> CASSANDRA-19708_50_105_results_details.tar.xz
>
>
> sid is flaky, often broken, and it takes days for correct packages to be 
> uploaded.
> ref: 
> https://ci-cassandra.apache.org/job/Cassandra-4.1-artifacts/jdk=jdk_1.8_latest,label=cassandra/611/
>  
> sid is only used for jdk8
> Looks like replacing it with temurin might be a safer, more stable choice.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19747) Invalid schema.cql created by snapshot after dropping more than one field

2024-07-03 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19747:
-
 Bug Category: Parent values: Correctness(12982)Level 1 values: Recoverable 
Corruption / Loss(12986)
   Complexity: Normal
  Component/s: Local/Snapshots
Discovered By: User Report
Fix Version/s: 4.1.x
   5.0.x
   5.x
 Severity: Normal
   Status: Open  (was: Triage Needed)

> Invalid schema.cql created by snapshot after dropping more than one field
> -
>
> Key: CASSANDRA-19747
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19747
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Snapshots
>Reporter: Frank vissing
>Priority: Normal
> Fix For: 4.1.x, 5.0.x, 5.x
>
>
> After dropping at least 2 fields, the schema.cql produced by _nodetool 
> snapshot_ is invalid (it is missing a comma):
> {code:sql}
> CREATE TABLE IF NOT EXISTS test.testtable (
>     field1 text PRIMARY KEY,
>     field2 text
>     field3 text
> ) WITH ID ...{code}
> expected outcome
> {code:sql}
> CREATE TABLE IF NOT EXISTS test.testtable (
>     field1 text PRIMARY KEY,
>     field2 text,
>     field3 text
> ) WITH ID ...{code}
> Reproducing the issue is simple by running the following commands:
> {code:sh}
> docker run -d --name cassandra cassandra:4.1.5
> echo "Wait for the container to start"
> until docker exec -ti cassandra nodetool status | grep UN;do sleep 
> 1;done;sleep 10
> echo "Create keyspace and table for test"
> docker exec -ti cassandra cqlsh -e "CREATE KEYSPACE IF NOT EXISTS test WITH 
> replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}; CREATE 
> TABLE IF NOT EXISTS test.testtable (field1 text PRIMARY KEY,field2 
> text,field3 text);"
> echo "Drop 2 fields"
> docker exec -ti cassandra cqlsh -e "ALTER TABLE test.testtable DROP (field2, 
> field3);"
> echo "Create snapshot and view schema.cql"
> docker exec -ti cassandra /opt/cassandra/bin/nodetool snapshot -t my_snapshot
> docker exec -ti cassandra find /var/lib/cassandra/data -name schema.cql  
> -exec cat {} +   {code}
> The full output of the SQL generated by the reproduction is below:
> {code:sql}
> CREATE TABLE IF NOT EXISTS test.testtable (
> field1 text PRIMARY KEY,
> field2 text
> field3 text
> ) WITH ID = 0e9aa540-391f-11ef-945e-0be1221ff441
> AND additional_write_policy = '99p'
> AND bloom_filter_fp_chance = 0.01
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
> AND cdc = false
> AND comment = ''
> AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
> 'max_threshold': '32', 'min_threshold': '4'}
> AND compression = {'chunk_length_in_kb': '16', 'class': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND memtable = 'default'
> AND crc_check_chance = 1.0
> AND default_time_to_live = 0
> AND extensions = {}
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair = 'BLOCKING'
> AND speculative_retry = '99p';
> ALTER TABLE test.testtable DROP field2 USING TIMESTAMP 171102807000;
> ALTER TABLE test.testtable DROP field3 USING TIMESTAMP 171102807001;
> {code}
> Found this bug while trying to restore the schema from a backup  created by 
> copying a snapshot from a running node.
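
The missing comma is the classic separator-joining bug. A minimal sketch, 
assuming nothing about the actual snapshot code, of how joining the column 
definitions with an explicit separator avoids it:

{code:java}
import java.util.List;
import java.util.StringJoiner;

public class ColumnDefJoin
{
    public static void main(String[] args)
    {
        List<String> defs = List.of("field1 text PRIMARY KEY", "field2 text", "field3 text");

        // StringJoiner inserts the separator only *between* elements,
        // so no definition can be emitted without its trailing comma.
        StringJoiner joiner = new StringJoiner(",\n    ", "    ", "");
        defs.forEach(joiner::add);
        System.out.println("CREATE TABLE IF NOT EXISTS test.testtable (\n" + joiner + "\n) WITH ...;");
    }
}
{code}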



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19739) Move bcprov-jdk18on-1.76.jar to build deps

2024-07-02 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17862594#comment-17862594
 ] 

Brandon Williams commented on CASSANDRA-19739:
--

+1

> Move bcprov-jdk18on-1.76.jar to build deps
> --
>
> Key: CASSANDRA-19739
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19739
> Project: Cassandra
>  Issue Type: Task
>  Components: Build
>Reporter: Stefan Miklosovic
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 5.x
>
>
> This came up after I bumped the dependency-check version to 10.0.0, as 
> suggested in CASSANDRA-19738.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19739) Investigate bcprov-jdk18on-1.76.jar: CVE-2024-30172, CVE-2024-30171, CVE-2024-29857, CVE-2024-34447

2024-07-01 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17861275#comment-17861275
 ] 

Brandon Williams commented on CASSANDRA-19739:
--

bouncycastle was added in CASSANDRA-17992.  Looks like we just need to upgrade 
it to 1.78 and run CI.

> Investigate bcprov-jdk18on-1.76.jar: CVE-2024-30172, CVE-2024-30171, 
> CVE-2024-29857, CVE-2024-34447
> ---
>
> Key: CASSANDRA-19739
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19739
> Project: Cassandra
>  Issue Type: Task
>  Components: Build
>Reporter: Stefan Miklosovic
>Priority: Normal
> Fix For: 5.0-rc, 5.x
>
>
> This came up after I bumped the dependency-check version to 10.0.0, as 
> suggested in CASSANDRA-19738.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19739) Investigate bcprov-jdk18on-1.76.jar: CVE-2024-30172, CVE-2024-30171, CVE-2024-29857, CVE-2024-34447

2024-07-01 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19739:
-
Summary: Investigate bcprov-jdk18on-1.76.jar: CVE-2024-30172, 
CVE-2024-30171, CVE-2024-29857, CVE-2024-34447  (was: Investigate 
CVE-2024-30172, CVE-2024-30171, CVE-2024-29857, CVE-2024-34447)

> Investigate bcprov-jdk18on-1.76.jar: CVE-2024-30172, CVE-2024-30171, 
> CVE-2024-29857, CVE-2024-34447
> ---
>
> Key: CASSANDRA-19739
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19739
> Project: Cassandra
>  Issue Type: Task
>  Components: Build
>Reporter: Stefan Miklosovic
>Priority: Normal
> Fix For: 5.0-rc, 5.x
>
>
> This came up after I bumped the dependency-check version to 10.0.0, as 
> suggested in CASSANDRA-19738.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19738) Update dependency-check library to version 10.0.0

2024-07-01 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17861274#comment-17861274
 ] 

Brandon Williams commented on CASSANDRA-19738:
--

+1

> Update dependency-check library to version 10.0.0
> -
>
> Key: CASSANDRA-19738
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19738
> Project: Cassandra
>  Issue Type: Task
>  Components: Build
>Reporter: Stefan Miklosovic
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x, 4.1.x, 5.x
>
>
> Currently we are at 9.0.5, which locally fails as shown in (1).
> Version 10.0.0 was released today, and with it the check passes again.
> We should update it everywhere to 10.0.0.
> (1) https://github.com/jeremylong/DependencyCheck/issues/6515
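
The change itself amounts to bumping a version value in the build files. A 
hedged illustration of the kind of one-line edit involved; the property name 
below is hypothetical, not the actual name in Cassandra's build.xml:

{noformat}
<!-- hypothetical property name, for illustration only -->
<property name="dependency-check.version" value="10.0.0"/>  <!-- was 9.0.5 -->
{noformat}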



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19668) SIGSEGV originating in Paxos V2 Scheduled Task

2024-06-28 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19668:
-
Summary: SIGSEGV originating in Paxos V2 Scheduled Task  (was: SIGSEV 
originating in Paxos V2 Scheduled Task)

> SIGSEGV originating in Paxos V2 Scheduled Task
> --
>
> Key: CASSANDRA-19668
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19668
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions
>Reporter: Jon Haddad
>Assignee: Jon Haddad
>Priority: Urgent
> Fix For: 4.1.x, 5.0-rc, 5.x
>
>
> I haven't gotten to the root cause of this yet. Several 4.1 nodes have 
> crashed in production.  I'm not sure if this is related to Paxos v2 or 
> not, but it is enabled; offheap_objects is also enabled. 
> I'm not sure if this affects 5.0 yet.
> Most of the crashes don't have a stacktrace; they only reference this:
> {noformat}
> Stack: [0x7fabf4c34000,0x7fabf4d34000],  sp=0x7fabf4d31f00,  free 
> space=1015k
> Native frames: (J=compiled Java code, A=aot compiled Java code, 
> j=interpreted, Vv=VM code, C=native code)
> v  ~StubRoutines::jint_disjoint_arraycopy
> {noformat}
> They all are in the {{ScheduledTasks}} thread.
> However, one node does have this in the crash log:
> {noformat}
> ---  T H R E A D  ---
> Current thread (0x78b375eac800):  JavaThread "ScheduledTasks:1" daemon 
> [_thread_in_Java, id=151791, stack(0x78b34b78,0x78b34b88)]
> Stack: [0x78b34b78,0x78b34b88],  sp=0x78b34b87c350,  free 
> space=1008k
> Native frames: (J=compiled Java code, A=aot compiled Java code, 
> j=interpreted, Vv=VM code, C=native code)
> J 29467 c2 
> org.apache.cassandra.db.rows.AbstractCell.clone(Lorg/apache/cassandra/utils/memory/ByteBufferCloner;)Lorg/apache/cassandra/db/rows/Cell;
>  (50 bytes) @ 0x78b3dd40a42f [0x78b3dd409de0+0x064f]
> J 17669 c2 
> org.apache.cassandra.db.rows.Cell.clone(Lorg/apache/cassandra/utils/memory/Cloner;)Lorg/apache/cassandra/db/rows/ColumnData;
>  (6 bytes) @ 0x78b3dc54edc0 [0x78b3dc54ed40+0x0080]
> J 17816 c2 
> org.apache.cassandra.db.rows.BTreeRow$$Lambda$845.apply(Ljava/lang/Object;)Ljava/lang/Object;
>  (12 bytes) @ 0x78b3dbed01a4 [0x78b3dbed0120+0x0084]
> J 17828 c2 
> org.apache.cassandra.utils.btree.BTree.transform([Ljava/lang/Object;Ljava/util/function/Function;)[Ljava/lang/Object;
>  (194 bytes) @ 0x78b3dc5f35f0 [0x78b3dc5f34a0+0x0150]
> J 35096 c2 
> org.apache.cassandra.db.rows.BTreeRow.clone(Lorg/apache/cassandra/utils/memory/Cloner;)Lorg/apache/cassandra/db/rows/Row;
>  (37 bytes) @ 0x78b3dda9111c [0x78b3dda90fe0+0x013c]
> J 30500 c2 
> org.apache.cassandra.utils.memory.EnsureOnHeap$CloneToHeap.applyToRow(Lorg/apache/cassandra/db/rows/Row;)Lorg/apache/cassandra/db/rows/Row;
>  (16 bytes) @ 0x78b3dd59b91c [0x78b3dd59b8c0+0x005c]
> J 26498 c2 org.apache.cassandra.db.transform.BaseRows.hasNext()Z (215 bytes) 
> @ 0x78b3dcf1c454 [0x78b3dcf1c180+0x02d4]
> J 30775 c2 
> org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext()Ljava/lang/Object;
>  (49 bytes) @ 0x78b3dc789020 [0x78b3dc788fc0+0x0060]
> J 9082 c2 org.apache.cassandra.utils.AbstractIterator.hasNext()Z (80 bytes) @ 
> 0x78b3dbb3c544 [0x78b3dbb3c440+0x0104]
> J 35593 c2 
> org.apache.cassandra.service.paxos.uncommitted.PaxosRows$PaxosMemtableToKeyStateIterator.computeNext()Lorg/apache/cassandra/service/paxos/uncommitted/PaxosKeyState;
>  (126 bytes) @ 0x78b3dc7ceeec [0x78b3dc7cee20+0x00cc]
> J 35591 c2 
> org.apache.cassandra.service.paxos.uncommitted.PaxosRows$PaxosMemtableToKeyStateIterator.computeNext()Ljava/lang/Object;
>  (5 bytes) @ 0x78b3dc7d09e4 [0x78b3dc7d09a0+0x0044]
> J 9082 c2 org.apache.cassandra.utils.AbstractIterator.hasNext()Z (80 bytes) @ 
> 0x78b3dbb3c544 [0x78b3dbb3c440+0x0104]
> J 34146 c2 
> com.google.common.collect.Iterators.addAll(Ljava/util/Collection;Ljava/util/Iterator;)Z
>  (41 bytes) @ 0x78b3dd9197e8 [0x78b3dd919680+0x0168]
> J 38256 c1 
> org.apache.cassandra.service.paxos.uncommitted.PaxosRows.toIterator(Lorg/apache/cassandra/db/partitions/UnfilteredPartitionIterator;Lorg/apache/cassandra/schema/TableId;Z)Lorg/apache/cassandra/utils/CloseableIterator;
>  (49 bytes) @ 0x78b3d6b677ac [0x78b3d6b672e0+0x04cc]
> J 34823 c1 
> org.apache.cassandra.service.paxos.uncommitted.PaxosUncommittedIndex.repairIterator(Lorg/apache/cassandra/schema/TableId;Ljava/util/Collection;)Lorg/apache/cassandra/utils/CloseableIterator;
>  (212 bytes) @ 0x00

[jira] [Updated] (CASSANDRA-19668) SIGSEV originating in Paxos V2 Scheduled Task

2024-06-28 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19668:
-
Fix Version/s: 4.1.x
   5.0-rc
   5.x

> SIGSEV originating in Paxos V2 Scheduled Task
> -
>
> Key: CASSANDRA-19668
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19668
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions
>Reporter: Jon Haddad
>Assignee: Jon Haddad
>Priority: Urgent
> Fix For: 4.1.x, 5.0-rc, 5.x
>
>
> I haven't gotten to the root cause of this yet. Several 4.1 nodes have 
> crashed in production.  I'm not sure if this is related to Paxos v2 or 
> not, but it is enabled; offheap_objects is also enabled. 
> I'm not sure if this affects 5.0 yet.
> Most of the crashes don't have a stacktrace; they only reference this:
> {noformat}
> Stack: [0x7fabf4c34000,0x7fabf4d34000],  sp=0x7fabf4d31f00,  free 
> space=1015k
> Native frames: (J=compiled Java code, A=aot compiled Java code, 
> j=interpreted, Vv=VM code, C=native code)
> v  ~StubRoutines::jint_disjoint_arraycopy
> {noformat}
> They all are in the {{ScheduledTasks}} thread.
> However, one node does have this in the crash log:
> {noformat}
> ---  T H R E A D  ---
> Current thread (0x78b375eac800):  JavaThread "ScheduledTasks:1" daemon 
> [_thread_in_Java, id=151791, stack(0x78b34b78,0x78b34b88)]
> Stack: [0x78b34b78,0x78b34b88],  sp=0x78b34b87c350,  free 
> space=1008k
> Native frames: (J=compiled Java code, A=aot compiled Java code, 
> j=interpreted, Vv=VM code, C=native code)
> J 29467 c2 
> org.apache.cassandra.db.rows.AbstractCell.clone(Lorg/apache/cassandra/utils/memory/ByteBufferCloner;)Lorg/apache/cassandra/db/rows/Cell;
>  (50 bytes) @ 0x78b3dd40a42f [0x78b3dd409de0+0x064f]
> J 17669 c2 
> org.apache.cassandra.db.rows.Cell.clone(Lorg/apache/cassandra/utils/memory/Cloner;)Lorg/apache/cassandra/db/rows/ColumnData;
>  (6 bytes) @ 0x78b3dc54edc0 [0x78b3dc54ed40+0x0080]
> J 17816 c2 
> org.apache.cassandra.db.rows.BTreeRow$$Lambda$845.apply(Ljava/lang/Object;)Ljava/lang/Object;
>  (12 bytes) @ 0x78b3dbed01a4 [0x78b3dbed0120+0x0084]
> J 17828 c2 
> org.apache.cassandra.utils.btree.BTree.transform([Ljava/lang/Object;Ljava/util/function/Function;)[Ljava/lang/Object;
>  (194 bytes) @ 0x78b3dc5f35f0 [0x78b3dc5f34a0+0x0150]
> J 35096 c2 
> org.apache.cassandra.db.rows.BTreeRow.clone(Lorg/apache/cassandra/utils/memory/Cloner;)Lorg/apache/cassandra/db/rows/Row;
>  (37 bytes) @ 0x78b3dda9111c [0x78b3dda90fe0+0x013c]
> J 30500 c2 
> org.apache.cassandra.utils.memory.EnsureOnHeap$CloneToHeap.applyToRow(Lorg/apache/cassandra/db/rows/Row;)Lorg/apache/cassandra/db/rows/Row;
>  (16 bytes) @ 0x78b3dd59b91c [0x78b3dd59b8c0+0x005c]
> J 26498 c2 org.apache.cassandra.db.transform.BaseRows.hasNext()Z (215 bytes) 
> @ 0x78b3dcf1c454 [0x78b3dcf1c180+0x02d4]
> J 30775 c2 
> org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext()Ljava/lang/Object;
>  (49 bytes) @ 0x78b3dc789020 [0x78b3dc788fc0+0x0060]
> J 9082 c2 org.apache.cassandra.utils.AbstractIterator.hasNext()Z (80 bytes) @ 
> 0x78b3dbb3c544 [0x78b3dbb3c440+0x0104]
> J 35593 c2 
> org.apache.cassandra.service.paxos.uncommitted.PaxosRows$PaxosMemtableToKeyStateIterator.computeNext()Lorg/apache/cassandra/service/paxos/uncommitted/PaxosKeyState;
>  (126 bytes) @ 0x78b3dc7ceeec [0x78b3dc7cee20+0x00cc]
> J 35591 c2 
> org.apache.cassandra.service.paxos.uncommitted.PaxosRows$PaxosMemtableToKeyStateIterator.computeNext()Ljava/lang/Object;
>  (5 bytes) @ 0x78b3dc7d09e4 [0x78b3dc7d09a0+0x0044]
> J 9082 c2 org.apache.cassandra.utils.AbstractIterator.hasNext()Z (80 bytes) @ 
> 0x78b3dbb3c544 [0x78b3dbb3c440+0x0104]
> J 34146 c2 
> com.google.common.collect.Iterators.addAll(Ljava/util/Collection;Ljava/util/Iterator;)Z
>  (41 bytes) @ 0x78b3dd9197e8 [0x78b3dd919680+0x0168]
> J 38256 c1 
> org.apache.cassandra.service.paxos.uncommitted.PaxosRows.toIterator(Lorg/apache/cassandra/db/partitions/UnfilteredPartitionIterator;Lorg/apache/cassandra/schema/TableId;Z)Lorg/apache/cassandra/utils/CloseableIterator;
>  (49 bytes) @ 0x78b3d6b677ac [0x78b3d6b672e0+0x04cc]
> J 34823 c1 
> org.apache.cassandra.service.paxos.uncommitted.PaxosUncommittedIndex.repairIterator(Lorg/apache/cassandra/schema/TableId;Ljava/util/Collection;)Lorg/apache/cassandra/utils/CloseableIterator;
>  (212 bytes) @ 0x78b3d5675e0c [0x78b3d5673be0+0x00

[jira] [Updated] (CASSANDRA-19668) SIGSEV originating in Paxos V2 Scheduled Task

2024-06-27 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19668:
-
Summary: SIGSEV originating in Paxos V2 Scheduled Task  (was: SIGSEV 
originating in Paxos Scheduled Task)

> SIGSEV originating in Paxos V2 Scheduled Task
> -
>
> Key: CASSANDRA-19668
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19668
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions
>Reporter: Jon Haddad
>Assignee: Jon Haddad
>Priority: Urgent
>
> I haven't gotten to the root cause of this yet. Several 4.1 nodes have 
> crashed in production.  I'm not sure if this is related to Paxos v2 or 
> not, but it is enabled; offheap_objects is also enabled. 
> I'm not sure if this affects 5.0 yet.
> Most of the crashes don't have a stacktrace; they only reference this:
> {noformat}
> Stack: [0x7fabf4c34000,0x7fabf4d34000],  sp=0x7fabf4d31f00,  free 
> space=1015k
> Native frames: (J=compiled Java code, A=aot compiled Java code, 
> j=interpreted, Vv=VM code, C=native code)
> v  ~StubRoutines::jint_disjoint_arraycopy
> {noformat}
> They all are in the {{ScheduledTasks}} thread.
> However, one node does have this in the crash log:
> {noformat}
> ---  T H R E A D  ---
> Current thread (0x78b375eac800):  JavaThread "ScheduledTasks:1" daemon 
> [_thread_in_Java, id=151791, stack(0x78b34b78,0x78b34b88)]
> Stack: [0x78b34b78,0x78b34b88],  sp=0x78b34b87c350,  free 
> space=1008k
> Native frames: (J=compiled Java code, A=aot compiled Java code, 
> j=interpreted, Vv=VM code, C=native code)
> J 29467 c2 
> org.apache.cassandra.db.rows.AbstractCell.clone(Lorg/apache/cassandra/utils/memory/ByteBufferCloner;)Lorg/apache/cassandra/db/rows/Cell;
>  (50 bytes) @ 0x78b3dd40a42f [0x78b3dd409de0+0x064f]
> J 17669 c2 
> org.apache.cassandra.db.rows.Cell.clone(Lorg/apache/cassandra/utils/memory/Cloner;)Lorg/apache/cassandra/db/rows/ColumnData;
>  (6 bytes) @ 0x78b3dc54edc0 [0x78b3dc54ed40+0x0080]
> J 17816 c2 
> org.apache.cassandra.db.rows.BTreeRow$$Lambda$845.apply(Ljava/lang/Object;)Ljava/lang/Object;
>  (12 bytes) @ 0x78b3dbed01a4 [0x78b3dbed0120+0x0084]
> J 17828 c2 
> org.apache.cassandra.utils.btree.BTree.transform([Ljava/lang/Object;Ljava/util/function/Function;)[Ljava/lang/Object;
>  (194 bytes) @ 0x78b3dc5f35f0 [0x78b3dc5f34a0+0x0150]
> J 35096 c2 
> org.apache.cassandra.db.rows.BTreeRow.clone(Lorg/apache/cassandra/utils/memory/Cloner;)Lorg/apache/cassandra/db/rows/Row;
>  (37 bytes) @ 0x78b3dda9111c [0x78b3dda90fe0+0x013c]
> J 30500 c2 
> org.apache.cassandra.utils.memory.EnsureOnHeap$CloneToHeap.applyToRow(Lorg/apache/cassandra/db/rows/Row;)Lorg/apache/cassandra/db/rows/Row;
>  (16 bytes) @ 0x78b3dd59b91c [0x78b3dd59b8c0+0x005c]
> J 26498 c2 org.apache.cassandra.db.transform.BaseRows.hasNext()Z (215 bytes) 
> @ 0x78b3dcf1c454 [0x78b3dcf1c180+0x02d4]
> J 30775 c2 
> org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext()Ljava/lang/Object;
>  (49 bytes) @ 0x78b3dc789020 [0x78b3dc788fc0+0x0060]
> J 9082 c2 org.apache.cassandra.utils.AbstractIterator.hasNext()Z (80 bytes) @ 
> 0x78b3dbb3c544 [0x78b3dbb3c440+0x0104]
> J 35593 c2 
> org.apache.cassandra.service.paxos.uncommitted.PaxosRows$PaxosMemtableToKeyStateIterator.computeNext()Lorg/apache/cassandra/service/paxos/uncommitted/PaxosKeyState;
>  (126 bytes) @ 0x78b3dc7ceeec [0x78b3dc7cee20+0x00cc]
> J 35591 c2 
> org.apache.cassandra.service.paxos.uncommitted.PaxosRows$PaxosMemtableToKeyStateIterator.computeNext()Ljava/lang/Object;
>  (5 bytes) @ 0x78b3dc7d09e4 [0x78b3dc7d09a0+0x0044]
> J 9082 c2 org.apache.cassandra.utils.AbstractIterator.hasNext()Z (80 bytes) @ 
> 0x78b3dbb3c544 [0x78b3dbb3c440+0x0104]
> J 34146 c2 
> com.google.common.collect.Iterators.addAll(Ljava/util/Collection;Ljava/util/Iterator;)Z
>  (41 bytes) @ 0x78b3dd9197e8 [0x78b3dd919680+0x0168]
> J 38256 c1 
> org.apache.cassandra.service.paxos.uncommitted.PaxosRows.toIterator(Lorg/apache/cassandra/db/partitions/UnfilteredPartitionIterator;Lorg/apache/cassandra/schema/TableId;Z)Lorg/apache/cassandra/utils/CloseableIterator;
>  (49 bytes) @ 0x78b3d6b677ac [0x78b3d6b672e0+0x04cc]
> J 34823 c1 
> org.apache.cassandra.service.paxos.uncommitted.PaxosUncommittedIndex.repairIterator(Lorg/apache/cassandra/schema/TableId;Ljava/util/Collection;)Lorg/apache/cassandra/utils/CloseableIterator;
>  (212 bytes) @ 0x78b3d5675e0c [0x78b3d5673be0+0x2

[jira] [Updated] (CASSANDRA-19668) SIGSEV originating in Paxos Scheduled Task

2024-06-27 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19668:
-
Summary: SIGSEV originating in Paxos Scheduled Task  (was: SIGSEV 
origininating in Paxos Scheduled Task)

> SIGSEV originating in Paxos Scheduled Task
> --
>
> Key: CASSANDRA-19668
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19668
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions
>Reporter: Jon Haddad
>Assignee: Jon Haddad
>Priority: Urgent
>
> I haven't gotten to the root cause of this yet. Several 4.1 nodes have 
> crashed in production.  I'm not sure if this is related to Paxos v2 or 
> not, but it is enabled; offheap_objects is also enabled. 
> I'm not sure if this affects 5.0 yet.
> Most of the crashes don't have a stacktrace; they only reference this:
> {noformat}
> Stack: [0x7fabf4c34000,0x7fabf4d34000],  sp=0x7fabf4d31f00,  free 
> space=1015k
> Native frames: (J=compiled Java code, A=aot compiled Java code, 
> j=interpreted, Vv=VM code, C=native code)
> v  ~StubRoutines::jint_disjoint_arraycopy
> {noformat}
> They all are in the {{ScheduledTasks}} thread.
> However, one node does have this in the crash log:
> {noformat}
> ---  T H R E A D  ---
> Current thread (0x78b375eac800):  JavaThread "ScheduledTasks:1" daemon 
> [_thread_in_Java, id=151791, stack(0x78b34b78,0x78b34b88)]
> Stack: [0x78b34b78,0x78b34b88],  sp=0x78b34b87c350,  free 
> space=1008k
> Native frames: (J=compiled Java code, A=aot compiled Java code, 
> j=interpreted, Vv=VM code, C=native code)
> J 29467 c2 
> org.apache.cassandra.db.rows.AbstractCell.clone(Lorg/apache/cassandra/utils/memory/ByteBufferCloner;)Lorg/apache/cassandra/db/rows/Cell;
>  (50 bytes) @ 0x78b3dd40a42f [0x78b3dd409de0+0x064f]
> J 17669 c2 
> org.apache.cassandra.db.rows.Cell.clone(Lorg/apache/cassandra/utils/memory/Cloner;)Lorg/apache/cassandra/db/rows/ColumnData;
>  (6 bytes) @ 0x78b3dc54edc0 [0x78b3dc54ed40+0x0080]
> J 17816 c2 
> org.apache.cassandra.db.rows.BTreeRow$$Lambda$845.apply(Ljava/lang/Object;)Ljava/lang/Object;
>  (12 bytes) @ 0x78b3dbed01a4 [0x78b3dbed0120+0x0084]
> J 17828 c2 
> org.apache.cassandra.utils.btree.BTree.transform([Ljava/lang/Object;Ljava/util/function/Function;)[Ljava/lang/Object;
>  (194 bytes) @ 0x78b3dc5f35f0 [0x78b3dc5f34a0+0x0150]
> J 35096 c2 
> org.apache.cassandra.db.rows.BTreeRow.clone(Lorg/apache/cassandra/utils/memory/Cloner;)Lorg/apache/cassandra/db/rows/Row;
>  (37 bytes) @ 0x78b3dda9111c [0x78b3dda90fe0+0x013c]
> J 30500 c2 
> org.apache.cassandra.utils.memory.EnsureOnHeap$CloneToHeap.applyToRow(Lorg/apache/cassandra/db/rows/Row;)Lorg/apache/cassandra/db/rows/Row;
>  (16 bytes) @ 0x78b3dd59b91c [0x78b3dd59b8c0+0x005c]
> J 26498 c2 org.apache.cassandra.db.transform.BaseRows.hasNext()Z (215 bytes) 
> @ 0x78b3dcf1c454 [0x78b3dcf1c180+0x02d4]
> J 30775 c2 
> org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext()Ljava/lang/Object;
>  (49 bytes) @ 0x78b3dc789020 [0x78b3dc788fc0+0x0060]
> J 9082 c2 org.apache.cassandra.utils.AbstractIterator.hasNext()Z (80 bytes) @ 
> 0x78b3dbb3c544 [0x78b3dbb3c440+0x0104]
> J 35593 c2 
> org.apache.cassandra.service.paxos.uncommitted.PaxosRows$PaxosMemtableToKeyStateIterator.computeNext()Lorg/apache/cassandra/service/paxos/uncommitted/PaxosKeyState;
>  (126 bytes) @ 0x78b3dc7ceeec [0x78b3dc7cee20+0x00cc]
> J 35591 c2 
> org.apache.cassandra.service.paxos.uncommitted.PaxosRows$PaxosMemtableToKeyStateIterator.computeNext()Ljava/lang/Object;
>  (5 bytes) @ 0x78b3dc7d09e4 [0x78b3dc7d09a0+0x0044]
> J 9082 c2 org.apache.cassandra.utils.AbstractIterator.hasNext()Z (80 bytes) @ 
> 0x78b3dbb3c544 [0x78b3dbb3c440+0x0104]
> J 34146 c2 
> com.google.common.collect.Iterators.addAll(Ljava/util/Collection;Ljava/util/Iterator;)Z
>  (41 bytes) @ 0x78b3dd9197e8 [0x78b3dd919680+0x0168]
> J 38256 c1 
> org.apache.cassandra.service.paxos.uncommitted.PaxosRows.toIterator(Lorg/apache/cassandra/db/partitions/UnfilteredPartitionIterator;Lorg/apache/cassandra/schema/TableId;Z)Lorg/apache/cassandra/utils/CloseableIterator;
>  (49 bytes) @ 0x78b3d6b677ac [0x78b3d6b672e0+0x04cc]
> J 34823 c1 
> org.apache.cassandra.service.paxos.uncommitted.PaxosUncommittedIndex.repairIterator(Lorg/apache/cassandra/schema/TableId;Ljava/util/Collection;)Lorg/apache/cassandra/utils/CloseableIterator;
>  (212 bytes) @ 0x78b3d5675e0c [0x78b3d5673be0+0x222c]
> 
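
Both settings the report mentions as enabled are plain cassandra.yaml 
options. A hedged sketch of how they would appear there (option names per 
the 4.1 defaults, shown only for context; not taken from the reporter's 
actual config):

{noformat}
# Hedged sketch, not the reporter's configuration.
paxos_variant: v2                           # enables Paxos v2
memtable_allocation_type: offheap_objects   # off-heap memtable objects
{noformat}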

[jira] [Commented] (CASSANDRA-19704) UnsupportedOperationException is thrown when no space for LCS

2024-06-25 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860041#comment-17860041
 ] 

Brandon Williams commented on CASSANDRA-19704:
--

+1 if 5.0/trunk CI look good, which I expect they will.

> UnsupportedOperationException is thrown when no space for LCS
> -
>
> Key: CASSANDRA-19704
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19704
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Compaction/LCS
>Reporter: Zhao Yang
>Assignee: Caleb Rackliffe
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
> Attachments: ci_summary-1.html, ci_summary.html
>
>
> In {{CompactionTask#buildCompactionCandidatesForAvailableDiskSpace}} with 
> LCS, if the node has limited disk space and can't remove any sstable from L0 
> or L1 in {{LeveledCompactionTask#reduceScopeForLimitedSpace}}, 
> {{LeveledCompactionTask#partialCompactionsAcceptable}} will throw 
> {{UnsupportedOperationException}}.
> We should handle {{LeveledCompactionTask#partialCompactionsAcceptable}} more 
> gracefully, returning {{level <= 1}} or simply {{true}}, since 
> {{reduceScopeForLimitedSpace}} only removes sstables from L0 or L1.
> Related https://issues.apache.org/jira/browse/CASSANDRA-17272
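
The suggested fix is small. A hedged, self-contained illustration of the 
proposed behavior; the real override would live in LeveledCompactionTask, 
whose exact signature and fields are assumptions here, not the committed 
patch:

{code:java}
// Hedged sketch of the ticket's suggestion, not the actual patch.
public class PartialCompactionPolicy
{
    /**
     * reduceScopeForLimitedSpace only ever drops sstables from L0 or L1,
     * so partial compactions are only meaningful at those levels; higher
     * levels simply answer false instead of throwing
     * UnsupportedOperationException.
     */
    static boolean partialCompactionsAcceptable(int level)
    {
        return level <= 1;
    }

    public static void main(String[] args)
    {
        for (int level = 0; level <= 3; level++)
            System.out.println("L" + level + ": " + partialCompactionsAcceptable(level));
    }
}
{code}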



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19722) Metric for Cross Node DC Latency records external DCs as UNKNOWN_DC until restart

2024-06-21 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19722:
-
 Bug Category: Parent values: Correctness(12982)
   Complexity: Normal
  Component/s: Observability/Metrics
Discovered By: User Report
Fix Version/s: 4.1.x
   5.0.x
   5.x
 Severity: Normal
   Status: Open  (was: Triage Needed)

> Metric for Cross Node DC Latency records external DCs as UNKNOWN_DC until 
> restart
> -
>
> Key: CASSANDRA-19722
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19722
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Metrics
>Reporter: Andrew
>Priority: Normal
> Fix For: 4.1.x, 5.0.x, 5.x
>
>
> When creating a new DC in a different region (in this case, on AWS), I've 
> noticed that the metric (via JMX Exporter) 
> `org.apache.cassandra.metrics:type=Messaging,name=*-Latency` shows 
> `UNKNOWN_DC` for external datacenters until the node is restarted, at which 
> point it resolves to the actual datacenter name.
> I'm using GossipingPropertyFileSnitch, and nodetool status and nodetool 
> gossipinfo always show the correct DC for the nodes, which makes me suspect 
> the variable backing the metric is never updated.
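
For anyone reproducing this, a minimal, hedged sketch of listing those 
Messaging latency MBeans over JMX to watch for the UNKNOWN_DC entries; the 
host and port (the 7199 default) are assumptions for a local node, and 
nothing below comes from the ticket:

{code:java}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hedged sketch: lists the per-DC Messaging latency metric names, e.g.
// "...name=datacenter1-Latency" or "...name=UNKNOWN_DC-Latency" for peers
// whose DC has not been resolved yet.
public class ListDcLatencyMetrics
{
    public static void main(String[] args) throws Exception
    {
        JMXServiceURL url =
            new JMXServiceURL("service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url))
        {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            for (ObjectName name : mbs.queryNames(
                    new ObjectName("org.apache.cassandra.metrics:type=Messaging,name=*-Latency"),
                    null))
                System.out.println(name);
        }
    }
}
{code}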



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-18288) Test Failure: TopPartitionsTest.basicRegularTombstonesTest

2024-06-20 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-18288:
-
Resolution: Cannot Reproduce
Status: Resolved  (was: Open)

> Test Failure: TopPartitionsTest.basicRegularTombstonesTest
> --
>
> Key: CASSANDRA-18288
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18288
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/java
>Reporter: Michael Semb Wever
>Priority: Normal
> Fix For: 5.0.x, 5.x
>
>
> from
> - 
> https://ci-cassandra.apache.org/job/Cassandra-trunk/1469/testReport/org.apache.cassandra.distributed.test/TopPartitionsTest/basicRegularTombstonesTest_Incremental___jdk11/
> - 
> https://ci-cassandra.apache.org/job/Cassandra-trunk-jvm-dtest/1534/jdk=jdk_11_latest,label=cassandra,split=8/testReport/junit/org.apache.cassandra.distributed.test/TopPartitionsTest/basicRegularTombstonesTest_Incremental___jdk11/
> Stacktrace
> {noformat}
> java.lang.RuntimeException: 
>   at org.psjava.util.AssertStatus.assertTrue(AssertStatus.java:18)
>   at org.psjava.util.AssertStatus.assertTrue(AssertStatus.java:5)
>   at 
> org.apache.cassandra.distributed.test.TopPartitionsTest.lambda$basicRegularTombstonesTest$e2f532b5$1(TopPartitionsTest.java:226)
>   at org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:96)
>   at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)
>   at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   at java.base/java.lang.Thread.run(Thread.java:829)
> {noformat}
> Standard Output
> {noformat}
> INFO  [main]  2023-02-25 00:15:20,407 Reflections.java:219 - 
> Reflections took 1661 ms to scan 9 urls, producing 1832 keys and 7268 values
> INFO  [main]  2023-02-25 00:15:21,602 Reflections.java:219 - 
> Reflections took 1131 ms to scan 9 urls, producing 1832 keys and 7268 values
> Node id topology:
> node 1: dc = datacenter0, rack = rack0
> node 2: dc = datacenter0, rack = rack0
> Configured node count: 2, nodeIdTopology size: 2
> DEBUG [main] node1 2023-02-25 00:15:22,419 InternalLoggerFactory.java:63 - 
> Using SLF4J as the default logging framework
> DEBUG [main] node1 2023-02-25 00:15:22,426 PlatformDependent0.java:417 - 
> -Dio.netty.noUnsafe: false
> DEBUG [main] node1 2023-02-25 00:15:22,427 PlatformDependent0.java:897 - Java 
> version: 11
> DEBUG [main] node1 2023-02-25 00:15:22,428 PlatformDependent0.java:130 - 
> sun.misc.Unsafe.theUnsafe: available
> DEBUG [main] node1 2023-02-25 00:15:22,429 PlatformDependent0.java:154 - 
> sun.misc.Unsafe.copyMemory: available
> DEBUG [main] node1 2023-02-25 00:15:22,430 PlatformDependent0.java:192 - 
> java.nio.Buffer.address: available
> DEBUG [main] node1 2023-02-25 00:15:22,431 PlatformDependent0.java:257 - 
> direct buffer constructor: available
> DEBUG [main] node1 2023-02-25 00:15:22,432 PlatformDependent0.java:331 - 
> java.nio.Bits.unaligned: available, true
> DEBUG [main] node1 2023-02-25 00:15:22,434 PlatformDependent0.java:393 - 
> jdk.internal.misc.Unsafe.allocateUninitializedArray(int): available
> DEBUG [main] node1 2023-02-25 00:15:22,435 PlatformDependent0.java:403 - 
> java.nio.DirectByteBuffer.(long, int): available
> DEBUG [main] node1 2023-02-25 00:15:22,435 PlatformDependent.java:1079 - 
> sun.misc.Unsafe: available
> DEBUG [main] node1 2023-02-25 00:15:22,448 PlatformDependent.java:1181 - 
> maxDirectMemory: 1056309248 bytes (maybe)
> DEBUG [main] node1 2023-02-25 00:15:22,449 PlatformDependent.java:1200 - 
> -Dio.netty.tmpdir: /home/cassandra/cassandra/tmp (java.io.tmpdir)
> DEBUG [main] node1 2023-02-25 00:15:22,449 PlatformDependent.java:1279 - 
> -Dio.netty.bitMode: 64 (sun.arch.data.model)
> DEBUG [main] node1 2023-02-25 00:15:22,450 PlatformDependent.java:177 - 
> -Dio.netty.maxDirectMemory: 1056309248 bytes
> DEBUG [main] node1 2023-02-25 00:15:22,451 PlatformDependent.java:184 - 
> -Dio.netty.uninitializedArrayAllocationThreshold: 1024
> DEBUG [main] node1 2023-02-25 00:15:22,452 CleanerJava9.java:71 - 
> java.nio.ByteBuffer.cleaner(): available
> DEBUG [main] node1 2023-02-25 00:15:22,453 PlatformDependent.java:204 - 
> -Dio.netty.noPreferDirect: false
> DEBUG [main] node2 2023-02-25 00:15:23,126 InternalLoggerFactory.java:63 - 
> Using SLF4J as the default logging framework
> DEBUG [main] node2 2023-02-25 00:15:23,133 PlatformDependent0.java:417 - 
> -Dio.netty.noUnsafe: false
> DEBUG [main] node2 2023-02-2

[jira] [Updated] (CASSANDRA-18288) Test Failure: TopPartitionsTest.basicRegularTombstonesTest

2024-06-20 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-18288:
-
Resolution: (was: Fixed)
Status: Open  (was: Resolved)

> Test Failure: TopPartitionsTest.basicRegularTombstonesTest
> --
>
> Key: CASSANDRA-18288
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18288
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/java
>Reporter: Michael Semb Wever
>Priority: Normal
> Fix For: 5.0.x, 5.x
>
>
> from
> - 
> https://ci-cassandra.apache.org/job/Cassandra-trunk/1469/testReport/org.apache.cassandra.distributed.test/TopPartitionsTest/basicRegularTombstonesTest_Incremental___jdk11/
> - 
> https://ci-cassandra.apache.org/job/Cassandra-trunk-jvm-dtest/1534/jdk=jdk_11_latest,label=cassandra,split=8/testReport/junit/org.apache.cassandra.distributed.test/TopPartitionsTest/basicRegularTombstonesTest_Incremental___jdk11/
> Stacktrace
> {noformat}
> java.lang.RuntimeException: 
>   at org.psjava.util.AssertStatus.assertTrue(AssertStatus.java:18)
>   at org.psjava.util.AssertStatus.assertTrue(AssertStatus.java:5)
>   at 
> org.apache.cassandra.distributed.test.TopPartitionsTest.lambda$basicRegularTombstonesTest$e2f532b5$1(TopPartitionsTest.java:226)
>   at org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:96)
>   at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)
>   at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   at java.base/java.lang.Thread.run(Thread.java:829)
> {noformat}
> Standard Output
> {noformat}
> INFO  [main]  2023-02-25 00:15:20,407 Reflections.java:219 - 
> Reflections took 1661 ms to scan 9 urls, producing 1832 keys and 7268 values
> INFO  [main]  2023-02-25 00:15:21,602 Reflections.java:219 - 
> Reflections took 1131 ms to scan 9 urls, producing 1832 keys and 7268 values
> Node id topology:
> node 1: dc = datacenter0, rack = rack0
> node 2: dc = datacenter0, rack = rack0
> Configured node count: 2, nodeIdTopology size: 2
> DEBUG [main] node1 2023-02-25 00:15:22,419 InternalLoggerFactory.java:63 - 
> Using SLF4J as the default logging framework
> DEBUG [main] node1 2023-02-25 00:15:22,426 PlatformDependent0.java:417 - 
> -Dio.netty.noUnsafe: false
> DEBUG [main] node1 2023-02-25 00:15:22,427 PlatformDependent0.java:897 - Java 
> version: 11
> DEBUG [main] node1 2023-02-25 00:15:22,428 PlatformDependent0.java:130 - 
> sun.misc.Unsafe.theUnsafe: available
> DEBUG [main] node1 2023-02-25 00:15:22,429 PlatformDependent0.java:154 - 
> sun.misc.Unsafe.copyMemory: available
> DEBUG [main] node1 2023-02-25 00:15:22,430 PlatformDependent0.java:192 - 
> java.nio.Buffer.address: available
> DEBUG [main] node1 2023-02-25 00:15:22,431 PlatformDependent0.java:257 - 
> direct buffer constructor: available
> DEBUG [main] node1 2023-02-25 00:15:22,432 PlatformDependent0.java:331 - 
> java.nio.Bits.unaligned: available, true
> DEBUG [main] node1 2023-02-25 00:15:22,434 PlatformDependent0.java:393 - 
> jdk.internal.misc.Unsafe.allocateUninitializedArray(int): available
> DEBUG [main] node1 2023-02-25 00:15:22,435 PlatformDependent0.java:403 - 
> java.nio.DirectByteBuffer.(long, int): available
> DEBUG [main] node1 2023-02-25 00:15:22,435 PlatformDependent.java:1079 - 
> sun.misc.Unsafe: available
> DEBUG [main] node1 2023-02-25 00:15:22,448 PlatformDependent.java:1181 - 
> maxDirectMemory: 1056309248 bytes (maybe)
> DEBUG [main] node1 2023-02-25 00:15:22,449 PlatformDependent.java:1200 - 
> -Dio.netty.tmpdir: /home/cassandra/cassandra/tmp (java.io.tmpdir)
> DEBUG [main] node1 2023-02-25 00:15:22,449 PlatformDependent.java:1279 - 
> -Dio.netty.bitMode: 64 (sun.arch.data.model)
> DEBUG [main] node1 2023-02-25 00:15:22,450 PlatformDependent.java:177 - 
> -Dio.netty.maxDirectMemory: 1056309248 bytes
> DEBUG [main] node1 2023-02-25 00:15:22,451 PlatformDependent.java:184 - 
> -Dio.netty.uninitializedArrayAllocationThreshold: 1024
> DEBUG [main] node1 2023-02-25 00:15:22,452 CleanerJava9.java:71 - 
> java.nio.ByteBuffer.cleaner(): available
> DEBUG [main] node1 2023-02-25 00:15:22,453 PlatformDependent.java:204 - 
> -Dio.netty.noPreferDirect: false
> DEBUG [main] node2 2023-02-25 00:15:23,126 InternalLoggerFactory.java:63 - 
> Using SLF4J as the default logging framework
> DEBUG [main] node2 2023-02-25 00:15:23,133 PlatformDependent0.java:417 - 
> -Dio.netty.noUnsafe: false
> DEBUG [main] node2 2023-02-2
