[jira] [Updated] (CASSANDRA-18736) Streaming exception race creates corrupt transaction log files that prevent restart

2024-04-25 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18736:
-
      Fix Version/s: 4.0.13
                     4.1.5
                     5.0-beta2
                         (was: 4.0.x)
                         (was: 4.1.x)
                         (was: 5.0.x)
      Since Version: 4.0
Source Control Link: https://github.com/apache/cassandra/commit/9157d98e4cc5c00d74cef6128c16659ff43f3585
         Resolution: Fixed
             Status: Resolved  (was: Ready to Commit)

> Streaming exception race creates corrupt transaction log files that prevent 
> restart
> ---
>
> Key: CASSANDRA-18736
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18736
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Streaming, Local/Startup and Shutdown
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.0.13, 4.1.5, 5.0-beta2
>
> Attachments: 4.0-ci_summary.html, 4.1-ci_summary.html, 
> 5.0-ci_summary.html, trunk-ci_summary.html
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> On restart, Cassandra logs this message and terminates.
> {code:java}
> ERROR 2023-07-17T17:17:22,931 [main] 
> org.apache.cassandra.db.lifecycle.LogTransaction:561 - Unexpected disk state: 
> failed to read transaction log 
> [nb_txn_stream_39d5f6b0-fb81-11ed-8f46-e97b3f61511e.log in 
> /datadir1/keyspace/table-c9527530a0d611e8813f034699fc9043]
> Files and contents follow:
> /datadir1/keyspace/table-c9527530a0d611e8813f034699fc9043/nb_txn_stream_39d5f6b0-fb81-11ed-8f46-e97b3f61511e.log
> ABORT:[,0,0][737437348]
> ***This record should have been the last one in all replicas
> 
> ADD:[/datadir1/keyspace/table-c9527530a0d611e8813f034699fc9043/nb-284490-big-,0,8][2493503833]
> {code}
> The root cause is a race during streaming exception handling.
> Although protection against concurrent modification of the {{LogTransaction}} was added for 
> CASSANDRA-16225, there is nothing to prevent usage after the transaction is 
> completed (committed/aborted) once it has been processed by 
> {{TransactionTidier}} (after the last reference is released). Before the 
> transaction is tidied, the {{LogFile}} keeps a list of records that are 
> checked for completion before adding new entries. In {{TransactionTidier}}, 
> {{LogFile.records}} is cleared as no longer needed, but the 
> {{LogTransaction}}/{{LogFile}} is still accessible to the stream.
> The changes in CASSANDRA-17273 added a parallel set of {{onDiskRecords}} that 
> could be used to reliably recreate the transaction log at any new datadirs, 
> the same as at the existing datadirs, regardless of the effect of 
> {{LogTransaction.untrackNew/LogFile.remove}}.
> The race occurs when a streaming exception causes the LogTransaction to be 
> aborted and tidied just before {{SimpleSSTableMultiWriter}} calls trackNew to 
> add a new sstable. 
> At the time of the call, the {{LogFile}} will not contain any {{LogReplicas}},
> {{LogFile.records}} will be empty, and {{LogFile.onDiskRecords}} will contain 
> an {{ABORT}}.
> When {{LogTransaction.trackNew/LogFile.add}} is called, the check for a 
> completed transaction does not trigger because {{records}} is empty. There are 
> no replicas on the datadir, so {{maybeCreateReplicas}} creates a new txnlog 
> file replica containing ABORT and then appends an ADD record.
> The LogFile has already been tidied after the abort, so the txnlog file is not 
> removed and sits on disk until a restart, causing the failure.
> There is a related exception caused by a different interleaving of aborts 
> after an sstable is added; however, this is just a nuisance in the logs, as the 
> LogReplica is already created with an {{ADD}} record first.
> {code:java}
> java.lang.AssertionError: 
> [ADD:[/datadir1/keyspace/table/nb-23314378-big-,0,8][1869379820]] is not 
> tracked by 55be35b0-35d1-11ee-865d-8b1e3c48ca06
> at org.apache.cassandra.db.lifecycle.LogFile.remove(LogFile.java:388)
> at 
> org.apache.cassandra.db.lifecycle.LogTransaction.untrackNew(LogTransaction.java:158)
> at 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.untrackNew(LifecycleTransaction.java:577)
> at 
> org.apache.cassandra.db.streaming.CassandraStreamReceiver$1.untrackNew(CassandraStreamReceiver.java:149)
> at 
> org.apache.cassandra.io.sstable.SimpleSSTableMultiWriter.abort(SimpleSSTableMultiWriter.java:95)
> at 
> org.apache.cassandra.io.sstable.format.RangeAwareSSTableWriter.abort(RangeAwareSSTableWriter.java:191)
> at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamReader.read(CassandraCompressedStreamReader.java:115)
> at 
> 
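The interleaving described in the quoted issue text can be illustrated with a small, self-contained Java model. This is only a sketch under stated assumptions: ToyLogFile, lateAdd and the "completed" guard flag are hypothetical names invented for illustration, not Cassandra's LogFile API, and the guard is one plausible way to reject use after tidying, not necessarily what the committed patch does.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the race; NOT Cassandra code. All names are hypothetical. */
final class TxnLogRaceSketch {

    /** Simplified stand-in for LogFile with a single on-disk replica. */
    static class ToyLogFile {
        final List<String> records = new ArrayList<>();       // in-memory records, cleared by tidy()
        final List<String> onDiskRecords = new ArrayList<>(); // parallel list that survives tidy()
        List<String> replica = new ArrayList<>();             // txn log file contents; null once removed
        final boolean guarded;                                // enable the hypothetical guard
        boolean completed;                                    // hypothetical flag that survives tidy()

        ToyLogFile(boolean guarded) { this.guarded = guarded; }

        synchronized void abort() {
            append("ABORT");
            completed = true;
        }

        /** Models the tidier: in-memory records are cleared and the file is removed. */
        synchronized void tidy() {
            records.clear();
            replica = null;
        }

        /** Models trackNew/add; returns false if the record was rejected. */
        synchronized boolean add(String sstable) {
            if (guarded && completed)
                return false; // hypothetical guard: refuse use after completion
            if (records.contains("ABORT") || records.contains("COMMIT"))
                return false; // completion check, which only works before tidy()
            if (replica == null)
                replica = new ArrayList<>(onDiskRecords); // recreate the file from on-disk records
            append("ADD:" + sstable);
            return true;
        }

        private void append(String record) {
            records.add(record);
            onDiskRecords.add(record);
            replica.add(record);
        }
    }

    /** Abort and tidy first, then attempt a late add; returns the file contents left behind. */
    static List<String> lateAdd(boolean guarded) {
        ToyLogFile log = new ToyLogFile(guarded);
        log.abort();
        log.tidy();
        log.add("nb-1-big");
        return log.replica;
    }

    public static void main(String[] args) {
        System.out.println("unguarded: " + lateAdd(false)); // file recreated as ABORT followed by ADD
        System.out.println("guarded:   " + lateAdd(true));  // null: no file is left on disk
    }
}
```

In the unguarded run the recreated file holds ABORT followed by an ADD record, which is the same shape as the file shown in the startup error above. With the guard enabled the late add is rejected and no file is left behind.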

[jira] [Updated] (CASSANDRA-18736) Streaming exception race creates corrupt transaction log files that prevent restart

2024-04-25 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18736:
-
Status: Ready to Commit  (was: Review In Progress)

> Streaming exception race creates corrupt transaction log files that prevent 
> restart
> ---
>
> Key: CASSANDRA-18736
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18736
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Streaming, Local/Startup and Shutdown
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x
>
> Attachments: 4.0-ci_summary.html, 4.1-ci_summary.html, 
> 5.0-ci_summary.html, trunk-ci_summary.html
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>

[jira] [Updated] (CASSANDRA-18736) Streaming exception race creates corrupt transaction log files that prevent restart

2024-04-25 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18736:
-
    Attachment: 4.0-ci_summary.html
                4.1-ci_summary.html
                5.0-ci_summary.html
                trunk-ci_summary.html

> Streaming exception race creates corrupt transaction log files that prevent 
> restart
> ---
>
> Key: CASSANDRA-18736
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18736
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Streaming, Local/Startup and Shutdown
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x
>
> Attachments: 4.0-ci_summary.html, 4.1-ci_summary.html, 
> 5.0-ci_summary.html, trunk-ci_summary.html
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>

[jira] [Commented] (CASSANDRA-18736) Streaming exception race creates corrupt transaction log files that prevent restart

2024-04-25 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840897#comment-17840897
 ] 

Jon Meredith commented on CASSANDRA-18736:
--

Starting commit. I believe all of the test failures are known flakes or 
unrelated issues. This patch has been running in production for me for many 
months without issue so I have high confidence in it.

> Streaming exception race creates corrupt transaction log files that prevent 
> restart
> ---
>
> Key: CASSANDRA-18736
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18736
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Streaming, Local/Startup and Shutdown
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>

[jira] [Updated] (CASSANDRA-18736) Streaming exception race creates corrupt transaction log files that prevent restart

2024-04-25 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18736:
-
Attachment: (was: ci_summary.html)

> Streaming exception race creates corrupt transaction log files that prevent 
> restart
> ---
>
> Key: CASSANDRA-18736
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18736
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Streaming, Local/Startup and Shutdown
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>

[jira] [Updated] (CASSANDRA-18736) Streaming exception race creates corrupt transaction log files that prevent restart

2024-04-25 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18736:
-
Attachment: ci_summary.html

> Streaming exception race creates corrupt transaction log files that prevent 
> restart
> ---
>
> Key: CASSANDRA-18736
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18736
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Streaming, Local/Startup and Shutdown
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>

[jira] [Commented] (CASSANDRA-18736) Streaming exception race creates corrupt transaction log files that prevent restart

2024-04-15 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837487#comment-17837487
 ] 

Jon Meredith commented on CASSANDRA-18736:
--

As expected, the (modified) tests fail on 4.0 so I've backported there too. 
I'll try and post test runs tomorrow but wanted to share the changes in case 
anybody was waiting on it.

PRs
Trunk https://github.com/apache/cassandra/pull/3250
5.0 https://github.com/apache/cassandra/pull/3251
4.1 https://github.com/apache/cassandra/pull/3252
4.0 https://github.com/apache/cassandra/pull/3253

> Streaming exception race creates corrupt transaction log files that prevent 
> restart
> ---
>
> Key: CASSANDRA-18736
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18736
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Streaming, Local/Startup and Shutdown
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> On restart, Cassandra logs this message and terminates.
> {code:java}
> ERROR 2023-07-17T17:17:22,931 [main] 
> org.apache.cassandra.db.lifecycle.LogTransaction:561 - Unexpected disk state: 
> failed to read transaction log 
> [nb_txn_stream_39d5f6b0-fb81-11ed-8f46-e97b3f61511e.log in 
> /datadir1/keyspace/table-c9527530a0d611e8813f034699fc9043]
> Files and contents follow:
> /datadir1/keyspace/table-c9527530a0d611e8813f034699fc9043/nb_txn_stream_39d5f6b0-fb81-11ed-8f46-e97b3f61511e.log
> ABORT:[,0,0][737437348]
> ***This record should have been the last one in all replicas
> 
> ADD:[/datadir1/keyspace/table-c9527530a0d611e8813f034699fc9043/nb-284490-big-,0,8][2493503833]
> {code}
> The root cause is a race during streaming exception handling.
> Although concurrent modification of to the {{LogTransaction}} was added for 
> CASSANDRA-16225, there is nothing to prevent usage after the transaction is 
> completed (committed/aborted) once it has been processed by 
> {{TransactionTidier}} (after the last reference is released). Before the 
> transaction is tidied, the {{LogFile}} keeps a list of records that are 
> checked for completion before adding new entries. In {{TransactionTidier}} 
> {{LogFile.records}} are cleared as no longer needed, however the 
> LogTransaction/LogFile is still accessible to the stream.
> The changes in CASSANDRA-17273 added a parallel set of {{onDiskRecords}} that 
> could be used to reliably recreate the transaction log at any new datadirs 
> the same as the existing
> datadirs - regardless of the effect of 
> {{LogTransaction.untrackNew/LogFile.remove}}
> If a streaming exception causes the LogTransaction to be aborted and tidied 
> just before {{SimpleSSTableMultiWriter}} calls trackNew to add a new sstable. 
> At the time of the call, the {{LogFile}} will not contain any {{LogReplicas}},
> {{LogFile.records}} will be empty, and {{LogFile.onDiskRecords}} will contain 
> an {{ABORT}}.
> When {{LogTransaction.trackNew/LogFile.add}} is called, the check for 
> completed transaction fails as records is empty, there are no replicas on the 
> datadir, so {{maybeCreateReplicas}} creates a new txnlog file replica 
> containing ABORT, then
> appends an ADD record.
> The LogFile has already been tidied after the abort so the txnlog file is not 
> removed and sits on disk until a restart, causing the faiulre.
> There is a related exception caused with a different interleaving of aborts, 
> after an sstable is added, however this is just a nuisance in the logs as the 
> LogRelica is already created with an {{ADD}} record first.
> {code:java}
> java.lang.AssertionError: 
> [ADD:[/datadir1/keyspace/table/nb-23314378-big-,0,8][1869379820]] is not 
> tracked by 55be35b0-35d1-11ee-865d-8b1e3c48ca06
> at org.apache.cassandra.db.lifecycle.LogFile.remove(LogFile.java:388)
> at 
> org.apache.cassandra.db.lifecycle.LogTransaction.untrackNew(LogTransaction.java:158)
> at 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.untrackNew(LifecycleTransaction.java:577)
> at 
> org.apache.cassandra.db.streaming.CassandraStreamReceiver$1.untrackNew(CassandraStreamReceiver.java:149)
> at 
> org.apache.cassandra.io.sstable.SimpleSSTableMultiWriter.abort(SimpleSSTableMultiWriter.java:95)
> at 
> org.apache.cassandra.io.sstable.format.RangeAwareSSTableWriter.abort(RangeAwareSSTableWriter.java:191)
> at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamReader.read(CassandraCompressedStreamReader.java:115)
> at 
> org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:85)
> at 
> 
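
The interleaving in the quoted description (abort, tidy, then a late trackNew) can be illustrated with a toy model. This is only a sketch under stated assumptions: {{ToyLogFile}} and its fields are hypothetical stand-ins named after the description, not the Cassandra classes, and the "file" is just an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;

public class TxnLogRaceSketch {
    /** Toy stand-in for LogFile (hypothetical, not the Cassandra class). */
    static class ToyLogFile {
        final List<String> records = new ArrayList<>();        // in-memory view, cleared on tidy
        final List<String> onDiskRecords = new ArrayList<>();  // used to recreate replicas
        final List<String> replicaOnDisk = new ArrayList<>();  // contents of the txnlog file
        boolean replicaExists = false;

        // Completion is judged from the in-memory records only.
        boolean isCompleted() {
            return records.contains("ABORT") || records.contains("COMMIT");
        }

        void append(String record) {
            records.add(record);
            onDiskRecords.add(record);
            if (replicaExists)
                replicaOnDisk.add(record);
        }

        void abort() {
            append("ABORT");
        }

        // Stand-in for the tidier: drops in-memory records and removes the file.
        void tidy() {
            records.clear();
            replicaOnDisk.clear();
            replicaExists = false;
        }

        // Stand-in for trackNew/add.
        void add(String sstable) {
            if (isCompleted())
                throw new IllegalStateException("transaction already completed");
            if (!replicaExists) {
                // Like maybeCreateReplicas: replay onDiskRecords into a fresh file.
                replicaOnDisk.addAll(onDiskRecords);
                replicaExists = true;
            }
            append("ADD:" + sstable);
        }
    }

    static List<String> run() {
        ToyLogFile log = new ToyLogFile();
        log.abort();          // streaming exception aborts the transaction
        log.tidy();           // last reference released, records cleared
        log.add("nb-1-big");  // late trackNew from the writer slips through
        return log.replicaOnDisk;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [ABORT, ADD:nb-1-big]
    }
}
```

The resulting file has an ABORT followed by an ADD, the same shape as the log contents reported at startup.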

[jira] (CASSANDRA-18736) Streaming exception race creates corrupt transaction log files that prevent restart

2024-04-15 Thread Jon Meredith (Jira)


[ https://issues.apache.org/jira/browse/CASSANDRA-18736 ]


Jon Meredith deleted comment on CASSANDRA-18736:
--

was (Author: jonmeredith):
Trunk [PR|https://github.com/apache/cassandra/pull/2565] 
[Branch|https://github.com/jonmeredith/cassandra/tree/C18733-trunk]
[5.0 Branch|https://github.com/jonmeredith/cassandra/tree/C18733-5.0]

Only minor difference is no {{StreamSession.failureReason}}
[4.1 Branch|https://github.com/jonmeredith/cassandra/tree/C18733-4.1]

> Streaming exception race creates corrupt transaction log files that prevent 
> restart
> ---
>
> Key: CASSANDRA-18736
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18736
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Streaming, Local/Startup and Shutdown
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x
>
>
> On restart, Cassandra logs this message and terminates.
> {code:java}
> ERROR 2023-07-17T17:17:22,931 [main] 
> org.apache.cassandra.db.lifecycle.LogTransaction:561 - Unexpected disk state: 
> failed to read transaction log 
> [nb_txn_stream_39d5f6b0-fb81-11ed-8f46-e97b3f61511e.log in 
> /datadir1/keyspace/table-c9527530a0d611e8813f034699fc9043]
> Files and contents follow:
> /datadir1/keyspace/table-c9527530a0d611e8813f034699fc9043/nb_txn_stream_39d5f6b0-fb81-11ed-8f46-e97b3f61511e.log
> ABORT:[,0,0][737437348]
> ***This record should have been the last one in all replicas
> 
> ADD:[/datadir1/keyspace/table-c9527530a0d611e8813f034699fc9043/nb-284490-big-,0,8][2493503833]
> {code}
> The root cause is a race during streaming exception handling.
> Although protection against concurrent modification of the {{LogTransaction}} 
> was added for CASSANDRA-16225, there is nothing to prevent usage after the 
> transaction is completed (committed/aborted) once it has been processed by 
> {{TransactionTidier}} (after the last reference is released). Before the 
> transaction is tidied, the {{LogFile}} keeps a list of records that are 
> checked for completion before adding new entries. In {{TransactionTidier}} 
> {{LogFile.records}} are cleared as no longer needed, however the 
> LogTransaction/LogFile is still accessible to the stream.
> The changes in CASSANDRA-17273 added a parallel set of {{onDiskRecords}} that 
> could be used to reliably recreate the transaction log at any new datadirs 
> the same as the existing datadirs, regardless of the effect of 
> {{LogTransaction.untrackNew/LogFile.remove}}.
> The race occurs if a streaming exception causes the LogTransaction to be 
> aborted and tidied just before {{SimpleSSTableMultiWriter}} calls trackNew to 
> add a new sstable. At the time of the call, the {{LogFile}} will not contain 
> any {{LogReplicas}}, {{LogFile.records}} will be empty, and 
> {{LogFile.onDiskRecords}} will contain an {{ABORT}}.
> When {{LogTransaction.trackNew/LogFile.add}} is called, the check for a 
> completed transaction does not detect the abort because {{records}} is empty. 
> There are no replicas on the datadir, so {{maybeCreateReplicas}} creates a new 
> txnlog file replica containing ABORT, then appends an ADD record.
> The LogFile has already been tidied after the abort so the txnlog file is not 
> removed and sits on disk until a restart, causing the failure.
> There is a related exception caused with a different interleaving of aborts, 
> after an sstable is added, however this is just a nuisance in the logs as the 
> LogReplica is already created with an {{ADD}} record first.
> {code:java}
> java.lang.AssertionError: 
> [ADD:[/datadir1/keyspace/table/nb-23314378-big-,0,8][1869379820]] is not 
> tracked by 55be35b0-35d1-11ee-865d-8b1e3c48ca06
> at org.apache.cassandra.db.lifecycle.LogFile.remove(LogFile.java:388)
> at 
> org.apache.cassandra.db.lifecycle.LogTransaction.untrackNew(LogTransaction.java:158)
> at 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.untrackNew(LifecycleTransaction.java:577)
> at 
> org.apache.cassandra.db.streaming.CassandraStreamReceiver$1.untrackNew(CassandraStreamReceiver.java:149)
> at 
> org.apache.cassandra.io.sstable.SimpleSSTableMultiWriter.abort(SimpleSSTableMultiWriter.java:95)
> at 
> org.apache.cassandra.io.sstable.format.RangeAwareSSTableWriter.abort(RangeAwareSSTableWriter.java:191)
> at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamReader.read(CassandraCompressedStreamReader.java:115)
> at 
> org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:85)
> at 
> org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:53)
> at 
> 

[jira] [Commented] (CASSANDRA-18736) Streaming exception race creates corrupt transaction log files that prevent restart

2024-04-15 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837333#comment-17837333
 ] 

Jon Meredith commented on CASSANDRA-18736:
--

I'm double-checking at the moment. I think the bug became an issue after some 
restructuring for the simulator, but need to convince myself of that. 

Also, I've deleted the original branches/PRs as I accidentally referenced ones 
from a different fix. I'll post new ones in a day or so.

> Streaming exception race creates corrupt transaction log files that prevent 
> restart
> ---
>
> Key: CASSANDRA-18736
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18736
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Streaming, Local/Startup and Shutdown
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x
>
>
> On restart, Cassandra logs this message and terminates.
> {code:java}
> ERROR 2023-07-17T17:17:22,931 [main] 
> org.apache.cassandra.db.lifecycle.LogTransaction:561 - Unexpected disk state: 
> failed to read transaction log 
> [nb_txn_stream_39d5f6b0-fb81-11ed-8f46-e97b3f61511e.log in 
> /datadir1/keyspace/table-c9527530a0d611e8813f034699fc9043]
> Files and contents follow:
> /datadir1/keyspace/table-c9527530a0d611e8813f034699fc9043/nb_txn_stream_39d5f6b0-fb81-11ed-8f46-e97b3f61511e.log
> ABORT:[,0,0][737437348]
> ***This record should have been the last one in all replicas
> 
> ADD:[/datadir1/keyspace/table-c9527530a0d611e8813f034699fc9043/nb-284490-big-,0,8][2493503833]
> {code}
> The root cause is a race during streaming exception handling.
> Although protection against concurrent modification of the {{LogTransaction}} 
> was added for CASSANDRA-16225, there is nothing to prevent usage after the 
> transaction is completed (committed/aborted) once it has been processed by 
> {{TransactionTidier}} (after the last reference is released). Before the 
> transaction is tidied, the {{LogFile}} keeps a list of records that are 
> checked for completion before adding new entries. In {{TransactionTidier}} 
> {{LogFile.records}} are cleared as no longer needed, however the 
> LogTransaction/LogFile is still accessible to the stream.
> The changes in CASSANDRA-17273 added a parallel set of {{onDiskRecords}} that 
> could be used to reliably recreate the transaction log at any new datadirs 
> the same as the existing datadirs, regardless of the effect of 
> {{LogTransaction.untrackNew/LogFile.remove}}.
> The race occurs if a streaming exception causes the LogTransaction to be 
> aborted and tidied just before {{SimpleSSTableMultiWriter}} calls trackNew to 
> add a new sstable. At the time of the call, the {{LogFile}} will not contain 
> any {{LogReplicas}}, {{LogFile.records}} will be empty, and 
> {{LogFile.onDiskRecords}} will contain an {{ABORT}}.
> When {{LogTransaction.trackNew/LogFile.add}} is called, the check for a 
> completed transaction does not detect the abort because {{records}} is empty. 
> There are no replicas on the datadir, so {{maybeCreateReplicas}} creates a new 
> txnlog file replica containing ABORT, then appends an ADD record.
> The LogFile has already been tidied after the abort so the txnlog file is not 
> removed and sits on disk until a restart, causing the failure.
> There is a related exception caused with a different interleaving of aborts, 
> after an sstable is added, however this is just a nuisance in the logs as the 
> LogReplica is already created with an {{ADD}} record first.
> {code:java}
> java.lang.AssertionError: 
> [ADD:[/datadir1/keyspace/table/nb-23314378-big-,0,8][1869379820]] is not 
> tracked by 55be35b0-35d1-11ee-865d-8b1e3c48ca06
> at org.apache.cassandra.db.lifecycle.LogFile.remove(LogFile.java:388)
> at 
> org.apache.cassandra.db.lifecycle.LogTransaction.untrackNew(LogTransaction.java:158)
> at 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.untrackNew(LifecycleTransaction.java:577)
> at 
> org.apache.cassandra.db.streaming.CassandraStreamReceiver$1.untrackNew(CassandraStreamReceiver.java:149)
> at 
> org.apache.cassandra.io.sstable.SimpleSSTableMultiWriter.abort(SimpleSSTableMultiWriter.java:95)
> at 
> org.apache.cassandra.io.sstable.format.RangeAwareSSTableWriter.abort(RangeAwareSSTableWriter.java:191)
> at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamReader.read(CassandraCompressedStreamReader.java:115)
> at 
> org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:85)
> at 
> org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:53)
> at 
> 

[jira] [Commented] (CASSANDRA-19508) Getting tons of msgs "Failed to get peer certificates for peer /x.x.x.x:45796" when require_client_auth is set to false

2024-04-04 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833992#comment-17833992
 ] 

Jon Meredith commented on CASSANDRA-19508:
--

[~Aburadeh] Thanks for updating the patch. [~brandon.williams] thanks for 
rerunning CI, I ran out of time to kick off a run yesterday.

+1 from me.

> Getting tons of msgs "Failed to get peer certificates for peer 
> /x.x.x.x:45796" when require_client_auth is set to false
> ---
>
> Key: CASSANDRA-19508
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19508
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Encryption
>Reporter: Mohammad Aburadeh
>Assignee: Mohammad Aburadeh
>Priority: Urgent
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>
> We recently upgraded our production clusters from 3.11.15 to 4.1.4. We 
> started seeing thousands of msgs "Failed to get peer certificates for peer 
> /x.x.x.x:45796". SSL is enabled but require_client_auth is disabled.  This is 
> causing a huge problem for us because cassandra log files are growing very 
> fast as our connections are short-lived: we open more than 1K 
> connections per second and they stay live for 1-2 seconds. 
> {code:java}
> DEBUG [Native-Transport-Requests-2] 2024-03-31 21:26:38,026 
> ServerConnection.java:140 - Failed to get peer certificates for peer 
> /172.31.2.23:45796
> javax.net.ssl.SSLPeerUnverifiedException: peer not verified
>         at 
> io.netty.handler.ssl.ReferenceCountedOpenSslEngine$DefaultOpenSslSession.getPeerCertificateChain(ReferenceCountedOpenSslEngine.java:2414)
>         at 
> io.netty.handler.ssl.ExtendedOpenSslSession.getPeerCertificateChain(ExtendedOpenSslSession.java:140)
>         at 
> org.apache.cassandra.transport.ServerConnection.certificates(ServerConnection.java:136)
>         at 
> org.apache.cassandra.transport.ServerConnection.getSaslNegotiator(ServerConnection.java:120)
>         at 
> org.apache.cassandra.transport.messages.AuthResponse.execute(AuthResponse.java:76)
>         at 
> org.apache.cassandra.transport.Message$Request.execute(Message.java:255)
>         at 
> org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:166)
>         at 
> org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:185)
>         at 
> org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:212)
>         at 
> org.apache.cassandra.transport.Dispatcher$RequestProcessor.run(Dispatcher.java:109)
>         at 
> org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:96)
>         at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)
>         at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)
>         at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:142)
>         at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  {code}
> *Our SSL config:*
> {code:java}
> client_encryption_options:
>   enabled: true
>   keystore: /path/to/keystore
>   keystore_password: x
>   optional: false
>   require_client_auth: false {code}
>  
> We should stop logging this msg when require_client_auth is set to false, or 
> at least it should be logged at TRACE, not DEBUG. 
> I'm working on preparing a PR. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19508) Getting tons of msgs "Failed to get peer certificates for peer /x.x.x.x:45796" when require_client_auth is set to false

2024-04-03 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833771#comment-17833771
 ] 

Jon Meredith commented on CASSANDRA-19508:
--

That's a lot of logs to deal with. Have you tried adding something like this to 
your {{logback.xml}} file to improve things in the short term?

{code:xml}

{code}

I don't think we should merge the patch as it stands because it disables 
retrieving the certificate if not required and it may be used by 
{{IAuthenticator}} implementations. We could drop the log level to {{TRACE}} -- 
although logging per socket connection event at {{DEBUG}} level doesn't seem 
unreasonable and it seems like other log events at that level could be added in 
the future.

Something like this instead? It should be a simpler patch and not involve the 
config subsystem.

{code}
diff --git a/src/java/org/apache/cassandra/transport/ServerConnection.java b/src/java/org/apache/cassandra/transport/ServerConnection.java
index 21f2e0b0e6..b47d0d9c66 100644
--- a/src/java/org/apache/cassandra/transport/ServerConnection.java
+++ b/src/java/org/apache/cassandra/transport/ServerConnection.java
@@ -137,7 +137,8 @@ public class ServerConnection extends Connection
             }
             catch (SSLPeerUnverifiedException e)
             {
-                logger.debug("Failed to get peer certificates for peer {}", channel().remoteAddress(), e);
+                if (logger.isTraceEnabled())
+                    logger.trace("Failed to get peer certificates for peer {}", channel().remoteAddress(), e);
             }
         }
         return certificates;
{code}
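
To illustrate the effect of moving a per-connection message from DEBUG to TRACE, here is a minimal standalone sketch. It uses {{java.util.logging}} from the JDK rather than the logging API used in {{ServerConnection}}, and assumes the usual correspondence of FINE to DEBUG and FINEST to TRACE; the logger name and the counting handler are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class GuardedTraceSketch {
    /** Logs 1000 per-connection messages at the given level and returns how many were emitted. */
    static int countEmitted(Level messageLevel) {
        Logger logger = Logger.getLogger("sketch.ServerConnection"); // hypothetical logger name
        logger.setUseParentHandlers(false);
        logger.setLevel(Level.FINE); // roughly "DEBUG enabled, TRACE disabled"

        AtomicInteger emitted = new AtomicInteger();
        Handler counter = new Handler() {
            @Override public void publish(LogRecord record) { emitted.incrementAndGet(); }
            @Override public void flush() {}
            @Override public void close() {}
        };
        logger.addHandler(counter);
        try {
            for (int i = 0; i < 1000; i++) {
                // The guard skips building the message when the level is disabled.
                if (logger.isLoggable(messageLevel))
                    logger.log(messageLevel, "Failed to get peer certificates for peer /x.x.x.x:" + i);
            }
        } finally {
            logger.removeHandler(counter);
        }
        return emitted.get();
    }

    public static void main(String[] args) {
        System.out.println("at FINE (DEBUG):   " + countEmitted(Level.FINE));   // 1000
        System.out.println("at FINEST (TRACE): " + countEmitted(Level.FINEST)); // 0
    }
}
```

With the logger left at a DEBUG-like level, the same 1000 connection events produce no output once the message is logged at the TRACE-like level.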




> Getting tons of msgs "Failed to get peer certificates for peer 
> /x.x.x.x:45796" when require_client_auth is set to false
> ---
>
> Key: CASSANDRA-19508
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19508
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Encryption
>Reporter: Mohammad Aburadeh
>Assignee: Mohammad Aburadeh
>Priority: Urgent
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>
> We recently upgraded our production clusters from 3.11.15 to 4.1.4. We 
> started seeing thousands of msgs "Failed to get peer certificates for peer 
> /x.x.x.x:45796". SSL is enabled but require_client_auth is disabled.  This is 
> causing a huge problem for us because cassandra log files are growing very 
> fast as our connections are short-lived: we open more than 1K 
> connections per second and they stay live for 1-2 seconds. 
> {code:java}
> DEBUG [Native-Transport-Requests-2] 2024-03-31 21:26:38,026 
> ServerConnection.java:140 - Failed to get peer certificates for peer 
> /172.31.2.23:45796
> javax.net.ssl.SSLPeerUnverifiedException: peer not verified
>         at 
> io.netty.handler.ssl.ReferenceCountedOpenSslEngine$DefaultOpenSslSession.getPeerCertificateChain(ReferenceCountedOpenSslEngine.java:2414)
>         at 
> io.netty.handler.ssl.ExtendedOpenSslSession.getPeerCertificateChain(ExtendedOpenSslSession.java:140)
>         at 
> org.apache.cassandra.transport.ServerConnection.certificates(ServerConnection.java:136)
>         at 
> org.apache.cassandra.transport.ServerConnection.getSaslNegotiator(ServerConnection.java:120)
>         at 
> org.apache.cassandra.transport.messages.AuthResponse.execute(AuthResponse.java:76)
>         at 
> org.apache.cassandra.transport.Message$Request.execute(Message.java:255)
>         at 
> org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:166)
>         at 
> org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:185)
>         at 
> org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:212)
>         at 
> org.apache.cassandra.transport.Dispatcher$RequestProcessor.run(Dispatcher.java:109)
>         at 
> org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:96)
>         at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)
>         at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)
>         at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:142)
>         at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  {code}
> *Our SSL config:*
> {code:java}
> client_encryption_options:
>   enabled: true
>   keystore: /path/to/keystore
>   keystore_password: x
>   optional: false
>   require_client_auth: false {code}
>  
> We should stop logging this msg when require_client_auth is set to false, or 
> at least it should be logged at TRACE, not DEBUG. 
> I'm working on preparing a PR. 





[jira] [Commented] (CASSANDRA-19508) Getting tons of msgs "Failed to get peer certificates for peer /x.x.x.x:45796" when require_client_auth is set to false

2024-04-03 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833585#comment-17833585
 ] 

Jon Meredith commented on CASSANDRA-19508:
--

[~Aburadeh] sorry the logging is causing you issues on upgrade. Are you running 
DEBUG level logs on your production servers - is there some other logging you 
need access to that is not available at INFO level?  If not, you could adjust 
the logging configuration to switch to INFO for the ServerConnection logger.

I can see the temptation to disable the check if the client certificates aren't 
required, but we don't know whether {{IAuthenticator}} implementations outside 
the main source tree use that information -- one example could be during 
configuration migrations to see whether it is safe to require client 
authentication or not without breaking existing authentication flow.



> Getting tons of msgs "Failed to get peer certificates for peer 
> /x.x.x.x:45796" when require_client_auth is set to false
> ---
>
> Key: CASSANDRA-19508
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19508
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Encryption
>Reporter: Mohammad Aburadeh
>Assignee: Mohammad Aburadeh
>Priority: Urgent
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>
> We recently upgraded our production clusters from 3.11.15 to 4.1.4. We 
> started seeing thousands of msgs "Failed to get peer certificates for peer 
> /x.x.x.x:45796". SSL is enabled but require_client_auth is disabled.  This is 
> causing a huge problem for us because cassandra log files are growing very 
> fast as our connections are short-lived: we open more than 1K 
> connections per second and they stay live for 1-2 seconds. 
> {code:java}
> DEBUG [Native-Transport-Requests-2] 2024-03-31 21:26:38,026 
> ServerConnection.java:140 - Failed to get peer certificates for peer 
> /172.31.2.23:45796
> javax.net.ssl.SSLPeerUnverifiedException: peer not verified
>         at 
> io.netty.handler.ssl.ReferenceCountedOpenSslEngine$DefaultOpenSslSession.getPeerCertificateChain(ReferenceCountedOpenSslEngine.java:2414)
>         at 
> io.netty.handler.ssl.ExtendedOpenSslSession.getPeerCertificateChain(ExtendedOpenSslSession.java:140)
>         at 
> org.apache.cassandra.transport.ServerConnection.certificates(ServerConnection.java:136)
>         at 
> org.apache.cassandra.transport.ServerConnection.getSaslNegotiator(ServerConnection.java:120)
>         at 
> org.apache.cassandra.transport.messages.AuthResponse.execute(AuthResponse.java:76)
>         at 
> org.apache.cassandra.transport.Message$Request.execute(Message.java:255)
>         at 
> org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:166)
>         at 
> org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:185)
>         at 
> org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:212)
>         at 
> org.apache.cassandra.transport.Dispatcher$RequestProcessor.run(Dispatcher.java:109)
>         at 
> org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:96)
>         at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)
>         at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)
>         at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:142)
>         at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  {code}
> *Our SSL config:*
> {code:java}
> client_encryption_options:
>   enabled: true
>   keystore: /path/to/keystore
>   keystore_password: x
>   optional: false
>   require_client_auth: false {code}
>  
> We should stop logging this msg when require_client_auth is set to false, or 
> at least it should be logged at TRACE, not DEBUG. 
> I'm working on preparing a PR. 






[jira] [Updated] (CASSANDRA-18811) Set right client auth for creating SSL context in mTLS optional mode

2023-12-19 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18811:
-
Fix Version/s: (was: 4.1.x)
   (was: 5.0.x)

> Set right client auth for creating SSL context in mTLS optional mode
> 
>
> Key: CASSANDRA-18811
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18811
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client, Messaging/Internode
>Reporter: Jyothsna Konisa
>Assignee: Jyothsna Konisa
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Adding a new value `optional` for require_client_auth in Encryption options. 
> When require_client_auth is optional, the SSL context that is created will 
> allow client connections that provide a client certificate as well as 
> client connections that do not provide certificates.
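
In plain JSSE terms, the three behaviours correspond to the "need" and "want" client-auth flags on a server-side {{SSLEngine}}. The following is a sketch only: the {{ClientAuthMode}} enum and {{serverEngine}} helper are hypothetical names for illustration, and this is not claimed to be how the Cassandra patch wires the setting.

```java
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

public class ClientAuthModeSketch {
    /** Hypothetical three-valued setting mirroring false / optional / true. */
    enum ClientAuthMode { NOT_REQUIRED, OPTIONAL, REQUIRED }

    /** Creates a server-side engine from the JDK default context with the given mode applied. */
    static SSLEngine serverEngine(ClientAuthMode mode) {
        SSLEngine engine;
        try {
            engine = SSLContext.getDefault().createSSLEngine();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        engine.setUseClientMode(false);
        switch (mode) {
            case REQUIRED:
                engine.setNeedClientAuth(true); // handshake fails without a client certificate
                break;
            case OPTIONAL:
                engine.setWantClientAuth(true); // certificate requested, but not mandatory
                break;
            default:
                break;                          // no certificate requested
        }
        return engine;
    }

    public static void main(String[] args) {
        for (ClientAuthMode mode : ClientAuthMode.values()) {
            SSLEngine e = serverEngine(mode);
            System.out.println(mode + " need=" + e.getNeedClientAuth() + " want=" + e.getWantClientAuth());
        }
    }
}
```

In JSSE the two setters override each other, so at most one of the flags is set on a given engine.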






[jira] [Commented] (CASSANDRA-18811) Set right client auth for creating SSL context in mTLS optional mode

2023-12-19 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798742#comment-17798742
 ] 

Jon Meredith commented on CASSANDRA-18811:
--

+1, thanks. Merged.

> Set right client auth for creating SSL context in mTLS optional mode
> 
>
> Key: CASSANDRA-18811
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18811
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client, Messaging/Internode
>Reporter: Jyothsna Konisa
>Assignee: Jyothsna Konisa
>Priority: Normal
> Fix For: 4.1.x, 5.0.x, 5.x
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Adding a new value `optional` for require_client_auth in Encryption options. 
> When require_client_auth is optional, the SSL context that is created will 
> allow client connections that provide a client certificate as well as 
> client connections that do not provide certificates.






[jira] [Updated] (CASSANDRA-18811) Set right client auth for creating SSL context in mTLS optional mode

2023-12-19 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18811:
-
  Since Version: 5.x
Source Control Link: 
https://github.com/apache/cassandra/commit/bfcb21fbebfef14fbfe626bfd39d66f5e5c51018
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> Set right client auth for creating SSL context in mTLS optional mode
> 
>
> Key: CASSANDRA-18811
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18811
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client, Messaging/Internode
>Reporter: Jyothsna Konisa
>Assignee: Jyothsna Konisa
>Priority: Normal
> Fix For: 4.1.x, 5.0.x, 5.x
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Adding a new value `optional` for require_client_auth in Encryption options. 
> When require_client_auth is optional, the SSL context that is created will 
> allow client connections that provide a client certificate as well as 
> client connections that do not provide certificates.






[jira] [Updated] (CASSANDRA-18811) Set right client auth for creating SSL context in mTLS optional mode

2023-12-19 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18811:
-
Status: Ready to Commit  (was: Review In Progress)

> Set right client auth for creating SSL context in mTLS optional mode
> 
>
> Key: CASSANDRA-18811
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18811
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client, Messaging/Internode
>Reporter: Jyothsna Konisa
>Assignee: Jyothsna Konisa
>Priority: Normal
> Fix For: 4.1.x, 5.0.x, 5.x
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Adding a new value `optional` for require_client_auth in Encryption options. 
> When require_client_auth is optional, the SSL context that is created will 
> allow client connections that provide a client certificate as well as 
> client connections that do not provide certificates.






[jira] [Commented] (CASSANDRA-18733) Waiting indefinitely on ReceivedMessage response in StreamSession#receive() can cause deadlock

2023-09-29 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770548#comment-17770548
 ] 

Jon Meredith commented on CASSANDRA-18733:
--

Reran with 5.0 -- clean after rerunning. Issues due to docker hub rather than 
tests.

java11_separate_tests 
https://app.circleci.com/pipelines/github/jonmeredith/cassandra/960/workflows/6d34a65c-93cd-4b96-9f5f-171d13f326e8
 clean except j17_jvm_dtests_repeat 
https://app.circleci.com/pipelines/github/jonmeredith/cassandra/960/workflows/6d34a65c-93cd-4b96-9f5f-171d13f326e8/jobs/20393/parallel-runs/22?filterBy=FAILED
  -- 503 errors for docker hub on 14/25 pods.
rerun 
https://app.circleci.com/pipelines/github/jonmeredith/cassandra/960/workflows/108d649c-41a0-4073-871a-be5bca206bd1
java17_separate_tests 
https://app.circleci.com/pipelines/github/jonmeredith/cassandra/960/workflows/c41e05a6-a6cb-4e13-83fb-255f2a59d399
 -- clean

> Waiting indefinitely on ReceivedMessage response in StreamSession#receive() 
> can cause deadlock
> --
>
> Key: CASSANDRA-18733
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18733
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair, Consistency/Streaming
>Reporter: Caleb Rackliffe
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.1.4, 5.0, 5.0-alpha1, 5.1
>
>
> I've observed in a recent stack trace from a node running 4.1 what looks like 
> a deadlock around the {{StreamSession}} monitor lock when 
> {{StreamSession#receive()}} waits via {{syncUninteruptibly()}} for a response 
> to a control message.
> {noformat}
> "Messaging-EventLoop-3-10" #320 daemon prio=5 os_prio=0 cpu=57979617.98ms 
> elapsed=5587916.03s tid=0x7f056e88ae00 nid=0x80ec waiting for monitor 
> entry  [0x7f056d277000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:524)
> - waiting to lock <0x0006816fae70> (a 
> org.apache.cassandra.streaming.StreamSession)
> at 
> org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:690)
> at 
> org.apache.cassandra.streaming.async.StreamingMultiplexedChannel.onMessageComplete(StreamingMultiplexedChannel.java:264)
> at 
> org.apache.cassandra.streaming.async.StreamingMultiplexedChannel.lambda$sendMessage$1(StreamingMultiplexedChannel.java:233)
> at 
> org.apache.cassandra.streaming.async.StreamingMultiplexedChannel$$Lambda$2029/0x0008007a0c40.operationComplete(Unknown
>  Source)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList.notifyListener(ListenerList.java:134)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList.notifyListener(ListenerList.java:148)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList$GenericFutureListenerList.notifySelf(ListenerList.java:190)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList.lambda$notifyExclusive$0(ListenerList.java:124)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList$$Lambda$950/0x000800666040.accept(Unknown
>  Source)
> at 
> org.apache.cassandra.utils.concurrent.IntrusiveStack.forEach(IntrusiveStack.java:195)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList.notifyExclusive(ListenerList.java:124)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList.notify(ListenerList.java:96)
> at 
> org.apache.cassandra.utils.concurrent.AsyncFuture.trySet(AsyncFuture.java:104)
> at 
> org.apache.cassandra.utils.concurrent.AbstractFuture.tryFailure(AbstractFuture.java:148)
> at 
> org.apache.cassandra.utils.concurrent.AsyncPromise.tryFailure(AsyncPromise.java:139)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:1009)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:870)
> at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764)
> at 
> io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1071)
> at 
> io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
> at 
> 

[jira] [Commented] (CASSANDRA-18725) IsolatedJMX should not release all TCPEndpoints on instance shutdown

2023-09-29 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770521#comment-17770521
 ] 

Jon Meredith commented on CASSANDRA-18725:
--

Reran against 5.0 branch. Clean runs.

java11_separate_tests 
https://app.circleci.com/pipelines/github/jonmeredith/cassandra/959/workflows/99204b8b-40a2-4ceb-becd-df6d3d0afdc2
java17_separate_tests 
https://app.circleci.com/pipelines/github/jonmeredith/cassandra/959/workflows/a1e6df8b-b61c-4359-aaef-9b3ff08ebbcd

> IsolatedJMX should not release all TCPEndpoints on instance shutdown
> 
>
> Key: CASSANDRA-18725
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18725
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: Doug Rohrer
>Assignee: Doug Rohrer
>Priority: Normal
> Fix For: 3.11.17, 4.0.12, 4.1.4, 5.0-alpha, 5.x
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> In the original implementation of the JMX feature, we fixed some memory leaks 
> by clearing some internal state in Java’s TCPEndpoint. However, that 
> implementation was overly aggressive and cleared the whole map, vs. just 
> removing the endpoints created by the individual instances. This causes 
> issues when you remove a node from the cluster (as all of the endpoints are 
> cleared, not just the ones in use by that instance).
>  
> Instead, we should check if the endpoint was created by the instance in 
> question and only remove it if it was.
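
The difference between clearing the whole map and removing only one instance's endpoints can be sketched as follows. This is a minimal illustration, not the patch itself: `SharedEndpointRegistry`, its methods, and the string endpoint keys are hypothetical stand-ins for the JVM-wide state held by Java's TCPEndpoint.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for an endpoint registry shared by every in-JVM instance.
final class SharedEndpointRegistry
{
    // endpoint key (for example "host:port") -> id of the instance that created it
    private final Map<String, Integer> owners = new ConcurrentHashMap<>();

    void register(String endpoint, int instanceId)
    {
        owners.put(endpoint, instanceId);
    }

    // Overly aggressive variant: drops every endpoint, including those of live instances.
    void releaseAll()
    {
        owners.clear();
    }

    // Targeted variant: drops only the endpoints created by the instance shutting down.
    void releaseOwnedBy(int instanceId)
    {
        owners.values().removeIf(owner -> owner == instanceId);
    }

    // Snapshot of the endpoints that are still registered.
    Set<String> endpoints()
    {
        return new HashSet<>(owners.keySet());
    }
}
```

With two instances registered, calling `releaseOwnedBy(1)` leaves the endpoint of instance 2 in place, whereas `releaseAll()` would remove both.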



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18681) Internode legacy SSL storage port certificate is not hot reloaded on update

2023-09-29 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770519#comment-17770519
 ] 

Jon Meredith commented on CASSANDRA-18681:
--

Reran against 5.0 branch. Clean runs.

java11_separate_tests 
https://app.circleci.com/pipelines/github/jonmeredith/cassandra/958/workflows/906a8642-f525-4d52-a981-eba879717aaa
java17_separate_tests 
https://app.circleci.com/pipelines/github/jonmeredith/cassandra/958/workflows/11af46d5-c996-409e-b9c2-4e1aea2a5881

> Internode legacy SSL storage port certificate is not hot reloaded on update
> ---
>
> Key: CASSANDRA-18681
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18681
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.1.4, 5.0-alpha2
>
>
> In CASSANDRA-1 the SSLContext cache was changed to clear individual 
> {{EncryptionOptions}} from the SslContext cache if they needed reloading to 
> reduce resource consumption. Before the change if ANY cert needed hot 
> reloading, the SSLContext cache would be cleared for ALL certs.
> If the legacy SSL storage port is configured, a new {{EncryptionOptions}} 
> object is created in {{org.apache.cassandra.net.InboundSockets#addBindings}} 
> just for binding the socket, but never gets cleared as the change in port 
> means it no longer matches the configuration retrieved from 
> {{DatabaseDescriptor}} in 
> {{org.apache.cassandra.net.MessagingServiceMBeanImpl#reloadSslCertificates}}.
> This is unlikely to be an issue in practice as the legacy SSL internode 
> socket is only used in mixed version clusters with pre-4.0 nodes, so the cert 
> only needs to stay valid until all nodes upgrade to 4.x or above.
> One way to avoid this class of failures is to just check the entries present 
> in the SSLContext cache.
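
The last suggestion in the description, checking the entries actually present in the cache, might look roughly like the sketch below. Everything here is assumed for illustration: `ContextCache`, `invalidateIf`, and the type parameters are hypothetical and do not reflect Cassandra's real cache or the change that was eventually committed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical, simplified cache: K stands in for an encryption-options key,
// V for a built SSL context.
final class ContextCache<K, V>
{
    private final Map<K, V> cache = new ConcurrentHashMap<>();

    void put(K options, V context)
    {
        cache.put(options, context);
    }

    boolean contains(K options)
    {
        return cache.containsKey(options);
    }

    // Walk the keys present in the cache rather than a list rebuilt from configuration,
    // so derived entries (such as one created only to bind a legacy port) are considered too.
    List<K> invalidateIf(Predicate<K> shouldReload)
    {
        List<K> evicted = new ArrayList<>();
        for (K options : cache.keySet())
        {
            if (shouldReload.test(options))
            {
                cache.remove(options);
                evicted.add(options);
            }
        }
        return evicted;
    }
}
```

Because the loop is driven by the cache's own keys, an entry whose port differs from the configured options cannot be missed by a lookup that fails to match.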






[jira] [Commented] (CASSANDRA-18816) Add support for repair coordinator to retry messages that timeout

2023-09-29 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770458#comment-17770458
 ] 

Jon Meredith commented on CASSANDRA-18816:
--

The bug in the script was my mistake. Thanks for catching it. 

I've searched JIRA and don't see any tickets for flaky tests filed against any 
of those issues except CASSANDRA-18733. Were you just mentioning the omission 
for future reference, or are you aware of any issues?

I'll rerun the repeated tests against CASSANDRA-18681, CASSANDRA-18725 and 
CASSANDRA-18733 for completeness.

> Add support for repair coordinator to retry messages that timeout
> -
>
> Key: CASSANDRA-18816
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18816
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Repair
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
> Fix For: 5.0-alpha2
>
>  Time Spent: 13h 10m
>  Remaining Estimate: 0h
>
> Now that CASSANDRA-15399 is in, most of the repair messages have a state that 
> they can check against to make message delivery idempotent, allowing the 
> coordinator to retry such messages; a few of the most critical messages to 
> retry are: PREPARE_MSG, VALIDATION_REQ, VALIDATION_RSP, SYNC_REQ, and 
> SYNC_RSP.
> With this I propose making the coordinator able to retry these key messages 
> to try and make repair more resilient to ephemeral issues.
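
Why idempotent handling makes retries safe can be shown with a small sketch. The names `RetrySketch`, `Participant`, and `sendWithRetry` are hypothetical, and the code leaves out the timeouts and backoff a real coordinator would need; it only demonstrates that a duplicate delivery does not repeat the work.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

final class RetrySketch
{
    // Hypothetical receiver: remembers which request ids it has applied,
    // so a duplicate delivery is acknowledged without being applied again.
    static final class Participant
    {
        private final Set<String> applied = new HashSet<>();
        int executions = 0;

        // Returns true as an acknowledgement whether or not the request was new.
        boolean handle(String requestId)
        {
            if (applied.add(requestId))
                executions++;
            return true;
        }
    }

    // Resend until send reports an acknowledgement or the attempt budget is spent.
    // Returns the number of attempts used, or -1 if no attempt was acknowledged.
    static int sendWithRetry(String requestId, int maxAttempts, Predicate<String> send)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            if (send.test(requestId))
                return attempt;
        }
        return -1;
    }
}
```

If the first acknowledgement is lost and the coordinator resends, the participant sees the same request id twice but applies it only once.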






[jira] [Commented] (CASSANDRA-18811) Set right client auth for creating SSL context in mTLS optional mode

2023-09-25 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768910#comment-17768910
 ] 

Jon Meredith commented on CASSANDRA-18811:
--

Rebased after CASSANDRA-18681 and reran CI

CI Results (pending):
||Branch||Source||Circle CI||Jenkins||
|cassandra-5.0|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18811-cassandra-5.0-5FC0DA43-601D-43C5-AA55-8B64708CDDBB]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18811-cassandra-5.0-5FC0DA43-601D-43C5-AA55-8B64708CDDBB]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/2603/]|
|trunk|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18811-trunk-5FC0DA43-601D-43C5-AA55-8B64708CDDBB]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18811-trunk-5FC0DA43-601D-43C5-AA55-8B64708CDDBB]|[build|unknown]|

> Set right client auth for creating SSL context in mTLS optional mode
> 
>
> Key: CASSANDRA-18811
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18811
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client, Messaging/Internode
>Reporter: Jyothsna Konisa
>Assignee: Jyothsna Konisa
>Priority: Normal
> Fix For: 4.1.x, 5.0-alpha, 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add a new value, `optional`, for require_client_auth in the encryption options. 
> When require_client_auth is `optional`, the SSL context that is created will 
> accept client connections that present a client certificate as well as 
> client connections that do not.
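
In plain JSSE terms, the three modes correspond to the "need" and "want" client-auth flags. The sketch below uses the standard `javax.net.ssl.SSLParameters` API; the class `ClientAuthSketch`, the method `forSetting`, and the string values `"true"` and `"false"` are assumptions made for illustration and are not taken from the patch.

```java
import javax.net.ssl.SSLParameters;

final class ClientAuthSketch
{
    // Hypothetical mapping from a require_client_auth setting to JSSE parameters.
    static SSLParameters forSetting(String requireClientAuth)
    {
        SSLParameters params = new SSLParameters();
        switch (requireClientAuth)
        {
            case "true":
                params.setNeedClientAuth(true);   // handshake fails without a client certificate
                break;
            case "optional":
                params.setWantClientAuth(true);   // certificate is requested but not mandatory
                break;
            case "false":
                break;                            // no client certificate is requested
            default:
                throw new IllegalArgumentException("Unknown require_client_auth: " + requireClientAuth);
        }
        return params;
    }
}
```

Note that in `SSLParameters` the two flags are mutually exclusive: setting one clears the other, so `optional` has to be modelled as its own mode rather than as a combination of both.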






[jira] [Updated] (CASSANDRA-18811) Set right client auth for creating SSL context in mTLS optional mode

2023-09-25 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18811:
-
Reviewers: Andy Tolbert, Dinesh Joshi, Jon Meredith  (was: Jon Meredith)

> Set right client auth for creating SSL context in mTLS optional mode
> 
>
> Key: CASSANDRA-18811
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18811
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client, Messaging/Internode
>Reporter: Jyothsna Konisa
>Assignee: Jyothsna Konisa
>Priority: Normal
> Fix For: 4.1.x, 5.0-alpha, 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add a new value, `optional`, for require_client_auth in the encryption options. 
> When require_client_auth is `optional`, the SSL context that is created will 
> accept client connections that present a client certificate as well as 
> client connections that do not.






[jira] [Updated] (CASSANDRA-18681) Internode legacy SSL storage port certificate is not hot reloaded on update

2023-09-25 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18681:
-
  Fix Version/s: 4.1.4
 5.0-alpha2
  Since Version: 4.1.0
Source Control Link:  
https://github.com/apache/cassandra/commit/b9586501a6b6cdfe465302448018785652c9b966
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

Also included a minor fix to examples/sslfactory/build.xml to resolve an error 
about duplicate logback libraries on the class path.

> Internode legacy SSL storage port certificate is not hot reloaded on update
> ---
>
> Key: CASSANDRA-18681
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18681
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.1.4, 5.0-alpha2
>
>
> In CASSANDRA-1 the SSLContext cache was changed to clear individual 
> {{EncryptionOptions}} from the SslContext cache if they needed reloading to 
> reduce resource consumption. Before the change if ANY cert needed hot 
> reloading, the SSLContext cache would be cleared for ALL certs.
> If the legacy SSL storage port is configured, a new {{EncryptionOptions}} 
> object is created in {{org.apache.cassandra.net.InboundSockets#addBindings}} 
> just for binding the socket, but never gets cleared as the change in port 
> means it no longer matches the configuration retrieved from 
> {{DatabaseDescriptor}} in 
> {{org.apache.cassandra.net.MessagingServiceMBeanImpl#reloadSslCertificates}}.
> This is unlikely to be an issue in practice as the legacy SSL internode 
> socket is only used in mixed version clusters with pre-4.0 nodes, so the cert 
> only needs to stay valid until all nodes upgrade to 4.x or above.
> One way to avoid this class of failures is to just check the entries present 
> in the SSLContext cache.






[jira] [Updated] (CASSANDRA-18681) Internode legacy SSL storage port certificate is not hot reloaded on update

2023-09-25 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18681:
-
Status: Ready to Commit  (was: Changes Suggested)

> Internode legacy SSL storage port certificate is not hot reloaded on update
> ---
>
> Key: CASSANDRA-18681
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18681
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> In CASSANDRA-1 the SSLContext cache was changed to clear individual 
> {{EncryptionOptions}} from the SslContext cache if they needed reloading to 
> reduce resource consumption. Before the change if ANY cert needed hot 
> reloading, the SSLContext cache would be cleared for ALL certs.
> If the legacy SSL storage port is configured, a new {{EncryptionOptions}} 
> object is created in {{org.apache.cassandra.net.InboundSockets#addBindings}} 
> just for binding the socket, but never gets cleared as the change in port 
> means it no longer matches the configuration retrieved from 
> {{DatabaseDescriptor}} in 
> {{org.apache.cassandra.net.MessagingServiceMBeanImpl#reloadSslCertificates}}.
> This is unlikely to be an issue in practice as the legacy SSL internode 
> socket is only used in mixed version clusters with pre-4.0 nodes, so the cert 
> only needs to stay valid until all nodes upgrade to 4.x or above.
> One way to avoid this class of failures is to just check the entries present 
> in the SSLContext cache.






[jira] [Commented] (CASSANDRA-18681) Internode legacy SSL storage port certificate is not hot reloaded on update

2023-09-21 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17767747#comment-17767747
 ] 

Jon Meredith commented on CASSANDRA-18681:
--

Refactored to just explicitly initialize the legacy SSL encryption options.

CI Results (pending):
||Branch||Source||Circle CI||Jenkins||
|cassandra-4.1|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18681-cassandra-4.1-B319E212-DEE9-4BD5-8FA1-CEB9D630C414]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18681-cassandra-4.1-B319E212-DEE9-4BD5-8FA1-CEB9D630C414]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/2598/]|
|cassandra-5.0|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18681-cassandra-5.0-B319E212-DEE9-4BD5-8FA1-CEB9D630C414]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18681-cassandra-5.0-B319E212-DEE9-4BD5-8FA1-CEB9D630C414]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/2599/]|
|trunk|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18681-trunk-B319E212-DEE9-4BD5-8FA1-CEB9D630C414]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18681-trunk-B319E212-DEE9-4BD5-8FA1-CEB9D630C414]|[build|unknown]|

> Internode legacy SSL storage port certificate is not hot reloaded on update
> ---
>
> Key: CASSANDRA-18681
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18681
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> In CASSANDRA-1 the SSLContext cache was changed to clear individual 
> {{EncryptionOptions}} from the SslContext cache if they needed reloading to 
> reduce resource consumption. Before the change if ANY cert needed hot 
> reloading, the SSLContext cache would be cleared for ALL certs.
> If the legacy SSL storage port is configured, a new {{EncryptionOptions}} 
> object is created in {{org.apache.cassandra.net.InboundSockets#addBindings}} 
> just for binding the socket, but never gets cleared as the change in port 
> means it no longer matches the configuration retrieved from 
> {{DatabaseDescriptor}} in 
> {{org.apache.cassandra.net.MessagingServiceMBeanImpl#reloadSslCertificates}}.
> This is unlikely to be an issue in practice as the legacy SSL internode 
> socket is only used in mixed version clusters with pre-4.0 nodes, so the cert 
> only needs to stay valid until all nodes upgrade to 4.x or above.
> One way to avoid this class of failures is to just check the entries present 
> in the SSLContext cache.






[jira] [Commented] (CASSANDRA-18681) Internode legacy SSL storage port certificate is not hot reloaded on update

2023-09-20 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17767219#comment-17767219
 ] 

Jon Meredith commented on CASSANDRA-18681:
--

I've remembered why I did it this way. The legacy SSL storage port encryption 
options are not registered for hot reloading, so they have to be invalidated 
whenever shouldReload returns true for the original encryption options.


> Internode legacy SSL storage port certificate is not hot reloaded on update
> ---
>
> Key: CASSANDRA-18681
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18681
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> In CASSANDRA-1 the SSLContext cache was changed to clear individual 
> {{EncryptionOptions}} from the SslContext cache if they needed reloading to 
> reduce resource consumption. Before the change if ANY cert needed hot 
> reloading, the SSLContext cache would be cleared for ALL certs.
> If the legacy SSL storage port is configured, a new {{EncryptionOptions}} 
> object is created in {{org.apache.cassandra.net.InboundSockets#addBindings}} 
> just for binding the socket, but never gets cleared as the change in port 
> means it no longer matches the configuration retrieved from 
> {{DatabaseDescriptor}} in 
> {{org.apache.cassandra.net.MessagingServiceMBeanImpl#reloadSslCertificates}}.
> This is unlikely to be an issue in practice as the legacy SSL internode 
> socket is only used in mixed version clusters with pre-4.0 nodes, so the cert 
> only needs to stay valid until all nodes upgrade to 4.x or above.
> One way to avoid this class of failures is to just check the entries present 
> in the SSLContext cache.






[jira] [Updated] (CASSANDRA-18681) Internode legacy SSL storage port certificate is not hot reloaded on update

2023-09-20 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18681:
-
Status: Changes Suggested  (was: Ready to Commit)

Going to rework this a little. I don't like that shouldReload and 
clearSslContext use different checks. That is OK for the default 
implementation, but may not be good for a custom SSLContextFactoryInstance.

> Internode legacy SSL storage port certificate is not hot reloaded on update
> ---
>
> Key: CASSANDRA-18681
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18681
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> In CASSANDRA-1 the SSLContext cache was changed to clear individual 
> {{EncryptionOptions}} from the SslContext cache if they needed reloading to 
> reduce resource consumption. Before the change if ANY cert needed hot 
> reloading, the SSLContext cache would be cleared for ALL certs.
> If the legacy SSL storage port is configured, a new {{EncryptionOptions}} 
> object is created in {{org.apache.cassandra.net.InboundSockets#addBindings}} 
> just for binding the socket, but never gets cleared as the change in port 
> means it no longer matches the configuration retrieved from 
> {{DatabaseDescriptor}} in 
> {{org.apache.cassandra.net.MessagingServiceMBeanImpl#reloadSslCertificates}}.
> This is unlikely to be an issue in practice as the legacy SSL internode 
> socket is only used in mixed version clusters with pre-4.0 nodes, so the cert 
> only needs to stay valid until all nodes upgrade to 4.x or above.
> One way to avoid this class of failures is to just check the entries present 
> in the SSLContext cache.






[jira] [Updated] (CASSANDRA-18681) Internode legacy SSL storage port certificate is not hot reloaded on update

2023-09-20 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18681:
-
Status: Ready to Commit  (was: Review In Progress)

> Internode legacy SSL storage port certificate is not hot reloaded on update
> ---
>
> Key: CASSANDRA-18681
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18681
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> In CASSANDRA-1 the SSLContext cache was changed to clear individual 
> {{EncryptionOptions}} from the SslContext cache if they needed reloading to 
> reduce resource consumption. Before the change if ANY cert needed hot 
> reloading, the SSLContext cache would be cleared for ALL certs.
> If the legacy SSL storage port is configured, a new {{EncryptionOptions}} 
> object is created in {{org.apache.cassandra.net.InboundSockets#addBindings}} 
> just for binding the socket, but never gets cleared as the change in port 
> means it no longer matches the configuration retrieved from 
> {{DatabaseDescriptor}} in 
> {{org.apache.cassandra.net.MessagingServiceMBeanImpl#reloadSslCertificates}}.
> This is unlikely to be an issue in practice as the legacy SSL internode 
> socket is only used in mixed version clusters with pre-4.0 nodes, so the cert 
> only needs to stay valid until all nodes upgrade to 4.x or above.
> One way to avoid this class of failures is to just check the entries present 
> in the SSLContext cache.






[jira] [Updated] (CASSANDRA-18681) Internode legacy SSL storage port certificate is not hot reloaded on update

2023-09-20 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18681:
-
Reviewers: Dinesh Joshi, Jon Meredith
   Status: Review In Progress  (was: Patch Available)

> Internode legacy SSL storage port certificate is not hot reloaded on update
> ---
>
> Key: CASSANDRA-18681
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18681
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> In CASSANDRA-1 the SSLContext cache was changed to clear individual 
> {{EncryptionOptions}} from the SslContext cache if they needed reloading to 
> reduce resource consumption. Before the change if ANY cert needed hot 
> reloading, the SSLContext cache would be cleared for ALL certs.
> If the legacy SSL storage port is configured, a new {{EncryptionOptions}} 
> object is created in {{org.apache.cassandra.net.InboundSockets#addBindings}} 
> just for binding the socket, but never gets cleared as the change in port 
> means it no longer matches the configuration retrieved from 
> {{DatabaseDescriptor}} in 
> {{org.apache.cassandra.net.MessagingServiceMBeanImpl#reloadSslCertificates}}.
> This is unlikely to be an issue in practice as the legacy SSL internode 
> socket is only used in mixed version clusters with pre-4.0 nodes, so the cert 
> only needs to stay valid until all nodes upgrade to 4.x or above.
> One way to avoid this class of failures is to just check the entries present 
> in the SSLContext cache.






[jira] [Updated] (CASSANDRA-18681) Internode legacy SSL storage port certificate is not hot reloaded on update

2023-09-20 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18681:
-
Reviewers: Dinesh Joshi  (was: Dinesh Joshi, Jon Meredith)

> Internode legacy SSL storage port certificate is not hot reloaded on update
> ---
>
> Key: CASSANDRA-18681
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18681
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> In CASSANDRA-1 the SSLContext cache was changed to clear individual 
> {{EncryptionOptions}} from the SslContext cache if they needed reloading to 
> reduce resource consumption. Before the change if ANY cert needed hot 
> reloading, the SSLContext cache would be cleared for ALL certs.
> If the legacy SSL storage port is configured, a new {{EncryptionOptions}} 
> object is created in {{org.apache.cassandra.net.InboundSockets#addBindings}} 
> just for binding the socket, but never gets cleared as the change in port 
> means it no longer matches the configuration retrieved from 
> {{DatabaseDescriptor}} in 
> {{org.apache.cassandra.net.MessagingServiceMBeanImpl#reloadSslCertificates}}.
> This is unlikely to be an issue in practice as the legacy SSL internode 
> socket is only used in mixed version clusters with pre-4.0 nodes, so the cert 
> only needs to stay valid until all nodes upgrade to 4.x or above.
> One way to avoid this class of failures is to just check the entries present 
> in the SSLContext cache.






[jira] [Commented] (CASSANDRA-18360) Test Failure: o.a.c.cql3.validation.operations.AlterTest#testDropListAndAddListWithSameName

2023-09-19 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766932#comment-17766932
 ] 

Jon Meredith commented on CASSANDRA-18360:
--

No progress on this yet. I'm planning to track down further cases of negative 
memory freed as a follow-on to CASSANDRA-18125, so we will see if that resolves 
this, though I have no explanation other than timing for why that patch caused 
this flake.

> Test Failure: 
> o.a.c.cql3.validation.operations.AlterTest#testDropListAndAddListWithSameName
> ---
>
> Key: CASSANDRA-18360
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18360
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: Andres de la Peña
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 5.x
>
>
> The unit test 
> {{org.apache.cassandra.cql3.validation.operations.AlterTest#testDropListAndAddListWithSameName}}
>  is flaky, at least on trunk. The failure rate appears to be below 1%.
> While I haven't seen it on Jenkins yet, it can easily be reproduced on 
> CircleCI with the multiplexer:
> * 
> https://app.circleci.com/pipelines/github/adelapena/cassandra/2731/workflows/58edc2f6-9a21-4d09-b783-b7fb15e1b320/jobs/32235
> * 
> https://app.circleci.com/pipelines/github/adelapena/cassandra/2731/workflows/58edc2f6-9a21-4d09-b783-b7fb15e1b320/jobs/32234
> * 
> https://app.circleci.com/pipelines/github/adelapena/cassandra/2731/workflows/739a95d3-8e42-4447-93dd-122fc16fdd7d/jobs/32233/tests
> Those runs show two types of errors:
> {code}
> junit.framework.AssertionFailedError: Invalid value for row 0 column 1 
> (content of type text), expected  but got 
>   at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1506)
>   at 
> org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:102)
> {code}
> and: 
> {code}
> org.apache.cassandra.serializers.MarshalException: Invalid UTF-8 bytes 
> 00e0279515437f00
>   at 
> org.apache.cassandra.serializers.AbstractTextSerializer.deserialize(AbstractTextSerializer.java:46)
>   at 
> org.apache.cassandra.serializers.AbstractTextSerializer.deserialize(AbstractTextSerializer.java:29)
>   at 
> org.apache.cassandra.serializers.TypeSerializer.deserialize(TypeSerializer.java:37)
>   at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1494)
>   at 
> org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:102)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {code}
> The CircleCI config I used to reproduce the test failure can be generated 
> with:
> {code}
> .circleci/generate.sh -p \
>   -e REPEATED_UTESTS_COUNT=500 \
>   -e REPEATED_UTESTS=org.apache.cassandra.cql3.validation.operations.AlterTest
> {code}






[jira] [Commented] (CASSANDRA-18681) Internode legacy SSL storage port certificate is not hot reloaded on update

2023-09-15 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17765882#comment-17765882
 ] 

Jon Meredith commented on CASSANDRA-18681:
--

4.1 [Branch|https://github.com/jonmeredith/cassandra/tree/C18681-4.1] 
[PR|https://github.com/apache/cassandra/pull/2693]
5.0 [Branch|https://github.com/jonmeredith/cassandra/tree/C18681-5.0] 
[PR|https://github.com/apache/cassandra/pull/2694]
Trunk [Branch|https://github.com/jonmeredith/cassandra/tree/C18681-trunk] 
[PR|https://github.com/apache/cassandra/pull/2695]

CI Results (pending):
||Branch||Source||Circle CI||Jenkins||
|cassandra-4.1|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18681-cassandra-4.1-27E812B5-58D5-44D7-8C5E-3B0D3AA5F767]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18681-cassandra-4.1-27E812B5-58D5-44D7-8C5E-3B0D3AA5F767]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/2595/]|
|cassandra-5.0|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18681-cassandra-5.0-27E812B5-58D5-44D7-8C5E-3B0D3AA5F767]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18681-cassandra-5.0-27E812B5-58D5-44D7-8C5E-3B0D3AA5F767]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/2596/]|
|trunk|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18681-trunk-27E812B5-58D5-44D7-8C5E-3B0D3AA5F767]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18681-trunk-27E812B5-58D5-44D7-8C5E-3B0D3AA5F767]|[build|unknown]|


> Internode legacy SSL storage port certificate is not hot reloaded on update
> ---
>
> Key: CASSANDRA-18681
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18681
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> In CASSANDRA-1 the SSLContext cache was changed to clear individual 
> {{EncryptionOptions}} from the SslContext cache if they needed reloading to 
> reduce resource consumption. Before the change if ANY cert needed hot 
> reloading, the SSLContext cache would be cleared for ALL certs.
> If the legacy SSL storage port is configured, a new {{EncryptionOptions}} 
> object is created in {{org.apache.cassandra.net.InboundSockets#addBindings}} 
> just for binding the socket, but never gets cleared as the change in port 
> means it no longer matches the configuration retrieved from 
> {{DatabaseDescriptor}} in 
> {{org.apache.cassandra.net.MessagingServiceMBeanImpl#reloadSslCertificates}}.
> This is unlikely to be an issue in practice as the legacy SSL internode 
> socket is only used in mixed version clusters with pre-4.0 nodes, so the cert 
> only needs to stay valid until all nodes upgrade to 4.x or above.
> One way to avoid this class of failures is to just check the entries present 
> in the SSLContext cache.
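The mismatch described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the Cassandra implementation: `OptionsKey` and `ContextCacheSketch` are hypothetical stand-ins for `EncryptionOptions` and the SSLContext cache, the `String` value stands in for a built context, and the ports used in the usage note are illustrative. It shows why a reload driven by the configured options misses an entry whose port differs, while a reload driven by the cache's own entries does not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical stand-in for EncryptionOptions: only the fields needed here.
final class OptionsKey {
    final String keystore;
    final int port;

    OptionsKey(String keystore, int port) {
        this.keystore = keystore;
        this.port = port;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof OptionsKey)) return false;
        OptionsKey other = (OptionsKey) o;
        return port == other.port && keystore.equals(other.keystore);
    }

    @Override
    public int hashCode() {
        return Objects.hash(keystore, port);
    }
}

// Hypothetical cache; the String value stands in for a built SSL context.
final class ContextCacheSketch {
    private final Map<OptionsKey, String> cache = new ConcurrentHashMap<>();

    void put(OptionsKey key, String context) {
        cache.put(key, context);
    }

    int size() {
        return cache.size();
    }

    // Reload driven by configuration: only keys equal to a configured option
    // are evicted, so an entry created with a different port is never touched.
    List<OptionsKey> reloadFromConfig(List<OptionsKey> configured) {
        List<OptionsKey> evicted = new ArrayList<>();
        for (OptionsKey key : configured) {
            if (cache.remove(key) != null) evicted.add(key);
        }
        return evicted;
    }

    // Reload driven by the cache contents: every cached key is checked.
    List<OptionsKey> reloadFromCache(Predicate<OptionsKey> needsReload) {
        List<OptionsKey> evicted = new ArrayList<>();
        for (OptionsKey key : cache.keySet()) {
            if (needsReload.test(key) && cache.remove(key) != null) evicted.add(key);
        }
        return evicted;
    }
}
```

For example, with one entry cached for port 7000 and another for port 7001, `reloadFromConfig` called with only the port 7000 key leaves the 7001 entry in place, whereas `reloadFromCache` visits both.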



[jira] [Updated] (CASSANDRA-18681) Internode legacy SSL storage port certificate is not hot reloaded on update

2023-09-15 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18681:
-
Test and Documentation Plan: updated tests
 Status: Patch Available  (was: Open)

> Internode legacy SSL storage port certificate is not hot reloaded on update
> ---
>
> Key: CASSANDRA-18681
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18681
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> In CASSANDRA-1 the SSLContext cache was changed to clear individual 
> {{EncryptionOptions}} from the SslContext cache if they needed reloading to 
> reduce resource consumption. Before the change if ANY cert needed hot 
> reloading, the SSLContext cache would be cleared for ALL certs.
> If the legacy SSL storage port is configured, a new {{EncryptionOptions}} 
> object is created in {{org.apache.cassandra.net.InboundSockets#addBindings}} 
> just for binding the socket, but never gets cleared as the change in port 
> means it no longer matches the configuration retrieved from 
> {{DatabaseDescriptor}} in 
> {{org.apache.cassandra.net.MessagingServiceMBeanImpl#reloadSslCertificates}}.
> This is unlikely to be an issue in practice as the legacy SSL internode 
> socket is only used in mixed version clusters with pre-4.0 nodes, so the cert 
> only needs to stay valid until all nodes upgrade to 4.x or above.
> One way to avoid this class of failures is to just check the entries present 
> in the SSLContext cache.



[jira] [Assigned] (CASSANDRA-18681) Internode legacy SSL storage port certificate is not hot reloaded on update

2023-09-15 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith reassigned CASSANDRA-18681:


Assignee: Jon Meredith

> Internode legacy SSL storage port certificate is not hot reloaded on update
> ---
>
> Key: CASSANDRA-18681
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18681
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> In CASSANDRA-1 the SSLContext cache was changed to clear individual 
> {{EncryptionOptions}} from the SslContext cache if they needed reloading to 
> reduce resource consumption. Before the change if ANY cert needed hot 
> reloading, the SSLContext cache would be cleared for ALL certs.
> If the legacy SSL storage port is configured, a new {{EncryptionOptions}} 
> object is created in {{org.apache.cassandra.net.InboundSockets#addBindings}} 
> just for binding the socket, but never gets cleared as the change in port 
> means it no longer matches the configuration retrieved from 
> {{DatabaseDescriptor}} in 
> {{org.apache.cassandra.net.MessagingServiceMBeanImpl#reloadSslCertificates}}.
> This is unlikely to be an issue in practice as the legacy SSL internode 
> socket is only used in mixed version clusters with pre-4.0 nodes, so the cert 
> only needs to stay valid until all nodes upgrade to 4.x or above.
> One way to avoid this class of failures is to just check the entries present 
> in the SSLContext cache.



[jira] [Updated] (CASSANDRA-18841) InstanceClassLoader leak in 5.0/trunk

2023-09-15 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18841:
-
      Fix Version/s: 4.1.4, 5.0-alpha2, 5.1  (was: 5.x, 5.0.x)
      Since Version: 4.1-alpha1
Source Control Link: https://github.com/apache/cassandra/commit/8bfe0e5878c64ed25591aae50643187bc8ab7241
         Resolution: Fixed
             Status: Resolved  (was: Ready to Commit)

> InstanceClassLoader leak in 5.0/trunk
> -
>
> Key: CASSANDRA-18841
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18841
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/java
>Reporter: Doug Rohrer
>Assignee: Doug Rohrer
>Priority: Normal
> Fix For: 4.1.4, 5.0-alpha2, 5.1
>
> Attachments: trunk_ThreadLocal_leak.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Something in the 5.0/trunk branches has caused an in-jvm dtest 
> InstanceClassLoader leak - it appears to have something to do with the Mutual 
> TLS Authenticator (f078c02cb58bddd735490b07548f7352f0eb09aa) but nothing in 
> that commit, so far, has stood out as causing issues.
> The culprit class appears to be 
> {{io.netty.util.internal.InternalThreadLocalMap}}, which seems not to be
> removed when the thread stops for some reason.
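The general mechanism behind this kind of leak can be sketched with the JDK alone. This is not a reproduction of the reported leak and does not use Netty: `ThreadLocalLeakSketch` and its `STATE` field are hypothetical, and a plain `ThreadLocal` stands in for the per-thread map named above. It only shows that a thread-local value stays attached to its thread after the task that set it has finished, unless something removes it explicitly.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ThreadLocalLeakSketch {
    // Stand-in for per-thread state; anything stored here stays reachable
    // from the thread until it is removed or the thread itself is collected.
    private static final ThreadLocal<Object> STATE = new ThreadLocal<>();

    // Runs two tasks on the same worker thread and reports whether the second
    // task still sees the value stored by the first.
    static boolean valueSurvives(boolean cleanUp) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            pool.submit(() -> {
                STATE.set(new Object());
                if (cleanUp) STATE.remove(); // explicit cleanup detaches the value
            }).get();
            return pool.submit(() -> STATE.get() != null).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

If the retained value was loaded by a per-instance class loader, the thread keeps that loader reachable for as long as the value stays attached, which is the general shape of a class loader leak.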



[jira] [Commented] (CASSANDRA-18841) InstanceClassLoader leak in 5.0/trunk

2023-09-15 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17765836#comment-17765836
 ] 

Jon Meredith commented on CASSANDRA-18841:
--

Thanks for the review, and the extra info. I've gone ahead and merged to make 
testing the sidecar easier; we can follow up with a new JIRA for the other 
items once investigated.

> InstanceClassLoader leak in 5.0/trunk
> -
>
> Key: CASSANDRA-18841
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18841
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/java
>Reporter: Doug Rohrer
>Assignee: Doug Rohrer
>Priority: Normal
> Fix For: 5.0.x, 5.x
>
> Attachments: trunk_ThreadLocal_leak.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Something in the 5.0/trunk branches has caused an in-jvm dtest 
> InstanceClassLoader leak - it appears to have something to do with the Mutual 
> TLS Authenticator (f078c02cb58bddd735490b07548f7352f0eb09aa) but nothing in 
> that commit, so far, has stood out as causing issues.
> The culprit class appears to be 
> {{io.netty.util.internal.InternalThreadLocalMap}}, which seems not to be
> removed when the thread stops for some reason.



[jira] [Updated] (CASSANDRA-18841) InstanceClassLoader leak in 5.0/trunk

2023-09-15 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18841:
-
Status: Ready to Commit  (was: Review In Progress)

> InstanceClassLoader leak in 5.0/trunk
> -
>
> Key: CASSANDRA-18841
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18841
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/java
>Reporter: Doug Rohrer
>Assignee: Doug Rohrer
>Priority: Normal
> Fix For: 5.0.x, 5.x
>
> Attachments: trunk_ThreadLocal_leak.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Something in the 5.0/trunk branches has caused an in-jvm dtest 
> InstanceClassLoader leak - it appears to have something to do with the Mutual 
> TLS Authenticator (f078c02cb58bddd735490b07548f7352f0eb09aa) but nothing in 
> that commit, so far, has stood out as causing issues.
> The culprit class appears to be 
> {{io.netty.util.internal.InternalThreadLocalMap}}, which seems not to be
> removed when the thread stops for some reason.



[jira] [Comment Edited] (CASSANDRA-18841) InstanceClassLoader leak in 5.0/trunk

2023-09-14 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17765364#comment-17765364
 ] 

Jon Meredith edited comment on CASSANDRA-18841 at 9/14/23 9:14 PM:
---

CI Results (pending):
||Branch||Source||Circle CI||Jenkins||
|cassandra-4.1|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18841-cassandra-4.1-CC040354-8525-4D6E-B4FF-002AF85C683A]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18841-cassandra-4.1-CC040354-8525-4D6E-B4FF-002AF85C683A]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/2592/]|
|cassandra-5.0|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18841-cassandra-5.0-CC040354-8525-4D6E-B4FF-002AF85C683A]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18841-cassandra-5.0-CC040354-8525-4D6E-B4FF-002AF85C683A]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/2593/]|
|trunk|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18841-trunk-CC040354-8525-4D6E-B4FF-002AF85C683A]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18841-trunk-CC040354-8525-4D6E-B4FF-002AF85C683A]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/2594/]|


was (Author: jonmeredith):
Starting commit

CI Results (pending):
||Branch||Source||Circle CI||Jenkins||
|cassandra-4.1|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18841-cassandra-4.1-CC040354-8525-4D6E-B4FF-002AF85C683A]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18841-cassandra-4.1-CC040354-8525-4D6E-B4FF-002AF85C683A]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/2592/]|
|cassandra-5.0|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18841-cassandra-5.0-CC040354-8525-4D6E-B4FF-002AF85C683A]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18841-cassandra-5.0-CC040354-8525-4D6E-B4FF-002AF85C683A]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/2593/]|
|trunk|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18841-trunk-CC040354-8525-4D6E-B4FF-002AF85C683A]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18841-trunk-CC040354-8525-4D6E-B4FF-002AF85C683A]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/2594/]|

> InstanceClassLoader leak in 5.0/trunk
> -
>
> Key: CASSANDRA-18841
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18841
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/java
>Reporter: Doug Rohrer
>Assignee: Doug Rohrer
>Priority: Normal
> Fix For: 5.0.x, 5.x
>
> Attachments: trunk_ThreadLocal_leak.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Something in the 5.0/trunk branches has caused an in-jvm dtest 
> InstanceClassLoader leak - it appears to have something to do with the Mutual 
> TLS Authenticator (f078c02cb58bddd735490b07548f7352f0eb09aa) but nothing in 
> that commit, so far, has stood out as causing issues.
> The culprit class appears to be 
> {{io.netty.util.internal.InternalThreadLocalMap}}, which seems not to be
> removed when the thread stops for some reason.



[jira] [Commented] (CASSANDRA-18841) InstanceClassLoader leak in 5.0/trunk

2023-09-14 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17765364#comment-17765364
 ] 

Jon Meredith commented on CASSANDRA-18841:
--

Starting commit

CI Results (pending):
||Branch||Source||Circle CI||Jenkins||
|cassandra-4.1|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18841-cassandra-4.1-CC040354-8525-4D6E-B4FF-002AF85C683A]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18841-cassandra-4.1-CC040354-8525-4D6E-B4FF-002AF85C683A]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/2592/]|
|cassandra-5.0|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18841-cassandra-5.0-CC040354-8525-4D6E-B4FF-002AF85C683A]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18841-cassandra-5.0-CC040354-8525-4D6E-B4FF-002AF85C683A]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/2593/]|
|trunk|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18841-trunk-CC040354-8525-4D6E-B4FF-002AF85C683A]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18841-trunk-CC040354-8525-4D6E-B4FF-002AF85C683A]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/2594/]|

> InstanceClassLoader leak in 5.0/trunk
> -
>
> Key: CASSANDRA-18841
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18841
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/java
>Reporter: Doug Rohrer
>Assignee: Doug Rohrer
>Priority: Normal
> Fix For: 5.0.x, 5.x
>
> Attachments: trunk_ThreadLocal_leak.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Something in the 5.0/trunk branches has caused an in-jvm dtest 
> InstanceClassLoader leak - it appears to have something to do with the Mutual 
> TLS Authenticator (f078c02cb58bddd735490b07548f7352f0eb09aa) but nothing in 
> that commit, so far, has stood out as causing issues.
> The culprit class appears to be 
> {{io.netty.util.internal.InternalThreadLocalMap}}, which seems not to be
> removed when the thread stops for some reason.



[jira] [Commented] (CASSANDRA-18841) InstanceClassLoader leak in 5.0/trunk

2023-09-14 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17765356#comment-17765356
 ] 

Jon Meredith commented on CASSANDRA-18841:
--

+1 from me. [~smiklosovic], do you have time to review this small PR? It's 
just a cleanup found after CASSANDRA-18725, which you helped review. I'll start a 
CI run.

> InstanceClassLoader leak in 5.0/trunk
> -
>
> Key: CASSANDRA-18841
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18841
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/java
>Reporter: Doug Rohrer
>Assignee: Doug Rohrer
>Priority: Normal
> Fix For: 5.0.x, 5.x
>
> Attachments: trunk_ThreadLocal_leak.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Something in the 5.0/trunk branches has caused an in-jvm dtest 
> InstanceClassLoader leak - it appears to have something to do with the Mutual 
> TLS Authenticator (f078c02cb58bddd735490b07548f7352f0eb09aa) but nothing in 
> that commit, so far, has stood out as causing issues.
> The culprit class appears to be 
> {{io.netty.util.internal.InternalThreadLocalMap}}, which seems not to be
> removed when the thread stops for some reason.



[jira] [Updated] (CASSANDRA-18841) InstanceClassLoader leak in 5.0/trunk

2023-09-14 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18841:
-
Reviewers: Francisco Guerrero, Jon Meredith  (was: Francisco Guerrero)

> InstanceClassLoader leak in 5.0/trunk
> -
>
> Key: CASSANDRA-18841
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18841
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/java
>Reporter: Doug Rohrer
>Assignee: Doug Rohrer
>Priority: Normal
> Fix For: 5.0.x, 5.x
>
> Attachments: trunk_ThreadLocal_leak.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Something in the 5.0/trunk branches has caused an in-jvm dtest 
> InstanceClassLoader leak - it appears to have something to do with the Mutual 
> TLS Authenticator (f078c02cb58bddd735490b07548f7352f0eb09aa) but nothing in 
> that commit, so far, has stood out as causing issues.
> The culprit class appears to be 
> {{io.netty.util.internal.InternalThreadLocalMap}}, which seems not to be
> removed when the thread stops for some reason.



[jira] [Assigned] (CASSANDRA-18360) Test Failure: o.a.c.cql3.validation.operations.AlterTest#testDropListAndAddListWithSameName

2023-09-11 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith reassigned CASSANDRA-18360:


Assignee: Jon Meredith

> Test Failure: 
> o.a.c.cql3.validation.operations.AlterTest#testDropListAndAddListWithSameName
> ---
>
> Key: CASSANDRA-18360
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18360
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: Andres de la Peña
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 5.x
>
>
> The unit test 
> {{org.apache.cassandra.cql3.validation.operations.AlterTest#testDropListAndAddListWithSameName}}
>  is flaky at least on trunk. Flakiness seems to be below 1%.
> While I haven't seen it on Jenkins yet, it can easily be reproduced on 
> CircleCI with the multiplexer:
> * 
> https://app.circleci.com/pipelines/github/adelapena/cassandra/2731/workflows/58edc2f6-9a21-4d09-b783-b7fb15e1b320/jobs/32235
> * 
> https://app.circleci.com/pipelines/github/adelapena/cassandra/2731/workflows/58edc2f6-9a21-4d09-b783-b7fb15e1b320/jobs/32234
> * 
> https://app.circleci.com/pipelines/github/adelapena/cassandra/2731/workflows/739a95d3-8e42-4447-93dd-122fc16fdd7d/jobs/32233/tests
> Those runs show two types of errors:
> {code}
> junit.framework.AssertionFailedError: Invalid value for row 0 column 1 
> (content of type text), expected  but got 
>   at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1506)
>   at 
> org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:102)
> {code}
> and: 
> {code}
> org.apache.cassandra.serializers.MarshalException: Invalid UTF-8 bytes 
> 00e0279515437f00
>   at 
> org.apache.cassandra.serializers.AbstractTextSerializer.deserialize(AbstractTextSerializer.java:46)
>   at 
> org.apache.cassandra.serializers.AbstractTextSerializer.deserialize(AbstractTextSerializer.java:29)
>   at 
> org.apache.cassandra.serializers.TypeSerializer.deserialize(TypeSerializer.java:37)
>   at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1494)
>   at 
> org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:102)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {code}
> The CircleCI config I used to reproduce the test failure can be generated 
> with:
> {code}
> .circleci/generate.sh -p \
>   -e REPEATED_UTESTS_COUNT=500 \
>   -e REPEATED_UTESTS=org.apache.cassandra.cql3.validation.operations.AlterTest
> {code}
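The byte string in the MarshalException above can be checked against strict UTF-8 decoding with the JDK. This is a minimal sketch, not the code path of Cassandra's serializer: `Utf8CheckSketch` is a hypothetical helper, and the byte array is simply the hex value `00e0279515437f00` from the error message. The byte `0xe0` opens a three-byte sequence, but the following `0x27` is not a continuation byte, so a strict decoder rejects the input.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8CheckSketch {
    // The bytes reported in the MarshalException message: 00e0279515437f00.
    static final byte[] REPORTED = {
        0x00, (byte) 0xe0, 0x27, (byte) 0x95, 0x15, 0x43, 0x7f, 0x00
    };

    // Returns true only if the whole array decodes as well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

That the reported value is not well-formed text at all may indicate that the bytes were never written as a text value, though the report itself does not establish where they came from.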



[jira] [Commented] (CASSANDRA-18360) Test Failure: o.a.c.cql3.validation.operations.AlterTest#testDropListAndAddListWithSameName

2023-09-08 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763273#comment-17763273
 ] 

Jon Meredith commented on CASSANDRA-18360:
--

I've reproduced Andres' original runs on CircleCI and agree the merge commit is 
where the issue is introduced. I tried applying the 4.1 change before the 
memory tracking improvements in CASSANDRA-17240 and still hit failures with the 
repeat test, so I don't think there's an interaction there. I'll keep looking.


> Test Failure: 
> o.a.c.cql3.validation.operations.AlterTest#testDropListAndAddListWithSameName
> ---
>
> Key: CASSANDRA-18360
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18360
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: Andres de la Peña
>Priority: Normal
> Fix For: 5.x
>
>
> The unit test 
> {{org.apache.cassandra.cql3.validation.operations.AlterTest#testDropListAndAddListWithSameName}}
>  is flaky at least on trunk. Flakiness seems to be below 1%.
> While I haven't seen it on Jenkins yet, it can easily be reproduced on 
> CircleCI with the multiplexer:
> * 
> https://app.circleci.com/pipelines/github/adelapena/cassandra/2731/workflows/58edc2f6-9a21-4d09-b783-b7fb15e1b320/jobs/32235
> * 
> https://app.circleci.com/pipelines/github/adelapena/cassandra/2731/workflows/58edc2f6-9a21-4d09-b783-b7fb15e1b320/jobs/32234
> * 
> https://app.circleci.com/pipelines/github/adelapena/cassandra/2731/workflows/739a95d3-8e42-4447-93dd-122fc16fdd7d/jobs/32233/tests
> Those runs show two types of errors:
> {code}
> junit.framework.AssertionFailedError: Invalid value for row 0 column 1 
> (content of type text), expected  but got 
>   at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1506)
>   at 
> org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:102)
> {code}
> and: 
> {code}
> org.apache.cassandra.serializers.MarshalException: Invalid UTF-8 bytes 
> 00e0279515437f00
>   at 
> org.apache.cassandra.serializers.AbstractTextSerializer.deserialize(AbstractTextSerializer.java:46)
>   at 
> org.apache.cassandra.serializers.AbstractTextSerializer.deserialize(AbstractTextSerializer.java:29)
>   at 
> org.apache.cassandra.serializers.TypeSerializer.deserialize(TypeSerializer.java:37)
>   at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1494)
>   at 
> org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:102)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {code}
> The CircleCI config I used to reproduce the test failure can be generated 
> with:
> {code}
> .circleci/generate.sh -p \
>   -e REPEATED_UTESTS_COUNT=500 \
>   -e REPEATED_UTESTS=org.apache.cassandra.cql3.validation.operations.AlterTest
> {code}



[jira] [Commented] (CASSANDRA-18360) Test Failure: o.a.c.cql3.validation.operations.AlterTest#testDropListAndAddListWithSameName

2023-09-06 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762526#comment-17762526
 ] 

Jon Meredith commented on CASSANDRA-18360:
--

I checked over the merge quickly and didn't see any issues. I'll try to 
investigate more tomorrow. 

> Test Failure: 
> o.a.c.cql3.validation.operations.AlterTest#testDropListAndAddListWithSameName
> ---
>
> Key: CASSANDRA-18360
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18360
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: Andres de la Peña
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 5.x
>
>
> The unit test 
> {{org.apache.cassandra.cql3.validation.operations.AlterTest#testDropListAndAddListWithSameName}}
>  is flaky at least on trunk. Flakiness seems to be below 1%.
> While I haven't seen it on Jenkins yet, it can easily be reproduced on 
> CircleCI with the multiplexer:
> * 
> https://app.circleci.com/pipelines/github/adelapena/cassandra/2731/workflows/58edc2f6-9a21-4d09-b783-b7fb15e1b320/jobs/32235
> * 
> https://app.circleci.com/pipelines/github/adelapena/cassandra/2731/workflows/58edc2f6-9a21-4d09-b783-b7fb15e1b320/jobs/32234
> * 
> https://app.circleci.com/pipelines/github/adelapena/cassandra/2731/workflows/739a95d3-8e42-4447-93dd-122fc16fdd7d/jobs/32233/tests
> Those runs show two types of errors:
> {code}
> junit.framework.AssertionFailedError: Invalid value for row 0 column 1 
> (content of type text), expected  but got 
>   at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1506)
>   at 
> org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:102)
> {code}
> and: 
> {code}
> org.apache.cassandra.serializers.MarshalException: Invalid UTF-8 bytes 
> 00e0279515437f00
>   at 
> org.apache.cassandra.serializers.AbstractTextSerializer.deserialize(AbstractTextSerializer.java:46)
>   at 
> org.apache.cassandra.serializers.AbstractTextSerializer.deserialize(AbstractTextSerializer.java:29)
>   at 
> org.apache.cassandra.serializers.TypeSerializer.deserialize(TypeSerializer.java:37)
>   at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1494)
>   at 
> org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:102)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {code}
> The CircleCI config I used to reproduce the test failure can be generated 
> with:
> {code}
> .circleci/generate.sh -p \
>   -e REPEATED_UTESTS_COUNT=500 \
>   -e REPEATED_UTESTS=org.apache.cassandra.cql3.validation.operations.AlterTest
> {code}



[jira] [Commented] (CASSANDRA-18815) Fix dtests: replace_address_test.TestReplaceAddress.test_restart_failed_replace and others

2023-09-01 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17761407#comment-17761407
 ] 

Jon Meredith commented on CASSANDRA-18815:
--

The difference in behavior is interesting and I don't have a good explanation 
for it. The clusters where we saw the deadlock were running 4.1, and the changes in 
18733 could definitely affect the order in which threads see the message. 

If the test is causing the log error, then adding a suppression for it seems 
like the way to go.

> Fix dtests: 
> replace_address_test.TestReplaceAddress.test_restart_failed_replace and others
> --
>
> Key: CASSANDRA-18815
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18815
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination
>Reporter: Brandon Williams
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 5.0.x, 5.x
>
>
> https://ci-cassandra.apache.org/job/Cassandra-5.0/18/testReport/dtest-large.replace_address_test/TestReplaceAddress/test_restart_failed_replace/
> This and other similar tests recently started failing:
> {noformat}
> Unexpected error found in node logs (see stdout for full details). Errors: 
> [[replacement] 'ERROR [Stream-Deserializer-/127.0.0.1:7000-91782e47] 
> 2023-08-29 23:05:51,677 StreamSession.java:700 - [Stream 
> #990152d0-46c0-11ee-9290-158c46e94542] Socket closed before session 
> completion, peer 127.0.0.1:7000 is probably 
> down.\njava.nio.channels.ClosedChannelException: null\n\tat 
> org.apache.cassandra.net.AsyncStreamingInputPlus.reBuffer(AsyncStreamingInputPlus.java:119)\n\tat
>  
> org.apache.cassandra.io.util.RebufferingInputStream.readByte(RebufferingInputStream.java:178)\n\tat
>  
> org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49)\n\tat
>  
> org.apache.cassandra.streaming.StreamDeserializingTask.run(StreamDeserializingTask.java:59)\n\tat
>  
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat
>  java.base/java.lang.Thread.run(Thread.java:833)']
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-18811) Set right client auth for creating SSL context in mTLS optional mode

2023-08-31 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18811:
-
Test and Documentation Plan: Tests updated
 Status: Patch Available  (was: In Progress)

> Set right client auth for creating SSL context in mTLS optional mode
> 
>
> Key: CASSANDRA-18811
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18811
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client, Messaging/Internode
>Reporter: Jyothsna Konisa
>Assignee: Jyothsna Konisa
>Priority: Normal
> Fix For: 4.1.x, 5.0-alpha, 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Adds a new value, `optional`, for require_client_auth in the encryption options. 
> When require_client_auth is optional, the SSL context that is created will 
> allow client connections that provide a client certificate as well as 
> client connections that do not provide certificates.
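
As a rough illustration of the behavior described above, the sketch below maps a require_client_auth value onto the JDK's SSLParameters flags: "need" rejects clients without a certificate, while "want" asks for one but still accepts clients that send none. The ClientAuthSketch class, the ClientAuthMode enum, and the accepted strings are assumptions made for this example and are not taken from the Cassandra patch.

```java
import java.util.Locale;
import javax.net.ssl.SSLParameters;

public class ClientAuthSketch {
    // Hypothetical enum standing in for the possible require_client_auth values.
    public enum ClientAuthMode { NOT_REQUIRED, OPTIONAL, REQUIRED }

    // Parse a configuration string; accepting "true"/"false" here is an assumption.
    public static ClientAuthMode parse(String value) {
        switch (value.trim().toLowerCase(Locale.ROOT)) {
            case "optional":
                return ClientAuthMode.OPTIONAL;
            case "true":
            case "required":
                return ClientAuthMode.REQUIRED;
            default:
                return ClientAuthMode.NOT_REQUIRED;
        }
    }

    // Translate the mode into the JDK's want/need client-auth flags.
    public static SSLParameters configure(ClientAuthMode mode) {
        SSLParameters params = new SSLParameters();
        switch (mode) {
            case REQUIRED:
                params.setNeedClientAuth(true); // handshake fails without a client certificate
                break;
            case OPTIONAL:
                params.setWantClientAuth(true); // certificate requested but not mandatory
                break;
            default:
                break; // no client certificate requested
        }
        return params;
    }

    public static void main(String[] args) {
        SSLParameters p = configure(parse("optional"));
        System.out.println("want=" + p.getWantClientAuth() + " need=" + p.getNeedClientAuth());
        // prints "want=true need=false"
    }
}
```

The same want/need distinction exists on SSLEngine and SSLServerSocket, so the mapping would carry over if the flags were set there instead.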






[jira] [Updated] (CASSANDRA-18811) Set right client auth for creating SSL context in mTLS optional mode

2023-08-31 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18811:
-
Reviewers: Jon Meredith, Jon Meredith
   Jon Meredith, Jon Meredith  (was: Jon Meredith)
   Status: Review In Progress  (was: Patch Available)

> Set right client auth for creating SSL context in mTLS optional mode
> 
>
> Key: CASSANDRA-18811
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18811
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client, Messaging/Internode
>Reporter: Jyothsna Konisa
>Assignee: Jyothsna Konisa
>Priority: Normal
> Fix For: 4.1.x, 5.0-alpha, 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Adds a new value, `optional`, for require_client_auth in the encryption options. 
> When require_client_auth is optional, the SSL context that is created will 
> allow client connections that provide a client certificate as well as 
> client connections that do not provide certificates.






[jira] [Updated] (CASSANDRA-18811) Set right client auth for creating SSL context in mTLS optional mode

2023-08-31 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18811:
-
 Bug Category: Parent values: Availability(12983)
   Complexity: Normal
  Component/s: Messaging/Client
   Messaging/Internode
Discovered By: User Report
Fix Version/s: 4.1.x
   5.0-alpha
   5.x
 Severity: Low
   Status: Open  (was: Triage Needed)

> Set right client auth for creating SSL context in mTLS optional mode
> 
>
> Key: CASSANDRA-18811
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18811
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client, Messaging/Internode
>Reporter: Jyothsna Konisa
>Assignee: Jyothsna Konisa
>Priority: Normal
> Fix For: 4.1.x, 5.0-alpha, 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Adds a new value, `optional`, for require_client_auth in the encryption options. 
> When require_client_auth is optional, the SSL context that is created will 
> allow client connections that provide a client certificate as well as 
> client connections that do not provide certificates.






[jira] [Updated] (CASSANDRA-18722) Support Dynamic Port Allocation for in-jvm dtest framework

2023-08-29 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18722:
-
Fix Version/s: 4.0.12
   4.1.4
   5.0
   5.x
   (was: 4.0.x)
   (was: 4.1.x)
   (was: 5.0.x)

> Support Dynamic Port Allocation for in-jvm dtest framework
> --
>
> Key: CASSANDRA-18722
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18722
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: Francisco Guerrero
>Assignee: Francisco Guerrero
>Priority: Normal
> Fix For: 4.0.12, 4.1.4, 5.0, 5.x
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Currently, {{INodeProvisionStrategy}} supports two strategies: 
> {{OneNetworkInterface}} and {{MultipleNetworkInterfaces}}. However, the 
> {{seedPort}}, {{storagePorts}}, {{nativeTransportPorts}}, and {{jmxPorts}} 
> are always fixed or a function of the node number.
> In order to better support parallel test runs, we need to support dynamic 
> port allocation for the {{seedPort}}, {{storagePorts}}, 
> {{nativeTransportPorts}}, and {{jmxPorts}}. This would enable us to more 
> easily write tests that can run in parallel. This effort is only a stepping 
> stone in what's required to run more tests in parallel, but it allows us to 
> begin somewhere with the in-jvm dtest framework.
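
One common way to obtain dynamically allocated ports, sketched here for illustration, is to bind a ServerSocket to port 0 so that the operating system picks a free port. The DynamicPortSketch class and its allocatePorts helper are hypothetical and are not the code committed for this ticket. All sockets are held open until every port has been chosen so that the returned ports are distinct.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DynamicPortSketch {
    // Ask the OS for `count` distinct free ports instead of deriving them from the node number.
    public static int[] allocatePorts(int count) {
        List<ServerSocket> held = new ArrayList<>();
        try {
            int[] ports = new int[count];
            for (int i = 0; i < count; i++) {
                ServerSocket socket = new ServerSocket(0); // 0 = let the OS choose a free port
                held.add(socket);                          // keep it bound so the port is not handed out twice
                ports[i] = socket.getLocalPort();
            }
            return ports;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            for (ServerSocket socket : held) {
                try {
                    socket.close();
                } catch (IOException ignored) {
                    // best-effort cleanup in a sketch
                }
            }
        }
    }

    public static void main(String[] args) {
        // For example: one port each for seed, storage, native transport, and JMX.
        System.out.println(Arrays.toString(allocatePorts(4)));
    }
}
```

A caveat of this approach is that a port is released before the test node binds it, so another process could claim it in between; a real framework would need to tolerate or retry on that.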






[jira] [Updated] (CASSANDRA-18733) Waiting indefinitely on ReceivedMessage response in StreamSession#receive() can cause deadlock

2023-08-29 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18733:
-
  Fix Version/s: 4.1.4
 5.0
 (was: 4.1.x)
 (was: 5.0.x)
  Since Version: 4.1.0
Source Control Link:  
https://github.com/apache/cassandra/commit/bde4fa0013eb8cec5b1d88b21ca4463bc07272bb
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> Waiting indefinitely on ReceivedMessage response in StreamSession#receive() 
> can cause deadlock
> --
>
> Key: CASSANDRA-18733
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18733
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair, Consistency/Streaming
>Reporter: Caleb Rackliffe
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.1.4, 5.0, 5.x
>
>
> I've observed in a recent stack trace from a node running 4.1 what looks like 
> a deadlock around the {{StreamSession}} monitor lock when 
> {{StreamSession#receive()}} waits via {{syncUninterruptibly()}} for a response 
> to a control message.
> {noformat}
> "Messaging-EventLoop-3-10" #320 daemon prio=5 os_prio=0 cpu=57979617.98ms 
> elapsed=5587916.03s tid=0x7f056e88ae00 nid=0x80ec waiting for monitor 
> entry  [0x7f056d277000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:524)
> - waiting to lock <0x0006816fae70> (a 
> org.apache.cassandra.streaming.StreamSession)
> at 
> org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:690)
> at 
> org.apache.cassandra.streaming.async.StreamingMultiplexedChannel.onMessageComplete(StreamingMultiplexedChannel.java:264)
> at 
> org.apache.cassandra.streaming.async.StreamingMultiplexedChannel.lambda$sendMessage$1(StreamingMultiplexedChannel.java:233)
> at 
> org.apache.cassandra.streaming.async.StreamingMultiplexedChannel$$Lambda$2029/0x0008007a0c40.operationComplete(Unknown
>  Source)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList.notifyListener(ListenerList.java:134)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList.notifyListener(ListenerList.java:148)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList$GenericFutureListenerList.notifySelf(ListenerList.java:190)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList.lambda$notifyExclusive$0(ListenerList.java:124)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList$$Lambda$950/0x000800666040.accept(Unknown
>  Source)
> at 
> org.apache.cassandra.utils.concurrent.IntrusiveStack.forEach(IntrusiveStack.java:195)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList.notifyExclusive(ListenerList.java:124)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList.notify(ListenerList.java:96)
> at 
> org.apache.cassandra.utils.concurrent.AsyncFuture.trySet(AsyncFuture.java:104)
> at 
> org.apache.cassandra.utils.concurrent.AbstractFuture.tryFailure(AbstractFuture.java:148)
> at 
> org.apache.cassandra.utils.concurrent.AsyncPromise.tryFailure(AsyncPromise.java:139)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:1009)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:870)
> at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764)
> at 
> io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1071)
> at 
> io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
> at 
> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.lang.Thread.run(java.base@11.0.16/Thread.java:829)
> {noformat}
> It seems that while {{receive()}} is holding the monitor lock on 
> 
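
The hazard described in this ticket is a thread waiting without a time limit for a response while holding a monitor that the completing thread also needs. A general way to avoid that shape of deadlock is to bound the wait, sketched below. The BoundedWaitSketch class and its receive and onError methods are hypothetical stand-ins and do not reproduce the committed fix, which this message does not show.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedWaitSketch {
    private final Object monitor = new Object();

    // Waits for the acknowledgement while holding the monitor, but never longer than
    // timeoutMillis, so a callback that needs the same monitor is not blocked forever.
    public boolean receive(CompletableFuture<Void> ack, long timeoutMillis) {
        synchronized (monitor) {
            try {
                ack.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException | ExecutionException e) {
                return false; // give up; leaving the block releases the monitor
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    // Stand-in for an error callback that must take the same monitor before failing the future.
    public void onError(CompletableFuture<Void> ack) {
        synchronized (monitor) {
            ack.completeExceptionally(new RuntimeException("channel closed"));
        }
    }

    public static void main(String[] args) throws Exception {
        BoundedWaitSketch session = new BoundedWaitSketch();
        CompletableFuture<Void> ack = new CompletableFuture<>();
        Thread callback = new Thread(() -> session.onError(ack));
        callback.start();
        boolean acknowledged = session.receive(ack, 200);
        callback.join(); // terminates in either interleaving because the wait is bounded
        System.out.println("acknowledged=" + acknowledged);
    }
}
```

With an unbounded wait in receive(), the callback thread could never enter onError() once receive() held the monitor, and neither thread would make progress.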

[jira] [Updated] (CASSANDRA-18733) Waiting indefinitely on ReceivedMessage response in StreamSession#receive() can cause deadlock

2023-08-28 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18733:
-
Status: Ready to Commit  (was: Review In Progress)

Starting commit

CI Results (pending):
||Branch||Source||Circle CI||Jenkins||
|cassandra-4.1|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18733-cassandra-4.1-0B9591D6-E2C9-4E53-83DF-85E573176C6E]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18733-cassandra-4.1-0B9591D6-E2C9-4E53-83DF-85E573176C6E]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/2586/]|
|cassandra-5.0|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18733-cassandra-5.0-0B9591D6-E2C9-4E53-83DF-85E573176C6E]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18733-cassandra-5.0-0B9591D6-E2C9-4E53-83DF-85E573176C6E]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/2587/]|
|trunk|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18733-trunk-0B9591D6-E2C9-4E53-83DF-85E573176C6E]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18733-trunk-0B9591D6-E2C9-4E53-83DF-85E573176C6E]|[build|unknown]|

> Waiting indefinitely on ReceivedMessage response in StreamSession#receive() 
> can cause deadlock
> --
>
> Key: CASSANDRA-18733
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18733
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair, Consistency/Streaming
>Reporter: Caleb Rackliffe
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.1.x, 5.0.x, 5.x
>
>
> I've observed in a recent stack trace from a node running 4.1 what looks like 
> a deadlock around the {{StreamSession}} monitor lock when 
> {{StreamSession#receive()}} waits via {{syncUninterruptibly()}} for a response 
> to a control message.
> {noformat}
> "Messaging-EventLoop-3-10" #320 daemon prio=5 os_prio=0 cpu=57979617.98ms 
> elapsed=5587916.03s tid=0x7f056e88ae00 nid=0x80ec waiting for monitor 
> entry  [0x7f056d277000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:524)
> - waiting to lock <0x0006816fae70> (a 
> org.apache.cassandra.streaming.StreamSession)
> at 
> org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:690)
> at 
> org.apache.cassandra.streaming.async.StreamingMultiplexedChannel.onMessageComplete(StreamingMultiplexedChannel.java:264)
> at 
> org.apache.cassandra.streaming.async.StreamingMultiplexedChannel.lambda$sendMessage$1(StreamingMultiplexedChannel.java:233)
> at 
> org.apache.cassandra.streaming.async.StreamingMultiplexedChannel$$Lambda$2029/0x0008007a0c40.operationComplete(Unknown
>  Source)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList.notifyListener(ListenerList.java:134)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList.notifyListener(ListenerList.java:148)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList$GenericFutureListenerList.notifySelf(ListenerList.java:190)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList.lambda$notifyExclusive$0(ListenerList.java:124)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList$$Lambda$950/0x000800666040.accept(Unknown
>  Source)
> at 
> org.apache.cassandra.utils.concurrent.IntrusiveStack.forEach(IntrusiveStack.java:195)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList.notifyExclusive(ListenerList.java:124)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList.notify(ListenerList.java:96)
> at 
> org.apache.cassandra.utils.concurrent.AsyncFuture.trySet(AsyncFuture.java:104)
> at 
> org.apache.cassandra.utils.concurrent.AbstractFuture.tryFailure(AbstractFuture.java:148)
> at 
> org.apache.cassandra.utils.concurrent.AsyncPromise.tryFailure(AsyncPromise.java:139)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:1009)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:870)
> at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
> at 
> 

[jira] [Updated] (CASSANDRA-18722) Support Dynamic Port Allocation for in-jvm dtest framework

2023-08-28 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18722:
-
Source Control Link: 
https://github.com/apache/cassandra/commit/6ffa43f68b8d10ca84d4a00bf81269527b4e14df
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> Support Dynamic Port Allocation for in-jvm dtest framework
> --
>
> Key: CASSANDRA-18722
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18722
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: Francisco Guerrero
>Assignee: Francisco Guerrero
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, {{INodeProvisionStrategy}} supports two strategies: 
> {{OneNetworkInterface}} and {{MultipleNetworkInterfaces}}. However, the 
> {{seedPort}}, {{storagePorts}}, {{nativeTransportPorts}}, and {{jmxPorts}} 
> are always fixed or a function of the node number.
> In order to better support parallel test runs, we need to support dynamic 
> port allocation for the {{seedPort}}, {{storagePorts}}, 
> {{nativeTransportPorts}}, and {{jmxPorts}}. This would enable us to more 
> easily write tests that can run in parallel. This effort is only a stepping 
> stone in what's required to run more tests in parallel, but it allows us to 
> begin somewhere with the in-jvm dtest framework.






[jira] [Commented] (CASSANDRA-18722) Support Dynamic Port Allocation for in-jvm dtest framework

2023-08-25 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759169#comment-17759169
 ] 

Jon Meredith commented on CASSANDRA-18722:
--

Rebased after CASSANDRA-18725 and kicked CI

CI Results (pending):
||Branch||Source||Circle CI||Jenkins||
|cassandra-4.0|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18722-cassandra-4.0-6F6F4FED-DACE-48BC-AF92-65E55DC63F13]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18722-cassandra-4.0-6F6F4FED-DACE-48BC-AF92-65E55DC63F13]|[build|unknown]|
|cassandra-4.1|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18722-cassandra-4.1-6F6F4FED-DACE-48BC-AF92-65E55DC63F13]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18722-cassandra-4.1-6F6F4FED-DACE-48BC-AF92-65E55DC63F13]|[build|unknown]|
|cassandra-5.0|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18722-cassandra-5.0-6F6F4FED-DACE-48BC-AF92-65E55DC63F13]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18722-cassandra-5.0-6F6F4FED-DACE-48BC-AF92-65E55DC63F13]|[build|unknown]|
|trunk|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18722-trunk-6F6F4FED-DACE-48BC-AF92-65E55DC63F13]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18722-trunk-6F6F4FED-DACE-48BC-AF92-65E55DC63F13]|[build|unknown]|


> Support Dynamic Port Allocation for in-jvm dtest framework
> --
>
> Key: CASSANDRA-18722
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18722
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: Francisco Guerrero
>Assignee: Francisco Guerrero
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, {{INodeProvisionStrategy}} supports two strategies: 
> {{OneNetworkInterface}} and {{MultipleNetworkInterfaces}}. However, the 
> {{seedPort}}, {{storagePorts}}, {{nativeTransportPorts}}, and {{jmxPorts}} 
> are always fixed or a function of the node number.
> In order to better support parallel test runs, we need to support dynamic 
> port allocation for the {{seedPort}}, {{storagePorts}}, 
> {{nativeTransportPorts}}, and {{jmxPorts}}. This would enable us to more 
> easily write tests that can run in parallel. This effort is only a stepping 
> stone in what's required to run more tests in parallel, but it allows us to 
> begin somewhere with the in-jvm dtest framework.






[jira] [Updated] (CASSANDRA-18725) IsolatedJMX should not release all TCPEndpoints on instance shutdown

2023-08-24 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18725:
-
  Fix Version/s: 3.11.17
 4.0.12
 4.1.4
 5.0-alpha
 (was: 3.11.x)
 (was: 4.0.x)
 (was: 4.1.x)
 (was: 5.0.x)
Source Control Link:  
https://github.com/apache/cassandra/commit/c6d7d070c59d81db8949683d3e5670b909efb48c
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> IsolatedJMX should not release all TCPEndpoints on instance shutdown
> 
>
> Key: CASSANDRA-18725
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18725
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: Doug Rohrer
>Assignee: Doug Rohrer
>Priority: Normal
> Fix For: 3.11.17, 4.0.12, 4.1.4, 5.0-alpha, 5.x
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> In the original implementation of the JMX feature, we fixed some memory leaks 
> by clearing some internal state in Java’s TCPEndpoint. However, that 
> implementation was overly aggressive and cleared the whole map, vs. just 
> removing the endpoints created by the individual instances. This causes 
> issues when you remove a node from the cluster (as all of the endpoints are 
> cleared, not just the ones in use by that instance).
>  
> Instead, we should check if the endpoint was created by the instance in 
> question and only remove it if it was.
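
The difference between clearing the whole map and removing only one instance's endpoints can be sketched as follows. The ENDPOINT_OWNERS registry, the string endpoint keys, and the integer instance ids are hypothetical simplifications for this example; the real change works against the JDK's internal TCPEndpoint state, which is not reproduced here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class EndpointCleanupSketch {
    // Hypothetical shared registry: endpoint key -> id of the instance that created it.
    public static final Map<String, Integer> ENDPOINT_OWNERS = new ConcurrentHashMap<>();

    public static void register(String endpoint, int instanceId) {
        ENDPOINT_OWNERS.put(endpoint, instanceId);
    }

    // Overly aggressive cleanup: also drops endpoints still in use by other instances.
    public static void releaseAll() {
        ENDPOINT_OWNERS.clear();
    }

    // Targeted cleanup: removes only the endpoints created by the given instance
    // and returns how many were removed.
    public static int releaseOwnedBy(int instanceId) {
        int before = ENDPOINT_OWNERS.size();
        ENDPOINT_OWNERS.values().removeIf(owner -> owner == instanceId);
        return before - ENDPOINT_OWNERS.size();
    }

    public static void main(String[] args) {
        register("127.0.0.1:7199", 1);
        register("127.0.0.2:7199", 2);
        register("127.0.0.1:7299", 1);
        System.out.println("removed=" + releaseOwnedBy(1)); // prints "removed=2"
        System.out.println(ENDPOINT_OWNERS);                // prints "{127.0.0.2:7199=2}"
    }
}
```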






[jira] [Commented] (CASSANDRA-18725) IsolatedJMX should not release all TCPEndpoints on instance shutdown

2023-08-24 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17758767#comment-17758767
 ] 

Jon Meredith commented on CASSANDRA-18725:
--

Reviewing the test failures, I don't think any are due to the changes, as they 
should only affect in-jvm tests, and those failures are pre-existing.
{{code}}

*3.11*

j8_jvm_dtests

 testReprepareMixedVersionWithoutReset - hashcollision

j8_jvm_upgrade_dtests

 testDropCompactWithClusteringAndValueColumn  - cannot achieve consistency
 testDropCompactWithClusteringAndValueColumnWithDeletesAndWrites - unavailable 
exception
 org.apache.cassandra.distributed.upgrade.MixedModeReadTest - unavailable 
exception

*4.0*

j11_dtest
  test_move_backwards_between_and_cleanup - startup timeout

j11_jvm_dtests
  org.apache.cassandra.distributed.test.FailingRepairTest:testFailingMessage - 
test timeout

j11_unit_tests
  testPagingWithClustering

j8_jvm_dtests
  timeout in org.apache.cassandra.distributed.test.FailingRepairTest

j8_jvm_upgrade_dtests

  org.apache.cassandra.distributed.upgrade.MixedModeReadTest -- cannot achieve 
consistency


*4.1*

j11_dtests
  ttl_test.py::TestTTL::test_insert_ttl_has_priority_on_defaut_ttl

j8_jvm_upgrade_dtests
  org.apache.cassandra.distributed.upgrade.MixedModeReadTest -- cannot achieve 
consistency

j8_upgrade_dtests
  
TestProtoV3Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD.test_parallel_upgrade_with_internode_ssl

j8_upgrade_dtests
  TestUpgrade_indev_4_1_x_To_indev_trunk.test_bootstrap_multidc

*5.0*

j11_dtests_vnode
  TestSecondaryIndexes.test_failing_manual_rebuild_index

*trunk*
  org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityV30AllOneTest 
-- failing before commit
{{code}}

> IsolatedJMX should not release all TCPEndpoints on instance shutdown
> 
>
> Key: CASSANDRA-18725
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18725
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: Doug Rohrer
>Assignee: Doug Rohrer
>Priority: Normal
> Fix For: 3.11.x, 4.0.x, 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> In the original implementation of the JMX feature, we fixed some memory leaks 
> by clearing some internal state in Java’s TCPEndpoint. However, that 
> implementation was overly aggressive and cleared the whole map, vs. just 
> removing the endpoints created by the individual instances. This causes 
> issues when you remove a node from the cluster (as all of the endpoints are 
> cleared, not just the ones in use by that instance).
>  
> Instead, we should check if the endpoint was created by the instance in 
> question and only remove it if it was.






[jira] [Updated] (CASSANDRA-18725) IsolatedJMX should not release all TCPEndpoints on instance shutdown

2023-08-23 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18725:
-
Status: Ready to Commit  (was: Review In Progress)

Starting commit

CI Results (pending):
||Branch||Source||Circle CI||Jenkins||
|cassandra-3.11|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18725-cassandra-3.11-2B16244D-B55E-4541-A5D6-CBCB9ACC7CF8]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18725-cassandra-3.11-2B16244D-B55E-4541-A5D6-CBCB9ACC7CF8]|[build|unknown]|
|cassandra-4.0|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18725-cassandra-4.0-2B16244D-B55E-4541-A5D6-CBCB9ACC7CF8]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18725-cassandra-4.0-2B16244D-B55E-4541-A5D6-CBCB9ACC7CF8]|[build|unknown]|
|cassandra-4.1|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18725-cassandra-4.1-2B16244D-B55E-4541-A5D6-CBCB9ACC7CF8]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18725-cassandra-4.1-2B16244D-B55E-4541-A5D6-CBCB9ACC7CF8]|[build|unknown]|
|cassandra-5.0|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18725-cassandra-5.0-2B16244D-B55E-4541-A5D6-CBCB9ACC7CF8]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18725-cassandra-5.0-2B16244D-B55E-4541-A5D6-CBCB9ACC7CF8]|[build|unknown]|
|trunk|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18725-trunk-2B16244D-B55E-4541-A5D6-CBCB9ACC7CF8]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18725-trunk-2B16244D-B55E-4541-A5D6-CBCB9ACC7CF8]|[build|unknown]|

> IsolatedJMX should not release all TCPEndpoints on instance shutdown
> 
>
> Key: CASSANDRA-18725
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18725
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: Doug Rohrer
>Assignee: Doug Rohrer
>Priority: Normal
> Fix For: 3.11.x, 4.0.x, 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> In the original implementation of the JMX feature, we fixed some memory leaks 
> by clearing some internal state in Java’s TCPEndpoint. However, that 
> implementation was overly aggressive and cleared the whole map, vs. just 
> removing the endpoints created by the individual instances. This causes 
> issues when you remove a node from the cluster (as all of the endpoints are 
> cleared, not just the ones in use by that instance).
>  
> Instead, we should check if the endpoint was created by the instance in 
> question and only remove it if it was.






[jira] [Commented] (CASSANDRA-18722) Support Dynamic Port Allocation for in-jvm dtest framework

2023-08-23 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17758170#comment-17758170
 ] 

Jon Meredith commented on CASSANDRA-18722:
--

Starting commit

CI Results (pending):
||Branch||Source||Circle CI||Jenkins||
|cassandra-4.0|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18722-cassandra-4.0-07BAFB36-456F-4505-B8AA-DE800A7F49EA]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18722-cassandra-4.0-07BAFB36-456F-4505-B8AA-DE800A7F49EA]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/2570/]|
|cassandra-4.1|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18722-cassandra-4.1-07BAFB36-456F-4505-B8AA-DE800A7F49EA]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18722-cassandra-4.1-07BAFB36-456F-4505-B8AA-DE800A7F49EA]|[build|unknown]|
|cassandra-5.0|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18722-cassandra-5.0-07BAFB36-456F-4505-B8AA-DE800A7F49EA]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18722-cassandra-5.0-07BAFB36-456F-4505-B8AA-DE800A7F49EA]|[build|unknown]|
|trunk|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18722-trunk-07BAFB36-456F-4505-B8AA-DE800A7F49EA]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18722-trunk-07BAFB36-456F-4505-B8AA-DE800A7F49EA]|[build|unknown]|

> Support Dynamic Port Allocation for in-jvm dtest framework
> --
>
> Key: CASSANDRA-18722
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18722
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: Francisco Guerrero
>Assignee: Francisco Guerrero
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, {{INodeProvisionStrategy}} supports two strategies: 
> {{OneNetworkInterface}} and {{MultipleNetworkInterfaces}}. However, the 
> {{seedPort}}, {{storagePorts}}, {{nativeTransportPorts}}, and {{jmxPorts}} 
> are always fixed or a function of the node number.
> In order to better support parallel test runs, we need to support dynamic 
> port allocation for the {{seedPort}}, {{storagePorts}}, 
> {{nativeTransportPorts}}, and {{jmxPorts}}. This would enable us to more 
> easily write tests that can run in parallel. This effort is only a stepping 
> stone in what's required to run more tests in parallel, but it allows us to 
> begin somewhere with the in-jvm dtest framework.






[jira] [Updated] (CASSANDRA-18722) Support Dynamic Port Allocation for in-jvm dtest framework

2023-08-23 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18722:
-
Reviewers: Dinesh Joshi, Jon Meredith, Yifan Cai  (was: Dinesh Joshi, Jon 
Meredith)

> Support Dynamic Port Allocation for in-jvm dtest framework
> --
>
> Key: CASSANDRA-18722
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18722
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: Francisco Guerrero
>Assignee: Francisco Guerrero
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, {{INodeProvisionStrategy}} supports two strategies, 
> {{OneNetworkInterface}} and {{MultipleNetworkInterfaces}}. However, the 
> {{seedPort}}, {{storagePorts}}, {{nativeTransportPorts}}, and {{jmxPorts}} 
> are always fixed or a function of the node number.
> In order to better support parallel test runs, we need to support dynamic 
> port allocation for the {{seedPort}}, {{storagePorts}}, 
> {{nativeTransportPorts}}, and {{jmxPorts}}. This would enable us to more 
> easily write tests that can run in parallel. This effort is only a stepping 
> stone in what's required to run more tests in parallel, but it allows us to 
> begin somewhere with the in-jvm dtest framework.






[jira] [Updated] (CASSANDRA-18722) Support Dynamic Port Allocation for in-jvm dtest framework

2023-08-23 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18722:
-
Status: Ready to Commit  (was: Review In Progress)

> Support Dynamic Port Allocation for in-jvm dtest framework
> --
>
> Key: CASSANDRA-18722
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18722
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: Francisco Guerrero
>Assignee: Francisco Guerrero
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, {{INodeProvisionStrategy}} supports two strategies, 
> {{OneNetworkInterface}} and {{MultipleNetworkInterfaces}}. However, the 
> {{seedPort}}, {{storagePorts}}, {{nativeTransportPorts}}, and {{jmxPorts}} 
> are always fixed or a function of the node number.
> In order to better support parallel test runs, we need to support dynamic 
> port allocation for the {{seedPort}}, {{storagePorts}}, 
> {{nativeTransportPorts}}, and {{jmxPorts}}. This would enable us to more 
> easily write tests that can run in parallel. This effort is only a stepping 
> stone in what's required to run more tests in parallel, but it allows us to 
> begin somewhere with the in-jvm dtest framework.






[jira] [Commented] (CASSANDRA-18722) Support Dynamic Port Allocation for in-jvm dtest framework

2023-08-23 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17758140#comment-17758140
 ] 

Jon Meredith commented on CASSANDRA-18722:
--

+1 from me too, I'll fire up the commit test.

> Support Dynamic Port Allocation for in-jvm dtest framework
> --
>
> Key: CASSANDRA-18722
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18722
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: Francisco Guerrero
>Assignee: Francisco Guerrero
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, {{INodeProvisionStrategy}} supports two strategies, 
> {{OneNetworkInterface}} and {{MultipleNetworkInterfaces}}. However, the 
> {{seedPort}}, {{storagePorts}}, {{nativeTransportPorts}}, and {{jmxPorts}} 
> are always fixed or a function of the node number.
> In order to better support parallel test runs, we need to support dynamic 
> port allocation for the {{seedPort}}, {{storagePorts}}, 
> {{nativeTransportPorts}}, and {{jmxPorts}}. This would enable us to more 
> easily write tests that can run in parallel. This effort is only a stepping 
> stone in what's required to run more tests in parallel, but it allows us to 
> begin somewhere with the in-jvm dtest framework.






[jira] [Updated] (CASSANDRA-18722) Support Dynamic Port Allocation for in-jvm dtest framework

2023-08-23 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18722:
-
Reviewers: Dinesh Joshi, Jon Meredith
   Status: Review In Progress  (was: Needs Committer)

> Support Dynamic Port Allocation for in-jvm dtest framework
> --
>
> Key: CASSANDRA-18722
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18722
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: Francisco Guerrero
>Assignee: Francisco Guerrero
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, {{INodeProvisionStrategy}} supports two strategies, 
> {{OneNetworkInterface}} and {{MultipleNetworkInterfaces}}. However, the 
> {{seedPort}}, {{storagePorts}}, {{nativeTransportPorts}}, and {{jmxPorts}} 
> are always fixed or a function of the node number.
> In order to better support parallel test runs, we need to support dynamic 
> port allocation for the {{seedPort}}, {{storagePorts}}, 
> {{nativeTransportPorts}}, and {{jmxPorts}}. This would enable us to more 
> easily write tests that can run in parallel. This effort is only a stepping 
> stone in what's required to run more tests in parallel, but it allows us to 
> begin somewhere with the in-jvm dtest framework.






[jira] [Updated] (CASSANDRA-18722) Support Dynamic Port Allocation for in-jvm dtest framework

2023-08-23 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18722:
-
Status: Needs Committer  (was: Patch Available)

> Support Dynamic Port Allocation for in-jvm dtest framework
> --
>
> Key: CASSANDRA-18722
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18722
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: Francisco Guerrero
>Assignee: Francisco Guerrero
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, {{INodeProvisionStrategy}} supports two strategies, 
> {{OneNetworkInterface}} and {{MultipleNetworkInterfaces}}. However, the 
> {{seedPort}}, {{storagePorts}}, {{nativeTransportPorts}}, and {{jmxPorts}} 
> are always fixed or a function of the node number.
> In order to better support parallel test runs, we need to support dynamic 
> port allocation for the {{seedPort}}, {{storagePorts}}, 
> {{nativeTransportPorts}}, and {{jmxPorts}}. This would enable us to more 
> easily write tests that can run in parallel. This effort is only a stepping 
> stone in what's required to run more tests in parallel, but it allows us to 
> begin somewhere with the in-jvm dtest framework.






[jira] [Commented] (CASSANDRA-18778) Empty keystore_password no longer allowed on encryption_options

2023-08-21 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757037#comment-17757037
 ] 

Jon Meredith commented on CASSANDRA-18778:
--

+1 from me. I don't think we considered existing deployments carefully enough 
when the check was extended from rejecting only null to also rejecting the 
empty string in CASSANDRA-18124.

> Empty keystore_password no longer allowed on encryption_options
> ---
>
> Key: CASSANDRA-18778
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18778
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Andy Tolbert
>Assignee: Andy Tolbert
>Priority: Normal
> Fix For: 4.1.x, 5.0.x
>
>
> After CASSANDRA-18124 (introduced in 4.1.2 and 5.0) it is no longer possible 
> to set an empty {{keystore_password}} under {{client_encryption_options}} or 
> {{server_encryption_options}} using the default implementation 
> {{{}DefaultSslContextFactory{}}}.
> While keytool does not allow generating keystores with empty passwords, it 
> does support reading them. It is not uncommon to use PKCS12 certificates 
> generated by other tools (e.g. openssl) that do not enforce passwords.
> The fix should be straightforward: change 
> [FileBasedSslContextFactory.validatePassword|https://github.com/apache/cassandra/blob/cassandra-4.1.2/src/java/org/apache/cassandra/security/FileBasedSslContextFactory.java#L128-L135]
>  to only disallow null passwords (which would be consistent with previous 
> versions). I will create pull requests against the relevant branches shortly.
> {noformat}
> Exception (org.apache.cassandra.exceptions.ConfigurationException) 
> encountered during startup: Failed to initialize SSL
> org.apache.cassandra.exceptions.ConfigurationException: Failed to initialize 
> SSL
>   at 
> org.apache.cassandra.config.DatabaseDescriptor.applySslContext(DatabaseDescriptor.java:1155)
>   at 
> org.apache.cassandra.config.DatabaseDescriptor.applyAll(DatabaseDescriptor.java:390)
>   at 
> org.apache.cassandra.config.DatabaseDescriptor.daemonInitialization(DatabaseDescriptor.java:204)
>   at 
> org.apache.cassandra.config.DatabaseDescriptor.daemonInitialization(DatabaseDescriptor.java:188)
>   at 
> org.apache.cassandra.service.CassandraDaemon.applyConfig(CassandraDaemon.java:804)
>   at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:747)
>   at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:875)
> Caused by: java.io.IOException: Failed to create SSL context using Native 
> transport
>   at 
> org.apache.cassandra.security.SSLFactory.validateSslContext(SSLFactory.java:405)
>   at 
> org.apache.cassandra.config.DatabaseDescriptor.applySslContext(DatabaseDescriptor.java:1150)
>   ... 6 more
> Caused by: java.lang.IllegalArgumentException: 'keystore_password' must be 
> specified
>   at 
> org.apache.cassandra.security.FileBasedSslContextFactory.validatePassword(FileBasedSslContextFactory.java:133)
>   at 
> org.apache.cassandra.security.FileBasedSslContextFactory.buildKeyManagerFactory(FileBasedSslContextFactory.java:151)
>   at 
> org.apache.cassandra.security.AbstractSslContextFactory.createNettySslContext(AbstractSslContextFactory.java:181)
>   at 
> org.apache.cassandra.security.SSLFactory.createNettySslContext(SSLFactory.java:168)
>   at 
> org.apache.cassandra.security.SSLFactory.validateSslContext(SSLFactory.java:355)
>   ... 7 more
> {noformat}
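
For illustration, the null-only check proposed in the description might look roughly like the following. This is a hedged sketch rather than the code in {{FileBasedSslContextFactory}}: the class name {{KeystorePasswordCheckSketch}} and the two-argument signature are assumptions, and only the error message is taken from the stack trace above.

```java
// Hypothetical stand-in for the validation discussed in this ticket.
final class KeystorePasswordCheckSketch {
    // Rejects a missing (null) password but accepts an empty one,
    // since keystores written by other tools may have no password.
    static void validatePassword(String optionName, String password) {
        if (password == null) {
            throw new IllegalArgumentException("'" + optionName + "' must be specified");
        }
    }
}
```

With this shape, {{validatePassword("keystore_password", "")}} returns normally, while passing null still fails at startup with the message shown in the trace.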






[jira] [Commented] (CASSANDRA-18778) Empty keystore_password no longer allowed on encryption_options

2023-08-21 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757033#comment-17757033
 ] 

Jon Meredith commented on CASSANDRA-18778:
--

Running through CI

CI Results (pending):
||Branch||Source||Circle CI||Jenkins||
|cassandra-4.1|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18778-cassandra-4.1-F8EF5D5E-1505-4998-83C2-77D5EF86AF6D]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18778-cassandra-4.1-F8EF5D5E-1505-4998-83C2-77D5EF86AF6D]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/2566/]|
|cassandra-5.0|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18778-cassandra-5.0-F8EF5D5E-1505-4998-83C2-77D5EF86AF6D]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18778-cassandra-5.0-F8EF5D5E-1505-4998-83C2-77D5EF86AF6D]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/2567/]|
|trunk|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18778-trunk-F8EF5D5E-1505-4998-83C2-77D5EF86AF6D]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18778-trunk-F8EF5D5E-1505-4998-83C2-77D5EF86AF6D]|[build|unknown]|

> Empty keystore_password no longer allowed on encryption_options
> ---
>
> Key: CASSANDRA-18778
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18778
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Andy Tolbert
>Assignee: Andy Tolbert
>Priority: Normal
> Fix For: 4.1.x, 5.0.x
>
>
> After CASSANDRA-18124 (introduced in 4.1.2 and 5.0) it is no longer possible 
> to set an empty {{keystore_password}} under {{client_encryption_options}} or 
> {{server_encryption_options}} using the default implementation 
> {{{}DefaultSslContextFactory{}}}.
> While keytool does not allow generating keystores with empty passwords, it 
> does support reading them. It is not uncommon to use PKCS12 certificates 
> generated by other tools (e.g. openssl) that do not enforce passwords.
> The fix should be straightforward: change 
> [FileBasedSslContextFactory.validatePassword|https://github.com/apache/cassandra/blob/cassandra-4.1.2/src/java/org/apache/cassandra/security/FileBasedSslContextFactory.java#L128-L135]
>  to only disallow null passwords (which would be consistent with previous 
> versions). I will create pull requests against the relevant branches shortly.
> {noformat}
> Exception (org.apache.cassandra.exceptions.ConfigurationException) 
> encountered during startup: Failed to initialize SSL
> org.apache.cassandra.exceptions.ConfigurationException: Failed to initialize 
> SSL
>   at 
> org.apache.cassandra.config.DatabaseDescriptor.applySslContext(DatabaseDescriptor.java:1155)
>   at 
> org.apache.cassandra.config.DatabaseDescriptor.applyAll(DatabaseDescriptor.java:390)
>   at 
> org.apache.cassandra.config.DatabaseDescriptor.daemonInitialization(DatabaseDescriptor.java:204)
>   at 
> org.apache.cassandra.config.DatabaseDescriptor.daemonInitialization(DatabaseDescriptor.java:188)
>   at 
> org.apache.cassandra.service.CassandraDaemon.applyConfig(CassandraDaemon.java:804)
>   at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:747)
>   at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:875)
> Caused by: java.io.IOException: Failed to create SSL context using Native 
> transport
>   at 
> org.apache.cassandra.security.SSLFactory.validateSslContext(SSLFactory.java:405)
>   at 
> org.apache.cassandra.config.DatabaseDescriptor.applySslContext(DatabaseDescriptor.java:1150)
>   ... 6 more
> Caused by: java.lang.IllegalArgumentException: 'keystore_password' must be 
> specified
>   at 
> org.apache.cassandra.security.FileBasedSslContextFactory.validatePassword(FileBasedSslContextFactory.java:133)
>   at 
> org.apache.cassandra.security.FileBasedSslContextFactory.buildKeyManagerFactory(FileBasedSslContextFactory.java:151)
>   at 
> org.apache.cassandra.security.AbstractSslContextFactory.createNettySslContext(AbstractSslContextFactory.java:181)
>   at 
> org.apache.cassandra.security.SSLFactory.createNettySslContext(SSLFactory.java:168)
>   at 
> org.apache.cassandra.security.SSLFactory.validateSslContext(SSLFactory.java:355)
>   ... 7 more
> {noformat}




[jira] [Commented] (CASSANDRA-18725) IsolatedJMX should not release all TCPEndpoints on instance shutdown

2023-08-18 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17756132#comment-17756132
 ] 

Jon Meredith commented on CASSANDRA-18725:
--

+1 - thanks for digging through the test failures

> IsolatedJMX should not release all TCPEndpoints on instance shutdown
> 
>
> Key: CASSANDRA-18725
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18725
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: Doug Rohrer
>Assignee: Doug Rohrer
>Priority: Normal
> Fix For: 3.11.x, 4.0.x, 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> In the original implementation of the JMX feature, we fixed some memory leaks 
> by clearing some internal state in Java’s TCPEndpoint. However, that 
> implementation was overly aggressive and cleared the whole map, vs. just 
> removing the endpoints created by the individual instances. This causes 
> issues when you remove a node from the cluster (as all of the endpoints are 
> cleared, not just the ones in use by that instance).
>  
> Instead, we should check if the endpoint was created by the instance in 
> question and only remove it if it was.
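
A minimal sketch of the ownership check described above, assuming a simple map from endpoint key to owning instance. {{EndpointRegistrySketch}} and its methods are hypothetical; the map is a stand-in for, not a copy of, the internal state held by Java's TCPEndpoint.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry, for illustration only.
final class EndpointRegistrySketch {
    // Endpoint key (for example "host:port") mapped to the owning instance id.
    private final Map<String, String> owners = new ConcurrentHashMap<>();

    void register(String endpoint, String instanceId) {
        owners.put(endpoint, instanceId);
    }

    // Removes only the endpoints created by the given instance and
    // returns how many were removed; other instances are untouched.
    int releaseOwnedBy(String instanceId) {
        int before = owners.size();
        owners.values().removeIf(instanceId::equals);
        return before - owners.size();
    }

    boolean contains(String endpoint) {
        return owners.containsKey(endpoint);
    }

    int size() {
        return owners.size();
    }
}
```

Shutting down one node then calls {{releaseOwnedBy}} with its own id instead of clearing the whole map.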






[jira] [Commented] (CASSANDRA-18733) Waiting indefinitely on ReceivedMessage response in StreamSession#receive() can cause deadlock

2023-08-16 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17755247#comment-17755247
 ] 

Jon Meredith commented on CASSANDRA-18733:
--

CI Results (pending):
||Branch||Source||Circle CI||Jenkins||
|cassandra-4.1|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18733-cassandra-4.1-3B701280-17E1-427A-8F6B-EAD2F6E02C40]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18733-cassandra-4.1-3B701280-17E1-427A-8F6B-EAD2F6E02C40]|[build|unknown]|
|cassandra-5.0|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18733-cassandra-5.0-3B701280-17E1-427A-8F6B-EAD2F6E02C40]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18733-cassandra-5.0-3B701280-17E1-427A-8F6B-EAD2F6E02C40]|[build|unknown]|
|trunk|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18733-trunk-3B701280-17E1-427A-8F6B-EAD2F6E02C40]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18733-trunk-3B701280-17E1-427A-8F6B-EAD2F6E02C40]|[build|unknown]|

> Waiting indefinitely on ReceivedMessage response in StreamSession#receive() 
> can cause deadlock
> --
>
> Key: CASSANDRA-18733
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18733
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair, Consistency/Streaming
>Reporter: Caleb Rackliffe
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.1.x, 5.0, 5.x
>
>
> I've observed in a recent stack trace from a node running 4.1 what looks like 
> a deadlock around the {{StreamSession}} monitor lock when 
> {{StreamSession#receive()}} waits via {{syncUninterruptibly()}} for a response 
> to a control message.
> {noformat}
> "Messaging-EventLoop-3-10" #320 daemon prio=5 os_prio=0 cpu=57979617.98ms 
> elapsed=5587916.03s tid=0x7f056e88ae00 nid=0x80ec waiting for monitor 
> entry  [0x7f056d277000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:524)
> - waiting to lock <0x0006816fae70> (a 
> org.apache.cassandra.streaming.StreamSession)
> at 
> org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:690)
> at 
> org.apache.cassandra.streaming.async.StreamingMultiplexedChannel.onMessageComplete(StreamingMultiplexedChannel.java:264)
> at 
> org.apache.cassandra.streaming.async.StreamingMultiplexedChannel.lambda$sendMessage$1(StreamingMultiplexedChannel.java:233)
> at 
> org.apache.cassandra.streaming.async.StreamingMultiplexedChannel$$Lambda$2029/0x0008007a0c40.operationComplete(Unknown
>  Source)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList.notifyListener(ListenerList.java:134)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList.notifyListener(ListenerList.java:148)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList$GenericFutureListenerList.notifySelf(ListenerList.java:190)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList.lambda$notifyExclusive$0(ListenerList.java:124)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList$$Lambda$950/0x000800666040.accept(Unknown
>  Source)
> at 
> org.apache.cassandra.utils.concurrent.IntrusiveStack.forEach(IntrusiveStack.java:195)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList.notifyExclusive(ListenerList.java:124)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList.notify(ListenerList.java:96)
> at 
> org.apache.cassandra.utils.concurrent.AsyncFuture.trySet(AsyncFuture.java:104)
> at 
> org.apache.cassandra.utils.concurrent.AbstractFuture.tryFailure(AbstractFuture.java:148)
> at 
> org.apache.cassandra.utils.concurrent.AsyncPromise.tryFailure(AsyncPromise.java:139)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:1009)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:870)
> at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764)
> at 
> io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1071)
> at 
> 

[jira] [Commented] (CASSANDRA-18733) Waiting indefinitely on ReceivedMessage response in StreamSession#receive() can cause deadlock

2023-08-15 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17754829#comment-17754829
 ] 

Jon Meredith commented on CASSANDRA-18733:
--

Looks like I need to upgrade my script for the java11/java17 changes for 
CircleCI

> Waiting indefinitely on ReceivedMessage response in StreamSession#receive() 
> can cause deadlock
> --
>
> Key: CASSANDRA-18733
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18733
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair, Consistency/Streaming
>Reporter: Caleb Rackliffe
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.1.x, 5.0, 5.x
>
>
> I've observed in a recent stack trace from a node running 4.1 what looks like 
> a deadlock around the {{StreamSession}} monitor lock when 
> {{StreamSession#receive()}} waits via {{syncUninterruptibly()}} for a response 
> to a control message.
> {noformat}
> "Messaging-EventLoop-3-10" #320 daemon prio=5 os_prio=0 cpu=57979617.98ms 
> elapsed=5587916.03s tid=0x7f056e88ae00 nid=0x80ec waiting for monitor 
> entry  [0x7f056d277000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:524)
> - waiting to lock <0x0006816fae70> (a 
> org.apache.cassandra.streaming.StreamSession)
> at 
> org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:690)
> at 
> org.apache.cassandra.streaming.async.StreamingMultiplexedChannel.onMessageComplete(StreamingMultiplexedChannel.java:264)
> at 
> org.apache.cassandra.streaming.async.StreamingMultiplexedChannel.lambda$sendMessage$1(StreamingMultiplexedChannel.java:233)
> at 
> org.apache.cassandra.streaming.async.StreamingMultiplexedChannel$$Lambda$2029/0x0008007a0c40.operationComplete(Unknown
>  Source)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList.notifyListener(ListenerList.java:134)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList.notifyListener(ListenerList.java:148)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList$GenericFutureListenerList.notifySelf(ListenerList.java:190)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList.lambda$notifyExclusive$0(ListenerList.java:124)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList$$Lambda$950/0x000800666040.accept(Unknown
>  Source)
> at 
> org.apache.cassandra.utils.concurrent.IntrusiveStack.forEach(IntrusiveStack.java:195)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList.notifyExclusive(ListenerList.java:124)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList.notify(ListenerList.java:96)
> at 
> org.apache.cassandra.utils.concurrent.AsyncFuture.trySet(AsyncFuture.java:104)
> at 
> org.apache.cassandra.utils.concurrent.AbstractFuture.tryFailure(AbstractFuture.java:148)
> at 
> org.apache.cassandra.utils.concurrent.AsyncPromise.tryFailure(AsyncPromise.java:139)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:1009)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:870)
> at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764)
> at 
> io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1071)
> at 
> io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
> at 
> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.lang.Thread.run(java.base@11.0.16/Thread.java:829)
> {noformat}
> It seems that while {{receive()}} is holding the monitor lock on 
> {{StreamSession}}, the callback that executes on a different thread for the 
> control message it sends carries an error. This error, when handled in 
> {{onError()}}, then calls {{closeSession()}}, which tries to acquire the 
> monitor lock already 
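
The lock interaction described in this report can be reproduced in miniature. The following is a hedged sketch, not Cassandra code: a plain {{Object}} stands in for the {{StreamSession}} monitor, a single-threaded executor stands in for the event loop, and a bounded wait replaces the unbounded one so that the example terminates. All names are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical reproduction of "wait for a callback while holding the lock it needs".
final class MonitorWaitSketch {
    // Returns true if the wait timed out because the callback was blocked on the monitor.
    static boolean waitTimesOutWhileHoldingLock() throws Exception {
        Object monitor = new Object();
        CompletableFuture<Void> response = new CompletableFuture<>();
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        try {
            synchronized (monitor) { // plays the role of receive()
                eventLoop.submit(() -> {
                    synchronized (monitor) { // plays the role of closeSession()
                        response.complete(null);
                    }
                });
                try {
                    // An unbounded wait here would never return.
                    response.get(200, TimeUnit.MILLISECONDS);
                    return false;
                } catch (TimeoutException e) {
                    return true;
                }
            }
        } finally {
            eventLoop.shutdown();
        }
    }
}
```

The callback can only complete the future after the waiting thread leaves the synchronized block, so the bounded wait always expires first; with an unbounded wait the two threads would block each other indefinitely.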

[jira] [Updated] (CASSANDRA-18733) Waiting indefinitely on ReceivedMessage response in StreamSession#receive() can cause deadlock

2023-08-15 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18733:
-
Test and Documentation Plan: Added StreamDisconnectedWhileReceivingTest 
 Status: Patch Available  (was: Open)

[trunk|https://github.com/jonmeredith/cassandra/tree/C18733-trunk] 
[PR|https://github.com/apache/cassandra/pull/2565]

Other branches were very similar.
[5.0|https://github.com/jonmeredith/cassandra/tree/C18733-5.0]
[4.1|https://github.com/jonmeredith/cassandra/tree/C18733-4.1]

CI Results (pending):
||Branch||Source||Circle CI||Jenkins||
|cassandra-4.1|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18733-cassandra-4.1-DEB132B4-DE06-474A-8655-D0BBC26B3E89]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18733-cassandra-4.1-DEB132B4-DE06-474A-8655-D0BBC26B3E89]|[build|unknown]|
|cassandra-5.0|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18733-cassandra-5.0-DEB132B4-DE06-474A-8655-D0BBC26B3E89]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18733-cassandra-5.0-DEB132B4-DE06-474A-8655-D0BBC26B3E89]|[build|unknown]|
|trunk|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18733-trunk-DEB132B4-DE06-474A-8655-D0BBC26B3E89]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18733-trunk-DEB132B4-DE06-474A-8655-D0BBC26B3E89]|[build|unknown]|

> Waiting indefinitely on ReceivedMessage response in StreamSession#receive() 
> can cause deadlock
> --
>
> Key: CASSANDRA-18733
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18733
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair, Consistency/Streaming
>Reporter: Caleb Rackliffe
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.1.x, 5.0, 5.x
>
>
> I've observed in a recent stack trace from a node running 4.1 what looks like 
> a deadlock around the {{StreamSession}} monitor lock when 
> {{StreamSession#receive()}} waits via {{syncUninterruptibly()}} for a response 
> to a control message.
> {noformat}
> "Messaging-EventLoop-3-10" #320 daemon prio=5 os_prio=0 cpu=57979617.98ms 
> elapsed=5587916.03s tid=0x7f056e88ae00 nid=0x80ec waiting for monitor 
> entry  [0x7f056d277000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:524)
> - waiting to lock <0x0006816fae70> (a 
> org.apache.cassandra.streaming.StreamSession)
> at 
> org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:690)
> at 
> org.apache.cassandra.streaming.async.StreamingMultiplexedChannel.onMessageComplete(StreamingMultiplexedChannel.java:264)
> at 
> org.apache.cassandra.streaming.async.StreamingMultiplexedChannel.lambda$sendMessage$1(StreamingMultiplexedChannel.java:233)
> at 
> org.apache.cassandra.streaming.async.StreamingMultiplexedChannel$$Lambda$2029/0x0008007a0c40.operationComplete(Unknown
>  Source)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList.notifyListener(ListenerList.java:134)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList.notifyListener(ListenerList.java:148)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList$GenericFutureListenerList.notifySelf(ListenerList.java:190)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList.lambda$notifyExclusive$0(ListenerList.java:124)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList$$Lambda$950/0x000800666040.accept(Unknown
>  Source)
> at 
> org.apache.cassandra.utils.concurrent.IntrusiveStack.forEach(IntrusiveStack.java:195)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList.notifyExclusive(ListenerList.java:124)
> at 
> org.apache.cassandra.utils.concurrent.ListenerList.notify(ListenerList.java:96)
> at 
> org.apache.cassandra.utils.concurrent.AsyncFuture.trySet(AsyncFuture.java:104)
> at 
> org.apache.cassandra.utils.concurrent.AbstractFuture.tryFailure(AbstractFuture.java:148)
> at 
> org.apache.cassandra.utils.concurrent.AsyncPromise.tryFailure(AsyncPromise.java:139)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:1009)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:870)
> at 
> 

[jira] [Commented] (CASSANDRA-18360) Test Failure: o.a.c.cql3.validation.operations.AlterTest#testDropListAndAddListWithSameName

2023-08-15 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17754726#comment-17754726
 ] 

Jon Meredith commented on CASSANDRA-18360:
--

Thanks for tagging me, I didn't see this before. I'll add investigating the 
merge to my task list.

> Test Failure: 
> o.a.c.cql3.validation.operations.AlterTest#testDropListAndAddListWithSameName
> ---
>
> Key: CASSANDRA-18360
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18360
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: Andres de la Peña
>Priority: Normal
> Fix For: 5.x
>
>
> The unit test 
> {{org.apache.cassandra.cql3.validation.operations.AlterTest#testDropListAndAddListWithSameName}}
>  is flaky at least on trunk. The failure rate appears to be below 1%.
> While I haven't seen it on Jenkins yet, it can easily be reproduced on 
> CircleCI with the multiplexer:
> * 
> https://app.circleci.com/pipelines/github/adelapena/cassandra/2731/workflows/58edc2f6-9a21-4d09-b783-b7fb15e1b320/jobs/32235
> * 
> https://app.circleci.com/pipelines/github/adelapena/cassandra/2731/workflows/58edc2f6-9a21-4d09-b783-b7fb15e1b320/jobs/32234
> * 
> https://app.circleci.com/pipelines/github/adelapena/cassandra/2731/workflows/739a95d3-8e42-4447-93dd-122fc16fdd7d/jobs/32233/tests
> Those runs show two types of errors:
> {code}
> junit.framework.AssertionFailedError: Invalid value for row 0 column 1 
> (content of type text), expected  but got 
>   at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1506)
>   at 
> org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:102)
> {code}
> and: 
> {code}
> org.apache.cassandra.serializers.MarshalException: Invalid UTF-8 bytes 
> 00e0279515437f00
>   at 
> org.apache.cassandra.serializers.AbstractTextSerializer.deserialize(AbstractTextSerializer.java:46)
>   at 
> org.apache.cassandra.serializers.AbstractTextSerializer.deserialize(AbstractTextSerializer.java:29)
>   at 
> org.apache.cassandra.serializers.TypeSerializer.deserialize(TypeSerializer.java:37)
>   at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1494)
>   at 
> org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:102)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {code}
> The CircleCI config I used to reproduce the test failure can be generated 
> with:
> {code}
> .circleci/generate.sh -p \
>   -e REPEATED_UTESTS_COUNT=500 \
>   -e REPEATED_UTESTS=org.apache.cassandra.cql3.validation.operations.AlterTest
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-18725) IsolatedJMX should not release all TCPEndpoints on instance shutdown

2023-08-14 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18725:
-
Reviewers: Jon Meredith
   Status: Review In Progress  (was: Patch Available)

> IsolatedJMX should not release all TCPEndpoints on instance shutdown
> 
>
> Key: CASSANDRA-18725
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18725
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: Doug Rohrer
>Assignee: Doug Rohrer
>Priority: Normal
> Fix For: 3.11.x, 4.0, 4.1.x, 5.0, 5.1
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> In the original implementation of the JMX feature, we fixed some memory leaks 
> by clearing some internal state in Java’s TCPEndpoint. However, that 
> implementation was overly aggressive and cleared the whole map, vs. just 
> removing the endpoints created by the individual instances. This causes 
> issues when you remove a node from the cluster (as all of the endpoints are 
> cleared, not just the ones in use by that instance).
>  
> Instead, we should check if the endpoint was created by the instance in 
> question and only remove it if it was.
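
The scoped cleanup proposed above can be sketched in isolation. This is a toy model rather than the IsolatedJMX code: SHARED_ENDPOINTS stands in for the JVM-wide endpoint map, and Instance, open(), shutdown() and demo() are hypothetical names invented for the illustration.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

public class ScopedEndpointCleanupSketch {
    // Hypothetical stand-in for the JVM-wide endpoint map shared by all in-JVM instances.
    static final Map<String, String> SHARED_ENDPOINTS = new ConcurrentHashMap<>();

    /** Hypothetical instance that remembers which endpoints it registered. */
    static class Instance {
        private final String name;
        private final Set<String> created = new HashSet<>();

        Instance(String name) { this.name = name; }

        void open(String hostPort) {
            SHARED_ENDPOINTS.put(hostPort, name);
            created.add(hostPort);
        }

        void shutdown() {
            // Remove only the entries this instance created, and only if it still owns them.
            for (String key : created)
                SHARED_ENDPOINTS.remove(key, name);
            created.clear();
        }
    }

    /** Shuts down one of two instances and returns the endpoints that remain. */
    static Set<String> demo() {
        SHARED_ENDPOINTS.clear();
        Instance node1 = new Instance("node1");
        Instance node2 = new Instance("node2");
        node1.open("127.0.0.1:7199");
        node2.open("127.0.0.2:7199");
        node1.shutdown();
        return new TreeSet<>(SHARED_ENDPOINTS.keySet());
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [127.0.0.2:7199]
    }
}
```

Clearing the whole map in shutdown() would also drop node2's endpoint, which is the behaviour the ticket describes as overly aggressive.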






[jira] [Updated] (CASSANDRA-18736) Streaming exception race creates corrupt transaction log files that prevent restart

2023-08-09 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18736:
-
Reviewers: Caleb Rackliffe, David Capwell  (was: Caleb Rackliffe, David 
Capwell, Jon Meredith)

> Streaming exception race creates corrupt transaction log files that prevent 
> restart
> ---
>
> Key: CASSANDRA-18736
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18736
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Streaming, Local/Startup and Shutdown
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0
>
>
> On restart, Cassandra logs this message and terminates.
> {code:java}
> ERROR 2023-07-17T17:17:22,931 [main] 
> org.apache.cassandra.db.lifecycle.LogTransaction:561 - Unexpected disk state: 
> failed to read transaction log 
> [nb_txn_stream_39d5f6b0-fb81-11ed-8f46-e97b3f61511e.log in 
> /datadir1/keyspace/table-c9527530a0d611e8813f034699fc9043]
> Files and contents follow:
> /datadir1/keyspace/table-c9527530a0d611e8813f034699fc9043/nb_txn_stream_39d5f6b0-fb81-11ed-8f46-e97b3f61511e.log
> ABORT:[,0,0][737437348]
> ***This record should have been the last one in all replicas
> 
> ADD:[/datadir1/keyspace/table-c9527530a0d611e8813f034699fc9043/nb-284490-big-,0,8][2493503833]
> {code}
> The root cause is a race during streaming exception handling.
> Although protection against concurrent modification of the {{LogTransaction}} was added for 
> CASSANDRA-16225, there is nothing to prevent usage after the transaction is 
> completed (committed/aborted) once it has been processed by 
> {{TransactionTidier}} (after the last reference is released). Before the 
> transaction is tidied, the {{LogFile}} keeps a list of records that are 
> checked for completion before adding new entries. In {{TransactionTidier}} 
> {{LogFile.records}} are cleared as no longer needed, however the 
> LogTransaction/LogFile is still accessible to the stream.
> The changes in CASSANDRA-17273 added a parallel set of {{onDiskRecords}} that 
> could be used to reliably recreate the transaction log at any new datadirs 
> the same as the existing datadirs, regardless of the effect of 
> {{LogTransaction.untrackNew/LogFile.remove}}.
> The race occurs when a streaming exception causes the LogTransaction to be 
> aborted and tidied just before {{SimpleSSTableMultiWriter}} calls 
> {{trackNew}} to add a new sstable. 
> At the time of the call, the {{LogFile}} will not contain any {{LogReplicas}},
> {{LogFile.records}} will be empty, and {{LogFile.onDiskRecords}} will contain 
> an {{ABORT}}.
> When {{LogTransaction.trackNew/LogFile.add}} is called, the check for a 
> completed transaction does not detect the abort because {{records}} is empty. 
> There are no replicas on the datadir, so {{maybeCreateReplicas}} creates a new 
> txnlog file replica containing {{ABORT}}, then appends an {{ADD}} record.
> The LogFile has already been tidied after the abort, so the txnlog file is not 
> removed and sits on disk until a restart, causing the failure.
> A related exception occurs with a different interleaving of aborts, after an 
> sstable is added; however, this is just a nuisance in the logs, as the 
> LogReplica has already been created with an {{ADD}} record first.
> {code:java}
> java.lang.AssertionError: 
> [ADD:[/datadir1/keyspace/table/nb-23314378-big-,0,8][1869379820]] is not 
> tracked by 55be35b0-35d1-11ee-865d-8b1e3c48ca06
> at org.apache.cassandra.db.lifecycle.LogFile.remove(LogFile.java:388)
> at 
> org.apache.cassandra.db.lifecycle.LogTransaction.untrackNew(LogTransaction.java:158)
> at 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.untrackNew(LifecycleTransaction.java:577)
> at 
> org.apache.cassandra.db.streaming.CassandraStreamReceiver$1.untrackNew(CassandraStreamReceiver.java:149)
> at 
> org.apache.cassandra.io.sstable.SimpleSSTableMultiWriter.abort(SimpleSSTableMultiWriter.java:95)
> at 
> org.apache.cassandra.io.sstable.format.RangeAwareSSTableWriter.abort(RangeAwareSSTableWriter.java:191)
> at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamReader.read(CassandraCompressedStreamReader.java:115)
> at 
> org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:85)
> at 
> org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:53)
> at 
> org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:38)
> at 
> org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:53)
> at 
> 

[jira] [Updated] (CASSANDRA-18736) Streaming exception race creates corrupt transaction log files that prevent restart

2023-08-09 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18736:
-
Reviewers: Caleb Rackliffe, David Capwell, Jon Meredith
   Status: Review In Progress  (was: Patch Available)

> Streaming exception race creates corrupt transaction log files that prevent 
> restart
> ---
>
> Key: CASSANDRA-18736
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18736
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Streaming, Local/Startup and Shutdown
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0
>
>
> On restart, Cassandra logs this message and terminates.
> {code:java}
> ERROR 2023-07-17T17:17:22,931 [main] 
> org.apache.cassandra.db.lifecycle.LogTransaction:561 - Unexpected disk state: 
> failed to read transaction log 
> [nb_txn_stream_39d5f6b0-fb81-11ed-8f46-e97b3f61511e.log in 
> /datadir1/keyspace/table-c9527530a0d611e8813f034699fc9043]
> Files and contents follow:
> /datadir1/keyspace/table-c9527530a0d611e8813f034699fc9043/nb_txn_stream_39d5f6b0-fb81-11ed-8f46-e97b3f61511e.log
> ABORT:[,0,0][737437348]
> ***This record should have been the last one in all replicas
> 
> ADD:[/datadir1/keyspace/table-c9527530a0d611e8813f034699fc9043/nb-284490-big-,0,8][2493503833]
> {code}
> The root cause is a race during streaming exception handling.
> Although protection against concurrent modification of the {{LogTransaction}} was added for 
> CASSANDRA-16225, there is nothing to prevent usage after the transaction is 
> completed (committed/aborted) once it has been processed by 
> {{TransactionTidier}} (after the last reference is released). Before the 
> transaction is tidied, the {{LogFile}} keeps a list of records that are 
> checked for completion before adding new entries. In {{TransactionTidier}} 
> {{LogFile.records}} are cleared as no longer needed, however the 
> LogTransaction/LogFile is still accessible to the stream.
> The changes in CASSANDRA-17273 added a parallel set of {{onDiskRecords}} that 
> could be used to reliably recreate the transaction log at any new datadirs 
> the same as the existing datadirs, regardless of the effect of 
> {{LogTransaction.untrackNew/LogFile.remove}}.
> The race occurs when a streaming exception causes the LogTransaction to be 
> aborted and tidied just before {{SimpleSSTableMultiWriter}} calls 
> {{trackNew}} to add a new sstable. 
> At the time of the call, the {{LogFile}} will not contain any {{LogReplicas}},
> {{LogFile.records}} will be empty, and {{LogFile.onDiskRecords}} will contain 
> an {{ABORT}}.
> When {{LogTransaction.trackNew/LogFile.add}} is called, the check for a 
> completed transaction does not detect the abort because {{records}} is empty. 
> There are no replicas on the datadir, so {{maybeCreateReplicas}} creates a new 
> txnlog file replica containing {{ABORT}}, then appends an {{ADD}} record.
> The LogFile has already been tidied after the abort, so the txnlog file is not 
> removed and sits on disk until a restart, causing the failure.
> A related exception occurs with a different interleaving of aborts, after an 
> sstable is added; however, this is just a nuisance in the logs, as the 
> LogReplica has already been created with an {{ADD}} record first.
> {code:java}
> java.lang.AssertionError: 
> [ADD:[/datadir1/keyspace/table/nb-23314378-big-,0,8][1869379820]] is not 
> tracked by 55be35b0-35d1-11ee-865d-8b1e3c48ca06
> at org.apache.cassandra.db.lifecycle.LogFile.remove(LogFile.java:388)
> at 
> org.apache.cassandra.db.lifecycle.LogTransaction.untrackNew(LogTransaction.java:158)
> at 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.untrackNew(LifecycleTransaction.java:577)
> at 
> org.apache.cassandra.db.streaming.CassandraStreamReceiver$1.untrackNew(CassandraStreamReceiver.java:149)
> at 
> org.apache.cassandra.io.sstable.SimpleSSTableMultiWriter.abort(SimpleSSTableMultiWriter.java:95)
> at 
> org.apache.cassandra.io.sstable.format.RangeAwareSSTableWriter.abort(RangeAwareSSTableWriter.java:191)
> at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamReader.read(CassandraCompressedStreamReader.java:115)
> at 
> org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:85)
> at 
> org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:53)
> at 
> org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:38)
> at 
> org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:53)
> at 
> 

[jira] [Updated] (CASSANDRA-18736) Streaming exception race creates corrupt transaction log files that prevent restart

2023-08-09 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18736:
-
Test and Documentation Plan: Added StreamDisconnectedWhileReceivingTest
 Status: Patch Available  (was: Open)

Trunk [PR|https://github.com/apache/cassandra/pull/2565] 
[Branch|https://github.com/jonmeredith/cassandra/tree/C18733-trunk]
[5.0 Branch|https://github.com/jonmeredith/cassandra/tree/C18733-5.0]

The only minor difference is that there is no {{StreamSession.failureReason}}.
[4.1 Branch|https://github.com/jonmeredith/cassandra/tree/C18733-4.1]

> Streaming exception race creates corrupt transaction log files that prevent 
> restart
> ---
>
> Key: CASSANDRA-18736
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18736
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Streaming, Local/Startup and Shutdown
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0
>
>
> On restart, Cassandra logs this message and terminates.
> {code:java}
> ERROR 2023-07-17T17:17:22,931 [main] 
> org.apache.cassandra.db.lifecycle.LogTransaction:561 - Unexpected disk state: 
> failed to read transaction log 
> [nb_txn_stream_39d5f6b0-fb81-11ed-8f46-e97b3f61511e.log in 
> /datadir1/keyspace/table-c9527530a0d611e8813f034699fc9043]
> Files and contents follow:
> /datadir1/keyspace/table-c9527530a0d611e8813f034699fc9043/nb_txn_stream_39d5f6b0-fb81-11ed-8f46-e97b3f61511e.log
> ABORT:[,0,0][737437348]
> ***This record should have been the last one in all replicas
> 
> ADD:[/datadir1/keyspace/table-c9527530a0d611e8813f034699fc9043/nb-284490-big-,0,8][2493503833]
> {code}
> The root cause is a race during streaming exception handling.
> Although protection against concurrent modification of the {{LogTransaction}} was added for 
> CASSANDRA-16225, there is nothing to prevent usage after the transaction is 
> completed (committed/aborted) once it has been processed by 
> {{TransactionTidier}} (after the last reference is released). Before the 
> transaction is tidied, the {{LogFile}} keeps a list of records that are 
> checked for completion before adding new entries. In {{TransactionTidier}} 
> {{LogFile.records}} are cleared as no longer needed, however the 
> LogTransaction/LogFile is still accessible to the stream.
> The changes in CASSANDRA-17273 added a parallel set of {{onDiskRecords}} that 
> could be used to reliably recreate the transaction log at any new datadirs 
> the same as the existing datadirs, regardless of the effect of 
> {{LogTransaction.untrackNew/LogFile.remove}}.
> The race occurs when a streaming exception causes the LogTransaction to be 
> aborted and tidied just before {{SimpleSSTableMultiWriter}} calls 
> {{trackNew}} to add a new sstable. 
> At the time of the call, the {{LogFile}} will not contain any {{LogReplicas}},
> {{LogFile.records}} will be empty, and {{LogFile.onDiskRecords}} will contain 
> an {{ABORT}}.
> When {{LogTransaction.trackNew/LogFile.add}} is called, the check for a 
> completed transaction does not detect the abort because {{records}} is empty. 
> There are no replicas on the datadir, so {{maybeCreateReplicas}} creates a new 
> txnlog file replica containing {{ABORT}}, then appends an {{ADD}} record.
> The LogFile has already been tidied after the abort, so the txnlog file is not 
> removed and sits on disk until a restart, causing the failure.
> A related exception occurs with a different interleaving of aborts, after an 
> sstable is added; however, this is just a nuisance in the logs, as the 
> LogReplica has already been created with an {{ADD}} record first.
> {code:java}
> java.lang.AssertionError: 
> [ADD:[/datadir1/keyspace/table/nb-23314378-big-,0,8][1869379820]] is not 
> tracked by 55be35b0-35d1-11ee-865d-8b1e3c48ca06
> at org.apache.cassandra.db.lifecycle.LogFile.remove(LogFile.java:388)
> at 
> org.apache.cassandra.db.lifecycle.LogTransaction.untrackNew(LogTransaction.java:158)
> at 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.untrackNew(LifecycleTransaction.java:577)
> at 
> org.apache.cassandra.db.streaming.CassandraStreamReceiver$1.untrackNew(CassandraStreamReceiver.java:149)
> at 
> org.apache.cassandra.io.sstable.SimpleSSTableMultiWriter.abort(SimpleSSTableMultiWriter.java:95)
> at 
> org.apache.cassandra.io.sstable.format.RangeAwareSSTableWriter.abort(RangeAwareSSTableWriter.java:191)
> at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamReader.read(CassandraCompressedStreamReader.java:115)
> at 
> org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:85)
> at 
> 

[jira] [Commented] (CASSANDRA-18727) JMXUtil.getJmxConnector should retry connection attempts

2023-08-08 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752172#comment-17752172
 ] 

Jon Meredith commented on CASSANDRA-18727:
--

+1

> JMXUtil.getJmxConnector should retry connection attempts 
> -
>
> Key: CASSANDRA-18727
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18727
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: Doug Rohrer
>Assignee: Doug Rohrer
>Priority: Normal
>  Labels: pull-request-available
>
> We previously added a JMXUtil class that makes it easy for dtests to get a 
> JMX connection. It turns out that the JMX server side occasionally needs more 
> time to start up (especially when stopping and restarting instances, which 
> we’re now doing more frequently, or when stopping and restarting the whole 
> cluster). In these cases, JMXUtil.getJmxConnector can fail, when it would be 
> possible to connect if a retry mechanism was added. We should add this 
> capability to the in-jvm dtest framework.
>  
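
A retry mechanism of the kind requested above might look like the following. This is a generic sketch and not the JMXUtil change itself: retry(), its maxAttempts and delayMillis parameters, and the simulated connection attempt in demo() are all assumptions made for illustration.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

public class RetrySketch {
    /**
     * Calls attempt up to maxAttempts times, sleeping delayMillis between
     * failures, and rethrows the last failure if every attempt fails.
     */
    static <T> T retry(Callable<T> attempt, int maxAttempts, long delayMillis) throws Exception {
        if (maxAttempts < 1)
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        Exception last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                return attempt.call();
            } catch (Exception e) {
                last = e;
                if (i < maxAttempts)
                    Thread.sleep(delayMillis); // give the server side time to start
            }
        }
        throw last;
    }

    /** Simulates a server that only accepts the third connection attempt. */
    static String demo() {
        AtomicInteger calls = new AtomicInteger();
        try {
            return retry(() -> {
                if (calls.incrementAndGet() < 3)
                    throw new IOException("JMX server not ready");
                return "connected after " + calls.get() + " attempts";
            }, 5, 10);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // connected after 3 attempts
    }
}
```

A fixed delay keeps the sketch short; a real test helper may prefer a bounded total wait or a backoff.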






[jira] [Updated] (CASSANDRA-18736) Streaming exception race creates corrupt transaction log files that prevent restart

2023-08-08 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18736:
-
 Bug Category: Parent values: Availability(12983)Level 1 values: Process 
Crash(12992)
   Complexity: Normal
Discovered By: User Report
Fix Version/s: 4.0.x
   4.1.x
   5.0
 Severity: Low
   Status: Open  (was: Triage Needed)

> Streaming exception race creates corrupt transaction log files that prevent 
> restart
> ---
>
> Key: CASSANDRA-18736
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18736
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Streaming, Local/Startup and Shutdown
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0
>
>
> On restart, Cassandra logs this message and terminates.
> {code:java}
> ERROR 2023-07-17T17:17:22,931 [main] 
> org.apache.cassandra.db.lifecycle.LogTransaction:561 - Unexpected disk state: 
> failed to read transaction log 
> [nb_txn_stream_39d5f6b0-fb81-11ed-8f46-e97b3f61511e.log in 
> /datadir1/keyspace/table-c9527530a0d611e8813f034699fc9043]
> Files and contents follow:
> /datadir1/keyspace/table-c9527530a0d611e8813f034699fc9043/nb_txn_stream_39d5f6b0-fb81-11ed-8f46-e97b3f61511e.log
> ABORT:[,0,0][737437348]
> ***This record should have been the last one in all replicas
> 
> ADD:[/datadir1/keyspace/table-c9527530a0d611e8813f034699fc9043/nb-284490-big-,0,8][2493503833]
> {code}
> The root cause is a race during streaming exception handling.
> Although protection against concurrent modification of the {{LogTransaction}} was added for 
> CASSANDRA-16225, there is nothing to prevent usage after the transaction is 
> completed (committed/aborted) once it has been processed by 
> {{TransactionTidier}} (after the last reference is released). Before the 
> transaction is tidied, the {{LogFile}} keeps a list of records that are 
> checked for completion before adding new entries. In {{TransactionTidier}} 
> {{LogFile.records}} are cleared as no longer needed, however the 
> LogTransaction/LogFile is still accessible to the stream.
> The changes in CASSANDRA-17273 added a parallel set of {{onDiskRecords}} that 
> could be used to reliably recreate the transaction log at any new datadirs 
> the same as the existing datadirs, regardless of the effect of 
> {{LogTransaction.untrackNew/LogFile.remove}}.
> The race occurs when a streaming exception causes the LogTransaction to be 
> aborted and tidied just before {{SimpleSSTableMultiWriter}} calls 
> {{trackNew}} to add a new sstable. 
> At the time of the call, the {{LogFile}} will not contain any {{LogReplicas}},
> {{LogFile.records}} will be empty, and {{LogFile.onDiskRecords}} will contain 
> an {{ABORT}}.
> When {{LogTransaction.trackNew/LogFile.add}} is called, the check for a 
> completed transaction does not detect the abort because {{records}} is empty. 
> There are no replicas on the datadir, so {{maybeCreateReplicas}} creates a new 
> txnlog file replica containing {{ABORT}}, then appends an {{ADD}} record.
> The LogFile has already been tidied after the abort, so the txnlog file is not 
> removed and sits on disk until a restart, causing the failure.
> A related exception occurs with a different interleaving of aborts, after an 
> sstable is added; however, this is just a nuisance in the logs, as the 
> LogReplica has already been created with an {{ADD}} record first.
> {code:java}
> java.lang.AssertionError: 
> [ADD:[/datadir1/keyspace/table/nb-23314378-big-,0,8][1869379820]] is not 
> tracked by 55be35b0-35d1-11ee-865d-8b1e3c48ca06
> at org.apache.cassandra.db.lifecycle.LogFile.remove(LogFile.java:388)
> at 
> org.apache.cassandra.db.lifecycle.LogTransaction.untrackNew(LogTransaction.java:158)
> at 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.untrackNew(LifecycleTransaction.java:577)
> at 
> org.apache.cassandra.db.streaming.CassandraStreamReceiver$1.untrackNew(CassandraStreamReceiver.java:149)
> at 
> org.apache.cassandra.io.sstable.SimpleSSTableMultiWriter.abort(SimpleSSTableMultiWriter.java:95)
> at 
> org.apache.cassandra.io.sstable.format.RangeAwareSSTableWriter.abort(RangeAwareSSTableWriter.java:191)
> at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamReader.read(CassandraCompressedStreamReader.java:115)
> at 
> org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:85)
> at 
> org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:53)
> at 
> org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:38)
> 

[jira] [Created] (CASSANDRA-18736) Streaming exception race creates corrupt transaction log files that prevent restart

2023-08-08 Thread Jon Meredith (Jira)
Jon Meredith created CASSANDRA-18736:


 Summary: Streaming exception race creates corrupt transaction log 
files that prevent restart
 Key: CASSANDRA-18736
 URL: https://issues.apache.org/jira/browse/CASSANDRA-18736
 Project: Cassandra
  Issue Type: Bug
  Components: Consistency/Streaming, Local/Startup and Shutdown
Reporter: Jon Meredith
Assignee: Jon Meredith


On restart, Cassandra logs this message and terminates.
{code:java}
ERROR 2023-07-17T17:17:22,931 [main] 
org.apache.cassandra.db.lifecycle.LogTransaction:561 - Unexpected disk state: 
failed to read transaction log 
[nb_txn_stream_39d5f6b0-fb81-11ed-8f46-e97b3f61511e.log in 
/datadir1/keyspace/table-c9527530a0d611e8813f034699fc9043]
Files and contents follow:
/datadir1/keyspace/table-c9527530a0d611e8813f034699fc9043/nb_txn_stream_39d5f6b0-fb81-11ed-8f46-e97b3f61511e.log
ABORT:[,0,0][737437348]
***This record should have been the last one in all replicas

ADD:[/datadir1/keyspace/table-c9527530a0d611e8813f034699fc9043/nb-284490-big-,0,8][2493503833]
{code}
The root cause is a race during streaming exception handling.

Although protection against concurrent modification of the {{LogTransaction}} was added for 
CASSANDRA-16225, there is nothing to prevent usage after the transaction is 
completed (committed/aborted) once it has been processed by 
{{TransactionTidier}} (after the last reference is released). Before the 
transaction is tidied, the {{LogFile}} keeps a list of records that are checked 
for completion before adding new entries. In {{TransactionTidier}} 
{{LogFile.records}} are cleared as no longer needed, however the 
LogTransaction/LogFile is still accessible to the stream.

The changes in CASSANDRA-17273 added a parallel set of {{onDiskRecords}} that 
could be used to reliably recreate the transaction log at any new datadirs the 
same as the existing datadirs, regardless of the effect of 
{{LogTransaction.untrackNew/LogFile.remove}}.

The race occurs when a streaming exception causes the LogTransaction to be 
aborted and tidied just before {{SimpleSSTableMultiWriter}} calls {{trackNew}} 
to add a new sstable. 
At the time of the call, the {{LogFile}} will not contain any {{LogReplicas}},
{{LogFile.records}} will be empty, and {{LogFile.onDiskRecords}} will contain 
an {{ABORT}}.

When {{LogTransaction.trackNew/LogFile.add}} is called, the check for a 
completed transaction does not detect the abort because {{records}} is empty. 
There are no replicas on the datadir, so {{maybeCreateReplicas}} creates a new 
txnlog file replica containing {{ABORT}}, then appends an {{ADD}} record.

The LogFile has already been tidied after the abort, so the txnlog file is not 
removed and sits on disk until a restart, causing the failure.

A related exception occurs with a different interleaving of aborts, after an 
sstable is added; however, this is just a nuisance in the logs, as the 
LogReplica has already been created with an {{ADD}} record first.
{code:java}
java.lang.AssertionError: 
[ADD:[/datadir1/keyspace/table/nb-23314378-big-,0,8][1869379820]] is not 
tracked by 55be35b0-35d1-11ee-865d-8b1e3c48ca06
at org.apache.cassandra.db.lifecycle.LogFile.remove(LogFile.java:388)
at 
org.apache.cassandra.db.lifecycle.LogTransaction.untrackNew(LogTransaction.java:158)
at 
org.apache.cassandra.db.lifecycle.LifecycleTransaction.untrackNew(LifecycleTransaction.java:577)
at 
org.apache.cassandra.db.streaming.CassandraStreamReceiver$1.untrackNew(CassandraStreamReceiver.java:149)
at 
org.apache.cassandra.io.sstable.SimpleSSTableMultiWriter.abort(SimpleSSTableMultiWriter.java:95)
at 
org.apache.cassandra.io.sstable.format.RangeAwareSSTableWriter.abort(RangeAwareSSTableWriter.java:191)
at 
org.apache.cassandra.db.streaming.CassandraCompressedStreamReader.read(CassandraCompressedStreamReader.java:115)
at 
org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:85)
at 
org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:53)
at 
org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:38)
at 
org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:53)
at 
org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:172)
at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:829)
{code}
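
The abort/tidy/add interleaving described above can be reproduced in a toy model. The sketch below is not Cassandra's LogFile: ToyLogFile, its fields, the guardCompleted flag and simulate() are hypothetical names invented for illustration, and the guard is only one plausible way to reject use after completion, not necessarily what the committed patch does.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class TxnLogRaceSketch {
    /** Hypothetical, heavily simplified stand-in for LogFile. */
    static class ToyLogFile {
        final Set<String> records = new LinkedHashSet<>();       // cleared by tidy()
        final Set<String> onDiskRecords = new LinkedHashSet<>(); // survives tidy()
        final List<String> replica = new ArrayList<>();          // stands in for the txnlog file
        boolean replicaExists = false;
        boolean completed = false;
        final boolean guardCompleted; // assumption: toggles a use-after-completion check

        ToyLogFile(boolean guardCompleted) { this.guardCompleted = guardCompleted; }

        void abort() {
            records.add("ABORT");
            onDiskRecords.add("ABORT");
            if (replicaExists)
                replica.add("ABORT");
            completed = true;
        }

        void tidy() {
            // The tidier drops the in-memory records and removes the file.
            records.clear();
            replica.clear();
            replicaExists = false;
        }

        void add(String sstable) {
            if (guardCompleted && completed)
                throw new IllegalStateException("transaction already completed");
            // Completion check based on records only: blind once tidy() has run.
            if (records.contains("ABORT") || records.contains("COMMIT"))
                throw new IllegalStateException("transaction already completed");
            if (!replicaExists) {
                // Recreate the replica from onDiskRecords, which still holds ABORT.
                replica.addAll(onDiskRecords);
                replicaExists = true;
            }
            String record = "ADD:" + sstable;
            records.add(record);
            onDiskRecords.add(record);
            replica.add(record);
        }
    }

    /** Runs abort, tidy, then a late add; returns what is left in the replica. */
    static List<String> simulate(boolean guardCompleted) {
        ToyLogFile log = new ToyLogFile(guardCompleted);
        log.abort();
        log.tidy();
        try {
            log.add("nb-1-big");
        } catch (IllegalStateException e) {
            // The late add was rejected; nothing is written.
        }
        return log.replica;
    }

    public static void main(String[] args) {
        System.out.println(simulate(false)); // [ABORT, ADD:nb-1-big]
        System.out.println(simulate(true));  // []
    }
}
```

With the guard disabled, the replica ends up holding ABORT followed by ADD, the same shape as the file in the startup error. With it enabled, the late add is rejected and no file is recreated.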






[jira] [Commented] (CASSANDRA-18704) Test Failure: HandshakeTest.testOutboundConnectionfFallbackDuringUpgrades ClassCastException: Established -> Disconnected

2023-08-02 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17750479#comment-17750479
 ] 

Jon Meredith commented on CASSANDRA-18704:
--

+1 from me, thanks for fixing it.

> Test Failure: HandshakeTest.testOutboundConnectionfFallbackDuringUpgrades 
> ClassCastException: Established ->  Disconnected
> --
>
> Key: CASSANDRA-18704
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18704
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Michael Semb Wever
>Assignee: Maxim Muzafarov
>Priority: Normal
> Fix For: 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The following has been witnessed a number of times in repeated runs, in 
> different jdks and different test configurations.
> {noformat}
> java.lang.ClassCastException: class 
> org.apache.cassandra.net.OutboundConnection$Established cannot be cast to 
> class org.apache.cassandra.net.OutboundConnection$Disconnected 
> (org.apache.cassandra.net.OutboundConnection$Established and 
> org.apache.cassandra.net.OutboundConnection$Disconnected are in unnamed 
> module of loader 'app')
>   at 
> org.apache.cassandra.net.OutboundConnection$State.disconnected(OutboundConnection.java:201)
>   at 
> org.apache.cassandra.net.OutboundConnection$1Initiate.initiate(OutboundConnection.java:1248)
>   at 
> org.apache.cassandra.net.OutboundConnection.initiate(OutboundConnection.java:1254)
>   at 
> org.apache.cassandra.net.HandshakeTest.initiateOutbound(HandshakeTest.java:365)
>   at 
> org.apache.cassandra.net.HandshakeTest.testOutboundFallbackOnSSLHandshakeFailure(HandshakeTest.java:380)
>   at 
> org.apache.cassandra.net.HandshakeTest.testOutboundConnectionfFallbackDuringUpgrades(HandshakeTest.java:255)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {noformat}
> ref: 
> https://app.circleci.com/pipelines/github/michaelsembwever/cassandra/217/workflows/ce7392fa-756a-4392-b3a5-206bb2940553/jobs/15965/tests
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18704) Test Failure: HandshakeTest.testOutboundConnectionfFallbackDuringUpgrades ClassCastException: Established -> Disconnected

2023-08-02 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17750475#comment-17750475
 ] 

Jon Meredith commented on CASSANDRA-18704:
--

Sure, I can take a look.

> Test Failure: HandshakeTest.testOutboundConnectionfFallbackDuringUpgrades 
> ClassCastException: Established ->  Disconnected
> --
>
> Key: CASSANDRA-18704
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18704
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Michael Semb Wever
>Assignee: Maxim Muzafarov
>Priority: Normal
> Fix For: 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The following has been witnessed a number of times in repeated runs, across 
> different JDKs and different test configurations.
> {noformat}
> java.lang.ClassCastException: class 
> org.apache.cassandra.net.OutboundConnection$Established cannot be cast to 
> class org.apache.cassandra.net.OutboundConnection$Disconnected 
> (org.apache.cassandra.net.OutboundConnection$Established and 
> org.apache.cassandra.net.OutboundConnection$Disconnected are in unnamed 
> module of loader 'app')
>   at 
> org.apache.cassandra.net.OutboundConnection$State.disconnected(OutboundConnection.java:201)
>   at 
> org.apache.cassandra.net.OutboundConnection$1Initiate.initiate(OutboundConnection.java:1248)
>   at 
> org.apache.cassandra.net.OutboundConnection.initiate(OutboundConnection.java:1254)
>   at 
> org.apache.cassandra.net.HandshakeTest.initiateOutbound(HandshakeTest.java:365)
>   at 
> org.apache.cassandra.net.HandshakeTest.testOutboundFallbackOnSSLHandshakeFailure(HandshakeTest.java:380)
>   at 
> org.apache.cassandra.net.HandshakeTest.testOutboundConnectionfFallbackDuringUpgrades(HandshakeTest.java:255)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {noformat}
> ref: 
> https://app.circleci.com/pipelines/github/michaelsembwever/cassandra/217/workflows/ce7392fa-756a-4392-b3a5-206bb2940553/jobs/15965/tests
>  






[jira] [Updated] (CASSANDRA-18681) Internode legacy SSL storage port certificate is not hot reloaded on update

2023-07-21 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18681:
-
 Bug Category: Parent values: Availability(12983)
   Complexity: Normal
Discovered By: User Report
 Severity: Low
   Status: Open  (was: Triage Needed)

> Internode legacy SSL storage port certificate is not hot reloaded on update
> ---
>
> Key: CASSANDRA-18681
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18681
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Jon Meredith
>Priority: Normal
>
> In CASSANDRA-1 the SSLContext cache was changed so that, to reduce resource 
> consumption, only the individual {{EncryptionOptions}} that need reloading 
> are cleared from the cache. Before the change, if ANY cert needed hot 
> reloading, the SSLContext cache would be cleared for ALL certs.
> If the legacy SSL storage port is configured, a new {{EncryptionOptions}} 
> object is created in {{org.apache.cassandra.net.InboundSockets#addBindings}} 
> just for binding the socket, but never gets cleared as the change in port 
> means it no longer matches the configuration retrieved from 
> {{DatabaseDescriptor}} in 
> {{org.apache.cassandra.net.MessagingServiceMBeanImpl#reloadSslCertificates}}.
> This is unlikely to be an issue in practice as the legacy SSL internode 
> socket is only used in mixed version clusters with pre-4.0 nodes, so the cert 
> only needs to stay valid until all nodes upgrade to 4.x or above.
> One way to avoid this class of failures is to just check the entries present 
> in the SSLContext cache.






[jira] [Created] (CASSANDRA-18681) Internode legacy SSL storage port certificate is not hot reloaded on update

2023-07-21 Thread Jon Meredith (Jira)
Jon Meredith created CASSANDRA-18681:


 Summary: Internode legacy SSL storage port certificate is not hot 
reloaded on update
 Key: CASSANDRA-18681
 URL: https://issues.apache.org/jira/browse/CASSANDRA-18681
 Project: Cassandra
  Issue Type: Bug
  Components: Messaging/Internode
Reporter: Jon Meredith


In CASSANDRA-1 the SSLContext cache was changed so that, to reduce resource 
consumption, only the individual {{EncryptionOptions}} that need reloading are 
cleared from the cache. Before the change, if ANY cert needed hot 
reloading, the SSLContext cache would be cleared for ALL certs.

If the legacy SSL storage port is configured, a new {{EncryptionOptions}} 
object is created in {{org.apache.cassandra.net.InboundSockets#addBindings}} 
just for binding the socket, but never gets cleared as the change in port means 
it no longer matches the configuration retrieved from {{DatabaseDescriptor}} in 
{{org.apache.cassandra.net.MessagingServiceMBeanImpl#reloadSslCertificates}}.

This is unlikely to be an issue in practice as the legacy SSL internode socket 
is only used in mixed version clusters with pre-4.0 nodes, so the cert only 
needs to stay valid until all nodes upgrade to 4.x or above.

One way to avoid this class of failures is to just check the entries present in 
the SSLContext cache.
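
The suggestion above (drive the reload from what is actually in the cache, not from what the configuration currently describes) can be sketched roughly as follows. This is only an illustration under stated assumptions: {{ContextCache}}, {{evictStale}} and the {{needsReload}} predicate are hypothetical names, not Cassandra's real classes or methods, and the real cache is keyed by {{EncryptionOptions}} rather than an arbitrary type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical cache of built contexts, keyed by the options used to build them.
final class ContextCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();

    // Builds the context on first use and caches it under the given options.
    V get(K options, Function<K, V> loader) {
        return cache.computeIfAbsent(options, loader);
    }

    // Evicts every cached entry whose key needs reloading. Because this walks the
    // keys present in the cache, an entry created only for binding a legacy port
    // is still considered, even though no configuration object matches it.
    List<K> evictStale(Predicate<K> needsReload) {
        List<K> evicted = new ArrayList<>();
        for (K key : cache.keySet()) {
            if (needsReload.test(key) && cache.remove(key) != null)
                evicted.add(key);
        }
        return evicted;
    }

    int size() {
        return cache.size();
    }
}
```

With this shape, the reload path would call {{evictStale}} with whatever staleness check applies to the key type, and the next {{get}} for an evicted key rebuilds the context.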






[jira] [Updated] (CASSANDRA-18582) BulkLoader withCipherSuites option is ignored

2023-07-10 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18582:
-
Reviewers: Ekaterina Dimitrova, Jon Meredith  (was: Ekaterina Dimitrova)

> BulkLoader withCipherSuites option is ignored
> -
>
> Key: CASSANDRA-18582
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18582
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/bulk load
>Reporter: dan jatnieks
>Assignee: dan jatnieks
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.x
>
>
> The {{withCipherSuites}} option of {{BulkLoader}} is being ignored. It seems 
> that since CASSANDRA-16362 the {{BulkLoader.buildSSLOptions}} method no 
> longer applies the cipher suite options provided by 
> {{clientEncryptionOptions}}.






[jira] [Commented] (CASSANDRA-18582) BulkLoader withCipherSuites option is ignored

2023-07-06 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740819#comment-17740819
 ] 

Jon Meredith commented on CASSANDRA-18582:
--

will do

> BulkLoader withCipherSuites option is ignored
> -
>
> Key: CASSANDRA-18582
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18582
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/bulk load
>Reporter: dan jatnieks
>Assignee: dan jatnieks
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.x
>
>
> The {{withCipherSuites}} option of {{BulkLoader}} is being ignored. It seems 
> that since CASSANDRA-16362 the {{BulkLoader.buildSSLOptions}} method no 
> longer applies the cipher suite options provided by 
> {{clientEncryptionOptions}}.






[jira] [Commented] (CASSANDRA-18467) Update generate-idea-files for J17

2023-06-29 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17738781#comment-17738781
 ] 

Jon Meredith commented on CASSANDRA-18467:
--

+1 on commit 596bfadab5e385ed254e34c38b8210311887c239 - I checked the 
JMXFeatureTest worked under Java 8/11/17.

My only slight concern is the change to remove an existing .idea directory 
before regeneration, which loses custom test configurations. They can be saved to 
the project directory instead if people still need them, but people may get 
burned the first few times. 

> Update generate-idea-files for J17
> --
>
> Key: CASSANDRA-18467
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18467
> Project: Cassandra
>  Issue Type: Task
>  Components: Build
>Reporter: Ekaterina Dimitrova
>Assignee: Jakub Zytka
>Priority: Low
> Fix For: 5.x
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> There was a discussion in CASSANDRA-18258 on how to update generate-idea-files.
> The final agreement was to create one target to cover both Java 11 and Java 
> 17.
> It will be good to figure out CASSANDRA-18263 and reshuffle arguments and 
> tasks based on what we decide to use as gc in testing for both Java 11 and 
> Java 17.






[jira] [Commented] (CASSANDRA-18559) Upgrade to 4.1.1 fails with NullPointerException

2023-06-01 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17728412#comment-17728412
 ] 

Jon Meredith commented on CASSANDRA-18559:
--

This looks like a result of commit c2f24d2c45aae6030310d881dcd96ba60d04a2ad, which 
enforces the internode encryption policy on inbound connections.

Without rack/dc information from the snitch, should unencrypted rack or dc 
connections be permitted? I would say not, which still causes the availability 
issue but should give a nicer message than the NPE.

As a workaround, can the snitch be made to supply rack/dc for all the members 
of the cluster?
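
To make the question above concrete, here is a minimal sketch of one possible policy: when the peer's rack or dc is unknown, treat the peer as non-local and require encryption instead of dereferencing a null. The names {{EncryptionPolicy}}, {{Mode}} and the parameter list are assumptions made for illustration; this is not the actual {{ServerEncryptionOptions.shouldEncrypt}} code.

```java
// Hypothetical policy helper; null peerDc/peerRack means the snitch has no data yet.
final class EncryptionPolicy {
    enum Mode { ALL, NONE, DC, RACK }

    // Returns true when a connection to the peer must be encrypted.
    // An unknown peer location is treated as "not local", so the answer is
    // "encrypt" and no NullPointerException can be thrown.
    static boolean shouldEncrypt(Mode mode, String localDc, String localRack,
                                 String peerDc, String peerRack) {
        switch (mode) {
            case NONE:
                return false;
            case ALL:
                return true;
            case DC:
                return peerDc == null || !peerDc.equals(localDc);
            case RACK:
                return peerDc == null || peerRack == null
                       || !peerDc.equals(localDc) || !peerRack.equals(localRack);
        }
        throw new IllegalStateException("Unhandled mode: " + mode);
    }
}
```

A caller that then rejects an unencrypted inbound connection could log that the peer's location is unknown, which would be a clearer failure than the NPE in the trace quoted in this message.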

> Upgrade to 4.1.1 fails with NullPointerException
> 
>
> Key: CASSANDRA-18559
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18559
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership
>Reporter: Eric Evans
>Priority: Normal
> Fix For: 4.1.x
>
>
> When upgrading from 3.11.14 to 4.1.1, and when {{internode_encryption}} is 
> one of {{dc}} or {{rack}}, startup fails with an NPE.
>  
> {noformat}
> io.netty.handler.codec.DecoderException: java.lang.NullPointerException
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:478)
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.handlerRemoved(ByteToMessageDecoder.java:253)
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:515)
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:447)
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
>   at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>   at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
>   at 
> io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795)
>   at 
> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480)
>   at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>   at 
> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>   at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   at java.lang.Thread.run(Thread.java:750)
> Caused by: java.lang.NullPointerException: null
>   at 
> org.apache.cassandra.locator.GossipingPropertyFileSnitch.getRack(GossipingPropertyFileSnitch.java:116)
>   at 
> org.apache.cassandra.locator.DynamicEndpointSnitch.getRack(DynamicEndpointSnitch.java:162)
>   at 
> org.apache.cassandra.config.EncryptionOptions$ServerEncryptionOptions.shouldEncrypt(EncryptionOptions.java:682)
>   at 
> org.apache.cassandra.net.InboundConnectionInitiator$Handler.isEncryptionRequired(InboundConnectionInitiator.java:363)
>   at 
> org.apache.cassandra.net.InboundConnectionInitiator$Handler.initiate(InboundConnectionInitiator.java:278)
>   at 
> org.apache.cassandra.net.InboundConnectionInitiator$Handler.decode(InboundConnectionInitiator.java:265)
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:508)
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:447)
>   ... 22 common frames omitted
> {noformat}
>  
> {noformat}
> io.netty.handler.codec.DecoderException: java.lang.NullPointerException
> at 
> 

[jira] [Updated] (CASSANDRA-18511) Add support for JMX in the in-jvm dtest framework

2023-05-31 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18511:
-
  Fix Version/s: 3.11.16
 4.0.11
 4.1.3
 (was: 3.11.x)
 (was: 4.0.x)
 (was: 4.1.x)
Source Control Link:  
https://github.com/apache/cassandra/commit/43ec1843918aba9e81d3c2dc1433a1ef4740a51f
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> Add support for JMX in the in-jvm dtest framework
> -
>
> Key: CASSANDRA-18511
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18511
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: Doug Rohrer
>Assignee: Doug Rohrer
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 3.11.16, 4.0.11, 4.1.3, 5.0
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> In many cases, it would be useful to be able to enable JMX endpoints within 
> the in-jvm dtest framework, including the existing JMX Getter test, which 
> used to simply spin up a JMX registry and then leave it running.  There are 
> quite a few JMX-related functions that don’t have tests today, and some 
> external usages of the in-jvm dtest framework could also benefit from 
> exposing JMX like we did Native before.






[jira] [Commented] (CASSANDRA-18511) Add support for JMX in the in-jvm dtest framework

2023-05-30 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17727713#comment-17727713
 ] 

Jon Meredith commented on CASSANDRA-18511:
--

Starting over after pausing during the release vote.

Starting commit

CI Results (pending):
||Branch||Source||Circle CI||Jenkins||
|cassandra-3.11|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18511-cassandra-3.11-FBBC1B68-F894-4605-A0BA-A4CA6F9BA47D]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18511-cassandra-3.11-FBBC1B68-F894-4605-A0BA-A4CA6F9BA47D]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/2495/]|
|cassandra-4.0|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18511-cassandra-4.0-FBBC1B68-F894-4605-A0BA-A4CA6F9BA47D]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18511-cassandra-4.0-FBBC1B68-F894-4605-A0BA-A4CA6F9BA47D]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/2496/]|
|cassandra-4.1|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18511-cassandra-4.1-FBBC1B68-F894-4605-A0BA-A4CA6F9BA47D]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18511-cassandra-4.1-FBBC1B68-F894-4605-A0BA-A4CA6F9BA47D]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/2497/]|
|trunk|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18511-trunk-FBBC1B68-F894-4605-A0BA-A4CA6F9BA47D]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18511-trunk-FBBC1B68-F894-4605-A0BA-A4CA6F9BA47D]|[build|unknown]|

> Add support for JMX in the in-jvm dtest framework
> -
>
> Key: CASSANDRA-18511
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18511
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: Doug Rohrer
>Assignee: Doug Rohrer
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 3.11.x, 4.0.x, 4.1.x, 5.0
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> In many cases, it would be useful to be able to enable JMX endpoints within 
> the in-jvm dtest framework, including the existing JMX Getter test, which 
> used to simply spin up a JMX registry and then leave it running.  There are 
> quite a few JMX-related functions that don’t have tests today, and some 
> external usages of the in-jvm dtest framework could also benefit from 
> exposing JMX like we did Native before.






[jira] [Commented] (CASSANDRA-18511) Add support for JMX in the in-jvm dtest framework

2023-05-25 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17726263#comment-17726263
 ] 

Jon Meredith commented on CASSANDRA-18511:
--

+1 from me too

> Add support for JMX in the in-jvm dtest framework
> -
>
> Key: CASSANDRA-18511
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18511
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: Doug Rohrer
>Assignee: Doug Rohrer
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 3.11.x, 4.0.x, 4.1.x, 5.0
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> In many cases, it would be useful to be able to enable JMX endpoints within 
> the in-jvm dtest framework, including the existing JMX Getter test, which 
> used to simply spin up a JMX registry and then leave it running.  There are 
> quite a few JMX-related functions that don’t have tests today, and some 
> external usages of the in-jvm dtest framework could also benefit from 
> exposing JMX like we did Native before.






[jira] [Commented] (CASSANDRA-18537) Add JMX utility class to in-jvm dtest to ease development of new tests using JMX

2023-05-18 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17724089#comment-17724089
 ] 

Jon Meredith commented on CASSANDRA-18537:
--

+1

> Add JMX utility class to in-jvm dtest to ease development of new tests using 
> JMX
> 
>
> Key: CASSANDRA-18537
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18537
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: Doug Rohrer
>Priority: Normal
>
> While reviewing CASSANDRA-18511, some repetitive code was identified across 
> the 4 branches, and 2 different tests, that would also be repeated for any 
> new usages of the JMX support in the in-jvm dtest framework. Therefore, a 
> utility class should be added to the dtest-api's `shared` package that will 
> simplify some of this repetitive and error-prone code.






[jira] [Updated] (CASSANDRA-18511) Add support for JMX in the in-jvm dtest framework

2023-05-16 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18511:
-
Reviewers: Alex Petrov, Jon Meredith  (was: Alex Petrov)

> Add support for JMX in the in-jvm dtest framework
> -
>
> Key: CASSANDRA-18511
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18511
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: Doug Rohrer
>Assignee: Doug Rohrer
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 3.11.x, 4.0.x, 4.1.x, 5.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> In many cases, it would be useful to be able to enable JMX endpoints within 
> the in-jvm dtest framework, including the existing JMX Getter test, which 
> used to simply spin up a JMX registry and then leave it running.  There are 
> quite a few JMX-related functions that don’t have tests today, and some 
> external usages of the in-jvm dtest framework could also benefit from 
> exposing JMX like we did Native before.






[jira] [Updated] (CASSANDRA-18505) NPE when deserializing malformed collections from client

2023-05-09 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18505:
-
  Fix Version/s: 4.0.10
 4.1.2
 (was: 4.0.x)
 (was: 4.1.x)
  Since Version: 3.0.0
Source Control Link:  
https://github.com/apache/cassandra/commit/ae995eb3d3cc1c98f61db0d071522b6f09443927
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> NPE when deserializing malformed collections from client
> 
>
> Key: CASSANDRA-18505
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18505
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.0.10, 4.1.2, 5.0
>
>
> When deserializing collections sent from the client, if an element in the 
> collection is incorrectly serialized, Collections.getValue can return null if 
> the length of the element is negative.  Currently this isn't detected and 
> serialization continues, calling validate and throwing an NPE in serializers 
> that don't handle null value buffers.
> Detect the malformed input and throw a better MarshalException so it will be 
> converted to an InvalidRequestException for the client.
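
The detection described in the issue can be sketched as below. This is a simplified illustration, not the committed fix: {{CollectionReader}}, {{readValue}}, {{readList}} and {{MalformedCollectionException}} are hypothetical stand-ins (the last one for Cassandra's {{MarshalException}}), and the wire layout assumed here is simply a 4-byte element count followed by 4-byte length-prefixed elements.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for MarshalException; not the real Cassandra class.
class MalformedCollectionException extends RuntimeException {
    MalformedCollectionException(String message) {
        super(message);
    }
}

final class CollectionReader {
    // Reads one length-prefixed element. A negative length yields null,
    // mirroring the behaviour the issue describes.
    static ByteBuffer readValue(ByteBuffer input) {
        int length = input.getInt();
        if (length < 0)
            return null;
        if (length > input.remaining())
            throw new MalformedCollectionException("Element length " + length + " exceeds remaining input");
        ByteBuffer value = input.duplicate();
        value.limit(value.position() + length);
        input.position(input.position() + length);
        return value.slice();
    }

    // Reads a list and rejects null elements up front, so later validation
    // never sees a null buffer and cannot fail with an NPE.
    static List<ByteBuffer> readList(ByteBuffer input) {
        int count = input.getInt();
        if (count < 0)
            throw new MalformedCollectionException("Negative element count " + count);
        List<ByteBuffer> elements = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            ByteBuffer value = readValue(input);
            if (value == null)
                throw new MalformedCollectionException("Malformed collection: element " + i + " has a negative length");
            elements.add(value);
        }
        return elements;
    }
}
```

In the real code path the thrown exception type matters, since the issue relies on it being converted to an InvalidRequestException for the client; the sketch only shows where the null check sits relative to validation.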






[jira] [Updated] (CASSANDRA-18505) NPE when deserializing malformed collections from client

2023-05-09 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18505:
-
Status: Ready to Commit  (was: Review In Progress)

> NPE when deserializing malformed collections from client
> 
>
> Key: CASSANDRA-18505
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18505
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0
>
>
> When deserializing collections sent from the client, if an element in the 
> collection is incorrectly serialized, Collections.getValue can return null if 
> the length of the element is negative.  Currently this isn't detected and 
> serialization continues, calling validate and throwing an NPE in serializers 
> that don't handle null value buffers.
> Detect the malformed input and throw a better MarshalException so it will be 
> converted to an InvalidRequestException for the client.






[jira] [Commented] (CASSANDRA-18505) NPE when deserializing malformed collections from client

2023-05-09 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720977#comment-17720977
 ] 

Jon Meredith commented on CASSANDRA-18505:
--

Resubmitted after review feedback

Starting commit

CI Results (pending):
||Branch||Source||Circle CI||Jenkins||
|cassandra-4.0|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18505-cassandra-4.0-3823DF16-40CB-4A83-9D4D-0D4662C64AF4]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18505-cassandra-4.0-3823DF16-40CB-4A83-9D4D-0D4662C64AF4]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/2461/]|
|cassandra-4.1|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18505-cassandra-4.1-3823DF16-40CB-4A83-9D4D-0D4662C64AF4]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18505-cassandra-4.1-3823DF16-40CB-4A83-9D4D-0D4662C64AF4]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/2462/]|
|trunk|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18505-trunk-3823DF16-40CB-4A83-9D4D-0D4662C64AF4]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18505-trunk-3823DF16-40CB-4A83-9D4D-0D4662C64AF4]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/2463/]|


> NPE when deserializing malformed collections from client
> 
>
> Key: CASSANDRA-18505
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18505
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0
>
>
> When deserializing collections sent from the client, if an element in the 
> collection is incorrectly serialized, Collections.getValue can return null if 
> the length of the element is negative.  Currently this isn't detected and 
> serialization continues, calling validate and throwing an NPE in serializers 
> that don't handle null value buffers.
> Detect the malformed input and throw a better MarshalException so it will be 
> converted to an InvalidRequestException for the client.






[jira] [Commented] (CASSANDRA-18505) NPE when deserializing malformed collections from client

2023-05-08 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720711#comment-17720711
 ] 

Jon Meredith commented on CASSANDRA-18505:
--

CI Results (pending):
||Branch||Source||Circle CI||Jenkins||
|cassandra-4.0|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18505-cassandra-4.0-4241543C-EA7D-4586-BA9E-34181A5B966A]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18505-cassandra-4.0-4241543C-EA7D-4586-BA9E-34181A5B966A]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/2453]|
|cassandra-4.1|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18505-cassandra-4.1-4241543C-EA7D-4586-BA9E-34181A5B966A]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18505-cassandra-4.1-4241543C-EA7D-4586-BA9E-34181A5B966A]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/2454/]|
|trunk|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18505-trunk-4241543C-EA7D-4586-BA9E-34181A5B966A]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18505-trunk-4241543C-EA7D-4586-BA9E-34181A5B966A]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/2455/]|

> NPE when deserializing malformed collections from client
> 
>
> Key: CASSANDRA-18505
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18505
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0
>
>
> When deserializing collections sent from the client, if an element in the 
> collection is incorrectly serialized, Collections.getValue can return null if 
> the length of the element is negative.  Currently this isn't detected and 
> serialization continues, calling validate and throwing an NPE in serializers 
> that don't handle null value buffers.
> Detect the malformed input and throw a better MarshalException so it will be 
> converted to an InvalidRequestException for the client.






[jira] [Commented] (CASSANDRA-18505) NPE when deserializing malformed collections from client

2023-05-08 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720700#comment-17720700
 ] 

Jon Meredith commented on CASSANDRA-18505:
--

The branches differ more than I thought, including additional usage for 
Lists in 4.1/trunk.  Opening PRs for the other branches for clarity.

[4.0 PR|https://github.com/apache/cassandra/pull/2312]
[4.1 PR|https://github.com/apache/cassandra/pull/2313]
[trunk PR|https://github.com/apache/cassandra/pull/2314]

> NPE when deserializing malformed collections from client
> 
>
> Key: CASSANDRA-18505
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18505
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0
>
>
> When deserializing collections sent from the client, if an element in the 
> collection is incorrectly serialized, Collections.getValue can return null if 
> the length of the element is negative.  Currently this isn't detected and 
> serialization continues, calling validate and throwing an NPE in serializers 
> that don't handle null value buffers.
> Detect the malformed input and throw a better MarshalException so it will be 
> converted to an InvalidRequestException for the client.






[jira] [Comment Edited] (CASSANDRA-18505) NPE when deserializing malformed collections from client

2023-05-08 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720604#comment-17720604
 ] 

Jon Meredith edited comment on CASSANDRA-18505 at 5/8/23 4:55 PM:
--

4.0 [PR|https://github.com/apache/cassandra/pull/2312] - others very similar


was (Author: jonmeredith):
4.0 [PR|https://github.com/apache/cassandra/pull/2312]

> NPE when deserializing malformed collections from client
> 
>
> Key: CASSANDRA-18505
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18505
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0
>
>
> When deserializing collections sent from the client, if an element in the 
> collection is incorrectly serialized, Collections.getValue can return null if 
> the length of the element is negative.  Currently this isn't detected and 
> serialization continues, calling validate and throwing an NPE in serializers 
> that don't handle null value buffers.
> Detect the malformed input and throw a better MarshalException so it will be 
> converted to an InvalidRequestException for the client.






[jira] [Updated] (CASSANDRA-18505) NPE when deserializing malformed collections from client

2023-05-08 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18505:
-
Test and Documentation Plan: Run through CI
 Status: Patch Available  (was: Open)

4.0 [PR|https://github.com/apache/cassandra/pull/2312]

> NPE when deserializing malformed collections from client
> 
>
> Key: CASSANDRA-18505
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18505
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0
>
>
> When deserializing collections sent from the client, if an element in the 
> collection is incorrectly serialized, Collections.getValue can return null if 
> the length of the element is negative.  Currently this isn't detected and 
> serialization continues, calling validate and throwing an NPE in serializers 
> that don't handle null value buffers.
> Detect the malformed input and throw a better MarshalException so it will be 
> converted to an InvalidRequestException for the client.






[jira] [Updated] (CASSANDRA-18505) NPE when deserializing malformed collections from client

2023-05-08 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18505:
-
 Bug Category: Parent values: Code(13163)
   Complexity: Low Hanging Fruit
  Component/s: Messaging/Client
Discovered By: User Report
Fix Version/s: 4.0.x
               4.1.x
               5.0
     Severity: Low
       Status: Open  (was: Triage Needed)

> NPE when deserializing malformed collections from client
> 
>
> Key: CASSANDRA-18505
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18505
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0
>
>
> When deserializing collections sent from the client, if an element in the 
> collection is incorrectly serialized, Collections.getValue can return null if 
> the length of the element is negative.  Currently this isn't detected and 
> serialization continues, calling validate and throwing an NPE in serializers 
> that don't handle null value buffers.
> Detect the malformed input and throw a better MarshalException so it will be 
> converted to an InvalidRequestException for the client.






[jira] [Commented] (CASSANDRA-18505) NPE when deserializing malformed collections from client

2023-05-08 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720580#comment-17720580
 ] 

Jon Meredith commented on CASSANDRA-18505:
--

Example stack trace
{code}
at org.apache.cassandra.db.marshal.ByteBufferAccessor.size(ByteBufferAccessor.java)
at org.apache.cassandra.db.marshal.ByteBufferAccessor.size(ByteBufferAccessor.java)
at org.apache.cassandra.serializers.UUIDSerializer.validate(UUIDSerializer.java)
at org.apache.cassandra.serializers.SetSerializer.deserializeForNativeProtocol(SetSerializer.java)
at org.apache.cassandra.cql3.Sets$Value.fromSerialized(Sets.java)
at org.apache.cassandra.cql3.Sets$Marker.bind(Sets.java)
at org.apache.cassandra.cql3.Sets$Setter.execute(Sets.java)
at org.apache.cassandra.cql3.statements.UpdateStatement.addUpdateForKey(UpdateStatement.java)
at org.apache.cassandra.cql3.statements.CQL3CasRequest$RowUpdate.applyUpdates(CQL3CasRequest.java)
at org.apache.cassandra.cql3.statements.CQL3CasRequest.makeUpdates(CQL3CasRequest.java)
at org.apache.cassandra.service.StorageProxy.lambda$cas$3(StorageProxy.java)
at org.apache.cassandra.service.StorageProxy.doPaxos(StorageProxy.java)
at org.apache.cassandra.service.StorageProxy.cas(StorageProxy.java)
{code}

> NPE when deserializing malformed collections from client
> 
>
> Key: CASSANDRA-18505
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18505
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> When deserializing collections sent from the client, if an element in the 
> collection is incorrectly serialized, Collections.getValue can return null if 
> the length of the element is negative.  Currently this isn't detected and 
> serialization continues, calling validate and throwing an NPE in serializers 
> that don't handle null value buffers.
> Detect the malformed input and throw a better MarshalException so it will be 
> converted to an InvalidRequestException for the client.






[jira] [Created] (CASSANDRA-18505) NPE when deserializing malformed collections from client

2023-05-08 Thread Jon Meredith (Jira)
Jon Meredith created CASSANDRA-18505:


 Summary: NPE when deserializing malformed collections from client
 Key: CASSANDRA-18505
 URL: https://issues.apache.org/jira/browse/CASSANDRA-18505
 Project: Cassandra
  Issue Type: Bug
Reporter: Jon Meredith
Assignee: Jon Meredith


When deserializing collections sent from the client, if an element in the 
collection is incorrectly serialized, Collections.getValue can return null if 
the length of the element is negative.  Currently this isn't detected and 
deserialization continues, calling validate and throwing an NPE in serializers 
that don't handle null value buffers.

Detect the malformed input and throw a better MarshalException so it will be 
converted to an InvalidRequestException for the client.
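
The failure mode and the proposed check can be sketched in a few lines. This 
is only an illustration under stated assumptions, not the actual patch: the 
class MalformedCollectionSketch, its readValue/validate/deserialize methods, 
the [int length][bytes] element layout, and the local MarshalException 
stand-in are all hypothetical names chosen so the example compiles without 
Cassandra on the classpath.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class MalformedCollectionSketch {
    /** Local stand-in for Cassandra's MarshalException (hypothetical, for this sketch only). */
    static class MarshalException extends RuntimeException {
        MarshalException(String message) { super(message); }
    }

    /** Reads one [int length][bytes] element; a negative length yields null, as in the report. */
    static ByteBuffer readValue(ByteBuffer input) {
        if (input.remaining() < 4)
            throw new MarshalException("Not enough bytes to read an element length");
        int length = input.getInt();
        if (length < 0)
            return null;
        if (input.remaining() < length)
            throw new MarshalException("Not enough bytes to read a " + length + " byte element");
        ByteBuffer value = input.slice();
        value.limit(length);
        input.position(input.position() + length);
        return value;
    }

    /** Hypothetical UUID-like validator: it dereferences its argument, so null would cause an NPE. */
    static void validate(ByteBuffer value) {
        if (value.remaining() != 16)
            throw new MarshalException("Expected 16 bytes, got " + value.remaining());
    }

    /** Deserializes count elements, rejecting null elements before they reach validate. */
    static List<ByteBuffer> deserialize(ByteBuffer input, int count) {
        List<ByteBuffer> elements = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            ByteBuffer value = readValue(input);
            // The check this ticket asks for: fail with a descriptive exception instead of an NPE.
            if (value == null)
                throw new MarshalException("Null value read when not allowed (element " + i + ")");
            validate(value);
            elements.add(value);
        }
        return elements;
    }

    public static void main(String[] args) {
        ByteBuffer malformed = ByteBuffer.allocate(4);
        malformed.putInt(-1);   // negative element length
        malformed.flip();
        try {
            deserialize(malformed, 1);
        } catch (MarshalException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Without the null check in deserialize, the same input would reach validate 
with a null buffer and fail with a NullPointerException rather than an error 
the server can report back to the client.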







[jira] [Updated] (CASSANDRA-18047) fix flaky o.a.c.distributed.test.PaxosRepair2Test.paxosRepairHistoryIsntUpdatedInForcedRepair

2023-05-03 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18047:
-
      Fix Version/s: 4.1.2
                     5.0
                     (was: 5.x)
                     (was: 4.1.x)
      Since Version: 4.1.0  (was: 4.1.x)
Source Control Link: https://github.com/apache/cassandra/commit/602ffcbf3e4ead4732fdf46d506165f63d80a9a4
         Resolution: Fixed
             Status: Resolved  (was: Ready to Commit)

> fix flaky 
> o.a.c.distributed.test.PaxosRepair2Test.paxosRepairHistoryIsntUpdatedInForcedRepair
> -
>
> Key: CASSANDRA-18047
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18047
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Stefan Miklosovic
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.1.2, 5.0
>
>
> This test was introduced by CASSANDRA-18029
>  
> {code:java}
> junit.framework.AssertionFailedError: Repair failed with errors: [Repair session 864c53d0-61fe-11ed-935f-5103a8e332f7 for range [(-3074457345618258603,3074457345618258601], (9223372036854775805,-3074457345618258603], (3074457345618258601,9223372036854775805]] failed with error UNKNOWN failure response from /127.0.0.3:7012, Repair command #1 finished with error]
>   at org.apache.cassandra.distributed.test.PaxosRepair2Test.lambda$repair$54f7d7c2$1(PaxosRepair2Test.java:186)
>   at org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:96)
>   at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)
>   at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> different error:
> {code:java}
> junit.framework.AssertionFailedError: Repair failed with errors: [Endpoint not alive: /127.0.0.3:7012, Repair command #1 finished with error]
>   at org.apache.cassandra.distributed.test.PaxosRepair2Test.lambda$repair$54f7d7c2$1(PaxosRepair2Test.java:186)
>   at org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:96)
>   at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)
>   at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   at java.lang.Thread.run(Thread.java:748)
> {code}





