[jira] [Updated] (CASSANDRA-18929) CEP-15: (C*) Implement TopologySorter to prioritise hosts based on DynamicSnitch and/or topology layout

2023-10-13 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-18929:
--
Change Category: Performance
 Complexity: Normal
  Fix Version/s: 5.x
   Assignee: David Capwell
 Status: Open  (was: Triage Needed)

> CEP-15: (C*) Implement TopologySorter to prioritise hosts based on 
> DynamicSnitch and/or topology layout
> ---
>
> Key: CASSANDRA-18929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18929
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Accord
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
> Fix For: 5.x
>
>
> Implement TopologySorter to prioritise hosts based on DynamicSnitch and/or 
> topology layout



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-18929) CEP-15: (C*) Implement TopologySorter to prioritise hosts based on DynamicSnitch and/or topology layout

2023-10-13 Thread David Capwell (Jira)
David Capwell created CASSANDRA-18929:
-

 Summary: CEP-15: (C*) Implement TopologySorter to prioritise hosts 
based on DynamicSnitch and/or topology layout
 Key: CASSANDRA-18929
 URL: https://issues.apache.org/jira/browse/CASSANDRA-18929
 Project: Cassandra
  Issue Type: Improvement
  Components: Accord
Reporter: David Capwell


Implement TopologySorter to prioritise hosts based on DynamicSnitch and/or 
topology layout






[jira] [Commented] (CASSANDRA-18904) Repair vtable caches consume excessive memory

2023-10-13 Thread Abe Ratnofsky (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775017#comment-17775017
 ] 

Abe Ratnofsky commented on CASSANDRA-18904:
---

PRs up:
- trunk: https://github.com/apache/cassandra/pull/2804
- 4.1: https://github.com/apache/cassandra/pull/2805

4.1 slightly differs from trunk due to the exclusion of CASSANDRA-18816, so I'm 
opening that up in a separate PR. I'll sync up all the branches based on the PR 
feedback.

> Repair vtable caches consume excessive memory
> -
>
> Key: CASSANDRA-18904
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18904
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair, Local/Caching
>Reporter: Abe Ratnofsky
>Assignee: Abe Ratnofsky
>Priority: Normal
>
> Currently, the repair vtables 
> (system_views.{repairs,repair_sessions,repair_jobs,repair_participates,repair_validations})
>  are backed by caches in ActiveRepairService that are bounded by the number 
> of elements in them, controlled by Config.repair_state_size and 
> Config.repair_state_expires.
> The individual cached elements are mutable, and can grow to retain a 
> significant amount of heap as the instance uptime increases and more repairs 
> are run. In a heap dump for a real cluster, I found these caches occupying 
> ~1GB of heap total between ActiveRepairService.repairs and 
> ActiveRepairService.participates. Individual cached elements were reaching 
> 100KB in size, so configuring the caches by number of elements introduces a 
> significant amount of potential variance in the actual heap usage of these 
> caches.
> We should measure these caches by the heap they retain, not by the number of 
> elements. Users should not be expected to check heap dumps to calibrate the 
> number of elements they configure the caches to consume - specifying a memory 
> total is much more user-friendly.
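
The proposal above (bounding by retained heap rather than element count) can be illustrated with a minimal sketch. This is not the ActiveRepairService code or the patch in the linked PRs; the class name `WeightBoundedCache`, the caller-supplied `weigher`, and the byte budget are all hypothetical. It also records each entry's weight only at insertion, so it would not track entries that grow after being cached, which the description says happens; a real fix would have to re-measure them.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

// Hypothetical sketch: NOT the ActiveRepairService cache, only an illustration
// of bounding a cache by estimated retained bytes instead of element count.
final class WeightBoundedCache<K, V> {
    // A cached value together with the weight recorded when it was inserted.
    private static final class Weighted<V> {
        final V value;
        final long weight;
        Weighted(V value, long weight) { this.value = value; this.weight = weight; }
    }

    private final long maxWeightBytes;
    private final ToLongFunction<V> weigher; // assumption: caller estimates bytes per value
    // Access-ordered map, so iteration starts at the least recently used entry.
    private final LinkedHashMap<K, Weighted<V>> entries = new LinkedHashMap<>(16, 0.75f, true);
    private long currentWeightBytes = 0;

    WeightBoundedCache(long maxWeightBytes, ToLongFunction<V> weigher) {
        this.maxWeightBytes = maxWeightBytes;
        this.weigher = weigher;
    }

    synchronized void put(K key, V value) {
        long weight = weigher.applyAsLong(value);
        Weighted<V> previous = entries.put(key, new Weighted<>(value, weight));
        if (previous != null)
            currentWeightBytes -= previous.weight;
        currentWeightBytes += weight;
        evictUntilWithinBudget();
    }

    synchronized V get(K key) {
        Weighted<V> w = entries.get(key);
        return w == null ? null : w.value;
    }

    synchronized int size() { return entries.size(); }

    synchronized long weightBytes() { return currentWeightBytes; }

    // Drop least recently used entries until the total weight fits the budget.
    private void evictUntilWithinBudget() {
        Iterator<Map.Entry<K, Weighted<V>>> it = entries.entrySet().iterator();
        while (currentWeightBytes > maxWeightBytes && it.hasNext()) {
            currentWeightBytes -= it.next().getValue().weight;
            it.remove();
        }
    }
}
```

With a 100-byte budget and a weigher of `v -> v.length` over `byte[]` values, inserting entries of 60, 30 and 40 bytes evicts the first entry and leaves two entries totalling 70 bytes, regardless of how many elements the cache was "sized" for.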






[jira] [Updated] (CASSANDRA-18928) Simplify handling of Insufficient replies from Commit and Apply

2023-10-13 Thread Aleksey Yeschenko (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-18928:
--
Change Category: Code Clarity
 Complexity: Normal
  Reviewers: Benedict Elliott Smith
 Status: Open  (was: Triage Needed)

> Simplify handling of Insufficient replies from Commit and Apply
> ---
>
> Key: CASSANDRA-18928
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18928
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Accord
>Reporter: Aleksey Yeschenko
>Assignee: Aleksey Yeschenko
>Priority: Normal
>
> Remove the use of Defer for Commit, and reply with Maximal Apply to 
> Insufficient Apply responses






[jira] [Created] (CASSANDRA-18928) Simplify handling of Insufficient replies from Commit and Apply

2023-10-13 Thread Aleksey Yeschenko (Jira)
Aleksey Yeschenko created CASSANDRA-18928:
-

 Summary: Simplify handling of Insufficient replies from Commit and 
Apply
 Key: CASSANDRA-18928
 URL: https://issues.apache.org/jira/browse/CASSANDRA-18928
 Project: Cassandra
  Issue Type: Improvement
  Components: Accord
Reporter: Aleksey Yeschenko
Assignee: Aleksey Yeschenko


Remove the use of Defer for Commit, and reply with Maximal Apply to 
Insufficient Apply responses






[jira] [Updated] (CASSANDRA-18710) Test failure: org.apache.cassandra.io.DiskSpaceMetricsTest.testFlushSize-.jdk17 (from org.apache.cassandra.io.DiskSpaceMetricsTest-.jdk17)

2023-10-13 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-18710:
-
Attachment: org.apache.cassandra.io.DiskSpaceMetricsTest.txt

> Test failure: 
> org.apache.cassandra.io.DiskSpaceMetricsTest.testFlushSize-.jdk17 (from 
> org.apache.cassandra.io.DiskSpaceMetricsTest-.jdk17)
> --
>
> Key: CASSANDRA-18710
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18710
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: Ekaterina Dimitrova
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 5.0.x, 5.x
>
> Attachments: org.apache.cassandra.io.DiskSpaceMetricsTest.txt
>
>
> Seen here:
> [https://ci-cassandra.apache.org/job/Cassandra-trunk/1644/testReport/org.apache.cassandra.io/DiskSpaceMetricsTest/testFlushSize__jdk17/]
> h3.  
> {code:java}
> Error Message
> expected:<7200.0> but was:<1367.83970468544>
> Stacktrace
> junit.framework.AssertionFailedError: expected:<7200.0> but 
> was:<1367.83970468544> at 
> org.apache.cassandra.io.DiskSpaceMetricsTest.testFlushSize(DiskSpaceMetricsTest.java:119)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {code}
>  






[jira] [Comment Edited] (CASSANDRA-18710) Test failure: org.apache.cassandra.io.DiskSpaceMetricsTest.testFlushSize-.jdk17 (from org.apache.cassandra.io.DiskSpaceMetricsTest-.jdk17)

2023-10-13 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17774918#comment-17774918
 ] 

Brandon Williams edited comment on CASSANDRA-18710 at 10/13/23 2:06 PM:


Running with [this 
patch|https://github.com/driftx/cassandra/commit/286bb5cd7c36c62e541cc79b025931215a982bc3],
 I've managed to reproduce, and it indicates the culprit sstable:

bq. [junit-timeout] INFO  [main] 2023-10-12 21:55:12,181 
DiskSpaceMetricsTest.java:125 - smallest sstable is 
/tmp/cassandra/build/test/cassandra/data/cql_test_keyspace_alt/table_01-03e61210694a11eeb4091bdb4ac3170b/nc-1-big-Data.db
 at 2329 bytes

If we grep the log for that sstable:
{quote}
[junit-timeout] INFO  [PerDiskMemtableFlushWriter_0:2] 2023-10-12 21:55:11,128 
Flushing.java:180 - Completed flushing 
/tmp/cassandra/build/test/cassandra/data/cql_test_keyspace_alt/table_01-03e61210694a11eeb4091bdb4ac3170b/nc-1-big-Data.db
 (6.839KiB) for commitlog position CommitLogPosition(segmentId=1697147706890, 
position=211)
[junit-timeout] DEBUG [MemtableFlushWriter:2] 2023-10-12 21:55:11,177 
ColumnFamilyStore.java:1345 - Flushed to 
[BigTableReader:big(path='/tmp/cassandra/build/test/cassandra/data/cql_test_keyspace_alt/table_01-03e61210694a11eeb4091bdb4ac3170b/nc-1-big-Data.db')]
 (1 sstables, 11.232KiB), biggest 11.232KiB, smallest 11.232KiB
[junit-timeout] INFO  [main] 2023-10-12 21:55:12,181 
DiskSpaceMetricsTest.java:125 - smallest sstable is 
/tmp/cassandra/build/test/cassandra/data/cql_test_keyspace_alt/table_01-03e61210694a11eeb4091bdb4ac3170b/nc-1-big-Data.db
 at 2329 bytes
{quote}

It looks like SSTR.onDiskLength() and bytesOnDisk() disagree at some point, 
which seems like a bug.  [~blambov] can you take a look? I've uploaded the full 
log from the failure.


was (Author: brandon.williams):
Running with [this 
patch|https://github.com/driftx/cassandra/commit/286bb5cd7c36c62e541cc79b025931215a982bc3],
 I've managed to reproduce, and it indicates the culprit sstable:

bq. [junit-timeout] INFO  [main] 2023-10-12 21:55:12,181 
DiskSpaceMetricsTest.java:125 - smallest sstable is 
/tmp/cassandra/build/test/cassandra/data/cql_test_keyspace_alt/table_01-03e61210694a11eeb4091bdb4ac3170b/nc-1-big-Data.db
 at 2329 bytes

If we grep the log for that sstable:
{quote}
[junit-timeout] INFO  [PerDiskMemtableFlushWriter_0:2] 2023-10-12 21:55:11,128 
Flushing.java:180 - Completed flushing 
/tmp/cassandra/build/test/cassandra/data/cql_test_keyspace_alt/table_01-03e61210694a11eeb4091bdb4ac3170b/nc-1-big-Data.db
 (6.839KiB) for commitlog position CommitLogPosition(segmentId=1697147706890, 
position=211)
[junit-timeout] DEBUG [MemtableFlushWriter:2] 2023-10-12 21:55:11,177 
ColumnFamilyStore.java:1345 - Flushed to 
[BigTableReader:big(path='/tmp/cassandra/build/test/cassandra/data/cql_test_keyspace_alt/table_01-03e61210694a11eeb4091bdb4ac3170b/nc-1-big-Data.db')]
 (1 sstables, 11.232KiB), biggest 11.232KiB, smallest 11.232KiB
[junit-timeout] INFO  [main] 2023-10-12 21:55:12,181 
DiskSpaceMetricsTest.java:125 - smallest sstable is 
/tmp/cassandra/build/test/cassandra/data/cql_test_keyspace_alt/table_01-03e61210694a11eeb4091bdb4ac3170b/nc-1-big-Data.db
 at 2329 bytes
{quote}

It looks like SSTR.onDiskLength() and bytesOnDisk() disagree at some point, 
which seems like a bug.  [~blambov] can you take a look?






[jira] [Commented] (CASSANDRA-18710) Test failure: org.apache.cassandra.io.DiskSpaceMetricsTest.testFlushSize-.jdk17 (from org.apache.cassandra.io.DiskSpaceMetricsTest-.jdk17)

2023-10-13 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17774918#comment-17774918
 ] 

Brandon Williams commented on CASSANDRA-18710:
--

Running with [this 
patch|https://github.com/driftx/cassandra/commit/286bb5cd7c36c62e541cc79b025931215a982bc3],
 I've managed to reproduce, and it indicates the culprit sstable:

bq. [junit-timeout] INFO  [main] 2023-10-12 21:55:12,181 
DiskSpaceMetricsTest.java:125 - smallest sstable is 
/tmp/cassandra/build/test/cassandra/data/cql_test_keyspace_alt/table_01-03e61210694a11eeb4091bdb4ac3170b/nc-1-big-Data.db
 at 2329 bytes

If we grep the log for that sstable:
{quote}
[junit-timeout] INFO  [PerDiskMemtableFlushWriter_0:2] 2023-10-12 21:55:11,128 
Flushing.java:180 - Completed flushing 
/tmp/cassandra/build/test/cassandra/data/cql_test_keyspace_alt/table_01-03e61210694a11eeb4091bdb4ac3170b/nc-1-big-Data.db
 (6.839KiB) for commitlog position CommitLogPosition(segmentId=1697147706890, 
position=211)
[junit-timeout] DEBUG [MemtableFlushWriter:2] 2023-10-12 21:55:11,177 
ColumnFamilyStore.java:1345 - Flushed to 
[BigTableReader:big(path='/tmp/cassandra/build/test/cassandra/data/cql_test_keyspace_alt/table_01-03e61210694a11eeb4091bdb4ac3170b/nc-1-big-Data.db')]
 (1 sstables, 11.232KiB), biggest 11.232KiB, smallest 11.232KiB
[junit-timeout] INFO  [main] 2023-10-12 21:55:12,181 
DiskSpaceMetricsTest.java:125 - smallest sstable is 
/tmp/cassandra/build/test/cassandra/data/cql_test_keyspace_alt/table_01-03e61210694a11eeb4091bdb4ac3170b/nc-1-big-Data.db
 at 2329 bytes
{quote}

It looks like SSTR.onDiskLength() and bytesOnDisk() disagree at some point, 
which seems like a bug.  [~blambov] can you take a look?







[cassandra-website] branch asf-staging updated (209ffb5f -> 28f6b431)

2023-10-13 Thread git-site-role
This is an automated email from the ASF dual-hosted git repository.

git-site-role pushed a change to branch asf-staging
in repository https://gitbox.apache.org/repos/asf/cassandra-website.git


 discard 209ffb5f generate docs for db70fb96
 new 28f6b431 generate docs for db70fb96

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (209ffb5f)
            \
             N -- N -- N   refs/heads/asf-staging (28f6b431)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 site-ui/build/ui-bundle.zip | Bin 4881412 -> 4881412 bytes
 1 file changed, 0 insertions(+), 0 deletions(-)





[jira] [Commented] (CASSANDRA-18798) Appending to list in Accord transactions uses insertion timestamp

2023-10-13 Thread Jacek Lewandowski (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17774843#comment-17774843
 ] 

Jacek Lewandowski commented on CASSANDRA-18798:
---

[~henrik.ingo] I think you are focused on timestamps, but timestamps are not the 
problem that causes the incorrect order of items in the list. It is the cell path 
content, which is populated with a timeuuid collected too early. Therefore, I'm 
afraid that manipulating timestamps will give us nothing - the cell path needs to 
be populated at application time.
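
A minimal sketch of why the cell path, rather than the write timestamp, decides list order. This is not Cassandra or Accord code: the class and method names are hypothetical, a plain `long` stands in for the timeuuid cell path, and the path numbers are made up. The values 455 and 553 are borrowed from the excerpt in the issue description only to mirror the pattern seen there.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical illustration: list elements are kept sorted by cell path,
// so the moment the path is generated decides where an append lands.
final class ListCellPathSketch {
    // Applies appends in the given order and records what a reader would
    // see after each one. paths[i] is the stand-in cell path of values[i].
    static List<List<Integer>> applyAppends(long[] paths, int[] values) {
        TreeMap<Long, Integer> cells = new TreeMap<>(); // ordered by cell path
        List<List<Integer>> snapshots = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            cells.put(paths[i], values[i]);
            snapshots.add(new ArrayList<>(cells.values()));
        }
        return snapshots;
    }

    public static void main(String[] args) {
        int[] appliedInOrder = {455, 553};

        // Paths collected early: 553 obtained its path (10) before 455 did (20),
        // even though 553 is applied second.
        long[] earlyPaths = {20L, 10L};
        System.out.println(applyAppends(earlyPaths, appliedInOrder));
        // prints [[455], [553, 455]]

        // Paths generated at application time follow the apply order.
        long[] applyTimePaths = {1L, 2L};
        System.out.println(applyAppends(applyTimePaths, appliedInOrder));
        // prints [[455], [455, 553]]
    }
}
```

In the first run a reader sees `455` at the end of the list and later sees `553` inserted in front of it, so the list no longer behaves as append-only. Changing the write timestamp alone would not alter this, because the position comes from the path.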

> Appending to list in Accord transactions uses insertion timestamp
> -
>
> Key: CASSANDRA-18798
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18798
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Jaroslaw Kijanowski
>Assignee: Henrik Ingo
>Priority: Normal
> Fix For: 5.0-alpha2
>
> Attachments: image-2023-09-26-20-05-25-846.png
>
>
> Given the following schema:
> {code:java}
> CREATE KEYSPACE IF NOT EXISTS accord WITH replication = {'class': 
> 'SimpleStrategy', 'replication_factor': 3};
> CREATE TABLE IF NOT EXISTS accord.list_append(id int PRIMARY KEY,contents 
> LIST);
> TRUNCATE accord.list_append;{code}
> And the following two possible queries executed by 10 threads in parallel:
> {code:java}
> BEGIN TRANSACTION
>   LET row = (SELECT * FROM list_append WHERE id = ?);
>   SELECT row.contents;
> COMMIT TRANSACTION;
> BEGIN TRANSACTION
>   UPDATE list_append SET contents += ? WHERE id = ?;
> COMMIT TRANSACTION;
> {code}
> there seems to be an issue with transaction guarantees. Here's an excerpt in 
> the edn format from a test.
> {code:java}
> {:type :invoke    :process 8    :value [[:append 5 352]]    :tid 3    :n 52   
>  :time 1692607285967116627}
> {:type :invoke    :process 9    :value [[:r 5 nil]]    :tid 1    :n 54    
> :time 1692607286078732473}
> {:type :invoke    :process 6    :value [[:append 5 553]]    :tid 5    :n 53   
>  :time 1692607286133833428}
> {:type :invoke    :process 7    :value [[:append 5 455]]    :tid 4    :n 55   
>  :time 1692607286149702511}
> {:type :ok    :process 8    :value [[:append 5 352]]    :tid 3    :n 52    
> :time 1692607286156314099}
> {:type :invoke    :process 5    :value [[:r 5 nil]]    :tid 9    :n 52    
> :time 1692607286167090389}
> {:type :ok    :process 9    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352]]]    :tid 1    :n 54    :time 1692607286168657534}
> {:type :invoke    :process 1    :value [[:r 5 nil]]    :tid 0    :n 51    
> :time 1692607286201762938}
> {:type :ok    :process 7    :value [[:append 5 455]]    :tid 4    :n 55    
> :time 1692607286245571513}
> {:type :invoke    :process 7    :value [[:r 5 nil]]    :tid 4    :n 56    
> :time 1692607286245655775}
> {:type :ok    :process 5    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 455]]]    :tid 9    :n 52    :time 1692607286253928906}
> {:type :invoke    :process 5    :value [[:r 5 nil]]    :tid 9    :n 53    
> :time 1692607286254095215}
> {:type :ok    :process 6    :value [[:append 5 553]]    :tid 5    :n 53    
> :time 1692607286266263422}
> {:type :ok    :process 1    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 553 455]]]    :tid 0    :n 51    :time 1692607286271617955}
> {:type :ok    :process 7    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 553 455]]]    :tid 4    :n 56    :time 1692607286271816933}
> {:type :ok    :process 5    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 553 455]]]    :tid 9    :n 53    :time 1692607286281483026}
> {:type :invoke    :process 9    :value [[:r 5 nil]]    :tid 1    :n 56    
> :time 1692607286284097561}
> {:type :ok    :process 9    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 553 455]]]    :tid 1    :n 56  

[jira] [Commented] (CASSANDRA-18747) Test failure: Fix assertion error AssertionError: Unknown keyspace system_auth\n\tat org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:324)\n\tat org.apache.ca

2023-10-13 Thread Jacek Lewandowski (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17774836#comment-17774836
 ] 

Jacek Lewandowski commented on CASSANDRA-18747:
---

Some methods stayed there as they were previously.

> Test failure: Fix assertion error AssertionError: Unknown keyspace 
> system_auth\n\tat 
> org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:324)\n\tat 
> org.apache.cassandra.db.Keyspace.lambda$open$0(Keyspace.java:162)
> ---
>
> Key: CASSANDRA-18747
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18747
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema, Test/dtest/python
>Reporter: Ekaterina Dimitrova
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> I've been seeing this assertion error in different tests lately.
> Full error message:
> {code:java}
> failed on teardown with "Unexpected error found in node logs (see stdout for 
> full details). Errors: [[node2] 'ERROR [PendingRangeCalculator:1] 2023-08-11 
> 16:35:14,445 JVMStabilityInspector.java:70 - Exception in thread 
> Thread[PendingRangeCalculator:1,5,PendingRangeCalculator]\njava.lang.AssertionError:
>  Unknown keyspace system_auth\n\tat 
> org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:324)\n\tat 
> org.apache.cassandra.db.Keyspace.lambda$open$0(Keyspace.java:162)\n\tat 
> org.apache.cassandra.utils.concurrent.LoadingMap.blockingLoadIfAbsent(LoadingMap.java:105)\n\tat
>  
> org.apache.cassandra.schema.Schema.maybeAddKeyspaceInstance(Schema.java:251)\n\tat
>  org.apache.cassandra.db.Keyspace.open(Keyspace.java:162)\n\tat 
> org.apache.cassandra.db.Keyspace.open(Keyspace.java:151)\n\tat 
> org.apache.cassandra.service.PendingRangeCalculatorService.lambda$new$1(PendingRangeCalculatorService.java:58)\n\tat
>  
> org.apache.cassandra.concurrent.SingleThreadExecutorPlus$AtLeastOnce.run(SingleThreadExecutorPlus.java:60)\n\tat
>  
> org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)\n\tat
>  
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat
>  
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat
>  
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat
>  java.base/java.lang.Thread.run(Thread.java:829)']" Unexpected error found in 
> node logs (see stdout for full details). Errors: [[node2] 'ERROR 
> [PendingRangeCalculator:1] 2023-08-11 16:35:14,445 
> JVMStabilityInspector.java:70 - Exception in thread 
> Thread[PendingRangeCalculator:1,5,PendingRangeCalculator]\njava.lang.AssertionError:
>  Unknown keyspace system_auth\n\tat 
> org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:324)\n\tat 
> org.apache.cassandra.db.Keyspace.lambda$open$0(Keyspace.java:162)\n\tat 
> org.apache.cassandra.utils.concurrent.LoadingMap.blockingLoadIfAbsent(LoadingMap.java:105)\n\tat
>  
> org.apache.cassandra.schema.Schema.maybeAddKeyspaceInstance(Schema.java:251)\n\tat
>  org.apache.cassandra.db.Keyspace.open(Keyspace.java:162)\n\tat 
> org.apache.cassandra.db.Keyspace.open(Keyspace.java:151)\n\tat 
> org.apache.cassandra.service.PendingRangeCalculatorService.lambda$new$1(PendingRangeCalculatorService.java:58)\n\tat
>  
> org.apache.cassandra.concurrent.SingleThreadExecutorPlus$AtLeastOnce.run(SingleThreadExecutorPlus.java:60)\n\tat
>  
> org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)\n\tat
>  
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat
>  
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat
>  
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat
>  java.base/java.lang.Thread.run(Thread.java:829)']{code}
> Example failures:
> test_failed_snitch_update_property_file_snitch - 
> [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2475/workflows/2086619e-0f21-464b-a866-84aca516b5e5/jobs/36716/tests]
> test_gcgs_validation - 
> [https://ci-cassandra.apache.org/job/Cassandra-trunk/1666/testReport/junit/dtest.materialized_views_test/TestMaterializedViews/test_gcgs_validation/]






[jira] [Commented] (CASSANDRA-18747) Test failure: Fix assertion error AssertionError: Unknown keyspace system_auth\n\tat org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:324)\n\tat org.apache.ca

2023-10-13 Thread Jacek Lewandowski (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17774834#comment-17774834
 ] 

Jacek Lewandowski commented on CASSANDRA-18747:
---

Regarding the first comment, you are probably right; a fresh look was indeed 
needed.

Regarding the second question: if you mean local versus distributed keyspaces, 
it is because local keyspaces are not synchronized across the cluster.





[jira] [Comment Edited] (CASSANDRA-18747) Test failure: Fix assertion error AssertionError: Unknown keyspace system_auth\n\tat org.apache.cassandra.db.Keyspace.(Keyspace.java:324)\n\tat org.apac

2023-10-13 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17774832#comment-17774832
 ] 

Benjamin Lerer edited comment on CASSANDRA-18747 at 10/13/23 9:08 AM:
--

I looked at the code of 4.0 and 4.1 and, thinking a bit more about it, I do not 
understand why the keyspaces were split into several groups.
Some methods also look wrong. There seems to be some confusion between what are 
called distributed keyspaces, non-system keyspaces and local keyspaces.
It feels to me that we should revisit that code more carefully. 


was (Author: blerer):
I looked at the code of 4.0 and 4.1 and thinking a bit more about it, I do not 
understand why the keyspaces were split into several groups.
Some methods look also wrong. There seems to be some confusions between what is 
called distributed keyspaces, non-system keyspaces and local keyspaces.
It feels to me that we should revisit that code more carefully. 

> Test failure: Fix assertion error AssertionError: Unknown keyspace 
> system_auth\n\tat 
> org.apache.cassandra.db.Keyspace.(Keyspace.java:324)\n\tat 
> org.apache.cassandra.db.Keyspace.lambda$open$0(Keyspace.java:162)
> ---
>
> Key: CASSANDRA-18747
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18747
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema, Test/dtest/python
>Reporter: Ekaterina Dimitrova
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> I've been seeing this assertion error in different tests lately.
> Full error message:
> {code:java}
> failed on teardown with "Unexpected error found in node logs (see stdout for 
> full details). Errors: [[node2] 'ERROR [PendingRangeCalculator:1] 2023-08-11 
> 16:35:14,445 JVMStabilityInspector.java:70 - Exception in thread 
> Thread[PendingRangeCalculator:1,5,PendingRangeCalculator]\njava.lang.AssertionError:
>  Unknown keyspace system_auth\n\tat 
> org.apache.cassandra.db.Keyspace.(Keyspace.java:324)\n\tat 
> org.apache.cassandra.db.Keyspace.lambda$open$0(Keyspace.java:162)\n\tat 
> org.apache.cassandra.utils.concurrent.LoadingMap.blockingLoadIfAbsent(LoadingMap.java:105)\n\tat
>  
> org.apache.cassandra.schema.Schema.maybeAddKeyspaceInstance(Schema.java:251)\n\tat
>  org.apache.cassandra.db.Keyspace.open(Keyspace.java:162)\n\tat 
> org.apache.cassandra.db.Keyspace.open(Keyspace.java:151)\n\tat 
> org.apache.cassandra.service.PendingRangeCalculatorService.lambda$new$1(PendingRangeCalculatorService.java:58)\n\tat
>  
> org.apache.cassandra.concurrent.SingleThreadExecutorPlus$AtLeastOnce.run(SingleThreadExecutorPlus.java:60)\n\tat
>  
> org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)\n\tat
>  
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat
>  
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat
>  
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat
>  java.base/java.lang.Thread.run(Thread.java:829)']" Unexpected error found in 
> node logs (see stdout for full details). Errors: [[node2] 'ERROR 
> [PendingRangeCalculator:1] 2023-08-11 16:35:14,445 
> JVMStabilityInspector.java:70 - Exception in thread 
> Thread[PendingRangeCalculator:1,5,PendingRangeCalculator]\njava.lang.AssertionError:
>  Unknown keyspace system_auth\n\tat 
> org.apache.cassandra.db.Keyspace.(Keyspace.java:324)\n\tat 
> org.apache.cassandra.db.Keyspace.lambda$open$0(Keyspace.java:162)\n\tat 
> org.apache.cassandra.utils.concurrent.LoadingMap.blockingLoadIfAbsent(LoadingMap.java:105)\n\tat
>  
> org.apache.cassandra.schema.Schema.maybeAddKeyspaceInstance(Schema.java:251)\n\tat
>  org.apache.cassandra.db.Keyspace.open(Keyspace.java:162)\n\tat 
> org.apache.cassandra.db.Keyspace.open(Keyspace.java:151)\n\tat 
> org.apache.cassandra.service.PendingRangeCalculatorService.lambda$new$1(PendingRangeCalculatorService.java:58)\n\tat
>  
> org.apache.cassandra.concurrent.SingleThreadExecutorPlus$AtLeastOnce.run(SingleThreadExecutorPlus.java:60)\n\tat
>  
> org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)\n\tat
>  
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat
>  
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat
>  
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat
>  

[jira] [Comment Edited] (CASSANDRA-18747) Test failure: Fix assertion error AssertionError: Unknown keyspace system_auth\n\tat org.apache.cassandra.db.Keyspace.(Keyspace.java:324)\n\tat org.apac

2023-10-13 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17774832#comment-17774832
 ] 

Benjamin Lerer edited comment on CASSANDRA-18747 at 10/13/23 9:08 AM:
--

I looked at the code of 4.0 and 4.1 and, thinking a bit more about it, I do not 
understand why the keyspaces were split into several groups.
Some methods also look wrong. There seems to be some confusion between what are 
called distributed keyspaces, non-system keyspaces and local keyspaces.
It feels to me that we should revisit that code more carefully. 


was (Author: blerer):
I looked at the code of 4.0 and 4.1 and thinking a bit more about it, I do not 
understand why the keyspaces were split into several groups. 


[jira] [Commented] (CASSANDRA-18747) Test failure: Fix assertion error AssertionError: Unknown keyspace system_auth\n\tat org.apache.cassandra.db.Keyspace.(Keyspace.java:324)\n\tat org.apache.ca

2023-10-13 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17774832#comment-17774832
 ] 

Benjamin Lerer commented on CASSANDRA-18747:


I looked at the code of 4.0 and 4.1 and, thinking a bit more about it, I do not 
understand why the keyspaces were split into several groups. 





[jira] [Updated] (CASSANDRA-18747) Test failure: Fix assertion error AssertionError: Unknown keyspace system_auth\n\tat org.apache.cassandra.db.Keyspace.(Keyspace.java:324)\n\tat org.apache.cass

2023-10-13 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-18747:
---
Status: Changes Suggested  (was: Ready to Commit)







[jira] [Commented] (CASSANDRA-18747) Test failure: Fix assertion error AssertionError: Unknown keyspace system_auth\n\tat org.apache.cassandra.db.Keyspace.(Keyspace.java:324)\n\tat org.apache.ca

2023-10-13 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17774828#comment-17774828
 ] 

Benjamin Lerer commented on CASSANDRA-18747:


It seems to me that the proposed solution is going in the opposite direction 
from where it should go. The issue mainly comes from the fact that we have 
duplicated some information in multithreaded code. Rather than making that 
logic more complex, we should simplify it and remove the duplication. Looking at 
where those 2 variables are used and how they get used, I really do not see the 
need for the {{distributedAndLocalKeyspaces}} variable. Am I missing something?
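
To illustrate the kind of simplification suggested above, here is a minimal 
sketch. It is not the Cassandra Schema code: the class name 
KeyspaceRegistrySketch, its methods and the keyspace names used with it are 
all hypothetical. It assumes the local keyspaces are fixed at startup and 
that only the distributed ones change at runtime. The combined view is 
computed on demand from a single snapshot, so there is no second field that 
another thread can observe out of step with the first.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical registry for illustration only; NOT Cassandra's Schema class.
final class KeyspaceRegistrySketch
{
    // Assumption: local keyspaces never change after construction.
    private final Set<String> localKeyspaces;

    // Single source of truth for the keyspaces that can change at runtime.
    private final AtomicReference<Set<String>> distributedKeyspaces =
        new AtomicReference<>(Set.of());

    KeyspaceRegistrySketch(Set<String> localKeyspaces)
    {
        this.localKeyspaces = Set.copyOf(localKeyspaces);
    }

    // Publishes a new immutable snapshot with one atomic reference swap.
    void setDistributed(Set<String> updated)
    {
        distributedKeyspaces.set(Set.copyOf(updated));
    }

    // Derived on demand from one snapshot instead of being stored separately.
    Set<String> distributedAndLocal()
    {
        Set<String> all = new LinkedHashSet<>(distributedKeyspaces.get());
        all.addAll(localKeyspaces);
        return all;
    }

    boolean isKnown(String keyspace)
    {
        return localKeyspaces.contains(keyspace)
               || distributedKeyspaces.get().contains(keyspace);
    }
}
```

The trade-off in this sketch is a small allocation on every call to 
distributedAndLocal() in exchange for having only one mutable reference to 
reason about. Whether that cost is acceptable on the real code paths would 
need to be checked against how often the combined view is read.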


[jira] [Updated] (CASSANDRA-18924) TCM: Allow unknown nodes during discovery

2023-10-13 Thread Alex Petrov (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-18924:

Test and Documentation Plan: Includes a test
 Status: Patch Available  (was: Open)

Patch: https://github.com/apache/cassandra/pull/2803

> TCM: Allow unknown nodes during discovery
> -
>
> Key: CASSANDRA-18924
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18924
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: High
>
>   * avoid discovered.addAll(DatabaseDescriptor.getSeeds()) when starting 
> discovery to exclude them from the final result
>   * add responded node to discovered set, even if it responds with an 
> empty set
>   * Implement a simple simulation for discovery that does not involve 
> setting up entire clusters
>   * Allow _any_ seed to start up first
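
The bullets above can be sketched as a small in-memory simulation that needs 
no cluster. This is only one possible reading of the ticket, not the code 
from the linked pull request. The class DiscoverySimulationSketch, the 
discover method and the use of a Map as a stand-in for the network are all 
hypothetical. It assumes that a node absent from the map never responds and 
that a reported peer only counts as discovered once it responds itself.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical simulation for illustration only; not the Cassandra discovery code.
final class DiscoverySimulationSketch
{
    /**
     * @param seeds     nodes to contact first; they are NOT pre-added to the result
     * @param responses simulated network: node -> peers it reports;
     *                  a node missing from the map never responds
     */
    static Set<String> discover(List<String> seeds, Map<String, Set<String>> responses)
    {
        Set<String> discovered = new LinkedHashSet<>();
        Set<String> contacted = new LinkedHashSet<>();
        Deque<String> toContact = new ArrayDeque<>(seeds);

        while (!toContact.isEmpty())
        {
            String node = toContact.poll();
            if (!contacted.add(node))
                continue; // this node was already asked

            Set<String> peers = responses.get(node);
            if (peers == null)
                continue; // no response, so the node stays out of the result

            // A responder counts as discovered even if it reports an empty set.
            discovered.add(node);
            for (String peer : peers)
                if (!contacted.contains(peer))
                    toContact.add(peer);
        }
        return discovered;
    }
}
```

With seeds seed1 and seed2, where seed1 reports n3, n3 reports nothing and 
seed2 never answers, the result contains seed1 and n3 only. Because no seed 
is treated specially, any seed that responds first is enough to start the 
walk.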






[jira] [Updated] (CASSANDRA-18924) TCM: Allow unknown nodes during discovery

2023-10-13 Thread Alex Petrov (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-18924:

Change Category: Operability
 Complexity: Normal
   Priority: High  (was: Normal)
 Status: Open  (was: Triage Needed)







[jira] [Updated] (CASSANDRA-18866) Node sends multiple inflight echos

2023-10-13 Thread Stefan Miklosovic (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Miklosovic updated CASSANDRA-18866:
--
Status: Needs Committer  (was: Review In Progress)

> Node sends multiple inflight echos
> --
>
> Key: CASSANDRA-18866
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18866
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Cluster/Gossip
>Reporter: Cameron Zemek
>Assignee: Cameron Zemek
>Priority: Normal
> Attachments: 18866-regression.patch, duplicates.log, echo.log
>
>
> CASSANDRA-18854 rolled back the changes from CASSANDRA-18845. In particular, 
> 18845 had a change to only allow 1 inflight ECHO request at a time. As per 
> 18854, some tests have an error rate due to this change. Creating this ticket 
> to discuss this further. The current state also does not have retry logic; 
> it just allows multiple ECHO requests inflight at the same time, so it is less 
> likely that all ECHOs will time out or get lost.
> With the change from 18845, and adding in some extra logging to track what is 
> going on, I do see it retrying ECHOs. Likewise, I patched a node to drop ECHO 
> requests from a node and also see it retrying ECHOs when it doesn't get a 
> reply.
> Therefore, I think the problem is more specific than the dropping of one ECHO 
> request. Yes, there is no retry logic for failed ECHO requests, but this is 
> the case both before and after 18845. ECHO requests are only sent via gossip 
> verb handlers calling applyStateLocally. In these failed tests I am therefore 
> assuming there are cases where it won't call markAlive when other nodes 
> consider the node UP but it is marked DOWN by a node.
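
As a way to frame the discussion, the sketch below shows one possible shape 
for "at most one ECHO in flight per endpoint, with a resend once the previous 
one is presumed lost". It is not the patch attached to this ticket and not 
the Gossiper code. The class EchoTrackerSketch, its method names and the 
timeout parameter are hypothetical, and the caller is assumed to supply a 
monotonic clock value such as System.nanoTime().

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical tracker for illustration only; not Cassandra's Gossiper.
final class EchoTrackerSketch
{
    private final long timeoutNanos;

    // endpoint -> time (in nanos) at which the in-flight ECHO was sent
    private final Map<String, Long> inflight = new ConcurrentHashMap<>();

    EchoTrackerSketch(long timeoutNanos)
    {
        this.timeoutNanos = timeoutNanos;
    }

    /** Returns true if the caller should send an ECHO to the endpoint now. */
    boolean trySend(String endpoint, long nowNanos)
    {
        boolean[] send = { false };
        // compute() runs atomically per key, so two threads cannot both decide to send.
        inflight.compute(endpoint, (ep, sentAt) -> {
            if (sentAt == null || nowNanos - sentAt >= timeoutNanos)
            {
                send[0] = true; // nothing in flight, or the previous ECHO is presumed lost
                return nowNanos;
            }
            return sentAt; // an ECHO is still in flight, so no duplicate is sent
        });
        return send[0];
    }

    /** Called when the ECHO reply arrives, i.e. where markAlive would run. */
    void onReply(String endpoint)
    {
        inflight.remove(endpoint);
    }
}
```

In this sketch a resend still depends on something calling trySend again 
after the timeout. If the only callers are gossip verb handlers, as described 
above, a lost ECHO is retried only when another gossip round touches the same 
endpoint.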






[jira] [Commented] (CASSANDRA-18866) Node sends multiple inflight echos

2023-10-13 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17774815#comment-17774815
 ] 

Stefan Miklosovic commented on CASSANDRA-18866:
---

[~brandon.williams] do you remember CASSANDRA-18854 / CASSANDRA-18543 where we 
reverted the logic around missed echo messages? This one fixes it. Repeated 
tests seem to be stable; in some cases (1% of cases) it can be seen resending 
the echo request when it is lost. Do you think this is something you could take 
a look at?  

[trunk 
j17|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3337/workflows/f44ee4a8-03fb-488f-ba54-43306bdc86d0]
[trunk 
j11|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3337/workflows/459e6b55-3d43-4ee8-85cd-b308e8797e51]
[5.0 
j17|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3336/workflows/50a0bc41-b800-478d-b7e1-38cc73c16f84]
[5.0 
j11|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3336/workflows/d92a1a77-1f81-4210-a14a-da61fb18e1dd]
[4.1 
j11|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3335/workflows/dc9128f2-94f0-4651-afc8-df4a2db53a9b]
[4.1 
j8|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3335/workflows/8205db12-78f4-429a-81b8-508ba83b98bb]
[4.0 
j11|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3326/workflows/86b807f3-801f-47a0-925b-9ca49eb76d97]
[4.0 
j8|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3326/workflows/0af2bd74-55fc-4d3f-a4e0-2a4c1461ed90]
[3.11 
j8|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3325/workflows/ff2b2562-c238-49d2-abb2-3457acb9618d]







[jira] [Comment Edited] (CASSANDRA-18866) Node sends multiple inflight echos

2023-10-13 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17774815#comment-17774815
 ] 

Stefan Miklosovic edited comment on CASSANDRA-18866 at 10/13/23 7:44 AM:
-

[~brandon.williams] do you remember CASSANDRA-18854 / CASSANDRA-18543, where we 
reverted the logic around missed echo messages? This one fixes it. Repeated 
test runs seem to be stable; in some cases (in 1% of cases) it resends the echo 
request when it is lost. Do you think this is something you could take a 
look at?  

[trunk 
j17|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3337/workflows/f44ee4a8-03fb-488f-ba54-43306bdc86d0]
[trunk 
j11|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3337/workflows/459e6b55-3d43-4ee8-85cd-b308e8797e51]
[5.0 
j17|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3336/workflows/50a0bc41-b800-478d-b7e1-38cc73c16f84]
[5.0 
j11|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3336/workflows/d92a1a77-1f81-4210-a14a-da61fb18e1dd]
[4.1 
j11|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3335/workflows/dc9128f2-94f0-4651-afc8-df4a2db53a9b]
[4.1 
j8|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3335/workflows/8205db12-78f4-429a-81b8-508ba83b98bb]
[4.0 
j11|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3326/workflows/86b807f3-801f-47a0-925b-9ca49eb76d97]
[4.0 
j8|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3326/workflows/0af2bd74-55fc-4d3f-a4e0-2a4c1461ed90]
[3.11 j8| 
https://app.circleci.com/pipelines/github/instaclustr/cassandra/3325/workflows/ff2b2562-c238-49d2-abb2-3457acb9618d]



> Node sends multiple inflight echos
> --
>
> Key: CASSANDRA-18866
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18866
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Cluster/Gossip
>Reporter: Cameron Zemek
>Assignee: Cameron Zemek
>Priority: Normal
> Attachments: 18866-regression.patch, duplicates.log, echo.log
>
>
> CASSANDRA-18854 rolled back the changes from CASSANDRA-18845. In particular, 
> 18845 included a change to allow only 1 inflight ECHO request at a time. As 
> per 18854, some tests have an error rate due to this change. I am creating 
> this ticket to discuss this further. The current state also does not have 
> retry logic; it just allows multiple ECHO requests inflight at the same time, 
> so it is less likely that all ECHOs will time out or get lost.
> With the change from 18845 applied, plus some extra logging to track what is 
> going on, I do see it retrying ECHOs. Likewise, I patched a node to drop ECHO 
> requests from a node and also see it retrying ECHOs when it doesn't get a 
> reply.
> Therefore, I think the problem is more specific than the dropping of one ECHO 
> request. Yes, there is no retry logic for failed ECHO requests, but this is 
> the case both before and after 18845. ECHO requests are only sent via gossip 
> verb handlers calling applyStateLocally. I am therefore assuming that in 
> these failed tests there are cases where it won't call markAlive when other 
> nodes consider the node UP but it is marked DOWN by a node.


