[jira] [Updated] (CASSANDRA-18929) CEP-15: (C*) Implement TopologySorter to prioritise hosts based on DynamicSnitch and/or topology layout
[ https://issues.apache.org/jira/browse/CASSANDRA-18929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Capwell updated CASSANDRA-18929:
--
Change Category: Performance
Complexity: Normal
Fix Version/s: 5.x
Assignee: David Capwell
Status: Open (was: Triage Needed)

> CEP-15: (C*) Implement TopologySorter to prioritise hosts based on DynamicSnitch and/or topology layout
> ---
>
> Key: CASSANDRA-18929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18929
> Project: Cassandra
> Issue Type: Improvement
> Components: Accord
> Reporter: David Capwell
> Assignee: David Capwell
> Priority: Normal
> Fix For: 5.x
>
> Implement TopologySorter to prioritise hosts based on DynamicSnitch and/or topology layout

--
This message was sent by Atlassian Jira (v8.20.10#820010)
-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-18929) CEP-15: (C*) Implement TopologySorter to prioritise hosts based on DynamicSnitch and/or topology layout
David Capwell created CASSANDRA-18929:
-
Summary: CEP-15: (C*) Implement TopologySorter to prioritise hosts based on DynamicSnitch and/or topology layout
Key: CASSANDRA-18929
URL: https://issues.apache.org/jira/browse/CASSANDRA-18929
Project: Cassandra
Issue Type: Improvement
Components: Accord
Reporter: David Capwell

Implement TopologySorter to prioritise hosts based on DynamicSnitch and/or topology layout
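The ticket gives no design details, so the following is only a minimal sketch of what "prioritise hosts based on DynamicSnitch and/or topology layout" could mean in practice: prefer hosts in the local datacenter, then prefer hosts with a lower latency score. The class `HostSorterSketch`, the `Host` type, and the `latencyScore` field are all hypothetical names made up for illustration; they are not Accord or Cassandra APIs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class HostSorterSketch {
    /** Hypothetical host description; not a Cassandra or Accord type. */
    static final class Host {
        final String name;
        final String datacenter;
        final double latencyScore; // assumption: lower is better, standing in for a snitch-style score

        Host(String name, String datacenter, double latencyScore) {
            this.name = name;
            this.datacenter = datacenter;
            this.latencyScore = latencyScore;
        }
    }

    /** Orders hosts in the local datacenter first, then by ascending latency score. */
    static Comparator<Host> preferLocalThenFast(String localDatacenter) {
        return Comparator
            .comparingInt((Host h) -> h.datacenter.equals(localDatacenter) ? 0 : 1)
            .thenComparingDouble(h -> h.latencyScore);
    }

    /** Returns host names in priority order without modifying the input list. */
    static List<String> sortedNames(List<Host> hosts, String localDatacenter) {
        List<Host> copy = new ArrayList<>(hosts);
        copy.sort(preferLocalThenFast(localDatacenter));
        List<String> names = new ArrayList<>();
        for (Host h : copy) {
            names.add(h.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Host> hosts = List.of(
            new Host("a", "dc2", 0.1),
            new Host("b", "dc1", 0.9),
            new Host("c", "dc1", 0.2));
        // Local datacenter "dc1" wins over the faster remote host "a".
        System.out.println(sortedNames(hosts, "dc1"));
    }
}
```

Putting topology ahead of latency is a design choice of this sketch only; the ticket's "and/or" leaves open whether the two signals are combined or used separately.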
[jira] [Commented] (CASSANDRA-18904) Repair vtable caches consume excessive memory
[ https://issues.apache.org/jira/browse/CASSANDRA-18904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775017#comment-17775017 ]

Abe Ratnofsky commented on CASSANDRA-18904:
---
PRs up:
- trunk: https://github.com/apache/cassandra/pull/2804
- 4.1: https://github.com/apache/cassandra/pull/2805

4.1 differs slightly from trunk due to the exclusion of CASSANDRA-18816, so I'm opening that up in a separate PR. I'll sync up all the branches based on the PR feedback.

> Repair vtable caches consume excessive memory
> -
>
> Key: CASSANDRA-18904
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18904
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Repair, Local/Caching
> Reporter: Abe Ratnofsky
> Assignee: Abe Ratnofsky
> Priority: Normal
>
> Currently, the repair vtables (system_views.{repairs,repair_sessions,repair_jobs,repair_participates,repair_validations}) are backed by caches in ActiveRepairService that are bounded by the number of elements in them, controlled by Config.repair_state_size and Config.repair_state_expires.
>
> The individual cached elements are mutable, and can grow to retain a significant amount of heap as the instance uptime increases and more repairs are run. In a heap dump for a real cluster, I found these caches occupying ~1GB of heap total between ActiveRepairService.repairs and ActiveRepairService.participates. Individual cached elements were reaching 100KB in size, so configuring the caches by number of elements introduces a significant amount of potential variance in the actual heap usage of these caches.
>
> We should measure these caches by the heap they retain, not by the number of elements. Users should not be expected to check heap dumps to calibrate the number of elements they configure the caches to consume - specifying a memory total is much more user-friendly.
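To illustrate the difference between bounding a cache by element count and bounding it by retained size, here is a minimal, self-contained sketch of a weight-bounded LRU cache. It is not the code from the linked PRs; `WeightBoundedCache` and its methods are hypothetical names, and the weigher is a caller-supplied estimate rather than a real heap measurement. The weight is recorded once at insertion, so elements that grow after insertion (as the ticket describes) would need re-weighing, which this sketch does not attempt.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

public class WeightBoundedCache<K, V> {
    /** Pairs a value with the weight recorded when it was inserted. */
    private static final class Weighted<V> {
        final V value;
        final long weight;

        Weighted(V value, long weight) {
            this.value = value;
            this.weight = weight;
        }
    }

    private final long maxWeightBytes;
    private final ToLongFunction<V> weigher; // assumption: caller-supplied size estimate in bytes
    // Access-ordered map: iteration starts at the least recently used entry.
    private final LinkedHashMap<K, Weighted<V>> entries = new LinkedHashMap<>(16, 0.75f, true);
    private long totalWeight;

    public WeightBoundedCache(long maxWeightBytes, ToLongFunction<V> weigher) {
        this.maxWeightBytes = maxWeightBytes;
        this.weigher = weigher;
    }

    public synchronized void put(K key, V value) {
        long weight = weigher.applyAsLong(value);
        Weighted<V> old = entries.put(key, new Weighted<>(value, weight));
        if (old != null) {
            totalWeight -= old.weight;
        }
        totalWeight += weight;
        evictUntilWithinBound();
    }

    public synchronized V get(K key) {
        Weighted<V> entry = entries.get(key);
        return entry == null ? null : entry.value;
    }

    public synchronized long totalWeight() {
        return totalWeight;
    }

    public synchronized int size() {
        return entries.size();
    }

    /** Removes least recently used entries until the total weight fits the bound. */
    private void evictUntilWithinBound() {
        Iterator<Map.Entry<K, Weighted<V>>> it = entries.entrySet().iterator();
        while (totalWeight > maxWeightBytes && it.hasNext()) {
            totalWeight -= it.next().getValue().weight;
            it.remove();
        }
    }

    public static void main(String[] args) {
        WeightBoundedCache<String, byte[]> cache = new WeightBoundedCache<>(100, v -> v.length);
        cache.put("a", new byte[60]);
        cache.put("b", new byte[30]);
        cache.put("c", new byte[50]); // total would be 140, so the oldest entry "a" is evicted
        System.out.println(cache.size() + " entries, " + cache.totalWeight() + " bytes");
    }
}
```

With a count-based bound of three elements, all three entries above would be retained regardless of their size; with the weight bound, the cache's footprint stays under the configured total.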
[jira] [Updated] (CASSANDRA-18928) Simplify handling of Insufficient replies from Commit and Apply
[ https://issues.apache.org/jira/browse/CASSANDRA-18928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Yeschenko updated CASSANDRA-18928:
--
Change Category: Code Clarity
Complexity: Normal
Reviewers: Benedict Elliott Smith
Status: Open (was: Triage Needed)

> Simplify handling of Insufficient replies from Commit and Apply
> ---
>
> Key: CASSANDRA-18928
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18928
> Project: Cassandra
> Issue Type: Improvement
> Components: Accord
> Reporter: Aleksey Yeschenko
> Assignee: Aleksey Yeschenko
> Priority: Normal
>
> Remove the use of Defer for Commit, and reply with Maximal Apply to Insufficient Apply responses
[jira] [Created] (CASSANDRA-18928) Simplify handling of Insufficient replies from Commit and Apply
Aleksey Yeschenko created CASSANDRA-18928:
-
Summary: Simplify handling of Insufficient replies from Commit and Apply
Key: CASSANDRA-18928
URL: https://issues.apache.org/jira/browse/CASSANDRA-18928
Project: Cassandra
Issue Type: Improvement
Components: Accord
Reporter: Aleksey Yeschenko
Assignee: Aleksey Yeschenko

Remove the use of Defer for Commit, and reply with Maximal Apply to Insufficient Apply responses
[jira] [Updated] (CASSANDRA-18710) Test failure: org.apache.cassandra.io.DiskSpaceMetricsTest.testFlushSize-.jdk17 (from org.apache.cassandra.io.DiskSpaceMetricsTest-.jdk17)
[ https://issues.apache.org/jira/browse/CASSANDRA-18710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-18710:
-
Attachment: org.apache.cassandra.io.DiskSpaceMetricsTest.txt

> Test failure: org.apache.cassandra.io.DiskSpaceMetricsTest.testFlushSize-.jdk17 (from org.apache.cassandra.io.DiskSpaceMetricsTest-.jdk17)
> --
>
> Key: CASSANDRA-18710
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18710
> Project: Cassandra
> Issue Type: Bug
> Components: Test/unit
> Reporter: Ekaterina Dimitrova
> Assignee: Brandon Williams
> Priority: Normal
> Fix For: 5.0.x, 5.x
>
> Attachments: org.apache.cassandra.io.DiskSpaceMetricsTest.txt
>
> Seen here:
> [https://ci-cassandra.apache.org/job/Cassandra-trunk/1644/testReport/org.apache.cassandra.io/DiskSpaceMetricsTest/testFlushSize__jdk17/]
> h3.
> {code:java}
> Error Message
> expected:<7200.0> but was:<1367.83970468544>
> Stacktrace
> junit.framework.AssertionFailedError: expected:<7200.0> but was:<1367.83970468544>
> at org.apache.cassandra.io.DiskSpaceMetricsTest.testFlushSize(DiskSpaceMetricsTest.java:119)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {code}
[jira] [Comment Edited] (CASSANDRA-18710) Test failure: org.apache.cassandra.io.DiskSpaceMetricsTest.testFlushSize-.jdk17 (from org.apache.cassandra.io.DiskSpaceMetricsTest-.jdk17)
[ https://issues.apache.org/jira/browse/CASSANDRA-18710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17774918#comment-17774918 ]

Brandon Williams edited comment on CASSANDRA-18710 at 10/13/23 2:06 PM:

Running with [this patch|https://github.com/driftx/cassandra/commit/286bb5cd7c36c62e541cc79b025931215a982bc3], I've managed to reproduce, and it indicates the culprit sstable:

bq. [junit-timeout] INFO [main] 2023-10-12 21:55:12,181 DiskSpaceMetricsTest.java:125 - smallest sstable is /tmp/cassandra/build/test/cassandra/data/cql_test_keyspace_alt/table_01-03e61210694a11eeb4091bdb4ac3170b/nc-1-big-Data.db at 2329 bytes

If we grep the log for that sstable:

{quote}
[junit-timeout] INFO [PerDiskMemtableFlushWriter_0:2] 2023-10-12 21:55:11,128 Flushing.java:180 - Completed flushing /tmp/cassandra/build/test/cassandra/data/cql_test_keyspace_alt/table_01-03e61210694a11eeb4091bdb4ac3170b/nc-1-big-Data.db (6.839KiB) for commitlog position CommitLogPosition(segmentId=1697147706890, position=211)
[junit-timeout] DEBUG [MemtableFlushWriter:2] 2023-10-12 21:55:11,177 ColumnFamilyStore.java:1345 - Flushed to [BigTableReader:big(path='/tmp/cassandra/build/test/cassandra/data/cql_test_keyspace_alt/table_01-03e61210694a11eeb4091bdb4ac3170b/nc-1-big-Data.db')] (1 sstables, 11.232KiB), biggest 11.232KiB, smallest 11.232KiB
[junit-timeout] INFO [main] 2023-10-12 21:55:12,181 DiskSpaceMetricsTest.java:125 - smallest sstable is /tmp/cassandra/build/test/cassandra/data/cql_test_keyspace_alt/table_01-03e61210694a11eeb4091bdb4ac3170b/nc-1-big-Data.db at 2329 bytes
{quote}

It looks like SSTR.onDiskLength() and bytesOnDisk() disagree at some point, which seems like a bug. [~blambov] can you take a look? I've uploaded the full log from the failure.
[jira] [Commented] (CASSANDRA-18710) Test failure: org.apache.cassandra.io.DiskSpaceMetricsTest.testFlushSize-.jdk17 (from org.apache.cassandra.io.DiskSpaceMetricsTest-.jdk17)
[ https://issues.apache.org/jira/browse/CASSANDRA-18710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17774918#comment-17774918 ]

Brandon Williams commented on CASSANDRA-18710:
--
Running with [this patch|https://github.com/driftx/cassandra/commit/286bb5cd7c36c62e541cc79b025931215a982bc3], I've managed to reproduce, and it indicates the culprit sstable:

bq. [junit-timeout] INFO [main] 2023-10-12 21:55:12,181 DiskSpaceMetricsTest.java:125 - smallest sstable is /tmp/cassandra/build/test/cassandra/data/cql_test_keyspace_alt/table_01-03e61210694a11eeb4091bdb4ac3170b/nc-1-big-Data.db at 2329 bytes

If we grep the log for that sstable:

{quote}
[junit-timeout] INFO [PerDiskMemtableFlushWriter_0:2] 2023-10-12 21:55:11,128 Flushing.java:180 - Completed flushing /tmp/cassandra/build/test/cassandra/data/cql_test_keyspace_alt/table_01-03e61210694a11eeb4091bdb4ac3170b/nc-1-big-Data.db (6.839KiB) for commitlog position CommitLogPosition(segmentId=1697147706890, position=211)
[junit-timeout] DEBUG [MemtableFlushWriter:2] 2023-10-12 21:55:11,177 ColumnFamilyStore.java:1345 - Flushed to [BigTableReader:big(path='/tmp/cassandra/build/test/cassandra/data/cql_test_keyspace_alt/table_01-03e61210694a11eeb4091bdb4ac3170b/nc-1-big-Data.db')] (1 sstables, 11.232KiB), biggest 11.232KiB, smallest 11.232KiB
[junit-timeout] INFO [main] 2023-10-12 21:55:12,181 DiskSpaceMetricsTest.java:125 - smallest sstable is /tmp/cassandra/build/test/cassandra/data/cql_test_keyspace_alt/table_01-03e61210694a11eeb4091bdb4ac3170b/nc-1-big-Data.db at 2329 bytes
{quote}

It looks like SSTR.onDiskLength() and bytesOnDisk() disagree at some point, which seems like a bug. [~blambov] can you take a look?
[cassandra-website] branch asf-staging updated (209ffb5f -> 28f6b431)
This is an automated email from the ASF dual-hosted git repository.

git-site-role pushed a change to branch asf-staging
in repository https://gitbox.apache.org/repos/asf/cassandra-website.git

 discard 209ffb5f generate docs for db70fb96
     new 28f6b431 generate docs for db70fb96

This update added new revisions after undoing existing revisions. That is to say, some revisions that were in the old version of the branch are not in the new version. This situation occurs when a user --force pushes a change and generates a repository containing something like this:

 * -- * -- B -- O -- O -- O   (209ffb5f)
            \
             N -- N -- N   refs/heads/asf-staging (28f6b431)

You should already have received notification emails for all of the O revisions, and so the following emails describe only the N revisions from the common base, B.

Any revisions marked "omit" are not gone; other references still refer to them. Any revisions marked "discard" are gone forever.

The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.

Summary of changes:
 site-ui/build/ui-bundle.zip | Bin 4881412 -> 4881412 bytes
 1 file changed, 0 insertions(+), 0 deletions(-)
[jira] [Commented] (CASSANDRA-18798) Appending to list in Accord transactions uses insertion timestamp
[ https://issues.apache.org/jira/browse/CASSANDRA-18798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17774843#comment-17774843 ]

Jacek Lewandowski commented on CASSANDRA-18798:
---
[~henrik.ingo] I think you are focused on timestamps, but timestamps are not what causes the incorrect order of items in the list. The problem is the cell path content, which is populated with a timeuuid collected too early. Therefore, I'm afraid that manipulating timestamps will give us nothing; the cell path needs to be populated at application time.

> Appending to list in Accord transactions uses insertion timestamp
> -
>
> Key: CASSANDRA-18798
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18798
> Project: Cassandra
> Issue Type: Bug
> Components: Accord
> Reporter: Jaroslaw Kijanowski
> Assignee: Henrik Ingo
> Priority: Normal
> Fix For: 5.0-alpha2
>
> Attachments: image-2023-09-26-20-05-25-846.png
>
> Given the following schema:
> {code:java}
> CREATE KEYSPACE IF NOT EXISTS accord WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
> CREATE TABLE IF NOT EXISTS accord.list_append(id int PRIMARY KEY,contents LIST);
> TRUNCATE accord.list_append;{code}
> And the following two possible queries executed by 10 threads in parallel:
> {code:java}
> BEGIN TRANSACTION
> LET row = (SELECT * FROM list_append WHERE id = ?);
> SELECT row.contents;
> COMMIT TRANSACTION;"
> BEGIN TRANSACTION
> UPDATE list_append SET contents += ? WHERE id = ?;
> COMMIT TRANSACTION;"
> {code}
> there seems to be an issue with transaction guarantees. Here's an excerpt in the edn format from a test.
> {code:java} > {:type :invoke :process 8 :value [[:append 5 352]] :tid 3 :n 52 > :time 1692607285967116627} > {:type :invoke :process 9 :value [[:r 5 nil]] :tid 1 :n 54 > :time 1692607286078732473} > {:type :invoke :process 6 :value [[:append 5 553]] :tid 5 :n 53 > :time 1692607286133833428} > {:type :invoke :process 7 :value [[:append 5 455]] :tid 4 :n 55 > :time 1692607286149702511} > {:type :ok :process 8 :value [[:append 5 352]] :tid 3 :n 52 > :time 1692607286156314099} > {:type :invoke :process 5 :value [[:r 5 nil]] :tid 9 :n 52 > :time 1692607286167090389} > {:type :ok :process 9 :value [[:r 5 [303 304 604 6 306 509 909 409 912 > 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 > 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 > 852 352]]] :tid 1 :n 54 :time 1692607286168657534} > {:type :invoke :process 1 :value [[:r 5 nil]] :tid 0 :n 51 > :time 1692607286201762938} > {:type :ok :process 7 :value [[:append 5 455]] :tid 4 :n 55 > :time 1692607286245571513} > {:type :invoke :process 7 :value [[:r 5 nil]] :tid 4 :n 56 > :time 1692607286245655775} > {:type :ok :process 5 :value [[:r 5 [303 304 604 6 306 509 909 409 912 > 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 > 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 > 852 352 455]]] :tid 9 :n 52 :time 1692607286253928906} > {:type :invoke :process 5 :value [[:r 5 nil]] :tid 9 :n 53 > :time 1692607286254095215} > {:type :ok :process 6 :value [[:append 5 553]] :tid 5 :n 53 > :time 1692607286266263422} > {:type :ok :process 1 :value [[:r 5 [303 304 604 6 306 509 909 409 912 > 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 > 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 > 852 352 553 455]]] :tid 0 :n 51 :time 1692607286271617955} > {:type :ok :process 7 :value [[:r 5 [303 304 604 6 306 509 909 409 912 > 411 514 415 719 419 19 623 22 425 24 926 
25 832 130 733 430 533 29 933 333 > 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 > 852 352 553 455]]] :tid 4 :n 56 :time 1692607286271816933} > {:type :ok :process 5 :value [[:r 5 [303 304 604 6 306 509 909 409 912 > 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 > 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 > 852 352 553 455]]] :tid 9 :n 53 :time 1692607286281483026} > {:type :invoke :process 9 :value [[:r 5 nil]] :tid 1 :n 56 > :time 1692607286284097561} > {:type :ok :process 9 :value [[:r 5 [303 304 604 6 306 509 909 409 912 > 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 > 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 > 852 352 553 455]]] :tid 1 :n 56
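The anomaly in the excerpt above (455 is visible at the end of the list, and 553 later appears in front of it) can be reproduced with a deliberately simplified model. The sketch below is an assumption-laden illustration, not Cassandra code: a `TreeMap<Long, Integer>` stands in for list cells ordered by their timeuuid cell path, plain `long` values stand in for timeuuids, and `ListAppendOrderSketch` is a hypothetical name. It contrasts choosing the cell key when the append is submitted with choosing it when the append is applied.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class ListAppendOrderSketch {
    /** Simplified list column: cells are ordered by a time-based key, as a stand-in for timeuuid cell paths. */
    static final class ListColumn {
        private final TreeMap<Long, Integer> cells = new TreeMap<>();

        void apply(long cellKey, int value) {
            cells.put(cellKey, value);
        }

        List<Integer> read() {
            return new ArrayList<>(cells.values());
        }
    }

    /** Keys captured at submission: 553 is submitted first (key 1) but applied after 455 (key 2). */
    static List<Integer> keysFromSubmission() {
        ListColumn column = new ListColumn();
        column.apply(2L, 455); // applied first; a read at this point returns [455]
        column.apply(1L, 553); // applied second, but its older key sorts it in front of 455
        return column.read();
    }

    /** Keys captured at application: the key order matches the order in which appends take effect. */
    static List<Integer> keysFromApplication() {
        ListColumn column = new ListColumn();
        column.apply(10L, 455); // applied first, receives the smaller key
        column.apply(11L, 553); // applied second, receives the larger key
        return column.read();
    }

    public static void main(String[] args) {
        System.out.println("keys from submission:  " + keysFromSubmission());
        System.out.println("keys from application: " + keysFromApplication());
    }
}
```

In the first variant a reader that already saw 455 as the last element later sees 553 inserted before it, which is the behaviour the comment attributes to the cell path being populated too early.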
[jira] [Commented] (CASSANDRA-18747) Test failure: Fix assertion error AssertionError: Unknown keyspace system_auth\n\tat org.apache.cassandra.db.Keyspace.(Keyspace.java:324)\n\tat org.apache.ca
[ https://issues.apache.org/jira/browse/CASSANDRA-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17774836#comment-17774836 ]

Jacek Lewandowski commented on CASSANDRA-18747:
---
Some methods stayed there as they were previously.

> Test failure: Fix assertion error AssertionError: Unknown keyspace system_auth\n\tat org.apache.cassandra.db.Keyspace.(Keyspace.java:324)\n\tat org.apache.cassandra.db.Keyspace.lambda$open$0(Keyspace.java:162)
> ---
>
> Key: CASSANDRA-18747
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18747
> Project: Cassandra
> Issue Type: Bug
> Components: Cluster/Schema, Test/dtest/python
> Reporter: Ekaterina Dimitrova
> Assignee: Jacek Lewandowski
> Priority: Normal
> Fix For: 4.1.x, 5.0.x, 5.x
>
> Time Spent: 6h 50m
> Remaining Estimate: 0h
>
> I've been seeing this assertion error in different tests lately.
> Full error message:
> {code:java}
> failed on teardown with "Unexpected error found in node logs (see stdout for full details).
Errors: [[node2] 'ERROR [PendingRangeCalculator:1] 2023-08-11 > 16:35:14,445 JVMStabilityInspector.java:70 - Exception in thread > Thread[PendingRangeCalculator:1,5,PendingRangeCalculator]\njava.lang.AssertionError: > Unknown keyspace system_auth\n\tat > org.apache.cassandra.db.Keyspace.(Keyspace.java:324)\n\tat > org.apache.cassandra.db.Keyspace.lambda$open$0(Keyspace.java:162)\n\tat > org.apache.cassandra.utils.concurrent.LoadingMap.blockingLoadIfAbsent(LoadingMap.java:105)\n\tat > > org.apache.cassandra.schema.Schema.maybeAddKeyspaceInstance(Schema.java:251)\n\tat > org.apache.cassandra.db.Keyspace.open(Keyspace.java:162)\n\tat > org.apache.cassandra.db.Keyspace.open(Keyspace.java:151)\n\tat > org.apache.cassandra.service.PendingRangeCalculatorService.lambda$new$1(PendingRangeCalculatorService.java:58)\n\tat > > org.apache.cassandra.concurrent.SingleThreadExecutorPlus$AtLeastOnce.run(SingleThreadExecutorPlus.java:60)\n\tat > > org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)\n\tat > > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat > > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat > > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat > java.base/java.lang.Thread.run(Thread.java:829)']" Unexpected error found in > node logs (see stdout for full details). 
Errors: [[node2] 'ERROR > [PendingRangeCalculator:1] 2023-08-11 16:35:14,445 > JVMStabilityInspector.java:70 - Exception in thread > Thread[PendingRangeCalculator:1,5,PendingRangeCalculator]\njava.lang.AssertionError: > Unknown keyspace system_auth\n\tat > org.apache.cassandra.db.Keyspace.(Keyspace.java:324)\n\tat > org.apache.cassandra.db.Keyspace.lambda$open$0(Keyspace.java:162)\n\tat > org.apache.cassandra.utils.concurrent.LoadingMap.blockingLoadIfAbsent(LoadingMap.java:105)\n\tat > > org.apache.cassandra.schema.Schema.maybeAddKeyspaceInstance(Schema.java:251)\n\tat > org.apache.cassandra.db.Keyspace.open(Keyspace.java:162)\n\tat > org.apache.cassandra.db.Keyspace.open(Keyspace.java:151)\n\tat > org.apache.cassandra.service.PendingRangeCalculatorService.lambda$new$1(PendingRangeCalculatorService.java:58)\n\tat > > org.apache.cassandra.concurrent.SingleThreadExecutorPlus$AtLeastOnce.run(SingleThreadExecutorPlus.java:60)\n\tat > > org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)\n\tat > > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat > > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat > > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat > java.base/java.lang.Thread.run(Thread.java:829)']{code} > Example failures: > test_failed_snitch_update_property_file_snitch - > [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2475/workflows/2086619e-0f21-464b-a866-84aca516b5e5/jobs/36716/tests] > test_gcgs_validation - > [https://ci-cassandra.apache.org/job/Cassandra-trunk/1666/testReport/junit/dtest.materialized_views_test/TestMaterializedViews/test_gcgs_validation/] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-18747) Test failure: Fix assertion error AssertionError: Unknown keyspace system_auth\n\tat org.apache.cassandra.db.Keyspace.(Keyspace.java:324)\n\tat org.apache.ca
[ https://issues.apache.org/jira/browse/CASSANDRA-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17774834#comment-17774834 ]

Jacek Lewandowski commented on CASSANDRA-18747:
---
Regarding the first comment, you are probably right; a fresh look was indeed needed. Regarding the second question, if you mean local and distributed keyspaces: it is because local keyspaces are not synchronized across the cluster.
[jira] [Comment Edited] (CASSANDRA-18747) Test failure: Fix assertion error AssertionError: Unknown keyspace system_auth\n\tat org.apache.cassandra.db.Keyspace.(Keyspace.java:324)\n\tat org.apac
[ https://issues.apache.org/jira/browse/CASSANDRA-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17774832#comment-17774832 ]

Benjamin Lerer edited comment on CASSANDRA-18747 at 10/13/23 9:08 AM:
--
I looked at the code of 4.0 and 4.1 and, thinking a bit more about it, I do not understand why the keyspaces were split into several groups. Some methods also look wrong. There seems to be some confusion between what are called distributed keyspaces, non-system keyspaces and local keyspaces. It feels to me that we should revisit that code more carefully.
[jira] [Comment Edited] (CASSANDRA-18747) Test failure: Fix assertion error AssertionError: Unknown keyspace system_auth\n\tat org.apache.cassandra.db.Keyspace.(Keyspace.java:324)\n\tat org.apac
[ https://issues.apache.org/jira/browse/CASSANDRA-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17774832#comment-17774832 ] Benjamin Lerer edited comment on CASSANDRA-18747 at 10/13/23 9:08 AM: -- I looked at the code of 4.0 and 4.1 and thinking a bit more about it, I do not understand why the keyspaces were split into several groups. Some methods look also wrong. There seems to be some confusions between what is called distributed keyspaces, non-system keyspaces and local keyspaces. It feels to me that we should revisit that code more carefully. was (Author: blerer): I looked at the code of 4.0 and 4.1 and thinking a bit more about it, I do not understand why the keyspaces were split into several groups. > Test failure: Fix assertion error AssertionError: Unknown keyspace > system_auth\n\tat > org.apache.cassandra.db.Keyspace.(Keyspace.java:324)\n\tat > org.apache.cassandra.db.Keyspace.lambda$open$0(Keyspace.java:162) > --- > > Key: CASSANDRA-18747 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18747 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Schema, Test/dtest/python >Reporter: Ekaterina Dimitrova >Assignee: Jacek Lewandowski >Priority: Normal > Fix For: 4.1.x, 5.0.x, 5.x > > Time Spent: 6h 50m > Remaining Estimate: 0h > > I've been seeing this assertion error in different tests lately. > Full error message: > {code:java} > failed on teardown with "Unexpected error found in node logs (see stdout for > full details). 
Errors: [[node2] 'ERROR [PendingRangeCalculator:1] 2023-08-11 > 16:35:14,445 JVMStabilityInspector.java:70 - Exception in thread > Thread[PendingRangeCalculator:1,5,PendingRangeCalculator]\njava.lang.AssertionError: > Unknown keyspace system_auth\n\tat > org.apache.cassandra.db.Keyspace.(Keyspace.java:324)\n\tat > org.apache.cassandra.db.Keyspace.lambda$open$0(Keyspace.java:162)\n\tat > org.apache.cassandra.utils.concurrent.LoadingMap.blockingLoadIfAbsent(LoadingMap.java:105)\n\tat > > org.apache.cassandra.schema.Schema.maybeAddKeyspaceInstance(Schema.java:251)\n\tat > org.apache.cassandra.db.Keyspace.open(Keyspace.java:162)\n\tat > org.apache.cassandra.db.Keyspace.open(Keyspace.java:151)\n\tat > org.apache.cassandra.service.PendingRangeCalculatorService.lambda$new$1(PendingRangeCalculatorService.java:58)\n\tat > > org.apache.cassandra.concurrent.SingleThreadExecutorPlus$AtLeastOnce.run(SingleThreadExecutorPlus.java:60)\n\tat > > org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)\n\tat > > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat > > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat > > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat > java.base/java.lang.Thread.run(Thread.java:829)']" Unexpected error found in > node logs (see stdout for full details). 
Errors: [[node2] 'ERROR > [PendingRangeCalculator:1] 2023-08-11 16:35:14,445 > JVMStabilityInspector.java:70 - Exception in thread > Thread[PendingRangeCalculator:1,5,PendingRangeCalculator]\njava.lang.AssertionError: > Unknown keyspace system_auth\n\tat > org.apache.cassandra.db.Keyspace.(Keyspace.java:324)\n\tat > org.apache.cassandra.db.Keyspace.lambda$open$0(Keyspace.java:162)\n\tat > org.apache.cassandra.utils.concurrent.LoadingMap.blockingLoadIfAbsent(LoadingMap.java:105)\n\tat > > org.apache.cassandra.schema.Schema.maybeAddKeyspaceInstance(Schema.java:251)\n\tat > org.apache.cassandra.db.Keyspace.open(Keyspace.java:162)\n\tat > org.apache.cassandra.db.Keyspace.open(Keyspace.java:151)\n\tat > org.apache.cassandra.service.PendingRangeCalculatorService.lambda$new$1(PendingRangeCalculatorService.java:58)\n\tat > > org.apache.cassandra.concurrent.SingleThreadExecutorPlus$AtLeastOnce.run(SingleThreadExecutorPlus.java:60)\n\tat > > org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)\n\tat > > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat > > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat > > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat > java.base/java.lang.Thread.run(Thread.java:829)']{code} > Example failures: > test_failed_snitch_update_property_file_snitch - >
[jira] [Commented] (CASSANDRA-18747) Test failure: Fix assertion error AssertionError: Unknown keyspace system_auth\n\tat org.apache.cassandra.db.Keyspace.(Keyspace.java:324)\n\tat org.apache.ca
[ https://issues.apache.org/jira/browse/CASSANDRA-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17774832#comment-17774832 ] Benjamin Lerer commented on CASSANDRA-18747: I looked at the code of 4.0 and 4.1 and thinking a bit more about it, I do not understand why the keyspaces were split into several groups. > Test failure: Fix assertion error AssertionError: Unknown keyspace > system_auth\n\tat > org.apache.cassandra.db.Keyspace.(Keyspace.java:324)\n\tat > org.apache.cassandra.db.Keyspace.lambda$open$0(Keyspace.java:162) > --- > > Key: CASSANDRA-18747 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18747 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Schema, Test/dtest/python >Reporter: Ekaterina Dimitrova >Assignee: Jacek Lewandowski >Priority: Normal > Fix For: 4.1.x, 5.0.x, 5.x > > Time Spent: 6h 50m > Remaining Estimate: 0h > > I've been seeing this assertion error in different tests lately. > Full error message: > {code:java} > failed on teardown with "Unexpected error found in node logs (see stdout for > full details). 
Errors: [[node2] 'ERROR [PendingRangeCalculator:1] 2023-08-11 > 16:35:14,445 JVMStabilityInspector.java:70 - Exception in thread > Thread[PendingRangeCalculator:1,5,PendingRangeCalculator]\njava.lang.AssertionError: > Unknown keyspace system_auth\n\tat > org.apache.cassandra.db.Keyspace.(Keyspace.java:324)\n\tat > org.apache.cassandra.db.Keyspace.lambda$open$0(Keyspace.java:162)\n\tat > org.apache.cassandra.utils.concurrent.LoadingMap.blockingLoadIfAbsent(LoadingMap.java:105)\n\tat > > org.apache.cassandra.schema.Schema.maybeAddKeyspaceInstance(Schema.java:251)\n\tat > org.apache.cassandra.db.Keyspace.open(Keyspace.java:162)\n\tat > org.apache.cassandra.db.Keyspace.open(Keyspace.java:151)\n\tat > org.apache.cassandra.service.PendingRangeCalculatorService.lambda$new$1(PendingRangeCalculatorService.java:58)\n\tat > > org.apache.cassandra.concurrent.SingleThreadExecutorPlus$AtLeastOnce.run(SingleThreadExecutorPlus.java:60)\n\tat > > org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)\n\tat > > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat > > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat > > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat > java.base/java.lang.Thread.run(Thread.java:829)']" Unexpected error found in > node logs (see stdout for full details). 
Errors: [[node2] 'ERROR > [PendingRangeCalculator:1] 2023-08-11 16:35:14,445 > JVMStabilityInspector.java:70 - Exception in thread > Thread[PendingRangeCalculator:1,5,PendingRangeCalculator]\njava.lang.AssertionError: > Unknown keyspace system_auth\n\tat > org.apache.cassandra.db.Keyspace.(Keyspace.java:324)\n\tat > org.apache.cassandra.db.Keyspace.lambda$open$0(Keyspace.java:162)\n\tat > org.apache.cassandra.utils.concurrent.LoadingMap.blockingLoadIfAbsent(LoadingMap.java:105)\n\tat > > org.apache.cassandra.schema.Schema.maybeAddKeyspaceInstance(Schema.java:251)\n\tat > org.apache.cassandra.db.Keyspace.open(Keyspace.java:162)\n\tat > org.apache.cassandra.db.Keyspace.open(Keyspace.java:151)\n\tat > org.apache.cassandra.service.PendingRangeCalculatorService.lambda$new$1(PendingRangeCalculatorService.java:58)\n\tat > > org.apache.cassandra.concurrent.SingleThreadExecutorPlus$AtLeastOnce.run(SingleThreadExecutorPlus.java:60)\n\tat > > org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)\n\tat > > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat > > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat > > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat > java.base/java.lang.Thread.run(Thread.java:829)']{code} > Example failures: > test_failed_snitch_update_property_file_snitch - > [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2475/workflows/2086619e-0f21-464b-a866-84aca516b5e5/jobs/36716/tests] > test_gcgs_validation - > [https://ci-cassandra.apache.org/job/Cassandra-trunk/1666/testReport/junit/dtest.materialized_views_test/TestMaterializedViews/test_gcgs_validation/] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail:
[jira] [Updated] (CASSANDRA-18747) Test failure: Fix assertion error AssertionError: Unknown keyspace system_auth\n\tat org.apache.cassandra.db.Keyspace.(Keyspace.java:324)\n\tat org.apache.cass
[ https://issues.apache.org/jira/browse/CASSANDRA-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer updated CASSANDRA-18747: --- Status: Changes Suggested (was: Ready to Commit) > Test failure: Fix assertion error AssertionError: Unknown keyspace > system_auth\n\tat > org.apache.cassandra.db.Keyspace.(Keyspace.java:324)\n\tat > org.apache.cassandra.db.Keyspace.lambda$open$0(Keyspace.java:162) > --- > > Key: CASSANDRA-18747 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18747 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Schema, Test/dtest/python >Reporter: Ekaterina Dimitrova >Assignee: Jacek Lewandowski >Priority: Normal > Fix For: 4.1.x, 5.0.x, 5.x > > Time Spent: 6h 50m > Remaining Estimate: 0h > > I've been seeing this assertion error in different tests lately. > Full error message: > {code:java} > failed on teardown with "Unexpected error found in node logs (see stdout for > full details). Errors: [[node2] 'ERROR [PendingRangeCalculator:1] 2023-08-11 > 16:35:14,445 JVMStabilityInspector.java:70 - Exception in thread > Thread[PendingRangeCalculator:1,5,PendingRangeCalculator]\njava.lang.AssertionError: > Unknown keyspace system_auth\n\tat > org.apache.cassandra.db.Keyspace.(Keyspace.java:324)\n\tat > org.apache.cassandra.db.Keyspace.lambda$open$0(Keyspace.java:162)\n\tat > org.apache.cassandra.utils.concurrent.LoadingMap.blockingLoadIfAbsent(LoadingMap.java:105)\n\tat > > org.apache.cassandra.schema.Schema.maybeAddKeyspaceInstance(Schema.java:251)\n\tat > org.apache.cassandra.db.Keyspace.open(Keyspace.java:162)\n\tat > org.apache.cassandra.db.Keyspace.open(Keyspace.java:151)\n\tat > org.apache.cassandra.service.PendingRangeCalculatorService.lambda$new$1(PendingRangeCalculatorService.java:58)\n\tat > > org.apache.cassandra.concurrent.SingleThreadExecutorPlus$AtLeastOnce.run(SingleThreadExecutorPlus.java:60)\n\tat > > 
org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)\n\tat > > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat > > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat > > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat > java.base/java.lang.Thread.run(Thread.java:829)']" Unexpected error found in > node logs (see stdout for full details). Errors: [[node2] 'ERROR > [PendingRangeCalculator:1] 2023-08-11 16:35:14,445 > JVMStabilityInspector.java:70 - Exception in thread > Thread[PendingRangeCalculator:1,5,PendingRangeCalculator]\njava.lang.AssertionError: > Unknown keyspace system_auth\n\tat > org.apache.cassandra.db.Keyspace.(Keyspace.java:324)\n\tat > org.apache.cassandra.db.Keyspace.lambda$open$0(Keyspace.java:162)\n\tat > org.apache.cassandra.utils.concurrent.LoadingMap.blockingLoadIfAbsent(LoadingMap.java:105)\n\tat > > org.apache.cassandra.schema.Schema.maybeAddKeyspaceInstance(Schema.java:251)\n\tat > org.apache.cassandra.db.Keyspace.open(Keyspace.java:162)\n\tat > org.apache.cassandra.db.Keyspace.open(Keyspace.java:151)\n\tat > org.apache.cassandra.service.PendingRangeCalculatorService.lambda$new$1(PendingRangeCalculatorService.java:58)\n\tat > > org.apache.cassandra.concurrent.SingleThreadExecutorPlus$AtLeastOnce.run(SingleThreadExecutorPlus.java:60)\n\tat > > org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)\n\tat > > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat > > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat > > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat > java.base/java.lang.Thread.run(Thread.java:829)']{code} > Example failures: > test_failed_snitch_update_property_file_snitch - > 
[https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2475/workflows/2086619e-0f21-464b-a866-84aca516b5e5/jobs/36716/tests] > test_gcgs_validation - > [https://ci-cassandra.apache.org/job/Cassandra-trunk/1666/testReport/junit/dtest.materialized_views_test/TestMaterializedViews/test_gcgs_validation/]
[jira] [Commented] (CASSANDRA-18747) Test failure: Fix assertion error AssertionError: Unknown keyspace system_auth\n\tat org.apache.cassandra.db.Keyspace.(Keyspace.java:324)\n\tat org.apache.ca
[ https://issues.apache.org/jira/browse/CASSANDRA-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17774828#comment-17774828 ] Benjamin Lerer commented on CASSANDRA-18747: It seems to me that the proposed solution is going in the opposite direction of where it should go. The issue mainly comes from the fact that we have duplicated some information in multithreaded code. Rather than making that logic more complex, we should simplify it and remove the duplication. Looking at where those 2 variables are used and how they get used, I really do not see the need for the {{distributedAndLocalKeyspaces}} variable. Am I missing something? > Test failure: Fix assertion error AssertionError: Unknown keyspace > system_auth\n\tat > org.apache.cassandra.db.Keyspace.(Keyspace.java:324)\n\tat > org.apache.cassandra.db.Keyspace.lambda$open$0(Keyspace.java:162) > --- > > Key: CASSANDRA-18747 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18747 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Schema, Test/dtest/python >Reporter: Ekaterina Dimitrova >Assignee: Jacek Lewandowski >Priority: Normal > Fix For: 4.1.x, 5.0.x, 5.x > > Time Spent: 6h 50m > Remaining Estimate: 0h > > I've been seeing this assertion error in different tests lately. > Full error message: > {code:java} > failed on teardown with "Unexpected error found in node logs (see stdout for > full details). 
Errors: [[node2] 'ERROR [PendingRangeCalculator:1] 2023-08-11 > 16:35:14,445 JVMStabilityInspector.java:70 - Exception in thread > Thread[PendingRangeCalculator:1,5,PendingRangeCalculator]\njava.lang.AssertionError: > Unknown keyspace system_auth\n\tat > org.apache.cassandra.db.Keyspace.(Keyspace.java:324)\n\tat > org.apache.cassandra.db.Keyspace.lambda$open$0(Keyspace.java:162)\n\tat > org.apache.cassandra.utils.concurrent.LoadingMap.blockingLoadIfAbsent(LoadingMap.java:105)\n\tat > > org.apache.cassandra.schema.Schema.maybeAddKeyspaceInstance(Schema.java:251)\n\tat > org.apache.cassandra.db.Keyspace.open(Keyspace.java:162)\n\tat > org.apache.cassandra.db.Keyspace.open(Keyspace.java:151)\n\tat > org.apache.cassandra.service.PendingRangeCalculatorService.lambda$new$1(PendingRangeCalculatorService.java:58)\n\tat > > org.apache.cassandra.concurrent.SingleThreadExecutorPlus$AtLeastOnce.run(SingleThreadExecutorPlus.java:60)\n\tat > > org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)\n\tat > > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat > > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat > > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat > java.base/java.lang.Thread.run(Thread.java:829)']" Unexpected error found in > node logs (see stdout for full details). 
Errors: [[node2] 'ERROR > [PendingRangeCalculator:1] 2023-08-11 16:35:14,445 > JVMStabilityInspector.java:70 - Exception in thread > Thread[PendingRangeCalculator:1,5,PendingRangeCalculator]\njava.lang.AssertionError: > Unknown keyspace system_auth\n\tat > org.apache.cassandra.db.Keyspace.(Keyspace.java:324)\n\tat > org.apache.cassandra.db.Keyspace.lambda$open$0(Keyspace.java:162)\n\tat > org.apache.cassandra.utils.concurrent.LoadingMap.blockingLoadIfAbsent(LoadingMap.java:105)\n\tat > > org.apache.cassandra.schema.Schema.maybeAddKeyspaceInstance(Schema.java:251)\n\tat > org.apache.cassandra.db.Keyspace.open(Keyspace.java:162)\n\tat > org.apache.cassandra.db.Keyspace.open(Keyspace.java:151)\n\tat > org.apache.cassandra.service.PendingRangeCalculatorService.lambda$new$1(PendingRangeCalculatorService.java:58)\n\tat > > org.apache.cassandra.concurrent.SingleThreadExecutorPlus$AtLeastOnce.run(SingleThreadExecutorPlus.java:60)\n\tat > > org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)\n\tat > > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat > > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat > > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat > java.base/java.lang.Thread.run(Thread.java:829)']{code} > Example failures: > test_failed_snitch_update_property_file_snitch - > [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2475/workflows/2086619e-0f21-464b-a866-84aca516b5e5/jobs/36716/tests] > test_gcgs_validation - >
[jira] [Updated] (CASSANDRA-18924) TCM: Allow unknown nodes during discovery
[ https://issues.apache.org/jira/browse/CASSANDRA-18924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Petrov updated CASSANDRA-18924: Test and Documentation Plan: Includes a test Status: Patch Available (was: Open) Patch: https://github.com/apache/cassandra/pull/2803 > TCM: Allow unknown nodes during discovery > - > > Key: CASSANDRA-18924 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18924 > Project: Cassandra > Issue Type: Improvement > Components: Transactional Cluster Metadata >Reporter: Alex Petrov >Assignee: Alex Petrov >Priority: High > > * Avoid discovered.addAll(DatabaseDescriptor.getSeeds()) when starting discovery, to exclude the seeds from the final result > * Add a responding node to the discovered set, even if it responds with an empty set > * Implement a simple simulation for discovery that does not involve setting up entire clusters > * Allow _any_ seed to start up first
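The discovery changes listed in CASSANDRA-18924 can be illustrated with a small in-memory simulation. This is only a sketch under assumptions: the class DiscoverySketch, the method discover, and the reachable map are hypothetical names chosen for illustration, not the code in the linked patch. It shows two of the listed points: seeds are treated as candidates to contact rather than being pre-added to the result, and a node that responds is recorded even when its response is empty.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of peer discovery; not Cassandra's actual implementation.
final class DiscoverySketch
{
    /**
     * @param self      the node running discovery
     * @param seeds     configured seed addresses (assumed input)
     * @param reachable simulated network: node -> peers it reports; absent nodes never respond
     * @return the nodes that actually responded during discovery
     */
    static Set<String> discover(String self, Set<String> seeds, Map<String, Set<String>> reachable)
    {
        Set<String> discovered = new LinkedHashSet<>(); // seeds are NOT pre-added here
        Set<String> contacted = new HashSet<>();
        Deque<String> toContact = new ArrayDeque<>(seeds);

        while (!toContact.isEmpty())
        {
            String peer = toContact.poll();
            if (peer.equals(self) || !contacted.add(peer))
                continue; // skip ourselves and peers we already tried

            Set<String> response = reachable.get(peer);
            if (response == null)
                continue; // no response: the peer stays out of the result

            discovered.add(peer);       // recorded even if the response is empty
            toContact.addAll(response); // follow up on peers it told us about
        }
        return discovered;
    }

    public static void main(String[] args)
    {
        // seed1 is up but knows nobody yet; seed2 is down and never responds.
        Map<String, Set<String>> reachable = Map.of("seed1", Set.of());
        System.out.println(discover("node2", Set.of("seed1", "seed2"), reachable));
    }
}
```

In this sketch an unreachable seed never enters the result, so a seed that starts before any of its configured peers simply finishes discovery with an empty set.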
[jira] [Updated] (CASSANDRA-18924) TCM: Allow unknown nodes during discovery
[ https://issues.apache.org/jira/browse/CASSANDRA-18924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Petrov updated CASSANDRA-18924: Change Category: Operability Complexity: Normal Priority: High (was: Normal) Status: Open (was: Triage Needed) > TCM: Allow unknown nodes during discovery > - > > Key: CASSANDRA-18924 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18924 > Project: Cassandra > Issue Type: Improvement > Components: Transactional Cluster Metadata >Reporter: Alex Petrov >Assignee: Alex Petrov >Priority: High > > * Avoid discovered.addAll(DatabaseDescriptor.getSeeds()) when starting discovery, to exclude the seeds from the final result > * Add a responding node to the discovered set, even if it responds with an empty set > * Implement a simple simulation for discovery that does not involve setting up entire clusters > * Allow _any_ seed to start up first
[jira] [Updated] (CASSANDRA-18866) Node sends multiple inflight echos
[ https://issues.apache.org/jira/browse/CASSANDRA-18866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Miklosovic updated CASSANDRA-18866: -- Status: Needs Committer (was: Review In Progress) > Node sends multiple inflight echos > -- > > Key: CASSANDRA-18866 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18866 > Project: Cassandra > Issue Type: Improvement > Components: Cluster/Gossip >Reporter: Cameron Zemek >Assignee: Cameron Zemek >Priority: Normal > Attachments: 18866-regression.patch, duplicates.log, echo.log > > > CASSANDRA-18854 rolled back the changes from CASSANDRA-18845. In particular, 18845 included a change to allow only one inflight ECHO request at a time. As per 18854, some tests have an error rate due to this change. I am creating this ticket to discuss this further. The current state also does not have retry logic; it just allows multiple ECHO requests inflight at the same time, so it is less likely that all ECHOs will time out or get lost. > With the change from 18845, and some extra logging added to track what is going on, I do see it retrying ECHOs. Likewise, I patched a node to drop ECHO requests from a node and also see it retrying ECHOs when it doesn't get a reply. > Therefore, I think the problem is more specific than the dropping of one ECHO request. Yes, there is no retry logic for failed ECHO requests, but this is the case both before and after 18845. ECHO requests are only sent via gossip verb handlers calling applyStateLocally. In these failed tests I am therefore assuming there are cases where it won't call markAlive when other nodes consider the node UP but it's marked DOWN by a node.
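The "only one inflight ECHO request at a time" behaviour described in the ticket can be sketched as a per-endpoint guard. This is a minimal illustration under assumptions, not the actual Gossiper change: InflightEchoSketch, tryBeginEcho and onEchoFinished are hypothetical names, and endpoints are modelled as plain strings. The point it shows is that a second ECHO to the same endpoint is suppressed while one is outstanding, and that clearing the guard on both success and failure is what lets a later gossip round send a fresh ECHO.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a per-endpoint "single inflight ECHO" guard; not Cassandra's code.
final class InflightEchoSketch
{
    // Endpoints that currently have an ECHO outstanding (thread-safe set).
    private final Set<String> inflight = ConcurrentHashMap.newKeySet();

    /** Returns true if the caller may send an ECHO now, false if one is already inflight. */
    boolean tryBeginEcho(String endpoint)
    {
        return inflight.add(endpoint);
    }

    /** Must be called on response AND on failure/timeout, otherwise no retry can ever happen. */
    void onEchoFinished(String endpoint)
    {
        inflight.remove(endpoint);
    }

    public static void main(String[] args)
    {
        InflightEchoSketch echos = new InflightEchoSketch();
        System.out.println(echos.tryBeginEcho("10.0.0.2")); // first ECHO is allowed
        System.out.println(echos.tryBeginEcho("10.0.0.2")); // duplicate is suppressed
        echos.onEchoFinished("10.0.0.2");                   // e.g. the request timed out
        System.out.println(echos.tryBeginEcho("10.0.0.2")); // a new attempt is allowed again
    }
}
```

With such a guard, a lost ECHO is only recovered if something triggers another attempt after the guard is cleared, which is consistent with the ticket's observation that ECHOs are sent only from the gossip verb handlers.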
[jira] [Commented] (CASSANDRA-18866) Node sends multiple inflight echos
[ https://issues.apache.org/jira/browse/CASSANDRA-18866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17774815#comment-17774815 ] Stefan Miklosovic commented on CASSANDRA-18866: --- [~brandon.williams] do you remember CASSANDRA-18854 / CASSANDRA-18543 where we reverted the logic around missed echo message? This one fixes it. Repeated tests seem to be stable, it is seen that in some cases it resends echo request when lost (in 1% of cases). Do you think this is something you could take a look at? [trunk j17|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3337/workflows/f44ee4a8-03fb-488f-ba54-43306bdc86d0] [trunk j11|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3337/workflows/459e6b55-3d43-4ee8-85cd-b308e8797e51] [5.0 j17|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3336/workflows/50a0bc41-b800-478d-b7e1-38cc73c16f84] [5.0 j11|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3336/workflows/d92a1a77-1f81-4210-a14a-da61fb18e1dd] [4.1 j11|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3335/workflows/dc9128f2-94f0-4651-afc8-df4a2db53a9b] [4.1 j8|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3335/workflows/8205db12-78f4-429a-81b8-508ba83b98bb] [4.0 j11|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3326/workflows/86b807f3-801f-47a0-925b-9ca49eb76d97] [4.0 j8|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3326/workflows/0af2bd74-55fc-4d3f-a4e0-2a4c1461ed90] [3.11 j8 https://app.circleci.com/pipelines/github/instaclustr/cassandra/3325/workflows/ff2b2562-c238-49d2-abb2-3457acb9618d] > Node sends multiple inflight echos > -- > > Key: CASSANDRA-18866 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18866 > Project: Cassandra > Issue Type: Improvement > Components: Cluster/Gossip >Reporter: Cameron Zemek >Assignee: Cameron Zemek >Priority: Normal > Attachments: 18866-regression.patch, duplicates.log, echo.log > > 
> CASSANDRA-18854 rolled back the changes from CASSANDRA-18845. In particular, 18845 included a change to allow only one inflight ECHO request at a time. As per 18854, some tests have an error rate due to this change. I am creating this ticket to discuss this further. The current state also does not have retry logic; it just allows multiple ECHO requests inflight at the same time, so it is less likely that all ECHOs will time out or get lost. > With the change from 18845, and some extra logging added to track what is going on, I do see it retrying ECHOs. Likewise, I patched a node to drop ECHO requests from a node and also see it retrying ECHOs when it doesn't get a reply. > Therefore, I think the problem is more specific than the dropping of one ECHO request. Yes, there is no retry logic for failed ECHO requests, but this is the case both before and after 18845. ECHO requests are only sent via gossip verb handlers calling applyStateLocally. In these failed tests I am therefore assuming there are cases where it won't call markAlive when other nodes consider the node UP but it's marked DOWN by a node.
[jira] [Comment Edited] (CASSANDRA-18866) Node sends multiple inflight echos
[ https://issues.apache.org/jira/browse/CASSANDRA-18866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17774815#comment-17774815 ] Stefan Miklosovic edited comment on CASSANDRA-18866 at 10/13/23 7:44 AM: - [~brandon.williams] do you remember CASSANDRA-18854 / CASSANDRA-18543 where we reverted the logic around missed echo message? This one fixes it. Repeated tests seem to be stable, it is seen that in some cases it resends echo request when lost (in 1% of cases). Do you think this is something you could take a look at? [trunk j17|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3337/workflows/f44ee4a8-03fb-488f-ba54-43306bdc86d0] [trunk j11|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3337/workflows/459e6b55-3d43-4ee8-85cd-b308e8797e51] [5.0 j17|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3336/workflows/50a0bc41-b800-478d-b7e1-38cc73c16f84] [5.0 j11|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3336/workflows/d92a1a77-1f81-4210-a14a-da61fb18e1dd] [4.1 j11|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3335/workflows/dc9128f2-94f0-4651-afc8-df4a2db53a9b] [4.1 j8|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3335/workflows/8205db12-78f4-429a-81b8-508ba83b98bb] [4.0 j11|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3326/workflows/86b807f3-801f-47a0-925b-9ca49eb76d97] [4.0 j8|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3326/workflows/0af2bd74-55fc-4d3f-a4e0-2a4c1461ed90] [3.11 j8| https://app.circleci.com/pipelines/github/instaclustr/cassandra/3325/workflows/ff2b2562-c238-49d2-abb2-3457acb9618d] was (Author: smiklosovic): [~brandon.williams] do you remember CASSANDRA-18854 / CASSANDRA-18543 where we reverted the logic around missed echo message? This one fixes it. Repeated tests seem to be stable, it is seen that in some cases it resends echo request when lost (in 1% of cases). 
Do you think this is something you could take a look at? [trunk j17|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3337/workflows/f44ee4a8-03fb-488f-ba54-43306bdc86d0] [trunk j11|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3337/workflows/459e6b55-3d43-4ee8-85cd-b308e8797e51] [5.0 j17|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3336/workflows/50a0bc41-b800-478d-b7e1-38cc73c16f84] [5.0 j11|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3336/workflows/d92a1a77-1f81-4210-a14a-da61fb18e1dd] [4.1 j11|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3335/workflows/dc9128f2-94f0-4651-afc8-df4a2db53a9b] [4.1 j8|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3335/workflows/8205db12-78f4-429a-81b8-508ba83b98bb] [4.0 j11|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3326/workflows/86b807f3-801f-47a0-925b-9ca49eb76d97] [4.0 j8|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3326/workflows/0af2bd74-55fc-4d3f-a4e0-2a4c1461ed90] [3.11 j8 https://app.circleci.com/pipelines/github/instaclustr/cassandra/3325/workflows/ff2b2562-c238-49d2-abb2-3457acb9618d] > Node sends multiple inflight echos > -- > > Key: CASSANDRA-18866 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18866 > Project: Cassandra > Issue Type: Improvement > Components: Cluster/Gossip >Reporter: Cameron Zemek >Assignee: Cameron Zemek >Priority: Normal > Attachments: 18866-regression.patch, duplicates.log, echo.log > > > CASSANDRA-18854 rolled back the changes from CASSANDRA-18845. In particular, 18845 included a change to allow only one inflight ECHO request at a time. As per 18854, some tests have an error rate due to this change. I am creating this ticket to discuss this further. The current state also does not have retry logic; it just allows multiple ECHO requests inflight at the same time, so it is less likely that all ECHOs will time out or get lost. 
> With the change from 18845, and some extra logging added to track what is going on, I do see it retrying ECHOs. Likewise, I patched a node to drop ECHO requests from a node and also see it retrying ECHOs when it doesn't get a reply. > Therefore, I think the problem is more specific than the dropping of one ECHO request. Yes, there is no retry logic for failed ECHO requests, but this is the case both before and after 18845. ECHO requests are only sent via gossip verb handlers calling applyStateLocally. In these failed tests I am therefore assuming there are cases where it won't call markAlive when other nodes consider the node UP but it's marked DOWN by a node.