[jira] [Updated] (CASSANDRA-15795) Cannot read data from a 3-node cluster which has two nodes down
[ https://issues.apache.org/jira/browse/CASSANDRA-15795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YCozy updated CASSANDRA-15795:
--
Description:
I start up a 3-node cluster and write a row with 'replication_factor' : '3'. The consistency level is ONE.
Then I kill two nodes and try to read the row I just inserted via cqlsh, but cqlsh returns NoHostAvailable.
I found this issue in Cassandra 3.11.5, and it can also be reproduced in the newest 3.11.6.

was:
I start up a 3-node cluster and write a row with 'replication_factor' : '3'. The consistency level is ONE.
Then I kill two nodes and try to read the row I just inserted via cqlsh, but cqlsh returns NoHostAvailable.
My Cassandra version is 3.11.6.

> Cannot read data from a 3-node cluster which has two nodes down
> ---
>
> Key: CASSANDRA-15795
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15795
> Project: Cassandra
> Issue Type: Bug
> Components: Cluster/Gossip, Consistency/Coordination
> Reporter: YCozy
> Priority: Normal
>
> I start up a 3-node cluster and write a row with 'replication_factor' : '3'. The consistency level is ONE.
> Then I kill two nodes and try to read the row I just inserted via cqlsh, but cqlsh returns NoHostAvailable.
> I found this issue in Cassandra 3.11.5, and it can also be reproduced in the newest 3.11.6.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
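With RF=3 and consistency level ONE, a read should succeed as long as at least one replica is alive, so a NoHostAvailable with one node still up points at the coordinator's view of replica liveness rather than at the consistency contract itself. A minimal model of the expected math (hypothetical helper names, not Cassandra's actual coordinator code):

```python
# Toy model of read feasibility per consistency level.
# Hypothetical helpers for illustration only, not Cassandra internals.

def required_replicas(consistency: str, replication_factor: int) -> int:
    """Number of replicas that must respond for a read at the given level."""
    if consistency == "ONE":
        return 1
    if consistency == "QUORUM":
        return replication_factor // 2 + 1
    if consistency == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {consistency}")

def can_serve_read(consistency: str, replication_factor: int,
                   live_replicas: int) -> bool:
    """A read is servable iff enough replicas are alive to meet the level."""
    return live_replicas >= required_replicas(consistency, replication_factor)

# RF=3 with two nodes down: ONE should still be servable, QUORUM should not.
print(can_serve_read("ONE", 3, 1))     # True
print(can_serve_read("QUORUM", 3, 1))  # False
```

By this arithmetic the reported read at ONE should have been served by the single surviving replica, which is why the bug is filed against gossip/coordination rather than the consistency machinery.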
[jira] [Commented] (CASSANDRA-15214) OOMs caught and not rethrown
[ https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103019#comment-17103019 ] Yifan Cai commented on CASSANDRA-15214:
---
Sounds good [~jolynch]. So for this ticket, the goal is to force the JVM to trigger a heap OOM upon receiving the direct buffer OOM. (I can work on it.) Do you want the jvmquake integration to be addressed in a different ticket?

> OOMs caught and not rethrown
>
>
> Key: CASSANDRA-15214
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15214
> Project: Cassandra
> Issue Type: Bug
> Components: Messaging/Client, Messaging/Internode
> Reporter: Benedict Elliott Smith
> Priority: Normal
> Fix For: 4.0, 4.0-rc
>
> Attachments: oom-experiments.zip
>
> Netty (at least, and perhaps elsewhere in Executors) catches all exceptions, so presently there is no way to ensure that an OOM reaches the JVM handler to trigger a crash/heapdump.
> It may be that the simplest, most consistent way to do this would be to have a single thread spawned at startup that waits for any exceptions we must propagate to the Runtime.
> We could probably submit a patch upstream to Netty, but for a guaranteed future-proof approach, it may be worth paying the cost of a single thread.
[jira] [Commented] (CASSANDRA-15797) Fix flaky BinLogTest - org.apache.cassandra.utils.binlog.BinLogTest
[ https://issues.apache.org/jira/browse/CASSANDRA-15797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17102994#comment-17102994 ] Yifan Cai commented on CASSANDRA-15797:
---
When running all tests under {{BinLogTest}}, {{testPutAfterStop}} passes, but when running this test individually, it fails. The difference is caused by a race condition. What happens underneath is that a little delay between {{@Before}} and the test itself is introduced when running them all together. So the {{BinLog}}'s internal thread starts and enters the while loop to prepare the tasks to process before the test flips the {{shouldContinue}} condition to false. Therefore, the thread drains the {{NO_OP}} just put into the sample queue, and when the test polls from the sample queue, it gets {{null}}.

> Fix flaky BinLogTest - org.apache.cassandra.utils.binlog.BinLogTest
> ---
>
> Key: CASSANDRA-15797
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15797
> Project: Cassandra
> Issue Type: Bug
> Components: Test/unit
> Reporter: Jon Meredith
> Assignee: Yifan Cai
> Priority: Normal
> Fix For: 4.0-alpha
>
> An internal CI system is failing BinLogTest somewhat frequently under JDK11. Configuration was recently changed to reduce the number of cores the tests run with; however, it is reproducible on an 8-core laptop.
> {code}
> [junit-timeout] OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
> [junit-timeout] Testsuite: org.apache.cassandra.utils.binlog.BinLogTest
> [Junit-timeout] WARNING: An illegal reflective access operation has occurred
> [junit-timeout] WARNING: Illegal reflective access by net.openhft.chronicle.core.Jvm (file:/.../lib/chronicle-core-1.16.4.jar) to field java.nio.Bits.RESERVED_MEMORY
> [junit-timeout] WARNING: Please consider reporting this to the maintainers of net.openhft.chronicle.core.Jvm
> [junit-timeout] WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
> [junit-timeout] WARNING: All illegal access operations will be denied in a future release
> [junit-timeout] Testsuite: org.apache.cassandra.utils.binlog.BinLogTest Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 13.895 sec
> [junit-timeout]
> [junit-timeout] Testcase: testPutAfterStop(org.apache.cassandra.utils.binlog.BinLogTest): FAILED
> [junit-timeout] expected: but was:
> [junit-timeout] junit.framework.AssertionFailedError: expected: but was:
> [junit-timeout] at org.apache.cassandra.utils.binlog.BinLogTest.testPutAfterStop(BinLogTest.java:431)
> [junit-timeout] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [junit-timeout] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> [junit-timeout] at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> [junit-timeout]
> [junit-timeout]
> [junit-timeout] Test org.apache.cassandra.utils.binlog.BinLogTest FAILED
> {code}
> There's also a different failure under JDK8:
> {code}
> [junit-timeout] Testsuite: org.apache.cassandra.utils.binlog.BinLogTest
> [junit-timeout] Testsuite: org.apache.cassandra.utils.binlog.BinLogTest Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 15.273 sec
> [junit-timeout]
> [junit-timeout] Testcase: testBinLogStartStop(org.apache.cassandra.utils.binlog.BinLogTest): FAILED
> [junit-timeout] expected:<2> but was:<0>
> [junit-timeout] junit.framework.AssertionFailedError: expected:<2> but was:<0>
> [junit-timeout] at org.apache.cassandra.utils.binlog.BinLogTest.testBinLogStartStop(BinLogTest.java:172)
> [junit-timeout]
> [junit-timeout]
> [junit-timeout] Test org.apache.cassandra.utils.binlog.BinLogTest FAILED
> {code}
> Reproducer:
> {code}
> PASSED=0; time { while ant testclasslist -Dtest.classlistprefix=unit -Dtest.classlistfile=<(echo org/apache/cassandra/utils/binlog/BinLogTest.java); do PASSED=$((PASSED+1)); echo PASSED $PASSED; done }; echo FAILED after $PASSED runs.
> {code}
> In the last four attempts it has taken 31, 38, 27 and 10 rounds respectively under JDK11, and took 51 under JDK8 (about 15 minutes).
> I have not tried running in a cpu-limited container or anything like that yet.
> Additionally, this went past in the logs a few times (under JDK11). No idea if it's just an artifact of weird test setup, or something more serious.
> {code}
> [junit-timeout] WARNING: Please consider reporting this to the maintainers of net.openhft.chronicle.core.Jvm
> [junit-timeout] WARNING: Use
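The race described in the comment above can be reproduced in miniature: if a consumer thread is already draining the queue when the NO_OP sentinel is enqueued, it consumes the sentinel before the test's stop flag flips, and the test's subsequent poll comes back empty. A simplified stand-alone model (not the real BinLog code; names are illustrative):

```python
import queue
import threading
import time

NO_OP = object()  # stand-in for BinLog's NO_OP sentinel
sample_queue: "queue.Queue[object]" = queue.Queue()

def binlog_thread(should_continue: threading.Event) -> None:
    # Mirrors the BinLog internal loop: while running, drain queued tasks.
    while should_continue.is_set():
        try:
            sample_queue.get(timeout=0.01)  # drains NO_OP if it wins the race
        except queue.Empty:
            pass

should_continue = threading.Event()
should_continue.set()
worker = threading.Thread(target=binlog_thread, args=(should_continue,))
worker.start()

sample_queue.put(NO_OP)   # what stop() enqueues
time.sleep(0.2)           # give the internal thread time to win the race
should_continue.clear()   # the test flips the flag too late
worker.join()

try:
    item = sample_queue.get_nowait()
except queue.Empty:
    item = None
print(item is None)  # True: the internal thread already drained NO_OP
```

This is exactly why the observed value depends on scheduling: whichever side reaches the queue first determines whether the poll sees the sentinel or nothing.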
[jira] [Commented] (CASSANDRA-15797) Fix flaky BinLogTest - org.apache.cassandra.utils.binlog.BinLogTest
[ https://issues.apache.org/jira/browse/CASSANDRA-15797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17102928#comment-17102928 ] Yifan Cai commented on CASSANDRA-15797:
---
What's surprising is that the test {{BinLogTest#testPutAfterStop}} fails consistently. The assertion in the test is simply wrong. A quick look at the BinLog stop method shows that, on stopping, a {{NO_OP}} object is enqueued. Therefore, polling from the queue should not return {{null}}.

> Fix flaky BinLogTest - org.apache.cassandra.utils.binlog.BinLogTest
> ---
>
> Key: CASSANDRA-15797
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15797
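The point about the wrong assertion can be shown in isolation: if stop() enqueues a NO_OP sentinel and nothing else consumes the queue, a subsequent poll must return the sentinel, never null. A toy sketch under that assumption (hypothetical class, not the real BinLog API):

```python
import queue

NO_OP = object()  # stand-in for BinLog's NO_OP sentinel

class MiniBinLog:
    """Toy stand-in for BinLog: stop() enqueues a NO_OP sentinel."""

    def __init__(self) -> None:
        self.sample_queue: "queue.Queue[object]" = queue.Queue()

    def stop(self) -> None:
        # On stopping, a NO_OP object is enqueued (mirroring the comment's
        # description of the real stop method), so a consumer blocked on
        # the queue wakes up.
        self.sample_queue.put(NO_OP)

    def poll(self):
        # Non-blocking poll: None stands in for Java's null.
        try:
            return self.sample_queue.get_nowait()
        except queue.Empty:
            return None

log = MiniBinLog()
log.stop()
polled = log.poll()
# With no internal thread racing to drain the queue, the poll returns the
# sentinel; asserting it is null (as the flaky test did) is wrong.
print(polled is NO_OP)  # True
```

So the durable fix is to assert on the sentinel (or drop the null assertion), not to tweak timing.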
[jira] [Assigned] (CASSANDRA-15797) Fix flaky BinLogTest - org.apache.cassandra.utils.binlog.BinLogTest
[ https://issues.apache.org/jira/browse/CASSANDRA-15797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai reassigned CASSANDRA-15797:
-
Assignee: Yifan Cai

> Fix flaky BinLogTest - org.apache.cassandra.utils.binlog.BinLogTest
> ---
>
> Key: CASSANDRA-15797
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15797
[jira] [Assigned] (CASSANDRA-15778) CorruptSSTableException after a 2.1 SSTable is upgraded to 3.0, failing reads
[ https://issues.apache.org/jira/browse/CASSANDRA-15778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Capwell reassigned CASSANDRA-15778:
-
Assignee: David Capwell

> CorruptSSTableException after a 2.1 SSTable is upgraded to 3.0, failing reads
> -
>
> Key: CASSANDRA-15778
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15778
> Project: Cassandra
> Issue Type: Bug
> Components: Local/Compaction, Local/SSTable
> Reporter: Sumanth Pasupuleti
> Assignee: David Capwell
> Priority: Normal
> Fix For: 3.0.x
>
> Below is the exception with stack trace. This issue is consistently reproducible.
> {code:java}
> ERROR [SharedPool-Worker-1] 2020-05-01 14:57:57,661 AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-1,5,main]
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /mnt/data/cassandra/data//
> at org.apache.cassandra.db.columniterator.AbstractSSTableIterator$Reader.hasNext(AbstractSSTableIterator.java:349) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.db.columniterator.AbstractSSTableIterator.hasNext(AbstractSSTableIterator.java:220) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.db.columniterator.SSTableIterator.hasNext(SSTableIterator.java:33) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:95) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:32) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:129) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:95) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:32) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:129) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:129) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:131) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:87) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:77) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:294) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:187) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.db.ReadResponse$LocalDataResponse.(ReadResponse.java:180) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.db.ReadResponse$LocalDataResponse.(ReadResponse.java:176) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:76) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:341) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:47) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_231]
> at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:165) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at
[jira] [Updated] (CASSANDRA-15799) CorruptSSTableException when compacting a 3.0 format sstable that was originally created as an outcome of 2.1 sstable upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-15799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Capwell updated CASSANDRA-15799:
--
Bug Category: Parent values: Correctness(12982) Level 1 values: Unrecoverable Corruption / Loss(13161)
Complexity: Normal
Discovered By: User Report
Severity: Critical
Status: Open (was: Triage Needed)

> CorruptSSTableException when compacting a 3.0 format sstable that was originally created as an outcome of 2.1 sstable upgrade
> -
>
> Key: CASSANDRA-15799
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15799
> Project: Cassandra
> Issue Type: Bug
> Components: Local/Compaction, Local/SSTable
> Reporter: Sumanth Pasupuleti
> Assignee: David Capwell
> Priority: Normal
> Fix For: 3.0.x
>
> Below is the exception with stack trace. This issue is reproducible.
> {code:java}
> DEBUG [CompactionExecutor:10] 2020-05-07 19:33:34,268 CompactionTask.java:158 - Compacting (a3ea9fc0-9099-11ea-933f-c5e852f71338) [/mnt/data/cassandra/data/ks/cf/md-10802-big-Data.db:level=0, ]
> ERROR [CompactionExecutor:10] 2020-05-07 19:33:34,275 CassandraDaemon.java:208 - Exception in thread Thread[CompactionExecutor:10,1,RMI Runtime]
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /mnt/data/cassandra/data/ks/cf/md-10802-big-Data.db
> at org.apache.cassandra.io.sstable.SSTableIdentityIterator.computeNext(SSTableIdentityIterator.java:105) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.io.sstable.SSTableIdentityIterator.computeNext(SSTableIdentityIterator.java:30) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:95) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:32) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.utils.MergeIterator$TrivialOneToOne.computeNext(MergeIterator.java:460) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:534) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:394) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:129) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.db.ColumnIndex$Builder.build(ColumnIndex.java:111) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.db.ColumnIndex.writeAndBuildIndex(ColumnIndex.java:52) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(BigTableWriter.java:165) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.io.sstable.SSTableRewriter.append(SSTableRewriter.java:126) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.db.compaction.writers.DefaultCompactionWriter.realAppend(DefaultCompactionWriter.java:57) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.append(CompactionAwareWriter.java:109) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:195) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:89) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:61) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> at
[jira] [Assigned] (CASSANDRA-15799) CorruptSSTableException when compacting a 3.0 format sstable that was originally created as an outcome of 2.1 sstable upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-15799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Capwell reassigned CASSANDRA-15799:
-
Assignee: David Capwell

> CorruptSSTableException when compacting a 3.0 format sstable that was originally created as an outcome of 2.1 sstable upgrade
> -
>
> Key: CASSANDRA-15799
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15799
[jira] [Updated] (CASSANDRA-15799) CorruptSSTableException when compacting a 3.0 format sstable that was originally created as an outcome of 2.1 sstable upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-15799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sumanth Pasupuleti updated CASSANDRA-15799:
-------------------------------------------
    Fix Version/s: 3.0.x

> CorruptSSTableException when compacting a 3.0 format sstable that was
> originally created as an outcome of 2.1 sstable upgrade
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-15799
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15799
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Compaction, Local/SSTable
>            Reporter: Sumanth Pasupuleti
>            Priority: Normal
>             Fix For: 3.0.x
>
>
> Below is the exception with stack trace. This issue is reproducible.
>
> {code:java}
> DEBUG [CompactionExecutor:10] 2020-05-07 19:33:34,268 CompactionTask.java:158 - Compacting (a3ea9fc0-9099-11ea-933f-c5e852f71338) [/mnt/data/cassandra/data/ks/cf/md-10802-big-Data.db:level=0, ]
> ERROR [CompactionExecutor:10] 2020-05-07 19:33:34,275 CassandraDaemon.java:208 - Exception in thread Thread[CompactionExecutor:10,1,RMI Runtime]
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /mnt/data/cassandra/data/ks/cf/md-10802-big-Data.db
> 	at org.apache.cassandra.io.sstable.SSTableIdentityIterator.computeNext(SSTableIdentityIterator.java:105) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> 	at org.apache.cassandra.io.sstable.SSTableIdentityIterator.computeNext(SSTableIdentityIterator.java:30) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> 	at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> 	at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:95) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> 	at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:32) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> 	at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> 	at org.apache.cassandra.utils.MergeIterator$TrivialOneToOne.computeNext(MergeIterator.java:460) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> 	at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> 	at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:534) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> 	at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:394) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> 	at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> 	at org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:129) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> 	at org.apache.cassandra.db.ColumnIndex$Builder.build(ColumnIndex.java:111) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> 	at org.apache.cassandra.db.ColumnIndex.writeAndBuildIndex(ColumnIndex.java:52) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> 	at org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(BigTableWriter.java:165) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> 	at org.apache.cassandra.io.sstable.SSTableRewriter.append(SSTableRewriter.java:126) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> 	at org.apache.cassandra.db.compaction.writers.DefaultCompactionWriter.realAppend(DefaultCompactionWriter.java:57) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> 	at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.append(CompactionAwareWriter.java:109) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> 	at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:195) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> 	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> 	at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:89) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> 	at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:61) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> 	at org.apache.cassandra.db.compaction.CompactionManager$8.runMayThrow(CompactionManager.java:675) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> 	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
> 	at
[jira] [Created] (CASSANDRA-15799) CorruptSSTableException when compacting a 3.0 format sstable that was originally created as an outcome of 2.1 sstable upgrade
Sumanth Pasupuleti created CASSANDRA-15799:
-------------------------------------------

             Summary: CorruptSSTableException when compacting a 3.0 format sstable that was originally created as an outcome of 2.1 sstable upgrade
                 Key: CASSANDRA-15799
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15799
             Project: Cassandra
          Issue Type: Bug
          Components: Local/Compaction, Local/SSTable
            Reporter: Sumanth Pasupuleti

Below is the exception with stack trace. This issue is reproducible.

{code:java}
DEBUG [CompactionExecutor:10] 2020-05-07 19:33:34,268 CompactionTask.java:158 - Compacting (a3ea9fc0-9099-11ea-933f-c5e852f71338) [/mnt/data/cassandra/data/ks/cf/md-10802-big-Data.db:level=0, ]
ERROR [CompactionExecutor:10] 2020-05-07 19:33:34,275 CassandraDaemon.java:208 - Exception in thread Thread[CompactionExecutor:10,1,RMI Runtime]
org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /mnt/data/cassandra/data/ks/cf/md-10802-big-Data.db
	at org.apache.cassandra.io.sstable.SSTableIdentityIterator.computeNext(SSTableIdentityIterator.java:105) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
	at org.apache.cassandra.io.sstable.SSTableIdentityIterator.computeNext(SSTableIdentityIterator.java:30) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
	at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
	at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:95) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
	at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:32) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
	at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
	at org.apache.cassandra.utils.MergeIterator$TrivialOneToOne.computeNext(MergeIterator.java:460) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
	at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
	at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:534) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
	at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:394) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
	at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
	at org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:129) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
	at org.apache.cassandra.db.ColumnIndex$Builder.build(ColumnIndex.java:111) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
	at org.apache.cassandra.db.ColumnIndex.writeAndBuildIndex(ColumnIndex.java:52) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
	at org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(BigTableWriter.java:165) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
	at org.apache.cassandra.io.sstable.SSTableRewriter.append(SSTableRewriter.java:126) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
	at org.apache.cassandra.db.compaction.writers.DefaultCompactionWriter.realAppend(DefaultCompactionWriter.java:57) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
	at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.append(CompactionAwareWriter.java:109) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
	at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:195) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
	at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:89) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
	at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:61) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
	at org.apache.cassandra.db.compaction.CompactionManager$8.runMayThrow(CompactionManager.java:675) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[nf-cassandra-3.0.19.8.jar:3.0.19.8]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_231]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_231]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_231]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_231]
	at
[jira] [Commented] (CASSANDRA-15676) flaky test testWriteUnknownResult- org.apache.cassandra.distributed.test.CasWriteTest
[ https://issues.apache.org/jira/browse/CASSANDRA-15676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17102826#comment-17102826 ] Yifan Cai commented on CASSANDRA-15676: --- Thanks [~gianluca]. LGTM. +1 _I am not a committer. [~ifesdjeen], would you like to take a look at the CAS test fix?_ > flaky test testWriteUnknownResult- > org.apache.cassandra.distributed.test.CasWriteTest > - > > Key: CASSANDRA-15676 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15676 > Project: Cassandra > Issue Type: Task > Components: Test/dtest >Reporter: Kevin Gallardo >Assignee: Gianluca Righetto >Priority: Normal > Fix For: 4.0-alpha > > Attachments: Screen Shot 2020-05-07 at 7.25.19 PM.png > > > Failure observed in: > https://app.circleci.com/pipelines/github/newkek/cassandra/33/workflows/54007cf7-4424-4ec1-9655-665f6044e6d1/jobs/187/tests > {noformat} > testWriteUnknownResult - org.apache.cassandra.distributed.test.CasWriteTest > junit.framework.AssertionFailedError: Expecting cause to be > CasWriteUncertainException > at > org.apache.cassandra.distributed.test.CasWriteTest.testWriteUnknownResult(CasWriteTest.java:257) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15790) EmptyType doesn't override writeValue so could attempt to write bytes when expected not to
[ https://issues.apache.org/jira/browse/CASSANDRA-15790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-15790: -- Reviewers: Jordan West, Yifan Cai (was: Jordan West) > EmptyType doesn't override writeValue so could attempt to write bytes when > expected not to > -- > > Key: CASSANDRA-15790 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15790 > Project: Cassandra > Issue Type: Bug > Components: CQL/Semantics >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.0-alpha > > > EmptyType.writeValues is defined here > https://github.com/apache/cassandra/blob/e394dc0bb32f612a476269010930c617dd1ed3cb/src/java/org/apache/cassandra/db/marshal/AbstractType.java#L407-L414 > {code} > public void writeValue(ByteBuffer value, DataOutputPlus out) throws > IOException > { > assert value.hasRemaining(); > if (valueLengthIfFixed() >= 0) > out.write(value); > else > ByteBufferUtil.writeWithVIntLength(value, out); > } > {code} > This is fine when the value is empty as the write of empty no-ops (the > readValue also noops since the length is 0), but if the value is not empty > (possible during upgrades or random bugs) then this could silently cause > corruption; ideally this should throw a exception if the ByteBuffer has data. > This was called from > org.apache.cassandra.db.rows.BufferCell.Serializer#serialize, here we check > to see if data is present or not and update the flags. If data is present > then and only then do we call type.writeValue (which requires bytes is not > empty). The problem is that EmptyType never expects writes to happen, but it > still writes them; and does not read them (since it says it is fixed length > of 0, so does read(buffer, 0)). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15790) EmptyType doesn't override writeValue so could attempt to write bytes when expected not to
[ https://issues.apache.org/jira/browse/CASSANDRA-15790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17102806#comment-17102806 ] David Capwell commented on CASSANDRA-15790: --- reading closer I think readValue should return empty buffer and not null, that way it works with org.apache.cassandra.serializers.EmptySerializer#validate and org.apache.cassandra.serializers.EmptySerializer#deserialize; null would break those call sites. > EmptyType doesn't override writeValue so could attempt to write bytes when > expected not to > -- > > Key: CASSANDRA-15790 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15790 > Project: Cassandra > Issue Type: Bug > Components: CQL/Semantics >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.0-alpha > > > EmptyType.writeValues is defined here > https://github.com/apache/cassandra/blob/e394dc0bb32f612a476269010930c617dd1ed3cb/src/java/org/apache/cassandra/db/marshal/AbstractType.java#L407-L414 > {code} > public void writeValue(ByteBuffer value, DataOutputPlus out) throws > IOException > { > assert value.hasRemaining(); > if (valueLengthIfFixed() >= 0) > out.write(value); > else > ByteBufferUtil.writeWithVIntLength(value, out); > } > {code} > This is fine when the value is empty as the write of empty no-ops (the > readValue also noops since the length is 0), but if the value is not empty > (possible during upgrades or random bugs) then this could silently cause > corruption; ideally this should throw a exception if the ByteBuffer has data. > This was called from > org.apache.cassandra.db.rows.BufferCell.Serializer#serialize, here we check > to see if data is present or not and update the flags. If data is present > then and only then do we call type.writeValue (which requires bytes is not > empty). 
The problem is that EmptyType never expects writes to happen, but it > still writes them; and does not read them (since it says it is fixed length > of 0, so does read(buffer, 0)). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
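The fix direction discussed in this ticket — make an empty type reject non-empty buffers instead of silently writing them — can be illustrated with a small standalone sketch. This is not Cassandra's actual EmptyType (which overrides a `writeValue(ByteBuffer, DataOutputPlus)` and uses its own exception types); it only shows the guard described above.

```java
import java.nio.ByteBuffer;

// Standalone sketch, not Cassandra's code: an "empty" type whose writeValue
// rejects any non-empty buffer loudly rather than risk silent corruption.
public class EmptyTypeSketch {
    // mirrors valueLengthIfFixed() == 0: an empty type never has bytes on disk
    static int valueLengthIfFixed() { return 0; }

    static void writeValue(ByteBuffer value) {
        if (value.hasRemaining())
            throw new IllegalStateException(
                "EmptyType only supports empty values, got " + value.remaining() + " bytes");
        // nothing to write: the fixed length is 0, and readers will read(buffer, 0)
    }

    public static void main(String[] args) {
        writeValue(ByteBuffer.allocate(0)); // fine: a no-op
        try {
            // non-empty input, e.g. stray bytes from an upgrade bug
            writeValue(ByteBuffer.wrap(new byte[]{1}));
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The point is only the failure mode: failing fast at write time beats writing bytes that the zero-length read path will never consume.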
[jira] [Updated] (CASSANDRA-15725) Add support for adding custom Verbs
[ https://issues.apache.org/jira/browse/CASSANDRA-15725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Capwell updated CASSANDRA-15725: -- Status: Changes Suggested (was: Review In Progress) > Add support for adding custom Verbs > --- > > Key: CASSANDRA-15725 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15725 > Project: Cassandra > Issue Type: Improvement > Components: Messaging/Internode >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Normal > Fix For: 4.0-alpha > > Attachments: feedback_15725.patch > > > It should be possible to safely add custom/internal Verbs - without risking > conflicts when new ones are added. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15725) Add support for adding custom Verbs
[ https://issues.apache.org/jira/browse/CASSANDRA-15725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17102798#comment-17102798 ] David Capwell commented on CASSANDRA-15725: --- Overall LGTM. I attached a patch to make sure we detect all potential id conflicts (left 3 commented-out verbs for testing). I also tested out a 3.x custom verb upgrade. If custom verbs exist which don't map to the new ID format, and backwards compatibility is needed, the verb can define a pre40 verb with the old id, and the custom one can override org.apache.cassandra.net.Verb#toPre40Verb to point to it; this allows migration. > Add support for adding custom Verbs > --- > > Key: CASSANDRA-15725 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15725 > Project: Cassandra > Issue Type: Improvement > Components: Messaging/Internode >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Normal > Fix For: 4.0-alpha > > Attachments: feedback_15725.patch > > > It should be possible to safely add custom/internal Verbs - without risking > conflicts when new ones are added. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
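The migration path described in that comment can be sketched with a toy enum. The names and ids below (CUSTOM, CUSTOM_PRE40, idForWire) are invented for illustration; Cassandra's real org.apache.cassandra.net.Verb is much richer — the only point here is the toPre40Verb-style indirection.

```java
// Illustrative sketch of the migration idea: a custom verb keeps a companion
// "pre-4.0" verb carrying its old wire id, and overrides a toPre40Verb()-style
// hook to point at it, so 3.x peers still see the legacy id.
public class VerbMigrationSketch {
    enum Verb {
        CUSTOM_PRE40(42),  // hypothetical legacy id used on the 3.x wire format
        CUSTOM(16383) {    // hypothetical new-format id
            @Override
            Verb toPre40Verb() { return CUSTOM_PRE40; }
        };

        final int id;
        Verb(int id) { this.id = id; }

        // by default a verb maps to itself on the old wire format
        Verb toPre40Verb() { return this; }
    }

    static int idForWire(Verb verb, boolean peerIsPre40) {
        return (peerIsPre40 ? verb.toPre40Verb() : verb).id;
    }

    public static void main(String[] args) {
        System.out.println(idForWire(Verb.CUSTOM, true));   // legacy id for 3.x peers
        System.out.println(idForWire(Verb.CUSTOM, false));  // new id for 4.0 peers
    }
}
```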
[jira] [Updated] (CASSANDRA-15725) Add support for adding custom Verbs
[ https://issues.apache.org/jira/browse/CASSANDRA-15725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Capwell updated CASSANDRA-15725: -- Attachment: feedback_15725.patch > Add support for adding custom Verbs > --- > > Key: CASSANDRA-15725 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15725 > Project: Cassandra > Issue Type: Improvement > Components: Messaging/Internode >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Normal > Fix For: 4.0-alpha > > Attachments: feedback_15725.patch > > > It should be possible to safely add custom/internal Verbs - without risking > conflicts when new ones are added. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15725) Add support for adding custom Verbs
[ https://issues.apache.org/jira/browse/CASSANDRA-15725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Capwell updated CASSANDRA-15725: -- Reviewers: Benedict Elliott Smith, David Capwell, David Capwell (was: Benedict Elliott Smith, David Capwell) Benedict Elliott Smith, David Capwell, David Capwell (was: Benedict Elliott Smith) Status: Review In Progress (was: Patch Available) > Add support for adding custom Verbs > --- > > Key: CASSANDRA-15725 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15725 > Project: Cassandra > Issue Type: Improvement > Components: Messaging/Internode >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Normal > Fix For: 4.0-alpha > > > It should be possible to safely add custom/internal Verbs - without risking > conflicts when new ones are added. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15790) EmptyType doesn't override writeValue so could attempt to write bytes when expected not to
[ https://issues.apache.org/jira/browse/CASSANDRA-15790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17102783#comment-17102783 ] Yifan Cai commented on CASSANDRA-15790: --- Thanks for digging deeper into the call sites for readValue. I agree that the overrides in {{EmptyType}} should never be called with the current usage. All the usages are guarded by an empty-value check and produce an {{EMPTY_BYTE_BUFFER}} inline (without even calling the {{readValue()}} method), except one: {{SinglePartitionReadCommand.Deserializer#deserialize}} does call the {{readValue()}} method regardless. However, that type is for partition keys, so I do not think it could be an EmptyType. I think returning a value (either {{EMPTY_BYTE_BUFFER}} or null) from the readValue method sounds better, because {{EmptyType}} cannot assume that the call sites check for emptiness and skip calling readValue.
- Returning null eventually causes an exception, indicating the abnormal state.
- Returning {{EMPTY_BYTE_BUFFER}} may silence the error.
_I did not read carefully and thought {{maxValueSize}} was the length to read.
Thanks for pointing it out._ > EmptyType doesn't override writeValue so could attempt to write bytes when > expected not to > -- > > Key: CASSANDRA-15790 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15790 > Project: Cassandra > Issue Type: Bug > Components: CQL/Semantics >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.0-alpha > > > EmptyType.writeValues is defined here > https://github.com/apache/cassandra/blob/e394dc0bb32f612a476269010930c617dd1ed3cb/src/java/org/apache/cassandra/db/marshal/AbstractType.java#L407-L414 > {code} > public void writeValue(ByteBuffer value, DataOutputPlus out) throws > IOException > { > assert value.hasRemaining(); > if (valueLengthIfFixed() >= 0) > out.write(value); > else > ByteBufferUtil.writeWithVIntLength(value, out); > } > {code} > This is fine when the value is empty as the write of empty no-ops (the > readValue also noops since the length is 0), but if the value is not empty > (possible during upgrades or random bugs) then this could silently cause > corruption; ideally this should throw a exception if the ByteBuffer has data. > This was called from > org.apache.cassandra.db.rows.BufferCell.Serializer#serialize, here we check > to see if data is present or not and update the flags. If data is present > then and only then do we call type.writeValue (which requires bytes is not > empty). The problem is that EmptyType never expects writes to happen, but it > still writes them; and does not read them (since it says it is fixed length > of 0, so does read(buffer, 0)). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
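The trade-off weighed in the comments above can be seen in a standalone sketch (not Cassandra's serializer code): a shared empty buffer keeps a validate-style call site working, while null breaks it at the first dereference.

```java
import java.nio.ByteBuffer;

// Standalone sketch of the readValue return-value question: the empty type
// reads zero bytes either way, so the choice is what the caller receives.
public class ReadValueSketch {
    static final ByteBuffer EMPTY_BYTE_BUFFER = ByteBuffer.allocate(0);

    // mimics an EmptySerializer#validate-style check: only empty is acceptable
    static void validate(ByteBuffer bytes) {
        if (bytes.hasRemaining())
            throw new IllegalArgumentException("non-empty value for an empty type");
    }

    // option 1: hand back a (shared) empty buffer — downstream checks keep working
    static ByteBuffer readValueReturningEmpty() { return EMPTY_BYTE_BUFFER; }

    // option 2: hand back null — any caller that touches the value fails fast
    static ByteBuffer readValueReturningNull() { return null; }

    public static void main(String[] args) {
        validate(readValueReturningEmpty()); // fine
        try {
            validate(readValueReturningNull()); // NPE inside validate
        } catch (NullPointerException e) {
            System.out.println("null broke the call site");
        }
    }
}
```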
[jira] [Commented] (CASSANDRA-15158) Wait for schema agreement rather then in flight schema requests when bootstrapping
[ https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17102741#comment-17102741 ] Blake Eggleston commented on CASSANDRA-15158: - {quote}commenting on design issues, I am not completely sure if these issues you are talking about are related to this patch or they are already existing? We could indeed focus on the points you raised but it seems to me that the current (comitted) code is worse without this patch than with as I guess these problems are already there? Isn't the goal here to have all nodes on same versions? Isn't the very fact that there are multiple versions pretty strange to begin with so we should not even try to join a node if they mismatch hence there is nothing to deal with in the first place? {quote} When there are schema changes, it's not strange at all for there to be multiple schema versions in the cluster before they converge. We also don't forbid making schema changes while changing cluster topology, so this would be something we should expect to encounter, although I would expect it to happen infrequently. Since bootstrap doesn't stream keyspaces it doesn't know about, this could create a window of data loss. Since the goal of this ticket is to wait for schema to converge before starting bootstrap, we should deal with edge cases like this. Also, I believe there have been bugs that caused a lot of schema change activity when nodes bootstrap, so depending on what exactly you're doing {quote}How can a node report its schema while being unreachable? {quote} Schema versions are gossiped. So a node might gossip a new schema version then become unreachable. The bootstrapping node would learn about this new version via gossip, but be unable to contact it. {quote}> admit that adding isRunningForcibly method feels like a hack but I had very hard time to test this stuff out. {quote} I'll look into how testing can be improved. 
{quote}> This is the most likely not true unless I am not getting something. The node to be bootstrapped will never advance in doing so unless all nodes have same versions. {quote} Ah, yes you're right. Although waiting for all nodes to arrive at the same schema version isn't necessary, we just need to receive and merge at least one schema pull from every schema version in the cluster. > Wait for schema agreement rather then in flight schema requests when > bootstrapping > -- > > Key: CASSANDRA-15158 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15158 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Gossip, Cluster/Schema >Reporter: Vincent White >Assignee: Ben Bromhead >Priority: Normal > > Currently when a node is bootstrapping we use a set of latches > (org.apache.cassandra.service.MigrationTask#inflightTasks) to keep track of > in-flight schema pull requests, and we don't proceed with > bootstrapping/stream until all the latches are released (or we timeout > waiting for each one). One issue with this is that if we have a large schema, > or the retrieval of the schema from the other nodes was unexpectedly slow > then we have no explicit check in place to ensure we have actually received a > schema before we proceed. > While it's possible to increase "migration_task_wait_in_seconds" to force the > node to wait on each latch longer, there are cases where this doesn't help > because the callbacks for the schema pull requests have expired off the > messaging service's callback map > (org.apache.cassandra.net.MessagingService#callbacks) after > request_timeout_in_ms (default 10 seconds) before the other nodes were able > to respond to the new node. > This patch checks for schema agreement between the bootstrapping node and the > rest of the live nodes before proceeding with bootstrapping. It also adds a > check to prevent the new node from flooding existing nodes with simultaneous > schema pull requests as can happen in large clusters. 
> Removing the latch system should also prevent new nodes in large clusters > getting stuck for extended amounts of time as they wait > `migration_task_wait_in_seconds` on each of the latches left orphaned by the > timed out callbacks. > > ||3.11|| > |[PoC|https://github.com/apache/cassandra/compare/cassandra-3.11...vincewhite:check_for_schema]| > |[dtest|https://github.com/apache/cassandra-dtest/compare/master...vincewhite:wait_for_schema_agreement]| > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
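The convergence check discussed in this exchange can be roughly sketched as follows. All names here are invented for illustration; the actual change lives in the PoC branch linked in the ticket. The sketch captures only the rule that every *live* peer's gossiped schema version must match the local one before bootstrap proceeds — down peers are skipped, which is why the unreachable-node case above matters.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a schema-agreement gate for a bootstrapping node.
public class SchemaAgreementSketch {
    // endpoint -> schema version learned via gossip (versions are gossiped,
    // so an entry may outlive the node that announced it)
    final Map<String, String> gossipedVersions = new HashMap<>();
    final Set<String> liveEndpoints = new HashSet<>();
    String localVersion;

    boolean schemaAgreementReached() {
        for (String peer : liveEndpoints) {
            String remote = gossipedVersions.get(peer);
            // a live peer with an unknown or different version blocks bootstrap
            if (remote == null || !remote.equals(localVersion))
                return false;
        }
        return true; // every live peer agrees with us; safe to start streaming
    }

    public static void main(String[] args) {
        SchemaAgreementSketch s = new SchemaAgreementSketch();
        s.localVersion = "v2";
        s.liveEndpoints.add("10.0.0.1");
        s.gossipedVersions.put("10.0.0.1", "v1");
        System.out.println(s.schemaAgreementReached()); // false: peer still on v1
        s.gossipedVersions.put("10.0.0.1", "v2");
        System.out.println(s.schemaAgreementReached()); // true
    }
}
```

In a real implementation this predicate would be polled with a timeout; per the last comment above, an alternative weaker condition is merging at least one schema pull from every schema version seen in the cluster.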
[jira] [Updated] (CASSANDRA-15798) Only calculate 1.0 + dynamicBadnessThreshold once per loop in DynamicEndpointSnitch
[ https://issues.apache.org/jira/browse/CASSANDRA-15798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jordan West updated CASSANDRA-15798: Since Version: 4.0-alpha Source Control Link: https://github.com/apache/cassandra/commit/66eae58cd4f53c03ca5ab6b520aa490f7f61a59c Resolution: Fixed Status: Resolved (was: Ready to Commit) Thanks [~dcapwell]. Committed as 66eae58cd4f53c03ca5ab6b520aa490f7f61a59c. > Only calculate 1.0 + dynamicBadnessThreshold once per loop in > DynamicEndpointSnitch > --- > > Key: CASSANDRA-15798 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15798 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Internode >Reporter: Jordan West >Assignee: Jordan West >Priority: Normal > Fix For: 4.0-beta > > > The change to make dynamicBadnessThreshold volatile and mutable in > https://issues.apache.org/jira/browse/CASSANDRA-12179 could have minor > implications for the performance of this code: > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/locator/DynamicEndpointSnitch.java#L216. > Its better to calculate this once before the loop starts, which has the > added benefit that the value is stable throughout the calculation as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[cassandra] branch trunk updated: Only calculate dynamicBadnessThreshold once per loop in DynamicEndpointSnitch
This is an automated email from the ASF dual-hosted git repository.

jwest pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git

The following commit(s) were added to refs/heads/trunk by this push:
     new 66eae58  Only calculate dynamicBadnessThreshold once per loop in DynamicEndpointSnitch
66eae58 is described below

commit 66eae58cd4f53c03ca5ab6b520aa490f7f61a59c
Author: Jordan West
AuthorDate: Thu May 7 18:06:26 2020 -0700

    Only calculate dynamicBadnessThreshold once per loop in DynamicEndpointSnitch

    Patch by Jordan West; Reviewed by David Capwell for CASSANDRA-15798
---
 CHANGES.txt                                                      | 1 +
 src/java/org/apache/cassandra/locator/DynamicEndpointSnitch.java | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/CHANGES.txt b/CHANGES.txt
index e6868e9..3e7343c 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 4.0-alpha5
+ * Only calculate dynamicBadnessThreshold once per loop in DynamicEndpointSnitch (CASSANDRA-15798)
  * Cleanup redundant nodetool commands added in 4.0 (CASSANDRA-15256)
  * Update to Python driver 3.23 for cqlsh (CASSANDRA-15793)
  * Add tunable initial size and growth factor to RangeTombstoneList (CASSANDRA-15763)
diff --git a/src/java/org/apache/cassandra/locator/DynamicEndpointSnitch.java b/src/java/org/apache/cassandra/locator/DynamicEndpointSnitch.java
index 0b241ce..218bdd6 100644
--- a/src/java/org/apache/cassandra/locator/DynamicEndpointSnitch.java
+++ b/src/java/org/apache/cassandra/locator/DynamicEndpointSnitch.java
@@ -210,10 +210,12 @@ public class DynamicEndpointSnitch extends AbstractEndpointSnitch implements Lat
         ArrayList<Double> sortedScores = new ArrayList<>(subsnitchOrderedScores);
         Collections.sort(sortedScores);

+        // only calculate this once b/c its volatile and shouldn't be modified during the loop either
+        double badnessThreshold = 1.0 + dynamicBadnessThreshold;
         Iterator<Double> sortedScoreIterator = sortedScores.iterator();
         for (Double subsnitchScore : subsnitchOrderedScores)
         {
-            if (subsnitchScore > (sortedScoreIterator.next() * (1.0 + dynamicBadnessThreshold)))
+            if (subsnitchScore > (sortedScoreIterator.next() * badnessThreshold))
             {
                 return sortedByProximityWithScore(address, replicas);
             }

- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
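The pattern in this commit — hoist the volatile read out of the loop — generalizes beyond the snitch. Here is a minimal standalone illustration (not the snitch code itself): a single read into a local gives every iteration one stable value and avoids a per-iteration volatile read.

```java
// Standalone sketch of hoisting a volatile read out of a loop.
public class VolatileHoistSketch {
    // a tunable that another thread may change at runtime (cf. CASSANDRA-12179)
    volatile double dynamicBadnessThreshold = 0.1;

    double[] scale(double[] scores) {
        // single volatile read; the loop then sees one stable value even if
        // the field is concurrently updated mid-iteration
        double badnessThreshold = 1.0 + dynamicBadnessThreshold;
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++)
            out[i] = scores[i] * badnessThreshold;
        return out;
    }

    public static void main(String[] args) {
        double[] r = new VolatileHoistSketch().scale(new double[]{1.0, 2.0});
        System.out.println(r[0] + " " + r[1]);
    }
}
```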
[jira] [Commented] (CASSANDRA-15790) EmptyType doesn't override writeValue so could attempt to write bytes when expected not to
[ https://issues.apache.org/jira/browse/CASSANDRA-15790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17102703#comment-17102703 ] David Capwell commented on CASSANDRA-15790: --- Thanks for the review [~yifanc]!
bq. Should readValue(DataInputPlus in, int maxValueSize) raise an exception in the case that maxValueSize is not 0?
If it ever happens, the system is in a bad state. Here is a sample call site:
{code}
// function org.apache.cassandra.db.rows.Cell.Serializer#deserialize
value = header.getType(column).readValue(in, DatabaseDescriptor.getMaxValueSize());
{code}
So maxValueSize is not expected to be 0; I don't think so. Based off the current usage, readValue should never be called (the call site is guarded by a null check), so an exception works... I'm not sure how I feel; I kinda think an exception makes sense, but so does returning null. We returned an empty buffer before (see org.apache.cassandra.utils.ByteBufferUtil#read) while the serializer returns null (see org.apache.cassandra.serializers.EmptySerializer#deserialize)... so we're not consistent.
bq. Is the AssertionError intended? I can see the intention might be indicating the severity.
Do you mean [this line|https://github.com/apache/cassandra/compare/trunk...dcapwell:bug/CASSANDRA-15790#diff-7dd64369e759d811269ca1be2d14086cR153]? If so yes, this means we have a bug and should NOT move forward (else we cause data loss).
bq. nit: EmptyTypeTest.java has no new line at the EOF.
If only we ran check style =). 
fixed; [see here|https://github.com/dcapwell/cassandra/commit/23f3a4f1691f6a76016f7b21d1e3a6ee4ae3c3ab] > EmptyType doesn't override writeValue so could attempt to write bytes when > expected not to > -- > > Key: CASSANDRA-15790 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15790 > Project: Cassandra > Issue Type: Bug > Components: CQL/Semantics >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.0-alpha > > > EmptyType.writeValues is defined here > https://github.com/apache/cassandra/blob/e394dc0bb32f612a476269010930c617dd1ed3cb/src/java/org/apache/cassandra/db/marshal/AbstractType.java#L407-L414 > {code} > public void writeValue(ByteBuffer value, DataOutputPlus out) throws > IOException > { > assert value.hasRemaining(); > if (valueLengthIfFixed() >= 0) > out.write(value); > else > ByteBufferUtil.writeWithVIntLength(value, out); > } > {code} > This is fine when the value is empty as the write of empty no-ops (the > readValue also noops since the length is 0), but if the value is not empty > (possible during upgrades or random bugs) then this could silently cause > corruption; ideally this should throw a exception if the ByteBuffer has data. > This was called from > org.apache.cassandra.db.rows.BufferCell.Serializer#serialize, here we check > to see if data is present or not and update the flags. If data is present > then and only then do we call type.writeValue (which requires bytes is not > empty). The problem is that EmptyType never expects writes to happen, but it > still writes them; and does not read them (since it says it is fixed length > of 0, so does read(buffer, 0)). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-15725) Add support for adding custom Verbs
[ https://issues.apache.org/jira/browse/CASSANDRA-15725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17102701#comment-17102701 ] Marcus Eriksson edited comment on CASSANDRA-15725 at 5/8/20, 4:06 PM: -- patch which adds "custom verbs" where ids start at 2^14 - 1 (the largest number we can store in a 2 byte vint), and then counts down for every new custom verb we add https://github.com/krummas/cassandra/commits/marcuse/15725 was (Author: krummas): https://github.com/krummas/cassandra/commits/marcuse/15725 > Add support for adding custom Verbs > --- > > Key: CASSANDRA-15725 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15725 > Project: Cassandra > Issue Type: Improvement > Components: Messaging/Internode >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Normal > Fix For: 4.0-alpha > > > It should be possible to safely add custom/internal Verbs - without risking > conflicts when new ones are added. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
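The counting-down id scheme described above can be sketched as follows; the registry class and method names are illustrative assumptions, not the linked patch:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch of the id scheme: custom verb ids start at 2^14 - 1
// (the largest number that fits in a 2-byte vint) and count down on each
// registration, keeping them clear of built-in verb ids assigned from 0 up.
public class CustomVerbIds
{
    static final int MAX_CUSTOM_ID = (1 << 14) - 1; // 16383

    private final Map<String, Integer> ids = new LinkedHashMap<>();
    private int next = MAX_CUSTOM_ID;

    // idempotent: re-registering a verb returns its existing id
    public synchronized int register(String name)
    {
        Integer existing = ids.get(name);
        if (existing != null)
            return existing;
        int id = next--;
        ids.put(name, id);
        return id;
    }
}
```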
[jira] [Updated] (CASSANDRA-15725) Add support for adding custom Verbs
[ https://issues.apache.org/jira/browse/CASSANDRA-15725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson updated CASSANDRA-15725: Test and Documentation Plan: cci run Status: Patch Available (was: Open) https://github.com/krummas/cassandra/commits/marcuse/15725 > Add support for adding custom Verbs > --- > > Key: CASSANDRA-15725 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15725 > Project: Cassandra > Issue Type: Improvement > Components: Messaging/Internode >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Normal > Fix For: 4.0-alpha > > > It should be possible to safely add custom/internal Verbs - without risking > conflicts when new ones are added. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15725) Add support for adding custom Verbs
[ https://issues.apache.org/jira/browse/CASSANDRA-15725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson updated CASSANDRA-15725: Change Category: Semantic Complexity: Low Hanging Fruit Component/s: Messaging/Internode Fix Version/s: 4.0-alpha Reviewers: Benedict Elliott Smith Status: Open (was: Triage Needed) > Add support for adding custom Verbs > --- > > Key: CASSANDRA-15725 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15725 > Project: Cassandra > Issue Type: Improvement > Components: Messaging/Internode >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Normal > Fix For: 4.0-alpha > > > It should be possible to safely add custom/internal Verbs - without risking > conflicts when new ones are added. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15256) Clean up redundant nodetool commands added in 4.0
[ https://issues.apache.org/jira/browse/CASSANDRA-15256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jordan West updated CASSANDRA-15256: Fix Version/s: 4.0-alpha Source Control Link: https://github.com/apache/cassandra/commit/d48954563802b1c2d42fd0bf5062568baae5b0eb Resolution: Fixed Status: Resolved (was: Ready to Commit) Committed as d48954563802b1c2d42fd0bf5062568baae5b0eb. Thanks [~clohfink]. > Clean up redundant nodetool commands added in 4.0 > - > > Key: CASSANDRA-15256 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15256 > Project: Cassandra > Issue Type: Improvement > Components: Tool/nodetool >Reporter: Chris Lohfink >Assignee: Chris Lohfink >Priority: Low > Fix For: 4.0, 4.0-alpha > > > Both hintedhandoff and getendpoints had a 2nd command added that does the > exact thing but rewritten (and in getReplicas not as well) not just aliased > (like cf->tablestats). Also a minor cleanup is same command added multiple > times to nodetool command list. We should clean this up before 4.0 release > before people become reliant on the newly introduced command name. If we want > them renamed as that we should rename and link with alias like we do with > cf->table others. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[cassandra] branch trunk updated: Clean up redundant nodetool commands added in 4.0
This is an automated email from the ASF dual-hosted git repository. jwest pushed a commit to branch trunk in repository https://gitbox.apache.org/repos/asf/cassandra.git The following commit(s) were added to refs/heads/trunk by this push: new d489545 Clean up redundant nodetool commands added in 4.0 d489545 is described below commit d48954563802b1c2d42fd0bf5062568baae5b0eb Author: Chris Lohfink AuthorDate: Fri Aug 2 12:29:09 2019 -0500 Clean up redundant nodetool commands added in 4.0 Patch by Chris Lohfink; Reviewed by Jordan West for CASSANDRA-15256 --- CHANGES.txt| 1 + .../apache/cassandra/service/StorageService.java | 10 - .../cassandra/service/StorageServiceMBean.java | 2 - src/java/org/apache/cassandra/tools/NodeProbe.java | 7 +--- src/java/org/apache/cassandra/tools/NodeTool.java | 3 -- .../cassandra/tools/nodetool/GetReplicas.java | 47 -- .../cassandra/tools/nodetool/HandoffWindow.java| 33 --- 7 files changed, 2 insertions(+), 101 deletions(-) diff --git a/CHANGES.txt b/CHANGES.txt index e617498..e6868e9 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 4.0-alpha5 + * Cleanup redundant nodetool commands added in 4.0 (CASSANDRA-15256) * Update to Python driver 3.23 for cqlsh (CASSANDRA-15793) * Add tunable initial size and growth factor to RangeTombstoneList (CASSANDRA-15763) * Improve debug logging in SSTableReader for index summary (CASSANDRA-15755) diff --git a/src/java/org/apache/cassandra/service/StorageService.java b/src/java/org/apache/cassandra/service/StorageService.java index c01665b..38da4e8 100644 --- a/src/java/org/apache/cassandra/service/StorageService.java +++ b/src/java/org/apache/cassandra/service/StorageService.java @@ -4002,16 +4002,6 @@ public class StorageService extends NotificationBroadcasterSupport implements IE return Replicas.stringify(replicas, true); } -public List getReplicas(String keyspaceName, String cf, String key) -{ -List res = new ArrayList<>(); -for (Replica replica : getNaturalReplicasForToken(keyspaceName, cf, 
key)) -{ -res.add(replica.toString()); -} -return res; -} - public EndpointsForToken getNaturalReplicasForToken(String keyspaceName, String cf, String key) { KeyspaceMetadata ksMetaData = Schema.instance.getKeyspaceMetadata(keyspaceName); diff --git a/src/java/org/apache/cassandra/service/StorageServiceMBean.java b/src/java/org/apache/cassandra/service/StorageServiceMBean.java index 432b0bc..7574010 100644 --- a/src/java/org/apache/cassandra/service/StorageServiceMBean.java +++ b/src/java/org/apache/cassandra/service/StorageServiceMBean.java @@ -219,8 +219,6 @@ public interface StorageServiceMBean extends NotificationEmitter @Deprecated public List getNaturalEndpoints(String keyspaceName, ByteBuffer key); public List getNaturalEndpointsWithPort(String keysapceName, ByteBuffer key); -public List getReplicas(String keyspaceName, String cf, String key); - /** * @deprecated use {@link #takeSnapshot(String tag, Map options, String... entities)} instead. */ diff --git a/src/java/org/apache/cassandra/tools/NodeProbe.java b/src/java/org/apache/cassandra/tools/NodeProbe.java index 9277278..f911eb5 100644 --- a/src/java/org/apache/cassandra/tools/NodeProbe.java +++ b/src/java/org/apache/cassandra/tools/NodeProbe.java @@ -822,11 +822,6 @@ public class NodeProbe implements AutoCloseable return ssProxy.getNaturalEndpoints(keyspace, cf, key); } -public List getReplicas(String keyspace, String cf, String key) -{ -return ssProxy.getReplicas(keyspace, cf, key); -} - public List getSSTables(String keyspace, String cf, String key, boolean hexFormat) { ColumnFamilyStoreMBean cfsProxy = getCfsProxy(keyspace, cf); @@ -1601,7 +1596,7 @@ public class NodeProbe implements AutoCloseable /** * Retrieve Proxy metrics - * @param connections, connectedNativeClients, connectedNativeClientsByUser, clientsByProtocolVersion + * @param metricName */ public Object getClientMetric(String metricName) { diff --git a/src/java/org/apache/cassandra/tools/NodeTool.java 
b/src/java/org/apache/cassandra/tools/NodeTool.java index 5af3fb1..bf5e5cc 100644 --- a/src/java/org/apache/cassandra/tools/NodeTool.java +++ b/src/java/org/apache/cassandra/tools/NodeTool.java @@ -205,12 +205,9 @@ public class NodeTool RefreshSizeEstimates.class, RelocateSSTables.class, ViewBuildStatus.class, -HandoffWindow.class, ReloadSslCertificates.class, EnableAuditLog.class, DisableAuditLog.class, -GetReplicas.class, -DisableAuditLog.class, EnableOldProtocolVersions.class,
[jira] [Commented] (CASSANDRA-15797) Fix flaky BinLogTest - org.apache.cassandra.utils.binlog.BinLogTest
[ https://issues.apache.org/jira/browse/CASSANDRA-15797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17102685#comment-17102685 ] Jon Meredith commented on CASSANDRA-15797: -- Thanks [~yifanc], you're welcome to it - I was just recording it as I saw it go by. > Fix flaky BinLogTest - org.apache.cassandra.utils.binlog.BinLogTest > --- > > Key: CASSANDRA-15797 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15797 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: Jon Meredith >Priority: Normal > Fix For: 4.0-alpha > > > An internal CI system is failing BinLogTest somewhat frequently under JDK11. > Configuration was recently changed to reduce the number of cores the tests > run with, however it is reproducible on an 8 core laptop. > {code} > [junit-timeout] OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC > was deprecated in version 9.0 and will likely be removed in a future release. > [junit-timeout] Testsuite: org.apache.cassandra.utils.binlog.BinLogTest > [Junit-timeout] WARNING: An illegal reflective access operation has occurred > [junit-timeout] WARNING: Illegal reflective access by > net.openhft.chronicle.core.Jvm (file:/.../lib/chronicle-core-1.16.4.jar) to > field java.nio.Bits.RESERVED_MEMORY > [junit-timeout] WARNING: Please consider reporting this to the maintainers of > net.openhft.chronicle.core.Jvm > [junit-timeout] WARNING: Use --illegal-access=warn to enable warnings of > further illegal reflective access operations > [junit-timeout] WARNING: All illegal access operations will be denied in a > future release > [junit-timeout] Testsuite: org.apache.cassandra.utils.binlog.BinLogTest Tests > run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 13.895 sec > [junit-timeout] > [junit-timeout] Testcase: > testPutAfterStop(org.apache.cassandra.utils.binlog.BinLogTest): FAILED > [junit-timeout] expected: but > was: > [junit-timeout] junit.framework.AssertionFailedError: expected: but > was: 
> [junit-timeout] at > org.apache.cassandra.utils.binlog.BinLogTest.testPutAfterStop(BinLogTest.java:431) > [junit-timeout] at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > [junit-timeout] at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > [junit-timeout] at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > [junit-timeout] > [junit-timeout] > [junit-timeout] Test org.apache.cassandra.utils.binlog.BinLogTest FAILED > {code} > There's also a different failure under JDK8 > {code} > junit-timeout] Testsuite: org.apache.cassandra.utils.binlog.BinLogTest > [junit-timeout] Testsuite: org.apache.cassandra.utils.binlog.BinLogTest Tests > run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 15.273 sec > [junit-timeout] > [junit-timeout] Testcase: > testBinLogStartStop(org.apache.cassandra.utils.binlog.BinLogTest): FAILED > [junit-timeout] expected:<2> but was:<0> > [junit-timeout] junit.framework.AssertionFailedError: expected:<2> but was:<0> > [junit-timeout] at > org.apache.cassandra.utils.binlog.BinLogTest.testBinLogStartStop(BinLogTest.java:172) > [junit-timeout] > [junit-timeout] > [junit-timeout] Test org.apache.cassandra.utils.binlog.BinLogTest FAILED > {code} > Reproducer > {code} > PASSED=0; time { while ant testclasslist -Dtest.classlistprefix=unit > -Dtest.classlistfile=<(echo > org/apache/cassandra/utils/binlog/BinLogTest.java); do PASSED=$((PASSED+1)); > echo PASSED $PASSED; done }; echo FAILED after $PASSED runs. > {code} > In the last four attempts it has taken 31, 38, 27 and 10 rounds respectively > under JDK11 and took 51 under JDK8 (about 15 minutes). > I have not tried running in a cpu-limited container or anything like that yet. > Additionally, this went past in the logs a few times (under JDK11). No idea > if it's just an artifact of weird test setup, or something more serious. 
> {code} > [junit-timeout] WARNING: Please consider reporting this to the maintainers of > net.openhft.chronicle.core.Jvm > [junit-timeout] WARNING: Use --illegal-access=warn to enable warnings of > further illegal reflective access operations > [junit-timeout] WARNING: All illegal access operations will be denied in a > future release > [junit-timeout] Testsuite: org.apache.cassandra.utils.binlog.BinLogTest Tests > run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 12.839 sec > [junit-timeout] > [junit-timeout] java.lang.Throwable: 1e53135d-main creation ref-count=1 > [junit-timeout] at > net.openhft.chronicle.core.ReferenceCounter.newRefCountHistory(ReferenceCounter.java:45) > [junit-timeout] at
[jira] [Commented] (CASSANDRA-15789) Rows can get duplicated in mixed major-version clusters and after full upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-15789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17102676#comment-17102676 ] Sylvain Lebresne commented on CASSANDRA-15789: -- I had a quick look at those commits, and agree about the fix in `LegacyLayout`. I have no strong objections to the 2 other parts, but wanted to raise 2 points: - regarding the elimination of duplicates on iterators coming from `LegacyLayout`: the patch currently merges the duplicates rather silently. What if we have another bug in `LegacyLayout` for which row duplication is only one symptom, but that also loses data? Are we sure we won't regret not failing on what would be an unknown bug? - Regarding the duplicate check on all reads, I "think" this could have a measurable impact on performance for some workloads. That isn't a reason not to add it, but as this impacts all reads and will go into "stable" versions, do we want to run a few benchmarks to quantify it? Or have a way to disable the check? > Rows can get duplicated in mixed major-version clusters and after full upgrade > -- > > Key: CASSANDRA-15789 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15789 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Coordination, Local/Memtable, Local/SSTable >Reporter: Aleksey Yeschenko >Assignee: Marcus Eriksson >Priority: Normal > > In a mixed 2.X/3.X major version cluster a sequence of row deletes, > collection overwrites, paging, and read repair can cause 3.X nodes to split > individual rows into several rows with identical clustering. This happens due > to 2.X paging and RT semantics, and a 3.X {{LegacyLayout}} deficiency. > To reproduce, set up a 2-node mixed major version cluster with the following > table: > {code} > CREATE TABLE distributed_test_keyspace.tlb ( > pk int, > ck int, > v map, > PRIMARY KEY (pk, ck) > ); > {code} > 1.
Using either node as the coordinator, delete the row with ck=2 using > timestamp 1 > {code} > DELETE FROM tbl USING TIMESTAMP 1 WHERE pk = 1 AND ck = 2; > {code} > 2. Using either node as the coordinator, insert the following 3 rows: > {code} > INSERT INTO tbl (pk, ck, v) VALUES (1, 1, {'e':'f'}) USING TIMESTAMP 3; > INSERT INTO tbl (pk, ck, v) VALUES (1, 2, {'g':'h'}) USING TIMESTAMP 3; > INSERT INTO tbl (pk, ck, v) VALUES (1, 3, {'i':'j'}) USING TIMESTAMP 3; > {code} > 3. Flush the table on both nodes > 4. Using the 2.2 node as the coordinator, force read repar by querying the > table with page size = 2: > > {code} > SELECT * FROM tbl; > {code} > 5. Overwrite the row with ck=2 using timestamp 5: > {code} > INSERT INTO tbl (pk, ck, v) VALUES (1, 2, {'g':'h'}) USING TIMESTAMP 5;}} > {code} > 6. Query the 3.0 node and observe the split row: > {code} > cqlsh> select * from distributed_test_keyspace.tlb ; > pk | ck | v > ++ > 1 | 1 | {'e': 'f'} > 1 | 2 | {'g': 'h'} > 1 | 2 | {'k': 'l'} > 1 | 3 | {'i': 'j'} > {code} > This happens because the read to query the second page ends up generating the > following mutation for the 3.0 node: > {code} > ColumnFamily(tbl -{deletedAt=-9223372036854775808, localDeletion=2147483647, > ranges=[2:v:_-2:v:!, deletedAt=2, localDeletion=1588588821] > [2:v:!-2:!, deletedAt=1, localDeletion=1588588821] > [3:v:_-3:v:!, deletedAt=2, localDeletion=1588588821]}- > [2:v:63:false:1@3,]) > {code} > Which on 3.0 side gets incorrectly deserialized as > {code} > Mutation(keyspace='distributed_test_keyspace', key='0001', modifications=[ > [distributed_test_keyspace.tbl] key=1 > partition_deletion=deletedAt=-9223372036854775808, localDeletion=2147483647 > columns=[[] | [v]] > Row[info=[ts=-9223372036854775808] ]: ck=2 | del(v)=deletedAt=2, > localDeletion=1588588821, [v[c]=d ts=3] > Row[info=[ts=-9223372036854775808] del=deletedAt=1, > localDeletion=1588588821 ]: ck=2 | > Row[info=[ts=-9223372036854775808] ]: ck=3 | del(v)=deletedAt=2, > 
localDeletion=1588588821 > ]) > {code} > {{LegacyLayout}} correctly interprets a range tombstone whose start and > finish {{collectionName}} values don't match as a wrapping fragment of a > legacy row deletion that's being interrupted by a collection deletion > (correctly) - see > [code|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/LegacyLayout.java#L1874-L1889]. > Quoting the comment inline: > {code} > // Because of the way RangeTombstoneList work, we can have a tombstone where > only one of > // the bound has a collectionName. That happens if we have a big tombstone A > (spanning one > // or multiple rows) and a collection tombstone B. In that case, > RangeTombstoneList will
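The trade-off raised in the first review point above (merging duplicate rows silently versus failing loudly) can be illustrated with a toy sketch. Rows are modeled here as simple (clustering, value) int pairs, which is an assumption for demonstration and not the actual Row/Unfiltered machinery:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy model of deduplicating a clustering-ordered row stream: consecutive
// rows with the same clustering are either merged (keep the newer value,
// mimicking last-write-wins) or treated as a bug and rejected loudly.
public class DuplicateClusteringCheck
{
    public static List<int[]> dedupe(Iterator<int[]> rows, boolean failOnDuplicate)
    {
        List<int[]> out = new ArrayList<>();
        int[] prev = null;
        while (rows.hasNext())
        {
            int[] row = rows.next();
            if (prev != null && prev[0] == row[0])
            {
                if (failOnDuplicate)
                    throw new IllegalStateException("duplicate clustering: " + row[0]);
                // silent-merge path: keep the higher (newer) value
                int[] merged = new int[]{ prev[0], Math.max(prev[1], row[1]) };
                out.set(out.size() - 1, merged);
                prev = merged;
                continue;
            }
            out.add(row);
            prev = row;
        }
        return out;
    }
}
```

The failOnDuplicate flag corresponds to the question of whether the check should be a quiet repair or a hard failure that surfaces unknown bugs.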
[jira] [Updated] (CASSANDRA-15313) Fix flaky - ChecksummingTransformerTest - org.apache.cassandra.transport.frame.checksum.ChecksummingTransformerTest
[ https://issues.apache.org/jira/browse/CASSANDRA-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh McKenzie updated CASSANDRA-15313: -- Fix Version/s: (was: 4.0-alpha) 4.0-beta > Fix flaky - ChecksummingTransformerTest - > org.apache.cassandra.transport.frame.checksum.ChecksummingTransformerTest > --- > > Key: CASSANDRA-15313 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15313 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest >Reporter: Vinay Chella >Assignee: Brandon Williams >Priority: Normal > Fix For: 4.0-beta > > Attachments: CASSANDRA-15313-hack.patch > > > During the recent runs, this test appears to be flaky. > Example failure: > [https://circleci.com/gh/vinaykumarchella/cassandra/459#tests/containers/94] > corruptionCausesFailure-compression - > org.apache.cassandra.transport.frame.checksum.ChecksummingTransformerTest > {code:java} > java.lang.OutOfMemoryError: GC overhead limit exceeded > at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57) > at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) > at org.quicktheories.impl.Precursor.(Precursor.java:17) > at > org.quicktheories.impl.ConcreteDetachedSource.(ConcreteDetachedSource.java:8) > at > org.quicktheories.impl.ConcreteDetachedSource.detach(ConcreteDetachedSource.java:23) > at org.quicktheories.generators.Retry.generate(CodePoints.java:51) > at > org.quicktheories.generators.Generate.lambda$intArrays$10(Generate.java:190) > at > org.quicktheories.generators.Generate$$Lambda$17/1847008471.generate(Unknown > Source) > at org.quicktheories.core.DescribingGenerator.generate(Gen.java:255) > at org.quicktheories.core.Gen.lambda$map$0(Gen.java:36) > at org.quicktheories.core.Gen$$Lambda$20/71399214.generate(Unknown > Source) > at org.quicktheories.core.Gen.lambda$map$0(Gen.java:36) > at org.quicktheories.core.Gen$$Lambda$20/71399214.generate(Unknown > Source) > at org.quicktheories.core.Gen.lambda$mix$10(Gen.java:184) > at 
org.quicktheories.core.Gen$$Lambda$45/802243390.generate(Unknown > Source) > at org.quicktheories.core.Gen.lambda$flatMap$5(Gen.java:93) > at org.quicktheories.core.Gen$$Lambda$48/363509958.generate(Unknown > Source) > at > org.quicktheories.dsl.TheoryBuilder4.lambda$prgnToTuple$12(TheoryBuilder4.java:188) > at > org.quicktheories.dsl.TheoryBuilder4$$Lambda$40/2003496028.generate(Unknown > Source) > at org.quicktheories.core.DescribingGenerator.generate(Gen.java:255) > at org.quicktheories.core.FilteredGenerator.generate(Gen.java:225) > at org.quicktheories.core.Gen.lambda$map$0(Gen.java:36) > at org.quicktheories.core.Gen$$Lambda$20/71399214.generate(Unknown > Source) > at org.quicktheories.impl.Core.generate(Core.java:150) > at org.quicktheories.impl.Core.shrink(Core.java:103) > at org.quicktheories.impl.Core.run(Core.java:39) > at org.quicktheories.impl.TheoryRunner.check(TheoryRunner.java:35) > at org.quicktheories.dsl.TheoryBuilder4.check(TheoryBuilder4.java:150) > at > org.quicktheories.dsl.TheoryBuilder4.checkAssert(TheoryBuilder4.java:162) > at > org.apache.cassandra.transport.frame.checksum.ChecksummingTransformerTest.corruptionCausesFailure(ChecksummingTransformerTest.java:87) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15313) Fix flaky - ChecksummingTransformerTest - org.apache.cassandra.transport.frame.checksum.ChecksummingTransformerTest
[ https://issues.apache.org/jira/browse/CASSANDRA-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17102648#comment-17102648 ] Josh McKenzie commented on CASSANDRA-15313: --- Moving fixver to beta since this is blocked by a beta ticket. > Fix flaky - ChecksummingTransformerTest - > org.apache.cassandra.transport.frame.checksum.ChecksummingTransformerTest > --- > > Key: CASSANDRA-15313 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15313 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest >Reporter: Vinay Chella >Assignee: Brandon Williams >Priority: Normal > Fix For: 4.0-beta > > Attachments: CASSANDRA-15313-hack.patch > > > During the recent runs, this test appears to be flaky. > Example failure: > [https://circleci.com/gh/vinaykumarchella/cassandra/459#tests/containers/94] > corruptionCausesFailure-compression - > org.apache.cassandra.transport.frame.checksum.ChecksummingTransformerTest > {code:java} > java.lang.OutOfMemoryError: GC overhead limit exceeded > at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57) > at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) > at org.quicktheories.impl.Precursor.(Precursor.java:17) > at > org.quicktheories.impl.ConcreteDetachedSource.(ConcreteDetachedSource.java:8) > at > org.quicktheories.impl.ConcreteDetachedSource.detach(ConcreteDetachedSource.java:23) > at org.quicktheories.generators.Retry.generate(CodePoints.java:51) > at > org.quicktheories.generators.Generate.lambda$intArrays$10(Generate.java:190) > at > org.quicktheories.generators.Generate$$Lambda$17/1847008471.generate(Unknown > Source) > at org.quicktheories.core.DescribingGenerator.generate(Gen.java:255) > at org.quicktheories.core.Gen.lambda$map$0(Gen.java:36) > at org.quicktheories.core.Gen$$Lambda$20/71399214.generate(Unknown > Source) > at org.quicktheories.core.Gen.lambda$map$0(Gen.java:36) > at org.quicktheories.core.Gen$$Lambda$20/71399214.generate(Unknown > Source) > at 
org.quicktheories.core.Gen.lambda$mix$10(Gen.java:184) > at org.quicktheories.core.Gen$$Lambda$45/802243390.generate(Unknown > Source) > at org.quicktheories.core.Gen.lambda$flatMap$5(Gen.java:93) > at org.quicktheories.core.Gen$$Lambda$48/363509958.generate(Unknown > Source) > at > org.quicktheories.dsl.TheoryBuilder4.lambda$prgnToTuple$12(TheoryBuilder4.java:188) > at > org.quicktheories.dsl.TheoryBuilder4$$Lambda$40/2003496028.generate(Unknown > Source) > at org.quicktheories.core.DescribingGenerator.generate(Gen.java:255) > at org.quicktheories.core.FilteredGenerator.generate(Gen.java:225) > at org.quicktheories.core.Gen.lambda$map$0(Gen.java:36) > at org.quicktheories.core.Gen$$Lambda$20/71399214.generate(Unknown > Source) > at org.quicktheories.impl.Core.generate(Core.java:150) > at org.quicktheories.impl.Core.shrink(Core.java:103) > at org.quicktheories.impl.Core.run(Core.java:39) > at org.quicktheories.impl.TheoryRunner.check(TheoryRunner.java:35) > at org.quicktheories.dsl.TheoryBuilder4.check(TheoryBuilder4.java:150) > at > org.quicktheories.dsl.TheoryBuilder4.checkAssert(TheoryBuilder4.java:162) > at > org.apache.cassandra.transport.frame.checksum.ChecksummingTransformerTest.corruptionCausesFailure(ChecksummingTransformerTest.java:87) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-15158) Wait for schema agreement rather then in flight schema requests when bootstrapping
[ https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17102543#comment-17102543 ] Stefan Miklosovic edited comment on CASSANDRA-15158 at 5/8/20, 12:49 PM: - It seems to me that one aspect of the PR was overlooked, so I will iterate on that one. The mechanism for not flooding nodes with schema pull messages is incorporated in the loop over callbacks. If you notice, there are sleeps of various lengths based on whether a request has already been sent. This sleep will actually "delay" the next schema pull from the other node because, during the sleep, some schema could come from the node we just sent a message to, so on the next iteration when another node is compared on schema equality, there may no longer be any need to pull from it because they are on par. Hence we are not blindly sending messages to all nodes. If there are some discrepancies, there is a global timeout after which the whole bootstrapping process is evaluated as erroneous and (in the current code) we throw a ConfigurationException. This behaviour might be relaxed, but I consider it more appropriate to just throw there. was (Author: stefan.miklosovic): It seems to me that one aspect of the PR was overlooked, so I will iterate on that one. The mechanism for not flooding nodes with schema pull messages is incorporated in a loop over callbacks. If you notice, there are sleeps of various lengths based on whether a request has already been sent. This sleep will actually "delay" the next schema pull from the other node because, during the sleep, a schema could come in, so on the next iteration when another node is compared on schema equality, there may no longer be any need to pull from it because they are on par. Hence we are not blindly sending messages to all nodes.
If there are some discrepancies, there is a global timeout after which the whole bootstrapping process is evaluated as erroneous and (in the current code) we throw a ConfigurationException. This behaviour might be relaxed, but I consider it more appropriate to just throw there. > Wait for schema agreement rather then in flight schema requests when > bootstrapping > -- > > Key: CASSANDRA-15158 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15158 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Gossip, Cluster/Schema >Reporter: Vincent White >Assignee: Ben Bromhead >Priority: Normal > > Currently when a node is bootstrapping we use a set of latches > (org.apache.cassandra.service.MigrationTask#inflightTasks) to keep track of > in-flight schema pull requests, and we don't proceed with > bootstrapping/stream until all the latches are released (or we timeout > waiting for each one). One issue with this is that if we have a large schema, > or the retrieval of the schema from the other nodes was unexpectedly slow > then we have no explicit check in place to ensure we have actually received a > schema before we proceed. > While it's possible to increase "migration_task_wait_in_seconds" to force the > node to wait on each latch longer, there are cases where this doesn't help > because the callbacks for the schema pull requests have expired off the > messaging service's callback map > (org.apache.cassandra.net.MessagingService#callbacks) after > request_timeout_in_ms (default 10 seconds) before the other nodes were able > to respond to the new node. > This patch checks for schema agreement between the bootstrapping node and the > rest of the live nodes before proceeding with bootstrapping. It also adds a > check to prevent the new node from flooding existing nodes with simultaneous > schema pull requests as can happen in large clusters.
> Removing the latch system should also prevent new nodes in large clusters > getting stuck for extended amounts of time as they wait > `migration_task_wait_in_seconds` on each of the latches left orphaned by the > timed out callbacks. > > ||3.11|| > |[PoC|https://github.com/apache/cassandra/compare/cassandra-3.11...vincewhite:check_for_schema]| > |[dtest|https://github.com/apache/cassandra-dtest/compare/master...vincewhite:wait_for_schema_agreement]| > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
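The waiting strategy described in the comment above — poll the live nodes' schema versions, sleep between iterations so an in-flight response can land, and give up after a global timeout — might be sketched like this. All names, the Supplier-based version source, and the use of IllegalStateException in place of ConfigurationException are assumptions for illustration:

```java
import java.util.Set;
import java.util.function.Supplier;

// Minimal sketch of waiting for schema agreement before bootstrap proceeds.
public class SchemaAgreementWaiter
{
    // liveSchemaVersions yields the distinct schema versions currently
    // reported by live nodes; agreement means at most one distinct version.
    public static void await(Supplier<Set<String>> liveSchemaVersions,
                             long timeoutMillis, long sleepMillis)
    {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (liveSchemaVersions.get().size() > 1)
        {
            if (System.currentTimeMillis() >= deadline)
                throw new IllegalStateException(
                    "schema agreement not reached within " + timeoutMillis + "ms");
            try
            {
                // a pending schema pull may be answered during this pause,
                // making further pulls on the next iteration unnecessary
                Thread.sleep(sleepMillis);
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt();
                return; // stop waiting if the bootstrap thread is interrupted
            }
        }
    }
}
```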
[jira] [Commented] (CASSANDRA-15158) Wait for schema agreement rather then in flight schema requests when bootstrapping
[ https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17102543#comment-17102543 ] Stefan Miklosovic commented on CASSANDRA-15158: --- It seems to me that one aspect of the PR was overlooked, so I will iterate on that one. The mechanism for not flooding nodes with schema pull messages is incorporated in a loop over callbacks. If you notice, there are sleeps of various lengths based on whether a request has already been sent. This sleep will actually "delay" the next schema pull from the other node because, during the sleep, a schema could come in, so on the next iteration when another node is compared on schema equality, there may no longer be any need to pull from it because they are on par. Hence we are not blindly sending messages to all nodes. If there are some discrepancies, there is a global timeout after which the whole bootstrapping process is evaluated as erroneous and (in the current code) we throw a ConfigurationException. This behaviour might be relaxed, but I consider it more appropriate to just throw there. > Wait for schema agreement rather then in flight schema requests when > bootstrapping > -- > > Key: CASSANDRA-15158 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15158 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Gossip, Cluster/Schema >Reporter: Vincent White >Assignee: Ben Bromhead >Priority: Normal > > Currently when a node is bootstrapping we use a set of latches > (org.apache.cassandra.service.MigrationTask#inflightTasks) to keep track of > in-flight schema pull requests, and we don't proceed with > bootstrapping/stream until all the latches are released (or we timeout > waiting for each one).
One issue with this is that if we have a large schema, > or retrieval of the schema from the other nodes is unexpectedly slow, > then we have no explicit check in place to ensure we have actually received a > schema before we proceed. > While it's possible to increase "migration_task_wait_in_seconds" to force the > node to wait on each latch longer, there are cases where this doesn't help > because the callbacks for the schema pull requests have expired off the > messaging service's callback map > (org.apache.cassandra.net.MessagingService#callbacks) after > request_timeout_in_ms (default 10 seconds) before the other nodes were able > to respond to the new node. > This patch checks for schema agreement between the bootstrapping node and the > rest of the live nodes before proceeding with bootstrapping. It also adds a > check to prevent the new node from flooding existing nodes with simultaneous > schema pull requests, as can happen in large clusters. > Removing the latch system should also prevent new nodes in large clusters > getting stuck for extended amounts of time as they wait > `migration_task_wait_in_seconds` on each of the latches left orphaned by the > timed-out callbacks. > > ||3.11|| > |[PoC|https://github.com/apache/cassandra/compare/cassandra-3.11...vincewhite:check_for_schema]| > |[dtest|https://github.com/apache/cassandra-dtest/compare/master...vincewhite:wait_for_schema_agreement]| > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
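The "check for schema agreement before bootstrapping" idea discussed above can be sketched as a small polling loop. This is an illustrative sketch only, not Cassandra's actual code: the class, method names, and the version-snapshot supplier are all hypothetical stand-ins for what gossip would report, and the sleep stands in for the patch's variable-length pauses between pull attempts.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of "wait for schema agreement": poll the schema versions
// reported for live nodes and only proceed once every node matches the local
// version, or give up after a global timeout (the patch then fails bootstrap).
public final class SchemaAgreementSketch
{
    /**
     * @param liveVersions  supplies a fresh snapshot of node -> schema version
     *                      (assumption: re-read from gossip on every call)
     * @param localVersion  this node's current schema version
     * @param timeoutMillis global timeout after which bootstrap is abandoned
     * @param pollMillis    pause between polls; a schema may arrive during the
     *                      pause, making further pulls unnecessary
     */
    public static boolean waitForAgreement(Supplier<Map<String, String>> liveVersions,
                                           String localVersion,
                                           long timeoutMillis,
                                           long pollMillis) throws InterruptedException
    {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline)
        {
            // on par with every live node? then there is nothing left to pull
            boolean agreed = liveVersions.get().values().stream().allMatch(localVersion::equals);
            if (agreed)
                return true;
            Thread.sleep(pollMillis);
        }
        return false; // caller would throw, e.g. a ConfigurationException
    }

    public static void main(String[] args) throws InterruptedException
    {
        // Simulated two-node cluster that converges shortly after startup.
        Map<String, String> cluster = new HashMap<>();
        cluster.put("node1", "v1");
        cluster.put("node2", "v2");
        new Thread(() -> {
            try { Thread.sleep(50); } catch (InterruptedException e) { return; }
            synchronized (cluster) { cluster.put("node2", "v1"); }
        }).start();
        Supplier<Map<String, String>> snapshot = () -> {
            synchronized (cluster) { return new HashMap<>(cluster); }
        };
        boolean agreed = waitForAgreement(snapshot, "v1", 2000, 10);
        System.out.println(agreed ? "schema agreement reached" : "bootstrap failed: no agreement");
    }
}
```

Note how this replaces per-request latches with a single convergence condition: it does not matter which pull produced the schema, only that the versions now match.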
[jira] [Comment Edited] (CASSANDRA-15158) Wait for schema agreement rather than in-flight schema requests when bootstrapping
[ https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17102374#comment-17102374 ] Stefan Miklosovic edited comment on CASSANDRA-15158 at 5/8/20, 8:50 AM: Hi [~bdeggleston], commenting on the design issues, I am not completely sure whether the issues you are talking about are related to this patch or already existing? We could indeed focus on the points you raised, but it seems to me that the current (committed) code is worse without this patch than with it, as I guess these problems are already there? Isn't the goal here to have all nodes on the same schema version? Isn't the very fact that there are multiple versions strange to begin with, so we should not even try to join a node if they mismatch, and hence there is nothing to deal with in the first place? {quote}It will only wait until it has _some_ schema to begin bootstrapping, not all {quote} This is most likely not true, unless I am missing something. The node to be bootstrapped will never advance unless all nodes have the same version. {quote} For instance, if a single node is reporting a schema version that no one else has, but the node is unreachable, what do we do? {quote} We should fail the whole bootstrap, and one should go and fix it. Besides, how can a node report its schema while being unreachable? {quote}Next, I like how this limits the number of messages sent to a given endpoint, but we should also limit the number of messages we send out for a given schema version. If we have a large cluster, and all nodes are reporting the same version, we don't need to ask every node for its schema. {quote} Got you, this might be tracked. When it comes to testing, I admit that adding the isRunningForcibly method feels like a hack, but I had a very hard time testing this. It was basically the only reasonable way possible at the time I was coding it; if you know of a better approach, please tell me, otherwise I am not sure what might be better here and we could stick with this for the time being? The whole testing methodology was based on these callbacks and checking their inner state, which results in methods that accept them so we can inspect that state. Without "injecting" them from outside, I would not be able to do that.
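The reviewer's suggestion quoted above — limit pulls per distinct schema version, not just per endpoint — can be sketched as follows. This is an illustrative sketch, not Cassandra code; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: when many nodes gossip the same schema version, request
// the schema from only one representative node per distinct version, instead
// of sending a pull message to every endpoint reporting it.
public final class PullTargetSelector
{
    /**
     * @param reported     node -> schema version, as gossip would expose it
     * @param localVersion this node's current schema version
     * @return one node per distinct foreign version, in encounter order
     */
    public static List<String> selectPullTargets(Map<String, String> reported, String localVersion)
    {
        Map<String, String> representativePerVersion = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : reported.entrySet())
        {
            String version = e.getValue();
            if (version.equals(localVersion))
                continue; // already on par with this node, nothing to pull
            // keep only the first node seen for each distinct version
            representativePerVersion.putIfAbsent(version, e.getKey());
        }
        return new ArrayList<>(representativePerVersion.values());
    }
}
```

With a thousand-node cluster all agreeing on one foreign version, this sends one pull message instead of a thousand, which is the flooding concern raised in the ticket.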
[jira] [Commented] (CASSANDRA-15158) Wait for schema agreement rather than in-flight schema requests when bootstrapping
[ https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17102374#comment-17102374 ] Stefan Miklosovic commented on CASSANDRA-15158: --- Hi [~bdeggleston], commenting on the design issues, I am not completely sure whether the issues you are talking about are related to this patch or already existing? We could indeed focus on the points you raised, but it seems to me that the current (committed) code is worse without this patch than with it, as I guess these problems are already there? Isn't the goal here to have all nodes on the same schema version? Isn't the very fact that there are multiple versions strange to begin with, so we should not even try to join a node if they mismatch, and hence there is nothing to deal with in the first place? {quote}It will only wait until it has _some_ schema to begin bootstrapping, not all {quote} This is most likely not true, unless I am missing something. The node to be bootstrapped will never advance unless all nodes have the same version. {quote} For instance, if a single node is reporting a schema version that no one else has, but the node is unreachable, what do we do? {quote} We should fail the whole bootstrap, and one should go and fix it. {quote}Next, I like how this limits the number of messages sent to a given endpoint, but we should also limit the number of messages we send out for a given schema version. If we have a large cluster, and all nodes are reporting the same version, we don't need to ask every node for its schema.{quote} I am sorry, I am not following what you say here, in particular the very last sentence. I think a schema pull (message) is _only_ ever sent in case the schema version reported by the Gossiper is different; only after that do we ever send a message. When it comes to testing, I admit that adding the isRunningForcibly method feels like a hack, but I had a very hard time testing this. It was basically the only reasonable way possible at the time I was coding it; if you know of a better approach, please tell me, otherwise I am not sure what might be better here and we could stick with this for the time being? The whole testing methodology was based on these callbacks and checking their inner state, which results in methods that accept them so we can inspect that state. Without "injecting" them from outside, I would not be able to do that. > Wait for schema agreement rather than in-flight schema requests when > bootstrapping > -- > > Key: CASSANDRA-15158 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15158 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Gossip, Cluster/Schema >Reporter: Vincent White >Assignee: Ben Bromhead >Priority: Normal > > Currently when a node is bootstrapping we use a set of latches > (org.apache.cassandra.service.MigrationTask#inflightTasks) to keep track of > in-flight schema pull requests, and we don't proceed with > bootstrapping/streaming until all the latches are released (or we time out > waiting for each one). One issue with this is that if we have a large schema, > or retrieval of the schema from the other nodes is unexpectedly slow, > then we have no explicit check in place to ensure we have actually received a > schema before we proceed. > While it's possible to increase "migration_task_wait_in_seconds" to force the > node to wait on each latch longer, there are cases where this doesn't help > because the callbacks for the schema pull requests have expired off the > messaging service's callback map > (org.apache.cassandra.net.MessagingService#callbacks) after > request_timeout_in_ms (default 10 seconds) before the other nodes were able > to respond to the new node. > This patch checks for schema agreement between the bootstrapping node and the > rest of the live nodes before proceeding with bootstrapping.
It also adds a > check to prevent the new node from flooding existing nodes with simultaneous > schema pull requests, as can happen in large clusters. > Removing the latch system should also prevent new nodes in large clusters > getting stuck for extended amounts of time as they wait > `migration_task_wait_in_seconds` on each of the latches left orphaned by the > timed-out callbacks. > > ||3.11|| > |[PoC|https://github.com/apache/cassandra/compare/cassandra-3.11...vincewhite:check_for_schema]| > |[dtest|https://github.com/apache/cassandra-dtest/compare/master...vincewhite:wait_for_schema_agreement]|
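The orphaned-latch failure mode in the ticket description can be demonstrated with a toy example: once a pull request's callback has expired off the callback map, nothing will ever count its latch down, so the bootstrapping node pays the full per-latch timeout for each one, sequentially. This is an illustration of the mechanism only, not Cassandra code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Toy illustration of the orphaned-latch problem: each latch models an
// in-flight schema pull whose response callback already expired, so no one
// will ever call countDown(), and every await() runs to its full timeout.
public final class OrphanedLatchDemo
{
    /** Returns total milliseconds spent waiting on latches nobody will release. */
    public static long awaitOrphanedLatches(int latchCount, long perLatchTimeoutMillis) throws InterruptedException
    {
        List<CountDownLatch> inflight = new ArrayList<>();
        for (int i = 0; i < latchCount; i++)
            inflight.add(new CountDownLatch(1)); // callback expired: never counted down

        long start = System.nanoTime();
        for (CountDownLatch latch : inflight)
            latch.await(perLatchTimeoutMillis, TimeUnit.MILLISECONDS); // always times out
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }
}
```

With a real `migration_task_wait_in_seconds` of, say, 30 seconds and dozens of orphaned latches, the sequential waits add up to many minutes, which is why replacing the latches with a single schema-agreement check avoids the stall.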
[jira] [Updated] (CASSANDRA-15791) dtest.consistency_test/TestAccuracy/test_simple_strategy_each_quorum_counters/
[ https://issues.apache.org/jira/browse/CASSANDRA-15791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Berenguer Blasi updated CASSANDRA-15791: Test and Documentation Plan: In [PR|https://github.com/apache/cassandra/pull/584] Status: Patch Available (was: In Progress) > dtest.consistency_test/TestAccuracy/test_simple_strategy_each_quorum_counters/ > -- > > Key: CASSANDRA-15791 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15791 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest >Reporter: Ekaterina Dimitrova >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-alpha > > > Flaky dtest, failure details below: > https://jenkins-cm4.apache.org/job/Cassandra-trunk-dtest/69/testReport/junit/dtest.consistency_test/TestAccuracy/test_simple_strategy_each_quorum_counters/