[jira] [Updated] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15861: - Test and Documentation Plan: [https://app.circleci.com/pipelines/github/jasonstack/cassandra/300/workflows/f41f6585-cd97-4791-abdc-a2935694948e] (was: [https://app.circleci.com/pipelines/github/jasonstack/cassandra/298/workflows/4e56c8b2-a998-4785-9daf-0ceee52a9a83]) > Mutating sstable component may race with entire-sstable-streaming(ZCS) > causing checksum validation failure > -- > > Key: CASSANDRA-15861 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15861 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair, Consistency/Streaming, > Local/Compaction >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta > > > Flaky dtest: [test_dead_sync_initiator - > repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] > {code:java|title=stacktrace} > Unexpected error found in node logs (see stdout for full details). Errors: > [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 > CassandraEntireSSTableStreamReader.java:145 - [Stream > 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream > for table = keyspace1.standard1 > org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226) > at > org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140) > at > org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36) > at > org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.IOException: Checksums do not match for > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > {code} > > In the above test, it executes "nodetool repair" on node1 and kills node2 > during repair. At the end, node3 reports checksum validation failure on > sstable transferred from node1. > {code:java|title=what happened} > 1. When repair started on node1, it performs anti-compaction which modifies > sstable's repairAt to 0 and pending repair id to session-id. > 2. Then node1 creates {{ComponentManifest}} which contains file lengths to be > transferred to node3. > 3. Before node1 actually sends the files to node3, node2 is killed and node1 > starts to broadcast repair-failure-message to all participants in > {{CoordinatorSession#fail}} > 4. Node1 receives its own repair-failure-message and fails its local repair > sessions at {{LocalSessions#failSession}} which triggers async background > compaction. > 5. Node1's background compaction will mutate sstable's repairAt to 0 and > pending repair id to null via > {{PendingRepairManager#getNextRepairFinishedTask}}, as there is no more > in-progress repair. > 6. Node1 actually sends the sstable to node3 where the sstable's STATS > component size is different from the original size recorded in the manifest. > 7. At the end, node3 reports checksum validation failure when it tries to > mutate sstable level and "isTransient" attribute in > {{CassandraEntireSSTableStreamReader#read}}. > {code} > Currently, entire-sstable-streaming requires sstable components to be > immutable, because \{{ComponentManifest}} > with component sizes are
[jira] [Commented] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188976#comment-17188976 ] ZhaoYang commented on CASSANDRA-15861: -- Pushed a unit test to verify "compressionMetadata" is used to calculate the transferred size for compressed sstable. > Mutating sstable component may race with entire-sstable-streaming(ZCS) > causing checksum validation failure > -- > > Key: CASSANDRA-15861 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15861 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair, Consistency/Streaming, > Local/Compaction >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta > > > Flaky dtest: [test_dead_sync_initiator - > repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] > {code:java|title=stacktrace} > Unexpected error found in node logs (see stdout for full details). Errors: > [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 > CassandraEntireSSTableStreamReader.java:145 - [Stream > 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream > for table = keyspace1.standard1 > org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226) > at > org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140) > at > org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36) > at > org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.IOException: Checksums do not match for > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > {code} > > In the above test, it executes "nodetool repair" on node1 and kills node2 > during repair. At the end, node3 reports checksum validation failure on > sstable transferred from node1. > {code:java|title=what happened} > 1. When repair started on node1, it performs anti-compaction which modifies > sstable's repairAt to 0 and pending repair id to session-id. > 2. Then node1 creates {{ComponentManifest}} which contains file lengths to be > transferred to node3. > 3. Before node1 actually sends the files to node3, node2 is killed and node1 > starts to broadcast repair-failure-message to all participants in > {{CoordinatorSession#fail}} > 4. Node1 receives its own repair-failure-message and fails its local repair > sessions at {{LocalSessions#failSession}} which triggers async background > compaction. > 5. Node1's background compaction will mutate sstable's repairAt to 0 and > pending repair id to null via > {{PendingRepairManager#getNextRepairFinishedTask}}, as there is no more > in-progress repair. > 6. Node1 actually sends the sstable to node3 where the sstable's STATS > component size is different from the original size recorded in the manifest. > 7. At the end, node3 reports checksum validation failure when it tries to > mutate sstable level and "isTransient" attribute in > {{CassandraEntireSSTableStreamReader#read}}. > {code} > Currently, entire-sstable-streaming requires sstable components to be > immutable, because \{{ComponentManifest}} > with component sizes are sent before sending actual files. This isn't a > problem in legacy streaming as STATS file length didn't matter. >
[jira] [Updated] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15861: - Test and Documentation Plan: [https://app.circleci.com/pipelines/github/jasonstack/cassandra/298/workflows/4e56c8b2-a998-4785-9daf-0ceee52a9a83] (was: https://circleci.com/workflow-run/610e8169-e60c-420b-a556-4120967db6cb) > Mutating sstable component may race with entire-sstable-streaming(ZCS) > causing checksum validation failure > -- > > Key: CASSANDRA-15861 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15861 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair, Consistency/Streaming, > Local/Compaction >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta > > > Flaky dtest: [test_dead_sync_initiator - > repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] > {code:java|title=stacktrace} > Unexpected error found in node logs (see stdout for full details). Errors: > [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 > CassandraEntireSSTableStreamReader.java:145 - [Stream > 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream > for table = keyspace1.standard1 > org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226) > at > org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140) > at > org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36) > at > org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.IOException: Checksums do not match for > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > {code} > > In the above test, it executes "nodetool repair" on node1 and kills node2 > during repair. At the end, node3 reports checksum validation failure on > sstable transferred from node1. > {code:java|title=what happened} > 1. When repair started on node1, it performs anti-compaction which modifies > sstable's repairAt to 0 and pending repair id to session-id. > 2. Then node1 creates {{ComponentManifest}} which contains file lengths to be > transferred to node3. > 3. Before node1 actually sends the files to node3, node2 is killed and node1 > starts to broadcast repair-failure-message to all participants in > {{CoordinatorSession#fail}} > 4. Node1 receives its own repair-failure-message and fails its local repair > sessions at {{LocalSessions#failSession}} which triggers async background > compaction. > 5. Node1's background compaction will mutate sstable's repairAt to 0 and > pending repair id to null via > {{PendingRepairManager#getNextRepairFinishedTask}}, as there is no more > in-progress repair. > 6. Node1 actually sends the sstable to node3 where the sstable's STATS > component size is different from the original size recorded in the manifest. > 7. At the end, node3 reports checksum validation failure when it tries to > mutate sstable level and "isTransient" attribute in > {{CassandraEntireSSTableStreamReader#read}}. > {code} > Currently, entire-sstable-streaming requires sstable components to be > immutable, because \{{ComponentManifest}} > with component sizes are sent before sending actual files. This isn't
[jira] [Updated] (CASSANDRA-16095) Nodetool tablestats WriteLatency error
[ https://issues.apache.org/jira/browse/CASSANDRA-16095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Altan Özlü updated CASSANDRA-16095: --- Labels: (was: 4) > Nodetool tablestats WriteLatency error > -- > > Key: CASSANDRA-16095 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16095 > Project: Cassandra > Issue Type: Bug >Reporter: Altan Özlü >Priority: Normal > > when i use > {code:java} > bin/nodetool tablestats {code} > i get this error > {code:java} > error: > org.apache.cassandra.metrics:type=Table,keyspace=,scope=feed,name=WriteLatency > -- StackTrace -- > javax.management.InstanceNotFoundException: > org.apache.cassandra.metrics:type=Table,keyspace=,scope=feed,name=WriteLatency > {code} > if i restart the node and check it works but if i write something then i get > the error again > * Cassandra 4.0-beta1 > * Ubuntu 20 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16095) Nodetool tablestats WriteLatency error
[ https://issues.apache.org/jira/browse/CASSANDRA-16095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Altan Özlü updated CASSANDRA-16095: --- Labels: 4 (was: ) > Nodetool tablestats WriteLatency error > -- > > Key: CASSANDRA-16095 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16095 > Project: Cassandra > Issue Type: Bug >Reporter: Altan Özlü >Priority: Normal > Labels: 4 > > when i use > {code:java} > bin/nodetool tablestats {code} > i get this error > {code:java} > error: > org.apache.cassandra.metrics:type=Table,keyspace=,scope=feed,name=WriteLatency > -- StackTrace -- > javax.management.InstanceNotFoundException: > org.apache.cassandra.metrics:type=Table,keyspace=,scope=feed,name=WriteLatency > {code} > if i restart the node and check it works but if i write something then i get > the error again > * Cassandra 4.0-beta1 > * Ubuntu 20 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-16095) Nodetool tablestats WriteLatency error
Altan Özlü created CASSANDRA-16095: -- Summary: Nodetool tablestats WriteLatency error Key: CASSANDRA-16095 URL: https://issues.apache.org/jira/browse/CASSANDRA-16095 Project: Cassandra Issue Type: Bug Reporter: Altan Özlü when i use {code:java} bin/nodetool tablestats {code} i get this error {code:java} error: org.apache.cassandra.metrics:type=Table,keyspace=,scope=feed,name=WriteLatency -- StackTrace -- javax.management.InstanceNotFoundException: org.apache.cassandra.metrics:type=Table,keyspace=,scope=feed,name=WriteLatency {code} if i restart the node and check it works but if i write something then i get the error again * Cassandra 4.0-beta1 * Ubuntu 20 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15909) Make Table/Keyspace Metric Names Consistent With Each Other
[ https://issues.apache.org/jira/browse/CASSANDRA-15909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinesh Joshi updated CASSANDRA-15909: - Reviewers: Caleb Rackliffe, David Capwell, Dinesh Joshi (was: Caleb Rackliffe, David Capwell) > Make Table/Keyspace Metric Names Consistent With Each Other > --- > > Key: CASSANDRA-15909 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15909 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Stephen Mallette >Assignee: Caleb Rackliffe >Priority: Normal > Fix For: 4.0-beta > > Time Spent: 2h 20m > Remaining Estimate: 0h > > As part of CASSANDRA-15821 it became apparent that certain metric names found > in keyspace and tables had different names but were in fact the same metric - > they are as follows: > * Table.SyncTime == Keyspace.RepairSyncTime > * Table.RepairedDataTrackingOverreadRows == Keyspace.RepairedOverreadRows > * Table.RepairedDataTrackingOverreadTime == Keyspace.RepairedOverreadTime > * Table.AllMemtablesHeapSize == Keyspace.AllMemtablesOnHeapDataSize > * Table.AllMemtablesOffHeapSize == Keyspace.AllMemtablesOffHeapDataSize > * Table.MemtableOnHeapSize == Keyspace.MemtableOnHeapDataSize > * Table.MemtableOffHeapSize == Keyspace.MemtableOffHeapDataSize > Also, client metrics are the only metrics to start with a lower case letter. > Change those to upper case to match all the other metrics. > Unifying this naming would help make metrics more consistent as part of > CASSANDRA-15582 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15909) Make Table/Keyspace Metric Names Consistent With Each Other
[ https://issues.apache.org/jira/browse/CASSANDRA-15909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188837#comment-17188837 ] Dinesh Joshi commented on CASSANDRA-15909: -- I'll take a look this week. > Make Table/Keyspace Metric Names Consistent With Each Other > --- > > Key: CASSANDRA-15909 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15909 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Stephen Mallette >Assignee: Caleb Rackliffe >Priority: Normal > Fix For: 4.0-beta > > Time Spent: 2h 20m > Remaining Estimate: 0h > > As part of CASSANDRA-15821 it became apparent that certain metric names found > in keyspace and tables had different names but were in fact the same metric - > they are as follows: > * Table.SyncTime == Keyspace.RepairSyncTime > * Table.RepairedDataTrackingOverreadRows == Keyspace.RepairedOverreadRows > * Table.RepairedDataTrackingOverreadTime == Keyspace.RepairedOverreadTime > * Table.AllMemtablesHeapSize == Keyspace.AllMemtablesOnHeapDataSize > * Table.AllMemtablesOffHeapSize == Keyspace.AllMemtablesOffHeapDataSize > * Table.MemtableOnHeapSize == Keyspace.MemtableOnHeapDataSize > * Table.MemtableOffHeapSize == Keyspace.MemtableOffHeapDataSize > Also, client metrics are the only metrics to start with a lower case letter. > Change those to upper case to match all the other metrics. > Unifying this naming would help make metrics more consistent as part of > CASSANDRA-15582 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-16057) Should update in-jvm dtest to expose stdout and stderr for nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-16057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai reassigned CASSANDRA-16057: - Assignee: Yifan Cai > Should update in-jvm dtest to expose stdout and stderr for nodetool > --- > > Key: CASSANDRA-16057 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16057 > Project: Cassandra > Issue Type: Improvement > Components: Test/dtest/java >Reporter: David Capwell >Assignee: Yifan Cai >Priority: Normal > > Many nodetool commands output to stdout or stderr so running nodetool using > in-jvm dtest should expose that to tests. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-15909) Make Table/Keyspace Metric Names Consistent With Each Other
[ https://issues.apache.org/jira/browse/CASSANDRA-15909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188004#comment-17188004 ] Caleb Rackliffe edited comment on CASSANDRA-15909 at 9/1/20, 8:25 PM: -- Created CASSANDRA-16090 to track eventual removal of the (new and existing) clutter in {{TableMetrics}}, and CASSANDRA-16094 to address the preexisting flaky test in {{TestIncRepair}}. was (Author: maedhroz): Created CASSANDRA-16090 to track eventual removal of the (new and existing) clutter in {{TableMetrics}}. > Make Table/Keyspace Metric Names Consistent With Each Other > --- > > Key: CASSANDRA-15909 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15909 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Stephen Mallette >Assignee: Caleb Rackliffe >Priority: Normal > Fix For: 4.0-beta > > Time Spent: 2h 20m > Remaining Estimate: 0h > > As part of CASSANDRA-15821 it became apparent that certain metric names found > in keyspace and tables had different names but were in fact the same metric - > they are as follows: > * Table.SyncTime == Keyspace.RepairSyncTime > * Table.RepairedDataTrackingOverreadRows == Keyspace.RepairedOverreadRows > * Table.RepairedDataTrackingOverreadTime == Keyspace.RepairedOverreadTime > * Table.AllMemtablesHeapSize == Keyspace.AllMemtablesOnHeapDataSize > * Table.AllMemtablesOffHeapSize == Keyspace.AllMemtablesOffHeapDataSize > * Table.MemtableOnHeapSize == Keyspace.MemtableOnHeapDataSize > * Table.MemtableOffHeapSize == Keyspace.MemtableOffHeapDataSize > Also, client metrics are the only metrics to start with a lower case letter. > Change those to upper case to match all the other metrics. > Unifying this naming would help make metrics more consistent as part of > CASSANDRA-15582 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16094) Flaky Test: TestIncRepair.test_repaired_tracking_with_mismatching_replicas
[ https://issues.apache.org/jira/browse/CASSANDRA-16094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Rackliffe updated CASSANDRA-16094: Labels: dtest incremental_repair repair (was: ) > Flaky Test: TestIncRepair.test_repaired_tracking_with_mismatching_replicas > -- > > Key: CASSANDRA-16094 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16094 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Caleb Rackliffe >Priority: Normal > Labels: dtest, incremental_repair, repair > Fix For: 4.0-beta > > > We have two recent failures for this test on trunk: > 1.) > https://app.circleci.com/pipelines/github/maedhroz/cassandra/102/workflows/37ed8dab-9da4-4730-a883-20b7a99d88b4/jobs/518/tests > (CASSANDRA-15909) > 2.) > https://app.circleci.com/pipelines/github/jolynch/cassandra/6/workflows/41e080e0-d7ff-4256-899e-b4010c6ef5ab/jobs/716/tests > (CASSANDRA-15379) > The test expects there to be mismatches and then read repair executed on a > following SELECT, but either those mismatches aren’t there, read repair isn’t > happening, or both. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16094) Flaky Test: TestIncRepair.test_repaired_tracking_with_mismatching_replicas
[ https://issues.apache.org/jira/browse/CASSANDRA-16094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Rackliffe updated CASSANDRA-16094: Bug Category: Parent values: Correctness(12982)Level 1 values: Test Failure(12990) Complexity: Normal Discovered By: Adhoc Test Fix Version/s: 4.0-beta Severity: Normal Status: Open (was: Triage Needed) > Flaky Test: TestIncRepair.test_repaired_tracking_with_mismatching_replicas > -- > > Key: CASSANDRA-16094 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16094 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Caleb Rackliffe >Priority: Normal > Fix For: 4.0-beta > > > We have two recent failures for this test on trunk: > 1.) > https://app.circleci.com/pipelines/github/maedhroz/cassandra/102/workflows/37ed8dab-9da4-4730-a883-20b7a99d88b4/jobs/518/tests > (CASSANDRA-15909) > 2.) > https://app.circleci.com/pipelines/github/jolynch/cassandra/6/workflows/41e080e0-d7ff-4256-899e-b4010c6ef5ab/jobs/716/tests > (CASSANDRA-15379) > The test expects there to be mismatches and then read repair executed on a > following SELECT, but either those mismatches aren’t there, read repair isn’t > happening, or both. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-16094) Flaky Test: TestIncRepair.test_repaired_tracking_with_mismatching_replicas
Caleb Rackliffe created CASSANDRA-16094: --- Summary: Flaky Test: TestIncRepair.test_repaired_tracking_with_mismatching_replicas Key: CASSANDRA-16094 URL: https://issues.apache.org/jira/browse/CASSANDRA-16094 Project: Cassandra Issue Type: Bug Components: Test/dtest/python Reporter: Caleb Rackliffe We have two recent failures for this test on trunk: 1.) https://app.circleci.com/pipelines/github/maedhroz/cassandra/102/workflows/37ed8dab-9da4-4730-a883-20b7a99d88b4/jobs/518/tests (CASSANDRA-15909) 2.) https://app.circleci.com/pipelines/github/jolynch/cassandra/6/workflows/41e080e0-d7ff-4256-899e-b4010c6ef5ab/jobs/716/tests (CASSANDRA-15379) The test expects there to be mismatches and then read repair executed on a following SELECT, but either those mismatches aren’t there, read repair isn’t happening, or both. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15909) Make Table/Keyspace Metric Names Consistent With Each Other
[ https://issues.apache.org/jira/browse/CASSANDRA-15909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188787#comment-17188787 ] Caleb Rackliffe commented on CASSANDRA-15909: - UPDATE: Looking for a second committer +1 here... > Make Table/Keyspace Metric Names Consistent With Each Other > --- > > Key: CASSANDRA-15909 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15909 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Stephen Mallette >Assignee: Caleb Rackliffe >Priority: Normal > Fix For: 4.0-beta > > Time Spent: 2h 20m > Remaining Estimate: 0h > > As part of CASSANDRA-15821 it became apparent that certain metric names found > in keyspace and tables had different names but were in fact the same metric - > they are as follows: > * Table.SyncTime == Keyspace.RepairSyncTime > * Table.RepairedDataTrackingOverreadRows == Keyspace.RepairedOverreadRows > * Table.RepairedDataTrackingOverreadTime == Keyspace.RepairedOverreadTime > * Table.AllMemtablesHeapSize == Keyspace.AllMemtablesOnHeapDataSize > * Table.AllMemtablesOffHeapSize == Keyspace.AllMemtablesOffHeapDataSize > * Table.MemtableOnHeapSize == Keyspace.MemtableOnHeapDataSize > * Table.MemtableOffHeapSize == Keyspace.MemtableOffHeapDataSize > Also, client metrics are the only metrics to start with a lower case letter. > Change those to upper case to match all the other metrics. > Unifying this naming would help make metrics more consistent as part of > CASSANDRA-15582 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16093) Cassandra website is building/including the wrong versioned nodetool docs
[ https://issues.apache.org/jira/browse/CASSANDRA-16093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188738#comment-17188738 ] Michael Semb Wever commented on CASSANDRA-16093: `ant realclean` will remove the final in-tree generated docs at {{build/html}} but it doesn't remove the nodetool generated docs at {{doc/source/tools/nodetool}} The docker-compose.yaml file in cassandra-website, rebuilds the versions using the same cassandra directory just switching branches/tags. > Cassandra website is building/including the wrong versioned nodetool docs > - > > Key: CASSANDRA-16093 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16093 > Project: Cassandra > Issue Type: Bug > Components: Documentation/Website >Reporter: Michael Semb Wever >Assignee: Michael Semb Wever >Priority: Normal > > For example > https://cassandra.apache.org/doc/3.11/tools/nodetool/enablefullquerylog.html > shouldn't be under the 3.11 documentation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16093) Cassandra website is building/including the wrong versioned nodetool docs
[ https://issues.apache.org/jira/browse/CASSANDRA-16093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Semb Wever updated CASSANDRA-16093: --- Summary: Cassandra website is building/including the wrong versioned nodetool docs (was: Cassandra website is building the wrong versioned docs) > Cassandra website is building/including the wrong versioned nodetool docs > - > > Key: CASSANDRA-16093 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16093 > Project: Cassandra > Issue Type: Bug > Components: Documentation/Website >Reporter: Michael Semb Wever >Assignee: Michael Semb Wever >Priority: Normal > > For example > https://cassandra.apache.org/doc/3.11/tools/nodetool/enablefullquerylog.html > shouldn't be under the 3.11 documentation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15701) Does Cassandra 3.11.3/3.11.5 is affected by CVE-2019-10712 or not ?
[ https://issues.apache.org/jira/browse/CASSANDRA-15701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188714#comment-17188714 ] Brandon Williams commented on CASSANDRA-15701: -- It's hard to say since no details of that vulnerability seem to be published. CASSANDRA-15867 removed this in 3.11.7 anyway, however. > Does Cassandra 3.11.3/3.11.5 is affected by CVE-2019-10712 or not ? > - > > Key: CASSANDRA-15701 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15701 > Project: Cassandra > Issue Type: Bug > Components: Dependencies >Reporter: wht >Priority: Normal > > Because cassandra 3.11.3/3.11.5 rely on jackson-mapper-asl-1.9.13.jar which > has been reported a vulnerability CVE-2019-10172, > [https://nvd.nist.gov/vuln/detail/CVE-2019-10172], so I want to know if it > has an impact to cassandra. Thanks! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16093) Cassandra website is building the wrong versioned docs
[ https://issues.apache.org/jira/browse/CASSANDRA-16093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Semb Wever updated CASSANDRA-16093: --- Summary: Cassandra website is building the wrong versioned docs (was: Cassandra docs are building the wrong versioned docs) > Cassandra website is building the wrong versioned docs > -- > > Key: CASSANDRA-16093 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16093 > Project: Cassandra > Issue Type: Bug > Components: Documentation/Website >Reporter: Michael Semb Wever >Assignee: Michael Semb Wever >Priority: Normal > > For example > https://cassandra.apache.org/doc/3.11/tools/nodetool/enablefullquerylog.html > shouldn't be under the 3.11 documentation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16091) rpc server gets wrongly initialized with rpc_enabled:false
[ https://issues.apache.org/jira/browse/CASSANDRA-16091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-16091: - Fix Version/s: 3.11.9 Since Version: 3.11.8 > rpc server gets wrongly initialized with rpc_enabled:false > -- > > Key: CASSANDRA-16091 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16091 > Project: Cassandra > Issue Type: Bug > Components: Local/Config >Reporter: Tom van der Woerdt >Priority: Normal > Fix For: 3.11.9 > > > After upgrading to Cassandra 3.11.8, Cassandra no longer starts. An exception > is thrown: > {code:java} > java.lang.RuntimeException: Client SSL is not supported for non-blocking > sockets (hsha). Please remove client ssl from the configuration. > at > org.apache.cassandra.thrift.THsHaDisruptorServer$Factory.buildTServer(THsHaDisruptorServer.java:74) > at > org.apache.cassandra.thrift.TServerCustomFactory.buildTServer(TServerCustomFactory.java:55) > at > org.apache.cassandra.thrift.ThriftServer$ThriftServerThread.(ThriftServer.java:128) > at org.apache.cassandra.thrift.ThriftServer.start(ThriftServer.java:55) > at > org.apache.cassandra.service.CassandraDaemon.startNativeTransport(CassandraDaemon.java:713) > at > org.apache.cassandra.service.CassandraDaemon.start(CassandraDaemon.java:538) > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:643) > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:768) > {code} > No configuration changed between 3.11.7 and 3.11.8. rpc_enabled is false in > both versions. > I created this Jira issue because clearly something changed between 3.11.7 > and 3.11.8. Maybe intentional, maybe not. Changing `rpc_server_type` (which > is not clearly documented to be about Thrift only) from `hsha` to `sync` does > resolve the issue, as expected, but this does seem like a regression, hence > the Jira issue. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16091) rpc server gets wrongly initialized with rpc_enabled:false
[ https://issues.apache.org/jira/browse/CASSANDRA-16091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-16091: - Bug Category: Parent values: Availability(12983)Level 1 values: Unavailable(12994) Complexity: Normal Discovered By: User Report Severity: Normal Status: Open (was: Triage Needed) > rpc server gets wrongly initialized with rpc_enabled:false > -- > > Key: CASSANDRA-16091 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16091 > Project: Cassandra > Issue Type: Bug > Components: Local/Config >Reporter: Tom van der Woerdt >Priority: Normal > > After upgrading to Cassandra 3.11.8, Cassandra no longer starts. An exception > is thrown: > {code:java} > java.lang.RuntimeException: Client SSL is not supported for non-blocking > sockets (hsha). Please remove client ssl from the configuration. > at > org.apache.cassandra.thrift.THsHaDisruptorServer$Factory.buildTServer(THsHaDisruptorServer.java:74) > at > org.apache.cassandra.thrift.TServerCustomFactory.buildTServer(TServerCustomFactory.java:55) > at > org.apache.cassandra.thrift.ThriftServer$ThriftServerThread.(ThriftServer.java:128) > at org.apache.cassandra.thrift.ThriftServer.start(ThriftServer.java:55) > at > org.apache.cassandra.service.CassandraDaemon.startNativeTransport(CassandraDaemon.java:713) > at > org.apache.cassandra.service.CassandraDaemon.start(CassandraDaemon.java:538) > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:643) > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:768) > {code} > No configuration changed between 3.11.7 and 3.11.8. rpc_enabled is false in > both versions. > I created this Jira issue because clearly something changed between 3.11.7 > and 3.11.8. Maybe intentional, maybe not. Changing `rpc_server_type` (which > is not clearly documented to be about Thrift only) from `hsha` to `sync` does > resolve the issue, as expected, but this does seem like a regression, hence > the Jira issue. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16072) Reduce thread contention in CommitLogSegment and HintsBuffer by rewriting CAS loops to atomic adds
[ https://issues.apache.org/jira/browse/CASSANDRA-16072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188608#comment-17188608 ] Benjamin Lerer commented on CASSANDRA-16072: Thanks a lot. The patches look go to me. Feel free to commit if CI is green. > Reduce thread contention in CommitLogSegment and HintsBuffer by rewriting CAS > loops to atomic adds > -- > > Key: CASSANDRA-16072 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16072 > Project: Cassandra > Issue Type: Improvement > Components: Consistency/Hints, Local/Commit Log >Reporter: Michael Semb Wever >Assignee: Michael Semb Wever >Priority: Normal > Fix For: 3.11.x, 4.0-beta > > > Follow up to CASSANDRA-15922 > Both CommitLogSegment and HintsBuffer use AtomicIntegers for the current > offset when allocating. Like in CASSANDRA\-15922 the loops on > {{.compareAndSet(..)}} can be replaced with atomic adds using the {{. > getAndAdd(..)}} method. > In highly contended environments the CAS failures can be high, starving > writes in a running Cassandra node. On the same cluster CASSANDRA\-15922 was > found, after CASSANDRA\-15922's fix was deployed, there was still problems > around commit log flushing and hints. No flamegraph was collected that > demonstrated the thread contention as clearly as was found in > CASSANDRA\-15922, but the performance fix proposed here hopefully is obvious > enough. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15158) Wait for schema agreement rather than in flight schema requests when bootstrapping
[ https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Yeschenko updated CASSANDRA-15158: -- Status: Changes Suggested (was: Review In Progress) > Wait for schema agreement rather than in flight schema requests when > bootstrapping > -- > > Key: CASSANDRA-15158 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15158 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Gossip, Cluster/Schema >Reporter: Vincent White >Assignee: Blake Eggleston >Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > Currently when a node is bootstrapping we use a set of latches > (org.apache.cassandra.service.MigrationTask#inflightTasks) to keep track of > in-flight schema pull requests, and we don't proceed with > bootstrapping/stream until all the latches are released (or we timeout > waiting for each one). One issue with this is that if we have a large schema, > or the retrieval of the schema from the other nodes was unexpectedly slow > then we have no explicit check in place to ensure we have actually received a > schema before we proceed. > While it's possible to increase "migration_task_wait_in_seconds" to force the > node to wait on each latche longer, there are cases where this doesn't help > because the callbacks for the schema pull requests have expired off the > messaging service's callback map > (org.apache.cassandra.net.MessagingService#callbacks) after > request_timeout_in_ms (default 10 seconds) before the other nodes were able > to respond to the new node. > This patch checks for schema agreement between the bootstrapping node and the > rest of the live nodes before proceeding with bootstrapping. It also adds a > check to prevent the new node from flooding existing nodes with simultaneous > schema pull requests as can happen in large clusters. > Removing the latch system should also prevent new nodes in large clusters > getting stuck for extended amounts of time as they wait > `migration_task_wait_in_seconds` on each of the latches left orphaned by the > timed out callbacks. > > ||3.11|| > |[PoC|https://github.com/apache/cassandra/compare/cassandra-3.11...vincewhite:check_for_schema]| > |[dtest|https://github.com/apache/cassandra-dtest/compare/master...vincewhite:wait_for_schema_agreement]| > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-15158) Wait for schema agreement rather than in flight schema requests when bootstrapping
[ https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188581#comment-17188581 ] Aleksey Yeschenko edited comment on CASSANDRA-15158 at 9/1/20, 3:49 PM: Pushed some minor tweaks [here|https://github.com/iamaleksey/cassandra/commits/15158-review]. Made some bits more idiomatic, and changed the way in-flight requests are being kept track of. In general, this does the job and solves the problem in the description. It doesn't, however, fully deal with storms in large clusters caused by a sequence of updates in quick succession, but, it's not intended to, either. EDIT: the amount of synchronisation here bothers me a tiny bit, as all of it will likely have to be eventually gotten rid of, when and if TPC happens, but I can live with it. was (Author: iamaleksey): Pushed some minor tweaks [here|https://github.com/iamaleksey/cassandra/commits/15158-review]. Made some bits more idiomatic, and changed the way in-flight requests are being kept track of. In general, this does the job and solves the problem in the description. It doesn't, however, fully deal with storms in large clusters caused by a sequence of updates in quick succession, but, it's not intended to, either. > Wait for schema agreement rather than in flight schema requests when > bootstrapping > -- > > Key: CASSANDRA-15158 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15158 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Gossip, Cluster/Schema >Reporter: Vincent White >Assignee: Blake Eggleston >Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > Currently when a node is bootstrapping we use a set of latches > (org.apache.cassandra.service.MigrationTask#inflightTasks) to keep track of > in-flight schema pull requests, and we don't proceed with > bootstrapping/stream until all the latches are released (or we timeout > waiting for each one). One issue with this is that if we have a large schema, > or the retrieval of the schema from the other nodes was unexpectedly slow > then we have no explicit check in place to ensure we have actually received a > schema before we proceed. > While it's possible to increase "migration_task_wait_in_seconds" to force the > node to wait on each latche longer, there are cases where this doesn't help > because the callbacks for the schema pull requests have expired off the > messaging service's callback map > (org.apache.cassandra.net.MessagingService#callbacks) after > request_timeout_in_ms (default 10 seconds) before the other nodes were able > to respond to the new node. > This patch checks for schema agreement between the bootstrapping node and the > rest of the live nodes before proceeding with bootstrapping. It also adds a > check to prevent the new node from flooding existing nodes with simultaneous > schema pull requests as can happen in large clusters. > Removing the latch system should also prevent new nodes in large clusters > getting stuck for extended amounts of time as they wait > `migration_task_wait_in_seconds` on each of the latches left orphaned by the > timed out callbacks. > > ||3.11|| > |[PoC|https://github.com/apache/cassandra/compare/cassandra-3.11...vincewhite:check_for_schema]| > |[dtest|https://github.com/apache/cassandra-dtest/compare/master...vincewhite:wait_for_schema_agreement]| > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15158) Wait for schema agreement rather than in flight schema requests when bootstrapping
[ https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188581#comment-17188581 ] Aleksey Yeschenko commented on CASSANDRA-15158: --- Pushed some minor tweaks [here|https://github.com/iamaleksey/cassandra/commits/15158-review]. Made some bits more idiomatic, and changed the way in-flight requests are being kept track of. In general, this does the job and solves the problem in the description. It doesn't, however, fully deal with storms in large clusters caused by a sequence of updates in quick succession, but, it's not intended to, either. > Wait for schema agreement rather than in flight schema requests when > bootstrapping > -- > > Key: CASSANDRA-15158 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15158 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Gossip, Cluster/Schema >Reporter: Vincent White >Assignee: Blake Eggleston >Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > Currently when a node is bootstrapping we use a set of latches > (org.apache.cassandra.service.MigrationTask#inflightTasks) to keep track of > in-flight schema pull requests, and we don't proceed with > bootstrapping/stream until all the latches are released (or we timeout > waiting for each one). One issue with this is that if we have a large schema, > or the retrieval of the schema from the other nodes was unexpectedly slow > then we have no explicit check in place to ensure we have actually received a > schema before we proceed. > While it's possible to increase "migration_task_wait_in_seconds" to force the > node to wait on each latche longer, there are cases where this doesn't help > because the callbacks for the schema pull requests have expired off the > messaging service's callback map > (org.apache.cassandra.net.MessagingService#callbacks) after > request_timeout_in_ms (default 10 seconds) before the other nodes were able > to respond to the new node. > This patch checks for schema agreement between the bootstrapping node and the > rest of the live nodes before proceeding with bootstrapping. It also adds a > check to prevent the new node from flooding existing nodes with simultaneous > schema pull requests as can happen in large clusters. > Removing the latch system should also prevent new nodes in large clusters > getting stuck for extended amounts of time as they wait > `migration_task_wait_in_seconds` on each of the latches left orphaned by the > timed out callbacks. > > ||3.11|| > |[PoC|https://github.com/apache/cassandra/compare/cassandra-3.11...vincewhite:check_for_schema]| > |[dtest|https://github.com/apache/cassandra-dtest/compare/master...vincewhite:wait_for_schema_agreement]| > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188539#comment-17188539 ] Benjamin Lerer commented on CASSANDRA-15861: This part of the code was obviously broken. Having some tests to ensure that it does not happen again makes total sense. > Mutating sstable component may race with entire-sstable-streaming(ZCS) > causing checksum validation failure > -- > > Key: CASSANDRA-15861 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15861 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair, Consistency/Streaming, > Local/Compaction >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta > > > Flaky dtest: [test_dead_sync_initiator - > repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] > {code:java|title=stacktrace} > Unexpected error found in node logs (see stdout for full details). Errors: > [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 > CassandraEntireSSTableStreamReader.java:145 - [Stream > 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream > for table = keyspace1.standard1 > org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226) > at > org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140) > at > org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36) > at > org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.IOException: Checksums do not match for > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > {code} > > In the above test, it executes "nodetool repair" on node1 and kills node2 > during repair. At the end, node3 reports checksum validation failure on > sstable transferred from node1. > {code:java|title=what happened} > 1. When repair started on node1, it performs anti-compaction which modifies > sstable's repairAt to 0 and pending repair id to session-id. > 2. Then node1 creates {{ComponentManifest}} which contains file lengths to be > transferred to node3. > 3. Before node1 actually sends the files to node3, node2 is killed and node1 > starts to broadcast repair-failure-message to all participants in > {{CoordinatorSession#fail}} > 4. Node1 receives its own repair-failure-message and fails its local repair > sessions at {{LocalSessions#failSession}} which triggers async background > compaction. > 5. Node1's background compaction will mutate sstable's repairAt to 0 and > pending repair id to null via > {{PendingRepairManager#getNextRepairFinishedTask}}, as there is no more > in-progress repair. > 6. Node1 actually sends the sstable to node3 where the sstable's STATS > component size is different from the original size recorded in the manifest. > 7. At the end, node3 reports checksum validation failure when it tries to > mutate sstable level and "isTransient" attribute in > {{CassandraEntireSSTableStreamReader#read}}. > {code} > Currently, entire-sstable-streaming requires sstable components to be > immutable, because \{{ComponentManifest}} > with component sizes are sent before sending actual files. This isn't a > problem in legacy streaming as STATS file length
[jira] [Commented] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188529#comment-17188529 ] Stefan Miklosovic commented on CASSANDRA-15861: --- Thanks [~blerer], but anyway, a test as I mentioned would be awesome to see when the code / logic which is touching this has changed little bit and we would be sure it is just computed right. The fact that we found that issue as part of 15406 is telling ... It would also simplify a lot my patch because right now I am literally parsing the output of netstats and I track that these numbers do make sense and the total is eventually equal to that sum. I am not so strong in internals so I am not able to write that "low level" test on my own. But ultimately it is not about me having things easier, but imho we should really test that and this patch seems to be like a good candidate to do it. > Mutating sstable component may race with entire-sstable-streaming(ZCS) > causing checksum validation failure > -- > > Key: CASSANDRA-15861 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15861 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair, Consistency/Streaming, > Local/Compaction >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta > > > Flaky dtest: [test_dead_sync_initiator - > repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] > {code:java|title=stacktrace} > Unexpected error found in node logs (see stdout for full details). Errors: > [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 > CassandraEntireSSTableStreamReader.java:145 - [Stream > 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream > for table = keyspace1.standard1 > org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226) > at > org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140) > at > org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36) > at > org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.IOException: Checksums do not match for > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > {code} > > In the above test, it executes "nodetool repair" on node1 and kills node2 > during repair. At the end, node3 reports checksum validation failure on > sstable transferred from node1. > {code:java|title=what happened} > 1. When repair started on node1, it performs anti-compaction which modifies > sstable's repairAt to 0 and pending repair id to session-id. > 2. Then node1 creates {{ComponentManifest}} which contains file lengths to be > transferred to node3. > 3. Before node1 actually sends the files to node3, node2 is killed and node1 > starts to broadcast repair-failure-message to all participants in > {{CoordinatorSession#fail}} > 4. Node1 receives its own repair-failure-message and fails its local repair > sessions at {{LocalSessions#failSession}} which triggers async background > compaction. > 5. Node1's background compaction will mutate sstable's repairAt to 0 and > pending repair id to null via > {{PendingRepairManager#getNextRepairFinishedTask}}, as there is no more > in-progress repair.
[jira] [Updated] (CASSANDRA-16093) Cassandra docs are building the wrong versioned docs
[ https://issues.apache.org/jira/browse/CASSANDRA-16093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Semb Wever updated CASSANDRA-16093: --- Bug Category: Parent values: Correctness(12982)Level 1 values: API / Semantic Definition(13162) Complexity: Normal Discovered By: User Report Severity: Low Status: Open (was: Triage Needed) > Cassandra docs are building the wrong versioned docs > > > Key: CASSANDRA-16093 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16093 > Project: Cassandra > Issue Type: Bug > Components: Documentation/Website >Reporter: Michael Semb Wever >Assignee: Michael Semb Wever >Priority: Normal > > For example > https://cassandra.apache.org/doc/3.11/tools/nodetool/enablefullquerylog.html > shouldn't be under the 3.11 documentation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-16093) Cassandra docs are building the wrong versioned docs
[ https://issues.apache.org/jira/browse/CASSANDRA-16093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Semb Wever reassigned CASSANDRA-16093: -- Assignee: Michael Semb Wever > Cassandra docs are building the wrong versioned docs > > > Key: CASSANDRA-16093 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16093 > Project: Cassandra > Issue Type: Bug > Components: Documentation/Website >Reporter: Michael Semb Wever >Assignee: Michael Semb Wever >Priority: Normal > > For example > https://cassandra.apache.org/doc/3.11/tools/nodetool/enablefullquerylog.html > shouldn't be under the 3.11 documentation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-16093) Cassandra docs are building the wrong versioned docs
Michael Semb Wever created CASSANDRA-16093: -- Summary: Cassandra docs are building the wrong versioned docs Key: CASSANDRA-16093 URL: https://issues.apache.org/jira/browse/CASSANDRA-16093 Project: Cassandra Issue Type: Bug Components: Documentation/Website Reporter: Michael Semb Wever For example https://cassandra.apache.org/doc/3.11/tools/nodetool/enablefullquerylog.html shouldn't be under the 3.11 documentation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-16092) Add Index Group Interface for Storage Attached Index
ZhaoYang created CASSANDRA-16092: Summary: Add Index Group Interface for Storage Attached Index Key: CASSANDRA-16092 URL: https://issues.apache.org/jira/browse/CASSANDRA-16092 Project: Cassandra Issue Type: New Feature Components: Feature/SASI Reporter: ZhaoYang [Index group|https://github.com/datastax/cassandra/blob/storage_attached_index/src/java/org/apache/cassandra/index/Index.java#L634] interface allows: * indexes on the same table to receive centralized lifecycle events called secondary index groups. Sharing of data between multiple column indexes on the same table allows SAI disk usage to realise significant space savings over other index implementations. * index-group to analyze user query and provide a query plan that leverages all available indexes within the group. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188453#comment-17188453 ] Benjamin Lerer commented on CASSANDRA-15861: [~stefan.miklosovic] CASSANDRA-15406 as been going on for so long that I perfectly understand your frustration and the fact that you are worried to go to square one. The good news is that I am reviewer on both patches and I will do my best to ensure that there are no problem between them. [~jasonstack] found an interesting issue with the missing part in the size calculation. I missed that part by not digging deeper enough in the code history. That would have caused us to reintroduce CASSANDRA-10680. I need to have a new look at that part for CASSANDRA-15406 > Mutating sstable component may race with entire-sstable-streaming(ZCS) > causing checksum validation failure > -- > > Key: CASSANDRA-15861 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15861 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair, Consistency/Streaming, > Local/Compaction >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta > > > Flaky dtest: [test_dead_sync_initiator - > repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] > {code:java|title=stacktrace} > Unexpected error found in node logs (see stdout for full details). Errors: > [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 > CassandraEntireSSTableStreamReader.java:145 - [Stream > 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream > for table = keyspace1.standard1 > org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226) > at > org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140) > at > org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36) > at > org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.IOException: Checksums do not match for > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > {code} > > In the above test, it executes "nodetool repair" on node1 and kills node2 > during repair. At the end, node3 reports checksum validation failure on > sstable transferred from node1. > {code:java|title=what happened} > 1. When repair started on node1, it performs anti-compaction which modifies > sstable's repairAt to 0 and pending repair id to session-id. > 2. Then node1 creates {{ComponentManifest}} which contains file lengths to be > transferred to node3. > 3. Before node1 actually sends the files to node3, node2 is killed and node1 > starts to broadcast repair-failure-message to all participants in > {{CoordinatorSession#fail}} > 4. Node1 receives its own repair-failure-message and fails its local repair > sessions at {{LocalSessions#failSession}} which triggers async background > compaction. > 5. Node1's background compaction will mutate sstable's repairAt to 0 and > pending repair id to null via > {{PendingRepairManager#getNextRepairFinishedTask}}, as there is no more > in-progress repair. > 6. Node1 actually sends the sstable to node3 where the sstable's STATS > component size is different from the original size recorded
[jira] [Commented] (CASSANDRA-15958) org.apache.cassandra.net.ConnectionTest testMessagePurging
[ https://issues.apache.org/jira/browse/CASSANDRA-15958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188398#comment-17188398 ] Aleksey Yeschenko commented on CASSANDRA-15958: --- Ping [~sbtourist] / CASSANDRA-15700 > org.apache.cassandra.net.ConnectionTest testMessagePurging > -- > > Key: CASSANDRA-15958 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15958 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: David Capwell >Assignee: Adam Holmberg >Priority: Normal > Fix For: 4.0-beta > > > Build: > https://ci-cassandra.apache.org/job/Cassandra-trunk-test/196/testReport/junit/org.apache.cassandra.net/ConnectionTest/testMessagePurging/ > Build: > https://ci-cassandra.apache.org/job/Cassandra-trunk-test/194/testReport/junit/org.apache.cassandra.net/ConnectionTest/testMessagePurging/ > java.util.concurrent.TimeoutException > at org.apache.cassandra.net.AsyncPromise.get(AsyncPromise.java:258) > at org.apache.cassandra.net.FutureDelegate.get(FutureDelegate.java:143) > at > org.apache.cassandra.net.ConnectionTest.doTestManual(ConnectionTest.java:268) > at > org.apache.cassandra.net.ConnectionTest.testManual(ConnectionTest.java:236) > at > org.apache.cassandra.net.ConnectionTest.testMessagePurging(ConnectionTest.java:679) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16072) Reduce thread contention in CommitLogSegment and HintsBuffer by rewriting CAS loops to atomic adds
[ https://issues.apache.org/jira/browse/CASSANDRA-16072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188367#comment-17188367 ] Michael Semb Wever commented on CASSANDRA-16072: Thanks for the feedback [~blerer]. bq. If the slab size is around 1GB, the maximum hint size will be around 500MB. I see now that the slab size can in fact be 2GB, and max mutation size 1GB. This makes the problem worse (even if very edge-case). {{HintsBuffer.position}} has been changed to {{AtomicLong}} bq. Regarding CommitLogSegment, it will be good to have a comment explaining the negative value logic. Done. Two comments added, explaining when the overflow is harmless, and when it isn't and hence the cast to long. Patches - [3.11|https://github.com/apache/cassandra/compare/cassandra-3.11...thelastpickle:mck/cassandra-3.11_cas_improvements] with CI [run|https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/301/pipeline] - [trunk|https://github.com/apache/cassandra/compare/trunk...thelastpickle:mck/trunk_cas_improvements] with CI [run|https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/303/pipeline] > Reduce thread contention in CommitLogSegment and HintsBuffer by rewriting CAS > loops to atomic adds > -- > > Key: CASSANDRA-16072 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16072 > Project: Cassandra > Issue Type: Improvement > Components: Consistency/Hints, Local/Commit Log >Reporter: Michael Semb Wever >Assignee: Michael Semb Wever >Priority: Normal > Fix For: 3.11.x, 4.0-beta > > > Follow up to CASSANDRA-15922 > Both CommitLogSegment and HintsBuffer use AtomicIntegers for the current > offset when allocating. Like in CASSANDRA\-15922 the loops on > {{.compareAndSet(..)}} can be replaced with atomic adds using the {{. > getAndAdd(..)}} method. > In highly contended environments the CAS failures can be high, starving > writes in a running Cassandra node. On the same cluster CASSANDRA\-15922 was > found, after CASSANDRA\-15922's fix was deployed, there was still problems > around commit log flushing and hints. No flamegraph was collected that > demonstrated the thread contention as clearly as was found in > CASSANDRA\-15922, but the performance fix proposed here hopefully is obvious > enough. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[cassandra-website] branch master updated (5f47e16 -> 85519e8)
This is an automated email from the ASF dual-hosted git repository. mck pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/cassandra-website.git. from 5f47e16 Add 3.11.8, 3.11.9 4.0-beta2, 4.0-beta3 add 85519e8 add generated docs for 3.11.9 and 4.0-beta2 No new revisions were added by this update. Summary of changes: src/doc/3.11 |2 +- src/doc/3.11.9/.buildinfo |4 + .../_images/eclipse_debug0.png | Bin .../_images/eclipse_debug1.png | Bin .../_images/eclipse_debug2.png | Bin .../_images/eclipse_debug3.png | Bin .../_images/eclipse_debug4.png | Bin .../_images/eclipse_debug5.png | Bin .../_images/eclipse_debug6.png | Bin .../_sources/architecture/dynamo.rst.txt |0 .../_sources/architecture/guarantees.rst.txt |0 .../_sources/architecture/index.rst.txt|0 .../_sources/architecture/overview.rst.txt |0 .../_sources/architecture/storage_engine.rst.txt |0 src/doc/{3.11.8 => 3.11.9}/_sources/bugs.rst.txt |0 .../configuration/cassandra_config_file.rst.txt| 1920 ++ .../_sources/configuration/index.rst.txt |0 .../{3.11.8 => 3.11.9}/_sources/contactus.rst.txt |0 .../_sources/cql/appendices.rst.txt|0 .../_sources/cql/changes.rst.txt |0 .../{3.11.8 => 3.11.9}/_sources/cql/ddl.rst.txt|0 .../_sources/cql/definitions.rst.txt |0 .../{4.0-beta2 => 3.11.9}/_sources/cql/dml.rst.txt |0 .../_sources/cql/functions.rst.txt |0 .../{3.11.8 => 3.11.9}/_sources/cql/index.rst.txt |0 .../_sources/cql/indexes.rst.txt |0 .../_sources/cql/json.rst.txt |0 .../{3.11.8 => 3.11.9}/_sources/cql/mvs.rst.txt|0 .../_sources/cql/security.rst.txt |0 .../_sources/cql/triggers.rst.txt |0 .../_sources/cql/types.rst.txt |0 .../_sources/data_modeling/index.rst.txt |0 .../_sources/development/code_style.rst.txt|0 .../_sources/development/how_to_commit.rst.txt |0 .../_sources/development/how_to_review.rst.txt |0 .../_sources/development/ide.rst.txt |0 .../_sources/development/index.rst.txt |0 .../_sources/development/patches.rst.txt |0 .../_sources/development/testing.rst.txt |0 .../{3.11.8 => 3.11.9}/_sources/faq/index.rst.txt |0 .../_sources/getting_started/configuring.rst.txt |0 .../_sources/getting_started/drivers.rst.txt |0 .../_sources/getting_started/index.rst.txt |0 .../_sources/getting_started/installing.rst.txt|0 .../_sources/getting_started/querying.rst.txt |0 src/doc/{3.11.8 => 3.11.9}/_sources/index.rst.txt |0 .../_sources/operating/backups.rst.txt |0 .../_sources/operating/bloom_filters.rst.txt |0 .../_sources/operating/bulk_loading.rst.txt|0 .../_sources/operating/cdc.rst.txt |0 .../_sources/operating/compaction.rst.txt |0 .../_sources/operating/compression.rst.txt |0 .../_sources/operating/hardware.rst.txt|0 .../_sources/operating/hints.rst.txt |0 .../_sources/operating/index.rst.txt |0 src/doc/3.11.9/_sources/operating/metrics.rst.txt | 710 +++ .../_sources/operating/read_repair.rst.txt |0 .../_sources/operating/repair.rst.txt |0 .../_sources/operating/security.rst.txt|0 .../_sources/operating/snitch.rst.txt |0 .../_sources/operating/topo_changes.rst.txt|0 .../_sources/tools/cqlsh.rst.txt |0 .../_sources/tools/index.rst.txt |0 .../_sources/tools/nodetool.rst.txt|0 .../_sources/tools/nodetool/assassinate.rst.txt|0 .../_sources/tools/nodetool/bootstrap.rst.txt |0 .../_sources/tools/nodetool/cleanup.rst.txt|0 .../_sources/tools/nodetool/clearsnapshot.rst.txt |0 .../_sources/tools/nodetool/clientstats.rst.txt|0 .../_sources/tools/nodetool/compact.rst.txt|0 .../tools/nodetool/compactionhistory.rst.txt |0 .../tools/nodetool/compactionstats.rst.txt |0 .../_sources/tools/nodetool/decommission.rst.txt |0 .../tools/nodetool/describecluster.rst.txt |0 .../_sources/tools/nodetool/describering.rst.txt |0 .../tools/nodetool/disableauditlog.rst.txt |0 .../tools/nodetool/disableautocompaction.rst.txt |0 .../_sources/tools/nodetool/disablebackup.rst.txt |
[jira] [Updated] (CASSANDRA-16091) rpc server gets wrongly initialized with rpc_enabled:false
[ https://issues.apache.org/jira/browse/CASSANDRA-16091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom van der Woerdt updated CASSANDRA-16091: --- Description: After upgrading to Cassandra 3.11.8, Cassandra no longer starts. An exception is thrown: {code:java} java.lang.RuntimeException: Client SSL is not supported for non-blocking sockets (hsha). Please remove client ssl from the configuration. at org.apache.cassandra.thrift.THsHaDisruptorServer$Factory.buildTServer(THsHaDisruptorServer.java:74) at org.apache.cassandra.thrift.TServerCustomFactory.buildTServer(TServerCustomFactory.java:55) at org.apache.cassandra.thrift.ThriftServer$ThriftServerThread.(ThriftServer.java:128) at org.apache.cassandra.thrift.ThriftServer.start(ThriftServer.java:55) at org.apache.cassandra.service.CassandraDaemon.startNativeTransport(CassandraDaemon.java:713) at org.apache.cassandra.service.CassandraDaemon.start(CassandraDaemon.java:538) at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:643) at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:768) {code} No configuration changed between 3.11.7 and 3.11.8. rpc_enabled is false in both versions. I created this Jira issue because clearly something changed between 3.11.7 and 3.11.8. Maybe intentional, maybe not. Changing `rpc_server_type` (which is not clearly documented to be about Thrift only) from `hsha` to `sync` does resolve the issue, as expected, but this does seem like a regression, hence the Jira issue. was: After upgrading to Cassandra 3.11.8, Cassandra no longer starts. An exception is thrown: {code:java} java.lang.RuntimeException: Client SSL is not supported for non-blocking sockets (hsha). Please remove client ssl from the configuration. at org.apache.cassandra.thrift.THsHaDisruptorServer$Factory.buildTServer(THsHaDisruptorServer.java:74) at org.apache.cassandra.thrift.TServerCustomFactory.buildTServer(TServerCustomFactory.java:55) at org.apache.cassandra.thrift.ThriftServer$ThriftServerThread.(ThriftServer.java:128) at org.apache.cassandra.thrift.ThriftServer.start(ThriftServer.java:55) at org.apache.cassandra.service.CassandraDaemon.startNativeTransport(CassandraDaemon.java:713) at org.apache.cassandra.service.CassandraDaemon.start(CassandraDaemon.java:538) at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:643) at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:768) {code} No configuration changed between 3.11.7 and 3.11.8. rpc_enabled is false in both versions. > rpc server gets wrongly initialized with rpc_enabled:false > -- > > Key: CASSANDRA-16091 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16091 > Project: Cassandra > Issue Type: Bug > Components: Local/Config >Reporter: Tom van der Woerdt >Priority: Normal > > After upgrading to Cassandra 3.11.8, Cassandra no longer starts. An exception > is thrown: > {code:java} > java.lang.RuntimeException: Client SSL is not supported for non-blocking > sockets (hsha). Please remove client ssl from the configuration. > at > org.apache.cassandra.thrift.THsHaDisruptorServer$Factory.buildTServer(THsHaDisruptorServer.java:74) > at > org.apache.cassandra.thrift.TServerCustomFactory.buildTServer(TServerCustomFactory.java:55) > at > org.apache.cassandra.thrift.ThriftServer$ThriftServerThread.(ThriftServer.java:128) > at org.apache.cassandra.thrift.ThriftServer.start(ThriftServer.java:55) > at > org.apache.cassandra.service.CassandraDaemon.startNativeTransport(CassandraDaemon.java:713) > at > org.apache.cassandra.service.CassandraDaemon.start(CassandraDaemon.java:538) > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:643) > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:768) > {code} > No configuration changed between 3.11.7 and 3.11.8. rpc_enabled is false in > both versions. > I created this Jira issue because clearly something changed between 3.11.7 > and 3.11.8. Maybe intentional, maybe not. Changing `rpc_server_type` (which > is not clearly documented to be about Thrift only) from `hsha` to `sync` does > resolve the issue, as expected, but this does seem like a regression, hence > the Jira issue. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-16091) rpc server gets wrongly initialized with rpc_enabled:false
Tom van der Woerdt created CASSANDRA-16091: -- Summary: rpc server gets wrongly initialized with rpc_enabled:false Key: CASSANDRA-16091 URL: https://issues.apache.org/jira/browse/CASSANDRA-16091 Project: Cassandra Issue Type: Bug Components: Local/Config Reporter: Tom van der Woerdt After upgrading to Cassandra 3.11.8, Cassandra no longer starts. An exception is thrown: {code:java} java.lang.RuntimeException: Client SSL is not supported for non-blocking sockets (hsha). Please remove client ssl from the configuration. at org.apache.cassandra.thrift.THsHaDisruptorServer$Factory.buildTServer(THsHaDisruptorServer.java:74) at org.apache.cassandra.thrift.TServerCustomFactory.buildTServer(TServerCustomFactory.java:55) at org.apache.cassandra.thrift.ThriftServer$ThriftServerThread.(ThriftServer.java:128) at org.apache.cassandra.thrift.ThriftServer.start(ThriftServer.java:55) at org.apache.cassandra.service.CassandraDaemon.startNativeTransport(CassandraDaemon.java:713) at org.apache.cassandra.service.CassandraDaemon.start(CassandraDaemon.java:538) at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:643) at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:768) {code} No configuration changed between 3.11.7 and 3.11.8. rpc_enabled is false in both versions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[cassandra-builds] branch master updated: ninja-fix: in finish_release.sh fix quote character in echo'd instructions at end
This is an automated email from the ASF dual-hosted git repository. mck pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/cassandra-builds.git The following commit(s) were added to refs/heads/master by this push: new 4b6caff ninja-fix: in finish_release.sh fix quote character in echo'd instructions at end 4b6caff is described below commit 4b6caffd63b635eaf330ba0232095f7863e946c9 Author: Mick Semb Wever AuthorDate: Tue Sep 1 12:35:02 2020 +0200 ninja-fix: in finish_release.sh fix quote character in echo'd instructions at end --- cassandra-release/finish_release.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/cassandra-release/finish_release.sh b/cassandra-release/finish_release.sh index ce4cf5c..45aea38 100755 --- a/cassandra-release/finish_release.sh +++ b/cassandra-release/finish_release.sh @@ -262,4 +262,4 @@ echo ' 9) tweet from @cassandra' echo ' 10) release version in JIRA' echo ' 11) remove old version (eg: `svn rm https://dist.apache.org/repos/dist/release/cassandra/`)' echo ' 12) increment build.xml base.version for the next release' -echo ' 13) follow instructions in email you will receive from the \"Apache Reporter Service\" to update the project\'s list of releases in reporter.apache.org' +echo ' 13) follow instructions in email you will receive from the \"Apache Reporter Service\" to update the project`s list of releases in reporter.apache.org' - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15991) 15583 - Add UX tests to intree LHF tooling
[ https://issues.apache.org/jira/browse/CASSANDRA-15991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188324#comment-17188324 ] Berenguer Blasi commented on CASSANDRA-15991: - Thanks a lot, really appreciated. I'll comment on the PR. > 15583 - Add UX tests to intree LHF tooling > -- > > Key: CASSANDRA-15991 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15991 > Project: Cassandra > Issue Type: Improvement > Components: Test/unit >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-beta > > > As per CASSANDRA-15583 many in tree tools lack proper UX tooling: mandatory > params are indeed mandatory, 'help' produces an actual help, return codes etc > This ticket is an attempt to add it to those tools that classify as LHF. > Other tools such as nodetool, with many sub-commands, deserve a separate > ticket of their own -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16072) Reduce thread contention in CommitLogSegment and HintsBuffer by rewriting CAS loops to atomic adds
[ https://issues.apache.org/jira/browse/CASSANDRA-16072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188321#comment-17188321 ] Benjamin Lerer commented on CASSANDRA-16072: If we do some rough calculation on the limits of things for {{HintBuffer}}: If the slab size is around {{1GB}}, the maximum hint size will be around {{500MB}}. Once we exceed the slab capacity the {{position}} will be set to {{-2GB}}. It will only take 5 hints at a size close from the maximum hint size to be back on the positive side. It seems to me that we are on a risky side. What do you think? Regarding {{CommitLogSegment}}, it will be good to have a comment explaining the negative value logic. People might not get it straight away as it is an obvious way to handle overflow. > Reduce thread contention in CommitLogSegment and HintsBuffer by rewriting CAS > loops to atomic adds > -- > > Key: CASSANDRA-16072 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16072 > Project: Cassandra > Issue Type: Improvement > Components: Consistency/Hints, Local/Commit Log >Reporter: Michael Semb Wever >Assignee: Michael Semb Wever >Priority: Normal > Fix For: 3.11.x, 4.0-beta > > > Follow up to CASSANDRA-15922 > Both CommitLogSegment and HintsBuffer use AtomicIntegers for the current > offset when allocating. Like in CASSANDRA\-15922 the loops on > {{.compareAndSet(..)}} can be replaced with atomic adds using the {{. > getAndAdd(..)}} method. > In highly contended environments the CAS failures can be high, starving > writes in a running Cassandra node. On the same cluster CASSANDRA\-15922 was > found, after CASSANDRA\-15922's fix was deployed, there was still problems > around commit log flushing and hints. No flamegraph was collected that > demonstrated the thread contention as clearly as was found in > CASSANDRA\-15922, but the performance fix proposed here hopefully is obvious > enough. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15701) Does Cassandra 3.11.3/3.11.5 is affected by CVE-2019-10712 or not ?
[ https://issues.apache.org/jira/browse/CASSANDRA-15701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188281#comment-17188281 ] Tigran Mardanyan commented on CASSANDRA-15701: -- Sorry guys, but my security scanners warn about this as well. It seems Apache Cassandra is not affected but need confirmation from you experts. If this is not correct place to talk about this issue, please direct me to the correct platform where I can ask for impact analyze. Any updated on this much appreciated. > Does Cassandra 3.11.3/3.11.5 is affected by CVE-2019-10712 or not ? > - > > Key: CASSANDRA-15701 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15701 > Project: Cassandra > Issue Type: Bug > Components: Dependencies >Reporter: wht >Priority: Normal > > Because cassandra 3.11.3/3.11.5 rely on jackson-mapper-asl-1.9.13.jar which > has been reported a vulnerability CVE-2019-10172, > [https://nvd.nist.gov/vuln/detail/CVE-2019-10172], so I want to know if it > has an impact to cassandra. Thanks! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188181#comment-17188181 ] Stefan Miklosovic edited comment on CASSANDRA-15861 at 9/1/20, 6:48 AM: I understand that this is the most probably just more important patch than my "percentage tracking" where I am just accidentally fixing some size-related bug and thats life ... But I would really like to see the test where we are checking this properly because netstats can again report some weird numbers and we are back in square one. The ideal test would look like - generate some sstables compressed / uncompressed and try to stream it in their entirety / without and check that the total sizes to be streamed are equal to sum of sizes of all individual tables. For all 4 combinations. I have checked your PR and I dont see this kind of test and I am worried that it will be not be matching again. was (Author: stefan.miklosovic): I understand that this is the most probably just more important patch then my "percentage tracking" where I am just accidentally fixing some size-related bug and thats life ... But I would really like to see the test where we are checking this properly because netstats can again report some weird numbers and we are back in square one. The ideal test would look like - generate some sstables compressed / uncompressed and try to stream it in their entirety / without and check that the total sizes to be streamed are equal to sum of sizes of all individual tables. For all 4 combinations. I have checked your PR and I dont see this kind of test and I am worried that it will be not be matching again. > Mutating sstable component may race with entire-sstable-streaming(ZCS) > causing checksum validation failure > -- > > Key: CASSANDRA-15861 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15861 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair, Consistency/Streaming, > Local/Compaction >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta > > > Flaky dtest: [test_dead_sync_initiator - > repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] > {code:java|title=stacktrace} > Unexpected error found in node logs (see stdout for full details). Errors: > [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 > CassandraEntireSSTableStreamReader.java:145 - [Stream > 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream > for table = keyspace1.standard1 > org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226) > at > org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140) > at > org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36) > at > org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.IOException: Checksums do not match for > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > {code} > > In the above test, it executes "nodetool repair" on node1 and kills node2 > during repair. At the end, node3 reports checksum validation failure on > sstable transferred from node1. > {code:java|title=what
[jira] [Commented] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188181#comment-17188181 ] Stefan Miklosovic commented on CASSANDRA-15861: --- I understand that this is the most probably just more important patch that my "percentage tracking" where I am just accidentally fixing some size-related bug and thats life ... But I would really like to see the test where we are checking this properly because netstats can again report some weird numbers and we are back in square one. The ideal test would look like - generate some sstables compressed / uncompressed and try to stream it in their entirety / without and check that the total sizes to be streamed are equal to sum of sizes of all individual tables. For all 4 combinations. I have checked your PR and I dont see this kind of test and I am worried that it will be not be matching again. > Mutating sstable component may race with entire-sstable-streaming(ZCS) > causing checksum validation failure > -- > > Key: CASSANDRA-15861 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15861 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair, Consistency/Streaming, > Local/Compaction >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta > > > Flaky dtest: [test_dead_sync_initiator - > repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] > {code:java|title=stacktrace} > Unexpected error found in node logs (see stdout for full details). Errors: > [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 > CassandraEntireSSTableStreamReader.java:145 - [Stream > 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream > for table = keyspace1.standard1 > org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226) > at > org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140) > at > org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36) > at > org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.IOException: Checksums do not match for > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > {code} > > In the above test, it executes "nodetool repair" on node1 and kills node2 > during repair. At the end, node3 reports checksum validation failure on > sstable transferred from node1. > {code:java|title=what happened} > 1. When repair started on node1, it performs anti-compaction which modifies > sstable's repairAt to 0 and pending repair id to session-id. > 2. Then node1 creates {{ComponentManifest}} which contains file lengths to be > transferred to node3. > 3. Before node1 actually sends the files to node3, node2 is killed and node1 > starts to broadcast repair-failure-message to all participants in > {{CoordinatorSession#fail}} > 4. Node1 receives its own repair-failure-message and fails its local repair > sessions at {{LocalSessions#failSession}} which triggers async background > compaction. > 5. Node1's background compaction will mutate sstable's repairAt to 0 and > pending repair id to null via > {{PendingRepairManager#getNextRepairFinishedTask}}, as there is no more >
[jira] [Comment Edited] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188181#comment-17188181 ] Stefan Miklosovic edited comment on CASSANDRA-15861 at 9/1/20, 6:48 AM: I understand that this is the most probably just more important patch then my "percentage tracking" where I am just accidentally fixing some size-related bug and thats life ... But I would really like to see the test where we are checking this properly because netstats can again report some weird numbers and we are back in square one. The ideal test would look like - generate some sstables compressed / uncompressed and try to stream it in their entirety / without and check that the total sizes to be streamed are equal to sum of sizes of all individual tables. For all 4 combinations. I have checked your PR and I dont see this kind of test and I am worried that it will be not be matching again. was (Author: stefan.miklosovic): I understand that this is the most probably just more important patch that my "percentage tracking" where I am just accidentally fixing some size-related bug and thats life ... But I would really like to see the test where we are checking this properly because netstats can again report some weird numbers and we are back in square one. The ideal test would look like - generate some sstables compressed / uncompressed and try to stream it in their entirety / without and check that the total sizes to be streamed are equal to sum of sizes of all individual tables. For all 4 combinations. I have checked your PR and I dont see this kind of test and I am worried that it will be not be matching again. > Mutating sstable component may race with entire-sstable-streaming(ZCS) > causing checksum validation failure > -- > > Key: CASSANDRA-15861 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15861 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair, Consistency/Streaming, > Local/Compaction >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta > > > Flaky dtest: [test_dead_sync_initiator - > repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] > {code:java|title=stacktrace} > Unexpected error found in node logs (see stdout for full details). Errors: > [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 > CassandraEntireSSTableStreamReader.java:145 - [Stream > 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream > for table = keyspace1.standard1 > org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226) > at > org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140) > at > org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36) > at > org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.IOException: Checksums do not match for > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > {code} > > In the above test, it executes "nodetool repair" on node1 and kills node2 > during repair. At the end, node3 reports checksum validation failure on > sstable transferred from node1. > {code:java|title=what
[jira] [Comment Edited] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188178#comment-17188178 ] ZhaoYang edited comment on CASSANDRA-15861 at 9/1/20, 6:46 AM: --- bq. Could you elaborate on what needs to be changed specifically in mine code so it will be fully ok again? hmm.. some moved codes in {{CassandraOutgoingFiles}}. What are you worried about? bq. Are you already sure that your changes are computing sizes in both compressed and uncompressed paths right? this patch is for zero-copy-streaming to avoid partial written files, not about how size is calculated. The change in {{CassandraStreamHeder}} around compressed size is to restore original behavior (reduce GC) before storege-engine refactoring. was (Author: jasonstack): bq. Could you elaborate on what needs to be changed specifically in mine code so it will be fully ok again? hmm.. some moved codes in {{CassandraOutgoingFiles}}. What are you worried about? bq. Are you already sure that your changes are computing sizes in both compressed and uncompressed paths right? this patch is for zero-copy-streaming to avoid partial written files, not about how size is calculated. The change in {{CassandraStreamHeder}} around compressed size is to restore original behavior before storege-engine refactoring. > Mutating sstable component may race with entire-sstable-streaming(ZCS) > causing checksum validation failure > -- > > Key: CASSANDRA-15861 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15861 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair, Consistency/Streaming, > Local/Compaction >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta > > > Flaky dtest: [test_dead_sync_initiator - > repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] > {code:java|title=stacktrace} > Unexpected error found in node logs (see stdout for full details). Errors: > [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 > CassandraEntireSSTableStreamReader.java:145 - [Stream > 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream > for table = keyspace1.standard1 > org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226) > at > org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140) > at > org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36) > at > org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.IOException: Checksums do not match for > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > {code} > > In the above test, it executes "nodetool repair" on node1 and kills node2 > during repair. At the end, node3 reports checksum validation failure on > sstable transferred from node1. > {code:java|title=what happened} > 1. When repair started on node1, it performs anti-compaction which modifies > sstable's repairAt to 0 and pending repair id to session-id. > 2. Then node1 creates {{ComponentManifest}} which contains file lengths to be > transferred to node3. > 3. Before node1 actually sends the files to node3, node2 is killed and node1 > starts to
[jira] [Comment Edited] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188178#comment-17188178 ] ZhaoYang edited comment on CASSANDRA-15861 at 9/1/20, 6:45 AM: --- bq. Could you elaborate on what needs to be changed specifically in mine code so it will be fully ok again? hmm.. some moved codes in {{CassandraOutgoingFiles}}. What are you worried about? bq. Are you already sure that your changes are computing sizes in both compressed and uncompressed paths right? this patch is for zero-copy-streaming to avoid partial written files, not about how size is calculated. The change in {{CassandraStreamHeder}} around compressed size is to restore original behavior before storege-engine refactoring. was (Author: jasonstack): bq. Could you elaborate on what needs to be changed specifically in mine code so it will be fully ok again? hmm.. some moved codes in {{CassandraOutgoingFiles}}. What are you worried about? bq. Are you already sure that your changes are computing sizes in both compressed and uncompressed paths right? this patch is for zero-copy-streaming to avoid partial written files, not about how size is calculated. The change in {{ComponentMetadata}} around compressed size is to restore original behavior before storege-engine refactoring. > Mutating sstable component may race with entire-sstable-streaming(ZCS) > causing checksum validation failure > -- > > Key: CASSANDRA-15861 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15861 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair, Consistency/Streaming, > Local/Compaction >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta > > > Flaky dtest: [test_dead_sync_initiator - > repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] > {code:java|title=stacktrace} > Unexpected error found in node logs (see stdout for full details). Errors: > [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 > CassandraEntireSSTableStreamReader.java:145 - [Stream > 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream > for table = keyspace1.standard1 > org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226) > at > org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140) > at > org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36) > at > org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.IOException: Checksums do not match for > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > {code} > > In the above test, it executes "nodetool repair" on node1 and kills node2 > during repair. At the end, node3 reports checksum validation failure on > sstable transferred from node1. > {code:java|title=what happened} > 1. When repair started on node1, it performs anti-compaction which modifies > sstable's repairAt to 0 and pending repair id to session-id. > 2. Then node1 creates {{ComponentManifest}} which contains file lengths to be > transferred to node3. > 3. Before node1 actually sends the files to node3, node2 is killed and node1 > starts to broadcast
[jira] [Comment Edited] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188178#comment-17188178 ] ZhaoYang edited comment on CASSANDRA-15861 at 9/1/20, 6:44 AM: --- bq. Could you elaborate on what needs to be changed specifically in mine code so it will be fully ok again? hmm.. some moved codes in {{CassandraOutgoingFiles}}. What are you worried about? bq. Are you already sure that your changes are computing sizes in both compressed and uncompressed paths right? this patch is for zero-copy-streaming to avoid partial written files, not about how size is calculated. The change in {{ComponentMetadata}} around compressed size is to restore original behavior before storege-engine refactoring. was (Author: jasonstack): bq. Could you elaborate on what needs to be changed specifically in mine code so it will be fully ok again? hmm.. some moved codes in {{CassandraOutgoingFiles}}. What are you worried about? bq. Are you already sure that your changes are computing sizes in both compressed and uncompressed paths right? this patch is for zero-copy-streaming to avoid partial written files, not about how size is calculated. > Mutating sstable component may race with entire-sstable-streaming(ZCS) > causing checksum validation failure > -- > > Key: CASSANDRA-15861 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15861 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair, Consistency/Streaming, > Local/Compaction >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta > > > Flaky dtest: [test_dead_sync_initiator - > repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] > {code:java|title=stacktrace} > Unexpected error found in node logs (see stdout for full details). Errors: > [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 > CassandraEntireSSTableStreamReader.java:145 - [Stream > 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream > for table = keyspace1.standard1 > org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226) > at > org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140) > at > org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36) > at > org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.IOException: Checksums do not match for > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > {code} > > In the above test, it executes "nodetool repair" on node1 and kills node2 > during repair. At the end, node3 reports checksum validation failure on > sstable transferred from node1. > {code:java|title=what happened} > 1. When repair started on node1, it performs anti-compaction which modifies > sstable's repairAt to 0 and pending repair id to session-id. > 2. Then node1 creates {{ComponentManifest}} which contains file lengths to be > transferred to node3. > 3. Before node1 actually sends the files to node3, node2 is killed and node1 > starts to broadcast repair-failure-message to all participants in > {{CoordinatorSession#fail}} > 4. Node1 receives its own repair-failure-message and fails its
[jira] [Commented] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188178#comment-17188178 ] ZhaoYang commented on CASSANDRA-15861: -- bq. Could you elaborate on what needs to be changed specifically in mine code so it will be fully ok again? hmm.. some moved codes in {{CassandraOutgoingFiles}}. What are you worried about? bq. Are you already sure that your changes are computing sizes in both compressed and uncompressed paths right? this patch is for zero-copy-streaming to avoid partial written files, not about how size is calculated. > Mutating sstable component may race with entire-sstable-streaming(ZCS) > causing checksum validation failure > -- > > Key: CASSANDRA-15861 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15861 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair, Consistency/Streaming, > Local/Compaction >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta > > > Flaky dtest: [test_dead_sync_initiator - > repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] > {code:java|title=stacktrace} > Unexpected error found in node logs (see stdout for full details). Errors: > [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 > CassandraEntireSSTableStreamReader.java:145 - [Stream > 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream > for table = keyspace1.standard1 > org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226) > at > org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140) > at > org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36) > at > org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.IOException: Checksums do not match for > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > {code} > > In the above test, it executes "nodetool repair" on node1 and kills node2 > during repair. At the end, node3 reports checksum validation failure on > sstable transferred from node1. > {code:java|title=what happened} > 1. When repair started on node1, it performs anti-compaction which modifies > sstable's repairAt to 0 and pending repair id to session-id. > 2. Then node1 creates {{ComponentManifest}} which contains file lengths to be > transferred to node3. > 3. Before node1 actually sends the files to node3, node2 is killed and node1 > starts to broadcast repair-failure-message to all participants in > {{CoordinatorSession#fail}} > 4. Node1 receives its own repair-failure-message and fails its local repair > sessions at {{LocalSessions#failSession}} which triggers async background > compaction. > 5. Node1's background compaction will mutate sstable's repairAt to 0 and > pending repair id to null via > {{PendingRepairManager#getNextRepairFinishedTask}}, as there is no more > in-progress repair. > 6. Node1 actually sends the sstable to node3 where the sstable's STATS > component size is different from the original size recorded in the manifest. > 7. At the end, node3 reports checksum validation failure when it tries to > mutate sstable level and "isTransient" attribute in >
[jira] [Commented] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188172#comment-17188172 ] Stefan Miklosovic commented on CASSANDRA-15861: --- Could you elaborate on what needs to be changed specifically in mine code so it will be fully ok again? Are you already sure that your changes are computing sizes in both compressed and uncompressed paths right? (with and without entire sstable streaming) > Mutating sstable component may race with entire-sstable-streaming(ZCS) > causing checksum validation failure > -- > > Key: CASSANDRA-15861 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15861 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair, Consistency/Streaming, > Local/Compaction >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta > > > Flaky dtest: [test_dead_sync_initiator - > repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] > {code:java|title=stacktrace} > Unexpected error found in node logs (see stdout for full details). Errors: > [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 > CassandraEntireSSTableStreamReader.java:145 - [Stream > 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream > for table = keyspace1.standard1 > org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226) > at > org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140) > at > org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36) > at > org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.IOException: Checksums do not match for > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > {code} > > In the above test, it executes "nodetool repair" on node1 and kills node2 > during repair. At the end, node3 reports checksum validation failure on > sstable transferred from node1. > {code:java|title=what happened} > 1. When repair started on node1, it performs anti-compaction which modifies > sstable's repairAt to 0 and pending repair id to session-id. > 2. Then node1 creates {{ComponentManifest}} which contains file lengths to be > transferred to node3. > 3. Before node1 actually sends the files to node3, node2 is killed and node1 > starts to broadcast repair-failure-message to all participants in > {{CoordinatorSession#fail}} > 4. Node1 receives its own repair-failure-message and fails its local repair > sessions at {{LocalSessions#failSession}} which triggers async background > compaction. > 5. Node1's background compaction will mutate sstable's repairAt to 0 and > pending repair id to null via > {{PendingRepairManager#getNextRepairFinishedTask}}, as there is no more > in-progress repair. > 6. Node1 actually sends the sstable to node3 where the sstable's STATS > component size is different from the original size recorded in the manifest. > 7. At the end, node3 reports checksum validation failure when it tries to > mutate sstable level and "isTransient" attribute in > {{CassandraEntireSSTableStreamReader#read}}. > {code} > Currently, entire-sstable-streaming requires sstable components to be > immutable, because
[jira] [Commented] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188166#comment-17188166 ] ZhaoYang commented on CASSANDRA-15861: -- [~stefan.miklosovic] I took a brief look at the patch in 15406, I think there are just minor superficial conflicts, no compatibility issue. > Mutating sstable component may race with entire-sstable-streaming(ZCS) > causing checksum validation failure > -- > > Key: CASSANDRA-15861 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15861 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair, Consistency/Streaming, > Local/Compaction >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta > > > Flaky dtest: [test_dead_sync_initiator - > repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] > {code:java|title=stacktrace} > Unexpected error found in node logs (see stdout for full details). Errors: > [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 > CassandraEntireSSTableStreamReader.java:145 - [Stream > 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream > for table = keyspace1.standard1 > org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226) > at > org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140) > at > org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36) > at > org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.IOException: Checksums do not match for > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > {code} > > In the above test, it executes "nodetool repair" on node1 and kills node2 > during repair. At the end, node3 reports checksum validation failure on > sstable transferred from node1. > {code:java|title=what happened} > 1. When repair started on node1, it performs anti-compaction which modifies > sstable's repairAt to 0 and pending repair id to session-id. > 2. Then node1 creates {{ComponentManifest}} which contains file lengths to be > transferred to node3. > 3. Before node1 actually sends the files to node3, node2 is killed and node1 > starts to broadcast repair-failure-message to all participants in > {{CoordinatorSession#fail}} > 4. Node1 receives its own repair-failure-message and fails its local repair > sessions at {{LocalSessions#failSession}} which triggers async background > compaction. > 5. Node1's background compaction will mutate sstable's repairAt to 0 and > pending repair id to null via > {{PendingRepairManager#getNextRepairFinishedTask}}, as there is no more > in-progress repair. > 6. Node1 actually sends the sstable to node3 where the sstable's STATS > component size is different from the original size recorded in the manifest. > 7. At the end, node3 reports checksum validation failure when it tries to > mutate sstable level and "isTransient" attribute in > {{CassandraEntireSSTableStreamReader#read}}. > {code} > Currently, entire-sstable-streaming requires sstable components to be > immutable, because \{{ComponentManifest}} > with component sizes are sent before sending actual files. This isn't a > problem in legacy streaming as STATS file
[jira] [Comment Edited] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188162#comment-17188162 ] Stefan Miklosovic edited comment on CASSANDRA-15861 at 9/1/20, 6:21 AM: This clashes with https://issues.apache.org/jira/browse/CASSANDRA-15406 I am working on for a very long time and I havent been able to manage to merge it. There seems to be a clash with some functionality related to how size of files is computed because it is broken and it reports wrong numbers in netstats. Could somebody verify that this patch is compatible with 15406 and it does not break things? It is pretty frustrating to continuously rewrite already fully prepared and tested code. The problem with the original code is nicely summarized in this comment downwards and input from Benjamin Lerer https://issues.apache.org/jira/browse/CASSANDRA-15406?focusedCommentId=17181389=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17181389 was (Author: stefan.miklosovic): This clashes with https://issues.apache.org/jira/browse/CASSANDRA-15406 I am working on for a very long time and I havent been able to manage to merge it. There seems to be a clash with some functionality related to how size of files is computed because it is broken and it reports wrong numbers in netstats. Could somebody verify that this patch is compatible with 15406 and it does not break things? It is pretty frustrating to continuously rewrite already fully prepared and tested code. > Mutating sstable component may race with entire-sstable-streaming(ZCS) > causing checksum validation failure > -- > > Key: CASSANDRA-15861 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15861 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair, Consistency/Streaming, > Local/Compaction >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta > > > Flaky dtest: [test_dead_sync_initiator - > repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] > {code:java|title=stacktrace} > Unexpected error found in node logs (see stdout for full details). Errors: > [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 > CassandraEntireSSTableStreamReader.java:145 - [Stream > 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream > for table = keyspace1.standard1 > org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226) > at > org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140) > at > org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36) > at > org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.IOException: Checksums do not match for > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > {code} > > In the above test, it executes "nodetool repair" on node1 and kills node2 > during repair. At the end, node3 reports checksum validation failure on > sstable transferred from node1. > {code:java|title=what happened} > 1. When repair started on node1, it performs anti-compaction which modifies > sstable's repairAt to 0 and pending repair id to session-id. > 2.
[jira] [Commented] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188162#comment-17188162 ] Stefan Miklosovic commented on CASSANDRA-15861: --- This clashes with https://issues.apache.org/jira/browse/CASSANDRA-15406 I am working on for a very long time and I havent been able to manage to merge it. There seems to be a clash with some functionality related to how size of files is computed because it is broken and it reports wrong numbers in netstats. Could somebody verify that this patch is compatible with 15406 and it does not break things? It is pretty frustrating to continuously rewrite already fully prepared and tested code. > Mutating sstable component may race with entire-sstable-streaming(ZCS) > causing checksum validation failure > -- > > Key: CASSANDRA-15861 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15861 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair, Consistency/Streaming, > Local/Compaction >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta > > > Flaky dtest: [test_dead_sync_initiator - > repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] > {code:java|title=stacktrace} > Unexpected error found in node logs (see stdout for full details). Errors: > [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 > CassandraEntireSSTableStreamReader.java:145 - [Stream > 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream > for table = keyspace1.standard1 > org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226) > at > org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140) > at > org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36) > at > org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.IOException: Checksums do not match for > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > {code} > > In the above test, it executes "nodetool repair" on node1 and kills node2 > during repair. At the end, node3 reports checksum validation failure on > sstable transferred from node1. > {code:java|title=what happened} > 1. When repair started on node1, it performs anti-compaction which modifies > sstable's repairAt to 0 and pending repair id to session-id. > 2. Then node1 creates {{ComponentManifest}} which contains file lengths to be > transferred to node3. > 3. Before node1 actually sends the files to node3, node2 is killed and node1 > starts to broadcast repair-failure-message to all participants in > {{CoordinatorSession#fail}} > 4. Node1 receives its own repair-failure-message and fails its local repair > sessions at {{LocalSessions#failSession}} which triggers async background > compaction. > 5. Node1's background compaction will mutate sstable's repairAt to 0 and > pending repair id to null via > {{PendingRepairManager#getNextRepairFinishedTask}}, as there is no more > in-progress repair. > 6. Node1 actually sends the sstable to node3 where the sstable's STATS > component size is different from the original size recorded in the manifest. > 7. At the end, node3 reports checksum validation