[jira] [Commented] (CASSANDRA-15579) 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, and Read Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-15579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149099#comment-17149099 ]

David Capwell commented on CASSANDRA-15579:
-------------------------------------------

Also membership changes

> 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, and Read Repair
>
>           Key: CASSANDRA-15579
>           URL: https://issues.apache.org/jira/browse/CASSANDRA-15579
>       Project: Cassandra
>    Issue Type: Task
>    Components: Test/unit
>      Reporter: Josh McKenzie
>      Assignee: Andres de la Peña
>      Priority: Normal
>       Fix For: 4.0-beta
>
> Reference [doc from NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#] for context.
>
> *Shepherd: Blake Eggleston*
>
> Testing in this area focuses on non-node-local aspects of the read-write path: coordination, replication, read repair, etc.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15579) 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, and Read Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-15579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149093#comment-17149093 ]

David Capwell commented on CASSANDRA-15579:
-------------------------------------------

One thing to note: Cassandra tests tend to lack failure-mode testing, so it would be good to start looking into where things could fail and whether we have tests to cover them; that's what I was doing for repair. We also have issues with upgrades, and issues with older SSTable formats. Another thing to look into is the interaction between different features: if enabling/disabling a feature interacts with something, make sure we include testing for it (and the failures there).
[jira] [Updated] (CASSANDRA-15792) test_speculative_data_request - read_repair_test.TestSpeculativeReadRepair
[ https://issues.apache.org/jira/browse/CASSANDRA-15792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gianluca Righetto updated CASSANDRA-15792:
------------------------------------------
    Status: Patch Available  (was: In Progress)

> test_speculative_data_request - read_repair_test.TestSpeculativeReadRepair
>
>           Key: CASSANDRA-15792
>           URL: https://issues.apache.org/jira/browse/CASSANDRA-15792
>       Project: Cassandra
>    Issue Type: Bug
>    Components: Test/dtest
>      Reporter: Ekaterina Dimitrova
>      Assignee: Gianluca Righetto
>      Priority: Normal
>       Fix For: 4.0-beta
>
> Failing on the latest trunk here:
> https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/127/workflows/dfba669d-4a5c-4553-b6a2-85647d0d8d2b/jobs/668/tests
> Failing once in 30 times as per Jenkins:
> https://jenkins-cm4.apache.org/job/Cassandra-trunk-dtest/69/testReport/dtest.read_repair_test/TestSpeculativeReadRepair/test_speculative_data_request/
[jira] [Commented] (CASSANDRA-15579) 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, and Read Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-15579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149079#comment-17149079 ]

Caleb Rackliffe commented on CASSANDRA-15579:
---------------------------------------------

It also seems like this could really leverage CASSANDRA-15348.
[jira] [Commented] (CASSANDRA-15579) 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, and Read Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-15579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149076#comment-17149076 ]

Caleb Rackliffe commented on CASSANDRA-15579:
---------------------------------------------

[~adelapena] I might have some cycles to help here if there's enough work to split up.
[jira] [Comment Edited] (CASSANDRA-15579) 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, and Read Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-15579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149074#comment-17149074 ]

Caleb Rackliffe edited comment on CASSANDRA-15579 at 7/1/20, 3:33 AM:
----------------------------------------------------------------------

Yeah, there's also the {{AbstractReadRepairTest}} subclasses, two versions of {{ReadRepairTest}} in different packages, {{MixedModeReadRepairTest}} (which looks pretty sparse in terms of its version combinations?), and {{SimpleReadWriteTest}}. It might make more sense to work on CASSANDRA-14697 than include transient replication in this ticket, but not sure what everyone else thinks...

was (Author: maedhroz):
Yeah, there's also the {{AbstractReadRepairTest}} subclasses, two versions of {{ReadRepairTest}} in different packages, {{MixedModeReadRepairTest}} (which looks pretty sparse in terms of its version combinations?), and {{SimpleReadWriteTest}}. It might make more sense to work on CASSANDRA-14697 than include transient replication here, but not sure what everyone else thinks...
[jira] [Commented] (CASSANDRA-15579) 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, and Read Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-15579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149074#comment-17149074 ]

Caleb Rackliffe commented on CASSANDRA-15579:
---------------------------------------------

Yeah, there's also the {{AbstractReadRepairTest}} subclasses, two versions of {{ReadRepairTest}} in different packages, {{MixedModeReadRepairTest}} (which looks pretty sparse in terms of its version combinations?), and {{SimpleReadWriteTest}}. It might make more sense to work on CASSANDRA-14697 than include transient replication here, but not sure what everyone else thinks...
[jira] [Commented] (CASSANDRA-15900) Close channel and reduce buffer allocation during entire sstable streaming with SSL
[ https://issues.apache.org/jira/browse/CASSANDRA-15900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149071#comment-17149071 ]

ZhaoYang commented on CASSANDRA-15900:
--------------------------------------

Rebased and submitted another round of CI: [j8|https://circleci.com/workflow-run/cdf55335-c876-450b-8bf9-1d778a2df806] and [j11|https://circleci.com/workflow-run/2080f225-f689-4243-ad67-288bef608640]

> Close channel and reduce buffer allocation during entire sstable streaming with SSL
>
>           Key: CASSANDRA-15900
>           URL: https://issues.apache.org/jira/browse/CASSANDRA-15900
>       Project: Cassandra
>    Issue Type: Bug
>    Components: Legacy/Streaming and Messaging
>      Reporter: ZhaoYang
>      Assignee: ZhaoYang
>      Priority: Normal
>       Fix For: 4.0-beta
>
> CASSANDRA-15740 added the ability to stream an entire sstable by loading the on-disk file into a user-space off-heap buffer when SSL is enabled, because netty doesn't support zero-copy with SSL.
>
> But there are two issues:
> # The file channel is not closed.
> # A 1 MiB batch size is used. 1 MiB exceeds the buffer pool's max allocation size, so every batch is allocated outside the pool, causing a large number of allocations.
>
> [Patch|https://github.com/apache/cassandra/pull/651]:
> # Close the file channel when the last batch is loaded into the off-heap bytebuffer. I don't think we need to wait until the buffer is flushed by netty.
> # Reduce the batch size to 64 KiB, which is more buffer-pool friendly when streaming an entire sstable with SSL.
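The two fixes in the patch can be illustrated with a small self-contained sketch (plain Python file I/O standing in for the actual Java/netty code; the function and variable names here are illustrative, not the real implementation): batches are capped at a pool-friendly size, and the file handle is closed as soon as the final batch has been read into a buffer, rather than staying open until the consumer finishes with it.

```python
# Sketch only: models "read in small batches, close the channel after the
# last batch is buffered" from CASSANDRA-15900. Not the actual Cassandra code.
import os
import tempfile

BATCH_SIZE = 64 * 1024  # 64 KiB, the pool-friendly batch size the patch proposes

def read_batches(path, batch_size=BATCH_SIZE):
    """Yield the file's contents in batches, closing the file after the last read."""
    remaining = os.path.getsize(path)
    f = open(path, "rb")
    try:
        while remaining > 0:
            chunk = f.read(min(batch_size, remaining))
            remaining -= len(chunk)
            if remaining == 0:
                f.close()  # last batch is buffered; no need to wait for the consumer
            yield chunk
    finally:
        if not f.closed:
            f.close()

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * (150 * 1024))  # 150 KiB -> two full batches plus a remainder

batches = list(read_batches(tmp.name))
print([len(b) for b in batches])  # -> [65536, 65536, 22528]
os.remove(tmp.name)
```

The point of closing eagerly is that the channel's lifetime is tied to the read side, not to when netty eventually flushes the buffered bytes.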
[jira] [Commented] (CASSANDRA-15900) Close channel and reduce buffer allocation during entire sstable streaming with SSL
[ https://issues.apache.org/jira/browse/CASSANDRA-15900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149066#comment-17149066 ]

Caleb Rackliffe commented on CASSANDRA-15900:
---------------------------------------------

Let's see...

{{test_restart_node_localhost - pushed_notifications_test.TestPushedNotifications}} should have been addressed by CASSANDRA-15677 a few days ago.

{{test_describe - cqlsh_tests.test_cqlsh.TestCqlsh}} and its materialized view equivalent have a history of flakiness, and don't look directly related to this patch. (Is there an issue around {{read_repair}} showing up in the table DDL where it isn't expected?)
[jira] [Updated] (CASSANDRA-15907) Operational Improvements & Hardening for Replica Filtering Protection
[ https://issues.apache.org/jira/browse/CASSANDRA-15907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Caleb Rackliffe updated CASSANDRA-15907:
----------------------------------------
    Description:

CASSANDRA-8272 uses additional space on the heap to ensure correctness for 2i and filtering queries at consistency levels above ONE/LOCAL_ONE. There are a few things we should follow up on, however, to make life a bit easier for operators and generally de-risk usage:

(Note: Line numbers are based on {{trunk}} as of {{3cfe3c9f0dcf8ca8b25ad111800a21725bf152cb}}.)

*Minor Optimizations*
* {{ReplicaFilteringProtection:114}} - Given we size them up-front, we may be able to use simple arrays instead of lists for {{rowsToFetch}} and {{originalPartitions}}. Alternatively (or also), we may be able to null out references in these two collections more aggressively. (ex. Using {{ArrayList#set()}} instead of {{get()}} in {{queryProtectedPartitions()}}, assuming we pass {{toFetch}} as an argument to {{querySourceOnKey()}}.)
* {{ReplicaFilteringProtection:323}} - We may be able to use {{EncodingStats.merge()}} and remove the custom {{stats()}} method.
* {{DataResolver:111 & 228}} - Cache an instance of {{UnaryOperator#identity()}} instead of creating one on the fly.
* {{ReplicaFilteringProtection:217}} - We may be able to scatter/gather rather than serially querying every row that needs to be completed. This isn't a clear win perhaps, given it targets the latency of single queries and adds some complexity. (Certainly a decent candidate to kick out of this issue entirely.)

*Documentation and Intelligibility*
* There are a few places (CHANGES.txt, tracing output in {{ReplicaFilteringProtection}}, etc.) where we mention "replica-side filtering protection" (which makes it seem like the coordinator doesn't filter) rather than "replica filtering protection" (which sounds more like what we actually do, which is protect ourselves against incorrect replica filtering results). It's a minor fix, but would avoid confusion.
* The method call chain in {{DataResolver}} might be a bit simpler if we put the {{repairedDataTracker}} in {{ResolveContext}}.

*Testing*
* I want to bite the bullet and get some basic tests for RFP (including any guardrails we might add here) onto the in-JVM dtest framework.

*Guardrails*
* As it stands, we don't have a way to enforce an upper bound on the memory usage of {{ReplicaFilteringProtection}}, which caches row responses from the first round of requests. (Remember, these are later merged with the second round of results to complete the data for filtering.) Operators will likely need a way to protect themselves, i.e. simply fail queries if they hit a particular threshold rather than GC nodes into oblivion. (Having control over limits and page sizes doesn't quite get us there, because stale results _expand_ the number of incomplete results we must cache.) The fun question is how we do this, with the primary axes being scope (per-query, global, etc.) and granularity (per-partition, per-row, per-cell, actual heap usage, etc.). My starting disposition on the right trade-off between performance/complexity and accuracy is something along the lines of cached rows per query. Prior art suggests this probably makes sense alongside things like {{tombstone_failure_threshold}} in {{cassandra.yaml}}.
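The "cached rows per query" guardrail floated under *Guardrails* can be sketched in a few lines. Everything here is hypothetical — the class, exception, and threshold names are invented for illustration and are not actual Cassandra code or configuration:

```python
# Toy model of a per-query guardrail: fail the query once the replica filtering
# protection cache exceeds a configured row count, instead of exhausting the heap.
class OverloadedException(Exception):
    """Hypothetical failure raised when a query caches too many rows."""

class ProtectedRowCache:
    def __init__(self, failure_threshold):
        self.failure_threshold = failure_threshold  # analogous in spirit to tombstone_failure_threshold
        self.rows = []

    def add(self, row):
        if len(self.rows) >= self.failure_threshold:
            # Fail fast rather than GC the node into oblivion.
            raise OverloadedException(
                "cached %d rows; threshold is %d" % (len(self.rows), self.failure_threshold))
        self.rows.append(row)

cache = ProtectedRowCache(failure_threshold=3)
for row in ["a", "b", "c"]:
    cache.add(row)

try:
    cache.add("d")
    failed = False
except OverloadedException:
    failed = True

print(failed)  # -> True
```

The interesting design questions from the description (scope and granularity) are all about where this counter lives and what unit it counts; the fail-fast shape stays the same either way.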
[jira] [Commented] (CASSANDRA-15907) Operational Improvements & Hardening for Replica Filtering Protection
[ https://issues.apache.org/jira/browse/CASSANDRA-15907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149059#comment-17149059 ]

Caleb Rackliffe commented on CASSANDRA-15907:
---------------------------------------------

...and of course, if we want to punt on a redesign for now, we can always proceed w/ the [guardrails approach|https://issues.apache.org/jira/browse/CASSANDRA-15907?focusedCommentId=17148207&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17148207], which is basically like [~adelapena]'s latest idea, but with a large N.
[jira] [Commented] (CASSANDRA-15907) Operational Improvements & Hardening for Replica Filtering Protection
[ https://issues.apache.org/jira/browse/CASSANDRA-15907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149000#comment-17149000 ]

Caleb Rackliffe commented on CASSANDRA-15907:
---------------------------------------------

At this point, we're sitting on what appear to be 4 distinct approaches to addressing the problems in the current implementation. Before trying to contrast them all, I want to think through the kinds of usage we expect and the consequences of that. Future indexing implementations aside, neither filtering queries nor secondary index queries are currently meant to be used at scale (especially at CL > ONE/LOCAL_ONE) without partition restrictions. Optimizing for that case seems reasonable. The other big axis is how common out-of-sync replicas actually are, and how responsive we have to be across the range from "rare" to "entire replica datasets are out of sync". What's currently in trunk does just fine if there is very little out-of-sync data, especially in the common case that we're limited to a partition. (i.e. The actual number of protection queries is very low, because we group by partition.) Its weakness is the edge case.

bq. Issue blocking RFP read immediately at {{MergeListener#onMergedRows}} when detecting potential outdated rows

This single-pass solution would excel in situations where there are very few silent replicas and put very little stress on the heap, given it could simply forgo caching merged rows that don't satisfy the query filter. It also appears to be a fairly simple change to the existing logic. The downside of this approach is that it would start to issue a pretty high volume of individual row protection queries as it came across more silent replicas, without even the mitigating benefit of partition grouping. It wouldn't require any new guardrails around memory usage, and the worst that could happen is a query timeout.

bq. We could try to not cache all the results but advance in blocks of a certain fixed number of cached results, so we limit the number of cached results while we can still group keys to do less queries. That is, we could have that pessimistic SRP read prefetching and caching N rows completed with extra queries to the silent replicas, plugged to another group of unmerged-merged counters to prefetch more results if (probably) needed

This seems to retain all the nice characteristics of the current trunk implementation (most importantly partition grouping for RFP queries), with the added benefit that it should only use heap proportional to the actual user limit (although not precisely, given the difference between the batch size and the limit). It wouldn't really require any new guardrails around memory usage, given the tighter coupling to the limit or page size, and the worst case is also a timeout. The stumbling block feels like complexity, but that might just be my lack of creativity. [~adelapena] Wouldn't we have to avoid SRP in the first phase of the query to limit the size of the result cache during batches?

I've been trying to figure out a way to merge these two ideas, i.e. to batch partition/completion reads in the RFP {{MergeListener}}. Combined w/ filtering, also in the {{MergeListener}}, we could discard (i.e. avoid caching) the rows that don't pass the filter. The problem is that the return value of {{onMergedRows()}} is what presently informs SRP/controls the counter.
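The block-based approach discussed above can be modeled in a few lines of Python. This is a toy model under stated assumptions, not the proposed patch: "silent replica" queries are faked, and all names are illustrative. The point it demonstrates is that at most one block of keys is held at a time, while keys within a block are still grouped into a single protection query.

```python
# Toy model: instead of caching every potentially-outdated row before issuing
# protection queries, advance in fixed-size blocks, bounding cache size while
# still grouping keys into one query per block.
def protect_in_blocks(outdated_keys, query_silent_replicas, block_size):
    """Yield completed rows, querying silent replicas once per block of keys."""
    block = []
    for key in outdated_keys:
        block.append(key)
        if len(block) == block_size:
            yield from query_silent_replicas(block)  # one grouped query per block
            block = []
    if block:  # flush the final, possibly partial block
        yield from query_silent_replicas(block)

queries_issued = []

def fake_query(keys):
    """Stand-in for a grouped read against the silent replicas."""
    queries_issued.append(list(keys))
    return ["row-%d" % k for k in keys]

rows = list(protect_in_blocks(range(7), fake_query, block_size=3))
print(len(rows), len(queries_issued))  # -> 7 3
```

With 7 outdated keys and a block size of 3, only 3 grouped queries are issued instead of 7 per-row queries, and never more than 3 keys are cached at once — the trade-off between the two quoted approaches.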
[jira] [Updated] (CASSANDRA-15905) cqlsh not able to fetch all rows when in batch mode
[ https://issues.apache.org/jira/browse/CASSANDRA-15905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-15905:
-----------------------------------------
    Source Control Link: https://github.com/apache/cassandra/commit/9251b8116ff89b528b6b9eaa43d4dc2d1bc0bbaf
             Resolution: Fixed
                 Status: Resolved  (was: Ready to Commit)

Committed to 3.11 (with a very small backport) and trunk, thanks!

> cqlsh not able to fetch all rows when in batch mode
>
>           Key: CASSANDRA-15905
>           URL: https://issues.apache.org/jira/browse/CASSANDRA-15905
>       Project: Cassandra
>    Issue Type: Bug
>    Components: Legacy/CQL
>      Reporter: Yifan Cai
>      Assignee: Yifan Cai
>      Priority: Normal
>       Fix For: 3.11.7, 4.0-alpha5
>
>   Time Spent: 20m
>   Remaining Estimate: 0h
>
> The cqlsh in trunk only displays the first page when running in batch mode, i.e. using the {{--execute}} or {{--file}} option.
>
> This is a change of behavior: in the 3.x branches, cqlsh returns all rows.
>
> It can be reproduced in 3 steps.
> {code:java}
> 1. ccm create trunk -v git:trunk -n1 && ccm start
> 2. tools/bin/cassandra-stress write n=1k -schema keyspace="keyspace1"  // write 1000 rows
> 3. bin/cqlsh -e "SELECT * FROM keyspace1.standard1;"                   // fetch all rows
> {code}
>
> There are 1000 rows written, but the output in step 3 only lists 100 rows, which is the first page.
> {code:java}
> ➜ bin/cqlsh -e "SELECT * FROM keyspace1.standard1" | wc -l
>      105
> {code}
>
> The related change was introduced in https://issues.apache.org/jira/browse/CASSANDRA-11534, where the cqlsh.py script no longer fetches all rows when not using a tty in the print_result method.
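The paging behavior behind this bug can be illustrated with a self-contained model. This is a stand-in for a paged result set, not the real driver's implementation: reading only the buffered page yields 100 rows, while iterating the result drives fetching of every page.

```python
# Minimal model of driver-style paging: `current_rows` exposes only the page
# already fetched, while iteration transparently walks every page.
class PagedResult:
    def __init__(self, rows, page_size=100):
        self.pages = [rows[i:i + page_size] for i in range(0, len(rows), page_size)]
        self.current_rows = self.pages[0]  # only the first page is buffered

    def __iter__(self):
        # Iterating fetches each subsequent page as it is consumed.
        for page in self.pages:
            yield from page

result = PagedResult(list(range(1000)))
print(len(result.current_rows))  # -> 100   (what the broken batch mode printed)
print(len(list(result)))         # -> 1000  (what cqlsh should print)
```

A batch-mode printer that reads only the buffered page stops at the first 100 rows, which matches the `wc -l` output of 105 lines (100 rows plus header and row-count lines) in the reproduction above.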
[jira] [Updated] (CASSANDRA-15905) cqlsh not able to fetch all rows when in batch mode
[ https://issues.apache.org/jira/browse/CASSANDRA-15905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-15905:
-----------------------------------------
    Reviewers: Brandon Williams  (was: Brandon Williams)
       Status: Review In Progress  (was: Patch Available)
[jira] [Updated] (CASSANDRA-15905) cqlsh not able to fetch all rows when in batch mode
[ https://issues.apache.org/jira/browse/CASSANDRA-15905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-15905:
-----------------------------------------
    Test and Documentation Plan: dtest added
                         Status: Patch Available  (was: Open)
[jira] [Updated] (CASSANDRA-15905) cqlsh not able to fetch all rows when in batch mode
[ https://issues.apache.org/jira/browse/CASSANDRA-15905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-15905:
    Status: Ready to Commit (was: Review In Progress)
[cassandra-dtest] branch master updated: Add test for CASSANDRA-15905
This is an automated email from the ASF dual-hosted git repository. brandonwilliams pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/cassandra-dtest.git

The following commit(s) were added to refs/heads/master by this push:
     new f5bc21c  Add test for CASSANDRA-15905
f5bc21c is described below

commit f5bc21c40ccd4bc2b9bc118ec5888bad3cc15b16
Author: Yifan Cai
AuthorDate: Tue Jun 30 14:23:04 2020 -0700

    Add test for CASSANDRA-15905

    Patch by Yifan Cai, reviewed by brandonwilliams for CASSANDRA-15905
---
 cqlsh_tests/test_cqlsh.py | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/cqlsh_tests/test_cqlsh.py b/cqlsh_tests/test_cqlsh.py
index a47f942..f261833 100644
--- a/cqlsh_tests/test_cqlsh.py
+++ b/cqlsh_tests/test_cqlsh.py
@@ -2031,6 +2031,33 @@ Tracing session:""")
         assert_all(session, "SELECT * FROM ks.cf", [[0]])
 
+    def test_fetch_all_rows_in_batch_mode(self):
+        """
+        Test: cqlsh -e "" with more rows than 1 page
+        @jira_ticket CASSANDRA-15905
+        """
+        self.cluster.populate(1)
+        self.cluster.start(wait_for_binary_proto=True)
+        node1, = self.cluster.nodelist()
+        session = self.patient_cql_connection(node1)
+
+        session.execute("CREATE KEYSPACE ks WITH REPLICATION={'class':'SimpleStrategy','replication_factor':1};")
+        session.execute("CREATE TABLE ks.test (key uuid primary key);")
+
+        num_rows = 200
+        expected_lines = num_rows + 5  # 5: header + empty lines
+
+        for i in range(num_rows):
+            session.execute("INSERT INTO ks.test (key) VALUES (uuid())")
+
+        stdout, err = self.run_cqlsh(node1, cmds="", cqlsh_options=['-e', 'SELECT * FROM ks.test;'])
+        assert err == ""
+        output_lines = stdout.splitlines()
+        assert expected_lines == len(output_lines)
+        assert output_lines[0].strip() == ''
+        assert output_lines[-2].strip() == ''
+        assert output_lines[-1].strip() == "({} rows)".format(num_rows)
+
     def run_cqlsh(self, node, cmds, cqlsh_options=None, env_vars=None):
         """
         Local version of run_cqlsh to open a cqlsh subprocess with
[cassandra] 01/01: Merge branch 'cassandra-3.11' into trunk
This is an automated email from the ASF dual-hosted git repository. brandonwilliams pushed a commit to branch trunk in repository https://gitbox.apache.org/repos/asf/cassandra.git

commit eebb9e02cd10cde576bcf860417ec3d011c7c165
Merge: 3b8ed1e 9251b81
Author: Brandon Williams
AuthorDate: Tue Jun 30 17:57:22 2020 -0500

    Merge branch 'cassandra-3.11' into trunk

 CHANGES.txt               |  1 +
 bin/cqlsh.py              | 42 ++
 pylib/cqlshlib/tracing.py |  2 +-
 3 files changed, 28 insertions(+), 17 deletions(-)

diff --cc CHANGES.txt
index 1c30b58,d89d22b..8fafb7d
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,57 -1,8 +1,58 @@@
-3.11.7
+4.0-alpha5
+ * Prune expired messages less frequently in internode messaging (CASSANDRA-15700)
+ * Fix Ec2Snitch handling of legacy mode for dc names matching both formats, eg "us-west-2" (CASSANDRA-15878)
+ * Add support for server side DESCRIBE statements (CASSANDRA-14825)
+ * Fail startup if -Xmn is set when the G1 garbage collector is used (CASSANDRA-15839)
+ * generateSplits method replaced the generateRandomTokens for ReplicationAwareTokenAllocator (CASSANDRA-15877)
+ * Several mbeans are not unregistered when dropping a keyspace and table (CASSANDRA-14888)
+ * Update defaults for server and client TLS settings (CASSANDRA-15262)
+ * Differentiate follower/initiator in StreamMessageHeader (CASSANDRA-15665)
+ * Add a startup check to detect if LZ4 uses java rather than native implementation (CASSANDRA-15884)
+ * Fix missing topology events when running multiple nodes on the same network interface (CASSANDRA-15677)
+ * Create config.yml.MIDRES (CASSANDRA-15712)
+ * Fix handling of fully purged static rows in repaired data tracking (CASSANDRA-15848)
+ * Prevent validation request submission from blocking ANTI_ENTROPY stage (CASSANDRA-15812)
+ * Add fqltool and auditlogviewer to rpm and deb packages (CASSANDRA-14712)
+ * Include DROPPED_COLUMNS in schema digest computation (CASSANDRA-15843)
+ * Fix Cassandra restart from rpm install (CASSANDRA-15830)
+ * Improve handling of 2i initialization failures (CASSANDRA-13606)
+ * Add completion_ratio column to sstable_tasks virtual table (CASSANDRA-15759)
+ * Add support for adding custom Verbs (CASSANDRA-15725)
+ * Speed up entire-file-streaming file containment check and allow entire-file-streaming for all compaction strategies (CASSANDRA-15657,CASSANDRA-15783)
+ * Provide ability to configure IAuditLogger (CASSANDRA-15748)
+ * Fix nodetool enablefullquerylog blocking param parsing (CASSANDRA-15819)
+ * Add isTransient to SSTableMetadataView (CASSANDRA-15806)
+ * Fix tools/bin/fqltool for all shells (CASSANDRA-15820)
+ * Fix clearing of legacy size_estimates (CASSANDRA-15776)
+ * Update port when reconnecting to pre-4.0 SSL storage (CASSANDRA-15727)
+ * Only calculate dynamicBadnessThreshold once per loop in DynamicEndpointSnitch (CASSANDRA-15798)
+ * Cleanup redundant nodetool commands added in 4.0 (CASSANDRA-15256)
+ * Update to Python driver 3.23 for cqlsh (CASSANDRA-15793)
+ * Add tunable initial size and growth factor to RangeTombstoneList (CASSANDRA-15763)
+ * Improve debug logging in SSTableReader for index summary (CASSANDRA-15755)
+ * bin/sstableverify should support user provided token ranges (CASSANDRA-15753)
+ * Improve logging when mutation passed to commit log is too large (CASSANDRA-14781)
+ * replace LZ4FastDecompressor with LZ4SafeDecompressor (CASSANDRA-15560)
+ * Fix buffer pool NPE with concurrent release due to in-progress tiny pool eviction (CASSANDRA-15726)
+ * Avoid race condition when completing stream sessions (CASSANDRA-15666)
+ * Flush with fast compressors by default (CASSANDRA-15379)
+ * Fix CqlInputFormat regression from the switch to system.size_estimates (CASSANDRA-15637)
+ * Allow sending Entire SSTables over SSL (CASSANDRA-15740)
+ * Fix CQLSH UTF-8 encoding issue for Python 2/3 compatibility (CASSANDRA-15739)
+ * Fix batch statement preparation when multiple tables and parameters are used (CASSANDRA-15730)
+ * Fix regression with traceOutgoingMessage printing message size (CASSANDRA-15687)
+ * Ensure repaired data tracking reads a consistent amount of data across replicas (CASSANDRA-15601)
+ * Fix CQLSH to avoid arguments being evaluated (CASSANDRA-15660)
+ * Correct Visibility and Improve Safety of Methods in LatencyMetrics (CASSANDRA-15597)
+ * Allow cqlsh to run with Python2.7/Python3.6+ (CASSANDRA-15659,CASSANDRA-15573)
+ * Improve logging around incremental repair (CASSANDRA-15599)
+ * Do not check cdc_raw_directory filesystem space if CDC disabled (CASSANDRA-15688)
+ * Replace array iterators with get by index (CASSANDRA-15394)
+ * Minimize BTree iterator allocations (CASSANDRA-15389)
+Merged from 3.11:
+ * Fix cqlsh output when fetching all rows in batch mode (CASSANDRA-15905)
  * Upgrade Jackson to 2.9.10 (CASSANDRA-15867)
  * Fix CQL formatting of read command res
[cassandra] branch cassandra-3.11 updated: Fix cqlsh output when fetching all rows in batch mode
This is an automated email from the ASF dual-hosted git repository. brandonwilliams pushed a commit to branch cassandra-3.11 in repository https://gitbox.apache.org/repos/asf/cassandra.git

The following commit(s) were added to refs/heads/cassandra-3.11 by this push:
     new 9251b81  Fix cqlsh output when fetching all rows in batch mode
9251b81 is described below

commit 9251b8116ff89b528b6b9eaa43d4dc2d1bc0bbaf
Author: yifan-c
AuthorDate: Tue Jun 30 00:15:18 2020 -0700

    Fix cqlsh output when fetching all rows in batch mode

    Patch by Yifan Cai, reviewed by brandonwilliams for CASSANDRA-15905
---
 CHANGES.txt               |  1 +
 bin/cqlsh.py              | 44 
 pylib/cqlshlib/tracing.py |  2 +-
 3 files changed, 30 insertions(+), 17 deletions(-)

diff --git a/CHANGES.txt b/CHANGES.txt
index 9b4cf55..d89d22b 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 3.11.7
+ * Fix cqlsh output when fetching all rows in batch mode (CASSANDRA-15905)
  * Upgrade Jackson to 2.9.10 (CASSANDRA-15867)
  * Fix CQL formatting of read command restrictions for slow query log (CASSANDRA-15503)
  * Allow sstableloader to use SSL on the native port (CASSANDRA-14904)

diff --git a/bin/cqlsh.py b/bin/cqlsh.py
index 2e5e2d4..44d4d50 100644
--- a/bin/cqlsh.py
+++ b/bin/cqlsh.py
@@ -1079,7 +1079,7 @@ class Shell(cmd.Cmd):
         elif result:
             # CAS INSERT/UPDATE
             self.writeresult("")
-            self.print_static_result(result, self.parse_for_update_meta(statement.query_string))
+            self.print_static_result(result, self.parse_for_update_meta(statement.query_string), with_header=True, tty=self.tty)
         self.flush_output()
         return True, future

@@ -1087,20 +1087,30 @@ class Shell(cmd.Cmd):
         self.decoding_errors = []

         self.writeresult("")
-        if result.has_more_pages and self.tty:
+
+        def print_all(result, table_meta, tty):
+            # Return the number of rows in total
             num_rows = 0
+            isFirst = True
             while True:
-                if result.current_rows:
+                # Always print for the first page even it is empty
+                if result.current_rows or isFirst:
                     num_rows += len(result.current_rows)
-                    self.print_static_result(result, table_meta)
+                    with_header = isFirst or tty
+                    self.print_static_result(result, table_meta, with_header, tty)
                 if result.has_more_pages:
-                    raw_input("---MORE---")
+                    if self.shunted_query_out is None and tty:
+                        # Only pause when not capturing.
+                        raw_input("---MORE---")
                     result.fetch_next_page()
                 else:
+                    if not tty:
+                        self.writeresult("")
                     break
-        else:
-            num_rows = len(result.current_rows)
-            self.print_static_result(result, table_meta)
+                isFirst = False
+            return num_rows
+
+        num_rows = print_all(result, table_meta, self.tty)
         self.writeresult("(%d rows)" % num_rows)

         if self.decoding_errors:
@@ -1110,7 +1120,7 @@ class Shell(cmd.Cmd):
             self.writeresult('%d more decoding errors suppressed.' % (len(self.decoding_errors) - 2), color=RED)

-    def print_static_result(self, result, table_meta):
+    def print_static_result(self, result, table_meta, with_header, tty):
         if not result.column_names and not table_meta:
             return

@@ -1118,7 +1128,7 @@ class Shell(cmd.Cmd):
         formatted_names = [self.myformat_colname(name, table_meta) for name in column_names]
         if not result.current_rows:
             # print header only
-            self.print_formatted_result(formatted_names, None)
+            self.print_formatted_result(formatted_names, None, with_header=True, tty=tty)
             return

         cql_types = []
@@ -1132,9 +1142,9 @@ class Shell(cmd.Cmd):
         if self.expand_enabled:
             self.print_formatted_result_vertically(formatted_names, formatted_values)
         else:
-            self.print_formatted_result(formatted_names, formatted_values)
+            self.print_formatted_result(formatted_names, formatted_values, with_header, tty)

-    def print_formatted_result(self, formatted_names, formatted_values):
+    def print_formatted_result(self, formatted_names, formatted_values, with_header, tty):
         # determine column widths
         widths = [n.displaywidth for n in formatted_names]
         if formatted_values is not None:
             for fmtrow in formatted_values:
                 for num, col in enumerate(fmtrow):
                     widths[num] = max(widths[num], col.displaywidth)

         # print header
-        header = ' | '.join(hdr.ljust(w, color=self.color) for (hdr, w) in zip(formatted_n
[cassandra] branch trunk updated (3b8ed1e -> eebb9e0)
This is an automated email from the ASF dual-hosted git repository. brandonwilliams pushed a change to branch trunk in repository https://gitbox.apache.org/repos/asf/cassandra.git.

 from 3b8ed1e  Fix a log message typo in StartupChecks
  new 9251b81  Fix cqlsh output when fetching all rows in batch mode
  new eebb9e0  Merge branch 'cassandra-3.11' into trunk

The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.

Summary of changes:
 CHANGES.txt               |  1 +
 bin/cqlsh.py              | 42 ++
 pylib/cqlshlib/tracing.py |  2 +-
 3 files changed, 28 insertions(+), 17 deletions(-)
[jira] [Commented] (CASSANDRA-15905) cqlsh not able to fetch all rows when in batch mode
[ https://issues.apache.org/jira/browse/CASSANDRA-15905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148988#comment-17148988 ] Yifan Cai commented on CASSANDRA-15905:

||Cassandra||Dtest||
|[PR|https://github.com/apache/cassandra/pull/661]|[PR|https://github.com/apache/cassandra-dtest/pull/82]|
|[Code|https://github.com/yifan-c/cassandra/tree/CASSANDRA-15905-cqlsh-fetch-all-rows-in-batch-mode]|[Code|https://github.com/yifan-c/cassandra-dtest]|

Test: [https://app.circleci.com/pipelines/github/yifan-c/cassandra/66/workflows/2b590ea0-2b4a-4d79-8abc-347cecded0cc]

The dtest failures should not be related to this change; the same errors can be reproduced by running the dtest against trunk. There are no failures among the tests in {{test_cqlsh.py}}. Briefly, the changes are:
* Fetch and print all pages iteratively ({{cqlsh.py::Shell::print_result}}).
* Print compactly when in batch mode.
* In tty mode, always print the header and a trailing blank line for each page, so every page has the same shape.
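The iterative page-fetching fix described above can be sketched in isolation. This is a minimal sketch, not the actual cqlsh code: {{PagedResult}} is a toy stand-in for the Python driver's paged result set, modeling only {{current_rows}}, {{has_more_pages}}, and {{fetch_next_page()}}.

```python
class PagedResult:
    """Toy stand-in for the driver's paged ResultSet: a fixed list of pages."""

    def __init__(self, pages):
        self._pages = list(pages)
        self.current_rows = self._pages.pop(0)

    @property
    def has_more_pages(self):
        return bool(self._pages)

    def fetch_next_page(self):
        self.current_rows = self._pages.pop(0)


def print_all(result, emit):
    """Mirror the patch's loop: emit every page, return the total row count."""
    num_rows = 0
    first = True
    while True:
        # Always emit the first page, even when it is empty.
        if result.current_rows or first:
            num_rows += len(result.current_rows)
            emit(result.current_rows)
        if result.has_more_pages:
            result.fetch_next_page()
        else:
            break
        first = False
    return num_rows


pages = [[1, 2], [3, 4], [5]]
out = []
total = print_all(PagedResult(pages), out.append)
```

With three pages, the loop emits all of them and returns 5, where the pre-fix batch mode stopped after the first page.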
[jira] [Commented] (CASSANDRA-15299) CASSANDRA-13304 follow-up: improve checksumming and compression in protocol v5-beta
[ https://issues.apache.org/jira/browse/CASSANDRA-15299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148948#comment-17148948 ] Olivier Michallat commented on CASSANDRA-15299: --- About the name change, I would like to advocate one last time for renaming the new outer type, not the legacy inner type. I know you'd prefer the other way, and in a vacuum I would agree. But I think that in this case maintaining continuity is more important than perfect naming. For example: * the mere size of the patch. This will affect hundreds of unrelated lines. * those changes will get in the way later: create more conflicts when you backport something to a legacy branch, obscure {{git blame}} output, etc. * old commits still use the old naming. If you need to look at something in git history, you'll have to make the mental switch constantly. Not the end of the world, but it's just one more little thing. > CASSANDRA-13304 follow-up: improve checksumming and compression in protocol > v5-beta > --- > > Key: CASSANDRA-15299 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15299 > Project: Cassandra > Issue Type: Improvement > Components: Messaging/Client >Reporter: Aleksey Yeschenko >Assignee: Sam Tunnicliffe >Priority: Normal > Labels: protocolv5 > Fix For: 4.0-alpha > > > CASSANDRA-13304 made an important improvement to our native protocol: it > introduced checksumming/CRC32 to request and response bodies. It’s an > important step forward, but it doesn’t cover the entire stream. In > particular, the message header is not covered by a checksum or a crc, which > poses a correctness issue if, for example, {{streamId}} gets corrupted. > Additionally, we aren’t quite using CRC32 correctly, in two ways: > 1. We are calculating the CRC32 of the *decompressed* value instead of > computing the CRC32 on the bytes written on the wire - losing the properties > of the CRC32. 
In some cases, due to this sequencing, attempting to decompress > a corrupt stream can cause a segfault by LZ4. > 2. When using CRC32, the CRC32 value is written in the incorrect byte order, > also losing some of the protections. > See https://users.ece.cmu.edu/~koopman/pubs/KoopmanCRCWebinar9May2012.pdf for > explanation for the two points above. > Separately, there are some long-standing issues with the protocol - since > *way* before CASSANDRA-13304. Importantly, both checksumming and compression > operate on individual message bodies rather than frames of multiple complete > messages. In reality, this has several important additional downsides. To > name a couple: > # For compression, we are getting poor compression ratios for smaller > messages - when operating on tiny sequences of bytes. In reality, for most > small requests and responses we are discarding the compressed value as it’d > be smaller than the uncompressed one - incurring both redundant allocations > and compressions. > # For checksumming and CRC32 we pay a high overhead price for small messages. > 4 bytes extra is *a lot* for an empty write response, for example. > To address the correctness issue of {{streamId}} not being covered by the > checksum/CRC32 and the inefficiency in compression and checksumming/CRC32, we > should switch to a framing protocol with multiple messages in a single frame. > I suggest we reuse the framing protocol recently implemented for internode > messaging in CASSANDRA-15066 to the extent that its logic can be borrowed, > and that we do it before native protocol v5 graduates from beta. See > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/FrameDecoderCrc.java > and > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/FrameDecoderLZ4.java. 
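To make the two CRC32 points above concrete: the checksum should be computed over the exact bytes that go on the wire (the compressed frame, not the decompressed value) and serialized in one fixed byte order. The framing below is a hypothetical toy, not the actual native protocol v5 format:

```python
import struct
import zlib


def frame_with_crc(wire_bytes: bytes) -> bytes:
    """Append the CRC32 of the exact on-wire bytes, big-endian (toy framing)."""
    crc = zlib.crc32(wire_bytes) & 0xFFFFFFFF
    return wire_bytes + struct.pack(">I", crc)


def check_frame(frame: bytes) -> bytes:
    """Validate the trailing CRC32 before any decompression is attempted."""
    body, crc_bytes = frame[:-4], frame[-4:]
    (expected,) = struct.unpack(">I", crc_bytes)
    if zlib.crc32(body) & 0xFFFFFFFF != expected:
        raise ValueError("corrupt frame")
    return body
```

Because the CRC covers the bytes as written on the wire, a corrupted compressed body is rejected before it ever reaches the decompressor, avoiding the segfault scenario described above.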
[jira] [Updated] (CASSANDRA-15907) Operational Improvements & Hardening for Replica Filtering Protection
[ https://issues.apache.org/jira/browse/CASSANDRA-15907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Rackliffe updated CASSANDRA-15907: Fix Version/s: 3.11.x 3.0.x > Operational Improvements & Hardening for Replica Filtering Protection > - > > Key: CASSANDRA-15907 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15907 > Project: Cassandra > Issue Type: Improvement > Components: Consistency/Coordination, Feature/2i Index >Reporter: Caleb Rackliffe >Assignee: Caleb Rackliffe >Priority: Normal > Labels: 2i, memory > Fix For: 3.0.x, 3.11.x, 4.0-beta > > > CASSANDRA-8272 uses additional space on the heap to ensure correctness for 2i > and filtering queries at consistency levels above ONE/LOCAL_ONE. There are a > few things we should follow up on, however, to make life a bit easier for > operators and generally de-risk usage: > (Note: Line numbers are based on {{trunk}} as of > {{3cfe3c9f0dcf8ca8b25ad111800a21725bf152cb}}.) > *Minor Optimizations* > * {{ReplicaFilteringProtection:114}} - Given we size them up-front, we may be > able to use simple arrays instead of lists for {{rowsToFetch}} and > {{originalPartitions}}. Alternatively (or also), we may be able to null out > references in these two collections more aggressively. (ex. Using > {{ArrayList#set()}} instead of {{get()}} in {{queryProtectedPartitions()}}, > assuming we pass {{toFetch}} as an argument to {{querySourceOnKey()}}.) > * {{ReplicaFilteringProtection:323}} - We may be able to use > {{EncodingStats.merge()}} and remove the custom {{stats()}} method. > * {{DataResolver:111 & 228}} - Cache an instance of > {{UnaryOperator#identity()}} instead of creating one on the fly. > * {{ReplicaFilteringProtection:217}} - We may be able to scatter/gather > rather than serially querying every row that needs to be completed. This > isn't a clear win perhaps, given it targets the latency of single queries and > adds some complexity. 
(Certainly a decent candidate to kick out of this issue entirely.)
> *Documentation and Intelligibility*
> * There are a few places (CHANGES.txt, tracing output in {{ReplicaFilteringProtection}}, etc.) where we mention "replica-side filtering protection" (which makes it seem like the coordinator doesn't filter) rather than "replica filtering protection" (which sounds more like what we actually do, which is protect ourselves against incorrect replica filtering results). It's a minor fix, but would avoid confusion.
> * The method call chain in {{DataResolver}} might be a bit simpler if we put the {{repairedDataTracker}} in {{ResolveContext}}.
> *Guardrails*
> * As it stands, we don't have a way to enforce an upper bound on the memory usage of {{ReplicaFilteringProtection}}, which caches row responses from the first round of requests. (Remember, these are later merged with the second round of results to complete the data for filtering.) Operators will likely need a way to protect themselves, i.e. simply fail queries if they hit a particular threshold rather than GC nodes into oblivion. (Having control over limits and page sizes doesn't quite get us there, because stale results _expand_ the number of incomplete results we must cache.) The fun question is how we do this, with the primary axes being scope (per-query, global, etc.) and granularity (per-partition, per-row, per-cell, actual heap usage, etc.). My starting disposition on the right trade-off between performance/complexity and accuracy is having something along the lines of cached rows per query. Prior art suggests this probably makes sense alongside things like {{tombstone_failure_threshold}} in {{cassandra.yaml}}.
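The "cached rows per query" guardrail suggested above could be as simple as a counter checked each time a row is cached. The class and threshold names below are hypothetical, not actual Cassandra code:

```python
class ReplicaFilteringGuard:
    """Hypothetical per-query guardrail: fail once cached rows pass a threshold."""

    def __init__(self, failure_threshold: int):
        self.failure_threshold = failure_threshold
        self.cached = 0

    def on_cache(self, rows: int = 1) -> None:
        """Record rows cached for replica filtering protection; fail fast past the limit."""
        self.cached += rows
        if self.cached > self.failure_threshold:
            # Fail the query instead of letting the cache grow without bound.
            raise RuntimeError(
                "replica filtering protection cached %d rows (threshold %d)"
                % (self.cached, self.failure_threshold)
            )
```

The design choice mirrors {{tombstone_failure_threshold}}: failing one query with a clear error is preferable to a coordinator-wide GC spiral.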
[jira] [Commented] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148893#comment-17148893 ] Caleb Rackliffe commented on CASSANDRA-15861:

[~jasonstack] I thought a bit more about our earlier chat (and had a quick chat w/ [~bdeggleston]), and it seems like the simplest thing might be handling the stats and index summary components in slightly different ways. The STATS component is small. We could just buffer it up, use that buffered size in the manifest, and stream that buffer. It special-cases this component, but we more or less avoid having to reason about the risk of blocking compactions, a repair completing, etc. For the SUMMARY, we take advantage of the fact that occasionally delaying the redistribution task isn't a particularly bad outcome. We have a simple lock that protects it (on {{SSTableReader}}, similar to what you've already mentioned, or as a threadsafe set of readers in a central location), i.e. streaming acquires it when the manifest is created and releases it when the index summary completes streaming (where that "completion" happens in the non-SSL case isn't 100% clear to me), and index redistribution acquires it _before_ it creates a transaction in {{getRestributionTransactions()}}, then releases it when the redistribution is complete (so we never have to block a compaction). Streaming might have to deal with a short delay if a redistribution is running, but a.) that doesn't happen very often and b.) the summary (I think) is usually not very large. ({{getRestributionTransactions()}} can ignore streaming SSTables just like it ignores compacting ones.)
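The SUMMARY locking scheme in the comment above can be sketched with a single lock shared by streaming and index-summary redistribution. All names here are illustrative, not the actual Cassandra classes:

```python
import threading

# Hypothetical lock guarding one sstable's index summary. Per the proposal:
# streaming holds it from manifest creation until the SUMMARY finishes
# streaming; redistribution holds it for the whole redistribution, so the
# summary can never be rewritten mid-transfer.
summary_lock = threading.Lock()


def stream_sstable(send_summary):
    with summary_lock:   # acquired when the streaming manifest is created
        send_summary()   # released once the SUMMARY component is fully sent


def redistribute_summaries(rebuild):
    with summary_lock:   # acquired before creating the redistribution transaction
        rebuild()        # released when redistribution completes
```

Either side may briefly wait on the other, but neither blocks a compaction, which is the property the comment is after.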
> Mutating sstable component may race with entire-sstable-streaming(ZCS) > causing checksum validation failure > -- > > Key: CASSANDRA-15861 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15861 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair, Consistency/Streaming, > Local/Compaction >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta > > > Flaky dtest: [test_dead_sync_initiator - > repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] > {code:java|title=stacktrace} > Unexpected error found in node logs (see stdout for full details). Errors: > [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 > CassandraEntireSSTableStreamReader.java:145 - [Stream > 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream > for table = keyspace1.standard1 > org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226) > at > org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140) > at > org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49) 
> at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36) > at > org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.IOException: Checksums do not match for > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > {code} > > In the above test, it executes "nodetool repair" on node1 and kills node2 > during repair. At the end, node3 reports checksum validation failure on > sstable transferred from node1. > {code:java|title=what happened} > 1. When repair starte
[jira] [Commented] (CASSANDRA-15900) Close channel and reduce buffer allocation during entire sstable streaming with SSL
[ https://issues.apache.org/jira/browse/CASSANDRA-15900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148877#comment-17148877 ] Dinesh Joshi commented on CASSANDRA-15900:

[~maedhroz] [~jasonstack] looks like there are a few failures. They're likely unrelated, but it would be great to double-check and make sure.

> Close channel and reduce buffer allocation during entire sstable streaming with SSL
> ---
>
> Key: CASSANDRA-15900
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15900
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Streaming and Messaging
> Reporter: ZhaoYang
> Assignee: ZhaoYang
> Priority: Normal
> Fix For: 4.0-beta
>
> CASSANDRA-15740 added the ability to stream an entire sstable by loading the on-disk file into a user-space off-heap buffer when SSL is enabled, because netty doesn't support zero-copy with SSL.
> But there are two issues:
> # The file channel is not closed.
> # A 1mb batch size is used. 1mb exceeds the buffer pool's max allocation size, thus it's all allocated outside the pool and will cause a large number of allocations.
> [Patch|https://github.com/apache/cassandra/pull/651]:
> # Close the file channel when the last batch is loaded into the off-heap bytebuffer. I don't think we need to wait until the buffer is flushed by netty.
> # Reduce the batch to 64kb, which is more buffer-pool friendly when streaming an entire sstable with SSL.
[jira] [Updated] (CASSANDRA-15538) 4.0 quality testing: Local Read/Write Path: Other Areas
[ https://issues.apache.org/jira/browse/CASSANDRA-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh McKenzie updated CASSANDRA-15538:
    Authors: Ekaterina Dimitrova, Sylvain Lebresne (was: Sylvain Lebresne)
    Reviewers: Blake Eggleston, Sam Tunnicliffe (was: Blake Eggleston, Ekaterina Dimitrova, Sam Tunnicliffe)

> 4.0 quality testing: Local Read/Write Path: Other Areas
> ---
>
> Key: CASSANDRA-15538
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15538
> Project: Cassandra
> Issue Type: Task
> Components: Test/dtest
> Reporter: Josh McKenzie
> Assignee: Sylvain Lebresne
> Priority: Normal
> Fix For: 4.0-beta
>
> Reference [doc from NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#] for context.
> *Shepherd: Aleksey Yeschenko*
> Testing in this area refers to the local read/write path (StorageProxy, ColumnFamilyStore, Memtable, SSTable reading/writing, etc). We are still finding numerous bugs and issues with the 3.0 storage engine rewrite (CASSANDRA-8099). For 4.0 we want to ensure that we thoroughly cover the local read/write path with techniques such as property-based testing, fuzzing ([example|http://cassandra.apache.org/blog/2018/10/17/finding_bugs_with_property_based_testing.html]), and a source audit.
[jira] [Assigned] (CASSANDRA-15580) 4.0 quality testing: Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-15580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh McKenzie reassigned CASSANDRA-15580: - Assignee: Benjamin Lerer (was: Berenguer Blasi) > 4.0 quality testing: Repair > --- > > Key: CASSANDRA-15580 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15580 > Project: Cassandra > Issue Type: Task > Components: Test/dtest >Reporter: Josh McKenzie >Assignee: Benjamin Lerer >Priority: Normal > Fix For: 4.0-beta > > > Reference [doc from > NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#] > for context. > *Shepherd: Blake Eggleston* > We aim for 4.0 to have the first fully functioning incremental repair > solution (CASSANDRA-9143)! Furthermore, we aim to verify that all types of > repair (full range, sub-range, incremental) function as expected, as well as > ensuring that community tools such as Reaper work. CASSANDRA-3200 adds an > experimental option to reduce the amount of data streamed during repair; we > should write more tests and see how it works with big nodes.
[jira] [Updated] (CASSANDRA-15580) 4.0 quality testing: Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-15580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh McKenzie updated CASSANDRA-15580: -- Authors: Benjamin Lerer, Stephen Mallette (was: Berenguer Blasi) Reviewers: Marcus Eriksson, Vinay Chella (was: Marcus Eriksson, Stephen Mallette, Vinay Chella) > 4.0 quality testing: Repair > --- > > Key: CASSANDRA-15580 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15580 > Project: Cassandra > Issue Type: Task > Components: Test/dtest >Reporter: Josh McKenzie >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-beta > > > Reference [doc from > NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#] > for context. > *Shepherd: Blake Eggleston* > We aim for 4.0 to have the first fully functioning incremental repair > solution (CASSANDRA-9143)! Furthermore, we aim to verify that all types of > repair (full range, sub-range, incremental) function as expected, as well as > ensuring that community tools such as Reaper work. CASSANDRA-3200 adds an > experimental option to reduce the amount of data streamed during repair; we > should write more tests and see how it works with big nodes.
[jira] [Updated] (CASSANDRA-15579) 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, and Read Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-15579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh McKenzie updated CASSANDRA-15579: -- Authors: Andres de la Peña, Sylvain Lebresne (was: Andres de la Peña) Reviewers: (was: Sylvain Lebresne) > 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, > and Read Repair > > > Key: CASSANDRA-15579 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15579 > Project: Cassandra > Issue Type: Task > Components: Test/unit >Reporter: Josh McKenzie >Assignee: Andres de la Peña >Priority: Normal > Fix For: 4.0-beta > > > Reference [doc from > NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#] > for context. > *Shepherd: Blake Eggleston* > Testing in this area focuses on non-node-local aspects of the read-write > path: coordination, replication, read repair, etc.
[jira] [Commented] (CASSANDRA-15234) Standardise config and JVM parameters
[ https://issues.apache.org/jira/browse/CASSANDRA-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148858#comment-17148858 ] Benedict Elliott Smith commented on CASSANDRA-15234: bq. Given the framework makes it trivial to support old names, having no properties marked for removal in 5.0 works for me +1 > Standardise config and JVM parameters > - > > Key: CASSANDRA-15234 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15234 > Project: Cassandra > Issue Type: Bug > Components: Local/Config >Reporter: Benedict Elliott Smith >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.0-alpha > > Attachments: CASSANDRA-15234-3-DTests-JAVA8.txt > > > We have a bunch of inconsistent names and config patterns in the codebase, > both from the yamls and JVM properties. It would be nice to standardise the > naming (such as otc_ vs internode_) as well as the provision of values with > units - while maintaining perpetual backwards compatibility with the old > parameter names, of course. > For temporal units, I would propose parsing strings with suffixes of: > {code} > u|micros(econds?)? > ms|millis(econds?)? > s(econds?)? > m(inutes?)? > h(ours?)? > d(ays?)? > mo(nths?)? > {code} > For rate units, I would propose parsing any of the standard {{B/s, KiB/s, > MiB/s, GiB/s, TiB/s}}. > Perhaps, to avoid ambiguity, we could not accept bauds {{bs, Mbps}} or > powers of 1000 such as {{KB/s}}, given these are regularly used for either > their old or new definition e.g. {{KiB/s}}, or we could support them and > simply log the value in bytes/s.
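The temporal suffixes proposed in the ticket description can be exercised with a small illustrative parser. This is a sketch of the proposal only, not the implementation that landed in CASSANDRA-15234, and treating a month as 30 days is an arbitrary assumption made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of a parser for the temporal suffixes proposed in the ticket description.
// Illustrative only; treating a month as 30 days is an assumption for this example.
public class DurationParser {
    // Longest-match ordering matters: "micros"/"millis"/"mo"/"ms" must be tried
    // before the bare "m" (minutes) alternative.
    private static final Pattern PATTERN = Pattern.compile(
        "(\\d+)\\s*(micros(?:econds?)?|millis(?:econds?)?|mo(?:nths?)?|ms|u|s(?:econds?)?|m(?:inutes?)?|h(?:ours?)?|d(?:ays?)?)");

    public static long toMicros(String input) {
        Matcher m = PATTERN.matcher(input.trim());
        if (!m.matches())
            throw new IllegalArgumentException("cannot parse duration: " + input);
        long value = Long.parseLong(m.group(1));
        String suffix = m.group(2);
        if (suffix.equals("u") || suffix.startsWith("micro")) return value;
        if (suffix.equals("ms") || suffix.startsWith("milli")) return value * 1_000L;
        if (suffix.startsWith("mo")) return value * 30L * 86_400_000_000L; // month = 30 days (assumption)
        if (suffix.startsWith("s"))  return value * 1_000_000L;
        if (suffix.startsWith("m"))  return value * 60_000_000L;           // minutes; mo/ms handled above
        if (suffix.startsWith("h"))  return value * 3_600_000_000L;
        return value * 86_400_000_000L;                                    // days
    }

    public static void main(String[] args) {
        System.out.println(toMicros("10ms"));   // 10000
        System.out.println(toMicros("5m"));     // 300000000
        System.out.println(toMicros("2hours")); // 7200000000
    }
}
```

The main subtlety the proposal's suffix list hides is alternation order: "5m" must parse as minutes while "5ms" parses as milliseconds and "5mo" as months, so the longer suffixes have to be tried first.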
[jira] [Commented] (CASSANDRA-15909) Make Table/Keyspace Metric Names Consistent With Each Other
[ https://issues.apache.org/jira/browse/CASSANDRA-15909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148856#comment-17148856 ] David Capwell commented on CASSANDRA-15909: --- As long as this is done without breaking the old names, then sounds good to me. > Make Table/Keyspace Metric Names Consistent With Each Other > --- > > Key: CASSANDRA-15909 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15909 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Stephen Mallette >Assignee: Stephen Mallette >Priority: Normal > Fix For: 4.0-beta > > > As part of CASSANDRA-15821 it became apparent that certain metric names found > in keyspace and tables had different names but were in fact the same metric - > they are as follows: > * Table.SyncTime == Keyspace.RepairSyncTime > * Table.RepairedDataTrackingOverreadRows == Keyspace.RepairedOverreadRows > * Table.RepairedDataTrackingOverreadTime == Keyspace.RepairedOverreadTime > * Table.AllMemtablesHeapSize == Keyspace.AllMemtablesOnHeapDataSize > * Table.AllMemtablesOffHeapSize == Keyspace.AllMemtablesOffHeapDataSize > * Table.MemtableOnHeapSize == Keyspace.MemtableOnHeapDataSize > * Table.MemtableOffHeapSize == Keyspace.MemtableOffHeapDataSize > Also, client metrics are the only metrics to start with a lower case letter. > Change those to upper case to match all the other metrics. > Unifying this naming would help make metrics more consistent as part of > CASSANDRA-15582
[jira] [Updated] (CASSANDRA-15580) 4.0 quality testing: Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-15580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh McKenzie updated CASSANDRA-15580: -- Reviewers: Marcus Eriksson, Stephen Mallette, Vinay Chella (was: Marcus Eriksson, Vinay Chella) > 4.0 quality testing: Repair > --- > > Key: CASSANDRA-15580 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15580 > Project: Cassandra > Issue Type: Task > Components: Test/dtest >Reporter: Josh McKenzie >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-beta > > > Reference [doc from > NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#] > for context. > *Shepherd: Blake Eggleston* > We aim for 4.0 to have the first fully functioning incremental repair > solution (CASSANDRA-9143)! Furthermore, we aim to verify that all types of > repair (full range, sub-range, incremental) function as expected, as well as > ensuring that community tools such as Reaper work. CASSANDRA-3200 adds an > experimental option to reduce the amount of data streamed during repair; we > should write more tests and see how it works with big nodes.
[jira] [Updated] (CASSANDRA-15538) 4.0 quality testing: Local Read/Write Path: Other Areas
[ https://issues.apache.org/jira/browse/CASSANDRA-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh McKenzie updated CASSANDRA-15538: -- Reviewers: Blake Eggleston, Ekaterina Dimitrova, Sam Tunnicliffe (was: Blake Eggleston, Sam Tunnicliffe) > 4.0 quality testing: Local Read/Write Path: Other Areas > --- > > Key: CASSANDRA-15538 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15538 > Project: Cassandra > Issue Type: Task > Components: Test/dtest >Reporter: Josh McKenzie >Assignee: Sylvain Lebresne >Priority: Normal > Fix For: 4.0-beta > > > Reference [doc from > NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#] > for context. > *Shepherd: Aleksey Yeschenko* > Testing in this area refers to the local read/write path (StorageProxy, > ColumnFamilyStore, Memtable, SSTable reading/writing, etc). We are still > finding numerous bugs and issues with the 3.0 storage engine rewrite > (CASSANDRA-8099). For 4.0 we want to ensure that we thoroughly cover the > local read/write path with techniques such as property-based testing, fuzzing > ([example|http://cassandra.apache.org/blog/2018/10/17/finding_bugs_with_property_based_testing.html]), > and a source audit.
[jira] [Updated] (CASSANDRA-15579) 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, and Read Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-15579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh McKenzie updated CASSANDRA-15579: -- Reviewers: Sylvain Lebresne > 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, > and Read Repair > > > Key: CASSANDRA-15579 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15579 > Project: Cassandra > Issue Type: Task > Components: Test/unit >Reporter: Josh McKenzie >Assignee: Andres de la Peña >Priority: Normal > Fix For: 4.0-beta > > > Reference [doc from > NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#] > for context. > *Shepherd: Blake Eggleston* > Testing in this area focuses on non-node-local aspects of the read-write > path: coordination, replication, read repair, etc.
[jira] [Commented] (CASSANDRA-15234) Standardise config and JVM parameters
[ https://issues.apache.org/jira/browse/CASSANDRA-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148852#comment-17148852 ] David Capwell commented on CASSANDRA-15234: --- Given the framework provided by this patch, the following wouldn't be hard to support (all of them, not just one or two): 1) no warning or plan to remove; 2) a warning that it will be removed some day; 3) a warning naming the specific version that will remove it. So, we could do the following {code} // provide a warning that this will no longer be supported after 5.0 @Replaces(oldName = "native_transport_idle_timeout_in_ms", converter = Converter.MillisDurationConverter.class, scheduledRemoveBy = "5.0") public volatile Duration native_transport_idle_timeout = new Duration("0ms"); // provide a warning that the property is deprecated and will be removed one day @Replaces(oldName = "native_transport_idle_timeout_in_ms", converter = Converter.MillisDurationConverter.class, deprecated = true) public volatile Duration native_transport_idle_timeout = new Duration("0ms"); // no warning, both properties are fully supported @Replaces(oldName = "native_transport_idle_timeout_in_ms", converter = Converter.MillisDurationConverter.class) public volatile Duration native_transport_idle_timeout = new Duration("0ms"); {code} Given the framework makes it trivial to support old names, having no properties marked for removal in 5.0 works for me. If we really want to migrate usage to a new name, then mark it to be removed one day, and anything that is just personal preference (such as enable at the start or end of the name) can have no warning; does this make sense? 
> Standardise config and JVM parameters > - > > Key: CASSANDRA-15234 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15234 > Project: Cassandra > Issue Type: Bug > Components: Local/Config >Reporter: Benedict Elliott Smith >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.0-alpha > > Attachments: CASSANDRA-15234-3-DTests-JAVA8.txt > > > We have a bunch of inconsistent names and config patterns in the codebase, > both from the yamls and JVM properties. It would be nice to standardise the > naming (such as otc_ vs internode_) as well as the provision of values with > units - while maintaining perpetual backwards compatibility with the old > parameter names, of course. > For temporal units, I would propose parsing strings with suffixes of: > {code} > u|micros(econds?)? > ms|millis(econds?)? > s(econds?)? > m(inutes?)? > h(ours?)? > d(ays?)? > mo(nths?)? > {code} > For rate units, I would propose parsing any of the standard {{B/s, KiB/s, > MiB/s, GiB/s, TiB/s}}. > Perhaps, to avoid ambiguity, we could not accept bauds {{bs, Mbps}} or > powers of 1000 such as {{KB/s}}, given these are regularly used for either > their old or new definition e.g. {{KiB/s}}, or we could support them and > simply log the value in bytes/s.
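A minimal sketch of how the {{@Replaces}} machinery discussed in the comments above might look. The annotation's shape is inferred from the usage examples in the comment, not from the actual CASSANDRA-15234 patch (which may differ), and "hypothetical_old_setting_in_ms" is an invented name used purely for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Hypothetical declaration of the @Replaces annotation sketched in the comments.
// Inferred from the usage examples above; the real patch may define it differently.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Replaces {
    String oldName();
    boolean deprecated() default false;
    String scheduledRemoveBy() default ""; // e.g. "5.0"; empty means no planned removal
}

public class ReplacesDemo {
    // Case 2: warn that the old name is deprecated and will be removed one day.
    @Replaces(oldName = "native_transport_idle_timeout_in_ms", deprecated = true)
    public volatile long nativeTransportIdleTimeout = 0;

    // Case 3: warn naming the specific version that removes the old name ("hypothetical_old_setting_in_ms" is invented).
    @Replaces(oldName = "hypothetical_old_setting_in_ms", scheduledRemoveBy = "5.0")
    public volatile long hypotheticalNewSetting = 0;

    // Map a config field to one of the three warning behaviours from the comment.
    public static String warningForField(String fieldName) {
        try {
            Field f = ReplacesDemo.class.getField(fieldName);
            Replaces r = f.getAnnotation(Replaces.class);
            if (r == null) return "none";
            if (!r.scheduledRemoveBy().isEmpty())
                return "'" + r.oldName() + "' will be removed in " + r.scheduledRemoveBy();
            if (r.deprecated())
                return "'" + r.oldName() + "' is deprecated and will be removed some day";
            return "none"; // case 1: both names fully supported, no warning
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(fieldName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(warningForField("nativeTransportIdleTimeout"));
        System.out.println(warningForField("hypotheticalNewSetting"));
    }
}
```

Keeping the warning policy in annotation attributes means the config loader can decide at startup, per field, which of the three behaviours applies, without any per-property special-casing.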
[jira] [Comment Edited] (CASSANDRA-15234) Standardise config and JVM parameters
[ https://issues.apache.org/jira/browse/CASSANDRA-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148852#comment-17148852 ] David Capwell edited comment on CASSANDRA-15234 at 6/30/20, 5:33 PM: - Given the framework provided by this patch, the following wouldn't be hard to support (all of them, not just one or two): 1) no warning or plan to remove; 2) a warning that it will be removed some day; 3) a warning naming the specific version that will remove it. So, we could do the following {code} // provide a warning that this will no longer be supported after 5.0 @Replaces(oldName = "native_transport_idle_timeout_in_ms", scheduledRemoveBy = "5.0") public volatile Duration native_transport_idle_timeout = new Duration("0ms"); // provide a warning that the property is deprecated and will be removed one day @Replaces(oldName = "native_transport_idle_timeout_in_ms", deprecated = true) public volatile Duration native_transport_idle_timeout = new Duration("0ms"); // no warning, both properties are fully supported @Replaces(oldName = "native_transport_idle_timeout_in_ms") public volatile Duration native_transport_idle_timeout = new Duration("0ms"); {code} Given the framework makes it trivial to support old names, having no properties marked for removal in 5.0 works for me. If we really want to migrate usage to a new name, then mark it to be removed one day, and anything that is just personal preference (such as enable at the start or end of the name) can have no warning; does this make sense? 
was (Author: dcapwell): Given the framework provided by this patch, the following wouldn't be hard to support (all, not just 1 or 2) 1) no warning or plan to remove 2) warning that it will be removed some day 3) warning on specific version which will remove So, we could do the following {code} // provide a warning that this will no longer be supported after 5.0 @Replaces(oldName = "native_transport_idle_timeout_in_ms", converter = Converter.MillisDurationConverter.class, scheduledRemoveBy = "5.0") public volatile Duration native_transport_idle_timeout = new Duration("0ms"); // provide a warning that the property is deprecated and will be removed one day @Replaces(oldName = "native_transport_idle_timeout_in_ms", converter = Converter.MillisDurationConverter.class, deprecated = true) public volatile Duration native_transport_idle_timeout = new Duration("0ms"); // no warning, both properties are fully supported @Replaces(oldName = "native_transport_idle_timeout_in_ms", converter = Converter.MillisDurationConverter.class) public volatile Duration native_transport_idle_timeout = new Duration("0ms"); {code} Given the framework makes it trivial to support old names, having no properties marked for removal in 5.0 works for me. If we really want to migrate usage to a new name, then mark it to be removed one day, and stuff which is personal preference (such as enable at the start or end of the name) can have no warning; does this make sense? > Standardise config and JVM parameters > - > > Key: CASSANDRA-15234 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15234 > Project: Cassandra > Issue Type: Bug > Components: Local/Config >Reporter: Benedict Elliott Smith >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.0-alpha > > Attachments: CASSANDRA-15234-3-DTests-JAVA8.txt > > > We have a bunch of inconsistent names and config patterns in the codebase, > both from the yamls and JVM properties. 
It would be nice to standardise the > naming (such as otc_ vs internode_) as well as the provision of values with > units - while maintaining perpetual backwards compatibility with the old > parameter names, of course. > For temporal units, I would propose parsing strings with suffixes of: > {code} > u|micros(econds?)? > ms|millis(econds?)? > s(econds?)? > m(inutes?)? > h(ours?)? > d(ays?)? > mo(nths?)? > {code} > For rate units, I would propose parsing any of the standard {{B/s, KiB/s, > MiB/s, GiB/s, TiB/s}}. > Perhaps, to avoid ambiguity, we could not accept bauds {{bs, Mbps}} or > powers of 1000 such as {{KB/s}}, given these are regularly used for either > their old or new definition e.g. {{KiB/s}}, or we could support them and > simply log the value in bytes/s.
[jira] [Updated] (CASSANDRA-15909) Make Table/Keyspace Metric Names Consistent With Each Other
[ https://issues.apache.org/jira/browse/CASSANDRA-15909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Mallette updated CASSANDRA-15909: - Description: As part of CASSANDRA-15821 it became apparent that certain metric names found in keyspace and tables had different names but were in fact the same metric - they are as follows: * Table.SyncTime == Keyspace.RepairSyncTime * Table.RepairedDataTrackingOverreadRows == Keyspace.RepairedOverreadRows * Table.RepairedDataTrackingOverreadTime == Keyspace.RepairedOverreadTime * Table.AllMemtablesHeapSize == Keyspace.AllMemtablesOnHeapDataSize * Table.AllMemtablesOffHeapSize == Keyspace.AllMemtablesOffHeapDataSize * Table.MemtableOnHeapSize == Keyspace.MemtableOnHeapDataSize * Table.MemtableOffHeapSize == Keyspace.MemtableOffHeapDataSize Also, client metrics are the only metrics to start with a lower case letter. Change those to upper case to match all the other metrics. Unifying this naming would help make metrics more consistent as part of CASSANDRA-15582 was: As part of CASSANDRA-15821 it became apparent that certain metric names found in keyspace and tables had different names but were in fact the same metric - they are as follows: * Table.SyncTime == Keyspace.RepairSyncTime * Table.RepairedDataTrackingOverreadRows == Keyspace.RepairedOverreadRows * Table.RepairedDataTrackingOverreadTime == Keyspace.RepairedOverreadTime * Table.AllMemtablesHeapSize == Keyspace.AllMemtablesOnHeapDataSize * Table.AllMemtablesOffHeapSize == Keyspace.AllMemtablesOffHeapDataSize * Table.MemtableOnHeapSize == Keyspace.MemtableOnHeapDataSize * Table.MemtableOffHeapSize == Keyspace.MemtableOffHeapDataSize Unifying this naming would help make metrics more consistent as part of CASSANDRA-15582 > Make Table/Keyspace Metric Names Consistent With Each Other > --- > > Key: CASSANDRA-15909 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15909 > Project: Cassandra > Issue Type: Improvement > Components: 
Observability/Metrics >Reporter: Stephen Mallette >Assignee: Stephen Mallette >Priority: Normal > Fix For: 4.0-beta > > > As part of CASSANDRA-15821 it became apparent that certain metric names found > in keyspace and tables had different names but were in fact the same metric - > they are as follows: > * Table.SyncTime == Keyspace.RepairSyncTime > * Table.RepairedDataTrackingOverreadRows == Keyspace.RepairedOverreadRows > * Table.RepairedDataTrackingOverreadTime == Keyspace.RepairedOverreadTime > * Table.AllMemtablesHeapSize == Keyspace.AllMemtablesOnHeapDataSize > * Table.AllMemtablesOffHeapSize == Keyspace.AllMemtablesOffHeapDataSize > * Table.MemtableOnHeapSize == Keyspace.MemtableOnHeapDataSize > * Table.MemtableOffHeapSize == Keyspace.MemtableOffHeapDataSize > Also, client metrics are the only metrics to start with a lower case letter. > Change those to upper case to match all the other metrics. > Unifying this naming would help make metrics more consistent as part of > CASSANDRA-15582
[jira] [Commented] (CASSANDRA-15870) When 3.0 reads 2.1 data with a regular column set it expects the cellName to contain an element and fails if not true
[ https://issues.apache.org/jira/browse/CASSANDRA-15870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148832#comment-17148832 ] David Capwell commented on CASSANDRA-15870: --- I have been bad about updating this; sorry. I have a patch, but I have 4 other corruption issues on my plate, so I'm prioritizing those over submitting this patch. If anyone thinks they are bitten by this issue, I can try to give higher priority to open-sourcing this patch. > When 3.0 reads 2.1 data with a regular column set it expects the > cellName to contain an element and fails if not true > -- > > Key: CASSANDRA-15870 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15870 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Schema, Local/SSTable >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Fix For: 3.0.x, 3.11.x > > > {code} > java.lang.AssertionError > at org.apache.cassandra.db.rows.BufferCell.<init>(BufferCell.java:48) > at > org.apache.cassandra.db.LegacyLayout$CellGrouper.addCell(LegacyLayout.java:1461) > at > org.apache.cassandra.db.LegacyLayout$CellGrouper.addAtom(LegacyLayout.java:1380) > at > org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$UnfilteredIterator.readRow(UnfilteredDeserializer.java:549) > at > org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$UnfilteredIterator.hasNext(UnfilteredDeserializer.java:523) > at > org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer.hasNext(UnfilteredDeserializer.java:336) > at > org.apache.cassandra.io.sstable.SSTableSimpleIterator$OldFormatIterator.readStaticRow(SSTableSimpleIterator.java:133) > at > org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:59) > at > org.apache.cassandra.io.sstable.format.big.BigTableScanner$KeyScanningIterator$1.initializeIterator(BigTableScanner.java:364) > at > 
org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.maybeInit(LazilyInitializedUnfilteredRowIterator.java:48) > at > org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.isReverseOrder(LazilyInitializedUnfilteredRowIterator.java:65) > at > org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$1.reduce(UnfilteredPartitionIterators.java:132) > at > org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$1.reduce(UnfilteredPartitionIterators.java:123) > at > org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:207) > at > org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:160) > at > org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) > at > org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.hasNext(UnfilteredPartitionIterators.java:174) > at > org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:93) > at > org.apache.cassandra.db.compaction.CompactionIterator.hasNext(CompactionIterator.java:240) > at > org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:191) > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > at > org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:89) > at > org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:100) > at > org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:345) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at > 
org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:83) > at java.lang.Thread.run(Thread.java:748) > {code} > This exception is similar to other JIRAs such as CASSANDRA-14113, but after > root-causing both exceptions, they only share the same symptom and not the > same root cause; hence a new JIRA. > This was found when a frozen collection was encountered where a multi-cell > collection was expected. When this happens, LegacyCellName#collectionElement > comes back as null, which eventually trips the assertion in BufferCell > (a complex cell needs a path).
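The invariant that fires in the stack trace above can be modelled in a few lines. All names here are illustrative; the real check is a Java {{assert}} in org.apache.cassandra.db.rows.BufferCell's constructor, modelled below with an exception so the example does not depend on assertions being enabled.

```java
// Simplified model of the invariant behind the AssertionError above: a cell of a
// complex (multi-cell) column must carry a cell path, and a cell of a simple or
// frozen column must not. Names are illustrative, not the real Cassandra classes.
public class CellInvariantDemo {
    static class Column {
        final boolean isComplex; // multi-cell collection columns are "complex"
        Column(boolean isComplex) { this.isComplex = isComplex; }
    }

    static class Cell {
        Cell(Column column, Object path) {
            // Stand-in for BufferCell's assert: "complex cell needs a path".
            if (column.isComplex != (path != null))
                throw new IllegalStateException("complex cell needs a path");
        }
    }

    // True when constructing the cell satisfies the invariant.
    public static boolean construct(boolean complexColumn, Object collectionElement) {
        try {
            new Cell(new Column(complexColumn), collectionElement);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 2.1 wrote a frozen collection, so collectionElement came back null, but
        // 3.0's schema said the column was multi-cell: the invariant is violated.
        System.out.println(construct(true, null)); // false
    }
}
```

This is exactly the mismatch the description names: the frozen-collection read produces a null collectionElement while the schema claims a multi-cell column, so the constructor-time check fails.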
[jira] [Comment Edited] (CASSANDRA-15299) CASSANDRA-13304 follow-up: improve checksumming and compression in protocol v5-beta
[ https://issues.apache.org/jira/browse/CASSANDRA-15299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148831#comment-17148831 ] Sam Tunnicliffe edited comment on CASSANDRA-15299 at 6/30/20, 5:08 PM: --- Thanks for the comments [~ifesdjeen] & [~omichallat], I've pushed a few commits to the [branch|https://github.com/beobal/cassandra/commits/15299-trunk]. {quote} There are several things that I wanted to bring to your attention: {quote} I've handled most of these in a refactor of Flusher. As you suggested, for framed items we now collate the frames and only allocate the payloads when we flush to the netty channel. So now, we allocate the payload based on the actual number of bytes required for the specific channel. {quote}{{ExceptionHandlers$PostV5ExceptionHandler#exceptionCaught}}: when flushing an exception, we don't call release on the payload. {quote} Included in the {{minor cleanups}} commit. {quote}There are several places in {{SimpleClient}} where {{largePayload#release}} isn't called. {quote} I've refactored the flushing of large messages in {{SimpleClient}} to match {{Flusher}}, so this is working properly now. {quote}Other things... {quote} {quote}{{Dispatcher#processRequest}}, we don't need to cast error to {{Message.Response}} if we change its type to {{ErrorMessage}}. {quote} In {{CQLMessageHandler#releaseAfterFlush}}, we can call {{sourceFrame#release()}} instead of {{sourceFrame.body.release()}} for consistency with other calls. Both are in {{minor cleanups}}. {quote}{{Server#requestPayloadInFlightPerEndpoint}} can be a non-static {{Server}} member. {quote} If you don't mind, I'd prefer to leave this as it is for now as it's pre-existing and changing it would require reworking CASSANDRA-15519 (changing limits at runtime). {quote}Should we hide {{flusher.queued.add()}} behind a method to disallow accessing the queue directly? {quote} I've done this, but I'm not 100% convinced of its utility. 
As the two {{Flusher}} subclasses need access to the queue, we have to provide package private methods {{poll}} and {{isEmpty}} as well as one to {{enqueue}}. So unless we move {{Flusher}} to its own subpackage, the queue is effectively visible to everything else in {{o.a.c.Transport}} {quote}We can change the code a bit to make {{FlushItemConverter}} instances explicit. Right now, we basically have two converters both called {{#toFlushItem}} in {{CQLMessageHandler}} and {{LegacyDispatchHandler}}. We could have them as inner classes. It's somewhat useful since if you change the signature of this method, or stop using it, it'll be hard to find that it is actually an implementation of converter. {quote} I've left this as it is just for the moment. I'm working on some tests which supply a lambda to act as the converter, so I'll come back to this when those have solidified a bit more. {quote}Looks like {{MessageConsumer}} could be generic, since we cast it to either request or response. {quote} I've parameterised {{MessageConsumer}} & {{CQLMessageHandler}} according to the subclass of Message they expect and extended this a bit by moving the logic out of {{Message$ProtocolEncoder}} to an abstract {{Message$Decoder}} with concrete subclasses for {{Request}} and {{Response}}. {quote}Looks like {{CQLMessageHandler#processCorruptFrame}}, initially had an intention of handling recovery, but now just throws a CRC exception regardless. This does match description, but usage of {{isRecoverable}} seems to be redundant here, unless we change semantics of recovery. {quote} It is somewhat redundant here, except that it logs a slightly different message to indicate whether the CRC mismatch was found in the frame header or body. I'll leave it as it is for now as it's technically possible to recover from a corrupt body, but would be problematic for clients just now. I still have some comments to address, as well as those from [~omichallat] ... 
{quote}{{Frame$Decoder}} and other classes that are related to legacy path can be extracted to a separate class, since {{Frame}} itself is still useful, but classes that facilitate legacy encoding/decoding/etc can be extracted. {quote} {quote}{{Frame#encodeHeaderInto}} seems to be duplicating the logic we have in {{Frame$Encoder#encodeHeader}}, should we unify the two? Maybe we can have encoding/decoding methods shared for both legacy and new paths, for example, as static methods? {quote} {quote}As you have mentioned, it would be great to rename {{Frame}} to something different, like {{Envelope}}, since right now we have {{FrameDecoder#Frame}} and {{Frame$Decoder}} and variable names that correspond with class names, which makes it all hard to follow. {quote} was (Author: beobal): Thanks for the comments [~ifesdjeen] & [~omichallat], I've pushed a few commits to the [branch|https://github.com/beobal/cassandra/commits/15299-trunk]. {qu
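The two Flusher ideas discussed in this review (hiding the queue behind an enqueue method, and releasing each item's ref-counted payload once it has been flushed) can be sketched generically. This is not the actual o.a.c.transport.Flusher, and it deliberately omits all netty specifics; every name below is illustrative.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Generic sketch of two ideas from the review discussion above: the flush queue
// is private behind enqueue(), and each payload is released after being flushed.
// Not the real Cassandra Flusher; no netty involved.
public class FlusherSketch {
    public interface Payload { void release(); }

    public static class FlushItem {
        final Payload payload;
        boolean released = false;
        public FlushItem(Payload payload) { this.payload = payload; }
        void releaseAfterFlush() { payload.release(); released = true; }
        public boolean isReleased() { return released; }
    }

    private final Queue<FlushItem> queued = new ArrayDeque<>(); // not reachable directly

    // The only way to add work: callers can no longer touch the queue itself.
    public void enqueue(FlushItem item) { queued.add(item); }

    // Drain the queue, "flush" each item, then release its payload; returns items flushed.
    public int flush() {
        int flushed = 0;
        for (FlushItem item; (item = queued.poll()) != null; ) {
            // ... a real flusher would write item's payload to the channel here ...
            item.releaseAfterFlush();
            flushed++;
        }
        return flushed;
    }

    public static void main(String[] args) {
        FlusherSketch flusher = new FlusherSketch();
        int[] releases = { 0 };
        flusher.enqueue(new FlushItem(() -> releases[0]++));
        flusher.enqueue(new FlushItem(() -> releases[0]++));
        System.out.println("flushed=" + flusher.flush() + " released=" + releases[0]);
    }
}
```

Making the queue private is what the "hide {{flusher.queued.add()}} behind a method" suggestion amounts to; as the comment notes, its value is limited while subclasses still need poll/isEmpty access within the same package.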
[jira] [Commented] (CASSANDRA-15299) CASSANDRA-13304 follow-up: improve checksumming and compression in protocol v5-beta
[ https://issues.apache.org/jira/browse/CASSANDRA-15299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148831#comment-17148831 ] Sam Tunnicliffe commented on CASSANDRA-15299: - Thanks for the comments [~ifesdjeen] & [~omichallat]. {quote} There are several things that I wanted to bring to your attention: {quote} I've handled most of these in a refactor of {{Flusher}}. As you suggested, for framed items we now collate the frames and only allocate the payloads when we flush to the netty channel. So now, we allocate the payload based on the actual number of bytes required for the specific channel. {quote}{{ExceptionHandlers$PostV5ExceptionHandler#exceptionCaught}}: when flushing an exception, we don't call release on the payload. {quote} Included in the {{minor cleanups}} commit. {quote}There are several places in {{SimpleClient}} where {{largePayload#release}} isn't called. {quote} I've refactored the flushing of large messages in {{SimpleClient}} to match {{Flusher}}, so this is working properly now. {quote}Other things... {quote} {quote}{{Dispatcher#processRequest}}, we don't need to cast error to {{Message.Response}} if we change its type to {{ErrorMessage}}. {quote} {quote}In {{CQLMessageHandler#releaseAfterFlush}}, we can call {{sourceFrame#release()}} instead of {{sourceFrame.body.release()}} for consistency with other calls. {quote} Both in {{minor cleanups}}. {quote}{{Server#requestPayloadInFlightPerEndpoint}} can be a non-static {{Server}} member. {quote} If you don't mind, I'd prefer to leave this as it is for now, as it's pre-existing and changing it would require reworking CASSANDRA-15519 (changing limits at runtime). {quote}Should we hide {{flusher.queued.add()}} behind a method to disallow accessing the queue directly? {quote} I've done this, but I'm not 100% convinced of its utility. As the two {{Flusher}} subclasses need access to the queue, we have to provide package private methods {{poll}} and {{isEmpty}} as well as one to {{enqueue}}. 
So unless we move {{Flusher}} to its own subpackage, the queue is effectively visible to everything else in {{o.a.c.transport}}. {quote}We can change the code a bit to make {{FlushItemConverter}} instances explicit. Right now, we basically have two converters both called {{#toFlushItem}} in {{CQLMessageHandler}} and {{LegacyDispatchHandler}}. We could have them as inner classes. It's somewhat useful since if you change the signature of this method, or stop using it, it'll be hard to find that it is actually an implementation of converter. {quote} I've left this as it is just for the moment. I'm working on some tests which supply a lambda to act as the converter, so I'll come back to this when those have solidified a bit more. {quote}Looks like {{MessageConsumer}} could be generic, since we cast it to either request or response. {quote} I've parameterised {{MessageConsumer}} & {{CQLMessageHandler}} according to the subclass of {{Message}} they expect and extended this a bit by moving the logic out of {{Message$ProtocolEncoder}} to an abstract {{Message$Decoder}} with concrete subclasses for {{Request}} and {{Response}}. {quote}Looks like {{CQLMessageHandler#processCorruptFrame}} initially had an intention of handling recovery, but now just throws a CRC exception regardless. This does match the description, but usage of {{isRecoverable}} seems to be redundant here, unless we change the semantics of recovery. {quote} It is somewhat redundant here, except that it logs a slightly different message to indicate whether the CRC mismatch was found in the frame header or body. I'll leave it as it is for now, as it's technically possible to recover from a corrupt body, but it would be problematic for clients just now. I still have some comments to address, as well as those from [~omichallat] ... 
{quote}{{Frame$Decoder}} and other classes that are related to the legacy path can be extracted to a separate class, since {{Frame}} itself is still useful, but classes that facilitate legacy encoding/decoding/etc can be extracted. {quote} {quote}{{Frame#encodeHeaderInto}} seems to be duplicating the logic we have in {{Frame$Encoder#encodeHeader}}, should we unify the two? Maybe we can have encoding/decoding methods shared for both legacy and new paths, for example, as static methods? {quote} {quote}As you have mentioned, it would be great to rename {{Frame}} to something different, like {{Envelope}}, since right now we have {{FrameDecoder#Frame}} and {{Frame$Decoder}} and variable names that correspond with class names, which makes it all hard to follow. {quote} > CASSANDRA-13304 follow-up: improve checksumming and compression in protocol > v5-beta > --- > > Key: CASSANDRA-15299 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15299 > Project: Cassandra > Issue Type: Improvement >
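The Flusher refactor described in this thread (collating queued frames, then allocating a single payload sized to the bytes actually pending when flushing to the channel) can be sketched as a toy model. This is an illustrative simulation with hypothetical names, not the actual netty pipeline code:

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Queue;

// Sketch of the flush path: frames are queued as they are encoded, and one
// right-sized payload is allocated per flush, rather than one per item.
// All names here are hypothetical, for illustration only.
public class FlusherSketch
{
    private final Queue<ByteBuffer> queued = new ArrayDeque<>();

    void enqueue(ByteBuffer frame)
    {
        queued.add(frame);
    }

    // Collate all pending frames into a single payload allocated with the
    // exact number of bytes required for this channel's pending writes.
    ByteBuffer flush()
    {
        int total = 0;
        for (ByteBuffer frame : queued)
            total += frame.remaining();

        ByteBuffer payload = ByteBuffer.allocate(total);
        ByteBuffer frame;
        while ((frame = queued.poll()) != null)
            payload.put(frame);
        payload.flip();
        return payload;
    }

    public static void main(String[] args)
    {
        FlusherSketch flusher = new FlusherSketch();
        flusher.enqueue(ByteBuffer.wrap(new byte[]{ 1, 2 }));
        flusher.enqueue(ByteBuffer.wrap(new byte[]{ 3 }));
        System.out.println(flusher.flush().remaining()); // 3
    }
}
```

The point of the shape above is that sizing happens at flush time, so short flushes never over-allocate.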
[jira] [Comment Edited] (CASSANDRA-15299) CASSANDRA-13304 follow-up: improve checksumming and compression in protocol v5-beta
[ https://issues.apache.org/jira/browse/CASSANDRA-15299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148831#comment-17148831 ] Sam Tunnicliffe edited comment on CASSANDRA-15299 at 6/30/20, 5:06 PM: --- Thanks for the comments [~ifesdjeen] & [~omichallat], I've pushed a few commits to the [branch|https://github.com/beobal/cassandra/commits/15299-trunk]. {quote} There are several things that I wanted to bring to your attention: {quote} I've handled most of these in a refactor of {{Flusher}}. As you suggested, for framed items we now collate the frames and only allocate the payloads when we flush to the netty channel. So now, we allocate the payload based on the actual number of bytes required for the specific channel. {quote}{{ExceptionHandlers$PostV5ExceptionHandler#exceptionCaught}}: when flushing an exception, we don't call release on the payload. {quote} Included in the {{minor cleanups}} commit. {quote}There are several places in {{SimpleClient}} where {{largePayload#release}} isn't called. {quote} I've refactored the flushing of large messages in {{SimpleClient}} to match {{Flusher}}, so this is working properly now. {quote}Other things... {quote} {quote}{{Dispatcher#processRequest}}, we don't need to cast error to {{Message.Response}} if we change its type to {{ErrorMessage}}. {quote} {quote}In {{CQLMessageHandler#releaseAfterFlush}}, we can call {{sourceFrame#release()}} instead of {{sourceFrame.body.release()}} for consistency with other calls. {quote} Both in {{minor cleanups}}. {quote}{{Server#requestPayloadInFlightPerEndpoint}} can be a non-static {{Server}} member. {quote} If you don't mind, I'd prefer to leave this as it is for now, as it's pre-existing and changing it would require reworking CASSANDRA-15519 (changing limits at runtime). {quote}Should we hide {{flusher.queued.add()}} behind a method to disallow accessing the queue directly? {quote} I've done this, but I'm not 100% convinced of its utility. 
As the two {{Flusher}} subclasses need access to the queue, we have to provide package private methods {{poll}} and {{isEmpty}} as well as one to {{enqueue}}. So unless we move {{Flusher}} to its own subpackage, the queue is effectively visible to everything else in {{o.a.c.transport}}. {quote}We can change the code a bit to make {{FlushItemConverter}} instances explicit. Right now, we basically have two converters both called {{#toFlushItem}} in {{CQLMessageHandler}} and {{LegacyDispatchHandler}}. We could have them as inner classes. It's somewhat useful since if you change the signature of this method, or stop using it, it'll be hard to find that it is actually an implementation of converter. {quote} I've left this as it is just for the moment. I'm working on some tests which supply a lambda to act as the converter, so I'll come back to this when those have solidified a bit more. {quote}Looks like {{MessageConsumer}} could be generic, since we cast it to either request or response. {quote} I've parameterised {{MessageConsumer}} & {{CQLMessageHandler}} according to the subclass of {{Message}} they expect and extended this a bit by moving the logic out of {{Message$ProtocolEncoder}} to an abstract {{Message$Decoder}} with concrete subclasses for {{Request}} and {{Response}}. {quote}Looks like {{CQLMessageHandler#processCorruptFrame}} initially had an intention of handling recovery, but now just throws a CRC exception regardless. This does match the description, but usage of {{isRecoverable}} seems to be redundant here, unless we change the semantics of recovery. {quote} It is somewhat redundant here, except that it logs a slightly different message to indicate whether the CRC mismatch was found in the frame header or body. I'll leave it as it is for now, as it's technically possible to recover from a corrupt body, but it would be problematic for clients just now. I still have some comments to address, as well as those from [~omichallat] ... 
{quote}{{Frame$Decoder}} and other classes that are related to the legacy path can be extracted to a separate class, since {{Frame}} itself is still useful, but classes that facilitate legacy encoding/decoding/etc can be extracted. {quote} {quote}{{Frame#encodeHeaderInto}} seems to be duplicating the logic we have in {{Frame$Encoder#encodeHeader}}, should we unify the two? Maybe we can have encoding/decoding methods shared for both legacy and new paths, for example, as static methods? {quote} {quote}As you have mentioned, it would be great to rename {{Frame}} to something different, like {{Envelope}}, since right now we have {{FrameDecoder#Frame}} and {{Frame$Decoder}} and variable names that correspond with class names, which makes it all hard to follow. {quote} was (Author: beobal): Thanks for the comments [~ifesdjeen] & [~omichallat]. {quote} There are several things that I wanted to bring to your attention: {quote} I've handled most of these in a refactor of Flusher
[jira] [Commented] (CASSANDRA-15579) 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, and Read Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-15579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148804#comment-17148804 ] Andres de la Peña commented on CASSANDRA-15579: --- I'm keen to start work on unblocking this, but I don't know what the scope of this ticket should be or where to start. We have a fair number of specific dtests around this area, at least: * [consistency_test|https://github.com/apache/cassandra-dtest/blob/master/consistency_test.py] * [replication_test|https://github.com/apache/cassandra-dtest/blob/master/replication_test.py] * [read_repair_test|https://github.com/apache/cassandra-dtest/blob/master/read_repair_test.py] * [replica_side_filtering_test|https://github.com/apache/cassandra-dtest/blob/master/replica_side_filtering_test.py] We also have some related in-jvm distributed tests, and things like coordination are also implicitly covered by some other tests. [~bdeggleston] Do we have a more specific list of what things still need testing, or what cases are missed by the existing tests? Have we identified especially suspicious components or use cases that could be prioritized? > 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, > and Read Repair > > > Key: CASSANDRA-15579 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15579 > Project: Cassandra > Issue Type: Task > Components: Test/unit >Reporter: Josh McKenzie >Assignee: Andres de la Peña >Priority: Normal > Fix For: 4.0-beta > > > Reference [doc from > NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#] > for context. > *Shepherd: Blake Eggleston* > Testing in this area focuses on non-node-local aspects of the read-write > path: coordination, replication, read repair, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
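As a starting point for reasoning about coverage, the core coordination behaviour under test here (a quorum read that detects divergent replicas and performs blocking read repair) can be modelled in a few lines. This is a deliberately simplified toy simulation with hypothetical names, not Cassandra's actual read path:

```java
import java.util.Arrays;
import java.util.List;

// Toy model of a coordinator performing a QUORUM read with blocking read
// repair. Purely illustrative; all names are hypothetical.
public class ReadRepairSketch
{
    static class Replica
    {
        long timestamp;
        String value;

        Replica(long timestamp, String value)
        {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Contact a quorum of replicas, resolve to the newest value by timestamp,
    // and write that value back to any contacted replica that returned stale
    // data (the "blocking read repair" step).
    static String quorumRead(List<Replica> replicas)
    {
        int quorum = replicas.size() / 2 + 1;
        List<Replica> contacted = replicas.subList(0, quorum);

        Replica newest = contacted.get(0);
        for (Replica r : contacted)
            if (r.timestamp > newest.timestamp)
                newest = r;

        for (Replica r : contacted)
            if (r.timestamp < newest.timestamp)
            {
                r.timestamp = newest.timestamp;
                r.value = newest.value;
            }

        return newest.value;
    }

    public static void main(String[] args)
    {
        List<Replica> replicas = Arrays.asList(new Replica(1, "old"),
                                               new Replica(2, "new"),
                                               new Replica(1, "old"));
        System.out.println(quorumRead(replicas));      // resolved value
        System.out.println(replicas.get(0).value);     // stale replica repaired
    }
}
```

A model like this is also a useful checklist generator: each branch (no divergence, divergence within the quorum, divergence only on the uncontacted replica) corresponds to a dtest scenario worth covering.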
[jira] [Comment Edited] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148785#comment-17148785 ] Caleb Rackliffe edited comment on CASSANDRA-15861 at 6/30/20, 3:59 PM: --- bq. if the sstables are already in compacting state, does it mean entire-sstable-streaming will be blocked until compaction is finished? [~jasonstack] What if we just abort the ongoing compaction involving the SSTable we want to stream? (Then we can mark it ourselves for the period including manifest generation, stats streaming, and index summary streaming?) The danger, I guess, is aborting compactions that are almost done. Two ways around that I can see. One is to try to prioritize ZCS for non-compacting SSTables first. The other is just to fall back to legacy streaming if the SSTable is already compacting. Or we can do both of those things. was (Author: maedhroz): bq. if the sstables are already in compacting state, does it mean entire-sstable-streaming will be blocked until compaction is finished? [~jasonstack] What if we just abort the ongoing compaction involving the SSTable we want to stream? (Then we can mark it ourselves for the period including manifest generation, stats streaming, and index summary streaming?) The danger, I guess, is aborting compactions that are almost done. Two ways around that I can see. One is to try to prioritize ZCS for non-compacting SSTables first. The other is just to fall back to legacy streaming if the SSTable is already compacting. 
Or we can combine them :) > Mutating sstable component may race with entire-sstable-streaming(ZCS) > causing checksum validation failure > -- > > Key: CASSANDRA-15861 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15861 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair, Consistency/Streaming, > Local/Compaction >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta > > > Flaky dtest: [test_dead_sync_initiator - > repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] > {code:java|title=stacktrace} > Unexpected error found in node logs (see stdout for full details). Errors: > [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 > CassandraEntireSSTableStreamReader.java:145 - [Stream > 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream > for table = keyspace1.standard1 > org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226) > at > org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140) > at > org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78) > at > 
org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36) > at > org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.IOException: Checksums do not match for > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > {code} > > In the above test, it executes "nodetool repair" on node1 and kills node2 > during repair. At the end, node3 reports checksum validation failure on > sstable transferred from node1. > {code:java|title=what happened} > 1. When repair started on node1, it performs anti-compaction which modifies > sstable's repairAt to 0 a
[jira] [Comment Edited] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148785#comment-17148785 ] Caleb Rackliffe edited comment on CASSANDRA-15861 at 6/30/20, 3:57 PM: --- bq. if the sstables are already in compacting state, does it mean entire-sstable-streaming will be blocked until compaction is finished? [~jasonstack] What if we just abort the ongoing compaction involving the SSTable we want to stream? (Then we can mark it ourselves for the period including manifest generation, stats streaming, and index summary streaming?) The danger, I guess, is aborting compactions that are almost done. Two ways around that I can see. One is to try to prioritize ZCS for non-compacting SSTables first. The other is just to fall back to legacy streaming if the SSTable is already compacting. Or we can combine them :) was (Author: maedhroz): bq. if the sstables are already in compacting state, does it mean entire-sstable-streaming will be blocked until compaction is finished? [~jasonstack] What if we just abort the ongoing compaction involving the SSTable we want to stream? (Then we can mark it ourselves for the period including manifest generation, stats streaming, and index summary streaming?) 
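The fallback suggested in this thread (prefer entire-sstable streaming for SSTables that are not being compacted, and drop back to legacy streaming otherwise) amounts to a simple selection rule. A minimal sketch of that rule, with hypothetical names rather than the actual streaming code:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch of the proposed selection logic: use zero-copy
// (entire-sstable) streaming only when the sstable is not marked compacting,
// otherwise fall back to legacy partition-by-partition streaming.
// All names here are hypothetical.
public class StreamPlanSketch
{
    enum Strategy { ENTIRE_SSTABLE, LEGACY }

    static Strategy choose(String sstable, Set<String> compacting)
    {
        return compacting.contains(sstable) ? Strategy.LEGACY
                                            : Strategy.ENTIRE_SSTABLE;
    }

    public static void main(String[] args)
    {
        Set<String> compacting = new HashSet<>(Collections.singleton("na-2-big"));
        System.out.println(choose("na-1-big", compacting)); // ENTIRE_SSTABLE
        System.out.println(choose("na-2-big", compacting)); // LEGACY
    }
}
```

The attraction of this shape is that it never blocks on compaction and never aborts an almost-finished one; the cost is that a compacting SSTable loses the zero-copy fast path for that transfer.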
> Mutating sstable component may race with entire-sstable-streaming(ZCS) > causing checksum validation failure > -- > > Key: CASSANDRA-15861 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15861 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair, Consistency/Streaming, > Local/Compaction >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta > > > Flaky dtest: [test_dead_sync_initiator - > repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] > {code:java|title=stacktrace} > Unexpected error found in node logs (see stdout for full details). Errors: > [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 > CassandraEntireSSTableStreamReader.java:145 - [Stream > 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream > for table = keyspace1.standard1 > org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226) > at > org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140) > at > org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49) 
> at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36) > at > org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.IOException: Checksums do not match for > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > {code} > > In the above test, it executes "nodetool repair" on node1 and kills node2 > during repair. At the end, node3 reports checksum validation failure on > sstable transferred from node1. > {code:java|title=what happened} > 1. When repair started on node1, it performs anti-compaction which modifies > sstable's repairAt to 0 and pending repair id to session-id. > 2. Then node1 creates {{ComponentManifest}} which contains file lengths to be > transferred to node3. > 3. Before node1 actually sends the files to node3, node2 is killed and node1 > starts to broadcast repair-failure-message to all participants in > {{
[cassandra] branch trunk updated: Fix a log message typo in StartupChecks
This is an automated email from the ASF dual-hosted git repository. aleksey pushed a commit to branch trunk in repository https://gitbox.apache.org/repos/asf/cassandra.git The following commit(s) were added to refs/heads/trunk by this push: new 3b8ed1e Fix a log message typo in StartupChecks 3b8ed1e is described below commit 3b8ed1eb4000119779e618935e60f46f80bad42f Author: Aleksey Yeshchenko AuthorDate: Tue Jun 30 16:53:20 2020 +0100 Fix a log message typo in StartupChecks --- src/java/org/apache/cassandra/service/StartupChecks.java | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/java/org/apache/cassandra/service/StartupChecks.java b/src/java/org/apache/cassandra/service/StartupChecks.java index e8a60f4..12bb309 100644 --- a/src/java/org/apache/cassandra/service/StartupChecks.java +++ b/src/java/org/apache/cassandra/service/StartupChecks.java @@ -150,7 +150,7 @@ public class StartupChecks } catch (AssertionError e) { -logger.warn("lz4-java was unable to load native librarires; this will lower the performance of lz4 (network/sstables/etc.): {}", Throwables.getRootCause(e).getMessage()); +logger.warn("lz4-java was unable to load native libraries; this will lower the performance of lz4 (network/sstables/etc.): {}", Throwables.getRootCause(e).getMessage()); } }; - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148785#comment-17148785 ] Caleb Rackliffe commented on CASSANDRA-15861: - bq. if the sstables are already in compacting state, does it mean entire-sstable-streaming will be blocked until compaction is finished? [~jasonstack] What if we just abort the ongoing compaction involving the SSTable we want to stream? (Then we can mark it ourselves for the period including manifest generation, stats streaming, and index summary streaming?) > Mutating sstable component may race with entire-sstable-streaming(ZCS) > causing checksum validation failure > -- > > Key: CASSANDRA-15861 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15861 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair, Consistency/Streaming, > Local/Compaction >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta > > > Flaky dtest: [test_dead_sync_initiator - > repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] > {code:java|title=stacktrace} > Unexpected error found in node logs (see stdout for full details). 
Errors: > [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 > CassandraEntireSSTableStreamReader.java:145 - [Stream > 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream > for table = keyspace1.standard1 > org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226) > at > org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140) > at > org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36) > at > org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.IOException: Checksums do not match for > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > {code} > > In the above test, it executes 
"nodetool repair" on node1 and kills node2 > during repair. At the end, node3 reports checksum validation failure on > sstable transferred from node1. > {code:java|title=what happened} > 1. When repair started on node1, it performs anti-compaction which modifies > sstable's repairAt to 0 and pending repair id to session-id. > 2. Then node1 creates {{ComponentManifest}} which contains file lengths to be > transferred to node3. > 3. Before node1 actually sends the files to node3, node2 is killed and node1 > starts to broadcast repair-failure-message to all participants in > {{CoordinatorSession#fail}} > 4. Node1 receives its own repair-failure-message and fails its local repair > sessions at {{LocalSessions#failSession}} which triggers async background > compaction. > 5. Node1's background compaction will mutate sstable's repairAt to 0 and > pending repair id to null via > {{PendingRepairManager#getNextRepairFinishedTask}}, as there is no more > in-progress repair. > 6. Node1 actually sends the sstable to node3 where the sstable's STATS > component size is different from the original size recorded in the manifest. > 7. At the end, node3 reports checksum validation failure when it tries to > mutate sstable level and "isTransient" attribute in > {{CassandraEntireSSTableStreamReader#
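The sequence above reduces to a simple invariant violation: the manifest snapshots a component before streaming, a concurrent compaction mutates that component in place, and the receiver's validation of the streamed bytes then fails against the recorded checksum. A self-contained illustration of that invariant (hypothetical names, not the actual {{MetadataSerializer}} code):

```java
import java.util.zip.CRC32;

// Illustrates why mutating an sstable component between manifest creation
// and transfer breaks checksum validation on the receiving side.
// Hypothetical names, for illustration only.
public class ChecksumRaceSketch
{
    static long crc(byte[] bytes)
    {
        CRC32 crc = new CRC32();
        crc.update(bytes);
        return crc.getValue();
    }

    public static void main(String[] args)
    {
        byte[] statsComponent = "repairedAt=12345".getBytes();

        // 1. The manifest records the component's checksum before streaming.
        long recorded = crc(statsComponent);

        // 2. A concurrent compaction mutates the component in place
        //    (e.g. resetting repair metadata after a failed session).
        statsComponent[11] = '0';

        // 3. The receiver validates the streamed bytes against the recorded
        //    value, and the mismatch causes the stream to be rejected.
        System.out.println(crc(statsComponent) == recorded); // false
    }
}
```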
[jira] [Commented] (CASSANDRA-15901) Fix unit tests to load test/conf/cassandra.yaml (so to listen on a valid ip)
[ https://issues.apache.org/jira/browse/CASSANDRA-15901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148764#comment-17148764 ] Michael Semb Wever commented on CASSANDRA-15901: Agreed! New run [here|https://ci-cassandra.apache.org/job/Cassandra-devbranch-test/157/] (on cassandra35) > Fix unit tests to load test/conf/cassandra.yaml (so to listen on a valid ip) > > > Key: CASSANDRA-15901 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15901 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-rc > > > Many of the ci-cassandra jenkins runs fail on {{ip-10-0-5-5: Name or service > not known}}. CASSANDRA-15622 addressed some of these but many still remain. > Currently test C* nodes are either failing or listening on a public ip > depending on which agent they end up. > The idea behind this ticket is to make ant force the private VPC ip in the > cassandra yaml when building, this will force the nodes to listen on the > correct ip. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13994) Remove dead compact storage code before 4.0 release
[ https://issues.apache.org/jira/browse/CASSANDRA-13994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova updated CASSANDRA-13994: Status: Patch Available (was: Review In Progress) > Remove dead compact storage code before 4.0 release > --- > > Key: CASSANDRA-13994 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13994 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Local Write-Read Paths >Reporter: Alex Petrov >Assignee: Ekaterina Dimitrova >Priority: Low > Fix For: 4.0, 4.0-beta > > > 4.0 comes without thrift (after [CASSANDRA-5]) and COMPACT STORAGE (after > [CASSANDRA-10857]), and since Compact Storage flags are now disabled, all of > the related functionality is useless. > There are still some things to consider: > 1. One of the system tables (built indexes) was compact. For now, we just > added {{value}} column to it to make sure it's backwards-compatible, but we > might want to make sure it's just a "normal" table and doesn't have redundant > columns. > 2. Compact Tables were building indexes in {{KEYS}} mode. Removing it is > trivial, but this would mean that all built indexes will be defunct. We could > log a warning for now and ask users to migrate off those for now and > completely remove it from future releases. It's just a couple of classes > though. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13994) Remove dead compact storage code before 4.0 release
[ https://issues.apache.org/jira/browse/CASSANDRA-13994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova updated CASSANDRA-13994: Status: In Progress (was: Patch Available) > Remove dead compact storage code before 4.0 release > --- > > Key: CASSANDRA-13994 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13994 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Local Write-Read Paths >Reporter: Alex Petrov >Assignee: Ekaterina Dimitrova >Priority: Low > Fix For: 4.0, 4.0-beta > > > 4.0 comes without thrift (after [CASSANDRA-5]) and COMPACT STORAGE (after > [CASSANDRA-10857]), and since Compact Storage flags are now disabled, all of > the related functionality is useless. > There are still some things to consider: > 1. One of the system tables (built indexes) was compact. For now, we just > added {{value}} column to it to make sure it's backwards-compatible, but we > might want to make sure it's just a "normal" table and doesn't have redundant > columns. > 2. Compact Tables were building indexes in {{KEYS}} mode. Removing it is > trivial, but this would mean that all built indexes will be defunct. We could > log a warning for now and ask users to migrate off those for now and > completely remove it from future releases. It's just a couple of classes > though. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13994) Remove dead compact storage code before 4.0 release
[ https://issues.apache.org/jira/browse/CASSANDRA-13994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148762#comment-17148762 ] Ekaterina Dimitrova commented on CASSANDRA-13994: - Thank you [~slebresne] and [~aleksey] for your input. I just moved it to beta so we can concentrate now on the final outstanding alpha tickets. I will rebase and cut the scope of the patch to the removal of the dead code as agreed, also will take into consideration the points [~slebresne] made in his initial review. Moving it back to open to show that there is still work to be done but not working on it in this very moment. > Remove dead compact storage code before 4.0 release > --- > > Key: CASSANDRA-13994 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13994 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Local Write-Read Paths >Reporter: Alex Petrov >Assignee: Ekaterina Dimitrova >Priority: Low > Fix For: 4.0, 4.0-beta > > > 4.0 comes without thrift (after [CASSANDRA-5]) and COMPACT STORAGE (after > [CASSANDRA-10857]), and since Compact Storage flags are now disabled, all of > the related functionality is useless. > There are still some things to consider: > 1. One of the system tables (built indexes) was compact. For now, we just > added {{value}} column to it to make sure it's backwards-compatible, but we > might want to make sure it's just a "normal" table and doesn't have redundant > columns. > 2. Compact Tables were building indexes in {{KEYS}} mode. Removing it is > trivial, but this would mean that all built indexes will be defunct. We could > log a warning for now and ask users to migrate off those for now and > completely remove it from future releases. It's just a couple of classes > though. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15891) provide a configuration option such as endpoint_verification_method
[ https://issues.apache.org/jira/browse/CASSANDRA-15891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer updated CASSANDRA-15891: --- Fix Version/s: 4.x > provide a configuration option such as endpoint_verification_method > --- > > Key: CASSANDRA-15891 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15891 > Project: Cassandra > Issue Type: Improvement > Components: Messaging/Internode >Reporter: Thanh >Priority: Normal > Fix For: 4.x > > > With cassandra-9220, it's possible to configure endpoint/hostname > verification when enabling internode encryption. However, you don't have any > control over what endpoint is used for the endpoint verification; instead, > cassandra will automatically try to use node IP (not node hostname) for > endpoint verification, so if your node certificates don't include the IP in > the ssl certificate's SAN list, then you'll get an error like: > {code:java} > ERROR [MessagingService-Outgoing-/10.10.88.194-Gossip] 2018-11-13 > 10:20:26,903 OutboundTcpConnection.java:606 - SSL handshake error for > outbound connection to 50cc97c1[SSL_NULL_WITH_NULL_NULL: > Socket[addr=/,port=7001,localport=47684]] > javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: > No subject alternative names matching IP address found > at sun.security.ssl.Alerts.getSSLException(Alerts.java:192) {code} > From what I've seen, most orgs will not have node IPs in their certs. > So, it will be best if cassandra would provide another configuration option > such as *{{endpoint_verification_method}}* which you could set to "ip" or > "fqdn" or something else (eg "hostname_alias" if for whatever reason the org > doesn't want to use fqdn for endpoint verification). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
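The behaviour the ticket describes comes from JSSE endpoint identification: the peer identity string handed to the engine decides whether the certificate's SAN list is matched against an iPAddress entry or a dNSName entry. The sketch below is a minimal, self-contained illustration of that mechanism — it is not Cassandra's code, and the proposed `endpoint_verification_method` option does not exist; the idea is that such an option would choose which identity string gets passed in.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLParameters;

public class EndpointVerificationSketch {

    // Builds an engine that verifies the peer certificate against the given
    // identity. JSSE matches an IP literal against iPAddress SANs and a
    // hostname against dNSName SANs, so a hypothetical
    // endpoint_verification_method would effectively pick which string is
    // passed here ("ip" -> the node address, "fqdn" -> the resolved hostname).
    static SSLEngine engineFor(String peerIdentity, int port) throws Exception {
        SSLEngine engine = SSLContext.getDefault().createSSLEngine(peerIdentity, port);
        engine.setUseClientMode(true);
        SSLParameters params = engine.getSSLParameters();
        // "HTTPS" enables RFC 2818-style endpoint identification during the
        // handshake; with it unset, no SAN check happens at all.
        params.setEndpointIdentificationAlgorithm("HTTPS");
        engine.setSSLParameters(params);
        return engine;
    }

    public static void main(String[] args) throws Exception {
        // Verifying by FQDN avoids the "No subject alternative names matching
        // IP address" failure when certs only carry DNS names.
        SSLEngine e = engineFor("node1.example.com", 7001);
        System.out.println(e.getSSLParameters().getEndpointIdentificationAlgorithm());
        System.out.println(e.getPeerHost());
    }
}
```

`node1.example.com` is a placeholder peer; the same engine built with `"10.10.88.194"` would reproduce the IP-matching behaviour quoted in the error above.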
[jira] [Updated] (CASSANDRA-13994) Remove dead compact storage code before 4.0 release
[ https://issues.apache.org/jira/browse/CASSANDRA-13994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova updated CASSANDRA-13994: Fix Version/s: (was: 4.0-alpha) 4.0-beta > Remove dead compact storage code before 4.0 release > --- > > Key: CASSANDRA-13994 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13994 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Local Write-Read Paths >Reporter: Alex Petrov >Assignee: Ekaterina Dimitrova >Priority: Low > Fix For: 4.0, 4.0-beta > > > 4.0 comes without thrift (after [CASSANDRA-5]) and COMPACT STORAGE (after > [CASSANDRA-10857]), and since Compact Storage flags are now disabled, all of > the related functionality is useless. > There are still some things to consider: > 1. One of the system tables (built indexes) was compact. For now, we just > added {{value}} column to it to make sure it's backwards-compatible, but we > might want to make sure it's just a "normal" table and doesn't have redundant > columns. > 2. Compact Tables were building indexes in {{KEYS}} mode. Removing it is > trivial, but this would mean that all built indexes will be defunct. We could > log a warning for now and ask users to migrate off those for now and > completely remove it from future releases. It's just a couple of classes > though. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15850) Delay between Gossip settle and CQL port opening during the startup
[ https://issues.apache.org/jira/browse/CASSANDRA-15850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer updated CASSANDRA-15850: --- Fix Version/s: 4.x > Delay between Gossip settle and CQL port opening during the startup > --- > > Key: CASSANDRA-15850 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15850 > Project: Cassandra > Issue Type: Improvement > Components: Local/Startup and Shutdown >Reporter: Jai Bheemsen Rao Dhanwada >Priority: Normal > Fix For: 4.x > > > Hello, > When I am bootstrapping/restarting a Cassandra Node, there is a delay between > gossip settle and CQL port opening. Can someone please explain me where this > delay is configured and can this be changed? I don't see any information in > the logs > In my case if you see there is a ~3 minutes delay and this increases if I > increase the #of tables and #of nodes and DC. > {code:java} > INFO [main] 2020-05-31 23:51:07,554 Gossiper.java:1692 - Waiting for gossip > to settle... > INFO [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip backlog; > proceeding > INFO [main] 2020-05-31 23:54:06,867 NativeTransportService.java:70 - Netty > using native Epoll event loop > INFO [main] 2020-05-31 23:54:06,913 Server.java:155 - Using Netty Version: > [netty-buffer=netty-buffer-4.0.44.Final.452812a, > netty-codec=netty-codec-4.0.44.Final.452812a, > netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a, > netty-codec-http=netty-codec-http-4.0.44.Final.452812a, > netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a, > netty-common=netty-common-4.0.44.Final.452812a, > netty-handler=netty-handler-4.0.44.Final.452812a, > netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb, > netty-transport=netty-transport-4.0.44.Final.452812a, > netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a, > netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a, > netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a, > 
netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a] > INFO [main] 2020-05-31 23:54:06,913 Server.java:156 - Starting listening for > CQL clients on /x.x.x.x:9042 (encrypted)... > {code} > Also during this 3-10 minutes delay, I see > {noformat} > nodetool compactionstats > {noformat} > command is hung and never respond, until the CQL port is up and running. > Can someone please help me understand the delay here? > Cassandra Version: 3.11.3 > The issue can be easily reproducible with around 300 Tables and 100 nodes in > a cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15880) Memory leak in CompressedChunkReader
[ https://issues.apache.org/jira/browse/CASSANDRA-15880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer updated CASSANDRA-15880: --- Fix Version/s: 3.11.x 4.0 > Memory leak in CompressedChunkReader > > > Key: CASSANDRA-15880 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15880 > Project: Cassandra > Issue Type: Bug > Components: Feature/Compression >Reporter: Jaroslaw Grabowski >Priority: Normal > Fix For: 4.0, 3.11.x > > > CompressedChunkReader uses java.lang.ThreadLocal to reuse a ByteBuffer for > compressed data. The ByteBuffers leak due to a peculiarity of ThreadLocal. > ThreadLocals are stored in a map whose key is a weak reference to the > ThreadLocal and whose value is the user's object (a ByteBuffer in this case). > When the last strong reference to a ThreadLocal is lost, the weak reference to > the ThreadLocal (the key) is cleared, but the value (the ByteBuffer) is kept > until cleaned by ThreadLocal's heuristic expunge mechanism. See ThreadLocal's > "stale entries" for details. > When the number of long-lived threads is high enough, this results in thousands > of ByteBuffers stored as stale entries in ThreadLocals. In a not-so-lucky > scenario we get an OutOfMemoryError. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
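The stale-entry mechanism described in the ticket can be sketched with plain JDK classes. This is an illustrative stand-in, not the actual `CompressedChunkReader` code; the buffer size and variable names are invented for the example.

```java
import java.nio.ByteBuffer;

public class ThreadLocalLeakSketch {

    // Size of the hypothetical per-thread decompression buffer.
    static final int CHUNK = 64 * 1024;

    public static void main(String[] args) {
        // Stand-in for the reader's per-thread buffer cache.
        ThreadLocal<ByteBuffer> compressedBuffer =
                ThreadLocal.withInitial(() -> ByteBuffer.allocateDirect(CHUNK));

        // A long-lived thread (e.g. a read-stage worker) touches the
        // ThreadLocal, adding an entry to its internal ThreadLocalMap.
        System.out.println(compressedBuffer.get().capacity());

        // Dropping the last strong reference clears only the weakly-held map
        // key. The 64 KiB value stays strongly reachable from the thread's
        // ThreadLocalMap until its opportunistic expunge of "stale entries"
        // happens to visit that slot — which, across thousands of long-lived
        // threads and many reader instances, may be never before the heap
        // (or direct memory) fills up.
        compressedBuffer = null;
        System.gc();
    }
}
```

The usual fixes are an explicit `ThreadLocal.remove()` when the owner is closed, or a bounded shared pool instead of per-thread caching.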
[jira] [Updated] (CASSANDRA-15856) Security vulnerabilities with dependency jars of Cassandra 3.11.6
[ https://issues.apache.org/jira/browse/CASSANDRA-15856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer updated CASSANDRA-15856: --- Fix Version/s: 3.11.x > Security vulnerabilities with dependency jars of Cassandra 3.11.6 > -- > > Key: CASSANDRA-15856 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15856 > Project: Cassandra > Issue Type: Task >Reporter: Kshitiz Saxena >Priority: Normal > Fix For: 3.11.x > > > The latest release of Cassandra 3.11.6 has few dependency jars which have > some security vulnerabilities. > > Apache Thrift (org.apache.thrift:libthrift:0.9.2) has below mentioned > security vulnerabilities reported > |+[https://nvd.nist.gov/vuln/detail/CVE-2016-5397]+| > |+[https://nvd.nist.gov/vuln/detail/CVE-2018-1320]+| > |+[https://nvd.nist.gov/vuln/detail/CVE-2019-0205]+| > > Netty Project (io.netty:netty-all:4.0.44.Final) has below mentioned security > vulnerabilities reported > |+[https://nvd.nist.gov/vuln/detail/CVE-2019-16869]+| > |+[https://nvd.nist.gov/vuln/detail/CVE-2019-20444]+| > |+[https://nvd.nist.gov/vuln/detail/CVE-2019-20445]+| > > Is there a plan to upgrade these jars in any upcoming release? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15903) Doc update: stream-entire-sstable supports all compaction strategies and internode encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-15903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer updated CASSANDRA-15903: --- Fix Version/s: 4.0 > Doc update: stream-entire-sstable supports all compaction strategies and > internode encryption > - > > Key: CASSANDRA-15903 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15903 > Project: Cassandra > Issue Type: Task >Reporter: ZhaoYang >Priority: Normal > Fix For: 4.0 > > > As [~mck] points out, the docs need to be updated for CASSANDRA-15657 and > CASSANDRA-15740. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15866) stream sstable attached index files entirely with data file
[ https://issues.apache.org/jira/browse/CASSANDRA-15866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer updated CASSANDRA-15866: --- Fix Version/s: 4.x > stream sstable attached index files entirely with data file > --- > > Key: CASSANDRA-15866 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15866 > Project: Cassandra > Issue Type: Improvement > Components: Consistency/Streaming >Reporter: ZhaoYang >Priority: Normal > Fix For: 4.x > > > When sstable is streamed entirely, there is no need to rebuild sstable > attached index on receiver if index files can be streamed entirely. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15898) cassandra 3.11.4 deadlock
[ https://issues.apache.org/jira/browse/CASSANDRA-15898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer updated CASSANDRA-15898: --- Fix Version/s: 3.11.x > cassandra 3.11.4 deadlock > - > > Key: CASSANDRA-15898 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15898 > Project: Cassandra > Issue Type: Bug >Reporter: john doe >Priority: Normal > Fix For: 3.11.x > > > We are running apache-cassandra-3.11.4, 10 node cluster with -Xms32G -Xmx32G > -Xmn8G using CMS. > after running couple of days one of the node become unresponsive and > threaddump (jstack -F) shows deadlock. > Found one Java-level deadlock: > = > "Native-Transport-Requests-144": waiting to lock Monitor@0x7cd5142e4d08 > (Object@0x7f6e00348268, a java/io/ExpiringCache), > which is held by "CompactionExecutor:115134" > "CompactionExecutor:115134": waiting to lock Monitor@0x7f6bcaf130f8 > (Object@0x7f6dff31faa0, a > ch/qos/logback/core/joran/spi/ConfigurationWatchList), > which is held by "Native-Transport-Requests-144" > Found a total of 1 deadlock. > I have seen this couple of time now with different nodes with following in > system.log > IndexSummaryRedistribution.java:77 - Redistributing index summaries > NoSpamLogger.java:91 - Maximum memory usage reached (536870912), cannot > allocate chunk of 1048576 > also lookin in gc log there has not been a ParNew collection for last 10hrs, > only CMS collections. 
> 1739842.375: [GC (CMS Final Remark) [YG occupancy: 2712269 K (7549760 K)] > 1739842.375: [Rescan (parallel) , 0.0614157 secs] > 1739842.437: [weak refs processing, 0.994 secs] > 1739842.437: [class unloading, 0.0231076 secs] > 1739842.460: [scrub symbol table, 0.0061049 secs] > 1739842.466: [scrub string table, 0.0043847 secs][1 CMS-remark: > 17696837K(25165824K)] 20409107K(32715584K), 0.0953750 secs] [Times: user=2.95 > sys=0.00, real=0.09 secs] > 1739842.471: [CMS-concurrent-sweep-start] > 1739848.572: [CMS-concurrent-sweep: 6.101/6.101 secs] [Times: user=6.13 > sys=0.00, real=6.10 secs] > 1739848.573: [CMS-concurrent-reset-start] > 1739848.645: [CMS-concurrent-reset: 0.072/0.072 secs] [Times: user=0.08 > sys=0.00, real=0.08 secs] > 1739858.653: [GC (CMS Initial Mark) [1 CMS-initial-mark: > 17696837K(25165824K)] > 20409111K(32715584K), 0.0584838 secs] [Times: user=2.68 sys=0.00, real=0.06 > secs] > 1739858.713: [CMS-concurrent-mark-start] > 1739860.496: [CMS-concurrent-mark: 1.784/1.784 secs] [Times: user=84.77 > sys=0.00, real=1.79 secs] > 1739860.497: [CMS-concurrent-preclean-start] > 1739860.566: [CMS-concurrent-preclean: 0.070/0.070 secs] [Times: user=0.07 > sys=0.00, real=0.07 secs] > 1739860.567: [CMS-concurrent-abortable-preclean-start]CMS: abort preclean due > to time > 1739866.333: [CMS-concurrent-abortable-preclean: 5.766/5.766 secs] [Times: > user=5.80 sys=0.00, real=5.76 secs] > Java HotSpot(TM) 64-Bit Server VM (25.162-b12) for linux-amd64 JRE > (1.8.0_162-b12) > Memory: 4k page, physical 792290076k(2780032k free), swap 16777212k(16693756k > free) > CommandLine flags: > -XX:+AlwaysPreTouch > -XX:CICompilerCount=15 > -XX:+CMSClassUnloadingEnabled > -XX:+CMSEdenChunksRecordAlways > -XX:CMSInitiatingOccupancyFraction=40 > -XX:+CMSParallelInitialMarkEnabled > -XX:+CMSParallelRemarkEnabled > -XX:CMSWaitDuration=1 > -XX:ConcGCThreads=50 > -XX:+CrashOnOutOfMemoryError > -XX:GCLogFileSize=10485760 > -XX:+HeapDumpOnOutOfMemoryError > 
-XX:InitialHeapSize=34359738368 > -XX:InitialTenuringThreshold=1 > -XX:+ManagementServer > -XX:MaxHeapSize=34359738368 > -XX:MaxNewSize=8589934592 > -XX:MaxTenuringThreshold=1 > -XX:MinHeapDeltaBytes=196608 > -XX:NewSize=8589934592 > -XX:NumberOfGCLogFiles=10 > -XX:OldPLABSize=16 > -XX:OldSize=25769803776 > -XX:OnOutOfMemoryError=kill -9 %p > -XX:ParallelGCThreads=50 > -XX:+PerfDisableSharedMem > -XX:+PrintGC > -XX:+PrintGCDetails > -XX:+PrintGCTimeStamps > -XX:+ResizeTLAB > -XX:StringTableSize=103 > -XX:SurvivorRatio=8 > -XX:ThreadPriorityPolicy=42 > -XX:ThreadStackSize=256 > -XX:-UseBiasedLocking > -XX:+UseCMSInitiatingOccupancyOnly > -XX:+UseConcMarkSweepGC > -XX:+UseCondCardMark > -XX:+UseFastUnorderedTimeStamps > -XX:+UseGCLogFileRotation > -XX:+UseNUMA > -XX:+UseNUMAInterleaving > -XX:+UseParNewGC > -XX:+UseTLAB > -XX:+UseThreadPriorities -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
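The monitor deadlock quoted in the report can also be detected from inside the JVM, without attaching `jstack -F`. The sketch below uses the standard `ThreadMXBean` API — it is a generic health-check idea, not something Cassandra ships; thread and lock names are whatever the JVM reports.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadlockProbe {

    // Returns a jstack-style summary of monitor deadlocks, or null if none.
    static String findMonitorDeadlock() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findMonitorDeadlockedThreads();
        if (ids == null) return null;
        StringBuilder sb = new StringBuilder("Found deadlock:");
        for (ThreadInfo ti : mx.getThreadInfo(ids)) {
            sb.append("\n  \"").append(ti.getThreadName())
              .append("\" waiting on ").append(ti.getLockName())
              .append(" held by \"").append(ti.getLockOwnerName()).append('"');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // In a healthy JVM this prints "no deadlock"; polled periodically it
        // would flag a stuck node well before nodetool commands start hanging.
        String report = findMonitorDeadlock();
        System.out.println(report == null ? "no deadlock" : report);
    }
}
```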
[jira] [Updated] (CASSANDRA-15908) Improve messaging on indexing frozen collections
[ https://issues.apache.org/jira/browse/CASSANDRA-15908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer updated CASSANDRA-15908: --- Fix Version/s: 4.x > Improve messaging on indexing frozen collections > > > Key: CASSANDRA-15908 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15908 > Project: Cassandra > Issue Type: Improvement > Components: CQL/Semantics >Reporter: Rocco Varela >Assignee: Rocco Varela >Priority: Low > Fix For: 4.x > > Time Spent: 10m > Remaining Estimate: 0h > > When attempting to create an index on a frozen collection the error message > produced can be improved to provide more detail about the problem and > possible workarounds. Currently, a user will receive a message indicating > "...Frozen collections only support full() indexes" which is not immediately > clear for users new to Cassandra indexing and datatype compatibility. > Here is an example: > {code:java} > cqlsh> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', > 'replication_factor': 1}; > cqlsh> CREATE TABLE test.mytable ( id int primary key, addresses > frozen> ); > cqlsh> CREATE INDEX mytable_addresses_idx on test.mytable (addresses); > InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot > create values() index on frozen column addresses. Frozen collections only > support full() indexes"{code} > > I'm proposing possibly enhancing the messaging to something like this. > {quote}Cannot create values() index on frozen column addresses. Frozen > collections only support indexes on the entire data structure due to > immutability constraints of being frozen, wrap your frozen column with the > full() target type to index properly. > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
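Whatever wording is chosen, the improved message could show the workaround directly. A sketch against the ticket's schema, assuming the standard `full()` index target syntax (the inner type of the frozen collection is elided in the quoted email, so the column is referenced by name only):

```sql
-- Index the frozen collection as one immutable value:
CREATE INDEX mytable_addresses_idx ON test.mytable (FULL(addresses));

-- Queries must then compare against the entire collection value,
-- not individual keys or elements:
-- SELECT * FROM test.mytable WHERE addresses = <whole-collection-literal>;
```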
[jira] [Updated] (CASSANDRA-15887) Document how to run Cassandra on Windows
[ https://issues.apache.org/jira/browse/CASSANDRA-15887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer updated CASSANDRA-15887: --- Fix Version/s: (was: 4.x) 4.0 > Document how to run Cassandra on Windows > > > Key: CASSANDRA-15887 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15887 > Project: Cassandra > Issue Type: Improvement > Components: Documentation/Website >Reporter: João Reis >Assignee: Berenguer Blasi >Priority: Low > Fix For: 4.0 > > > The "Getting Started" section on the website only has instructions about > installing Cassandra on Linux. > It would help us drive Cassandra adoption if we had instructions for > developers that want to run Cassandra on their Windows development > environment. > We should include instructions on how to use the existing powershell scripts > to run Cassandra on native Windows but the docs should recommend users to > prefer using WSL2/Docker before attempting to run it natively in my opinion. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14888) Several mbeans are not unregistered when dropping a keyspace and table
[ https://issues.apache.org/jira/browse/CASSANDRA-14888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148745#comment-17148745 ] Caleb Rackliffe commented on CASSANDRA-14888: - Hi [~Ryangdotson]. Does the file you attached ({{ExtendedDictionary.java}}) relate to this issue? It doesn't look like it, but just making sure... > Several mbeans are not unregistered when dropping a keyspace and table > -- > > Key: CASSANDRA-14888 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14888 > Project: Cassandra > Issue Type: Bug > Components: Observability/Metrics >Reporter: Ariel Weisberg >Assignee: Alex Deparvu >Priority: Urgent > Labels: patch-available > Fix For: 4.0-beta > > Attachments: CASSANDRA-14888.patch, ExtendedDictionary.java > > Time Spent: 2.5h > Remaining Estimate: 0h > > CasCommit, CasPrepare, CasPropose, ReadRepairRequests, > ShortReadProtectionRequests, AntiCompactionTime, BytesValidated, > PartitionsValidated, RepairPrepareTime, RepairSyncTime, > RepairedDataInconsistencies, ViewLockAcquireTime, ViewReadTime, > WriteFailedIdealCL > Basically for 3 years people haven't known what they are doing because the > entire thing is kind of obscure. Fix it and also add a dtest that detects if > any mbeans are left behind after dropping a table and keyspace. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
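The dtest the ticket asks for boils down to querying the platform MBeanServer for leftover names after a drop. The sketch below demonstrates that register/query/unregister round trip with the standard JMX API; the metric ObjectName is modelled on Cassandra's `org.apache.cassandra.metrics` domain but the keyspace/table values are invented, and this is not the project's actual cleanup code.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class MBeanCleanupSketch {

    // Minimal standard MBean (interface name must be <class>MBean).
    public interface DemoMBean { int getValue(); }
    public static class Demo implements DemoMBean {
        public int getValue() { return 42; }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();

        // Hypothetical per-table metric name mirroring the ones listed above.
        ObjectName name = new ObjectName(
            "org.apache.cassandra.metrics:type=Table,keyspace=ks1,scope=t1,name=CasCommit");
        server.registerMBean(new Demo(), name);

        // A DROP KEYSPACE cleanup must unregister every matching mbean; the
        // proposed dtest would assert this query is empty afterwards.
        ObjectName pattern = new ObjectName("org.apache.cassandra.metrics:keyspace=ks1,*");
        Set<ObjectName> before = server.queryNames(pattern, null);
        System.out.println(before.size()); // 1 in a fresh JVM

        server.unregisterMBean(name);
        Set<ObjectName> after = server.queryNames(pattern, null);
        System.out.println(after.size()); // 0 — nothing left behind
    }
}
```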
[jira] [Updated] (CASSANDRA-15887) Document how to run Cassandra on Windows
[ https://issues.apache.org/jira/browse/CASSANDRA-15887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer updated CASSANDRA-15887: --- Fix Version/s: (was: 4.0) 4.x > Document how to run Cassandra on Windows > > > Key: CASSANDRA-15887 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15887 > Project: Cassandra > Issue Type: Improvement > Components: Documentation/Website >Reporter: João Reis >Assignee: Berenguer Blasi >Priority: Low > Fix For: 4.x > > > The "Getting Started" section on the website only has instructions about > installing Cassandra on Linux. > It would help us drive Cassandra adoption if we had instructions for > developers that want to run Cassandra on their Windows development > environment. > We should include instructions on how to use the existing powershell scripts > to run Cassandra on native Windows but the docs should recommend users to > prefer using WSL2/Docker before attempting to run it natively in my opinion. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15887) Document how to run Cassandra on Windows
[ https://issues.apache.org/jira/browse/CASSANDRA-15887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer updated CASSANDRA-15887: --- Fix Version/s: 4.0 > Document how to run Cassandra on Windows > > > Key: CASSANDRA-15887 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15887 > Project: Cassandra > Issue Type: Improvement > Components: Documentation/Website >Reporter: João Reis >Assignee: Berenguer Blasi >Priority: Low > Fix For: 4.0 > > > The "Getting Started" section on the website only has instructions about > installing Cassandra on Linux. > It would help us drive Cassandra adoption if we had instructions for > developers that want to run Cassandra on their Windows development > environment. > We should include instructions on how to use the existing powershell scripts > to run Cassandra on native Windows but the docs should recommend users to > prefer using WSL2/Docker before attempting to run it natively in my opinion. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15857) Frozen RawTuple is not annotated with frozen in the toString method
[ https://issues.apache.org/jira/browse/CASSANDRA-15857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer updated CASSANDRA-15857: --- Fix Version/s: 3.11.x 4.0 > Frozen RawTuple is not annotated with frozen in the toString method > --- > > Key: CASSANDRA-15857 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15857 > Project: Cassandra > Issue Type: Bug > Components: Legacy/CQL >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > Fix For: 4.0, 3.11.x > > > All raw types (e.g. RawCollection, RawUT) that supports freezing wraps the > type name with 'frozen<>' in the toString method, except RawTuple. > Therefore, the RawTuple::toString output misses the frozen wrapper. > Tuple is always frozen. However since CASSANDRA-15035, it throws when the > inner tuple is not explicitly wrapped with frozen within a collection. > The method, CQL3Type.Raw::toString, is referenced at multiple places in the > source. For example, referenced in CreateTypeStatement.Raw and involved in > CQLSSTableWriter. Another example is that it is called to produce the > SchemaChange at several AlterSchemaStatement implementations. > A test can prove that missing the frozen wrapper causes exception when > building CQLSSTableWriter for user types defined like below. Note that the > inner tuple is wrapped with frozen in the initial CQL statement. 
> {code:java} > CREATE TYPE ks.fooType ( f list>> ) > {code} > {code:java} > org.apache.cassandra.exceptions.InvalidRequestException: Non-frozen tuples > are not allowed inside collections: list> > at > org.apache.cassandra.cql3.CQL3Type$Raw$RawCollection.throwNestedNonFrozenError(CQL3Type.java:710) > at > org.apache.cassandra.cql3.CQL3Type$Raw$RawCollection.prepare(CQL3Type.java:669) > at > org.apache.cassandra.cql3.CQL3Type$Raw$RawCollection.prepareInternal(CQL3Type.java:661) > at > org.apache.cassandra.schema.Types$RawBuilder$RawUDT.lambda$prepare$1(Types.java:341) > at > java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at > java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at > java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at > java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > org.apache.cassandra.schema.Types$RawBuilder$RawUDT.prepare(Types.java:342) > at org.apache.cassandra.schema.Types$RawBuilder.build(Types.java:291) > at > org.apache.cassandra.io.sstable.CQLSSTableWriter$Builder.createTypes(CQLSSTableWriter.java:551) > at > org.apache.cassandra.io.sstable.CQLSSTableWriter$Builder.build(CQLSSTableWriter.java:527) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15896) NullPointerException in SELECT JSON statement when a UUID field contains an empty string
[ https://issues.apache.org/jira/browse/CASSANDRA-15896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer updated CASSANDRA-15896: --- Fix Version/s: 3.0.x > NullPointerException in SELECT JSON statement when a UUID field contains an > empty string > > > Key: CASSANDRA-15896 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15896 > Project: Cassandra > Issue Type: Bug > Components: CQL/Interpreter, CQL/Semantics >Reporter: Ostico >Assignee: Benjamin Lerer >Priority: Normal > Fix For: 4.0, 3.0.x, 3.11.x > > > It seems that Cassandra accept empty strings "" ( FROM JSON string ) for UUID > fields but crash when asking for JSON serialization of those fields. > > Cassandra version 3.6.11.6 running in docker from official Dockerhub image. > Java driver: > {code:java} > > > com.datastax.oss > java-driver-core > 4.7.0 > > {code} > The attached code is to allow bug reproducibility: > {code:java} > package com.foo.bar; > import com.datastax.oss.driver.api.core.CqlSession; > import com.datastax.oss.driver.api.core.CqlSessionBuilder; > import com.datastax.oss.driver.api.core.cql.PreparedStatement; > import com.datastax.oss.driver.api.core.cql.ResultSet; > import com.datastax.oss.driver.api.core.cql.Row; > import com.fasterxml.jackson.databind.ObjectMapper; > import org.junit.After; > import org.junit.Before; > import org.junit.Test; > import java.net.InetSocketAddress; > import java.net.URI; > import java.util.*; > import static org.junit.Assert.assertFalse; > import static org.junit.Assert.assertNotNull; > /** > * @author Domenico Lupinetti - 23/06/2020 > */ > public class NullPointerExceptionTest { > protected String uuid; > protected CqlSession cqlSession; > @Before > public void setUp() throws Exception { > URI node = new URI( "tcp://localhost:9042" ); > final CqlSessionBuilder builder = CqlSession.builder(); > cqlSession = builder.addContactPoint( new InetSocketAddress( > node.getHost(), > node.getPort() > ) ).withLocalDatacenter( 
"datacenter1" ).build(); > cqlSession.execute( "CREATE KEYSPACE IF NOT EXISTS test_suite WITH > replication = {'class':'SimpleStrategy','replication_factor':1};" ); > String sb = "CREATE TABLE IF NOT EXISTS test_suite.test ( id uuid > PRIMARY KEY, another_id uuid, subject text );"; > cqlSession.execute( sb ); > PreparedStatement stm = cqlSession.prepare( "INSERT INTO > test_suite.test JSON :payload" ); > this.uuid = UUID.randomUUID().toString(); > HashMap payload = new HashMap<>(); > payload.put( "id", this.uuid ); > // *** This exception do not happens if the field is set as NULL > payload.put( "another_id", "" ); //<-- EMPTY STRING AS UUID > payload.put( "subject", "Alighieri, Dante. Divina Commedia" ); > ObjectMapper objM = new ObjectMapper(); > cqlSession.execute( > stm.bind().setString( "payload", objM.writeValueAsString( > payload ) ) > ); //<-- serialize as JSON > } > @After > public void tearDown() throws Exception { > cqlSession.execute( "DROP TABLE IF EXISTS test_suite.test;" ); > cqlSession.execute( "DROP KEYSPACE test_suite;" ); > cqlSession.close(); > } > @Test > public void testNullPointer() { > PreparedStatement stmt = cqlSession.prepare( "SELECT JSON id, > another_id FROM test_suite.test where id = :id;" ); > ResultSet resultSet = cqlSession.execute( > stmt.bind().setUuid( "id", UUID.fromString( this.uuid ) ) ); // <-- > EXCEPTION > Row r = resultSet.one(); > assertNotNull( r ); > assertNotNull( r.getString( "[json]" ) ); > assertFalse( Objects.requireNonNull( r.getString( "[json]" ) > ).isEmpty() ); > } > } > {code} > Client stack Trace: > {code:java} > com.datastax.oss.driver.api.core.servererrors.ServerError: > java.lang.NullPointerExceptioncom.datastax.oss.driver.api.core.servererrors.ServerError: > java.lang.NullPointerException > at > com.datastax.oss.driver.api.core.servererrors.ServerError.copy(ServerError.java:54) > at > com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:149) 
> at > com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:53) > at > com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:30) > at > com.datastax.oss.driver.internal.core.session.DefaultSession.execute(DefaultSession.java:230) > at > com.datastax.oss.d
[jira] [Updated] (CASSANDRA-15896) NullPointerException in SELECT JSON statement when a UUID field contains an empty string
[ https://issues.apache.org/jira/browse/CASSANDRA-15896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer updated CASSANDRA-15896: --- Fix Version/s: (was: 3.0.x) > NullPointerException in SELECT JSON statement when a UUID field contains an > empty string > > > Key: CASSANDRA-15896 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15896 > Project: Cassandra > Issue Type: Bug > Components: CQL/Interpreter, CQL/Semantics >Reporter: Ostico >Assignee: Benjamin Lerer >Priority: Normal > Fix For: 4.0, 3.11.x > > > It seems that Cassandra accepts empty strings "" (from a JSON string) for UUID > fields but crashes when asked for the JSON serialization of those fields. > > Cassandra version 3.6.11.6 running in docker from official Dockerhub image. > Java driver: > {code:java} > <dependency> > <groupId>com.datastax.oss</groupId> > <artifactId>java-driver-core</artifactId> > <version>4.7.0</version> > </dependency> > {code} > The attached code reproduces the bug: > {code:java} > package com.foo.bar; > import com.datastax.oss.driver.api.core.CqlSession; > import com.datastax.oss.driver.api.core.CqlSessionBuilder; > import com.datastax.oss.driver.api.core.cql.PreparedStatement; > import com.datastax.oss.driver.api.core.cql.ResultSet; > import com.datastax.oss.driver.api.core.cql.Row; > import com.fasterxml.jackson.databind.ObjectMapper; > import org.junit.After; > import org.junit.Before; > import org.junit.Test; > import java.net.InetSocketAddress; > import java.net.URI; > import java.util.*; > import static org.junit.Assert.assertFalse; > import static org.junit.Assert.assertNotNull; > /** > * @author Domenico Lupinetti - 23/06/2020 > */ > public class NullPointerExceptionTest { > protected String uuid; > protected CqlSession cqlSession; > @Before > public void setUp() throws Exception { > URI node = new URI( "tcp://localhost:9042" ); > final CqlSessionBuilder builder = CqlSession.builder(); > cqlSession = builder.addContactPoint( new InetSocketAddress( > node.getHost(), > node.getPort() > ) ).withLocalDatacenter( 
"datacenter1" ).build(); > cqlSession.execute( "CREATE KEYSPACE IF NOT EXISTS test_suite WITH > replication = {'class':'SimpleStrategy','replication_factor':1};" ); > String sb = "CREATE TABLE IF NOT EXISTS test_suite.test ( id uuid > PRIMARY KEY, another_id uuid, subject text );"; > cqlSession.execute( sb ); > PreparedStatement stm = cqlSession.prepare( "INSERT INTO > test_suite.test JSON :payload" ); > this.uuid = UUID.randomUUID().toString(); > HashMap payload = new HashMap<>(); > payload.put( "id", this.uuid ); > // *** This exception do not happens if the field is set as NULL > payload.put( "another_id", "" ); //<-- EMPTY STRING AS UUID > payload.put( "subject", "Alighieri, Dante. Divina Commedia" ); > ObjectMapper objM = new ObjectMapper(); > cqlSession.execute( > stm.bind().setString( "payload", objM.writeValueAsString( > payload ) ) > ); //<-- serialize as JSON > } > @After > public void tearDown() throws Exception { > cqlSession.execute( "DROP TABLE IF EXISTS test_suite.test;" ); > cqlSession.execute( "DROP KEYSPACE test_suite;" ); > cqlSession.close(); > } > @Test > public void testNullPointer() { > PreparedStatement stmt = cqlSession.prepare( "SELECT JSON id, > another_id FROM test_suite.test where id = :id;" ); > ResultSet resultSet = cqlSession.execute( > stmt.bind().setUuid( "id", UUID.fromString( this.uuid ) ) ); // <-- > EXCEPTION > Row r = resultSet.one(); > assertNotNull( r ); > assertNotNull( r.getString( "[json]" ) ); > assertFalse( Objects.requireNonNull( r.getString( "[json]" ) > ).isEmpty() ); > } > } > {code} > Client stack Trace: > {code:java} > com.datastax.oss.driver.api.core.servererrors.ServerError: > java.lang.NullPointerExceptioncom.datastax.oss.driver.api.core.servererrors.ServerError: > java.lang.NullPointerException > at > com.datastax.oss.driver.api.core.servererrors.ServerError.copy(ServerError.java:54) > at > com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:149) 
> at > com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:53) > at > com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:30) > at > com.datastax.oss.driver.internal.core.session.DefaultSession.execute(DefaultSession.java:230) > at > com.datastax.o
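Until a server-side fix lands, a client-side guard for the behaviour above is to map an empty JSON string to null before writing, since the reporter notes the exception does not occur when the field is set as NULL. A minimal sketch (the helper name is hypothetical):

```java
import java.util.UUID;

public class JsonUuidGuard {
    // Map an empty or missing JSON string to null instead of writing it as a
    // UUID value: per the report, a null column serializes back to JSON fine,
    // while an empty string is accepted on INSERT JSON but NPEs on SELECT JSON.
    public static UUID uuidFromJson(String value) {
        if (value == null || value.isEmpty()) {
            return null;
        }
        return UUID.fromString(value);
    }
}
```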
[jira] [Updated] (CASSANDRA-15902) OOM because repair session thread not closed when terminating repair
[ https://issues.apache.org/jira/browse/CASSANDRA-15902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer updated CASSANDRA-15902: --- Fix Version/s: 3.11.x > OOM because repair session thread not closed when terminating repair > > > Key: CASSANDRA-15902 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15902 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Swen Fuhrmann >Assignee: Swen Fuhrmann >Priority: Normal > Fix For: 3.11.x > > Attachments: heap-mem-histo.txt, repair-terminated.txt > > > In our cluster, after a while some nodes slowly run out of memory. On > those nodes we observed that Cassandra Reaper terminates repairs with a JMX > call to {{StorageServiceMBean.forceTerminateAllRepairSessions()}} because > they reach the timeout of 30 min. > In the memory heap dump we see a lot of instances of > {{io.netty.util.concurrent.FastThreadLocalThread}} occupying most of the memory: > {noformat} > 119 instances of "io.netty.util.concurrent.FastThreadLocalThread", loaded by > "sun.misc.Launcher$AppClassLoader @ 0x51a80" occupy 8.445.684.480 (93,96 > %) bytes. 
{noformat} > In the thread dump we see a lot of repair threads: > {noformat} > grep "Repair#" threaddump.txt | wc -l > 50 {noformat} > > The repair jobs are waiting for the validation to finish: > {noformat} > "Repair#152:1" #96170 daemon prio=5 os_prio=0 tid=0x12fc5000 > nid=0x542a waiting on condition [0x7f81ee414000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x0007939bcfc8> (a > com.google.common.util.concurrent.AbstractFuture$Sync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) > at > com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285) > at > com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) > at > com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:137) > at > com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1509) > at org.apache.cassandra.repair.RepairJob.run(RepairJob.java:160) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at > org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) > at > org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$13/480490520.run(Unknown > Source) > at java.lang.Thread.run(Thread.java:748) {noformat} > > That's the line where the threads are stuck: > {noformat} > // Wait for validation to complete > Futures.getUnchecked(validations); {noformat} > > The call to 
{{StorageServiceMBean.forceTerminateAllRepairSessions()}} stops > the thread pool executor. It looks like futures which are in progress > will therefore never be completed, so the repair thread waits forever and > is never finished. > > Environment: > Cassandra version: 3.11.4 and 3.11.6 > Cassandra Reaper: 1.4.0 > JVM memory settings: > {noformat} > -Xms11771M -Xmx11771M -XX:+UseG1GC -XX:MaxGCPauseMillis=100 > -XX:+ParallelRefProcEnabled -XX:MaxMetaspaceSize=100M {noformat} > on another cluster with the same issue: > {noformat} > -Xms31744M -Xmx31744M -XX:+UseG1GC -XX:MaxGCPauseMillis=100 > -XX:+ParallelRefProcEnabled -XX:MaxMetaspaceSize=100M {noformat} > Java Runtime: > {noformat} > openjdk version "1.8.0_212" > OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_212-b03) > OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.212-b03, mixed mode) > {noformat} > > The same issue is described in this comment: > https://issues.apache.org/jira/browse/CASSANDRA-14355?focusedCommentId=16992973&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16992973 > As suggested in the comments, I created this new specific ticket. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
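The stuck-thread mechanism can be reproduced in isolation: a thread waiting uninterruptibly on a future (as RepairJob does via Guava's getUninterruptibly) is not released by shutting down its executor; only completing the future releases it. A minimal sketch with JDK types (CompletableFuture.join is likewise uninterruptible); the class and method names are illustrative:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class RepairTerminationSketch {
    // Returns true if the worker leaked past shutdownNow() and was only
    // released once the validation future completed.
    public static boolean demo() throws InterruptedException {
        ExecutorService repairPool = Executors.newFixedThreadPool(1);
        CompletableFuture<Void> validation = new CompletableFuture<>();
        CountDownLatch started = new CountDownLatch(1);

        // The "repair job" blocks uninterruptibly until validation completes,
        // mirroring Futures.getUnchecked(validations) in RepairJob.run().
        repairPool.submit(() -> {
            started.countDown();
            validation.join(); // ignores interrupts, like getUninterruptibly
        });
        started.await();

        // Terminating the sessions interrupts the worker, but the
        // uninterruptible wait shrugs it off: the thread leaks.
        repairPool.shutdownNow();
        boolean leaked = !repairPool.awaitTermination(200, TimeUnit.MILLISECONDS);

        // Completing (or failing) the future is what actually frees the thread.
        validation.completeExceptionally(new RuntimeException("repair terminated"));
        boolean released = repairPool.awaitTermination(5, TimeUnit.SECONDS);
        return leaked && released;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo());
    }
}
```

This suggests why failing the in-flight validation futures on termination, rather than only stopping the executor, lets the repair threads finish.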
[jira] [Updated] (CASSANDRA-15851) Add bytebuddy support for in-jvm dtests
[ https://issues.apache.org/jira/browse/CASSANDRA-15851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer updated CASSANDRA-15851: --- Fix Version/s: 4.0 > Add bytebuddy support for in-jvm dtests > --- > > Key: CASSANDRA-15851 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15851 > Project: Cassandra > Issue Type: Improvement > Components: Test/dtest >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Normal > Labels: pull-request-available > Fix For: 4.0 > > > Old python dtests support byteman, but that is quite horrible to work with, > [bytebuddy|https://bytebuddy.net/#/] is much better, so we should add support > for that in the in-jvm dtests. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15870) When 3.0 reads 2.1 data with a regular column set it expects the cellName to contain a element and fails if not true
[ https://issues.apache.org/jira/browse/CASSANDRA-15870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer updated CASSANDRA-15870: --- Fix Version/s: 3.11.x 3.0.x > When 3.0 reads 2.1 data with a regular column set it expects the > cellName to contain a element and fails if not true > -- > > Key: CASSANDRA-15870 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15870 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Schema, Local/SSTable >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Fix For: 3.0.x, 3.11.x > > > {code} > java.lang.AssertionError > at org.apache.cassandra.db.rows.BufferCell.(BufferCell.java:48) > at > org.apache.cassandra.db.LegacyLayout$CellGrouper.addCell(LegacyLayout.java:1461) > at > org.apache.cassandra.db.LegacyLayout$CellGrouper.addAtom(LegacyLayout.java:1380) > at > org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$UnfilteredIterator.readRow(UnfilteredDeserializer.java:549) > at > org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$UnfilteredIterator.hasNext(UnfilteredDeserializer.java:523) > at > org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer.hasNext(UnfilteredDeserializer.java:336) > at > org.apache.cassandra.io.sstable.SSTableSimpleIterator$OldFormatIterator.readStaticRow(SSTableSimpleIterator.java:133) > at > org.apache.cassandra.io.sstable.SSTableIdentityIterator.(SSTableIdentityIterator.java:59) > at > org.apache.cassandra.io.sstable.format.big.BigTableScanner$KeyScanningIterator$1.initializeIterator(BigTableScanner.java:364) > at > org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.maybeInit(LazilyInitializedUnfilteredRowIterator.java:48) > at > org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.isReverseOrder(LazilyInitializedUnfilteredRowIterator.java:65) > at > org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$1.reduce(UnfilteredPartitionIterators.java:132) 
> at > org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$1.reduce(UnfilteredPartitionIterators.java:123) > at > org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:207) > at > org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:160) > at > org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) > at > org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.hasNext(UnfilteredPartitionIterators.java:174) > at > org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:93) > at > org.apache.cassandra.db.compaction.CompactionIterator.hasNext(CompactionIterator.java:240) > at > org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:191) > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > at > org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:89) > at > org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:100) > at > org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:345) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at > org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:83) > at java.lang.Thread.run(Thread.java:748) > {code} > This exception is similar to other JIRA such as CASSANDRA-14113 but under > root causing both exceptions, they only share the same symptom and not the > same root cause; hence a new JIRA. > This was found when a frozen collection was found when a multi-cell > collection was expected. 
When this happened, LegacyCellName#collectionElement > came back as null, which eventually tripped the assertion in BufferCell > (a complex cell needs a path). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
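The failing assertion can be stated compactly: a cell of a multi-cell ("complex") column must carry a path identifying its element, and the legacy layout handed BufferCell a complex column with a null path. A hedged sketch of the invariant with simplified stand-in types (not Cassandra's actual classes):

```java
public class ComplexCellInvariant {
    // Simplified stand-in for ColumnDefinition.
    public static final class Column {
        final boolean complex; // true for a multi-cell collection column
        Column(boolean complex) { this.complex = complex; }
    }

    // Mirrors the BufferCell constructor check: "complex cell needs a path".
    // An explicit exception is used here instead of a Java assert.
    public static void checkCell(Column column, Object collectionElement) {
        if (column.complex && collectionElement == null) {
            throw new IllegalStateException("complex cell needs a path");
        }
    }

    // A frozen collection read as if it were multi-cell yields a null
    // collectionElement, which is exactly the case that blows up.
    public static boolean triggersInvariant() {
        try {
            checkCell(new Column(true), null);
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(triggersInvariant());
    }
}
```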
[jira] [Updated] (CASSANDRA-15904) nodetool getendpoints man page improvements
[ https://issues.apache.org/jira/browse/CASSANDRA-15904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer updated CASSANDRA-15904: --- Fix Version/s: 4.x > nodetool getendpoints man page improvements > --- > > Key: CASSANDRA-15904 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15904 > Project: Cassandra > Issue Type: Improvement >Reporter: Arvinder Singh >Assignee: Erick Ramirez >Priority: Normal > Fix For: 4.x > > > Please include support for compound primary key. Ex: > nodetool getendpoints keyspace1 table1 pk1:pk2:pk2 > Thanks. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15896) NullPointerException in SELECT JSON statement when a UUID field contains an empty string
[ https://issues.apache.org/jira/browse/CASSANDRA-15896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer updated CASSANDRA-15896: --- Fix Version/s: 3.11.x 3.0.x 4.0 > NullPointerException in SELECT JSON statement when a UUID field contains an > empty string > > > Key: CASSANDRA-15896 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15896 > Project: Cassandra > Issue Type: Bug > Components: CQL/Interpreter, CQL/Semantics >Reporter: Ostico >Assignee: Benjamin Lerer >Priority: Normal > Fix For: 4.0, 3.0.x, 3.11.x > > > It seems that Cassandra accepts empty strings "" (from a JSON string) for UUID > fields but crashes when asked for the JSON serialization of those fields. > > Cassandra version 3.6.11.6 running in docker from official Dockerhub image. > Java driver: > {code:java} > <dependency> > <groupId>com.datastax.oss</groupId> > <artifactId>java-driver-core</artifactId> > <version>4.7.0</version> > </dependency> > {code} > The attached code reproduces the bug: > {code:java} > package com.foo.bar; > import com.datastax.oss.driver.api.core.CqlSession; > import com.datastax.oss.driver.api.core.CqlSessionBuilder; > import com.datastax.oss.driver.api.core.cql.PreparedStatement; > import com.datastax.oss.driver.api.core.cql.ResultSet; > import com.datastax.oss.driver.api.core.cql.Row; > import com.fasterxml.jackson.databind.ObjectMapper; > import org.junit.After; > import org.junit.Before; > import org.junit.Test; > import java.net.InetSocketAddress; > import java.net.URI; > import java.util.*; > import static org.junit.Assert.assertFalse; > import static org.junit.Assert.assertNotNull; > /** > * @author Domenico Lupinetti - 23/06/2020 > */ > public class NullPointerExceptionTest { > protected String uuid; > protected CqlSession cqlSession; > @Before > public void setUp() throws Exception { > URI node = new URI( "tcp://localhost:9042" ); > final CqlSessionBuilder builder = CqlSession.builder(); > cqlSession = builder.addContactPoint( new InetSocketAddress( > node.getHost(), > node.getPort() > ) 
).withLocalDatacenter( "datacenter1" ).build(); > cqlSession.execute( "CREATE KEYSPACE IF NOT EXISTS test_suite WITH > replication = {'class':'SimpleStrategy','replication_factor':1};" ); > String sb = "CREATE TABLE IF NOT EXISTS test_suite.test ( id uuid > PRIMARY KEY, another_id uuid, subject text );"; > cqlSession.execute( sb ); > PreparedStatement stm = cqlSession.prepare( "INSERT INTO > test_suite.test JSON :payload" ); > this.uuid = UUID.randomUUID().toString(); > HashMap payload = new HashMap<>(); > payload.put( "id", this.uuid ); > // *** This exception do not happens if the field is set as NULL > payload.put( "another_id", "" ); //<-- EMPTY STRING AS UUID > payload.put( "subject", "Alighieri, Dante. Divina Commedia" ); > ObjectMapper objM = new ObjectMapper(); > cqlSession.execute( > stm.bind().setString( "payload", objM.writeValueAsString( > payload ) ) > ); //<-- serialize as JSON > } > @After > public void tearDown() throws Exception { > cqlSession.execute( "DROP TABLE IF EXISTS test_suite.test;" ); > cqlSession.execute( "DROP KEYSPACE test_suite;" ); > cqlSession.close(); > } > @Test > public void testNullPointer() { > PreparedStatement stmt = cqlSession.prepare( "SELECT JSON id, > another_id FROM test_suite.test where id = :id;" ); > ResultSet resultSet = cqlSession.execute( > stmt.bind().setUuid( "id", UUID.fromString( this.uuid ) ) ); // <-- > EXCEPTION > Row r = resultSet.one(); > assertNotNull( r ); > assertNotNull( r.getString( "[json]" ) ); > assertFalse( Objects.requireNonNull( r.getString( "[json]" ) > ).isEmpty() ); > } > } > {code} > Client stack Trace: > {code:java} > com.datastax.oss.driver.api.core.servererrors.ServerError: > java.lang.NullPointerExceptioncom.datastax.oss.driver.api.core.servererrors.ServerError: > java.lang.NullPointerException > at > com.datastax.oss.driver.api.core.servererrors.ServerError.copy(ServerError.java:54) > at > 
com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:149) > at > com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:53) > at > com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:30) > at > com.datastax.oss.driver.internal.core.session.DefaultSession.execute(Def
[jira] [Commented] (CASSANDRA-15821) Metrics Documentation Enhancements
[ https://issues.apache.org/jira/browse/CASSANDRA-15821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148734#comment-17148734 ] Stephen Mallette commented on CASSANDRA-15821: -- This issue now depends on CASSANDRA-15909 given the expected metric name consistency changes on that ticket. > Metrics Documentation Enhancements > -- > > Key: CASSANDRA-15821 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15821 > Project: Cassandra > Issue Type: Improvement > Components: Documentation/Website >Reporter: Stephen Mallette >Assignee: Stephen Mallette >Priority: Normal > Fix For: 4.0-beta > > > CASSANDRA-15582 involves quality around metrics and it was mentioned that > reviewing and [improving > documentation|https://github.com/apache/cassandra/blob/trunk/doc/source/operating/metrics.rst] > around metrics would fall into that scope. Please consider some of this > analysis in determining what improvements to make here: > Please see [this > spreadsheet|https://docs.google.com/spreadsheets/d/1iPWfCMIG75CI6LbYuDtCTjEOvZw-5dyH-e08bc63QnI/edit?usp=sharing] > that itemizes almost all of cassandra's metrics and whether they are > documented or not (and other notes). That spreadsheet covers "almost all" > because there are some metrics that don't seem to initialize as part of > Cassandra startup (I was able to trigger some to initialize, but not all were > immediately obvious). 
The missing metrics seem to be related to the following: > * ThreadPool metrics - only some initialize at startup the list of which > follow below > * Streaming Metrics > * HintedHandoff Metrics > * HintsService Metrics > Here are the ThreadPool scopes that get listed: > {code} > AntiEntropyStage > CacheCleanupExecutor > CompactionExecutor > GossipStage > HintsDispatcher > MemtableFlushWriter > MemtablePostFlush > MemtableReclaimMemory > MigrationStage > MutationStage > Native-Transport-Requests > PendingRangeCalculator > PerDiskMemtableFlushWriter_0 > ReadStage > Repair-Task > RequestResponseStage > Sampler > SecondaryIndexManagement > ValidationExecutor > ViewBuildExecutor > {code} > I noticed that Keyspace Metrics have this note: "Most of these metrics are > the same as the Table Metrics above, only they are aggregated at the Keyspace > level." I think I've isolated those metrics on table that are not on keyspace > to specifically be: > {code} > BloomFilterFalsePositives > BloomFilterFalseRatio > BytesAnticompacted > BytesFlushed > BytesMutatedAnticompaction > BytesPendingRepair > BytesRepaired > BytesUnrepaired > CompactionBytesWritten > CompressionRatio > CoordinatorReadLatency > CoordinatorScanLatency > CoordinatorWriteLatency > EstimatedColumnCountHistogram > EstimatedPartitionCount > EstimatedPartitionSizeHistogram > KeyCacheHitRate > LiveSSTableCount > MaxPartitionSize > MeanPartitionSize > MinPartitionSize > MutatedAnticompactionGauge > PercentRepaired > RowCacheHitOutOfRange > RowCacheHit > RowCacheMiss > SpeculativeSampleLatencyNanos > SyncTime > WaitingOnFreeMemtableSpace > DroppedMutations > {code} > Someone with greater knowledge of this area might consider it worth the > effort to see if any of these metrics should be aggregated to the keyspace > level in case they were inadvertently missed. In any case, perhaps the > documentation could easily now reflect which metric names could be expected > on Keyspace. 
> The DroppedMessage metrics have a much larger body of scopes than just what > were documented: > {code} > ASYMMETRIC_SYNC_REQ > BATCH_REMOVE_REQ > BATCH_REMOVE_RSP > BATCH_STORE_REQ > BATCH_STORE_RSP > CLEANUP_MSG > COUNTER_MUTATION_REQ > COUNTER_MUTATION_RSP > ECHO_REQ > ECHO_RSP > FAILED_SESSION_MSG > FAILURE_RSP > FINALIZE_COMMIT_MSG > FINALIZE_PROMISE_MSG > FINALIZE_PROPOSE_MSG > GOSSIP_DIGEST_ACK > GOSSIP_DIGEST_ACK2 > GOSSIP_DIGEST_SYN > GOSSIP_SHUTDOWN > HINT_REQ > HINT_RSP > INTERNAL_RSP > MUTATION_REQ > MUTATION_RSP > PAXOS_COMMIT_REQ > PAXOS_COMMIT_RSP > PAXOS_PREPARE_REQ > PAXOS_PREPARE_RSP > PAXOS_PROPOSE_REQ > PAXOS_PROPOSE_RSP > PING_REQ > PING_RSP > PREPARE_CONSISTENT_REQ > PREPARE_CONSISTENT_RSP > PREPARE_MSG > RANGE_REQ > RANGE_RSP > READ_REPAIR_REQ > READ_REPAIR_RSP > READ_REQ > READ_RSP > REPAIR_RSP > REPLICATION_DONE_REQ > REPLICATION_DONE_RSP > REQUEST_RSP > SCHEMA_PULL_REQ > SCHEMA_PULL_RSP > SCHEMA_PUSH_REQ > SCHEMA_PUSH_RSP > SCHEMA_VERSION_REQ > SCHEMA_VERSION_RSP > SNAPSHOT_MSG > SNAPSHOT_REQ > SNAPSHOT_RSP > STATUS_REQ > STATUS_RSP > SYNC_REQ > SYNC_RSP > TRUNCATE_REQ > TRUNCATE_RSP > VALIDATION_REQ > VALIDATION_RSP > _SAMPLE > _TEST_1 > _TEST_2 > _TRACE > {code} > I suppose I may yet be missing some metrics as my knowledge of what's > available is limited to what I can get from JMX after cassandra > initialization (and some initial starting c
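The audit described above can be partly automated: which scopes actually registered is discoverable from JMX by collecting one object-name key across matching MBeans. A sketch of the query pattern; the Cassandra domain/type names in the comment are assumptions, and the runnable demo targets the local JVM's java.lang domain since no live node is presumed:

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class MetricScopeAudit {
    // Collect the distinct values of one object-name key (e.g. "scope")
    // across all MBeans matching domain:type=<type>,*.
    public static Set<String> keyValues(String domain, String type, String key) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        Set<String> values = new TreeSet<>();
        for (ObjectName name : server.queryNames(new ObjectName(domain + ":type=" + type + ",*"), null)) {
            String v = name.getKeyProperty(key);
            if (v != null) values.add(v);
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Against a node this would presumably be something like:
        //   keyValues("org.apache.cassandra.metrics", "ThreadPools", "scope")
        // Demo against the local JVM: memory pool names keyed by "name".
        System.out.println(keyValues("java.lang", "MemoryPool", "name"));
    }
}
```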
[jira] [Updated] (CASSANDRA-13994) Remove dead compact storage code before 4.0 release
[ https://issues.apache.org/jira/browse/CASSANDRA-13994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Yeschenko updated CASSANDRA-13994: -- Summary: Remove dead compact storage code before 4.0 release (was: Remove COMPACT STORAGE internals before 4.0 release) > Remove dead compact storage code before 4.0 release > --- > > Key: CASSANDRA-13994 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13994 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Local Write-Read Paths >Reporter: Alex Petrov >Assignee: Ekaterina Dimitrova >Priority: Low > Fix For: 4.0, 4.0-alpha > > > 4.0 comes without thrift (after [CASSANDRA-5]) and COMPACT STORAGE (after > [CASSANDRA-10857]), and since Compact Storage flags are now disabled, all of > the related functionality is useless. > There are still some things to consider: > 1. One of the system tables (built indexes) was compact. For now, we just > added {{value}} column to it to make sure it's backwards-compatible, but we > might want to make sure it's just a "normal" table and doesn't have redundant > columns. > 2. Compact Tables were building indexes in {{KEYS}} mode. Removing it is > trivial, but this would mean that all built indexes will be defunct. We could > log a warning for now and ask users to migrate off those for now and > completely remove it from future releases. It's just a couple of classes > though. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13994) Remove COMPACT STORAGE internals before 4.0 release
[ https://issues.apache.org/jira/browse/CASSANDRA-13994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148732#comment-17148732 ] Aleksey Yeschenko commented on CASSANDRA-13994: --- Agree with Sylvain on all points here. This is just cleanup of dead, unreachable code that doesn't change any API. It can go in an alpha, a beta, or even RC if needed. > Remove COMPACT STORAGE internals before 4.0 release > --- > > Key: CASSANDRA-13994 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13994 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Local Write-Read Paths >Reporter: Alex Petrov >Assignee: Ekaterina Dimitrova >Priority: Low > Fix For: 4.0, 4.0-alpha > > > 4.0 comes without thrift (after [CASSANDRA-5]) and COMPACT STORAGE (after > [CASSANDRA-10857]), and since Compact Storage flags are now disabled, all of > the related functionality is useless. > There are still some things to consider: > 1. One of the system tables (built indexes) was compact. For now, we just > added {{value}} column to it to make sure it's backwards-compatible, but we > might want to make sure it's just a "normal" table and doesn't have redundant > columns. > 2. Compact Tables were building indexes in {{KEYS}} mode. Removing it is > trivial, but this would mean that all built indexes will be defunct. We could > log a warning for now and ask users to migrate off those for now and > completely remove it from future releases. It's just a couple of classes > though. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-15897) Dropping compact storage with 2.1-sstables on disk make them unreadable
[ https://issues.apache.org/jira/browse/CASSANDRA-15897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne reassigned CASSANDRA-15897: Assignee: Sylvain Lebresne > Dropping compact storage with 2.1-sstables on disk make them unreadable > --- > > Key: CASSANDRA-15897 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15897 > Project: Cassandra > Issue Type: Bug >Reporter: Marcus Eriksson >Assignee: Sylvain Lebresne >Priority: Normal > > Test reproducing: > https://github.com/krummas/cassandra/commits/marcuse/dropcompactstorage -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15891) provide a configuration option such as endpoint_verification_method
[ https://issues.apache.org/jira/browse/CASSANDRA-15891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-15891: - Change Category: Operability Complexity: Normal Component/s: Messaging/Internode Status: Open (was: Triage Needed) > provide a configuration option such as endpoint_verification_method > --- > > Key: CASSANDRA-15891 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15891 > Project: Cassandra > Issue Type: Improvement > Components: Messaging/Internode >Reporter: Thanh >Priority: Normal > > With CASSANDRA-9220, it's possible to configure endpoint/hostname > verification when enabling internode encryption. However, you don't have any > control over what endpoint is used for the endpoint verification; instead, > Cassandra will automatically try to use the node IP (not the node hostname) for > endpoint verification, so if your node certificates don't include the IP in > the SSL certificate's SAN list, then you'll get an error like: > {code:java} > ERROR [MessagingService-Outgoing-/10.10.88.194-Gossip] 2018-11-13 > 10:20:26,903 OutboundTcpConnection.java:606 - SSL handshake error for > outbound connection to 50cc97c1[SSL_NULL_WITH_NULL_NULL: > Socket[addr=/,port=7001,localport=47684]] > javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: > No subject alternative names matching IP address found > at sun.security.ssl.Alerts.getSSLException(Alerts.java:192) {code} > From what I've seen, most orgs will not have node IPs in their certs. > So, it would be best if Cassandra provided another configuration option > such as *{{endpoint_verification_method}}* which you could set to "ip" or > "fqdn" or something else (e.g. "hostname_alias" if for whatever reason the org > doesn't want to use the FQDN for endpoint verification). 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
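Some context on why the IP is what ends up being verified: in JSSE, endpoint identification compares the certificate's SANs against the peer host string the SSLEngine was created with, so an engine created from an IP literal is checked against iPAddress SANs, while one created from an FQDN is checked against dNSName SANs. A sketch of the relevant JSSE knobs (the hostname and port are illustrative, and this is not Cassandra's actual SSLFactory code):

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLParameters;

public class EndpointVerificationSketch {
    public static SSLEngine engineFor(String peer, int port) throws Exception {
        SSLContext ctx = SSLContext.getDefault();
        // The peer string passed here is what endpoint identification matches
        // against the certificate: "10.10.88.194" -> iPAddress SANs,
        // "node1.example.com" -> dNSName SANs.
        SSLEngine engine = ctx.createSSLEngine(peer, port);
        SSLParameters params = engine.getSSLParameters();
        // Enables RFC 2818-style hostname verification during the handshake.
        params.setEndpointIdentificationAlgorithm("HTTPS");
        engine.setSSLParameters(params);
        engine.setUseClientMode(true);
        return engine;
    }

    public static void main(String[] args) throws Exception {
        SSLEngine e = engineFor("node1.example.com", 7001);
        System.out.println(e.getPeerHost() + " "
                + e.getSSLParameters().getEndpointIdentificationAlgorithm());
    }
}
```

So an `endpoint_verification_method` option would essentially control which string (IP, FQDN, or an alias) gets passed as the peer here.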
[jira] [Updated] (CASSANDRA-15850) Delay between Gossip settle and CQL port opening during the startup
[ https://issues.apache.org/jira/browse/CASSANDRA-15850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-15850: - Change Category: Performance Complexity: Normal Component/s: Local/Startup and Shutdown Status: Open (was: Triage Needed) > Delay between Gossip settle and CQL port opening during the startup > --- > > Key: CASSANDRA-15850 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15850 > Project: Cassandra > Issue Type: Improvement > Components: Local/Startup and Shutdown >Reporter: Jai Bheemsen Rao Dhanwada >Priority: Normal > > Hello, > When I am bootstrapping/restarting a Cassandra node, there is a delay between > gossip settle and CQL port opening. Can someone please explain where this > delay is configured and whether it can be changed? I don't see any information in > the logs. > In my case there is a ~3 minute delay, and it increases as I increase the > number of tables, nodes, and DCs. > {code:java} > INFO [main] 2020-05-31 23:51:07,554 Gossiper.java:1692 - Waiting for gossip > to settle... 
> INFO [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip backlog; > proceeding > INFO [main] 2020-05-31 23:54:06,867 NativeTransportService.java:70 - Netty > using native Epoll event loop > INFO [main] 2020-05-31 23:54:06,913 Server.java:155 - Using Netty Version: > [netty-buffer=netty-buffer-4.0.44.Final.452812a, > netty-codec=netty-codec-4.0.44.Final.452812a, > netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a, > netty-codec-http=netty-codec-http-4.0.44.Final.452812a, > netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a, > netty-common=netty-common-4.0.44.Final.452812a, > netty-handler=netty-handler-4.0.44.Final.452812a, > netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb, > netty-transport=netty-transport-4.0.44.Final.452812a, > netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a, > netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a, > netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a, > netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a] > INFO [main] 2020-05-31 23:54:06,913 Server.java:156 - Starting listening for > CQL clients on /x.x.x.x:9042 (encrypted)... > {code} > Also during this 3-10 minutes delay, I see > {noformat} > nodetool compactionstats > {noformat} > command is hung and never respond, until the CQL port is up and running. > Can someone please help me understand the delay here? > Cassandra Version: 3.11.3 > The issue can be easily reproducible with around 300 Tables and 100 nodes in > a cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15909) Make Table/Keyspace Metric Names Consistent With Each Other
[ https://issues.apache.org/jira/browse/CASSANDRA-15909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Mallette updated CASSANDRA-15909: - Fix Version/s: 4.0-beta > Make Table/Keyspace Metric Names Consistent With Each Other > --- > > Key: CASSANDRA-15909 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15909 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Stephen Mallette >Assignee: Stephen Mallette >Priority: Normal > Fix For: 4.0-beta > > > As part of CASSANDRA-15821 it became apparent that certain metric names found > in keyspace and tables had different names but were in fact the same metric - > they are as follows: > * Table.SyncTime == Keyspace.RepairSyncTime > * Table.RepairedDataTrackingOverreadRows == Keyspace.RepairedOverreadRows > * Table.RepairedDataTrackingOverreadTime == Keyspace.RepairedOverreadTime > * Table.AllMemtablesHeapSize == Keyspace.AllMemtablesOnHeapDataSize > * Table.AllMemtablesOffHeapSize == Keyspace.AllMemtablesOffHeapDataSize > * Table.MemtableOnHeapSize == Keyspace.MemtableOnHeapDataSize > * Table.MemtableOffHeapSize == Keyspace.MemtableOffHeapDataSize > Unifying this naming would help make metrics more consistent as part of > CASSANDRA-15582 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
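One way to unify the names above without breaking dashboards built on the legacy keyspace-level names is an alias table mapping each old name onto its table-level counterpart. This is an illustrative sketch of that idea, not the actual patch:

```java
import java.util.Map;

public class MetricAliases {
    // Legacy Keyspace.* name -> canonical Table.* name, per the equivalences listed above.
    static final Map<String, String> KEYSPACE_TO_TABLE = Map.of(
        "RepairSyncTime", "SyncTime",
        "RepairedOverreadRows", "RepairedDataTrackingOverreadRows",
        "RepairedOverreadTime", "RepairedDataTrackingOverreadTime",
        "AllMemtablesOnHeapDataSize", "AllMemtablesHeapSize",
        "AllMemtablesOffHeapDataSize", "AllMemtablesOffHeapSize",
        "MemtableOnHeapDataSize", "MemtableOnHeapSize",
        "MemtableOffHeapDataSize", "MemtableOffHeapSize");

    /** Resolves a keyspace metric name to its canonical table-level name, if one exists. */
    static String canonical(String keyspaceName) {
        return KEYSPACE_TO_TABLE.getOrDefault(keyspaceName, keyspaceName);
    }

    public static void main(String[] args) {
        System.out.println(canonical("RepairSyncTime")); // SyncTime
        System.out.println(canonical("ReadLatency"));    // unchanged: ReadLatency
    }
}
```

Registering both names against one underlying metric during a deprecation window would let operators migrate at their own pace.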
[jira] [Created] (CASSANDRA-15909) Make Table/Keyspace Metric Names Consistent With Each Other
Stephen Mallette created CASSANDRA-15909: Summary: Make Table/Keyspace Metric Names Consistent With Each Other Key: CASSANDRA-15909 URL: https://issues.apache.org/jira/browse/CASSANDRA-15909 Project: Cassandra Issue Type: Improvement Components: Observability/Metrics Reporter: Stephen Mallette Assignee: Stephen Mallette As part of CASSANDRA-15821 it became apparent that certain metric names found in keyspace and tables had different names but were in fact the same metric - they are as follows: * Table.SyncTime == Keyspace.RepairSyncTime * Table.RepairedDataTrackingOverreadRows == Keyspace.RepairedOverreadRows * Table.RepairedDataTrackingOverreadTime == Keyspace.RepairedOverreadTime * Table.AllMemtablesHeapSize == Keyspace.AllMemtablesOnHeapDataSize * Table.AllMemtablesOffHeapSize == Keyspace.AllMemtablesOffHeapDataSize * Table.MemtableOnHeapSize == Keyspace.MemtableOnHeapDataSize * Table.MemtableOffHeapSize == Keyspace.MemtableOffHeapDataSize Unifying this naming would help make metrics more consistent as part of CASSANDRA-15582 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15850) Delay between Gossip settle and CQL port opening during the startup
[ https://issues.apache.org/jira/browse/CASSANDRA-15850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-15850: - Workflow: Cassandra Default Workflow (was: Cassandra Bug Workflow) Issue Type: Improvement (was: Bug) > Delay between Gossip settle and CQL port opening during the startup > --- > > Key: CASSANDRA-15850 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15850 > Project: Cassandra > Issue Type: Improvement >Reporter: Jai Bheemsen Rao Dhanwada >Priority: Normal > > Hello, > When I am bootstrapping/restarting a Cassandra Node, there is a delay between > gossip settle and CQL port opening. Can someone please explain me where this > delay is configured and can this be changed? I don't see any information in > the logs > In my case if you see there is a ~3 minutes delay and this increases if I > increase the #of tables and #of nodes and DC. > {code:java} > INFO [main] 2020-05-31 23:51:07,554 Gossiper.java:1692 - Waiting for gossip > to settle... 
> INFO [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip backlog; > proceeding > INFO [main] 2020-05-31 23:54:06,867 NativeTransportService.java:70 - Netty > using native Epoll event loop > INFO [main] 2020-05-31 23:54:06,913 Server.java:155 - Using Netty Version: > [netty-buffer=netty-buffer-4.0.44.Final.452812a, > netty-codec=netty-codec-4.0.44.Final.452812a, > netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a, > netty-codec-http=netty-codec-http-4.0.44.Final.452812a, > netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a, > netty-common=netty-common-4.0.44.Final.452812a, > netty-handler=netty-handler-4.0.44.Final.452812a, > netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb, > netty-transport=netty-transport-4.0.44.Final.452812a, > netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a, > netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a, > netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a, > netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a] > INFO [main] 2020-05-31 23:54:06,913 Server.java:156 - Starting listening for > CQL clients on /x.x.x.x:9042 (encrypted)... > {code} > Also during this 3-10 minutes delay, I see > {noformat} > nodetool compactionstats > {noformat} > command is hung and never respond, until the CQL port is up and running. > Can someone please help me understand the delay here? > Cassandra Version: 3.11.3 > The issue can be easily reproducible with around 300 Tables and 100 nodes in > a cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15850) Delay between Gossip settle and CQL port opening during the startup
[ https://issues.apache.org/jira/browse/CASSANDRA-15850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148717#comment-17148717 ] Sylvain Lebresne commented on CASSANDRA-15850: -- From a look at the code, between gossip settling and starting the CQL server, the only thing that happens is that all the tables are "reloaded" (which involves a number of steps) to account for changes that could have happened once Gossip settles, and compactions are started. None of that should be super long for a given table, but it's not the most optimized thing ever either, and we do reload all tables sequentially, so this may well be the culprit for the delay you are seeing. Assuming I'm correct (I'm only going from a quick read of the code here), I don't think any configuration option will help reduce that delay (but it does make sense that the # of tables is a main factor). It's not a bug, the server is doing work, albeit maybe inefficiently. I'm sure this could be improved though. At a minimum, it would be more user friendly to add a log message explaining what is being done, so users are not left wondering what is going on. I'm sure we can also make it faster. Two things come to mind in particular: - it seems the only reason to do this reloading is for the compaction strategy(ies) to take any disk boundary changes into account, but reloading does other things, and a bit of benchmarking could probably tell us whether we could save meaningful time by doing a more targeted reload. - parallelizing the work might yield benefits. > Delay between Gossip settle and CQL port opening during the startup > --- > > Key: CASSANDRA-15850 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15850 > Project: Cassandra > Issue Type: Bug >Reporter: Jai Bheemsen Rao Dhanwada >Priority: Normal > > Hello, > When I am bootstrapping/restarting a Cassandra Node, there is a delay between > gossip settle and CQL port opening. 
Can someone please explain me where this > delay is configured and can this be changed? I don't see any information in > the logs > In my case if you see there is a ~3 minutes delay and this increases if I > increase the #of tables and #of nodes and DC. > {code:java} > INFO [main] 2020-05-31 23:51:07,554 Gossiper.java:1692 - Waiting for gossip > to settle... > INFO [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip backlog; > proceeding > INFO [main] 2020-05-31 23:54:06,867 NativeTransportService.java:70 - Netty > using native Epoll event loop > INFO [main] 2020-05-31 23:54:06,913 Server.java:155 - Using Netty Version: > [netty-buffer=netty-buffer-4.0.44.Final.452812a, > netty-codec=netty-codec-4.0.44.Final.452812a, > netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a, > netty-codec-http=netty-codec-http-4.0.44.Final.452812a, > netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a, > netty-common=netty-common-4.0.44.Final.452812a, > netty-handler=netty-handler-4.0.44.Final.452812a, > netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb, > netty-transport=netty-transport-4.0.44.Final.452812a, > netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a, > netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a, > netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a, > netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a] > INFO [main] 2020-05-31 23:54:06,913 Server.java:156 - Starting listening for > CQL clients on /x.x.x.x:9042 (encrypted)... > {code} > Also during this 3-10 minutes delay, I see > {noformat} > nodetool compactionstats > {noformat} > command is hung and never respond, until the CQL port is up and running. > Can someone please help me understand the delay here? > Cassandra Version: 3.11.3 > The issue can be easily reproducible with around 300 Tables and 100 nodes in > a cluster. 
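The parallelization idea floated in the comment above could be sketched as follows; {{reloadTable}} here is a stand-in for the per-table reload work (recomputing disk boundaries, reloading the compaction strategy), not Cassandra's real API:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelReload {
    // Counts completed reloads so we can observe progress (and test the sketch).
    static final AtomicInteger reloaded = new AtomicInteger();

    // Stand-in for the per-table reload done between gossip settle and CQL startup.
    static void reloadTable(String table) {
        reloaded.incrementAndGet();
    }

    /** Reloads all tables on a fixed-size pool instead of one at a time. */
    static void reloadAll(List<String> tables, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String t : tables)
            pool.submit(() -> reloadTable(t));
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    public static void main(String[] args) throws InterruptedException {
        // 300 tables, as in the reported cluster.
        List<String> tables = java.util.stream.IntStream.range(0, 300)
                .mapToObj(i -> "table_" + i).toList();
        reloadAll(tables, 8);
        System.out.println("reloaded " + reloaded.get() + " tables");
    }
}
```

Whether this is safe depends on what the real reload touches concurrently (schema, compaction state), which is exactly the benchmarking/inspection work the comment suggests.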
[jira] [Commented] (CASSANDRA-15907) Operational Improvements & Hardening for Replica Filtering Protection
[ https://issues.apache.org/jira/browse/CASSANDRA-15907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148685#comment-17148685 ] Andres de la Peña commented on CASSANDRA-15907: --- {quote} the second approach will execute RFP requests in two places: # at the beginning of 2nd phase, based on the collected outdated rows from 1st phase. These RFP requests can run in parallel and the number can be large. # at merge-listener, for additional rows requested by SRP. These RFP requests have to run in serial, but the number is usually small. {quote} I understand that would limit the number of cached results, at the expense of producing more queries during the second phase. As for parallelizing, that would help us a bit, but I think it's not going to save us from the degenerate cases that worry us, which are those where everything is so out of sync that we have to read the entire database. Perhaps we might consider a more sophisticated way of finding a balance between the number of cached rows and the number of grouped queries. We could try to not cache all the results but advance in blocks of a certain fixed number of cached results, so we limit the number of cached results while still grouping keys to do fewer queries. That is, we could have that pessimistic SRP read prefetching and caching N rows, completed with extra queries to the silent replicas, plugged into another group of unmerged-merged counters to prefetch more results if (probably) needed, if that makes sense. Regarding the guardrails, a very reasonable threshold for in-memory cached results like, for example, 100 rows, can produce 100 internal queries if they are all in different partitions, which is definitely too many queries. Thus, we could also consider having another guardrail to limit the number of additional SRP/RFP internal queries per user query, so we can fail before getting to a timeout. 
That guardrail could however become obsolete for RFP if we implement multi-key queries and we can do the current second phase with a single query per replica. > Operational Improvements & Hardening for Replica Filtering Protection > - > > Key: CASSANDRA-15907 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15907 > Project: Cassandra > Issue Type: Improvement > Components: Consistency/Coordination, Feature/2i Index >Reporter: Caleb Rackliffe >Assignee: Caleb Rackliffe >Priority: Normal > Labels: 2i, memory > Fix For: 4.0-beta > > > CASSANDRA-8272 uses additional space on the heap to ensure correctness for 2i > and filtering queries at consistency levels above ONE/LOCAL_ONE. There are a > few things we should follow up on, however, to make life a bit easier for > operators and generally de-risk usage: > (Note: Line numbers are based on {{trunk}} as of > {{3cfe3c9f0dcf8ca8b25ad111800a21725bf152cb}}.) > *Minor Optimizations* > * {{ReplicaFilteringProtection:114}} - Given we size them up-front, we may be > able to use simple arrays instead of lists for {{rowsToFetch}} and > {{originalPartitions}}. Alternatively (or also), we may be able to null out > references in these two collections more aggressively. (ex. Using > {{ArrayList#set()}} instead of {{get()}} in {{queryProtectedPartitions()}}, > assuming we pass {{toFetch}} as an argument to {{querySourceOnKey()}}.) > * {{ReplicaFilteringProtection:323}} - We may be able to use > {{EncodingStats.merge()}} and remove the custom {{stats()}} method. > * {{DataResolver:111 & 228}} - Cache an instance of > {{UnaryOperator#identity()}} instead of creating one on the fly. > * {{ReplicaFilteringProtection:217}} - We may be able to scatter/gather > rather than serially querying every row that needs to be completed. This > isn't a clear win perhaps, given it targets the latency of single queries and > adds some complexity. (Certainly a decent candidate to kick even out of this > issue.) 
> *Documentation and Intelligibility* > * There are a few places (CHANGES.txt, tracing output in > {{ReplicaFilteringProtection}}, etc.) where we mention "replica-side > filtering protection" (which makes it seem like the coordinator doesn't > filter) rather than "replica filtering protection" (which sounds more like > what we actually do, which is protect ourselves against incorrect replica > filtering results). It's a minor fix, but would avoid confusion. > * The method call chain in {{DataResolver}} might be a bit simpler if we put > the {{repairedDataTracker}} in {{ResolveContext}}. > *Guardrails* > * As it stands, we don't have a way to enforce an upper bound on the memory > usage of {{ReplicaFilteringProtection}} which caches row responses from the > first round of requests. (Remember, these are later used
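The fail-fast guardrail discussed above — erroring out once the number of cached first-phase rows crosses a threshold, instead of running into a timeout — could look roughly like this (hypothetical class and method names, not the eventual patch):

```java
public class RfpRowCacheGuardrail {
    private final int failThreshold;
    private int cachedRows;

    RfpRowCacheGuardrail(int failThreshold) {
        this.failThreshold = failThreshold;
    }

    /** Called each time a first-phase row response is cached; fails fast past the threshold. */
    void onRowCached() {
        if (++cachedRows > failThreshold)
            throw new IllegalStateException(
                "Replica filtering protection has cached " + cachedRows
                + " rows, above the configured threshold of " + failThreshold);
    }

    public static void main(String[] args) {
        RfpRowCacheGuardrail guardrail = new RfpRowCacheGuardrail(100);
        try {
            // Simulate a degenerate query caching far more rows than allowed.
            for (int i = 0; i < 150; i++)
                guardrail.onRowCached();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A separate counter with the same shape could bound the number of additional SRP/RFP internal queries per user query, as the comment suggests.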
[jira] [Updated] (CASSANDRA-15908) Improve messaging on indexing frozen collections
[ https://issues.apache.org/jira/browse/CASSANDRA-15908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-15908: - Change Category: Operability Complexity: Low Hanging Fruit Status: Open (was: Triage Needed) > Improve messaging on indexing frozen collections > > > Key: CASSANDRA-15908 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15908 > Project: Cassandra > Issue Type: Improvement > Components: CQL/Semantics >Reporter: Rocco Varela >Assignee: Rocco Varela >Priority: Low > Time Spent: 10m > Remaining Estimate: 0h > > When attempting to create an index on a frozen collection the error message > produced can be improved to provide more detail about the problem and > possible workarounds. Currently, a user will receive a message indicating > "...Frozen collections only support full() indexes" which is not immediately > clear for users new to Cassandra indexing and datatype compatibility. > Here is an example: > {code:java} > cqlsh> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', > 'replication_factor': 1}; > cqlsh> CREATE TABLE test.mytable ( id int primary key, addresses > frozen> ); > cqlsh> CREATE INDEX mytable_addresses_idx on test.mytable (addresses); > InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot > create values() index on frozen column addresses. Frozen collections only > support full() indexes"{code} > > I'm proposing possibly enhancing the messaging to something like this. > {quote}Cannot create values() index on frozen column addresses. Frozen > collections only support indexes on the entire data structure due to > immutability constraints of being frozen, wrap your frozen column with the > full() target type to index properly. > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
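For reference, the full() form that the error message points at looks like this when applied to the example above (same keyspace and table); the resulting index supports equality lookups on the entire frozen collection value:

{code:java}
cqlsh> CREATE INDEX mytable_addresses_idx ON test.mytable (full(addresses));
cqlsh> SELECT * FROM test.mytable WHERE addresses = ['home', 'work'];
{code}

Including a snippet like this in the error message (or linked docs) would make the suggested workaround concrete for users new to indexing on frozen types.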
[jira] [Updated] (CASSANDRA-15847) High Local read latency for few tables
[ https://issues.apache.org/jira/browse/CASSANDRA-15847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-15847: - Resolution: Invalid Status: Resolved (was: Triage Needed) The user mailing list (u...@cassandra.apache.org) is the appropriate venue for getting such help. JIRA is for reporting bugs, and documenting ideas for new improvements and features. > High Local read latency for few tables > -- > > Key: CASSANDRA-15847 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15847 > Project: Cassandra > Issue Type: Improvement > Components: Tool/sstable >Reporter: Ananda Babu Velupala >Priority: Normal > > Hi Team, > I am seeing high Local read latency for 3 tables in node(its 5 node cluster) > and keyspace has total 16 sstables and hitting 10 sstables for read from that > table, can you please suggest any path forward to fix read latency. > Appreciate your help.. Thanks > Cassandra version : 3.11.3 > SSTable Hitratio: > == > k2view_usp/service_network_element_relation histograms > Percentile SSTables Write Latency Read Latency Partition Size Cell Count > (micros) (micros) (bytes) > 50% 3.00 0.00 219.34 179 10 > 75% 7.00 0.00 315.85 179 10 > 95% 10.00 0.00 454.83 179 10 > 98% 10.00 0.00 545.79 215 10 > 99% 10.00 0.00 545.79 310 20 > Min 0.00 0.00 51.01 43 0 > Max 10.00 0.00 545.79 89970660 8409007 > > TABLE STATS: > == > Table: service_network_element_relation_mirTable: > service_network_element_relation_mir SSTable count: 3 Space used (live): > 283698097 Space used (total): 283698097 Space used by snapshots (total): 0 > Off heap memory used (total): 5335824 SSTable Compression Ratio: > 0.39563345719027554 Number of partitions (estimate): 2194136 Memtable cell > count: 0 Memtable data size: 0 Memtable off heap memory used: 0 Memtable > switch count: 0 Local read count: 0 Local read latency: NaN ms Local write > count: 0 Local write latency: NaN ms Pending flushes: 0 Percent repaired: > 100.0 Bloom filter false 
positives: 0 Bloom filter false ratio: 0.0 Bloom > filter space used: 4567016 Bloom filter off heap memory used: 4566992 Index > summary off heap memory used: 705208 Compression metadata off heap memory > used: 63624 Compacted partition minimum bytes: 104 Compacted partition > maximum bytes: 310 Compacted partition mean bytes: 154 Average live cells per > slice (last five minutes): NaN Maximum live cells per slice (last five > minutes): 0 Average tombstones per slice (last five minutes): NaN Maximum > tombstones per slice (last five minutes): 0 Dropped Mutations: 0 > > > Table: service_network_element_relationTable: > service_network_element_relation SSTable count: 11 Space used (live): > 8067239427 Space used (total): 8067239427 Space used by snapshots (total): 0 > Off heap memory used (total): 143032693 SSTable Compression Ratio: > 0.21558247949161227 Number of partitions (estimate): 29357598 Memtable cell > count: 2714 Memtable data size: 691617 Memtable off heap memory used: 0 > Memtable switch count: 9 Local read count: 6369399 Local read latency: 0.311 > ms Local write count: 161229 Local write latency: NaN ms Pending flushes: 0 > Percent repaired: 99.91 Bloom filter false positives: 1508 Bloom filter false > ratio: 0.00012 Bloom filter space used: 113071680 Bloom filter off heap > memory used: 113071592 Index summary off heap memory used: 27244541 > Compression metadata off heap memory used: 2716560 Compacted partition > minimum bytes: 43 Compacted partition maximum bytes: 89970660 Compacted > partition mean bytes: 265 Average live cells per slice (last five minutes): > 1.1779891304347827 Maximum live cells per slice (last five minutes): 103 > Average tombstones per slice (last five minutes): 1.0 Maximum tombstones per > slice (last five minutes): 1 Dropped Mutations: 0 > > Table: service_relationTable: service_relation SSTable count: 7 Space used > (live): 281354042 Space used (total): 281354042 Space used by snapshots > (total): 35695068 Off heap memory 
used (total): 6423276 SSTable Compression > Ratio: 0.17685515178431085 Number of partitions (estimate): 1719400 Memtable > cell count: 1150 Memtable data size: 67482 Memtable off heap memory used: 0 > Memtable switch count: 3 Local read count: 5506327 Local read latency: 0.182 > ms Local write count: 5237 Local write latency: 0.084 ms Pending flushes: 0 > Percent repaired: 55.48 Bloom filter false positives: 17 Bloom filter false > ratio: 0.0 Bloom filter space used: 5549664 Bloom filter off heap memory > used: 5549608 Index summary off heap memory used: 737348 Compression metadata > off heap memory used: 136320 Compacted partition minimum bytes: 87 Compacted > partition maximum bytes: 4055269 Compacted partition mean bytes
[jira] [Commented] (CASSANDRA-15901) Fix unit tests to load test/conf/cassandra.yaml (so to listen on a valid ip)
[ https://issues.apache.org/jira/browse/CASSANDRA-15901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148621#comment-17148621 ] Berenguer Blasi commented on CASSANDRA-15901: - The latest run with the latest commit looks ok imo: * [CI j11|https://app.circleci.com/pipelines/github/bereng/cassandra/52/workflows/573ad5be-e34d-4668-a0af-2726d4b35568] Failure seems unrelated and passes locally * [CI j8|https://app.circleci.com/pipelines/github/bereng/cassandra/52/workflows/16e15155-7dce-4877-86f5-315c6a837d36] Seems to be a new flaky test but unrelated to the PR imo. It passes when run locally but failed once locally on {{ant test}} * The [latest|https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch-test/156/] ci-cassandra run looks much better but: ** ActiveRepairServiceTest could be a [legit|https://ci-cassandra.apache.org/job/Cassandra-trunk/199/testReport/org.apache.cassandra.service/ActiveRepairServiceTest/testQueueWhenPoolFullStrategy_cdc/history/] flaky ** ClearSnapshotTest passes locally but failed with some weird VM error :shrug: ** Connection tests have given timeouts [before|https://ci-cassandra.apache.org/job/Cassandra-trunk/199/testReport/org.apache.cassandra.net/ConnectionTest/testMessageDeliveryOnReconnect_cdc/history/]. It would be good to have a second opinion here. But I think the failures we are hitting are legit flaky tests now that we've removed much of the noise. [~mck] would you be so kind as to run the tests again, but not on cassandra13, to see what happens? I think we can then move this to review if no weird stuff happens. Wdyt? 
> Fix unit tests to load test/conf/cassandra.yaml (so to listen on a valid ip) > > > Key: CASSANDRA-15901 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15901 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-rc > > > Many of the ci-cassandra jenkins runs fail on {{ip-10-0-5-5: Name or service > not known}}. CASSANDRA-15622 addressed some of these but many still remain. > Currently test C* nodes are either failing or listening on a public ip > depending on which agent they end up. > The idea behind this ticket is to make ant force the private VPC ip in the > cassandra yaml when building, this will force the nodes to listen on the > correct ip. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15580) 4.0 quality testing: Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-15580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh McKenzie updated CASSANDRA-15580: -- Description: Reference [doc from NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#] for context. *Shepherd: Blake Eggleston* We aim for 4.0 to have the first fully functioning incremental repair solution (CASSANDRA-9143)! Furthermore we aim to verify that all types of repair: (full range, sub range, incremental) function as expected as well as ensuring community tools such as Reaper work. CASSANDRA-3200 adds an experimental option to reduce the amount of data streamed during repair, we should write more tests and see how it works with big nodes. was: *Shepherd: Blake Eggleston* We aim for 4.0 to have the first fully functioning incremental repair solution (CASSANDRA-9143)! Furthermore we aim to verify that all types of repair: (full range, sub range, incremental) function as expected as well as ensuring community tools such as Reaper work. CASSANDRA-3200 adds an experimental option to reduce the amount of data streamed during repair, we should write more tests and see how it works with big nodes. > 4.0 quality testing: Repair > --- > > Key: CASSANDRA-15580 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15580 > Project: Cassandra > Issue Type: Task > Components: Test/dtest >Reporter: Josh McKenzie >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-beta > > > Reference [doc from > NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#] > for context. > *Shepherd: Blake Eggleston* > We aim for 4.0 to have the first fully functioning incremental repair > solution (CASSANDRA-9143)! Furthermore we aim to verify that all types of > repair: (full range, sub range, incremental) function as expected as well as > ensuring community tools such as Reaper work. 
CASSANDRA-3200 adds an > experimental option to reduce the amount of data streamed during repair, we > should write more tests and see how it works with big nodes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15584) 4.0 quality testing: Tooling - External Ecosystem
[ https://issues.apache.org/jira/browse/CASSANDRA-15584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh McKenzie updated CASSANDRA-15584: -- Description: Reference [doc from NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#] for context. *Shepherd: Sam Tunnicliffe* Many users of Apache Cassandra employ open source tooling to automate Cassandra configuration, runtime management, and repair scheduling. Prior to release, we need to confirm that popular third-party tools such as Reaper, Priam, etc. function properly. was: *Shepherd: Sam Tunnicliffe* Many users of Apache Cassandra employ open source tooling to automate Cassandra configuration, runtime management, and repair scheduling. Prior to release, we need to confirm that popular third-party tools such as Reaper, Priam, etc. function properly. > 4.0 quality testing: Tooling - External Ecosystem > - > > Key: CASSANDRA-15584 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15584 > Project: Cassandra > Issue Type: Task >Reporter: Josh McKenzie >Priority: Normal > Fix For: 4.0-beta > > > Reference [doc from > NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#] > for context. > *Shepherd: Sam Tunnicliffe* > Many users of Apache Cassandra employ open source tooling to automate > Cassandra configuration, runtime management, and repair scheduling. Prior to > release, we need to confirm that popular third-party tools such as Reaper, > Priam, etc. function properly. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15585) 4.0 quality testing: Test Frameworks, Tooling, Infra / Automation
[ https://issues.apache.org/jira/browse/CASSANDRA-15585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh McKenzie updated CASSANDRA-15585: -- Description: Reference [doc from NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#] for context. *Shepherd: Jordan West* This area refers to contributions to test frameworks/tooling (e.g., dtests, QuickTheories, CASSANDRA-14821), and automation enabling those tools to be applied at scale (e.g., replay testing via Spark-based replay of captured FQL logs). was: *Shepherd: Jordan West* This area refers to contributions to test frameworks/tooling (e.g., dtests, QuickTheories, CASSANDRA-14821), and automation enabling those tools to be applied at scale (e.g., replay testing via Spark-based replay of captured FQL logs). > 4.0 quality testing: Test Frameworks, Tooling, Infra / Automation > - > > Key: CASSANDRA-15585 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15585 > Project: Cassandra > Issue Type: Task > Components: Test/dtest >Reporter: Josh McKenzie >Priority: Normal > Fix For: 4.0-beta > > > Reference [doc from > NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#] > for context. > *Shepherd: Jordan West* > This area refers to contributions to test frameworks/tooling (e.g., dtests, > QuickTheories, CASSANDRA-14821), and automation enabling those tools to be > applied at scale (e.g., replay testing via Spark-based replay of captured FQL > logs). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15581) 4.0 quality testing: Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-15581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh McKenzie updated CASSANDRA-15581: -- Description: Reference [doc from NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#] for context. *Shepherd: Marcus Eriksson* Alongside the local and distributed read/write paths, we'll also want to validate compaction. CASSANDRA-6696 introduced substantial changes/improvements that require testing (esp. JBOD). was: *Shepherd: Marcus Eriksson* Alongside the local and distributed read/write paths, we'll also want to validate compaction. CASSANDRA-6696 introduced substantial changes/improvements that require testing (esp. JBOD). > 4.0 quality testing: Compaction > --- > > Key: CASSANDRA-15581 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15581 > Project: Cassandra > Issue Type: Task > Components: Test/dtest >Reporter: Josh McKenzie >Assignee: Benjamin Lerer >Priority: Normal > Fix For: 4.0-beta > > > Reference [doc from > NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#] > for context. > *Shepherd: Marcus Eriksson* > Alongside the local and distributed read/write paths, we'll also want to > validate compaction. CASSANDRA-6696 introduced substantial > changes/improvements that require testing (esp. JBOD). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15583) 4.0 quality testing: Tooling, Bundled and First Party
[ https://issues.apache.org/jira/browse/CASSANDRA-15583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh McKenzie updated CASSANDRA-15583: -- Description: Reference [doc from NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#] for context. *Shepherd: Sam Tunnicliffe* Test plans should cover bundled first-party tooling and CLIs such as nodetool, cqlsh, and new tools supporting full query and audit logging (CASSANDRA-13983, CASSANDRA-12151). was: *Shepherd: Sam Tunnicliffe* Test plans should cover bundled first-party tooling and CLIs such as nodetool, cqlsh, and new tools supporting full query and audit logging (CASSANDRA-13983, CASSANDRA-12151). > 4.0 quality testing: Tooling, Bundled and First Party > - > > Key: CASSANDRA-15583 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15583 > Project: Cassandra > Issue Type: Task > Components: Test/dtest >Reporter: Josh McKenzie >Assignee: Gianluca Righetto >Priority: Normal > Fix For: 4.0-beta > > > Reference [doc from > NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#] > for context. > *Shepherd: Sam Tunnicliffe* > Test plans should cover bundled first-party tooling and CLIs such as > nodetool, cqlsh, and new tools supporting full query and audit logging > (CASSANDRA-13983, CASSANDRA-12151). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15587) 4.0 quality testing: Platforms and Runtimes
[ https://issues.apache.org/jira/browse/CASSANDRA-15587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh McKenzie updated CASSANDRA-15587: -- Description: Reference [doc from NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#] for context. *Shepherd: {color:#ff}NONE{color}* CASSANDRA-9608 introduces support for Java 11. We'll want to verify that Cassandra under Java 11 meets expectations of stability. was: *Shepherd: {color:#FF}NONE{color}* CASSANDRA-9608 introduces support for Java 11. We'll want to verify that Cassandra under Java 11 meets expectations of stability. > 4.0 quality testing: Platforms and Runtimes > --- > > Key: CASSANDRA-15587 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15587 > Project: Cassandra > Issue Type: Task > Components: Test/dtest >Reporter: Josh McKenzie >Assignee: Gianluca Righetto >Priority: Normal > Fix For: 4.0-beta > > > Reference [doc from > NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#] > for context. > *Shepherd: {color:#ff}NONE{color}* > CASSANDRA-9608 introduces support for Java 11. We'll want to verify that > Cassandra under Java 11 meets expectations of stability. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15537) 4.0 quality testing: Local Read/Write Path: Upgrade and Diff Test
[ https://issues.apache.org/jira/browse/CASSANDRA-15537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh McKenzie updated CASSANDRA-15537: -- Description: Reference [doc from NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#] for context. Execution of upgrade and diff tests via cassandra-diff have proven to be one of the most effective approaches toward identifying issues with the local read/write path. These include instances of data loss, data corruption, data resurrection, incorrect responses to queries, incomplete responses, and others. Upgrade and diff tests can be executed concurrent with fault injection (such as host or network failure); as well as during mixed-version scenarios (such as upgrading half of the instances in a cluster, and running upgradesstables on only half of the upgraded instances). Upgrade and diff tests are expected to continue through the release cycle, and are a great way for contributors to gain confidence in the correctness of the database under their own workloads. was: Execution of upgrade and diff tests via cassandra-diff have proven to be one of the most effective approaches toward identifying issues with the local read/write path. These include instances of data loss, data corruption, data resurrection, incorrect responses to queries, incomplete responses, and others. Upgrade and diff tests can be executed concurrent with fault injection (such as host or network failure); as well as during mixed-version scenarios (such as upgrading half of the instances in a cluster, and running upgradesstables on only half of the upgraded instances). Upgrade and diff tests are expected to continue through the release cycle, and are a great way for contributors to gain confidence in the correctness of the database under their own workloads. 
> 4.0 quality testing: Local Read/Write Path: Upgrade and Diff Test > - > > Key: CASSANDRA-15537 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15537 > Project: Cassandra > Issue Type: Task > Components: Test/dtest >Reporter: Josh McKenzie >Assignee: Yifan Cai >Priority: Normal > Fix For: 4.0-beta > > > Reference [doc from > NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#] > for context. > Execution of upgrade and diff tests via cassandra-diff have proven to be one > of the most effective approaches toward identifying issues with the local > read/write path. These include instances of data loss, data corruption, data > resurrection, incorrect responses to queries, incomplete responses, and > others. Upgrade and diff tests can be executed concurrent with fault > injection (such as host or network failure); as well as during mixed-version > scenarios (such as upgrading half of the instances in a cluster, and running > upgradesstables on only half of the upgraded instances). > Upgrade and diff tests are expected to continue through the release cycle, > and are a great way for contributors to gain confidence in the correctness of > the database under their own workloads. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15579) 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, and Read Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-15579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh McKenzie updated CASSANDRA-15579: -- Description: Reference [doc from NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#] for context. *Shepherd: Blake Eggleston* Testing in this area focuses on non-node-local aspects of the read-write path: coordination, replication, read repair, etc. was: *Shepherd: Blake Eggleston* Testing in this area focuses on non-node-local aspects of the read-write path: coordination, replication, read repair, etc. > 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, > and Read Repair > > > Key: CASSANDRA-15579 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15579 > Project: Cassandra > Issue Type: Task > Components: Test/unit >Reporter: Josh McKenzie >Assignee: Andres de la Peña >Priority: Normal > Fix For: 4.0-beta > > > Reference [doc from > NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#] > for context. > *Shepherd: Blake Eggleston* > Testing in this area focuses on non-node-local aspects of the read-write > path: coordination, replication, read repair, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15538) 4.0 quality testing: Local Read/Write Path: Other Areas
[ https://issues.apache.org/jira/browse/CASSANDRA-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh McKenzie updated CASSANDRA-15538: -- Description: Reference [doc from NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#] for context. *Shepherd: Aleksey Yeschenko* Testing in this area refers to the local read/write path (StorageProxy, ColumnFamilyStore, Memtable, SSTable reading/writing, etc). We are still finding numerous bugs and issues with the 3.0 storage engine rewrite (CASSANDRA-8099). For 4.0 we want to ensure that we thoroughly cover the local read/write path with techniques such as property-based testing, fuzzing ([example|http://cassandra.apache.org/blog/2018/10/17/finding_bugs_with_property_based_testing.html]), and a source audit. was: *Shepherd: Aleksey Yeschenko* Testing in this area refers to the local read/write path (StorageProxy, ColumnFamilyStore, Memtable, SSTable reading/writing, etc). We are still finding numerous bugs and issues with the 3.0 storage engine rewrite (CASSANDRA-8099). For 4.0 we want to ensure that we thoroughly cover the local read/write path with techniques such as property-based testing, fuzzing ([example|http://cassandra.apache.org/blog/2018/10/17/finding_bugs_with_property_based_testing.html]), and a source audit. > 4.0 quality testing: Local Read/Write Path: Other Areas > --- > > Key: CASSANDRA-15538 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15538 > Project: Cassandra > Issue Type: Task > Components: Test/dtest >Reporter: Josh McKenzie >Assignee: Sylvain Lebresne >Priority: Normal > Fix For: 4.0-beta > > > Reference [doc from > NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#] > for context. > *Shepherd: Aleksey Yeschenko* > Testing in this area refers to the local read/write path (StorageProxy, > ColumnFamilyStore, Memtable, SSTable reading/writing, etc). 
We are still > finding numerous bugs and issues with the 3.0 storage engine rewrite > (CASSANDRA-8099). For 4.0 we want to ensure that we thoroughly cover the > local read/write path with techniques such as property-based testing, fuzzing > ([example|http://cassandra.apache.org/blog/2018/10/17/finding_bugs_with_property_based_testing.html]), > and a source audit. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15536) 4.0 Quality: Components and Test Plans
[ https://issues.apache.org/jira/browse/CASSANDRA-15536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh McKenzie updated CASSANDRA-15536: -- Description: [Source doc from NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]. Jira migrated from [cwiki|https://cwiki.apache.org/confluence/display/CASSANDRA/4.0+Quality:+Components+and+Test+Plans] The overarching goal of the 4.0 release is that Cassandra 4.0 should be at a state where major users would run it in production when it is cut. To gain this confidence there are various ongoing testing efforts involving correctness, performance, and ease of use. In this page we try to coordinate and identify blockers for subsystems before we can release 4.0 For each component we strive to have shepherds and contributors involved. Shepherds should be committers or knowledgeable component owners and are responsible for driving their blocking tickets to completion and ensuring quality in their claimed area, while contributors have signed up to help verify that subsystem by running tests or contributing fixes. Shepherds also ideally help set testing standards and ensure that we meet a high standard of quality in their claimed area. If you are interested in contributing to testing 4.0, please add your name as assignee if you want to drive things, reviewer if just participate and review, and get involved in the the tracking ticket, and dev list/IRC discussions involving that component. h3. Targeted Components / Subsystems We've tried to collect some of the major components or subsystems that we want to ensure work properly towards having a great 4.0 release. If you think something is missing please add it. Better yet volunteer to contribute to testing it! h4. Internode Messaging In 4.0 we're getting a new Netty based inter-node communication system (CASSANDRA-8457). 
As internode messaging is vital to the correctness and performance of the database we should make sure that all forms (TLS, compressed, low latency, high latency, etc ...) of internode messaging function correctly. h4. Test Infrastructure / Automation: Diff Testing Diff testing is a form of model-based testing in which two clusters are exhaustively compared to assert identity. To support Apache Cassandra 4.0 validation, contributors have developed cassandra-diff. This is a Spark application that distributes the token range over a configurable number of Spark executors, then parallelizes randomized forward and reverse reads with varying paging sizes to read and compare every row present in the cluster, persisting a record of mismatches for investigation. This methodology has been instrumental to identifying data loss, data corruption, and incorrect response issues introduced in early Cassandra 3.0 releases. cassandra-diff and associated documentation can be found at: [https://github.com/apache/cassandra-diff]. Contributors are encouraged to run diff tests against clusters they manage and report issues to ensure workload diversity across the project. h4. System Tables and Internal Schema This task covers a review of and minor bug fixes to local and distributed system keyspaces. Planned work in this area is now complete. h4. Source Audit and Performance Testing: Streaming This task covers an audit of the Streaming implementation in Apache Cassandra 4.0. In this release, contributors have implemented full-SSTable streaming to improve performance and reduce memory pressure. Internode messaging changes implemented in CASSANDRA-15066 adjacent to streaming suggested that review of the streaming implementation itself may be desirable. Prior work also covered performance testing of full-SSTable streaming. h4. 
Test Infrastructure / Automation: "Harry" CASSANDRA-15348 - Harry: generator library and extensible framework for fuzz testing Apache Cassandra TRIAGE NEEDED Harry is a component for fuzz testing and verification of the Apache Cassandra clusters at scale. Harry allows to run tests that are able to validate state of both dense nodes (to test local read-write path) and large clusters (to test distributed read-write path), and do it efficiently. Harry defines a model that holds the state of the database, generators that produce reproducible, pseudo-random schemas, mutations, and queries, and a validator that asserts the correctness of the model following execution of generated traffic. See CASSANDRA-15348 for additional details. h4. Local Read/Write Path: IndexInfo (CASSANDRA-11206) Users upgrading from Cassandra 3.0.x to trunk will pick up CASSANDRA-11206 in the process. Contributors to 4.0 testing and validation have allocated time to testing and validation of these changes via source audit and implementation of property-based tests (currently underway). The majority of planned work here is complete, with a final set of perf tests in progres
[jira] [Assigned] (CASSANDRA-15581) 4.0 quality testing: Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-15581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh McKenzie reassigned CASSANDRA-15581: - Assignee: Benjamin Lerer (was: Stephen Mallette) > 4.0 quality testing: Compaction > --- > > Key: CASSANDRA-15581 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15581 > Project: Cassandra > Issue Type: Task > Components: Test/dtest >Reporter: Josh McKenzie >Assignee: Benjamin Lerer >Priority: Normal > Fix For: 4.0-beta > > > *Shepherd: Marcus Eriksson* > Alongside the local and distributed read/write paths, we'll also want to > validate compaction. CASSANDRA-6696 introduced substantial > changes/improvements that require testing (esp. JBOD). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15585) 4.0 quality testing: Test Frameworks, Tooling, Infra / Automation
[ https://issues.apache.org/jira/browse/CASSANDRA-15585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148610#comment-17148610 ] Josh McKenzie commented on CASSANDRA-15585: --- This ticket is blocked with no assignee for quite some time and no movement. Could you clarify the status on Harry and what we should do with this ticket [~ifesdjeen] / [~cscotta] ? > 4.0 quality testing: Test Frameworks, Tooling, Infra / Automation > - > > Key: CASSANDRA-15585 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15585 > Project: Cassandra > Issue Type: Task > Components: Test/dtest >Reporter: Josh McKenzie >Priority: Normal > Fix For: 4.0-beta > > > *Shepherd: Jordan West* > This area refers to contributions to test frameworks/tooling (e.g., dtests, > QuickTheories, CASSANDRA-14821), and automation enabling those tools to be > applied at scale (e.g., replay testing via Spark-based replay of captured FQL > logs). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-15583) 4.0 quality testing: Tooling, Bundled and First Party
[ https://issues.apache.org/jira/browse/CASSANDRA-15583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh McKenzie reassigned CASSANDRA-15583: - Assignee: Gianluca Righetto > 4.0 quality testing: Tooling, Bundled and First Party > - > > Key: CASSANDRA-15583 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15583 > Project: Cassandra > Issue Type: Task > Components: Test/dtest >Reporter: Josh McKenzie >Assignee: Gianluca Righetto >Priority: Normal > Fix For: 4.0-beta > > > *Shepherd: Sam Tunnicliffe* > Test plans should cover bundled first-party tooling and CLIs such as > nodetool, cqlsh, and new tools supporting full query and audit logging > (CASSANDRA-13983, CASSANDRA-12151). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-15579) 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, and Read Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-15579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh McKenzie reassigned CASSANDRA-15579: - Assignee: Andres de la Peña > 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, > and Read Repair > > > Key: CASSANDRA-15579 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15579 > Project: Cassandra > Issue Type: Task > Components: Test/unit >Reporter: Josh McKenzie >Assignee: Andres de la Peña >Priority: Normal > Fix For: 4.0-beta > > > *Shepherd: Blake Eggleston* > Testing in this area focuses on non-node-local aspects of the read-write > path: coordination, replication, read repair, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-15580) 4.0 quality testing: Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-15580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh McKenzie reassigned CASSANDRA-15580: - Assignee: Berenguer Blasi > 4.0 quality testing: Repair > --- > > Key: CASSANDRA-15580 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15580 > Project: Cassandra > Issue Type: Task > Components: Test/dtest >Reporter: Josh McKenzie >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-beta > > > *Shepherd: Blake Eggleston* > We aim for 4.0 to have the first fully functioning incremental repair > solution (CASSANDRA-9143)! Furthermore we aim to verify that all types of > repair: (full range, sub range, incremental) function as expected as well as > ensuring community tools such as Reaper work. CASSANDRA-3200 adds an > experimental option to reduce the amount of data streamed during repair, we > should write more tests and see how it works with big nodes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org