[jira] [Updated] (CASSANDRA-19641) Accord barriers/inclusive sync points cause failures in BurnTest
[ https://issues.apache.org/jira/browse/CASSANDRA-19641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19641: --- Attachment: ci_summary.html > Accord barriers/inclusive sync points cause failures in BurnTest > > > Key: CASSANDRA-19641 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19641 > Project: Cassandra > Issue Type: Bug > Components: Accord >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Attachments: ci_summary.html > > > The burn test fails almost every run at the moment; we found several things to > fix. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19641) Accord barriers/inclusive sync points cause failures in BurnTest
[ https://issues.apache.org/jira/browse/CASSANDRA-19641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19641: --- Test and Documentation Plan: Small tweaks to one of the Accord tests; covered by existing simulator tests. Going to add checks in AccordMigrationTest that validate that the cache and system table for migrated keys are being correctly populated. Status: Patch Available (was: Open) > Accord barriers/inclusive sync points cause failures in BurnTest > > > Key: CASSANDRA-19641 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19641 > Project: Cassandra > Issue Type: Bug > Components: Accord >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > > The burn test fails almost every run at the moment; we found several things to > fix.
[jira] [Updated] (CASSANDRA-19641) Accord barriers/inclusive sync points cause failures in BurnTest
[ https://issues.apache.org/jira/browse/CASSANDRA-19641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19641: --- Bug Category: Parent values: Correctness(12982) Level 1 values: Test Failure(12990) Complexity: Normal Discovered By: Fuzz Test Severity: Normal Status: Open (was: Triage Needed) > Accord barriers/inclusive sync points cause failures in BurnTest > > > Key: CASSANDRA-19641 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19641 > Project: Cassandra > Issue Type: Bug > Components: Accord >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > > The burn test fails almost every run at the moment; we found several things to > fix.
[jira] [Created] (CASSANDRA-19641) Accord barriers/inclusive sync points cause failures in BurnTest
Ariel Weisberg created CASSANDRA-19641: -- Summary: Accord barriers/inclusive sync points cause failures in BurnTest Key: CASSANDRA-19641 URL: https://issues.apache.org/jira/browse/CASSANDRA-19641 Project: Cassandra Issue Type: Bug Components: Accord Reporter: Ariel Weisberg Assignee: Ariel Weisberg The burn test fails almost every run at the moment; we found several things to fix.
[jira] [Commented] (CASSANDRA-19636) Fix CCM for Cassandra 5.0 and add arg to the command line which let the user explicitly select JVM
[ https://issues.apache.org/jira/browse/CASSANDRA-19636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847378#comment-17847378 ] Ariel Weisberg commented on CASSANDRA-19636: I didn't test this yet (still working on getting the existing changes to run), but +1 on what I saw in the PR and its description. > Fix CCM for Cassandra 5.0 and add arg to the command line which let the user > explicitly select JVM > -- > > Key: CASSANDRA-19636 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19636 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Jacek Lewandowski >Assignee: Jacek Lewandowski >Priority: Normal > Attachments: CASSANDRA-19636_50_75_ci_summary.html, > CASSANDRA-19636_50_75_results_details.tar.xz, > CASSANDRA-19636_trunk_76_ci_summary.html, > CASSANDRA-19636_trunk_76_results_details.tar.xz > > > CCM fails to select the right Java version for Cassandra 5 binary > distribution. > There are also two additional changes proposed here: > * add {{--jvm-version}} argument to let the user explicitly select Java > version when starting a node from command line > * fail if {{java}} command is available on the {{PATH}} and points to a > different Java version than Java distribution defined in {{JAVA_HOME}} > because there is no obvious way for the user to figure out which one is going > to be used
[jira] [Comment Edited] (CASSANDRA-19636) Fix CCM for Cassandra 5.0 and add arg to the command line which let the user explicitly select JVM
[ https://issues.apache.org/jira/browse/CASSANDRA-19636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846980#comment-17846980 ] Ariel Weisberg edited comment on CASSANDRA-19636 at 5/16/24 2:37 PM: - Great! I assume the [[upgrade_manifest.py|http://example.com]] will separately be changed to not depend on the JAVAX_HOME, so we have a more canonical set of things to test? was (Author: aweisberg): Great! I assume the [upgrade_manifest.py](https://github.com/apache/cassandra-dtest/blob/trunk/upgrade_tests/upgrade_manifest.py#L228) will separately be changed to not depend on the JAVAX_HOME, so we have a more canonical set of things to test? > Fix CCM for Cassandra 5.0 and add arg to the command line which let the user > explicitly select JVM > -- > > Key: CASSANDRA-19636 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19636 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Jacek Lewandowski >Assignee: Jacek Lewandowski >Priority: Normal > Attachments: CASSANDRA-19636_50_75_ci_summary.html, > CASSANDRA-19636_50_75_results_details.tar.xz, > CASSANDRA-19636_trunk_76_ci_summary.html, > CASSANDRA-19636_trunk_76_results_details.tar.xz > > > CCM fails to select the right Java version for Cassandra 5 binary > distribution. > There are also two additional changes proposed here: > * add {{--jvm-version}} argument to let the user explicitly select Java > version when starting a node from command line > * fail if {{java}} command is available on the {{PATH}} and points to a > different Java version than Java distribution defined in {{JAVA_HOME}} > because there is no obvious way for the user to figure out which one is going > to be used
[jira] [Comment Edited] (CASSANDRA-19636) Fix CCM for Cassandra 5.0 and add arg to the command line which let the user explicitly select JVM
[ https://issues.apache.org/jira/browse/CASSANDRA-19636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846980#comment-17846980 ] Ariel Weisberg edited comment on CASSANDRA-19636 at 5/16/24 2:37 PM: - Great! I assume the [upgrade_manifest.py|http://example.com] will separately be changed to not depend on the JAVAX_HOME, so we have a more canonical set of things to test? was (Author: aweisberg): Great! I assume the [[upgrade_manifest.py|http://example.com]] will separately be changed to not depend on the JAVAX_HOME, so we have a more canonical set of things to test? > Fix CCM for Cassandra 5.0 and add arg to the command line which let the user > explicitly select JVM > -- > > Key: CASSANDRA-19636 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19636 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Jacek Lewandowski >Assignee: Jacek Lewandowski >Priority: Normal > Attachments: CASSANDRA-19636_50_75_ci_summary.html, > CASSANDRA-19636_50_75_results_details.tar.xz, > CASSANDRA-19636_trunk_76_ci_summary.html, > CASSANDRA-19636_trunk_76_results_details.tar.xz > > > CCM fails to select the right Java version for Cassandra 5 binary > distribution. > There are also two additional changes proposed here: > * add {{--jvm-version}} argument to let the user explicitly select Java > version when starting a node from command line > * fail if {{java}} command is available on the {{PATH}} and points to a > different Java version than Java distribution defined in {{JAVA_HOME}} > because there is no obvious way for the user to figure out which one is going > to be used
[jira] [Commented] (CASSANDRA-19636) Fix CCM for Cassandra 5.0 and add arg to the command line which let the user explicitly select JVM
[ https://issues.apache.org/jira/browse/CASSANDRA-19636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846980#comment-17846980 ] Ariel Weisberg commented on CASSANDRA-19636: Great! I assume the [upgrade_manifest.py](https://github.com/apache/cassandra-dtest/blob/trunk/upgrade_tests/upgrade_manifest.py#L228) will separately be changed to not depend on the JAVAX_HOME, so we have a more canonical set of things to test? > Fix CCM for Cassandra 5.0 and add arg to the command line which let the user > explicitly select JVM > -- > > Key: CASSANDRA-19636 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19636 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Jacek Lewandowski >Assignee: Jacek Lewandowski >Priority: Normal > Attachments: CASSANDRA-19636_50_75_ci_summary.html, > CASSANDRA-19636_50_75_results_details.tar.xz, > CASSANDRA-19636_trunk_76_ci_summary.html, > CASSANDRA-19636_trunk_76_results_details.tar.xz > > > CCM fails to select the right Java version for Cassandra 5 binary > distribution. > There are also two additional changes proposed here: > * add {{--jvm-version}} argument to let the user explicitly select Java > version when starting a node from command line > * fail if {{java}} command is available on the {{PATH}} and points to a > different Java version than Java distribution defined in {{JAVA_HOME}} > because there is no obvious way for the user to figure out which one is going > to be used
[jira] [Commented] (CASSANDRA-19636) Fix CCM for Cassandra 5.0 and add arg to the command line which let the user explicitly select JVM
[ https://issues.apache.org/jira/browse/CASSANDRA-19636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846957#comment-17846957 ] Ariel Weisberg commented on CASSANDRA-19636: In terms of future direction for CCM behavior: if CCM automatically selecting a compatible version goes away, we should minimize the number of things you need to manage to make CCM do the thing.
* Ignore PATH and only use JAVA_HOME
* If the JAVA_HOME JDK is incompatible, return an error
* Allow specifying the JDK version as a parameter and then look up the actual JDK location from JAVAX_HOME
That way existing users don't need to modify environment variables to do whatever it is they are trying to do with CCM.
> Fix CCM for Cassandra 5.0 and add arg to the command line which let the user > explicitly select JVM > -- > > Key: CASSANDRA-19636 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19636 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Jacek Lewandowski >Assignee: Jacek Lewandowski >Priority: Normal > Attachments: CASSANDRA-19636_50_75_ci_summary.html, > CASSANDRA-19636_50_75_results_details.tar.xz, > CASSANDRA-19636_trunk_76_ci_summary.html, > CASSANDRA-19636_trunk_76_results_details.tar.xz > > > CCM fails to select the right Java version for Cassandra 5 binary > distribution. > There are also two additional changes proposed here: > * add {{--jvm-version}} argument to let the user explicitly select Java > version when starting a node from command line > * fail if {{java}} command is available on the {{PATH}} and points to a > different Java version than Java distribution defined in {{JAVA_HOME}} > because there is no obvious way for the user to figure out which one is going > to be used
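The three-point policy in the comment above can be sketched roughly as follows. This is only an illustration, not CCM's code: `resolve_java_home`, `JdkSelectionError`, and the `detect_version` callback are hypothetical names, and reading "JAVAX_HOME" as per-version variables such as `JAVA8_HOME` or `JAVA11_HOME` is an assumption.

```python
class JdkSelectionError(Exception):
    """Raised when no usable JDK can be resolved (hypothetical error type)."""


def resolve_java_home(env, supported_versions, detect_version, jdk_version=None):
    """Pick a JDK directory without ever consulting PATH.

    env: mapping of environment variables, e.g. os.environ.
    supported_versions: major Java versions the Cassandra build can run on.
    detect_version: callable mapping a JDK directory to its major version
        (hypothetical helper; a real one might inspect the JDK itself).
    jdk_version: optional major version requested explicitly by the user.
    """
    if jdk_version is not None:
        if jdk_version not in supported_versions:
            raise JdkSelectionError("Java %d is not supported" % jdk_version)
        # Assumption: "JAVAX_HOME" means JAVA8_HOME, JAVA11_HOME, and so on.
        var = "JAVA%d_HOME" % jdk_version
        home = env.get(var)
        if not home:
            raise JdkSelectionError("%s is not set" % var)
        return home

    home = env.get("JAVA_HOME")
    if not home:
        raise JdkSelectionError("JAVA_HOME is not set")
    version = detect_version(home)
    if version not in supported_versions:
        # Fail loudly instead of silently falling back to another JDK.
        raise JdkSelectionError("JAVA_HOME points to unsupported Java %s" % version)
    return home
```

With this shape a caller who only has JAVA_HOME set keeps working, and a caller who passes a version never has to touch JAVA_HOME or PATH.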
[jira] [Updated] (CASSANDRA-19596) IntervalTree build throughput is low enough to be a bottleneck
[ https://issues.apache.org/jira/browse/CASSANDRA-19596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19596: --- Attachment: ci_summary.html > IntervalTree build throughput is low enough to be a bottleneck > -- > > Key: CASSANDRA-19596 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19596 > Project: Cassandra > Issue Type: Improvement > Components: Local/Compaction, Local/SSTable >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Fix For: 5.x > > Attachments: ci_summary.html > > > With several terabytes of data and 8 compactors it’s possible for the > compactors to spend a lot of time blocked waiting on IntervalTrees to be > built. > There is also a lot of wasted CPU because the tree is updated optimistically, so most > of the trees that are built end up being thrown away. > This can end up being quite painful because it can block memtable flushing as > well, and then a single slow CFS can block unrelated CFS because the memtable > post flush executor is single threaded and shared across all CFS.
[jira] [Commented] (CASSANDRA-19596) IntervalTree build throughput is low enough to be a bottleneck
[ https://issues.apache.org/jira/browse/CASSANDRA-19596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845106#comment-17845106 ] Ariel Weisberg commented on CASSANDRA-19596: This is a quick and dirty improvement that removes the redundant sorting and replaces it with re-use of the existing sorted data. Instead of having to repeat the O(n log n) sort to construct every node, we only have to do linear scans of the already sorted data that is in that node's subtree. > IntervalTree build throughput is low enough to be a bottleneck > -- > > Key: CASSANDRA-19596 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19596 > Project: Cassandra > Issue Type: Improvement > Components: Local/Compaction, Local/SSTable >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Fix For: 5.x > > > With several terabytes of data and 8 compactors it’s possible for the > compactors to spend a lot of time blocked waiting on IntervalTrees to be > built. > There is also a lot of wasted CPU because the tree is updated optimistically, so most > of the trees that are built end up being thrown away. > This can end up being quite painful because it can block memtable flushing as > well, and then a single slow CFS can block unrelated CFS because the memtable > post flush executor is single threaded and shared across all CFS.
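To illustrate the idea in the comment above (sort once, then build each node with linear scans over already sorted data), here is a minimal sketch of a centered interval tree. It is not Cassandra's IntervalTree and not the patch itself: `Interval`, `Node`, `build`, and `search` are hypothetical names, and choosing the center from the middle element of the min-sorted list is a simplification picked for this example.

```python
from collections import namedtuple

Interval = namedtuple("Interval", ["lo", "hi"])  # closed interval [lo, hi]


class Node:
    def __init__(self, center, by_min, by_max, left, right):
        self.center = center
        self.by_min = by_min  # intervals containing center, ascending lo
        self.by_max = by_max  # the same intervals, ascending hi
        self.left = left      # intervals entirely below center
        self.right = right    # intervals entirely above center


def build(intervals):
    # The only two sorts in the whole build happen here.
    by_min = sorted(intervals, key=lambda i: i.lo)
    by_max = sorted(intervals, key=lambda i: i.hi)
    return _build(by_min, by_max)


def _build(by_min, by_max):
    if not by_min:
        return None
    # The middle interval always contains its own lo, so every node keeps at
    # least one interval and each child receives at most half of the input.
    center = by_min[len(by_min) // 2].lo

    def side(i):
        if i.hi < center:
            return -1
        if i.lo > center:
            return 1
        return 0

    # Filtering preserves order, so the children inherit sorted lists
    # through linear scans instead of being sorted again.
    def pick(items, s):
        return [i for i in items if side(i) == s]

    return Node(center, pick(by_min, 0), pick(by_max, 0),
                _build(pick(by_min, -1), pick(by_max, -1)),
                _build(pick(by_min, 1), pick(by_max, 1)))


def search(node, point):
    """Return every interval that contains point."""
    out = []
    while node is not None:
        if point < node.center:
            for i in node.by_min:            # stop at the first lo > point
                if i.lo > point:
                    break
                out.append(i)
            node = node.left
        elif point > node.center:
            for i in reversed(node.by_max):  # stop at the first hi < point
                if i.hi < point:
                    break
                out.append(i)
            node = node.right
        else:
            out.extend(node.by_min)
            break
    return out
```

Because each level of the recursion only filters lists that are already ordered, the per-node cost is linear in the size of that node's subtree, which is the property the comment describes.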
[jira] [Updated] (CASSANDRA-19596) IntervalTree build throughput is low enough to be a bottleneck
[ https://issues.apache.org/jira/browse/CASSANDRA-19596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19596: --- Change Category: Performance Complexity: Low Hanging Fruit Fix Version/s: 5.x Status: Open (was: Triage Needed) > IntervalTree build throughput is low enough to be a bottleneck > -- > > Key: CASSANDRA-19596 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19596 > Project: Cassandra > Issue Type: Improvement > Components: Local/Compaction, Local/SSTable >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Fix For: 5.x > > > With several terabytes of data and 8 compactors it’s possible for the > compactors to spend a lot of time blocked waiting on IntervalTrees to be > built. > There is also a lot of wasted CPU because the tree is updated optimistically, so most > of the trees that are built end up being thrown away. > This can end up being quite painful because it can block memtable flushing as > well, and then a single slow CFS can block unrelated CFS because the memtable > post flush executor is single threaded and shared across all CFS.
[jira] [Updated] (CASSANDRA-19596) IntervalTree build throughput is low enough to be a bottleneck
[ https://issues.apache.org/jira/browse/CASSANDRA-19596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19596: --- Test and Documentation Plan: Existing unit tests + a new QT based test Status: Patch Available (was: Open) > IntervalTree build throughput is low enough to be a bottleneck > -- > > Key: CASSANDRA-19596 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19596 > Project: Cassandra > Issue Type: Improvement > Components: Local/Compaction, Local/SSTable >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Fix For: 5.x > > > With several terabytes of data and 8 compactors it’s possible for the > compactors to spend a lot of time blocked waiting on IntervalTrees to be > built. > There is also a lot of wasted CPU because the tree is updated optimistically, so most > of the trees that are built end up being thrown away. > This can end up being quite painful because it can block memtable flushing as > well, and then a single slow CFS can block unrelated CFS because the memtable > post flush executor is single threaded and shared across all CFS.
[jira] [Assigned] (CASSANDRA-19596) IntervalTree build throughput is low enough to be a bottleneck
[ https://issues.apache.org/jira/browse/CASSANDRA-19596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg reassigned CASSANDRA-19596: -- Assignee: Ariel Weisberg > IntervalTree build throughput is low enough to be a bottleneck > -- > > Key: CASSANDRA-19596 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19596 > Project: Cassandra > Issue Type: Improvement > Components: Local/Compaction, Local/SSTable >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > > With several terabytes of data and 8 compactors it’s possible for the > compactors to spend a lot of time blocked waiting on IntervalTrees to be > built. > There is also a lot of wasted CPU because the tree is updated optimistically, so most > of the trees that are built end up being thrown away. > This can end up being quite painful because it can block memtable flushing as > well, and then a single slow CFS can block unrelated CFS because the memtable > post flush executor is single threaded and shared across all CFS.
[jira] [Commented] (CASSANDRA-19597) SystemKeyspace CFS flushing blocked by unrelated keyspace flushing/compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-19597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842456#comment-17842456 ] Ariel Weisberg commented on CASSANDRA-19597: I have a patch for this. I think I need to add a test as flushing and doing post flush things in order doesn't seem like it is very well covered. `CommitLogTest` has something, but it doesn't look like it actually checks that the post flush stuff runs in order or makes it run out of order. CFS also doesn't look very testable so I need to spend some time figuring out how to test it without making a mess. > SystemKeyspace CFS flushing blocked by unrelated keyspace flushing/compaction > - > > Key: CASSANDRA-19597 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19597 > Project: Cassandra > Issue Type: Bug > Components: Local/Memtable >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Attachments: ci_summary.html > > > There is a single post flush thread and that thread processes tasks in order > and one of those tasks can be a memtable flush for an unrelated keyspace/cfs, > and that memtable flush can be blocked by slow IntervalTree building and > racing with compactors to try and build an interval tree. > Unless there is a requirement for ordering we probably want to loosen this to > the actual ordering requirement so that problems in one keyspace can’t affect > another. > SystemKeyspace and Gossip in particular cause lots of weird problems like > nodes marking each other down because Gossip can’t process nodes being > removed (blocking flush each time in SystemKeyspace.removeNode). > A very simple fix here might be to queue the post flush task at the same time > as the flush in a per CFS queue, and then submit the task only once the flush > is completed. > If flushes complete out of order the queue will still ensure their > completions are processed in order.
[jira] [Updated] (CASSANDRA-19597) SystemKeyspace CFS flushing blocked by unrelated keyspace flushing/compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-19597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19597: --- Attachment: ci_summary.html > SystemKeyspace CFS flushing blocked by unrelated keyspace flushing/compaction > - > > Key: CASSANDRA-19597 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19597 > Project: Cassandra > Issue Type: Bug > Components: Local/Memtable >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Attachments: ci_summary.html > > > There is a single post flush thread and that thread processes tasks in order > and one of those tasks can be a memtable flush for an unrelated keyspace/cfs, > and that memtable flush can be blocked by slow IntervalTree building and > racing with compactors to try and build an interval tree. > Unless there is a requirement for ordering we probably want to loosen this to > the actual ordering requirement so that problems in one keyspace can’t affect > another. > SystemKeyspace and Gossip in particular cause lots of weird problems like > nodes marking each other down because Gossip can’t process nodes being > removed (blocking flush each time in SystemKeyspace.removeNode). > A very simple fix here might be to queue the post flush task at the same time > as the flush in a per CFS queue, and then submit the task only once the flush > is completed. > If flushes complete out of order the queue will still ensure their > completions are processed in order.
[jira] [Updated] (CASSANDRA-19597) SystemKeyspace CFS flushing blocked by unrelated keyspace flushing/compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-19597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19597: --- Bug Category: Parent values: Availability(12983) Level 1 values: Unavailable(12994) Complexity: Normal Component/s: Local/Memtable Discovered By: User Report Severity: Normal Status: Open (was: Triage Needed) > SystemKeyspace CFS flushing blocked by unrelated keyspace flushing/compaction > - > > Key: CASSANDRA-19597 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19597 > Project: Cassandra > Issue Type: Bug > Components: Local/Memtable >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > > There is a single post flush thread and that thread processes tasks in order > and one of those tasks can be a memtable flush for an unrelated keyspace/cfs, > and that memtable flush can be blocked by slow IntervalTree building and > racing with compactors to try and build an interval tree. > Unless there is a requirement for ordering we probably want to loosen this to > the actual ordering requirement so that problems in one keyspace can’t affect > another. > SystemKeyspace and Gossip in particular cause lots of weird problems like > nodes marking each other down because Gossip can’t process nodes being > removed (blocking flush each time in SystemKeyspace.removeNode). > A very simple fix here might be to queue the post flush task at the same time > as the flush in a per CFS queue, and then submit the task only once the flush > is completed. > If flushes complete out of order the queue will still ensure their > completions are processed in order.
[jira] [Commented] (CASSANDRA-19597) SystemKeyspace CFS flushing blocked by unrelated keyspace flushing/compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-19597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842091#comment-17842091 ] Ariel Weisberg commented on CASSANDRA-19597: [~benedict] is the requirement for post flush processing that it be done in order per CFS, so that a per CFS queue would actually address the problem of keeping the post flush processing in order? > SystemKeyspace CFS flushing blocked by unrelated keyspace flushing/compaction > - > > Key: CASSANDRA-19597 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19597 > Project: Cassandra > Issue Type: Bug >Reporter: Ariel Weisberg >Priority: Normal > > There is a single post flush thread and that thread processes tasks in order > and one of those tasks can be a memtable flush for an unrelated keyspace/cfs, > and that memtable flush can be blocked by slow IntervalTree building and > racing with compactors to try and build an interval tree. > Unless there is a requirement for ordering we probably want to loosen this to > the actual ordering requirement so that problems in one keyspace can’t affect > another. > SystemKeyspace and Gossip in particular cause lots of weird problems like > nodes marking each other down because Gossip can’t process nodes being > removed (blocking flush each time in SystemKeyspace.removeNode). > A very simple fix here might be to queue the post flush task at the same time > as the flush in a per CFS queue, and then submit the task only once the flush > is completed. > If flushes complete out of order the queue will still ensure their > completions are processed in order.
[jira] [Assigned] (CASSANDRA-19597) SystemKeyspace CFS flushing blocked by unrelated keyspace flushing/compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-19597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg reassigned CASSANDRA-19597: -- Assignee: Ariel Weisberg > SystemKeyspace CFS flushing blocked by unrelated keyspace flushing/compaction > - > > Key: CASSANDRA-19597 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19597 > Project: Cassandra > Issue Type: Bug >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > > There is a single post flush thread and that thread processes tasks in order > and one of those tasks can be a memtable flush for an unrelated keyspace/cfs, > and that memtable flush can be blocked by slow IntervalTree building and > racing with compactors to try and build an interval tree. > Unless there is a requirement for ordering we probably want to loosen this to > the actual ordering requirement so that problems in one keyspace can’t affect > another. > SystemKeyspace and Gossip in particular cause lots of weird problems like > nodes marking each other down because Gossip can’t process nodes being > removed (blocking flush each time in SystemKeyspace.removeNode). > A very simple fix here might be to queue the post flush task at the same time > as the flush in a per CFS queue, and then submit the task only once the flush > is completed. > If flushes complete out of order the queue will still ensure their > completions are processed in order.
[jira] [Created] (CASSANDRA-19597) SystemKeyspace CFS flushing blocked by unrelated keyspace flushing/compaction
Ariel Weisberg created CASSANDRA-19597: -- Summary: SystemKeyspace CFS flushing blocked by unrelated keyspace flushing/compaction Key: CASSANDRA-19597 URL: https://issues.apache.org/jira/browse/CASSANDRA-19597 Project: Cassandra Issue Type: Bug Reporter: Ariel Weisberg There is a single post flush thread and that thread processes tasks in order and one of those tasks can be a memtable flush for an unrelated keyspace/cfs, and that memtable flush can be blocked by slow IntervalTree building and racing with compactors to try and build an interval tree. Unless there is a requirement for ordering we probably want to loosen this to the actual ordering requirement so that problems in one keyspace can’t affect another. SystemKeyspace and Gossip in particular cause lots of weird problems like nodes marking each other down because Gossip can’t process nodes being removed (blocking flush each time in SystemKeyspace.removeNode). A very simple fix here might be to queue the post flush task at the same time as the flush in a per CFS queue, and then submit the task only once the flush is completed. If flushes complete out of order the queue will still ensure their completions are processed in order.
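The per CFS queue proposed at the end of the description above might look roughly like this. It is a sketch only, written in Python rather than Cassandra's Java, and every name in it (`PostFlushQueue`, `register`, the `submit` callable) is hypothetical. It assumes `submit` is cheap, does not block, and runs tasks for one table in the order they are submitted.

```python
import threading
from collections import deque


class PostFlushQueue:
    """Keeps post-flush tasks for ONE table in flush-submission order."""

    def __init__(self, submit):
        self._lock = threading.Lock()
        self._pending = deque()  # entries in the order flushes were started
        self._submit = submit    # hypothetical hand-off to an executor

    def register(self, post_flush_task):
        """Call when the flush is submitted; returns its completion callback."""
        entry = {"task": post_flush_task, "flushed": False}
        with self._lock:
            self._pending.append(entry)

        def on_flush_complete():
            with self._lock:
                entry["flushed"] = True
                # Release tasks only from the head of the queue, so a flush
                # that finishes early waits for the ones queued before it.
                # Submitting under the lock keeps hand-off order stable even
                # when two flushes complete concurrently.
                while self._pending and self._pending[0]["flushed"]:
                    self._submit(self._pending.popleft()["task"])

        return on_flush_complete
```

Since each table owns its own queue, a flush that stalls in one table delays only that table's post-flush tasks; tasks registered on another table's queue are released as soon as their own flushes finish.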
[jira] [Created] (CASSANDRA-19596) IntervalTree build throughput is low enough to be a bottleneck
Ariel Weisberg created CASSANDRA-19596: -- Summary: IntervalTree build throughput is low enough to be a bottleneck Key: CASSANDRA-19596 URL: https://issues.apache.org/jira/browse/CASSANDRA-19596 Project: Cassandra Issue Type: Improvement Components: Local/Compaction, Local/SSTable Reporter: Ariel Weisberg With several terabytes of data and 8 compactors it’s possible for the compactors to spend a lot of time blocked waiting on IntervalTrees to be built. There is also a lot of wasted CPU because the tree is updated optimistically, so most of the trees that are built end up being thrown away. This can end up being quite painful because it can block memtable flushing as well, and then a single slow CFS can block unrelated CFS because the memtable post flush executor is single threaded and shared across all CFS.
[jira] [Commented] (CASSANDRA-19551) CCM nodes share the same environment variable map breaking upgrade tests
[ https://issues.apache.org/jira/browse/CASSANDRA-19551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837791#comment-17837791 ] Ariel Weisberg commented on CASSANDRA-19551: TY! > CCM nodes share the same environment variable map breaking upgrade tests > > > Key: CASSANDRA-19551 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19551 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Fix For: 3.0.31, 3.11.17, 4.0.13, 5.0-beta2, 5.1 > > Attachments: ci_summary.html > > > In {{node.py}} {{__environment_variables}} is generally always set with a map > that is passed in from {{cluster.py}} so it is [shared between > nodes|https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L151] > and if nodes modify the map, such as in {{start}} when [updating the Java > version|https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L860] > then when {{get_env}} runs it will [overwrite the Java > version|https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L244] > that is selected by {{update_java_version}}. > This results in {{nodetool drain}} failing when upgrading from 3.11 to 4.0 in > some of the upgrade tests because after the first node upgrades to 4.0 it's > not longer possible for the subsequent nodes to select a Java version that > isn't 11 because it's overridden by {{__environment_variables}}. > I'm not even 100% clear on why the code in {{start}} should update > {{__environment_variables}} at all if we calculate the correct java version > on every invocation of other tools. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19551) CCM nodes share the same environment variable map breaking upgrade tests
[ https://issues.apache.org/jira/browse/CASSANDRA-19551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19551: --- Description: In {{node.py}} {{__environment_variables}} is generally always set with a map that is passed in from {{cluster.py}} so it is [shared between nodes|https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L151] and if nodes modify the map, such as in {{start}} when [updating the Java version|https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L860] then when {{get_env}} runs it will [overwrite the Java version|https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L244] that is selected by {{update_java_version}}. This results in {{nodetool drain}} failing when upgrading from 3.11 to 4.0 in some of the upgrade tests because after the first node upgrades to 4.0 it's not longer possible for the subsequent nodes to select a Java version that isn't 11 because it's overridden by {{__environment_variables}}. I'm not even 100% clear on why the code in {{start}} should update {{__environment_variables}} at all if we calculate the correct java version on every invocation of other tools. was: In {{node.py}} {{__environment_variables}} is generally always set with a map that is passed in from {{cluster.py}} so it is [shared between nodes](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L151) and if nodes modify the map, such as in {{start}} when [updating the Java version](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L860) then when {{get_env}} runs it will [overwrite the Java version](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L244) that is selected by {{update_java_version}}. 
This results in {{nodetool drain}} failing when upgrading from 3.11 to 4.0 in some of the upgrade tests because after the first node upgrades to 4.0 it's not longer possible for the subsequent nodes to select a Java version that isn't 11 because it's overridden by {{__environment_variables}}. I'm not even 100% clear on why the code in {{start}} should update {{__environment_variables}} at all if we calculate the correct java version on every invocation of other tools. > CCM nodes share the same environment variable map breaking upgrade tests > > > Key: CASSANDRA-19551 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19551 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Fix For: 5.x > > Attachments: ci_summary.html > > > In {{node.py}} {{__environment_variables}} is generally always set with a map > that is passed in from {{cluster.py}} so it is [shared between > nodes|https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L151] > and if nodes modify the map, such as in {{start}} when [updating the Java > version|https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L860] > then when {{get_env}} runs it will [overwrite the Java > version|https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L244] > that is selected by {{update_java_version}}. > This results in {{nodetool drain}} failing when upgrading from 3.11 to 4.0 in > some of the upgrade tests because after the first node upgrades to 4.0 it's > not longer possible for the subsequent nodes to select a Java version that > isn't 11 because it's overridden by {{__environment_variables}}. > I'm not even 100% clear on why the code in {{start}} should update > {{__environment_variables}} at all if we calculate the correct java version > on every invocation of other tools. 
[jira] [Comment Edited] (CASSANDRA-19551) CCM nodes share the same environment variable map breaking upgrade tests
[ https://issues.apache.org/jira/browse/CASSANDRA-19551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837447#comment-17837447 ] Ariel Weisberg edited comment on CASSANDRA-19551 at 4/15/24 9:12 PM: - Looks like {{TestGossip::test_assassinate_valid_node}} and {{TestLargeColumn::test_cleanup}} consistently every time the past 5 runs, but {{bootstrap_test.py::TestBootstrap::test_cleanup}} I haven't seen a failure for. was (Author: aweisberg): Looks like `TestGossip::test_assassinate_valid_node` and `TestLargeColumn::test_cleanup` consistently every time the past 5 runs, but `bootstrap_test.py::TestBootstrap::test_cleanup` I haven't seen a failure for. > CCM nodes share the same environment variable map breaking upgrade tests > > > Key: CASSANDRA-19551 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19551 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Fix For: 5.x > > Attachments: ci_summary.html > > > In {{node.py}} {{__environment_variables}} is generally always set with a map > that is passed in from {{cluster.py}} so it is [shared between > nodes](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L151) > and if nodes modify the map, such as in {{start}} when [updating the Java > version](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L860) > then when {{get_env}} runs it will [overwrite the Java > version](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L244) > that is selected by {{update_java_version}}. > This results in {{nodetool drain}} failing when upgrading from 3.11 to 4.0 in > some of the upgrade tests because after the first node upgrades to 4.0 it's > not longer possible for the subsequent nodes to select a Java version that > isn't 11 because it's overridden by {{__environment_variables}}. 
> I'm not even 100% clear on why the code in {{start}} should update > {{__environment_variables}} at all if we calculate the correct java version > on every invocation of other tools. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19551) CCM nodes share the same environment variable map breaking upgrade tests
[ https://issues.apache.org/jira/browse/CASSANDRA-19551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837447#comment-17837447 ] Ariel Weisberg commented on CASSANDRA-19551: Looks like `TestGossip::test_assassinate_valid_node` and `TestLargeColumn::test_cleanup` have failed consistently in each of the past 5 runs, but I haven't seen a failure for `bootstrap_test.py::TestBootstrap::test_cleanup`. > CCM nodes share the same environment variable map breaking upgrade tests > > > Key: CASSANDRA-19551 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19551 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Fix For: 5.x > > Attachments: ci_summary.html > > > In {{node.py}} {{__environment_variables}} is generally always set with a map > that is passed in from {{cluster.py}} so it is [shared between > nodes](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L151) > and if nodes modify the map, such as in {{start}} when [updating the Java > version](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L860) > then when {{get_env}} runs it will [overwrite the Java > version](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L244) > that is selected by {{update_java_version}}. > This results in {{nodetool drain}} failing when upgrading from 3.11 to 4.0 in > some of the upgrade tests because after the first node upgrades to 4.0 it's > no longer possible for the subsequent nodes to select a Java version that > isn't 11 because it's overridden by {{__environment_variables}}. > I'm not even 100% clear on why the code in {{start}} should update > {{__environment_variables}} at all if we calculate the correct Java version > on every invocation of other tools. 
[jira] [Commented] (CASSANDRA-19551) CCM nodes share the same environment variable map breaking upgrade tests
[ https://issues.apache.org/jira/browse/CASSANDRA-19551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837444#comment-17837444 ] Ariel Weisberg commented on CASSANDRA-19551: Attached result of running on trunk with a copy of the environment variables for each node. One failure is an assertion on some values which looks like an unrelated problem since the cluster is coming up and working. Looking into the other failures now. I'll also have baseline nightlies tomorrow. > CCM nodes share the same environment variable map breaking upgrade tests > > > Key: CASSANDRA-19551 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19551 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Fix For: 5.x > > Attachments: ci_summary.html > > > In {{node.py}} {{__environment_variables}} is generally always set with a map > that is passed in from {{cluster.py}} so it is [shared between > nodes](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L151) > and if nodes modify the map, such as in {{start}} when [updating the Java > version](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L860) > then when {{get_env}} runs it will [overwrite the Java > version](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L244) > that is selected by {{update_java_version}}. > This results in {{nodetool drain}} failing when upgrading from 3.11 to 4.0 in > some of the upgrade tests because after the first node upgrades to 4.0 it's > not longer possible for the subsequent nodes to select a Java version that > isn't 11 because it's overridden by {{__environment_variables}}. > I'm not even 100% clear on why the code in {{start}} should update > {{__environment_variables}} at all if we calculate the correct java version > on every invocation of other tools. 
[jira] [Updated] (CASSANDRA-19551) CCM nodes share the same environment variable map breaking upgrade tests
[ https://issues.apache.org/jira/browse/CASSANDRA-19551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19551: --- Attachment: ci_summary.html > CCM nodes share the same environment variable map breaking upgrade tests > > > Key: CASSANDRA-19551 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19551 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Fix For: 5.x > > Attachments: ci_summary.html > > > In {{node.py}} {{__environment_variables}} is generally always set with a map > that is passed in from {{cluster.py}} so it is [shared between > nodes](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L151) > and if nodes modify the map, such as in {{start}} when [updating the Java > version](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L860) > then when {{get_env}} runs it will [overwrite the Java > version](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L244) > that is selected by {{update_java_version}}. > This results in {{nodetool drain}} failing when upgrading from 3.11 to 4.0 in > some of the upgrade tests because after the first node upgrades to 4.0 it's > not longer possible for the subsequent nodes to select a Java version that > isn't 11 because it's overridden by {{__environment_variables}}. > I'm not even 100% clear on why the code in {{start}} should update > {{__environment_variables}} at all if we calculate the correct java version > on every invocation of other tools. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19551) CCM nodes share the same environment variable map breaking upgrade tests
[ https://issues.apache.org/jira/browse/CASSANDRA-19551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835866#comment-17835866 ] Ariel Weisberg commented on CASSANDRA-19551: This doesn't make sense to me: https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L844 Every time {{start}} is called after an upgrade, we revert back to the old {{JAVA_HOME}} from before the upgrade, and then replace that anyway with {{update_java_version}}. Nothing in {{update_java_version}} looks dependent on the existing value of {{JAVA_HOME}} in {{env}}, and it doesn't have visibility into {{__environment_variables}} at all. > CCM nodes share the same environment variable map breaking upgrade tests > > > Key: CASSANDRA-19551 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19551 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Fix For: 5.x > > > In {{node.py}} {{__environment_variables}} is generally always set with a map > that is passed in from {{cluster.py}} so it is [shared between > nodes](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L151) > and if nodes modify the map, such as in {{start}} when [updating the Java > version](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L860) > then when {{get_env}} runs it will [overwrite the Java > version](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L244) > that is selected by {{update_java_version}}. > This results in {{nodetool drain}} failing when upgrading from 3.11 to 4.0 in > some of the upgrade tests because after the first node upgrades to 4.0 it's > no longer possible for the subsequent nodes to select a Java version that > isn't 11 because it's overridden by {{__environment_variables}}. 
> I'm not even 100% clear on why the code in {{start}} should update > {{__environment_variables}} at all if we calculate the correct java version > on every invocation of other tools. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19551) CCM nodes share the same environment variable map breaking upgrade tests
[ https://issues.apache.org/jira/browse/CASSANDRA-19551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19551: --- Test and Documentation Plan: Run all python dtests Status: Patch Available (was: Open) > CCM nodes share the same environment variable map breaking upgrade tests > > > Key: CASSANDRA-19551 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19551 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Fix For: 5.x > > > In {{node.py}} {{__environment_variables}} is generally always set with a map > that is passed in from {{cluster.py}} so it is [shared between > nodes](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L151) > and if nodes modify the map, such as in {{start}} when [updating the Java > version](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L860) > then when {{get_env}} runs it will [overwrite the Java > version](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L244) > that is selected by {{update_java_version}}. > This results in {{nodetool drain}} failing when upgrading from 3.11 to 4.0 in > some of the upgrade tests because after the first node upgrades to 4.0 it's > not longer possible for the subsequent nodes to select a Java version that > isn't 11 because it's overridden by {{__environment_variables}}. > I'm not even 100% clear on why the code in {{start}} should update > {{__environment_variables}} at all if we calculate the correct java version > on every invocation of other tools. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19551) CCM nodes share the same environment variable map breaking upgrade tests
[ https://issues.apache.org/jira/browse/CASSANDRA-19551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19551: --- Bug Category: Parent values: Correctness(12982)Level 1 values: Test Failure(12990) Complexity: Low Hanging Fruit Discovered By: DTest Fix Version/s: 5.x Reviewers: Joshua McKenzie Severity: Normal Assignee: Ariel Weisberg Status: Open (was: Triage Needed) > CCM nodes share the same environment variable map breaking upgrade tests > > > Key: CASSANDRA-19551 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19551 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Fix For: 5.x > > > In {{node.py}} {{__environment_variables}} is generally always set with a map > that is passed in from {{cluster.py}} so it is [shared between > nodes](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L151) > and if nodes modify the map, such as in {{start}} when [updating the Java > version](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L860) > then when {{get_env}} runs it will [overwrite the Java > version](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L244) > that is selected by {{update_java_version}}. > This results in {{nodetool drain}} failing when upgrading from 3.11 to 4.0 in > some of the upgrade tests because after the first node upgrades to 4.0 it's > not longer possible for the subsequent nodes to select a Java version that > isn't 11 because it's overridden by {{__environment_variables}}. > I'm not even 100% clear on why the code in {{start}} should update > {{__environment_variables}} at all if we calculate the correct java version > on every invocation of other tools. 
[jira] [Created] (CASSANDRA-19551) CCM nodes share the same environment variable map breaking upgrade tests
Ariel Weisberg created CASSANDRA-19551: -- Summary: CCM nodes share the same environment variable map breaking upgrade tests Key: CASSANDRA-19551 URL: https://issues.apache.org/jira/browse/CASSANDRA-19551 Project: Cassandra Issue Type: Bug Components: Test/dtest/python Reporter: Ariel Weisberg In {{node.py}}, {{__environment_variables}} is almost always set with a map that is passed in from {{cluster.py}}, so it is [shared between nodes](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L151). If nodes modify the map, such as in {{start}} when [updating the Java version](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L860), then when {{get_env}} runs it will [overwrite the Java version](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L244) that is selected by {{update_java_version}}. This results in {{nodetool drain}} failing when upgrading from 3.11 to 4.0 in some of the upgrade tests, because after the first node upgrades to 4.0 it's no longer possible for the subsequent nodes to select a Java version that isn't 11: the selection is overridden by {{__environment_variables}}. I'm not even 100% clear on why the code in {{start}} should update {{__environment_variables}} at all if we calculate the correct Java version on every invocation of other tools.
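The aliasing problem described in this ticket can be reproduced in a few lines of plain Python. The sketch below is not the real {{ccmlib}} code: the {{Node}} class, its methods, the {{copy_env}} switch, and the Java paths are simplified, hypothetical stand-ins chosen only to show how one node's write to a shared dict leaks into another node's environment, and how a per-node copy (one possible remedy, assumed here for illustration) avoids it.

```python
class Node:
    """Hypothetical stand-in for a CCM node; not the real ccmlib.node.Node."""

    def __init__(self, name, environment_variables, copy_env):
        self.name = name
        # Aliasing the caller's dict means every node sees every other node's writes.
        self.env_vars = dict(environment_variables) if copy_env else environment_variables

    def start(self, java_home):
        # Mirrors the pattern in the ticket: start() records the chosen Java version.
        self.env_vars["JAVA_HOME"] = java_home

    def get_env(self, selected_java_home):
        env = {"JAVA_HOME": selected_java_home}
        # Stored variables are applied last, so they override the fresh selection.
        env.update(self.env_vars)
        return env


def run(copy_env):
    cluster_env = {}
    node1 = Node("node1", cluster_env, copy_env)
    node2 = Node("node2", cluster_env, copy_env)
    node1.start("/opt/java11")  # node1 has been upgraded to 4.0
    # node2 is still on 3.11 and selects Java 8 for nodetool.
    return node2.get_env("/opt/java8")["JAVA_HOME"]


shared = run(copy_env=False)
copied = run(copy_env=True)
print(shared)  # /opt/java11
print(copied)  # /opt/java8
```

With the shared map, node2 ends up with node1's Java 11 even though it selected Java 8, which matches the {{nodetool drain}} failure mode described above.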
[jira] [Commented] (CASSANDRA-19444) AccordRepairJob should be async like CassandraRepairJob
[ https://issues.apache.org/jira/browse/CASSANDRA-19444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17832886#comment-17832886 ] Ariel Weisberg commented on CASSANDRA-19444: Blake will be fixing this in CASSANDRA-19472 > AccordRepairJob should be async like CassandraRepairJob > --- > > Key: CASSANDRA-19444 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19444 > Project: Cassandra > Issue Type: Bug >Reporter: Ariel Weisberg >Priority: Normal > > The thread that manages repairs needs to be available and not block. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19444) AccordRepairJob should be async like CassandraRepairJob
[ https://issues.apache.org/jira/browse/CASSANDRA-19444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19444: --- Resolution: Fixed Status: Resolved (was: Triage Needed) > AccordRepairJob should be async like CassandraRepairJob > --- > > Key: CASSANDRA-19444 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19444 > Project: Cassandra > Issue Type: Bug >Reporter: Ariel Weisberg >Priority: Normal > > The thread that manages repairs needs to be available and not block. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19496) Add properties for redirecting build-resolve to mirrors
[ https://issues.apache.org/jira/browse/CASSANDRA-19496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17831465#comment-17831465 ] Ariel Weisberg commented on CASSANDRA-19496: Committed. We would need to release {{3.0.30}} and {{3.11.17}} and point to those in {{upgrade_manifest.py}} for this to be helpful. Or at least create some tag to use. It would be helpful to stick with the existing format just because some things do very kludgy parsing of {{upgrade_manifest.py}}. > Add properties for redirecting build-resolve to mirrors > --- > > Key: CASSANDRA-19496 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19496 > Project: Cassandra > Issue Type: Improvement > Components: Build >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Fix For: 3.0.30, 3.11.17 > > > When running upgrade tests in CI it's not always possible to reach the public > mirrors. Currently we have properties for configuring private mirrors in 4.0+ > but we don't have this for 3.x. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19496) Add properties for redirecting build-resolve to mirrors
[ https://issues.apache.org/jira/browse/CASSANDRA-19496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19496: --- Fix Version/s: 3.0.30 3.11.17 (was: 3.0.x) (was: 3.11.x) Source Control Link: https://github.com/apache/cassandra/commit/56d3efff0c574a7c1ac2ebb6c90d283c1d256ee8 Resolution: Fixed Status: Resolved (was: Ready to Commit) > Add properties for redirecting build-resolve to mirrors > --- > > Key: CASSANDRA-19496 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19496 > Project: Cassandra > Issue Type: Improvement > Components: Build >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Fix For: 3.0.30, 3.11.17 > > > When running upgrade tests in CI it's not always possible to reach the public > mirrors. Currently we have properties for configuring private mirrors in 4.0+ > but we don't have this for 3.x. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19496) Add properties for redirecting build-resolve to mirrors
[ https://issues.apache.org/jira/browse/CASSANDRA-19496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17831405#comment-17831405 ] Ariel Weisberg commented on CASSANDRA-19496: [Looks like this should change from 4.0.11 to 4.0.16?|https://github.com/apache/cassandra-dtest/blob/trunk/upgrade_tests/upgrade_manifest.py#L172] > Add properties for redirecting build-resolve to mirrors > --- > > Key: CASSANDRA-19496 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19496 > Project: Cassandra > Issue Type: Improvement > Components: Build >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Fix For: 3.0.x, 3.11.x > > > When running upgrade tests in CI it's not always possible to reach the public > mirrors. Currently we have properties for configuring private mirrors in 4.0+ > but we don't have this for 3.x. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19496) Add properties for redirecting build-resolve to mirrors
[ https://issues.apache.org/jira/browse/CASSANDRA-19496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19496: --- Test and Documentation Plan: Run 4.0 tests showing upgrade tests work. Don't have a way to run the full 3.x tests, but run what we can in CircleCI. (was: Run 4.0 tests. Don't have a way to run the full 3.x tests, but run what we can in CircleCI.) > Add properties for redirecting build-resolve to mirrors > --- > > Key: CASSANDRA-19496 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19496 > Project: Cassandra > Issue Type: Improvement > Components: Build >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Fix For: 3.0.x, 3.11.x > > > When running upgrade tests in CI it's not always possible to reach the public > mirrors. Currently we have properties for configuring private mirrors in 4.0+ > but we don't have this for 3.x. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19496) Add properties for redirecting build-resolve to mirrors
[ https://issues.apache.org/jira/browse/CASSANDRA-19496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19496: --- Test and Documentation Plan: Run 4.0 tests. Don't have a way to run the full 3.x tests, but run what we can in CircleCI. Status: Patch Available (was: Open) > Add properties for redirecting build-resolve to mirrors > --- > > Key: CASSANDRA-19496 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19496 > Project: Cassandra > Issue Type: Improvement > Components: Build >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Fix For: 3.0.x, 3.11.x > > > When running upgrade tests in CI it's not always possible to reach the public > mirrors. Currently we have properties for configuring private mirrors in 4.0+ > but we don't have this for 3.x. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19496) Add properties for redirecting build-resolve to mirrors
[ https://issues.apache.org/jira/browse/CASSANDRA-19496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19496: --- Change Category: Quality Assurance Complexity: Normal Status: Open (was: Triage Needed) > Add properties for redirecting build-resolve to mirrors > --- > > Key: CASSANDRA-19496 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19496 > Project: Cassandra > Issue Type: Improvement > Components: Build >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Fix For: 3.0.x, 3.11.x > > > When running upgrade tests in CI it's not always possible to reach the public > mirrors. Currently we have properties for configuring private mirrors in 4.0+ > but we don't have this for 3.x. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-19496) Add properties for redirecting build-resolve to mirrors
Ariel Weisberg created CASSANDRA-19496:
----------------------------------------

    Summary: Add properties for redirecting build-resolve to mirrors
    Key: CASSANDRA-19496
    URL: https://issues.apache.org/jira/browse/CASSANDRA-19496
    Project: Cassandra
    Issue Type: Improvement
    Components: Build
    Reporter: Ariel Weisberg
    Assignee: Ariel Weisberg

When running upgrade tests in CI it's not always possible to reach the public mirrors. Currently we have properties for configuring private mirrors in 4.0+ but we don't have this for 3.x.
[jira] [Updated] (CASSANDRA-19496) Add properties for redirecting build-resolve to mirrors
[ https://issues.apache.org/jira/browse/CASSANDRA-19496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ariel Weisberg updated CASSANDRA-19496:
---------------------------------------
    Fix Version/s: 3.0.x
                   3.11.x
[jira] [Commented] (CASSANDRA-19332) Dropwizard Meter causes timeouts when infrequently used
[ https://issues.apache.org/jira/browse/CASSANDRA-19332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830652#comment-17830652 ]

Ariel Weisberg commented on CASSANDRA-19332:
---------------------------------------------

I think I figured out what happened. The latest test repo version was tested, but my branch was behind trunk so it didn't have the change to nodetool.

> Dropwizard Meter causes timeouts when infrequently used
> -------------------------------------------------------
>
> Key: CASSANDRA-19332
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19332
> Project: Cassandra
> Issue Type: Bug
> Components: Observability/Metrics
> Reporter: Ariel Weisberg
> Assignee: Ariel Weisberg
> Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0, 5.1
> Attachments: ci_summary_4.1.html, ci_summary_5.0.html,
> ci_summary_trunk.html, result_details_4.1.tar.gz, result_details_5.0.tar.gz,
> result_details_trunk.tar.gz
>
> Observed instances of timeouts on clusters with long uptime and infrequently
> used tables, and possibly infrequently used request paths, such as not using
> CAS for large fractions of a year.
> CAS seems to be more severely impacted because it has more metrics in the
> request path such as latency measurements for prepare, propose, and the read
> from the underlying table.
> Tracing showed ~600-800 milliseconds for these operations in between the
> “appending to memtable” and “sending a response” events. Reads had a delay
> between finishing the construction of the iterator and sending the read
> response.
> Stack traces dumped every 100 milliseconds using {{sjk}} show that in
> prepare and propose a lot of time was being spent in
> {{Meter.tickIfNecessary}}.
> {code:java}
> Thread [2537] RUNNABLE at 2024-01-25T21:14:48.218 - MutationStage-2
> com.codahale.metrics.Meter.tickIfNecessary(Meter.java:71)
> com.codahale.metrics.Meter.mark(Meter.java:55)
> com.codahale.metrics.Meter.mark(Meter.java:46)
> com.codahale.metrics.Timer.update(Timer.java:150)
> com.codahale.metrics.Timer.update(Timer.java:86)
> org.apache.cassandra.metrics.LatencyMetrics.addNano(LatencyMetrics.java:159)
> org.apache.cassandra.service.paxos.PaxosState.prepare(PaxosState.java:92)
> Thread [2539] RUNNABLE at 2024-01-25T21:14:48.520 - MutationStage-4
> com.codahale.metrics.Meter.tickIfNecessary(Meter.java:72)
> com.codahale.metrics.Meter.mark(Meter.java:55)
> com.codahale.metrics.Meter.mark(Meter.java:46)
> com.codahale.metrics.Timer.update(Timer.java:150)
> com.codahale.metrics.Timer.update(Timer.java:86)
> org.apache.cassandra.metrics.LatencyMetrics.addNano(LatencyMetrics.java:159)
> org.apache.cassandra.service.paxos.PaxosState.propose(PaxosState.java:127)
> {code}
> {{tickIfNecessary}} does a linear amount of work proportional to the time
> since the last time the metric was updated/read/created and this can actually
> take a measurable amount of time even in a tight loop. On my M2 MBP it was
> 1.5 milliseconds for a day, ~200 days took ~74 milliseconds. Before it warmed
> up it was 140 milliseconds.
> A quick fix is to schedule a task to read all the meters once a day so it
> isn’t done in the request path and we have a more incremental amount to
> process at a time.
> Also observed that {{tickIfNecessary}} is not 100% thread safe in that if it
> takes longer than 5 seconds to run the loop it can end up with multiple
> threads attempting to run the loop at once and then they will concurrently
> run {{EWMA.tick}} which probably results in some ticks not being performed.
> This issue is still present in the latest version of {{Metrics}} if using
> {{EWMA}}, but {{SlidingWindowTimeAverages}} looks like it has a bounded
> amount of work required to tick. Switching would change how our metrics work
> since the two don't have the same behavior.
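To make the cost described in the issue above concrete, here is a minimal sketch of a meter whose catch-up work grows linearly with idle time, together with a background sweep that keeps that work off the request path. This is a toy model under stated assumptions, not the Dropwizard implementation: the class {{IdleMeterSketch}}, the {{DECAY}} constant and {{scheduleDailySweep}} are hypothetical names introduced for illustration. Only the 5 second tick interval and the once-a-day schedule are taken from the description.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Toy model only: NOT com.codahale.metrics.Meter. All names here are hypothetical.
final class IdleMeterSketch {
    // Tick interval of 5 seconds, as implied by the issue description.
    static final long TICK_INTERVAL_NANOS = TimeUnit.SECONDS.toNanos(5);
    // Arbitrary per-tick decay factor, chosen only to give the loop some work.
    static final double DECAY = 0.92;

    private final AtomicLong lastTick;
    private volatile double rate = 1.0;

    IdleMeterSketch(long nowNanos) {
        this.lastTick = new AtomicLong(nowNanos);
    }

    // Performs one loop iteration per elapsed interval and returns how many ran.
    long tickIfNecessary(long nowNanos) {
        long oldTick = lastTick.get();
        long age = nowNanos - oldTick;
        if (age < TICK_INTERVAL_NANOS) {
            return 0;
        }
        long newTick = nowNanos - age % TICK_INTERVAL_NANOS;
        if (!lastTick.compareAndSet(oldTick, newTick)) {
            return 0; // another thread claimed this catch-up
        }
        long requiredTicks = age / TICK_INTERVAL_NANOS;
        double r = rate;
        for (long i = 0; i < requiredTicks; i++) {
            r *= DECAY; // stand-in for the per-interval average update
        }
        rate = r;
        return requiredTicks;
    }

    // Hypothetical sweep: touch every meter once a day on a background thread.
    static ScheduledExecutorService scheduleDailySweep(List<IdleMeterSketch> meters) {
        ScheduledExecutorService sweeper = Executors.newSingleThreadScheduledExecutor(task -> {
            Thread t = new Thread(task, "meter-sweep");
            t.setDaemon(true);
            return t;
        });
        sweeper.scheduleAtFixedRate(
            () -> meters.forEach(m -> m.tickIfNecessary(System.nanoTime())),
            1, 1, TimeUnit.DAYS);
        return sweeper;
    }

    public static void main(String[] args) {
        IdleMeterSketch meter = new IdleMeterSketch(0L);
        // Number of loop iterations owed after 200 days without a mark.
        System.out.println(meter.tickIfNecessary(TimeUnit.DAYS.toNanos(200)));
    }
}
```

With a 5 second interval, one idle day is 17,280 loop iterations and 200 idle days is 3,456,000, which is why the first request after a long quiet period pays a visible cost. With the sweep in place, no caller should owe much more than a day of ticks.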
[jira] [Commented] (CASSANDRA-19332) Dropwizard Meter causes timeouts when infrequently used
[ https://issues.apache.org/jira/browse/CASSANDRA-19332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830649#comment-17830649 ]

Ariel Weisberg commented on CASSANDRA-19332:
---------------------------------------------

So which build are you referring to? When I look at the trunk build that generated that error, the log has:

Cloned https://github.com/apache/cassandra-dtest.git trunk to /workspace/context/cassandra-dtest; commit is 7a82b3757c136f79b52a76fdf3e98891dfff6b41

and that SHA comes after f3ca59c [https://github.com/apache/cassandra-dtest/commits/trunk/]

I'll look a little deeper and try to figure out how the executors ended up running with the wrong commit on the dtest repo, because clearly it happened.
[jira] [Commented] (CASSANDRA-19332) Dropwizard Meter causes timeouts when infrequently used
[ https://issues.apache.org/jira/browse/CASSANDRA-19332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830606#comment-17830606 ]

Ariel Weisberg commented on CASSANDRA-19332:
---------------------------------------------

Sounds good, I will wait for that.
[jira] [Commented] (CASSANDRA-19332) Dropwizard Meter causes timeouts when infrequently used
[ https://issues.apache.org/jira/browse/CASSANDRA-19332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830574#comment-17830574 ]

Ariel Weisberg commented on CASSANDRA-19332:
---------------------------------------------

Yeah, the Dropwizard fix makes this unnecessary by fixing the cost of marking a meter that hasn't been used in a long time. We can remove this once that is fixed. Since we don't update dependencies in older versions, we will need this in 4.0/4.1, and in 5.0 if it releases before Dropwizard does.
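The thread does not say how the upstream fix bounds the cost of marking a long-idle meter, so the following is only an illustration of why a bound is possible, not a description of the Dropwizard change. It assumes the standard exponentially weighted moving average update, where a tick with no new events moves the rate toward zero by a factor alpha. Under that assumption, n idle ticks collapse into a single multiplication by (1 - alpha)^n. The class and method names are hypothetical, and the sketch ignores the first catch-up tick, which also has to fold in any events recorded before the idle period.

```java
// Illustrative arithmetic only; names are hypothetical and this is not Dropwizard code.
final class IdleDecaySketch {
    // Smoothing factor of a one-minute average sampled every 5 seconds.
    static final double ALPHA = 1.0 - Math.exp(-5.0 / 60.0);

    // One tick with no new events: the average moves toward zero by ALPHA.
    static double idleTick(double rate) {
        return rate + ALPHA * (0.0 - rate);
    }

    // Linear-time catch-up: one update per missed interval.
    static double decayByLoop(double rate, long idleTicks) {
        for (long i = 0; i < idleTicks; i++) {
            rate = idleTick(rate);
        }
        return rate;
    }

    // Constant-time equivalent: rate * (1 - ALPHA)^idleTicks.
    static double decayClosedForm(double rate, long idleTicks) {
        return rate * Math.pow(1.0 - ALPHA, idleTicks);
    }

    public static void main(String[] args) {
        // Both lines should print nearly identical values (up to floating-point rounding).
        System.out.println(decayByLoop(100.0, 120));
        System.out.println(decayClosedForm(100.0, 120));
    }
}
```

Because the decayed rate falls below any useful precision after a modest number of idle ticks, an implementation could also cap the loop at that number and treat the rate as zero beyond it.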
[jira] [Commented] (CASSANDRA-19332) Dropwizard Meter causes timeouts when infrequently used
[ https://issues.apache.org/jira/browse/CASSANDRA-19332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830549#comment-17830549 ]

Ariel Weisberg commented on CASSANDRA-19332:
---------------------------------------------

I haven't merged this for a while because I have had a steady stream of CI issues. Based on comparisons with nightlies, it doesn't look like the test failures are related. 5.0 is basically clean; 4.0/4.1 have a bunch of upgrade test issues, and so does trunk. I think this is more specific to our CI environment and some issues invoking different upgrade paths.

I'm going to go ahead and merge anyways unless someone objects, since the scope of this change is pretty narrow and doesn't look related to any of the failures.
[jira] [Updated] (CASSANDRA-19332) Dropwizard Meter causes timeouts when infrequently used
[ https://issues.apache.org/jira/browse/CASSANDRA-19332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ariel Weisberg updated CASSANDRA-19332:
---------------------------------------
    Fix Version/s: 5.0
                   5.1
[jira] [Updated] (CASSANDRA-19332) Dropwizard Meter causes timeouts when infrequently used
[ https://issues.apache.org/jira/browse/CASSANDRA-19332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ariel Weisberg updated CASSANDRA-19332:
---------------------------------------
    Attachment: ci_summary_4.1.html
                result_details_4.1.tar.gz
[jira] [Updated] (CASSANDRA-19332) Dropwizard Meter causes timeouts when infrequently used
[ https://issues.apache.org/jira/browse/CASSANDRA-19332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19332: --- Attachment: ci_summary_5.0.html result_details_5.0.tar.gz > Dropwizard Meter causes timeouts when infrequently used > --- > > Key: CASSANDRA-19332 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19332 > Project: Cassandra > Issue Type: Bug > Components: Observability/Metrics >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: ci_summary_5.0.html, ci_summary_trunk.html, > result_details_5.0.tar.gz, result_details_trunk.tar.gz > > > Observed instances of timeouts on clusters with long uptime and infrequently > used tables and possibly just request paths such as not using CAS for large > fractions of a year. > CAS seems to be more severely impacted because it has more metrics in the > request path such as latency measurements for prepare, propose, and the read > from the underlying table. > Tracing showed ~600-800 milliseconds for these operations in between the > “appending to memtable” and “sending a response” events. Reads had a delay > between finishing the construction of the iterator and sending the read > response. > Stack traces dumped every 100 milliseconds using {{sjk}} shows that in > prepare and propose a lot of time was being spent in > {{{}Meter.tickIfNecessary{}}}. 
> {code:java} > Thread [2537] RUNNABLE at 2024-01-25T21:14:48.218 - MutationStage-2 > com.codahale.metrics.Meter.tickIfNecessary(Meter.java:71) > com.codahale.metrics.Meter.mark(Meter.java:55) > com.codahale.metrics.Meter.mark(Meter.java:46) > com.codahale.metrics.Timer.update(Timer.java:150) > com.codahale.metrics.Timer.update(Timer.java:86) > org.apache.cassandra.metrics.LatencyMetrics.addNano(LatencyMetrics.java:159) > org.apache.cassandra.service.paxos.PaxosState.prepare(PaxosState.java:92) > Thread [2539] RUNNABLE at 2024-01-25T21:14:48.520 - MutationStage-4 > com.codahale.metrics.Meter.tickIfNecessary(Meter.java:72) > com.codahale.metrics.Meter.mark(Meter.java:55) > com.codahale.metrics.Meter.mark(Meter.java:46) > com.codahale.metrics.Timer.update(Timer.java:150) > com.codahale.metrics.Timer.update(Timer.java:86) > org.apache.cassandra.metrics.LatencyMetrics.addNano(LatencyMetrics.java:159) > org.apache.cassandra.service.paxos.PaxosState.propose(PaxosState.java:127){code} > {{tickIfNecessary}} does a linear amount of work proportional to the time > since the last time the metric was updated/read/created and this can actually > take a measurable amount of time even in a tight loop. On my M2 MBP it was > 1.5 milliseconds for a day, ~200 days took ~74 milliseconds. Before it warmed > up it was 140 milliseconds. > A quick fix is to schedule a task to read all the meters once a day so it > isn’t done in the request path and we have a more incremental amount to > process at a time. > Also observed that {{tickIfNecessary}} is not 100% thread safe in that if it > takes longer than 5 seconds to run the loop it can end up with multiple > threads attempting to run the loop at once and then they will concurrently > run {{EWMA.tick}} which probably results in some ticks not being performed. 
> This issue is still present in the latest version of {{Metrics}} if using {{EWMA}}, but {{SlidingWindowTimeAverages}} looks like it has a bounded amount of work required to tick. Switching would change how our metrics work since the two don't have the same behavior. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
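The cost described in this issue can be illustrated with a small model. The sketch below is not Dropwizard's code: {{IdleMeterSketch}} and its methods are hypothetical names, and the 5-second tick interval is an assumption taken from the "5 seconds" mentioned in the description. It only counts how many catch-up iterations a single call would owe after an idle period, and shows how a periodic read bounds that number.

```java
import java.util.concurrent.TimeUnit;

/** Simplified model of a meter that catches up on missed ticks; hypothetical, not Dropwizard's code. */
final class IdleMeterSketch {
    // Assumed tick interval, matching the 5 seconds mentioned in the issue description.
    static final long TICK_INTERVAL_NANOS = TimeUnit.SECONDS.toNanos(5);

    private long lastTickNanos;
    private long ticksPerformed;

    IdleMeterSketch(long nowNanos) {
        this.lastTickNanos = nowNanos;
    }

    /** Catches up on every missed interval and returns how many loop iterations this call performed. */
    long tickIfNecessary(long nowNanos) {
        long age = nowNanos - lastTickNanos;
        if (age <= TICK_INTERVAL_NANOS) {
            return 0;
        }
        long required = age / TICK_INTERVAL_NANOS;
        lastTickNanos = nowNanos - age % TICK_INTERVAL_NANOS;
        for (long i = 0; i < required; i++) {
            ticksPerformed++; // stand-in for the per-tick moving-average updates
        }
        return required;
    }

    /** Largest single-call cost if the meter is read every readIntervalNanos over idleNanos. */
    static long worstCallWithPeriodicReads(long idleNanos, long readIntervalNanos) {
        IdleMeterSketch meter = new IdleMeterSketch(0L);
        long worst = 0;
        for (long now = readIntervalNanos; now <= idleNanos; now += readIntervalNanos) {
            worst = Math.max(worst, meter.tickIfNecessary(now));
        }
        return worst;
    }

    public static void main(String[] args) {
        long day = TimeUnit.DAYS.toNanos(1);
        System.out.println("ticks owed after 1 idle day:    " + new IdleMeterSketch(0L).tickIfNecessary(day));
        System.out.println("ticks owed after 200 idle days: " + new IdleMeterSketch(0L).tickIfNecessary(200 * day));
        System.out.println("worst call with daily reads:    " + worstCallWithPeriodicReads(200 * day, day));
    }
}
```

Under these assumptions a meter left idle for a day owes 86,400 / 5 = 17,280 iterations on its next use, and one left idle for 200 days owes 3,456,000, all paid by whichever request touches it first. Reading it once a day caps any single call at one day's worth, which is the idea behind the quick fix proposed above.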
[jira] [Updated] (CASSANDRA-19332) Dropwizard Meter causes timeouts when infrequently used
[ https://issues.apache.org/jira/browse/CASSANDRA-19332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19332: --- Attachment: ci_summary_trunk.html result_details_trunk.tar.gz
[jira] [Updated] (CASSANDRA-19484) Add support for providing nvdDatafeedUrl to OWASP
[ https://issues.apache.org/jira/browse/CASSANDRA-19484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19484: --- Fix Version/s: 3.0.30 3.11.17 4.0.13 4.1.5 5.0 5.1 (was: 3.0.x) (was: 3.11.x) (was: 5.x) (was: 4.0.x) (was: 4.1.x) (was: 5.0.x) > Add support for providing nvdDatafeedUrl to OWASP > - > > Key: CASSANDRA-19484 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19484 > Project: Cassandra > Issue Type: Improvement > Components: Build > Reporter: Ariel Weisberg > Assignee: Ariel Weisberg > Priority: Normal > Fix For: 3.0.30, 3.11.17, 4.0.13, 4.1.5, 5.0, 5.1 > > > This allows you to point to a mirror that is faster and doesn’t require an API key. > This is kind of painful to make work in {{ant}}: you can't specify the property at all if you want to use the API, and I couldn't find a way to get {{ant}} to supply the property conditionally without having a dedicated invocation of the {{dependency-check}} task with and without the parameter {{nvdDataFeedUrl}} specified.
[jira] [Updated] (CASSANDRA-19484) Add support for providing nvdDatafeedUrl to OWASP
[ https://issues.apache.org/jira/browse/CASSANDRA-19484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19484: --- Resolution: Fixed Status: Resolved (was: Ready to Commit)
[jira] [Updated] (CASSANDRA-19484) Add support for providing nvdDatafeedUrl to OWASP
[ https://issues.apache.org/jira/browse/CASSANDRA-19484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19484: --- Status: Review In Progress (was: Patch Available)
[jira] [Updated] (CASSANDRA-19484) Add support for providing nvdDatafeedUrl to OWASP
[ https://issues.apache.org/jira/browse/CASSANDRA-19484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19484: --- Status: Ready to Commit (was: Review In Progress) It's clean on all the branches.
[jira] [Updated] (CASSANDRA-19484) Add support for providing nvdDatafeedUrl to OWASP
[ https://issues.apache.org/jira/browse/CASSANDRA-19484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19484: --- Source Control Link: https://github.com/apache/cassandra/commit/c8fbb97ab04142f9b49fe86017b808ff3e35c10a
[jira] [Updated] (CASSANDRA-19484) Add support for providing nvdDatafeedUrl to OWASP
[ https://issues.apache.org/jira/browse/CASSANDRA-19484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19484: --- Reviewers: Berenguer Blasi (was: Berenguer Blasi, Joshua McKenzie)
[jira] [Comment Edited] (CASSANDRA-19484) Add support for providing nvdDatafeedUrl to OWASP
[ https://issues.apache.org/jira/browse/CASSANDRA-19484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829629#comment-17829629 ] Ariel Weisberg edited comment on CASSANDRA-19484 at 3/21/24 5:43 PM: - *edit* Removed a bunch of incorrectly generated dependencies with CVEs to shorten the comment thread. was (Author: aweisberg): 3.0 {noformat} cassandra-client-4.0.35.jar/META-INF/maven/com.apple.pie.cassandra/pie-cassandra-driver-core/pom.xml: CVE-2010-0538 cassandra-client-4.0.35.jar/META-INF/maven/com.apple.pie.cassandra/pie-cassandra-driver-mapping/pom.xml: CVE-2010-0538 jackson-databind-2.13.2.2.jar: CVE-2023-35116, CVE-2022-42003, CVE-2022-42004 snappy-java-1.1.8.4.jar: CVE-2023-34455, CVE-2023-34454, CVE-2023-34453, CVE-2023-43642 {noformat} 3.11 {noformat} cassandra-client-4.0.35.jar/META-INF/maven/com.apple.pie.cassandra/pie-cassandra-driver-core/pom.xml: CVE-2010-0538 cassandra-client-4.0.35.jar/META-INF/maven/com.apple.pie.cassandra/pie-cassandra-driver-mapping/pom.xml: CVE-2010-0538 jackson-mapper-asl-1.9.2.jar: CVE-2017-7525, CVE-2019-10172 snakeyaml-1.11.jar: CVE-2017-18640 snappy-java-1.1.8.4.jar: CVE-2023-34455, CVE-2023-34454, CVE-2023-34453, CVE-2023-43642 {noformat} 4.0 {noformat} cassandra-client-4.0.35.jar/META-INF/maven/com.apple.pie.cassandra/pie-cassandra-driver-core/pom.xml: CVE-2010-0538 cassandra-client-4.0.35.jar/META-INF/maven/com.apple.pie.cassandra/pie-cassandra-driver-mapping/pom.xml: CVE-2010-0538 guava-18.0.jar: CVE-2018-10237 jackson-mapper-asl-1.9.2.jar: CVE-2017-7525, CVE-2019-10172 libthrift-0.9.2.jar: CVE-2016-5397, CVE-2018-1320, CVE-2015-3254, CVE-2018-11798, CVE-2019-0205 netty-all-4.0.44.Final.jar: CVE-2019-16869, CVE-2019-20445, CVE-2019-20444, CVE-2020-7238 snakeyaml-1.11.jar: CVE-2017-18640 snappy-java-1.1.8.4.jar: CVE-2023-34455, CVE-2023-34454, CVE-2023-34453, CVE-2023-43642 thrift-server-0.3.7.jar: CVE-2016-5397, CVE-2015-3254, CVE-2019-0205 {noformat} 4.1 {noformat} 
cassandra-client-4.0.35.jar/META-INF/maven/com.apple.pie.cassandra/pie-cassandra-driver-core/pom.xml: CVE-2010-0538 cassandra-client-4.0.35.jar/META-INF/maven/com.apple.pie.cassandra/pie-cassandra-driver-mapping/pom.xml: CVE-2010-0538 guava-18.0.jar: CVE-2018-10237 jackson-mapper-asl-1.9.2.jar: CVE-2017-7525, CVE-2019-10172 libthrift-0.9.2.jar: CVE-2016-5397, CVE-2018-1320, CVE-2015-3254, CVE-2018-11798, CVE-2019-0205 netty-all-4.0.44.Final.jar: CVE-2019-16869, CVE-2019-20445, CVE-2019-20444, CVE-2020-7238 snakeyaml-1.11.jar: CVE-2017-18640 snappy-java-1.1.8.4.jar: CVE-2023-34455, CVE-2023-34454, CVE-2023-34453, CVE-2023-43642 thrift-server-0.3.7.jar: CVE-2016-5397, CVE-2015-3254, CVE-2019-0205 {noformat} 5.0 {noformat} guava-18.0.jar: CVE-2020-8908, CVE-2018-10237, CVE-2023-2976 guava-27.0-jre.jar: CVE-2020-8908, CVE-2023-2976 jackson-mapper-asl-1.9.2.jar: CVE-2017-7525, CVE-2019-10172 libthrift-0.9.2.jar: CVE-2016-5397, CVE-2018-1320, CVE-2015-3254, CVE-2018-11798, CVE-2019-0205 netty-all-4.0.44.Final.jar: CVE-2021-43797, CVE-2019-16869, CVE-2021-37136, CVE-2021-37137, CVE-2019-20445, CVE-2019-20444, CVE-2021-21295, CVE-2023-34462, CVE-2021-21290, CVE-2022-24823, CVE-2022-41881, CVE-2021-21409, CVE-2020-7238 netty-all-4.1.58.Final.jar: CVE-2021-43797, CVE-2021-37136, CVE-2021-37137, CVE-2022-24823, CVE-2022-41881, CVE-2021-21295, CVE-2021-21409, CVE-2023-34462, CVE-2021-21290 snakeyaml-1.11.jar: CVE-2017-18640 snappy-java-1.1.8.4.jar: CVE-2023-34455, CVE-2023-34454, CVE-2023-34453, CVE-2023-43642 thrift-server-0.3.7.jar: CVE-2016-5397, CVE-2015-3254, CVE-2019-0205 {noformat} trunk {noformat} cassandra-client-4.0.35.jar/META-INF/maven/com.apple.pie.cassandra/pie-cassandra-driver-core/pom.xml: CVE-2010-0538 cassandra-client-4.0.35.jar/META-INF/maven/com.apple.pie.cassandra/pie-cassandra-driver-mapping/pom.xml: CVE-2010-0538 guava-18.0.jar: CVE-2020-8908, CVE-2018-10237, CVE-2023-2976 guava-27.0-jre.jar: CVE-2020-8908, CVE-2023-2976 jackson-databind-2.13.2.2.jar: 
CVE-2022-42003, CVE-2022-42004 jackson-mapper-asl-1.9.2.jar: CVE-2017-7525, CVE-2019-10172 libthrift-0.9.2.jar: CVE-2016-5397, CVE-2018-1320, CVE-2015-3254, CVE-2018-11798, CVE-2019-0205 netty-all-4.0.44.Final.jar: CVE-2021-43797, CVE-2019-16869, CVE-2021-37136, CVE-2021-37137, CVE-2019-20445, CVE-2019-20444, CVE-2021-21295, CVE-2023-34462, CVE-2021-21290, CVE-2022-24823, CVE-2022-41881, CVE-2021-21409, CVE-2020-7238 netty-all-4.1.58.Final.jar: CVE-2021-43797, CVE-2021-37136, CVE-2021-37137, CVE-2022-24823, CVE-2022-41881, CVE-2021-21295, CVE-2021-21409, CVE-2023-34462, CVE-2021-21290 snakeyaml-1.11.jar: CVE-2017-18640, CVE-2022-38752, CVE-2022-38751, CVE-2022-38750, CVE-2022-41854, CVE-2022-25857, CVE-2022-38749, CVE-2022-1471 snakeyaml-1.26.jar: CVE-2022-38752, CVE-2022-38751, CVE-2022-38750, CVE-2022-41854, CVE-2022-25857, CVE-2022-38749, CVE-2022-1471 snappy-java-1.1.8.4.jar: CVE-2023-34455,
[jira] [Commented] (CASSANDRA-19484) Add support for providing nvdDatafeedUrl to OWASP
[ https://issues.apache.org/jira/browse/CASSANDRA-19484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829631#comment-17829631 ] Ariel Weisberg commented on CASSANDRA-19484: PEBKAC, I didn't know you need a clean build before running the dependency check.
[jira] [Commented] (CASSANDRA-19484) Add support for providing nvdDatafeedUrl to OWASP
[ https://issues.apache.org/jira/browse/CASSANDRA-19484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829629#comment-17829629 ] Ariel Weisberg commented on CASSANDRA-19484: 3.0 {noformat} cassandra-client-4.0.35.jar/META-INF/maven/com.apple.pie.cassandra/pie-cassandra-driver-core/pom.xml: CVE-2010-0538 cassandra-client-4.0.35.jar/META-INF/maven/com.apple.pie.cassandra/pie-cassandra-driver-mapping/pom.xml: CVE-2010-0538 jackson-databind-2.13.2.2.jar: CVE-2023-35116, CVE-2022-42003, CVE-2022-42004 snappy-java-1.1.8.4.jar: CVE-2023-34455, CVE-2023-34454, CVE-2023-34453, CVE-2023-43642 {noformat} 3.11 {noformat} cassandra-client-4.0.35.jar/META-INF/maven/com.apple.pie.cassandra/pie-cassandra-driver-core/pom.xml: CVE-2010-0538 cassandra-client-4.0.35.jar/META-INF/maven/com.apple.pie.cassandra/pie-cassandra-driver-mapping/pom.xml: CVE-2010-0538 jackson-mapper-asl-1.9.2.jar: CVE-2017-7525, CVE-2019-10172 snakeyaml-1.11.jar: CVE-2017-18640 snappy-java-1.1.8.4.jar: CVE-2023-34455, CVE-2023-34454, CVE-2023-34453, CVE-2023-43642 {noformat} 4.0 {noformat} cassandra-client-4.0.35.jar/META-INF/maven/com.apple.pie.cassandra/pie-cassandra-driver-core/pom.xml: CVE-2010-0538 cassandra-client-4.0.35.jar/META-INF/maven/com.apple.pie.cassandra/pie-cassandra-driver-mapping/pom.xml: CVE-2010-0538 guava-18.0.jar: CVE-2018-10237 jackson-mapper-asl-1.9.2.jar: CVE-2017-7525, CVE-2019-10172 libthrift-0.9.2.jar: CVE-2016-5397, CVE-2018-1320, CVE-2015-3254, CVE-2018-11798, CVE-2019-0205 netty-all-4.0.44.Final.jar: CVE-2019-16869, CVE-2019-20445, CVE-2019-20444, CVE-2020-7238 snakeyaml-1.11.jar: CVE-2017-18640 snappy-java-1.1.8.4.jar: CVE-2023-34455, CVE-2023-34454, CVE-2023-34453, CVE-2023-43642 thrift-server-0.3.7.jar: CVE-2016-5397, CVE-2015-3254, CVE-2019-0205 {noformat} 4.1 {noformat} cassandra-client-4.0.35.jar/META-INF/maven/com.apple.pie.cassandra/pie-cassandra-driver-core/pom.xml: CVE-2010-0538 
cassandra-client-4.0.35.jar/META-INF/maven/com.apple.pie.cassandra/pie-cassandra-driver-mapping/pom.xml: CVE-2010-0538 guava-18.0.jar: CVE-2018-10237 jackson-mapper-asl-1.9.2.jar: CVE-2017-7525, CVE-2019-10172 libthrift-0.9.2.jar: CVE-2016-5397, CVE-2018-1320, CVE-2015-3254, CVE-2018-11798, CVE-2019-0205 netty-all-4.0.44.Final.jar: CVE-2019-16869, CVE-2019-20445, CVE-2019-20444, CVE-2020-7238 snakeyaml-1.11.jar: CVE-2017-18640 snappy-java-1.1.8.4.jar: CVE-2023-34455, CVE-2023-34454, CVE-2023-34453, CVE-2023-43642 thrift-server-0.3.7.jar: CVE-2016-5397, CVE-2015-3254, CVE-2019-0205 {noformat} 5.0 {noformat} guava-18.0.jar: CVE-2020-8908, CVE-2018-10237, CVE-2023-2976 guava-27.0-jre.jar: CVE-2020-8908, CVE-2023-2976 jackson-mapper-asl-1.9.2.jar: CVE-2017-7525, CVE-2019-10172 libthrift-0.9.2.jar: CVE-2016-5397, CVE-2018-1320, CVE-2015-3254, CVE-2018-11798, CVE-2019-0205 netty-all-4.0.44.Final.jar: CVE-2021-43797, CVE-2019-16869, CVE-2021-37136, CVE-2021-37137, CVE-2019-20445, CVE-2019-20444, CVE-2021-21295, CVE-2023-34462, CVE-2021-21290, CVE-2022-24823, CVE-2022-41881, CVE-2021-21409, CVE-2020-7238 netty-all-4.1.58.Final.jar: CVE-2021-43797, CVE-2021-37136, CVE-2021-37137, CVE-2022-24823, CVE-2022-41881, CVE-2021-21295, CVE-2021-21409, CVE-2023-34462, CVE-2021-21290 snakeyaml-1.11.jar: CVE-2017-18640 snappy-java-1.1.8.4.jar: CVE-2023-34455, CVE-2023-34454, CVE-2023-34453, CVE-2023-43642 thrift-server-0.3.7.jar: CVE-2016-5397, CVE-2015-3254, CVE-2019-0205 {noformat} trunk {noformat} cassandra-client-4.0.35.jar/META-INF/maven/com.apple.pie.cassandra/pie-cassandra-driver-core/pom.xml: CVE-2010-0538 cassandra-client-4.0.35.jar/META-INF/maven/com.apple.pie.cassandra/pie-cassandra-driver-mapping/pom.xml: CVE-2010-0538 guava-18.0.jar: CVE-2020-8908, CVE-2018-10237, CVE-2023-2976 guava-27.0-jre.jar: CVE-2020-8908, CVE-2023-2976 jackson-databind-2.13.2.2.jar: CVE-2022-42003, CVE-2022-42004 jackson-mapper-asl-1.9.2.jar: CVE-2017-7525, CVE-2019-10172 libthrift-0.9.2.jar: 
CVE-2016-5397, CVE-2018-1320, CVE-2015-3254, CVE-2018-11798, CVE-2019-0205 netty-all-4.0.44.Final.jar: CVE-2021-43797, CVE-2019-16869, CVE-2021-37136, CVE-2021-37137, CVE-2019-20445, CVE-2019-20444, CVE-2021-21295, CVE-2023-34462, CVE-2021-21290, CVE-2022-24823, CVE-2022-41881, CVE-2021-21409, CVE-2020-7238 netty-all-4.1.58.Final.jar: CVE-2021-43797, CVE-2021-37136, CVE-2021-37137, CVE-2022-24823, CVE-2022-41881, CVE-2021-21295, CVE-2021-21409, CVE-2023-34462, CVE-2021-21290 snakeyaml-1.11.jar: CVE-2017-18640, CVE-2022-38752, CVE-2022-38751, CVE-2022-38750, CVE-2022-41854, CVE-2022-25857, CVE-2022-38749, CVE-2022-1471 snakeyaml-1.26.jar: CVE-2022-38752, CVE-2022-38751, CVE-2022-38750, CVE-2022-41854, CVE-2022-25857, CVE-2022-38749, CVE-2022-1471 snappy-java-1.1.8.4.jar: CVE-2023-34455, CVE-2023-34454, CVE-2023-34453, CVE-2023-43642 thrift-server-0.3.7.jar: CVE-2016-5397, CVE-2015-3254, CVE-2019-0205 {noformat} > Add support for providing nvdDatafeedUrl to OWASP >
[jira] [Commented] (CASSANDRA-19484) Add support for providing nvdDatafeedUrl to OWASP
[ https://issues.apache.org/jira/browse/CASSANDRA-19484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829617#comment-17829617 ] Ariel Weisberg commented on CASSANDRA-19484: [~bereng] finished updating. If you are still +1 on the new version I will merge. I noticed there are a lot of unsuppressed CVEs. {code:java} guava-18.0.jar: CVE-2020-8908, CVE-2018-10237, CVE-2023-2976 guava-27.0-jre.jar: CVE-2020-8908, CVE-2023-2976 jackson-databind-2.13.2.2.jar: CVE-2022-42003, CVE-2022-42004 jackson-mapper-asl-1.9.2.jar: CVE-2017-7525, CVE-2019-10172 libthrift-0.9.2.jar: CVE-2016-5397, CVE-2018-1320, CVE-2015-3254, CVE-2018-11798, CVE-2019-0205 netty-all-4.0.44.Final.jar: CVE-2021-43797, CVE-2019-16869, CVE-2021-37136, CVE-2021-37137, CVE-2019-20445, CVE-2019-20444, CVE-2021-21295, CVE-2023-34462, CVE-2021-21290, CVE-2022-24823, CVE-2022-41881, CVE-2021-21409, CVE-2020-7238 netty-all-4.1.58.Final.jar: CVE-2021-43797, CVE-2021-37136, CVE-2021-37137, CVE-2022-24823, CVE-2022-41881, CVE-2021-21295, CVE-2021-21409, CVE-2023-34462, CVE-2021-21290 snakeyaml-1.11.jar: CVE-2017-18640, CVE-2022-38752, CVE-2022-38751, CVE-2022-38750, CVE-2022-41854, CVE-2022-25857, CVE-2022-38749, CVE-2022-1471 snakeyaml-1.26.jar: CVE-2022-38752, CVE-2022-38751, CVE-2022-38750, CVE-2022-41854, CVE-2022-25857, CVE-2022-38749, CVE-2022-1471 thrift-server-0.3.7.jar: CVE-2016-5397, CVE-2015-3254, CVE-2019-0205 {code}
[jira] [Commented] (CASSANDRA-19484) Add support for providing nvdDatafeedUrl to OWASP
[ https://issues.apache.org/jira/browse/CASSANDRA-19484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829608#comment-17829608 ] Ariel Weisberg commented on CASSANDRA-19484: Ah, I tried that and it didn't work: in `ant`, when you reference a property that isn't set, it doesn't resolve to the empty string, it resolves to the name of the property! Will update.
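The Ant behaviour mentioned in this comment can be modelled in a few lines. Ant leaves a reference to an unset property in place as literal text (the `${...}` form) rather than substituting an empty string. The sketch below is a stand-alone model of that rule, not Ant's implementation; the class name `AntPropertySketch`, the property name `nvd.datafeed.url`, and the mirror URL are all hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Models Ant-style property expansion: unset references stay as literal text. Hypothetical sketch. */
final class AntPropertySketch {
    private static final Pattern REFERENCE = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Expands ${name} references from the map; references to unset names are left untouched. */
    static String expand(String value, Map<String, String> properties) {
        Matcher matcher = REFERENCE.matcher(value);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            // Fall back to the whole "${name}" text when the property is not set.
            String replacement = properties.getOrDefault(matcher.group(1), matcher.group());
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    /** True if the expanded value still contains a ${...} reference, i.e. something was unset. */
    static boolean isUnresolved(String expanded) {
        return REFERENCE.matcher(expanded).find();
    }
}
```

With no properties set, `expand("${nvd.datafeed.url}", ...)` returns the reference text itself, so a task attribute built from it receives a non-empty, meaningless value instead of being absent. That is why passing the property straight through to the attribute does not behave like leaving the attribute off.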
[jira] [Commented] (CASSANDRA-19484) Add support for providing nvdDatafeedUrl to OWASP
[ https://issues.apache.org/jira/browse/CASSANDRA-19484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829460#comment-17829460 ] Ariel Weisberg commented on CASSANDRA-19484: Setting a property has no effect. At least none I can find. It has to be an attribute on the task invocation.
[jira] [Updated] (CASSANDRA-19484) Add support for providing nvdDatafeedUrl to OWASP
[ https://issues.apache.org/jira/browse/CASSANDRA-19484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19484: --- Source Control Link: (was: https://github.com/apache/cassandra/pull/3189, https://github.com/apache/cassandra/pull/3187)
[jira] [Updated] (CASSANDRA-19484) Add support for providing nvdDatafeedUrl to OWASP
[ https://issues.apache.org/jira/browse/CASSANDRA-19484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19484: --- Source Control Link: https://github.com/apache/cassandra/pull/3189, https://github.com/apache/cassandra/pull/3187 (was: https://github.com/apache/cassandra/pull/3189)
[jira] [Updated] (CASSANDRA-19484) Add support for providing nvdDatafeedUrl to OWASP
[ https://issues.apache.org/jira/browse/CASSANDRA-19484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19484: --- Reviewers: Joshua McKenzie Source Control Link: https://github.com/apache/cassandra/pull/3189
[jira] [Updated] (CASSANDRA-19484) Add support for providing nvdDatafeedUrl to OWASP
[ https://issues.apache.org/jira/browse/CASSANDRA-19484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19484: --- Test and Documentation Plan: Run `ant dependency-check` on each branch Status: Patch Available (was: Open) > Add support for providing nvdDatafeedUrl to OWASP > - > > Key: CASSANDRA-19484 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19484 > Project: Cassandra > Issue Type: Improvement > Components: Build >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.0.x, 4.1.x, 5.0.x, 5.x > > > This allows you to point to a mirror that is faster and doesn’t require an > API key. > This is kind of painful to make work in {{ant}} because you can't specify the > property at all if you want to use the API and I couldn't find a way to get > {{ant}} to conditionally supply the property without having a dedicated > invocation of the {{dependency-check}} task with/without the parameter > {{nvdDataFeedUrl}} specified. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19484) Add support for providing nvdDatafeedUrl to OWASP
[ https://issues.apache.org/jira/browse/CASSANDRA-19484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829318#comment-17829318 ] Ariel Weisberg commented on CASSANDRA-19484: [~jlewandowski] [~brandon.williams] FYI > Add support for providing nvdDatafeedUrl to OWASP > - > > Key: CASSANDRA-19484 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19484 > Project: Cassandra > Issue Type: Improvement > Components: Build >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.0.x, 4.1.x, 5.0.x, 5.x > > > This allows you to point to a mirror that is faster and doesn’t require an > API key. > This is kind of painful to make work in {{ant}} because you can't specify the > property at all if you want to use the API and I couldn't find a way to get > {{ant}} to conditionally supply the property without having a dedicated > invocation of the {{dependency-check}} task with/without the parameter > {{nvdDataFeedUrl}} specified. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19484) Add support for providing nvdDatafeedUrl to OWASP
[ https://issues.apache.org/jira/browse/CASSANDRA-19484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19484: --- Description: This allows you to point to a mirror that is faster and doesn’t require an API key. This is kind of painful to make work in {{ant}} because you can't specify the property at all if you want to use the API and I couldn't find a way to get {{ant}} to conditionally supply the property without having a dedicated invocation of the {{dependency-check}} task with/without the parameter {{nvdDataFeedUrl}} specified. was: This allows you to point to a mirror that is faster and doesn’t require an API key. This is kind of painful to make work in `ant` because you can't specify the property at all if you want to use the API and I couldn't find a way to get `ant` to conditionally supply the property without having a dedicated invocation of the `dependency-check` task with/without the parameter {noformat} nvdDataFeedUrl {noformat} specified. > Add support for providing nvdDatafeedUrl to OWASP > - > > Key: CASSANDRA-19484 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19484 > Project: Cassandra > Issue Type: Improvement > Components: Build >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.0.x, 4.1.x, 5.0.x, 5.x > > > This allows you to point to a mirror that is faster and doesn’t require an > API key. > This is kind of painful to make work in {{ant}} because you can't specify the > property at all if you want to use the API and I couldn't find a way to get > {{ant}} to conditionally supply the property without having a dedicated > invocation of the {{dependency-check}} task with/without the parameter > {{nvdDataFeedUrl}} specified. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19484) Add support for providing nvdDatafeedUrl to OWASP
[ https://issues.apache.org/jira/browse/CASSANDRA-19484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19484: --- Change Category: Quality Assurance Complexity: Low Hanging Fruit Fix Version/s: 3.0.x 3.11.x 4.0.x 4.1.x 5.0.x 5.x Status: Open (was: Triage Needed) > Add support for providing nvdDatafeedUrl to OWASP > - > > Key: CASSANDRA-19484 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19484 > Project: Cassandra > Issue Type: Improvement > Components: Build >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.0.x, 4.1.x, 5.0.x, 5.x > > > This allows you to point to a mirror that is faster and doesn’t require an > API key. > This is kind of painful to make work in `ant` because you can't specify the > property at all if you want to use the API and I couldn't find a way to get > `ant` to conditionally supply the property without having a dedicated > invocation of the `dependency-check` task with/without the parameter > {noformat} > nvdDataFeedUrl > {noformat} > specified. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-19484) Add support for providing nvdDatafeedUrl to OWASP
Ariel Weisberg created CASSANDRA-19484: -- Summary: Add support for providing nvdDatafeedUrl to OWASP Key: CASSANDRA-19484 URL: https://issues.apache.org/jira/browse/CASSANDRA-19484 Project: Cassandra Issue Type: Improvement Components: Build Reporter: Ariel Weisberg Assignee: Ariel Weisberg This allows you to point to a mirror that is faster and doesn’t require an API key. This is kind of painful to make work in `ant` because you can't specify the property at all if you want to use the API and I couldn't find a way to get `ant` to conditionally supply the property without having a dedicated invocation of the `dependency-check` task with/without the parameter {noformat} nvdDataFeedUrl {noformat} specified. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-19444) AccordRepairJob should be async like CassandraRepairJob
Ariel Weisberg created CASSANDRA-19444: -- Summary: AccordRepairJob should be async like CassandraRepairJob Key: CASSANDRA-19444 URL: https://issues.apache.org/jira/browse/CASSANDRA-19444 Project: Cassandra Issue Type: Bug Reporter: Ariel Weisberg The thread that manages repairs needs to be available and not block. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19430) Read repair through Accord needs to only route the read repair through Accord if the range is actually migrated/running on Accord
[ https://issues.apache.org/jira/browse/CASSANDRA-19430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820838#comment-17820838 ] Ariel Weisberg commented on CASSANDRA-19430: This is narrowly scoped to just attempting to send it to the right place. We will also need to address the fact that we could send it to the wrong system and need to retry. BRR is per partition so we at least don't need to break up mutations. > Read repair through Accord needs to only route the read repair through Accord > if the range is actually migrated/running on Accord > - > > Key: CASSANDRA-19430 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19430 > Project: Cassandra > Issue Type: Bug >Reporter: Ariel Weisberg >Priority: Normal > > This is because the read repair will simply fail if Accord doesn't manage > that range. Not only does it need to be routed through Accord but if it races > with topology change it needs to retry and not surface an error. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19436) When transitioning to Accord migration it's not safe to read immediately using Accord due to concurrent non-serial writes
[ https://issues.apache.org/jira/browse/CASSANDRA-19436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820837#comment-17820837 ] Ariel Weisberg commented on CASSANDRA-19436: We also need to consider materialized views in the future, which make additional mutations downstream. > When transitioning to Accord migration it's not safe to read immediately > using Accord due to concurrent non-serial writes > - > > Key: CASSANDRA-19436 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19436 > Project: Cassandra > Issue Type: Bug >Reporter: Ariel Weisberg >Priority: Normal > > Concurrent writes at the same time that migration starts make it unsafe to read from Accord because txn recovery will not be deterministic in the presence of writes not done through Accord. > Adding key migration to non-serial writes could solve this by causing writes not going through Accord to be rejected at nodes where key migration already occurred.
[jira] [Created] (CASSANDRA-19441) Blocking read repair needs to handle racing with Accord topology changes
Ariel Weisberg created CASSANDRA-19441: -- Summary: Blocking read repair needs to handle racing with Accord topology changes Key: CASSANDRA-19441 URL: https://issues.apache.org/jira/browse/CASSANDRA-19441 Project: Cassandra Issue Type: Bug Reporter: Ariel Weisberg Similar to other forms of writes it's possible for the read repair to end up on the wrong system and it should be rejected if necessary. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-19431) Mutations need to split Accord/non-Accord mutations based on whether migration is completed
[ https://issues.apache.org/jira/browse/CASSANDRA-19431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820827#comment-17820827 ] Ariel Weisberg edited comment on CASSANDRA-19431 at 2/26/24 7:34 PM: - This one is narrowly scoped to just splitting and routing the mutations which can be done today without the linked issues, but still won't be correct until it can also detect mutations sent to the wrong place and then retry them without generating an error. was (Author: aweisberg): This one is narrowly scoped to just splitting and routing the mutations which can be done today without the others, but still won't be correct until it can also detect mutations sent to the wrong place and then retry them without generating an error. > Mutations need to split Accord/non-Accord mutations based on whether > migration is completed > --- > > Key: CASSANDRA-19431 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19431 > Project: Cassandra > Issue Type: Bug >Reporter: Ariel Weisberg >Priority: Normal > > If we don't do this then requests will fail if they span Accord and > non-Accord keys and tables. This breaks unlogged batches for example. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19431) Mutations need to split Accord/non-Accord mutations based on whether migration is completed
[ https://issues.apache.org/jira/browse/CASSANDRA-19431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820827#comment-17820827 ] Ariel Weisberg commented on CASSANDRA-19431: This one is narrowly scoped to just splitting and routing the mutations which can be done today without the others, but still won't be correct until it can also detect mutations sent to the wrong place and then retry them without generating an error. > Mutations need to split Accord/non-Accord mutations based on whether > migration is completed > --- > > Key: CASSANDRA-19431 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19431 > Project: Cassandra > Issue Type: Bug >Reporter: Ariel Weisberg >Priority: Normal > > If we don't do this then requests will fail if they span Accord and > non-Accord keys and tables. This breaks unlogged batches for example. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19436) When transitioning to Accord migration it's not safe to read immediately using Accord due to concurrent non-serial writes
[ https://issues.apache.org/jira/browse/CASSANDRA-19436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820826#comment-17820826 ] Ariel Weisberg commented on CASSANDRA-19436: This issue covers detecting and generating the appropriate refusal to accept a read/write on the wrong system. Accord and Paxos already do this detection, but non-SERIAL operations do not. The linked issues cover this as well as the loop in read/write that receives the error and automatically retries instead of failing the query. > When transitioning to Accord migration it's not safe to read immediately > using Accord due to concurrent non-serial writes > - > > Key: CASSANDRA-19436 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19436 > Project: Cassandra > Issue Type: Bug >Reporter: Ariel Weisberg >Priority: Normal > > Concurrent writes at the same time that migration starts make it unsafe to read from Accord because txn recovery will not be deterministic in the presence of writes not done through Accord. > Adding key migration to non-serial writes could solve this by causing writes not going through Accord to be rejected at nodes where key migration already occurred.
[jira] [Created] (CASSANDRA-19440) Non-serial writes can race with Accord topology changes
Ariel Weisberg created CASSANDRA-19440: -- Summary: Non-serial writes can race with Accord topology changes Key: CASSANDRA-19440 URL: https://issues.apache.org/jira/browse/CASSANDRA-19440 Project: Cassandra Issue Type: Bug Reporter: Ariel Weisberg Accord and Paxos handle these, but non-SERIAL writes don't check for this condition and can't retry the portions of the write that failed on the correct system until the entire thing succeeds. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-19439) Non-serial reads need to handle racing with Accord topology changes
Ariel Weisberg created CASSANDRA-19439: -- Summary: Non-serial reads need to handle racing with Accord topology changes Key: CASSANDRA-19439 URL: https://issues.apache.org/jira/browse/CASSANDRA-19439 Project: Cassandra Issue Type: Bug Reporter: Ariel Weisberg A key or range read could end up being sent to Accord when it's not managed by Accord, and we might not find out until the execution epoch is known. In reality I think this already throws an exception in Accord for a key; we just need to propagate and handle the exception and retry with the new topology until we can complete the read.
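The propagate-and-retry behaviour described in CASSANDRA-19439 could be sketched roughly as follows. This is only an illustration under assumptions: `TopologyRetry`, `StaleTopologyException`, `readWithRetry`, and the epoch callbacks are hypothetical names, not Cassandra or Accord APIs.

```java
import java.util.function.LongFunction;
import java.util.function.LongSupplier;

/** Hypothetical sketch: retry a read against a newer topology instead of surfacing an error. */
final class TopologyRetry {

    /** Hypothetical signal that the read was routed using a topology epoch that is too old. */
    static final class StaleTopologyException extends RuntimeException {
        final long requiredEpoch;

        StaleTopologyException(long requiredEpoch) {
            super("read requires topology epoch " + requiredEpoch);
            this.requiredEpoch = requiredEpoch;
        }
    }

    /**
     * Runs attemptAtEpoch with the latest known epoch. If the attempt reports a stale
     * topology, it is retried at the newer epoch, up to maxAttempts attempts in total.
     */
    static <T> T readWithRetry(LongSupplier latestKnownEpoch,
                               LongFunction<T> attemptAtEpoch,
                               int maxAttempts) {
        if (maxAttempts <= 0)
            throw new IllegalArgumentException("maxAttempts must be positive");
        long epoch = latestKnownEpoch.getAsLong();
        StaleTopologyException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return attemptAtEpoch.apply(epoch);
            } catch (StaleTopologyException e) {
                last = e;
                // Route the next attempt using whichever epoch is newer.
                epoch = Math.max(e.requiredEpoch, latestKnownEpoch.getAsLong());
            }
        }
        // Every attempt failed; only now is the error surfaced to the caller.
        throw last;
    }
}
```

Bounding the number of attempts is a design choice of this sketch; the ticket itself only says the read should retry with the new topology until it completes.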
[jira] [Created] (CASSANDRA-19438) Accord barriers need to handle racing with topology changes
Ariel Weisberg created CASSANDRA-19438: -- Summary: Accord barriers need to handle racing with topology changes Key: CASSANDRA-19438 URL: https://issues.apache.org/jira/browse/CASSANDRA-19438 Project: Cassandra Issue Type: Bug Reporter: Ariel Weisberg Topology changes can result in the ranges sent to Accord including things not managed by Accord. It might be sufficient for the caller to have the range barriers automatically remove the unsupported subranges.
[jira] [Created] (CASSANDRA-19437) Non-serial reads/range reads need to be done through Accord for Accord to support async apply/commit
Ariel Weisberg created CASSANDRA-19437: -- Summary: Non-serial reads/range reads need to be done through Accord for Accord to support async apply/commit Key: CASSANDRA-19437 URL: https://issues.apache.org/jira/browse/CASSANDRA-19437 Project: Cassandra Issue Type: Bug Reporter: Ariel Weisberg Currently they haven't been implemented. We have a path forward for it using ephemeral reads. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-19436) When transitioning to Accord migration it's not safe to read immediately using Accord due to concurrent non-serial writes
Ariel Weisberg created CASSANDRA-19436: -- Summary: When transitioning to Accord migration it's not safe to read immediately using Accord due to concurrent non-serial writes Key: CASSANDRA-19436 URL: https://issues.apache.org/jira/browse/CASSANDRA-19436 Project: Cassandra Issue Type: Bug Reporter: Ariel Weisberg Concurrent writes at the same time that migration starts make it unsafe to read from Accord because txn recovery will not be deterministic in the presence of writes not done through Accord. Adding key migration to non-serial writes could solve this by causing writes not going through Accord to be rejected at nodes where key migration already occurred.
[jira] [Created] (CASSANDRA-19435) Hint delivery doesn't write through Accord
Ariel Weisberg created CASSANDRA-19435: -- Summary: Hint delivery doesn't write through Accord Key: CASSANDRA-19435 URL: https://issues.apache.org/jira/browse/CASSANDRA-19435 Project: Cassandra Issue Type: Bug Reporter: Ariel Weisberg Hint delivery doesn't write through Accord which would make txn recovery non-deterministic. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-19434) Batch log doesn't write through Accord during Accord migration
Ariel Weisberg created CASSANDRA-19434: -- Summary: Batch log doesn't write through Accord during Accord migration Key: CASSANDRA-19434 URL: https://issues.apache.org/jira/browse/CASSANDRA-19434 Project: Cassandra Issue Type: Bug Reporter: Ariel Weisberg This can result in writes that do not go through Accord, which makes txn recovery non-deterministic.
[jira] [Created] (CASSANDRA-19433) Nodetool cleanup can drop data Accord might need
Ariel Weisberg created CASSANDRA-19433: -- Summary: Nodetool cleanup can drop data Accord might need Key: CASSANDRA-19433 URL: https://issues.apache.org/jira/browse/CASSANDRA-19433 Project: Cassandra Issue Type: Bug Reporter: Ariel Weisberg Nodetool cleanup can theoretically drop data that Accord still needs. I don't think cleanup even waits for streaming to finish. Accord in general doesn't have a strategy for dropping data after topology changes right now. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-19432) Accord & Paxos migration doesn't enforce lower bound on generated timestamps
Ariel Weisberg created CASSANDRA-19432: -- Summary: Accord & Paxos migration doesn't enforce lower bound on generated timestamps Key: CASSANDRA-19432 URL: https://issues.apache.org/jira/browse/CASSANDRA-19432 Project: Cassandra Issue Type: Bug Reporter: Ariel Weisberg When migrating between the two, a coordinator with bad clock sync could write data that has already been tombstoned, for example.
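To make the hazard in CASSANDRA-19432 concrete, here is a small sketch of a write being shadowed by an existing tombstone when the coordinator's clock is behind, and of how a lower bound would avoid it. `BoundedTimestamps` and both method names are hypothetical, and where the lower bound would come from during migration is an assumption left open here.

```java
/** Hypothetical sketch of enforcing a lower bound on coordinator-generated timestamps. */
final class BoundedTimestamps {

    /** Returns a timestamp strictly greater than lowerBoundMicros, even if the local clock is behind. */
    static long nextTimestamp(long localClockMicros, long lowerBoundMicros) {
        if (lowerBoundMicros == Long.MAX_VALUE)
            throw new IllegalStateException("lower bound leaves no room for a newer timestamp");
        return Math.max(localClockMicros, lowerBoundMicros + 1);
    }

    /** Timestamp reconciliation: a write is visible only if it is strictly newer than the tombstone. */
    static boolean survivesTombstone(long writeMicros, long tombstoneMicros) {
        return writeMicros > tombstoneMicros;
    }
}
```

With a tombstone at timestamp 100 and a coordinator clock reading 90, an unbounded write at 90 is shadowed, while a write stamped by `nextTimestamp(90, 100)` is not.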
[jira] [Created] (CASSANDRA-19431) Mutations need to split Accord/non-Accord mutations based on whether migration is completed
Ariel Weisberg created CASSANDRA-19431: -- Summary: Mutations need to split Accord/non-Accord mutations based on whether migration is completed Key: CASSANDRA-19431 URL: https://issues.apache.org/jira/browse/CASSANDRA-19431 Project: Cassandra Issue Type: Bug Reporter: Ariel Weisberg If we don't do this then requests will fail if they span Accord and non-Accord keys and tables. This breaks unlogged batches for example. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
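The splitting step described in CASSANDRA-19431 can be pictured as a simple partition of a batch by migration state. The sketch below is an assumption-laden illustration: `MutationSplitter`, `splitByMigrationState`, and the `migratedToAccord` predicate are hypothetical, and partition keys are modelled as plain strings rather than real mutation objects.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical sketch: split one batch into an Accord part and a non-Accord part. */
final class MutationSplitter {

    /**
     * Returns a map whose TRUE entry holds the keys that have completed migration to
     * Accord and whose FALSE entry holds the rest. Both entries are always present,
     * and the original order of the keys is preserved within each entry.
     */
    static Map<Boolean, List<String>> splitByMigrationState(List<String> partitionKeys,
                                                            Predicate<String> migratedToAccord) {
        return partitionKeys.stream()
                            .collect(Collectors.partitioningBy(migratedToAccord));
    }
}
```

Each half would then be routed to its own write path. As the comments on the ticket note, routing alone is not sufficient until misrouted mutations can also be detected and retried.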
[jira] [Created] (CASSANDRA-19430) Read repair through Accord needs to only route the read repair through Accord if the range is actually migrated/running on Accord
Ariel Weisberg created CASSANDRA-19430: -- Summary: Read repair through Accord needs to only route the read repair through Accord if the range is actually migrated/running on Accord Key: CASSANDRA-19430 URL: https://issues.apache.org/jira/browse/CASSANDRA-19430 Project: Cassandra Issue Type: Bug Reporter: Ariel Weisberg This is because the read repair will simply fail if Accord doesn't manage that range. Not only does it need to be routed through Accord but if it races with topology change it needs to retry and not surface an error. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19381) StreamingTombstoneHistogramBuilder.DataHolder does not merge histogram points correctly on overflow
[ https://issues.apache.org/jira/browse/CASSANDRA-19381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19381: --- Source Control Link: https://github.com/apache/cassandra/pull/3136 (was: https://github.com/apache/cassandra/pull/3094) > StreamingTombstoneHistogramBuilder.DataHolder does not merge histogram points > correctly on overflow > --- > > Key: CASSANDRA-19381 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19381 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Fix For: 5.0, 5.1 > > > The algorithm tries to merge the two nearest points in the histogram and create a new point that is in between the two merged points based on the weight of each point. This can overflow long arithmetic with the code that is currently there, and the current workaround is to pick one of the points and just put it there. > This can be avoided by changing the midpoint calculation so that it does not overflow.
[jira] [Comment Edited] (CASSANDRA-19381) StreamingTombstoneHistogramBuilder.DataHolder does not merge histogram points correctly on overflow
[ https://issues.apache.org/jira/browse/CASSANDRA-19381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17815861#comment-17815861 ] Ariel Weisberg edited comment on CASSANDRA-19381 at 2/26/24 4:14 PM: - [https://github.com/apache/cassandra/pull/3136] Still need to run the tests was (Author: aweisberg): [https://github.com/apache/cassandra/pull/3094] Currently running tests > StreamingTombstoneHistogramBuilder.DataHolder does not merge histogram points > correctly on overflow > --- > > Key: CASSANDRA-19381 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19381 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Fix For: 5.0, 5.1 > > > The algorithm tries to merge the two nearest points in the histogram and create a new point that is in between the two merged points based on the weight of each point. This can overflow long arithmetic with the code that is currently there, and the current workaround is to pick one of the points and just put it there. > This can be avoided by changing the midpoint calculation so that it does not overflow.
[jira] [Commented] (CASSANDRA-19381) StreamingTombstoneHistogramBuilder.DataHolder does not merge histogram points correctly on overflow
[ https://issues.apache.org/jira/browse/CASSANDRA-19381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820764#comment-17820764 ] Ariel Weisberg commented on CASSANDRA-19381: Sorry about that, the branch name was misspelled and then I never recreated the PR and finished the tests. > StreamingTombstoneHistogramBuilder.DataHolder does not merge histogram points > correctly on overflow > --- > > Key: CASSANDRA-19381 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19381 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Fix For: 5.0, 5.1 > > > The algorithm tries to merge the two nearest points in the histogram and create a new point that is in between the two merged points based on the weight of each point. This can overflow long arithmetic with the code that is currently there, and the current workaround is to pick one of the points and just put it there. > This can be avoided by changing the midpoint calculation so that it does not overflow.
[jira] [Updated] (CASSANDRA-19381) StreamingTombstoneHistogramBuilder.DataHolder does not merge histogram points correctly on overflow
[ https://issues.apache.org/jira/browse/CASSANDRA-19381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19381: --- Status: In Progress (was: Patch Available) > StreamingTombstoneHistogramBuilder.DataHolder does not merge histogram points > correctly on overflow > --- > > Key: CASSANDRA-19381 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19381 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Fix For: 5.0, 5.1 > > > The algorithm tries to merge the two nearest points in the histogram and create a new point that is in between the two merged points based on the weight of each point. This can overflow long arithmetic with the code that is currently there, and the current workaround is to pick one of the points and just put it there. > This can be avoided by changing the midpoint calculation so that it does not overflow.
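For readers following CASSANDRA-19381, the overflow and one way to avoid it can be sketched as below. This is not the code from the linked pull request. `WeightedMidpoint`, `naive`, and `safe` are hypothetical names, and the sketch assumes non-negative points with p1 <= p2 and positive weights. It relies on the identity (p1*v1 + p2*v2) / (v1 + v2) = p1 + (p2 - p1) * v2 / (v1 + v2), and uses `BigInteger` for the intermediate product so that nothing wraps.

```java
import java.math.BigInteger;

/** Hypothetical sketch of a weighted midpoint that cannot overflow long arithmetic. */
final class WeightedMidpoint {

    /** Direct formula: the products and their sum can silently wrap around for large inputs. */
    static long naive(long p1, long v1, long p2, long v2) {
        return (p1 * v1 + p2 * v2) / (v1 + v2);
    }

    /**
     * Computes p1 + (p2 - p1) * v2 / (v1 + v2) with arbitrary-precision intermediates.
     * Assumes 0 <= p1 <= p2 and v1, v2 > 0, so the result lies in [p1, p2] and fits in a long.
     */
    static long safe(long p1, long v1, long p2, long v2) {
        BigInteger delta = BigInteger.valueOf(p2).subtract(BigInteger.valueOf(p1));
        BigInteger totalWeight = BigInteger.valueOf(v1).add(BigInteger.valueOf(v2));
        BigInteger offset = delta.multiply(BigInteger.valueOf(v2)).divide(totalWeight);
        return BigInteger.valueOf(p1).add(offset).longValueExact();
    }

    public static void main(String[] args) {
        long p1 = Long.MAX_VALUE - 10;
        long p2 = Long.MAX_VALUE;
        System.out.println("naive: " + naive(p1, 1, p2, 1)); // wraps around to a negative value
        System.out.println("safe:  " + safe(p1, 1, p2, 1));  // stays between p1 and p2
    }
}
```

`BigInteger` keeps the sketch obviously correct at the cost of allocation; a production version might prefer primitive arithmetic with explicit overflow checks.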
[jira] [Created] (CASSANDRA-19419) Non-transactional schema updates can interfere with Accord transaction execution
Ariel Weisberg created CASSANDRA-19419: -- Summary: Non-transactional schema updates can interfere with Accord transaction execution Key: CASSANDRA-19419 URL: https://issues.apache.org/jira/browse/CASSANDRA-19419 Project: Cassandra Issue Type: Bug Reporter: Ariel Weisberg While Accord can handle topology changes correctly, it can’t handle non-transactional schema updates because those execute outside of Accord. When Accord tries to execute a transaction against the schema in the epoch the transaction is supposed to execute in, it is possible for different nodes to see different schemas when reading or writing data as part of that transaction. Dropping a needed column or table is the most likely issue, as we don't support altering column types. Because commit is async, it is possible for a table or column to be dropped after the write was acknowledged but before the writes can be propagated, instead of an error being signaled. Even though the table was dropped, it's possible the client needed the error to know that the request was processed improperly or that it needed to take some other action client side. Another case is adding a table where the original coordinator can't read the table but the recovery coordinator can, so it might apply different results to different replicas.
[jira] [Updated] (CASSANDRA-19332) Dropwizard Meter causes timeouts when infrequently used
[ https://issues.apache.org/jira/browse/CASSANDRA-19332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19332: --- Reviewers: Maxim Muzafarov > Dropwizard Meter causes timeouts when infrequently used > --- > > Key: CASSANDRA-19332 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19332 > Project: Cassandra > Issue Type: Bug > Components: Observability/Metrics >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Fix For: 4.0.x, 4.1.x > > > Observed instances of timeouts on clusters with long uptime and infrequently used tables, and possibly just infrequently used request paths, such as not using CAS for large fractions of a year. > CAS seems to be more severely impacted because it has more metrics in the request path, such as latency measurements for prepare, propose, and the read from the underlying table. > Tracing showed ~600-800 milliseconds for these operations in between the “appending to memtable” and “sending a response” events. Reads had a delay between finishing the construction of the iterator and sending the read response. > Stack traces dumped every 100 milliseconds using {{sjk}} show that in prepare and propose a lot of time was being spent in {{Meter.tickIfNecessary}}.
> {code:java} > Thread [2537] RUNNABLE at 2024-01-25T21:14:48.218 - MutationStage-2 > com.codahale.metrics.Meter.tickIfNecessary(Meter.java:71) > com.codahale.metrics.Meter.mark(Meter.java:55) > com.codahale.metrics.Meter.mark(Meter.java:46) > com.codahale.metrics.Timer.update(Timer.java:150) > com.codahale.metrics.Timer.update(Timer.java:86) > org.apache.cassandra.metrics.LatencyMetrics.addNano(LatencyMetrics.java:159) > org.apache.cassandra.service.paxos.PaxosState.prepare(PaxosState.java:92) > Thread [2539] RUNNABLE at 2024-01-25T21:14:48.520 - MutationStage-4 > com.codahale.metrics.Meter.tickIfNecessary(Meter.java:72) > com.codahale.metrics.Meter.mark(Meter.java:55) > com.codahale.metrics.Meter.mark(Meter.java:46) > com.codahale.metrics.Timer.update(Timer.java:150) > com.codahale.metrics.Timer.update(Timer.java:86) > org.apache.cassandra.metrics.LatencyMetrics.addNano(LatencyMetrics.java:159) > org.apache.cassandra.service.paxos.PaxosState.propose(PaxosState.java:127){code} > {{tickIfNecessary}} does a linear amount of work proportional to the time > since the last time the metric was updated/read/created and this can actually > take a measurable amount of time even in a tight loop. On my M2 MBP it was > 1.5 milliseconds for a day, ~200 days took ~74 milliseconds. Before it warmed > up it was 140 milliseconds. > A quick fix is to schedule a task to read all the meters once a day so it > isn’t done in the request path and we have a more incremental amount to > process at a time. > Also observed that {{tickIfNecessary}} is not 100% thread safe in that if it > takes longer than 5 seconds to run the loop it can end up with multiple > threads attempting to run the loop at once and then they will concurrently > run {{EWMA.tick}} which probably results in some ticks not being performed. 
> This issue is still present in the latest version of {{Metrics}} if using > {{EWMA}}, but {{SlidingWindowTimeAverages}} looks like it has a bounded > amount of work required to tick. Switching would change how our metrics work > since the two don't have the same behavior.
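The linear catch-up cost described in this report can be illustrated with a small model. The sketch below is a simplified stand-in written for illustration, not the Dropwizard source: the class name {{IdleMeterSketch}} and the counter standing in for {{EWMA.tick}} are hypothetical, and the 5 second tick interval is an assumption taken from the "5 seconds" mentioned above.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of a meter that catches up one tick per elapsed interval.
public class IdleMeterSketch {
    // Assumed tick interval (5 seconds), expressed in nanoseconds.
    static final long TICK_INTERVAL_NANOS = TimeUnit.SECONDS.toNanos(5);

    private final AtomicLong lastTick;
    private long ticksPerformed = 0; // stands in for the EWMA.tick() calls

    IdleMeterSketch(long createdAtNanos) {
        this.lastTick = new AtomicLong(createdAtNanos);
    }

    // Returns the number of ticks this call had to perform.
    // The loop length is proportional to the time since the last tick.
    long tickIfNecessary(long nowNanos) {
        long oldTick = lastTick.get();
        long age = nowNanos - oldTick;
        if (age <= TICK_INTERVAL_NANOS) {
            return 0;
        }
        long newTick = nowNanos - age % TICK_INTERVAL_NANOS;
        if (!lastTick.compareAndSet(oldTick, newTick)) {
            return 0; // another thread claimed this catch-up
        }
        long required = age / TICK_INTERVAL_NANOS;
        for (long i = 0; i < required; i++) {
            ticksPerformed++;
        }
        return required;
    }

    public static void main(String[] args) {
        long day = TimeUnit.DAYS.toNanos(1);
        System.out.println(new IdleMeterSketch(0L).tickIfNecessary(day));       // prints 17280
        System.out.println(new IdleMeterSketch(0L).tickIfNecessary(200 * day)); // prints 3456000
    }
}
```

Under these assumptions, a meter that is read once a day never has more than 17,280 ticks to catch up on, while one left idle for 200 days has 3,456,000 to perform inside the request that finally touches it. That is the motivation for moving the read into a scheduled task.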
[jira] [Updated] (CASSANDRA-19381) StreamingTombstoneHistogramBuilder.DataHolder does not merge histogram points correctly on overflow
[ https://issues.apache.org/jira/browse/CASSANDRA-19381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19381: --- Summary: StreamingTombstoneHistogramBuilder.DataHolder does not merge histogram points correctly on overflow (was: StreamingTombstoneHistogramBuilder.DataHolder does not merge bins correctly on overflow) > StreamingTombstoneHistogramBuilder.DataHolder does not merge histogram points > correctly on overflow > --- > > Key: CASSANDRA-19381 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19381 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Fix For: 5.0, 5.1 > > > The algorithm tries to merge the two nearest points in the histogram and > create a new point that lies between the two merged points, positioned according to the > weight of each point. With the current code this calculation can overflow long arithmetic, and the > existing workaround is to pick one of the points and just put the merged point > there. > This can be fixed by changing the midpoint calculation so that it does not > overflow.
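As an illustration of the overflow described in this issue, here is a minimal sketch of a weighted midpoint that avoids it. This is not the patch from the linked pull request: the class and method names are hypothetical, the naive formula is an assumed shape of the calculation, and doing the intermediate arithmetic in {{BigInteger}} is just one possible way to keep it from overflowing.

```java
import java.math.BigInteger;

// Hypothetical sketch: merging two histogram points into a weighted midpoint.
public class WeightedMidpointSketch {
    // Naive weighted mean in long arithmetic; the products and their sum can overflow.
    static long naive(long p1, long w1, long p2, long w2) {
        return (p1 * w1 + p2 * w2) / (w1 + w2);
    }

    // Same formula with intermediates held in BigInteger, so nothing wraps around.
    // Assumes non-negative weights with w1 + w2 > 0, so the result lies between
    // p1 and p2 and always fits back into a long.
    static long safe(long p1, long w1, long p2, long w2) {
        BigInteger numerator = BigInteger.valueOf(p1).multiply(BigInteger.valueOf(w1))
                .add(BigInteger.valueOf(p2).multiply(BigInteger.valueOf(w2)));
        BigInteger denominator = BigInteger.valueOf(w1).add(BigInteger.valueOf(w2));
        return numerator.divide(denominator).longValueExact();
    }

    public static void main(String[] args) {
        long a = Long.MAX_VALUE - 10;
        long b = Long.MAX_VALUE;
        System.out.println(naive(a, 1, b, 1) < 0);                  // prints true (wrapped around)
        System.out.println(safe(a, 1, b, 1) == Long.MAX_VALUE - 5); // prints true
    }
}
```

With equal weights the safe version lands exactly halfway between the two points, whereas the naive version wraps to a negative value that is not between them at all.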
[jira] [Updated] (CASSANDRA-19381) StreamingTombstoneHistogramBuilder.DataHolder does not merge bins correctly on overflow
[ https://issues.apache.org/jira/browse/CASSANDRA-19381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19381: --- Test and Documentation Plan: Added unit tests Status: Patch Available (was: Open) [https://github.com/apache/cassandra/pull/3094] Currently running tests > StreamingTombstoneHistogramBuilder.DataHolder does not merge bins correctly > on overflow > --- > > Key: CASSANDRA-19381 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19381 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Fix For: 5.0, 5.1 > > > The algorithm tries to merge the two nearest points in the histogram and > create a new point that lies between the two merged points, positioned according to the > weight of each point. With the current code this calculation can overflow long arithmetic, and the > existing workaround is to pick one of the points and just put the merged point > there. > This can be fixed by changing the midpoint calculation so that it does not > overflow.
[jira] [Updated] (CASSANDRA-19381) StreamingTombstoneHistogramBuilder.DataHolder does not merge bins correctly on overflow
[ https://issues.apache.org/jira/browse/CASSANDRA-19381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19381: --- Source Control Link: https://github.com/apache/cassandra/pull/3094 > StreamingTombstoneHistogramBuilder.DataHolder does not merge bins correctly > on overflow > --- > > Key: CASSANDRA-19381 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19381 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Fix For: 5.0, 5.1 > > > The algorithm tries to merge the two nearest points in the histogram and > create a new point that lies between the two merged points, positioned according to the > weight of each point. With the current code this calculation can overflow long arithmetic, and the > existing workaround is to pick one of the points and just put the merged point > there. > This can be fixed by changing the midpoint calculation so that it does not > overflow.
[jira] [Updated] (CASSANDRA-19381) StreamingTombstoneHistogramBuilder.DataHolder does not merge bins correctly on overflow
[ https://issues.apache.org/jira/browse/CASSANDRA-19381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19381: --- Bug Category: Parent values: Degradation(12984) Level 1 values: Performance Bug/Regression(12997) Complexity: Normal Discovered By: Code Inspection Fix Version/s: 5.0 5.1 Reviewers: Berenguer Blasi Severity: Normal Assignee: Ariel Weisberg Status: Open (was: Triage Needed) > StreamingTombstoneHistogramBuilder.DataHolder does not merge bins correctly > on overflow > --- > > Key: CASSANDRA-19381 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19381 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Fix For: 5.0, 5.1 > > > The algorithm tries to merge the two nearest points in the histogram and > create a new point that lies between the two merged points, positioned according to the > weight of each point. With the current code this calculation can overflow long arithmetic, and the > existing workaround is to pick one of the points and just put the merged point > there. > This can be fixed by changing the midpoint calculation so that it does not > overflow.
[jira] [Created] (CASSANDRA-19381) StreamingTombstoneHistogramBuilder.DataHolder does not merge bins correctly on overflow
Ariel Weisberg created CASSANDRA-19381: -- Summary: StreamingTombstoneHistogramBuilder.DataHolder does not merge bins correctly on overflow Key: CASSANDRA-19381 URL: https://issues.apache.org/jira/browse/CASSANDRA-19381 Project: Cassandra Issue Type: Bug Components: Local/SSTable Reporter: Ariel Weisberg The algorithm tries to merge the two nearest points in the histogram and create a new point that lies between the two merged points, positioned according to the weight of each point. With the current code this calculation can overflow long arithmetic, and the existing workaround is to pick one of the points and just put the merged point there. This can be fixed by changing the midpoint calculation so that it does not overflow.
[jira] [Updated] (CASSANDRA-19373) 5.0 updates FQL format, but doesn't update version or handle reading old version files
[ https://issues.apache.org/jira/browse/CASSANDRA-19373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19373: --- Resolution: Not A Bug Status: Resolved (was: Open) It turns out Chronicle Wire encodes the width of the int and can read it back safely as a wider type. > 5.0 updates FQL format, but doesn't update version or handle reading old > version files > -- > > Key: CASSANDRA-19373 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19373 > Project: Cassandra > Issue Type: Bug > Components: Tool/fql >Reporter: Ariel Weisberg >Priority: Normal > Fix For: 5.0-rc, 5.x > >