[jira] [Commented] (CASSANDRA-15389) Minimize BTree iterator allocations
[ https://issues.apache.org/jira/browse/CASSANDRA-15389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987402#comment-16987402 ]

Benedict Elliott Smith commented on CASSANDRA-15389:

Thanks. I'll try to find some time in the near future to undertake a full review.

bq. BTreeRow I had a bug in my hasComplexDeletion re-implementation that made reverse iteration / stop condition seem unnecessary. We actually do need both for hasComplexDeletion to work properly. Otherwise we’d only detect complex deletion if it’s on the final complex column.

So, I confess to not having looked closely enough to notice your bug, but (I think) nor to have been misled by it. I may have been unclear in my suggestion, since it was very terse. The {{firstComplexIdx}} calculation is used in other places to avoid having to perform reverse iteration, since it gives the _lowest_ index in the btree at which any complex column data occurs. Since complex data sorts after simple, this gives the whole range of indices on which complex data occurs. We would need to implement a {{Row}}/{{ColumnData}} variant of the feature we have previously only used in {{Columns}}, but it should map directly, and might permit us to avoid implementing this extra functionality.

Does that make sense, or am I still missing something?

> Minimize BTree iterator allocations
> ---
>
> Key: CASSANDRA-15389
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15389
> Project: Cassandra
> Issue Type: Sub-task
> Components: Local/Compaction
> Reporter: Blake Eggleston
> Assignee: Blake Eggleston
> Priority: Normal
> Fix For: 4.0
>
> Allocations of BTree iterators contribute a large amount of garbage to the
> compaction and read paths.
> This patch removes most btree iterator allocations on hot paths by:
> • using Row#apply where appropriate on frequently called methods
> (Row#digest, Row#validateData)
> • adding a BTree accumulate method. Like the apply method, this method walks
> the btree with a function that takes and returns a long argument. This
> eliminates iterator allocations without adding helper object allocations
> (BTreeRow#hasComplex, BTreeRow#hasInvalidDeletions, BTreeRow#dataSize,
> BTreeRow#unsharedHeapSizeExcludingData, Rows#collectStats,
> UnfilteredSerializer#serializedRowBodySize), and also eliminates the
> allocation of helper objects in places where apply was used previously^[1]^.
> • creating a map of columns in SerializationHeader, which lets us avoid
> allocating a btree search iterator for each row we serialize.
> These optimizations reduce garbage created during compaction by up to 13.5%
>
> [1] the memory test does measure memory allocated by lambdas capturing objects

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
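To make the two ideas in this thread concrete, here is a minimal sketch, assuming a flat sorted array in place of a real btree. The names {{AccumulateSketch}}, {{Cell}} and {{LongAccumulator}} are hypothetical and are not Cassandra's classes; only {{firstComplexIdx}} and {{hasComplexDeletion}} borrow names from the discussion. It shows a walk driven by a function that takes and returns a long, started at the lowest index that holds complex data, so that no iterator, reverse iteration or holder object is needed.

```java
final class AccumulateSketch {
    /** Hypothetical stand-in for column data; complex columns sort after simple ones. */
    static final class Cell {
        final String name;
        final boolean complex;
        final boolean hasComplexDeletion;

        Cell(String name, boolean complex, boolean hasComplexDeletion) {
            this.name = name;
            this.complex = complex;
            this.hasComplexDeletion = hasComplexDeletion;
        }
    }

    /** Takes the current element and the running long, returns the new running long. */
    interface LongAccumulator<T> {
        long apply(T element, long current);
    }

    /** Walks elements[from..] in order; state is carried in a primitive long only. */
    static <T> long accumulate(T[] elements, int from, LongAccumulator<T> fn, long initial) {
        long value = initial;
        for (int i = from; i < elements.length; i++)
            value = fn.apply(elements[i], value);
        return value;
    }

    /** Lowest index holding a complex column, or sorted.length if there is none. */
    static int firstComplexIdx(Cell[] sorted) {
        int lo = 0, hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid].complex) hi = mid;
            else lo = mid + 1;
        }
        return lo;
    }

    /** True if any complex column carries a deletion, not only the final one. */
    static boolean hasComplexDeletion(Cell[] sorted) {
        long flagged = accumulate(sorted, firstComplexIdx(sorted),
                                  (cell, v) -> cell.hasComplexDeletion ? 1L : v, 0L);
        return flagged != 0;
    }

    public static void main(String[] args) {
        Cell[] row = {
            new Cell("a", false, false),
            new Cell("m", true, true),   // deletion on a non-final complex column
            new Cell("z", true, false)
        };
        System.out.println(firstComplexIdx(row) + " " + hasComplexDeletion(row));
    }
}
```

The lambda captures nothing, so it does not need a fresh object per call. This sketch has no early stop condition; whether the real patch needs one is exactly the open question in the comment above.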
[jira] [Updated] (CASSANDRA-15160) Add flag to ignore unreplicated keyspaces during repair
[ https://issues.apache.org/jira/browse/CASSANDRA-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Blake Eggleston updated CASSANDRA-15160:

Status: Changes Suggested (was: Review In Progress)

[~dcapwell] noticed that we should be setting repaired status to completed [here|https://github.com/krummas/cassandra/commit/a2d492bf6a0f203a50162cacf68497322f4614f8#diff-4bc513a60150419e20a9449c70a64a66R250], looks good otherwise

> Add flag to ignore unreplicated keyspaces during repair
> ---
>
> Key: CASSANDRA-15160
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15160
> Project: Cassandra
> Issue Type: Improvement
> Components: Consistency/Repair
> Reporter: Marcus Eriksson
> Assignee: Marcus Eriksson
> Priority: Normal
>
> When a repair is triggered on a node in 'dc2' for a keyspace with replication
> factor {'dc1':3, 'dc2':0} we just ignore the repair in versions < 4. In 4.0
> we fail the repair to make sure the operator does not think the keyspace is
> fully repaired.
> There might be tooling that relies on the old behaviour though, so we should
> add a flag to ignore those unreplicated keyspaces
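The behaviour described in the issue can be summarised as a small decision table. The sketch below is an illustration only: {{UnreplicatedKeyspaceSketch}}, {{Outcome}} and the {{ignoreUnreplicated}} parameter are hypothetical names, and the check is reduced to "does the local datacenter have a non-zero replication factor", which is a simplification of whatever the patch actually tests.

```java
import java.util.HashMap;
import java.util.Map;

final class UnreplicatedKeyspaceSketch {
    enum Outcome { REPAIR, SKIP, FAIL }

    /**
     * Decides what to do with a repair request for one keyspace.
     * replicationByDc maps datacenter name to replication factor (assumed input shape).
     */
    static Outcome decide(Map<String, Integer> replicationByDc, String localDc,
                          boolean ignoreUnreplicated) {
        int localRf = replicationByDc.getOrDefault(localDc, 0);
        if (localRf > 0)
            return Outcome.REPAIR;
        // Nothing is replicated here: 4.0 fails by default, the flag restores the old "ignore".
        return ignoreUnreplicated ? Outcome.SKIP : Outcome.FAIL;
    }

    public static void main(String[] args) {
        Map<String, Integer> rf = new HashMap<>();
        rf.put("dc1", 3);
        rf.put("dc2", 0);
        System.out.println(decide(rf, "dc2", false)); // default 4.0 behaviour from the issue
        System.out.println(decide(rf, "dc2", true));  // with the proposed flag
    }
}
```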
[jira] [Updated] (CASSANDRA-15160) Add flag to ignore unreplicated keyspaces during repair
[ https://issues.apache.org/jira/browse/CASSANDRA-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Blake Eggleston updated CASSANDRA-15160:

Reviewers: Blake Eggleston, Blake Eggleston (was: Blake Eggleston)
Status: Review In Progress (was: Patch Available)
[jira] [Updated] (CASSANDRA-15295) Running into deadlock when do CommitLog initialization
[ https://issues.apache.org/jira/browse/CASSANDRA-15295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dinesh Joshi updated CASSANDRA-15295:

Status: Ready to Commit (was: Review In Progress)

Thanks for the review [~jrwest].

> Running into deadlock when do CommitLog initialization
> ---
>
> Key: CASSANDRA-15295
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15295
> Project: Cassandra
> Issue Type: Bug
> Components: Local/Commit Log
> Reporter: Zephyr Guo
> Assignee: Zephyr Guo
> Priority: Normal
> Attachments: image.png, jstack.log, pstack.log, screenshot-1.png,
> screenshot-2.png, screenshot-3.png
>
> Recently, I found a Cassandra (3.11.4) node stuck in STARTING status for a
> long time.
> I used jstack to see what happened. The main thread was stuck in
> *AbstractCommitLogSegmentManager.awaitAvailableSegment*
> !screenshot-1.png!
> The strange thing is that the COMMIT-LOG-ALLOCATOR thread state was runnable,
> but it was not actually running.
> !screenshot-2.png!
> I then used pstack to troubleshoot, and found COMMIT-LOG-ALLOCATOR blocked on
> Java class initialization.
> !screenshot-3.png!
> This is clearly a deadlock. CommitLog waits for a CommitLogSegment while it is
> initializing. At this moment, the CommitLog class is not yet initialized and
> the main thread holds the class lock. After that, COMMIT-LOG-ALLOCATOR hits an
> exception while creating a CommitLogSegment and calls
> *CommitLog.handleCommitError* (a static method). COMMIT-LOG-ALLOCATOR blocks
> on this call because the CommitLog class is still initializing.
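The mechanism in the report can be reproduced in isolation. The following is a minimal sketch, not Cassandra code: {{ClassInitBlockDemo}}, {{Log}} and {{Helper}} are hypothetical stand-ins for {{CommitLog}} and the COMMIT-LOG-ALLOCATOR thread. A static initializer waits for a helper thread, and the helper calls a static method on the class that is still initializing, so the helper has to wait for initialization to finish. The wait in the initializer is bounded here so that the demo terminates; with an unbounded wait, both threads would block forever.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class ClassInitBlockDemo {
    /** Stand-in for a class whose static initializer waits on a helper thread. */
    static final class Log {
        static final boolean helperFinishedDuringInit;

        static {
            CountDownLatch done = new CountDownLatch(1);
            Thread helper = new Thread(new Helper(done), "helper");
            helper.setDaemon(true);
            helper.start();
            boolean finished;
            try {
                // Bounded wait: an unbounded await() here would never return.
                finished = done.await(300, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                finished = false;
            }
            helperFinishedDuringInit = finished;
        }

        /** Plays the role of a static error handler called from another thread. */
        static void handleError() { }
    }

    static final class Helper implements Runnable {
        private final CountDownLatch done;

        Helper(CountDownLatch done) { this.done = done; }

        @Override
        public void run() {
            Log.handleError();   // blocks until Log's static initializer has completed
            done.countDown();
        }
    }

    public static void main(String[] args) {
        System.out.println("helper finished during init: " + Log.helperFinishedDuringInit);
    }
}
```

The helper thread shows up as runnable in a Java thread dump even though it is parked inside class initialization, which matches the observation in the report that jstack alone was not enough to see the problem.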
[jira] [Commented] (CASSANDRA-15389) Minimize BTree iterator allocations
[ https://issues.apache.org/jira/browse/CASSANDRA-15389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987244#comment-16987244 ]

Blake Eggleston commented on CASSANDRA-15389:

Just pushed up some changes addressing most of your comments.

*Rows#collectStats:* the overflow checks aren't actually doing anything, since the longs are being shifted/masked to 32 bits. Force of habit when casting longs to ints :). Addressed the other comments.

*SerializationHeader:* fixed. Now using a single rewindable iterator, and added a check to LeafBTreeSearchIterator to check the current position before doing a binary search.

*BTreeRow:* I had a bug in my {{hasComplexDeletion}} re-implementation that made reverse iteration / stop condition seem unnecessary. We actually do need both for hasComplexDeletion to work properly. Otherwise we’d only detect complex deletion if it’s on the final complex column.
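The "check the current position before doing a binary search" idea can be sketched over a plain sorted array. This is an illustration under stated assumptions, not the patch: {{RewindableSearchSketch}} is a hypothetical class, the real {{LeafBTreeSearchIterator}} operates on btree leaves, and the {{binarySearches}} counter exists only to make the effect visible.

```java
import java.util.Arrays;

final class RewindableSearchSketch {
    private final String[] sorted;
    private int next = 0;        // index of the first element not yet passed
    int binarySearches = 0;      // instrumentation for the example only

    RewindableSearchSketch(String[] sorted) { this.sorted = sorted; }

    /** Resets the position so the same instance can be reused for the next row. */
    void rewind() { next = 0; }

    /** Returns the index of key, or -1 if absent; keys must ascend between rewinds. */
    int find(String key) {
        // Fast path: in-order lookups usually ask for exactly the next element.
        if (next < sorted.length && sorted[next].equals(key))
            return next++;

        binarySearches++;
        int idx = Arrays.binarySearch(sorted, next, sorted.length, key);
        if (idx < 0) {
            next = -idx - 1;     // skip everything smaller than the missing key
            return -1;
        }
        next = idx + 1;
        return idx;
    }

    public static void main(String[] args) {
        RewindableSearchSketch it = new RewindableSearchSketch(new String[] {"a", "b", "c", "d"});
        it.find("a"); it.find("b"); it.find("c");
        System.out.println("binary searches after in-order lookups: " + it.binarySearches);
        it.rewind();
        it.find("c");
        System.out.println("binary searches after a skip: " + it.binarySearches);
    }
}
```

Reusing one instance with {{rewind()}} avoids allocating a search iterator per row, and the position check keeps the common in-order case at a single comparison.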
[jira] [Updated] (CASSANDRA-15442) Read repair implicitly increases read timeout value
[ https://issues.apache.org/jira/browse/CASSANDRA-15442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Blake Eggleston updated CASSANDRA-15442:

Reviewers: Blake Eggleston

> Read repair implicitly increases read timeout value
> ---
>
> Key: CASSANDRA-15442
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15442
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Core
> Reporter: Yifan Cai
> Assignee: Yifan Cai
> Priority: Normal
>
> When read repair occurs during a read, internally, it starts several
> _blocking_ operations in sequence. See
> {{org.apache.cassandra.service.StorageProxy#fetchRows}}.
> The timeline of the blocking operations:
> # Regular read, wait for full data/digest read response to complete.
> {{reads[*].awaitResponses();}}
> # Read repair read, wait for full data read response to complete.
> {{reads[*].awaitReadRepair();}}
> # Read repair write, wait for write response to complete.
> {{concatAndBlockOnRepair(results, repairs);}}
> Steps 1 and 2 each wait for the duration of the read timeout, say 5 s.
> Step 3 waits for the duration of the write timeout, say 2 s.
> In the worst case, the actual time taken for a read could accumulate to ~12
> s, if each individual step finishes just under its timeout value.
> A client does not expect a request to take much longer than the timeout
> configured in the database.
> This scenario is especially bad for clients that have set up client-side
> timeout monitoring close to the configured one. The clients think the
> operations timed out and abort, but they are in fact still running on the
> server.
[jira] [Commented] (CASSANDRA-15295) Running into deadlock when do CommitLog initialization
[ https://issues.apache.org/jira/browse/CASSANDRA-15295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987133#comment-16987133 ]

Jordan West commented on CASSANDRA-15295:

LGTM. +1. Thanks for all the revisions along the way [~gzh1992n] [~djoshi]
[jira] [Assigned] (CASSANDRA-15442) Read repair implicitly increases read timeout value
[ https://issues.apache.org/jira/browse/CASSANDRA-15442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yifan Cai reassigned CASSANDRA-15442:

Assignee: Yifan Cai
[jira] [Created] (CASSANDRA-15442) Read repair implicitly increases read timeout value
Yifan Cai created CASSANDRA-15442:

Summary: Read repair implicitly increases read timeout value
Key: CASSANDRA-15442
URL: https://issues.apache.org/jira/browse/CASSANDRA-15442
Project: Cassandra
Issue Type: Bug
Components: Legacy/Core
Reporter: Yifan Cai

When read repair occurs during a read, internally, it starts several _blocking_ operations in sequence. See {{org.apache.cassandra.service.StorageProxy#fetchRows}}.

The timeline of the blocking operations:
# Regular read, wait for full data/digest read response to complete. {{reads[*].awaitResponses();}}
# Read repair read, wait for full data read response to complete. {{reads[*].awaitReadRepair();}}
# Read repair write, wait for write response to complete. {{concatAndBlockOnRepair(results, repairs);}}

Steps 1 and 2 each wait for the duration of the read timeout, say 5 s. Step 3 waits for the duration of the write timeout, say 2 s. In the worst case, the actual time taken for a read could accumulate to ~12 s, if each individual step finishes just under its timeout value.

A client does not expect a request to take much longer than the timeout configured in the database. This scenario is especially bad for clients that have set up client-side timeout monitoring close to the configured one. The clients think the operations timed out and abort, but they are in fact still running on the server.
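The ~12 s figure is simply the sum of the three sequential waits. The sketch below works through that arithmetic with the example values from the report (5 s read timeout, 2 s write timeout). It also shows, purely as an assumption and not as the fix adopted for this ticket, how a single shared deadline would bound the total instead. {{ReadTimeoutBudgetSketch}} and both method names are hypothetical.

```java
import java.util.concurrent.TimeUnit;

final class ReadTimeoutBudgetSketch {
    /** Worst case when each of the three blocking steps gets its own full timeout. */
    static long worstCaseMillis(long readTimeoutMs, long writeTimeoutMs) {
        // step 1 (read) + step 2 (read repair read) + step 3 (read repair write)
        return readTimeoutMs + readTimeoutMs + writeTimeoutMs;
    }

    /** Time a step may still wait if all steps share one deadline (assumed design). */
    static long remainingMillis(long deadlineNanos, long nowNanos) {
        return Math.max(0, TimeUnit.NANOSECONDS.toMillis(deadlineNanos - nowNanos));
    }

    public static void main(String[] args) {
        System.out.println(worstCaseMillis(5000, 2000)); // prints 12000

        long start = 0;
        long deadline = start + TimeUnit.MILLISECONDS.toNanos(5000);
        long afterStep1 = start + TimeUnit.MILLISECONDS.toNanos(4000);
        // With a shared deadline, step 2 may only wait for what step 1 left over.
        System.out.println(remainingMillis(deadline, afterStep1)); // prints 1000
    }
}
```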
[jira] [Updated] (CASSANDRA-15441) Bump generations and document changes to system_distributed and system_traces in 3.0, 3.11
[ https://issues.apache.org/jira/browse/CASSANDRA-15441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-15441:

Status: Ready to Commit (was: Review In Progress)

+1 LGTM

> Bump generations and document changes to system_distributed and system_traces
> in 3.0, 3.11
> ---
>
> Key: CASSANDRA-15441
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15441
> Project: Cassandra
> Issue Type: Bug
> Components: Cluster/Schema
> Reporter: Aleksey Yeschenko
> Assignee: Aleksey Yeschenko
> Priority: Normal
> Fix For: 3.0.x, 3.11.x
>
> We should document all the changes to distributed system keyspaces and assign
> unique generations to them. In 3.0 and 3.11 this is just a documentation
> issue.
[jira] [Updated] (CASSANDRA-15441) Bump generations and document changes to system_distributed and system_traces in 3.0, 3.11
[ https://issues.apache.org/jira/browse/CASSANDRA-15441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-15441:

Reviewers: Sam Tunnicliffe, Sam Tunnicliffe (was: Sam Tunnicliffe)
Status: Review In Progress (was: Patch Available)
[jira] [Commented] (CASSANDRA-15401) nodetool stop help is missing "ANTICOMPACTION"
[ https://issues.apache.org/jira/browse/CASSANDRA-15401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16986737#comment-16986737 ]

Marcus Eriksson commented on CASSANDRA-15401:

I think the ones that can be stopped in 3.11/trunk are: {{COMPACTION, VALIDATION, CLEANUP, SCRUB, UPGRADE_SSTABLES, INDEX_BUILD, TOMBSTONE_COMPACTION, ANTICOMPACTION, VERIFY, VIEW_BUILD, INDEX_SUMMARY, RELOCATE, GARBAGE_COLLECT}}

In addition to my comment above, for the operation to be stoppable the code also needs to check {{CompactionInfo.Holder.isStopRequested()}}, which, for example, the {{..._CACHE_SAVE}} types don't.

> nodetool stop help is missing "ANTICOMPACTION"
> ---
>
> Key: CASSANDRA-15401
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15401
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Ekaterina Dimitrova
> Assignee: Ekaterina Dimitrova
> Priority: Normal
>
> The {{nodetool stop}} command can be used to stop certain activities
> including anti-compaction. While we can run {{nodetool stop ANTICOMPACTION}}
> on a given node, the help menu does not list it.
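The point about {{isStopRequested()}} is that a stop request is cooperative: setting the flag does nothing unless the running operation polls it. A minimal sketch follows, assuming a simplified stand-in {{Holder}} class. It is not Cassandra's {{CompactionInfo.Holder}}, and {{StoppableOperationSketch}} and its methods are hypothetical.

```java
final class StoppableOperationSketch {
    /** Simplified stand-in for a holder that carries a stop flag. */
    static class Holder {
        private volatile boolean stopRequested = false;

        void stop() { stopRequested = true; }

        boolean isStopRequested() { return stopRequested; }
    }

    /** Polls the flag before each unit of work; returns how many units completed. */
    static int runStoppable(Holder holder, int total, int stopAfter) {
        int done = 0;
        for (int i = 0; i < total; i++) {
            if (holder.isStopRequested())
                break;
            done++;
            if (done == stopAfter)
                holder.stop();   // simulates a stop request arriving mid-run
        }
        return done;
    }

    /** Never polls the flag, so a stop request has no effect on it. */
    static int runUnstoppable(Holder holder, int total, int stopAfter) {
        int done = 0;
        for (int i = 0; i < total; i++) {
            done++;
            if (done == stopAfter)
                holder.stop();
        }
        return done;
    }

    public static void main(String[] args) {
        System.out.println(runStoppable(new Holder(), 10, 3));   // prints 3
        System.out.println(runUnstoppable(new Holder(), 10, 3)); // prints 10
    }
}
```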
[jira] [Commented] (CASSANDRA-15350) Add CAS “uncertainty” and “contention" messages that are currently propagated as a WriteTimeoutException.
[ https://issues.apache.org/jira/browse/CASSANDRA-15350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16986687#comment-16986687 ]

Alex Petrov commented on CASSANDRA-15350:

To be honest, I think the fact that the names {{WriteStalled}} and {{WriteTimeout}} are so close to each other might confuse the user. We need to reflect the fact that it's a Paxos round failure or that the reason is that _we do not know_ whether the value is going to go through or not.

bq. ErrorMessage is not involved in internode messaging

Err; of course. Sorry about that: I was thinking in a different context and phrased it wrong. Also, both messages occur on the coordinator, so internode doesn't apply. What I should have written is that this logic is used by the {{SimpleClient}}.

bq. the scenario was carefully crafted to be deterministic and aims to produce the same kind of contention.

This is precisely what I'm concerned about: it is carefully crafted and might be difficult to maintain. Everyone who modifies the code in the future will have to re-craft the test as well. I think we can relatively easily reproduce it with a fuzz test that introduces contention. I think introducing latency/partition in the test is a reasonable thing, I'd just make it random rather than handcrafted. This will also help us to see how it all behaves when contention is higher.

bq. Do you mean rename the method to activate/deactivate,

Right, I'd just call them {{activate}} and {{deactivate}}.

We also need at least a version of the {{SimpleClient}} to be tested with the changes. Ideally, we need an accompanying patch for the java-driver, since it changes the native protocol.

> Add CAS “uncertainty” and “contention" messages that are currently propagated
> as a WriteTimeoutException.
> ---
>
> Key: CASSANDRA-15350
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15350
> Project: Cassandra
> Issue Type: Improvement
> Components: Feature/Lightweight Transactions
> Reporter: Alex Petrov
> Assignee: Yifan Cai
> Priority: Normal
> Labels: protocolv5, pull-request-available
> Attachments: Utf8StringEncodeBench.java
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Right now, CAS uncertainty introduced in
> https://issues.apache.org/jira/browse/CASSANDRA-6013 is propagating as
> WriteTimeout. One of the conditions under which it manifests is when there’s
> at least one acceptor that has accepted the value, which means that this value
> _may_ still get accepted during a later round, despite the proposer failure.
> A similar problem happens with CAS contention, which is also indistinguishable
> from the “regular” timeout, even though it is visible in metrics correctly.