[jira] [Commented] (CASSANDRA-8547) Make RangeTombstone.Tracker.isDeleted() faster
[ https://issues.apache.org/jira/browse/CASSANDRA-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567278#comment-14567278 ]

Sylvain Lebresne commented on CASSANDRA-8547:

Maybe Tyler meant CASSANDRA-9486? And I do think the actual problem is the one pointed out in CASSANDRA-9486. The idea behind {{RangeTombstone.Tracker}} is that it only tracks tombstones that are actually useful, i.e. those that still cover something. As such, the linear scan of {{isDeleted}} shouldn't be a problem: it shouldn't scan anything uselessly. However, and that's what CASSANDRA-9486 is about, the tracker is not always used properly, and there are cases where its {{update}} method is not called, resulting in an unexpectedly high cost in {{isDeleted}}.

In practice, I'm sure the attached patch does improve things, but that's not really the right fix. And as the right fix is being discussed on CASSANDRA-9486 already, I'm going to mark this as a duplicate.

Make RangeTombstone.Tracker.isDeleted() faster

Key: CASSANDRA-8547
URL: https://issues.apache.org/jira/browse/CASSANDRA-8547
Project: Cassandra
Issue Type: Improvement
Components: Core
Environment: 2.0.11
Reporter: Dominic Letz
Assignee: Dominic Letz
Labels: tombstone
Fix For: 2.1.x
Attachments: Selection_044.png, cassandra-2.0.11-8547.txt, cassandra-2.1-8547.txt, rangetombstone.tracker.txt

During compaction and repairs with many tombstones an exorbitant amount of time is spent in RangeTombstone.Tracker.isDeleted(). The amount of time spent there can be so big that compactions and repairs look stalled, with the estimated time remaining frozen at the same value for days.

Using visualvm I've been sample profiling the code during execution, and found this both in compaction and during repairs (point-in-time backtraces attached).

Looking at the code, the problem is obviously the linear scanning:
{code}
public boolean isDeleted(Column column)
{
    for (RangeTombstone tombstone : ranges)
    {
        if (comparator.compare(column.name(), tombstone.min) >= 0
            && comparator.compare(column.name(), tombstone.max) <= 0
            && tombstone.maxTimestamp() >= column.timestamp())
        {
            return true;
        }
    }
    return false;
}
{code}
I would like to propose changing this to use a sorted list (e.g. RangeTombstoneList) instead.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
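To make the sorted-list proposal concrete, here is a minimal sketch of a lookup that avoids the linear scan. It is not the attached patch and not Cassandra's RangeTombstoneList: the class name SortedTombstoneSketch, its methods, and the use of String column names with natural ordering are all hypothetical, and it assumes the tracked ranges never overlap. Under that assumption, the only range that can cover a name is the one with the greatest start not after it, which a TreeMap finds in O(log n).

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch only: not Cassandra's RangeTombstone.Tracker or RangeTombstoneList.
// Assumes non-overlapping ranges; String names stand in for column names and a comparator.
final class SortedTombstoneSketch {
    // End of a deleted range (inclusive) and the timestamp of the deletion.
    private static final class Range {
        final String max;
        final long timestamp;

        Range(String max, long timestamp) {
            this.max = max;
            this.timestamp = timestamp;
        }
    }

    // Ranges keyed by their inclusive start, kept sorted by the TreeMap.
    private final TreeMap<String, Range> byMin = new TreeMap<>();

    void add(String min, String max, long timestamp) {
        byMin.put(min, new Range(max, timestamp));
    }

    // Same three conditions as the linear version, checked against a single candidate.
    boolean isDeleted(String name, long columnTimestamp) {
        // Greatest start <= name; with non-overlapping ranges no other range can cover name.
        Map.Entry<String, Range> candidate = byMin.floorEntry(name);
        if (candidate == null) {
            return false;
        }
        Range range = candidate.getValue();
        return name.compareTo(range.max) <= 0 && range.timestamp >= columnTimestamp;
    }
}
```

Overlapping ranges with different timestamps would need extra handling (for example, splitting them into disjoint pieces on insert), which this sketch deliberately leaves out.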
[jira] [Commented] (CASSANDRA-8547) Make RangeTombstone.Tracker.isDeleted() faster
[ https://issues.apache.org/jira/browse/CASSANDRA-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564997#comment-14564997 ]

Tyler Hobbs commented on CASSANDRA-8547:

I believe this is obsoleted by CASSANDRA-6446.
[jira] [Commented] (CASSANDRA-8547) Make RangeTombstone.Tracker.isDeleted() faster
[ https://issues.apache.org/jira/browse/CASSANDRA-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565008#comment-14565008 ]

Oleg Anastasyev commented on CASSANDRA-8547:

Um, not sure. At least for 2.0, we had this problem during repairs after applying 6446.
[jira] [Commented] (CASSANDRA-8547) Make RangeTombstone.Tracker.isDeleted() faster
[ https://issues.apache.org/jira/browse/CASSANDRA-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565143#comment-14565143 ]

Randy Fradin commented on CASSANDRA-8547:

I see this problem on 2.1.5, so I don't think this is resolved. A validation compaction is completely stuck; a thread dump shows it inside this loop, and top is showing 1 CPU core 100% utilized.
[jira] [Commented] (CASSANDRA-8547) Make RangeTombstone.Tracker.isDeleted() faster
[ https://issues.apache.org/jira/browse/CASSANDRA-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14265891#comment-14265891 ]

Benedict commented on CASSANDRA-8547:

This looks like something [~slebresne] should take a look at.