[jira] [Commented] (CASSANDRA-6109) Consider coldness in STCS compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13808337#comment-13808337 ] Tyler Hobbs commented on CASSANDRA-6109: By the way, it might be a good idea to push this to 2.1 or to disable it by default in 2.0.x. It's a bit of a major change for a bugfix release this far into 2.0.x. Consider coldness in STCS compaction Key: CASSANDRA-6109 URL: https://issues.apache.org/jira/browse/CASSANDRA-6109 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 2.0.3 Attachments: 6109-v1.patch, 6109-v2.patch, 6109-v3.patch I see two options: # Don't compact cold sstables at all # Compact cold sstables only if there is nothing more important to compact The latter is better if you have cold data that may become hot again... but it's confusing if you have a workload such that you can't keep up with *all* compaction, but you can keep up with hot sstable. (Compaction backlog stat becomes useless since we fall increasingly behind.) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-6109) Consider coldness in STCS compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805387#comment-13805387 ] Tyler Hobbs commented on CASSANDRA-6109: bq. When they make up more than X% do we stop discriminating or merge them only with other cold sstables? I was thinking we would stop discriminating. The logic would basically be this: {noformat} total_reads = sum(sstable.reads_per_sec for sstable in sstables) total_cold_reads = 0 cold_sstables = set() for sstable in sorted(sstables, key=lambda sstable: sstable.reads_per_key_per_sec): if (sstable.reads_per_sec + total_cold_reads) / total_reads configurable_threshold: cold_sstables.add(sstable) total_cold_reads += sstable.reads_per_sec else: break getBuckets(sstable for sstable in sstables if sstable not in cold_sstables) {noformat} Consider coldness in STCS compaction Key: CASSANDRA-6109 URL: https://issues.apache.org/jira/browse/CASSANDRA-6109 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 2.0.2 Attachments: 6109-v1.patch, 6109-v2.patch I see two options: # Don't compact cold sstables at all # Compact cold sstables only if there is nothing more important to compact The latter is better if you have cold data that may become hot again... but it's confusing if you have a workload such that you can't keep up with *all* compaction, but you can keep up with hot sstable. (Compaction backlog stat becomes useless since we fall increasingly behind.) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-6109) Consider coldness in STCS compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805394#comment-13805394 ] Jonathan Ellis commented on CASSANDRA-6109: --- Makes sense. Consider coldness in STCS compaction Key: CASSANDRA-6109 URL: https://issues.apache.org/jira/browse/CASSANDRA-6109 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 2.0.2 Attachments: 6109-v1.patch, 6109-v2.patch I see two options: # Don't compact cold sstables at all # Compact cold sstables only if there is nothing more important to compact The latter is better if you have cold data that may become hot again... but it's confusing if you have a workload such that you can't keep up with *all* compaction, but you can keep up with hot sstable. (Compaction backlog stat becomes useless since we fall increasingly behind.) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-6109) Consider coldness in STCS compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13804867#comment-13804867 ] Jonathan Ellis commented on CASSANDRA-6109: --- That sounds pretty straightforward. When they make up more than X% do we stop discriminating or merge them only with other cold sstables? Consider coldness in STCS compaction Key: CASSANDRA-6109 URL: https://issues.apache.org/jira/browse/CASSANDRA-6109 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 2.0.2 Attachments: 6109-v1.patch, 6109-v2.patch I see two options: # Don't compact cold sstables at all # Compact cold sstables only if there is nothing more important to compact The latter is better if you have cold data that may become hot again... but it's confusing if you have a workload such that you can't keep up with *all* compaction, but you can keep up with hot sstable. (Compaction backlog stat becomes useless since we fall increasingly behind.) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-6109) Consider coldness in STCS compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13803045#comment-13803045 ] Tyler Hobbs commented on CASSANDRA-6109: On second thought, filtering *buckets* (my misinterpretation of your statement) would be very conservative and would generally have little effect since there is no control over what sstables form buckets. Here's my latest proposal: ignore the coldest SSTables until they (collectively) make up more than X% of the total reads/key/sec. This is pretty easy to tune and it handles all of the corner cases we've considered. A conservative default might be 1 to 5%. Consider coldness in STCS compaction Key: CASSANDRA-6109 URL: https://issues.apache.org/jira/browse/CASSANDRA-6109 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 2.0.2 Attachments: 6109-v1.patch, 6109-v2.patch I see two options: # Don't compact cold sstables at all # Compact cold sstables only if there is nothing more important to compact The latter is better if you have cold data that may become hot again... but it's confusing if you have a workload such that you can't keep up with *all* compaction, but you can keep up with hot sstable. (Compaction backlog stat becomes useless since we fall increasingly behind.) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-6109) Consider coldness in STCS compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13802167#comment-13802167 ] Tyler Hobbs commented on CASSANDRA-6109: bq. What if we just added a bucket filter that said, SSTables representing less than X% of the reads will not be bucketed? To be clear, you're suggesting ignoring *buckets* whose reads make up less than X% of the total reads/sec for the table, correct? bq. Straightforward to tune and I can't think of any really pathological cases, other than where size-tiering just doesn't put hot overlapping sstables in the same bucket. This is definitely easier to tune. One case I'm concerned about is where the max compaction threshold prevents a bucket from ever being above X% of the total reads/sec, especially with new, small SSTables. If we compare reads *per key* per second instead of just reads/sec, that case goes away. Additionally, while comparing reads/sec would focus compactions on the largest SSTables, comparing reads per key per second would focus on compacting the hottest SSTables, which is an improvement. With that change, I really like this strategy. As far as the default threshold goes, I'll suggest a conservative 2 to 5%. Here's my thought process: there are usually roughly 5 tiers, so each tier should get about 20% of the total reads per key per second if all SSTables were equally hot. Cold sstables should have below 10 to 25% of the normal read rates, giving a 2 to 5% threshold. Consider coldness in STCS compaction Key: CASSANDRA-6109 URL: https://issues.apache.org/jira/browse/CASSANDRA-6109 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 2.0.2 Attachments: 6109-v1.patch, 6109-v2.patch I see two options: # Don't compact cold sstables at all # Compact cold sstables only if there is nothing more important to compact The latter is better if you have cold data that may become hot again... but it's confusing if you have a workload such that you can't keep up with *all* compaction, but you can keep up with hot sstable. (Compaction backlog stat becomes useless since we fall increasingly behind.) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-6109) Consider coldness in STCS compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13802176#comment-13802176 ] Jonathan Ellis commented on CASSANDRA-6109: --- bq. you're suggesting ignoring buckets? No, I'm suggesting instead of {{getBuckets(sstables)}}, {{getBuckets(sstable for sstable in sstables if recents_reads_from(sstable) X)}} Consider coldness in STCS compaction Key: CASSANDRA-6109 URL: https://issues.apache.org/jira/browse/CASSANDRA-6109 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 2.0.2 Attachments: 6109-v1.patch, 6109-v2.patch I see two options: # Don't compact cold sstables at all # Compact cold sstables only if there is nothing more important to compact The latter is better if you have cold data that may become hot again... but it's confusing if you have a workload such that you can't keep up with *all* compaction, but you can keep up with hot sstable. (Compaction backlog stat becomes useless since we fall increasingly behind.) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-6109) Consider coldness in STCS compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13802206#comment-13802206 ] Tyler Hobbs commented on CASSANDRA-6109: No, I'm suggesting instead of getBuckets(sstables), getBuckets(sstable for sstable in sstables if recents_reads_from(sstable) X) Ah, well that scheme has some problematic cases: * Many cold sstables that collectively make up a large percentage of reads in aggregate may be ignored (like your 10, 1, 1, 1... case above) * It's possible to have no sstables that cross the threshold when they are equally hot Consider coldness in STCS compaction Key: CASSANDRA-6109 URL: https://issues.apache.org/jira/browse/CASSANDRA-6109 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 2.0.2 Attachments: 6109-v1.patch, 6109-v2.patch I see two options: # Don't compact cold sstables at all # Compact cold sstables only if there is nothing more important to compact The latter is better if you have cold data that may become hot again... but it's confusing if you have a workload such that you can't keep up with *all* compaction, but you can keep up with hot sstable. (Compaction backlog stat becomes useless since we fall increasingly behind.) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-6109) Consider coldness in STCS compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13800057#comment-13800057 ] Jonathan Ellis commented on CASSANDRA-6109: --- What if we just added a bucket filter that said, SSTables representing less than X% of the reads will not be bucketed? Straightforward to tune and I can't think of any really pathological cases, other than where size-tiering just doesn't put hot overlapping sstables in the same bucket. (Which I think is out of scope to solve here -- we need cardinality estimation to fix that.) Consider coldness in STCS compaction Key: CASSANDRA-6109 URL: https://issues.apache.org/jira/browse/CASSANDRA-6109 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 2.0.2 Attachments: 6109-v1.patch, 6109-v2.patch I see two options: # Don't compact cold sstables at all # Compact cold sstables only if there is nothing more important to compact The latter is better if you have cold data that may become hot again... but it's confusing if you have a workload such that you can't keep up with *all* compaction, but you can keep up with hot sstable. (Compaction backlog stat becomes useless since we fall increasingly behind.) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-6109) Consider coldness in STCS compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793127#comment-13793127 ] Tyler Hobbs commented on CASSANDRA-6109: I've spent some more time thinking about this and it seems like we either need a more sophisticated approach in order to handle the various corner cases or we need to disable this feature by default. If we disable the feature by default, then using a hotness percentile or something similar might be okay. If we want to enable the feature by default, I've got a couple of more sophisticated approaches: The first approach is fairly simple and uses two parameters: * SSTables which receive less than X% of the reads/sec per key of the hottest sstable (for the whole CF) will be considered cold. * If the cold sstables make up more than Y% of the total reads/sec, don't consider the warmest of the cold sstables cold. (In other words, go through the cold bucket and remove the warmest sstables until the cold bucket makes up less than %Y of the total reads/sec.) This solves one problem of basing coldness on the mean rate, which is that if you have almost all cold sstables, the mean will be very low. Comparing against the max deals well with this. The second parameter acts as a hedge for the case you brought up where a large number of cold sstables can collectively account for a high percentage of the total reads. The second approach is less hacky but more difficult to explain or tune; it's an bucket optimization measure that covers these concerns. Ideally, we would optimize two things: * Average sstable hotness of the bucket * The percentage of the total CF reads that are included in the bucket These two items are somewhat in opposition. Optimizing only for the first measure would mean just compacting the two hottest sstables. Optimizing only for the second would mean compacting all sstables. We can combine the two measures with different weightings to get a pretty good bucket optimization measure. I've played around with some different measures in python and have a script that makes approximately the same bucket choices I would. However, as I mentioned, this would be pretty hard for operators to understand and tune intelligently, somewhat like phi_convict_threshold. If you're still open to that, I can attach my script with some example runs. Consider coldness in STCS compaction Key: CASSANDRA-6109 URL: https://issues.apache.org/jira/browse/CASSANDRA-6109 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 2.0.2 Attachments: 6109-v1.patch, 6109-v2.patch I see two options: # Don't compact cold sstables at all # Compact cold sstables only if there is nothing more important to compact The latter is better if you have cold data that may become hot again... but it's confusing if you have a workload such that you can't keep up with *all* compaction, but you can keep up with hot sstable. (Compaction backlog stat becomes useless since we fall increasingly behind.) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-6109) Consider coldness in STCS compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790525#comment-13790525 ] Tyler Hobbs commented on CASSANDRA-6109: +1 on your changes Consider coldness in STCS compaction Key: CASSANDRA-6109 URL: https://issues.apache.org/jira/browse/CASSANDRA-6109 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 2.0.2 Attachments: 6109-v1.patch, 6109-v2.patch I see two options: # Don't compact cold sstables at all # Compact cold sstables only if there is nothing more important to compact The latter is better if you have cold data that may become hot again... but it's confusing if you have a workload such that you can't keep up with *all* compaction, but you can keep up with hot sstable. (Compaction backlog stat becomes useless since we fall increasingly behind.) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-6109) Consider coldness in STCS compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790610#comment-13790610 ] Jonathan Ellis commented on CASSANDRA-6109: --- I'm thinking about how I tune this as an operator. If we're going by coldness-relative-to-mean, I'm not really sure where to set that to achieve my read performance goals other than trial and error. Suppose for instance that I have 11 sstables, one of which has 10M reads recently and 10 of which have 1M reads. If I set my threshold to 25% then nothing gets compacted which is probably not what we want, since the 10 cold sstables collectively represent 50% of the read activity. What if instead we # analyze hotness globally (per-CF) rather than per-bucket, and # configure the threshold based on hotness percentile (compact me if I am hotter than N% of my peers) Consider coldness in STCS compaction Key: CASSANDRA-6109 URL: https://issues.apache.org/jira/browse/CASSANDRA-6109 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 2.0.2 Attachments: 6109-v1.patch, 6109-v2.patch I see two options: # Don't compact cold sstables at all # Compact cold sstables only if there is nothing more important to compact The latter is better if you have cold data that may become hot again... but it's confusing if you have a workload such that you can't keep up with *all* compaction, but you can keep up with hot sstable. (Compaction backlog stat becomes useless since we fall increasingly behind.) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-6109) Consider coldness in STCS compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790622#comment-13790622 ] Tyler Hobbs commented on CASSANDRA-6109: bq. Suppose for instance that I have 11 sstables, one of which has 10M reads recently and 10 of which have 1M reads. If I set my threshold to 25% then nothing gets compacted which is probably not what we want, since the 10 cold sstables collectively represent 50% of the read activity. Actually, in this case none of the sstables would be considered cold (assuming they all have similar key estimates). The mean reads would be 1.8M, and 0.25 * 1.8M = 0.45M. I agree that it might be difficult to tune intelligently, though. bq. analyze hotness globally (per-CF) rather than per-bucket That seems reasonable to me. bq. configure the threshold based on hotness percentile (compact me if I am hotter than N% of my peers) This has the problem of always ignoring the coldest sstable even when there is little variation between them. So if you have four SSTables with 1M, 1M, 1M, and 0.999M reads, the last will be considered cold and never compacted. Consider coldness in STCS compaction Key: CASSANDRA-6109 URL: https://issues.apache.org/jira/browse/CASSANDRA-6109 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 2.0.2 Attachments: 6109-v1.patch, 6109-v2.patch I see two options: # Don't compact cold sstables at all # Compact cold sstables only if there is nothing more important to compact The latter is better if you have cold data that may become hot again... but it's confusing if you have a workload such that you can't keep up with *all* compaction, but you can keep up with hot sstable. (Compaction backlog stat becomes useless since we fall increasingly behind.) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-6109) Consider coldness in STCS compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789939#comment-13789939 ] Jonathan Ellis commented on CASSANDRA-6109: --- Sorry, I had to bikeshed just a little more. Looks like it's cleaner to restrict prepBuckets (renamed) to restricting by max sstables + coldness, and let mostInteresting take care of dropping too-small buckets. This means we'll do a small amount of unnecessary sorting-by-coldness but this is negligible compared to the actual compaction we're setting up. Pushed to https://github.com/jbellis/cassandra/commits/6109 Consider coldness in STCS compaction Key: CASSANDRA-6109 URL: https://issues.apache.org/jira/browse/CASSANDRA-6109 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 2.0.2 Attachments: 6109-v1.patch, 6109-v2.patch I see two options: # Don't compact cold sstables at all # Compact cold sstables only if there is nothing more important to compact The latter is better if you have cold data that may become hot again... but it's confusing if you have a workload such that you can't keep up with *all* compaction, but you can keep up with hot sstable. (Compaction backlog stat becomes useless since we fall increasingly behind.) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-6109) Consider coldness in STCS compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786484#comment-13786484 ] Jonathan Ellis commented on CASSANDRA-6109: --- Ah, right. (Since earlier tasks in the queue, or subsequent flushes, may cause us to re-evaluate the sstables we'd like to compact.) Feels to me like we're feature creeping a bit. I'd say let's focus this one on what to do within a CF, and we can open another to prioritize across CFs instead of FIFO. Consider coldness in STCS compaction Key: CASSANDRA-6109 URL: https://issues.apache.org/jira/browse/CASSANDRA-6109 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 2.0.2 I see two options: # Don't compact cold sstables at all # Compact cold sstables only if there is nothing more important to compact The latter is better if you have cold data that may become hot again... but it's confusing if you have a workload such that you can't keep up with *all* compaction, but you can keep up with hot sstable. (Compaction backlog stat becomes useless since we fall increasingly behind.) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-6109) Consider coldness in STCS compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786947#comment-13786947 ] Jonathan Ellis commented on CASSANDRA-6109: --- Looks reasonable. - Can you make the actual drop-from-bucket a strategy option? - Can you add a test that exercises prepBuckets to STCSTest, a la the getBuckets test? Consider coldness in STCS compaction Key: CASSANDRA-6109 URL: https://issues.apache.org/jira/browse/CASSANDRA-6109 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 2.0.2 Attachments: 6109-v1.patch I see two options: # Don't compact cold sstables at all # Compact cold sstables only if there is nothing more important to compact The latter is better if you have cold data that may become hot again... but it's confusing if you have a workload such that you can't keep up with *all* compaction, but you can keep up with hot sstable. (Compaction backlog stat becomes useless since we fall increasingly behind.) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-6109) Consider coldness in STCS compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785422#comment-13785422 ] Tyler Hobbs commented on CASSANDRA-6109: There's a problem with prioritizing the compaction manager queue: for normal compactions, we enqueue a task that will actually pick the sstables to compact at the last moment, right before the task is run. I think we have three options: # Don't try to prioritize the compaction manager queue # Pick the sstables upfront (maybe only for STCS and not LCS? This behavior was added for CASSANDRA-4310, which is primarily concerned with LCS) and potentially compact a less-than-optimal set of sstables # Prioritize the task when it's submitted by picking an initial bucket of sstables; finalize the bucket, adding sstables if necessary, just before the task is executed I would lean towards #3, although it's the most complex. I just wanted to hear your thoughts before writing that up. Consider coldness in STCS compaction Key: CASSANDRA-6109 URL: https://issues.apache.org/jira/browse/CASSANDRA-6109 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 2.0.2 I see two options: # Don't compact cold sstables at all # Compact cold sstables only if there is nothing more important to compact The latter is better if you have cold data that may become hot again... but it's confusing if you have a workload such that you can't keep up with *all* compaction, but you can keep up with hot sstable. (Compaction backlog stat becomes useless since we fall increasingly behind.) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-6109) Consider coldness in STCS compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785445#comment-13785445 ] Jonathan Ellis commented on CASSANDRA-6109: --- We're mostly talking about prioritizing across different CFs, right? What if we just made the compaction manager queue a priority queue instead of FIFO? Consider coldness in STCS compaction Key: CASSANDRA-6109 URL: https://issues.apache.org/jira/browse/CASSANDRA-6109 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 2.0.2 I see two options: # Don't compact cold sstables at all # Compact cold sstables only if there is nothing more important to compact The latter is better if you have cold data that may become hot again... but it's confusing if you have a workload such that you can't keep up with *all* compaction, but you can keep up with hot sstable. (Compaction backlog stat becomes useless since we fall increasingly behind.) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-6109) Consider coldness in STCS compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785448#comment-13785448 ] Tyler Hobbs commented on CASSANDRA-6109: bq. We're mostly talking about prioritizing across different CFs, right? Correct. bq. What if we just made the compaction manager queue a priority queue instead of FIFO? Yeah, that's what I'm trying to do, it's just that with the current behavior, we can't determine the priority when inserting the tasks into the queue because the sstables aren't picked until tasks are *removed* from the queue. Consider coldness in STCS compaction Key: CASSANDRA-6109 URL: https://issues.apache.org/jira/browse/CASSANDRA-6109 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 2.0.2 I see two options: # Don't compact cold sstables at all # Compact cold sstables only if there is nothing more important to compact The latter is better if you have cold data that may become hot again... but it's confusing if you have a workload such that you can't keep up with *all* compaction, but you can keep up with hot sstable. (Compaction backlog stat becomes useless since we fall increasingly behind.) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-6109) Consider coldness in STCS compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784245#comment-13784245 ] Jonathan Ellis commented on CASSANDRA-6109: --- I guess whether hotness or overlap is a more important criterion depends on your goal: # prioritizing by hotness helps speed reads up more, especially when you have a lot of cold data sitting around # prioritizing by overlap ratio reduces disk space and helps throw away obsolete cells faster I was hoping to tackle #1 here, but maybe that needs a separate strategy a la CASSANDRA-5560. For #2, CASSANDRA-5906 adds a HyperLogLog component that does a fantastic job of letting us estimate overlap ratios. Consider coldness in STCS compaction Key: CASSANDRA-6109 URL: https://issues.apache.org/jira/browse/CASSANDRA-6109 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 2.0.2 I see two options: # Don't compact cold sstables at all # Compact cold sstables only if there is nothing more important to compact The latter is better if you have cold data that may become hot again... but it's confusing if you have a workload such that you can't keep up with *all* compaction, but you can keep up with hot sstable. (Compaction backlog stat becomes useless since we fall increasingly behind.) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-6109) Consider coldness in STCS compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784300#comment-13784300 ] Tyler Hobbs commented on CASSANDRA-6109: I think I have some clearer ideas about how to do this now. We should be able to combine hotness and overlap concerns at the different levels. At level (1), avoid compacting comparatively cold data by dropping sstables from buckets when their hotness is less than, say, 25% of the bucket average (this avoids the low-variance problem of using the stddev). If the bucket falls below the min compaction threshold, ignore it (to make sure we're compacting enough sstables at once). At level (2), submit the hottest bucket to the executor for compaction. The average number of sstables hit per-read is actually a decent measure for prioritizing compactions at the executor level. At level (3), we can combine that with the bucket hotness to get a rough idea of how many individual sstable reads per second we could save by compacting a given bucket (hotness * avg_sstables_per_read). Prioritize compaction tasks in the queue based on this measure. That should give us a nice balance of not compacting cold data and prioritizing compaction of the most read and most fragmented sstables. Consider coldness in STCS compaction Key: CASSANDRA-6109 URL: https://issues.apache.org/jira/browse/CASSANDRA-6109 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 2.0.2 I see two options: # Don't compact cold sstables at all # Compact cold sstables only if there is nothing more important to compact The latter is better if you have cold data that may become hot again... but it's confusing if you have a workload such that you can't keep up with *all* compaction, but you can keep up with hot sstable. (Compaction backlog stat becomes useless since we fall increasingly behind.) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-6109) Consider coldness in STCS compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784323#comment-13784323 ] Jonathan Ellis commented on CASSANDRA-6109: --- SGTM. Consider coldness in STCS compaction Key: CASSANDRA-6109 URL: https://issues.apache.org/jira/browse/CASSANDRA-6109 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 2.0.2 I see two options: # Don't compact cold sstables at all # Compact cold sstables only if there is nothing more important to compact The latter is better if you have cold data that may become hot again... but it's confusing if you have a workload such that you can't keep up with *all* compaction, but you can keep up with hot sstable. (Compaction backlog stat becomes useless since we fall increasingly behind.) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-6109) Consider coldness in STCS compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783326#comment-13783326 ] Tyler Hobbs commented on CASSANDRA-6109: bq. The latter is better if you have cold data that may become hot again... but it's confusing if you have a workload such that you can't keep up with all compaction, but you can keep up with hot sstable. (Compaction backlog stat becomes useless since we fall increasingly behind.) The pending compactions stat is already pretty wonky, so I'm not sure we should prioritize keeping that sane. Option 1 (don't compact cold sstables) seems dangerous as a first step compared to option 2, especially because it's hard to decide what is cold. Prioritizing compaction of hotter sstables seems like the better first step. When comparing hotness of sstables, I think a good measure is {{avg_reads_per_sec / number_of_keys}} rather than just {{avg_reads_per_sec}} so that large sstables aren't over-weighted. When I mention the hotness of a bucket of sstables below, I'm talking about the sum of the hotness measure across the individual sstables. For prioritizing compaction of hotter sstables, it seems like there are a few levels this can operate at: # Picking sstable members for compaction buckets # Picking the most interesting bucket to submit to the compaction executor (currently the smallest sstables are considered the most interesting) # At the compaction executor level, prioritizing tasks in the queue (the queue is not currently prioritized) (1) seems like the most difficult point to make good decisions at. I can imagine a scheme like dropping members that are below {{2 * stdev}} of the mean hotness for the bucket working decently, but some of the efficiency of compacting many sstables at once is lost, and some of the drops would be poor when there is little variance among the sstables. (2) would probably work well by itself, although, as discussed below, sstable overlap is a better measure than hotness for this. (3) requires (2) to be somewhat fair. Each table submits its hottest buckets for compaction, and the executor prioritizes the hottest buckets in the queue (regardless of which table they came from). There is a potential for starvation among colder tables when compaction falls behind, but that may be mitigated by a few things: * If the compaction of the hotter sstables is very effective at merging rows, the hotness of future buckets for that table should be lower. Since the hotness of a bucket is the sum of its members, if four totally overlapping sstables are merged into one sstable, the hotness of the new sstable should be 1/4 of the hotness of the previous bucket. I'll point out that tracking how much overlap there is among sstables would be a much better measure than hotness for picking which compactions to prioritize; in the worst case here (no overlap), the hotness of the newly compacted sstable could be the same as the bucket it came from. * If we were willing to discard cold items in the queue when hotter items came in and the queue was full, colder tables would eventually submit new tasks with more sstables in them (thus having greater hotness). While I'm thinking about it, do we have any tickets or features in place to track sstable overlap (beyond average number of sstables hit per read at the table level)? Consider coldness in STCS compaction Key: CASSANDRA-6109 URL: https://issues.apache.org/jira/browse/CASSANDRA-6109 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 2.0.2 I see two options: # Don't compact cold sstables at all # Compact cold sstables only if there is nothing more important to compact The latter is better if you have cold data that may become hot again... but it's confusing if you have a workload such that you can't keep up with *all* compaction, but you can keep up with hot sstable. (Compaction backlog stat becomes useless since we fall increasingly behind.) -- This message was sent by Atlassian JIRA (v6.1#6144)