[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14268393#comment-14268393 ]

Jonathan Ellis commented on CASSANDRA-7019:
---

Did you mark 7272 as a duplicate by mistake instead of this one?

Major tombstone compaction
--

Key: CASSANDRA-7019
URL: https://issues.apache.org/jira/browse/CASSANDRA-7019
Project: Cassandra
Issue Type: Improvement
Reporter: Marcus Eriksson
Assignee: Marcus Eriksson
Labels: compaction
Fix For: 3.0

It should be possible to do a major tombstone compaction by including all sstables, but writing them out 1:1, meaning that if you have 10 sstables before, you will have 10 sstables after the compaction with the same data, minus all the expired tombstones. We could do this in two ways:
# a nodetool command that includes _all_ sstables
# once we detect that an sstable has more than x% (20%?) expired tombstones, we start one of these compactions, and include all overlapping sstables that contain older data.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
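The trigger in option 2 of the description can be sketched roughly as follows. This is an illustrative sketch only, not Cassandra's actual API; the function name and parameters are hypothetical, and the 20% threshold is the "x% (20%?)" value suggested in the description:

```python
# Hypothetical sketch of the option-2 trigger from the description:
# start a tombstone compaction once an sstable's expired-tombstone
# ratio crosses a threshold.
TOMBSTONE_RATIO_THRESHOLD = 0.20  # the "x% (20%?)" from the description


def should_tombstone_compact(expired_tombstones, total_cells,
                             threshold=TOMBSTONE_RATIO_THRESHOLD):
    """Return True when the fraction of expired tombstones in an
    sstable strictly exceeds the threshold."""
    if total_cells == 0:
        return False
    return expired_tombstones / total_cells > threshold
```

When the check fires, the description says the compaction would also pull in all overlapping sstables that contain older data, so the tombstones can actually be dropped.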
[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249979#comment-14249979 ]

T Jake Luciani commented on CASSANDRA-7019:
---

Just a note that this should also work on repaired sstables. As mentioned in CASSANDRA-7272, we repair the entire partition, so we will end up with N copies of a partition in the repaired sstables.
[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235992#comment-14235992 ]

Jonathan Ellis commented on CASSANDRA-7019:
---

That probably fits how compaction thinks of its job better than trying to do it 1:1. +1 from me.
[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14231398#comment-14231398 ]

Marcus Eriksson commented on CASSANDRA-7019:
---

WDYT [~jbellis], should I finish up the patch as a major compaction for LCS?
[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156144#comment-14156144 ]

Marcus Eriksson commented on CASSANDRA-7019:
---

bq. Just rewrite the existing tables minus tombstones without merging or changing levels, as originally proposed

Yeah, this is probably the way to go; it just hurts to have the compacted partition in hand and throw it away.
[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156277#comment-14156277 ]

Aleksey Yeschenko commented on CASSANDRA-7019:
---

I dislike option 1, for personal reasons. I was really hoping to utilize major tombstone compaction for CASSANDRA-7975. Without that, I don't see a direct solution for LCS counter tables.
[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156278#comment-14156278 ]

Marcus Eriksson commented on CASSANDRA-7019:
---

[~iamaleksey] we could get rid of the shards in the Reducer the same way we would get rid of tombstones (i.e., merge all counter shards and put them in a single sstable). It would perhaps be nice to keep this as a major compaction for LCS in addition to option 1, though.
[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156595#comment-14156595 ]

Jeremiah Jordan commented on CASSANDRA-7019:
---

I think we should definitely keep the major compact LCS option, and change STCS major compaction to this new way. If we also want to have the delete/tombstone-only compaction that just rewrites existing files without expired tombstones and deleted data (kind of like what half the people out there think cleanup currently does), that works too. Though I think that mode is a little harder to implement if we want it to remove tombstones and dead data, as you need to rewrite multiple files at once for that; without doing that you aren't going to be able to remove all the old tombstones, just the ones we were being too conservative about throwing out because we don't do a full read, only a bloom filter check, to see if the tombstone still shadows existing data.
[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155144#comment-14155144 ]

Jonathan Ellis commented on CASSANDRA-7019:
---

bq. The problem with starting in high levels is that it will take a long time before that data gets included in a (minor) compaction.

But you already have that problem, just with 90% of your data instead of 100%. IMO the two options that make the most sense are:
# Just rewrite the existing tables minus tombstones without merging or changing levels, as originally proposed
# Write all the sstables out, then pick a level for them when complete such that all the sstables fit in the level (and they don't overlap with anything flushed + compacted by other threads in the meantime)
[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147823#comment-14147823 ]

Marcus Eriksson commented on CASSANDRA-7019:
---

branch here: https://github.com/krummas/cassandra/commits/marcuse/7019-2

triggered with: nodetool compact -o ks cf

It writes fully compacted partitions (each partition will only be in one single sstable). My first idea was to put the cells back in the corresponding files where they were found (minus tombstones), but it felt wrong to not actually write the compacted partition out when we have it.

LCS:
* creates an 'optimal' leveling: it takes all existing files, compacts them, and starts filling each level from L0 up
** note that (if we have token range 0 - 1000) L1 will get tokens 0-10, L2 11-100 and L3 101-1000. Haven't thought much about whether this is good or bad for future compactions.

STCS:
* calculates an 'optimal' distribution of sstables; currently it makes them 50%, 25%, 12.5% ... of total data size until the smallest sstable would be sub-50MB, then puts all the rest in the last sstable. If anyone has a more optimal sstable distribution, please let me know.
** the sstables will be non-overlapping; it starts writing the biggest sstable first and continues with the rest once 50% of the data is in that one
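The 'optimal' STCS distribution described above (halving bucket sizes until the smallest would drop below 50MB, then dumping the remainder into the last sstable) can be sketched like this. The function name and structure are illustrative, not the actual patch; only the 50%/25%/12.5% scheme and the 50MB floor come from the comment:

```python
def stcs_optimal_sizes(total_bytes, min_sstable_bytes=50 * 1024 * 1024):
    """Split total data into halving buckets: 50%, 25%, 12.5%, ...
    Once the next bucket would fall below the ~50MB floor, put all
    remaining data into the final sstable, per the comment above."""
    sizes = []
    remaining = total_bytes
    while remaining // 2 >= min_sstable_bytes:
        half = remaining // 2
        sizes.append(half)
        remaining -= half
    sizes.append(remaining)  # everything left goes in the last sstable
    return sizes
```

For 1GB of data this yields buckets of roughly 512MB, 256MB, 128MB, 64MB and a final 64MB remainder, all non-overlapping in token order if written largest-first as described.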
[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147831#comment-14147831 ]

Jeremiah Jordan commented on CASSANDRA-7019:
---

Since this is going in 3.0, maybe we should make this the default nodetool compact. I don't know of any case where the STCS "put everything in one file" behavior is really what people want, and for LCS all we used to do is run the compaction task like normal. If we still want a way to kick off compaction for LCS, we could add a new nodetool checkcompaction command or something that just schedules the compaction manager to run (and does that for both STCS and LCS). Doing that is useful when someone changes compaction settings and there are not currently writes happening to the system, so making it an explicit command sounds right to me.
[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147843#comment-14147843 ]

Carl Yeksigian commented on CASSANDRA-7019:
---

For LCS, we might be artificially penalizing early tokens. What if we started at the highest level we are currently storing data in, instead of at L1? It would be a good proxy for the size of the data we are currently storing, and it would avoid unnecessarily recompacting data because we placed it in such a low level. I'm +1 on [~jjordan]'s proposal to change the default to this; I'd rather just add an option to compact to start minor compactions instead of adding a new command.
[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147892#comment-14147892 ]

sankalp kohli commented on CASSANDRA-7019:
---

[~carlyeks] Can you explain your idea about placing sstables? If the application is using up to, say, L4, should we fill L4, then L3, and so on?
[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147903#comment-14147903 ]

Carl Yeksigian commented on CASSANDRA-7019:
---

I was thinking L4, then L5 (as in this patch, currently). Ideally, we would pick the level where all of the sstables would fit, but we don't know how many sstables will end up being produced by the compaction, so this seems like a compromise. This would be similar to the thinking in CASSANDRA-6323.
[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148058#comment-14148058 ]

Marcus Eriksson commented on CASSANDRA-7019:
---

The problem with starting in high levels is that it will take a long time before that data gets included in a (minor) compaction. This is basically a major compaction (like in current STCS).

The alternative to putting low tokens in lower levels is to write all levels at the same time and randomly distribute the tokens over the levels (putting 1% in L1, 10% in L2, 89% in L3), but I can't really see any difference compared to having the low tokens in one sstable; the number of overlapping tokens between a newly flushed file in L0 and L1 should be the same (if tokens are evenly distributed over the flushed sstable).
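The 1% / 10% / 89% split mentioned above follows directly from LCS's size ratio between levels when each level is filled to capacity from the bottom up. A rough illustration, assuming the standard 10x fanout (this is back-of-the-envelope math, not code from the patch):

```python
def lcs_level_fractions(levels, fanout=10):
    """Fraction of total data per level when each level is `fanout`
    times larger than the one below and all levels are full.
    With 3 levels and fanout 10 this gives roughly 0.9%, 9%, 90%,
    i.e. the 1% / 10% / 89% split mentioned in the comment."""
    weights = [fanout ** i for i in range(levels)]  # L1, L2, ..., Ln
    total = sum(weights)
    return [w / total for w in weights]
```

So with three full levels the top level necessarily holds almost all of the data, regardless of which tokens land there.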
[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148186#comment-14148186 ]

Carl Yeksigian commented on CASSANDRA-7019:
---

I have no problem with making it consistent but arbitrary which tokens go into L1/L2; I just thought it would be better to put all of them in the same level, since they'll move there eventually. I think you're right, though: they would end up not being included in minor compactions, so it would continually require a major tombstone compaction.
[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146662#comment-14146662 ]

sankalp kohli commented on CASSANDRA-7019:
---

[~krummas] Thanks for picking this up :). I think we can do other optimizations, like putting all tombstones in the last level so that they can be dropped easily when they are past gc grace. Once we have repair-aware gc grace, it will not be required.
[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146685#comment-14146685 ]

Marcus Eriksson commented on CASSANDRA-7019:
---

[~kohlisankalp] I'll post a proof-of-concept patch for option 1 in the description tomorrow. The idea is basically to run a major compaction, but have the compaction strategy decide on an 'optimal' sstable distribution for the strategy instead of just creating one big sstable: for LCS it simply fills levels from level 1 and up, and for STCS it creates sstables where one has 50% of the data, one 25%, etc., until the sstables get too small. This is mostly for the "oh crap, we have a ton of tombstones and need to get rid of them" case, not for the day-to-day case; we need to figure out something more for that (like your idea, perhaps).
[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061378#comment-14061378 ]

Alexey Plotnik commented on CASSANDRA-7019:
---

That's what we need. We have LCS and a lot of SSTables, and the compaction process always retains a lot of tombstones. We need something like a super-compaction, or tombstone-compaction, or name it as you want: a procedure similar to Cleanup, but for tombstone deletion. I like it.
[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005197#comment-14005197 ]

sankalp kohli commented on CASSANDRA-7019:
---

I also like this idea. If you have IOPS to spare, why not compact across levels and get rid of extra data? I think we should call it multilevel compaction. The number of tombstones is one way to trigger it.