[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1
[ https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106866#comment-15106866 ]

Carl Yeksigian commented on CASSANDRA-7409:

I've been working on this on and off; here is the latest [branch|https://github.com/carlyeks/cassandra/commits/ticket/7409].

I think some of the recent changes will improve the same issues: CASSANDRA-6696 can have more simultaneous compactions (limit is # of disks), and CASSANDRA-10540 will further improve that (limit is # of ranges).

I still think this has merit, but in order to instrument this, I've focused on adding the additional logging support in CASSANDRA-10805, which has been useful in figuring out what exactly is going on with these compactions. I still haven't been able to find the cause of the poor performance in the L0 selection when MOLO = 0.

> Allow multiple overlapping sstables in L1
> -----------------------------------------
>
> Key: CASSANDRA-7409
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7409
> Project: Cassandra
> Issue Type: Improvement
> Components: Compaction
> Reporter: Carl Yeksigian
> Assignee: Carl Yeksigian
> Labels: compaction
> Fix For: 3.x
>
>
> Currently, when a normal L0 compaction takes place (not STCS), we take up to
> MAX_COMPACTING_L0 L0 sstables and all of the overlapping L1 sstables and
> compact them together. If we didn't have to deal with the overlapping L1
> tables, we could compact a higher number of L0 sstables together into a set
> of non-overlapping L1 sstables.
> This could be done by delaying the invariant that L1 has no overlapping
> sstables. Going from L1 to L2, we would be compacting fewer sstables together
> which overlap.
> When reading, we will not have the same one sstable per level (except L0)
> guarantee, but this can be bounded (once we have too many sets of sstables,
> either compact them back into the same level, or compact them up to the next
> level).
> This could be generalized to allow any level to be the maximum for this
> overlapping strategy.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1
[ https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098186#comment-15098186 ]

Sylvain Lebresne commented on CASSANDRA-7409:

Where are we with this? It feels like a shame to let all that code rot.
[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1
[ https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14536739#comment-14536739 ]

Alan Boudreault commented on CASSANDRA-7409:

Tests are done, no new blockers experienced during the runs: https://drive.google.com/drive/u/0/folders/0BwZ_GPM33j6KfktyN29kelQzd3NEYnNhTnpfajE2UDRwTTUtQkxwQVQ4YnpqaEMxSUk4TXM

We do see some bad performance for standard LCS for Like and Temperature scenarios. I will compare them with 2.1 to ensure it's not a new issue.
[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1
[ https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14530307#comment-14530307 ]

Marcus Eriksson commented on CASSANDRA-7409:

[~carlyeks] what is the status now? Waiting for another round of benchmarks?
[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1
[ https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14530563#comment-14530563 ]

Alan Boudreault commented on CASSANDRA-7409:

The last issue I had (CASSANDRA-9240) is resolved. The last 2 patterns to run will be done by the end of the week.
[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1
[ https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507724#comment-14507724 ]

Carl Yeksigian commented on CASSANDRA-7409:

I rebased and pushed a new [branch|https://github.com/carlyeks/cassandra/tree/overlapping-rebased] for these tests; there have been fixes on trunk which weren't included in the branch and were throwing off the results.
[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1
[ https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14500053#comment-14500053 ]

Alan Boudreault commented on CASSANDRA-7409:

All scenarios with basic patterns have been run. Same URL as above.
[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1
[ https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14494737#comment-14494737 ]

Alan Boudreault commented on CASSANDRA-7409:

FYI, there are 2 performance scenarios done: https://drive.google.com/drive/u/0/folders/0BwZ_GPM33j6KfktyN29kelQzd3NEYnNhTnpfajE2UDRwTTUtQkxwQVQ4YnpqaEMxSUk4TXM
[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1
[ https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14370108#comment-14370108 ]

Alan Boudreault commented on CASSANDRA-7409:

[~carlyeks] I am going to start some work for those performance tests. (stress user profile etc.)
[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1
[ https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365250#comment-14365250 ]

Alan Boudreault commented on CASSANDRA-7409:

Devs, I've written a first version of the test plan for the compaction strategies. See: https://github.com/riptano/cassandra-test-plans/wiki/Compaction-performance-test-plan

All comments/additions are welcome.
[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1
[ https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14332345#comment-14332345 ]

Carl Yeksigian commented on CASSANDRA-7409:

I've pushed up an updated branch which addresses these concerns. I can rebase if it looks good.

The reason that I used the sstable count instead of size in total bytes is I'm trying to find a level which has a lot of small files. If the level is oversized, it will go through a normal compaction, but if there are too many sstables, we don't catch that anywhere. It was originally in case we had a situation like in L0 where you write a lot of small files, they get compacted together and produce another small file, and the compaction doesn't include other L1 files so that there is either a small number or a larger file.

I like the ideas for the improvements; both definitely worth investigating.

I'll discuss a plan for testing this with [~enigmacurry] this week.
[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1
[ https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316241#comment-14316241 ]

Marcus Eriksson commented on CASSANDRA-7409:

bq. I'm a little suspicious of the MOLO=0 results

Me too. Looking at the code and running a few short tests indicates that they are identical in what compaction candidates they pick. We should probably run a few more tests before committing this.

* LCS.getScanners - different checks if level = 0 and level = mol - fold them up? (The comment above the level = 0 check is no longer valid - an sstable can never be in -1)
* When getting candidates for same level compaction, we always start from the smallest token; we should probably record the last token we picked and start from that?
* In getCandidatesForSameLevelCompaction, why are we including the next level when level == maxOverlapping? Feels like we should be able to do an in-level compaction even though the next level is non-overlapping.
* In getCompactionCandidates, what is the reason the score is based on the number of sstables instead of level size in total bytes?
* In getCompactionCandidates, this looks wrong to me: {{Set<SSTableReader> candidates = Sets.union(Collections.singleton(sstable), Sets.union(overlapping(sstable, getLevel(level)), overlapping(sstable, getLevel(level + 1))));}} - We should grab all the sstables in {{level + 1}} that overlap with any of the sstables we pick from {{level}}. Running stress against this with an {{assert false;}} in LeveledManifest.add(...) if we cannot add the sstable will show any overlaps (the only valid case of when this happens is after an incremental repair and we move sstables from the unrepaired manifest to the repaired manifest).

Nits:
* Brace on newline in StorageService.getManifestDescription and LM.calculateOverlappingScore
* nodetool getmanifest description could be something more generic like get compaction manifest

Random thoughts/future improvements(?):
* What if we made the maximum overlapping level contain as much data as the first non-overlapping level? Then the sstables would cover approximately the same ranges and we could probably run more compactions in parallel between those levels (it would probably increase write amplification though, so I'm unsure if there would be any benefits). We could also, in theory, bump sstables to a higher level without compaction, i.e., say that we have maxOverlappingLevel = 2, we run a compaction with one L3-sstable and 10 L4 sstables; this creates a gap in level 3, and it could be possible to take an L2-sstable and just bump it up to L3
* Improve the overlap score using CompactionMetadata.cardinalityEstimator (in a new ticket though)
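A minimal sketch of the candidate selection suggested in the review above: take the seed sstable, add what overlaps it in the same level, then add everything in the next level that overlaps any of the picked sstables (not just the seed). The `Range` class, the `candidates` helper, and the plain `long` tokens are hypothetical stand-ins for illustration; this is not the code from the branch or from LeveledManifest.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class OverlapCandidates {
    /** Hypothetical stand-in for an sstable: a closed token range [first, last]. */
    static final class Range {
        final String name;
        final long first;
        final long last;

        Range(String name, long first, long last) {
            this.name = name;
            this.first = first;
            this.last = last;
        }

        boolean overlaps(Range other) {
            return first <= other.last && other.first <= last;
        }
    }

    /** Every range in 'level' that overlaps at least one of the picked ranges. */
    static Set<Range> overlapping(Collection<Range> picked, List<Range> level) {
        Set<Range> out = new LinkedHashSet<>();
        for (Range p : picked)
            for (Range candidate : level)
                if (p.overlaps(candidate))
                    out.add(candidate);
        return out;
    }

    /** Seed + same-level overlaps + next-level ranges overlapping ANY picked range. */
    static Set<Range> candidates(Range seed, List<Range> level, List<Range> nextLevel) {
        Set<Range> picked = new LinkedHashSet<>();
        picked.add(seed);
        picked.addAll(overlapping(Arrays.asList(seed), level));
        Set<Range> result = new LinkedHashSet<>(picked);
        result.addAll(overlapping(picked, nextLevel));
        return result;
    }

    static List<String> names(Collection<Range> ranges) {
        List<String> out = new ArrayList<>();
        for (Range r : ranges)
            out.add(r.name);
        return out;
    }

    public static void main(String[] args) {
        List<Range> level = Arrays.asList(
            new Range("a", 0, 10), new Range("b", 8, 20), new Range("c", 30, 40));
        List<Range> next = Arrays.asList(
            new Range("x", 0, 5), new Range("y", 15, 25), new Range("z", 35, 45));
        // "y" overlaps only "b", so it would be missed if only the seed "a" were checked.
        System.out.println(names(candidates(level.get(0), level, next))); // [a, b, x, y]
    }
}
```

In this toy data, checking the next level against the seed alone would leave out `y`, which is the kind of overlap the `assert false;` check in LeveledManifest.add(...) is meant to surface.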
[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1
[ https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278871#comment-14278871 ]

Carl Yeksigian commented on CASSANDRA-7409:

It works in much the same way as was described before, except that:
- When promoting to a level which is not overlapping, the selection will take all overlapping sstables to compact, rather than just a single one. This is how it was supposed to work initially, but was actually hitting a different case instead.
- When selecting the level which needs to be compacted, it starts from 0 going up to MOLO - 1, then from the max level down to MOLO. This means that we try to compact any overlapping levels first, and if we don't find anything, then we'll compact according to our previous ways of compacting.

I'm a little suspicious of the MOLO=0 results, now that I've specified the changes; it should more closely mirror the results I got from LCS w/o STCS.

It is sensitive to the length of the test; L5 allows for as many overlaps as necessary in this test. The levels shouldn't have any overlap, though; it won't stop compactions until none of the levels have overlap. The L2 runs slower because it is doing more I/O because of the non-overlapping.
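The level ordering described in this comment (0 up to MOLO - 1, then the max level down to MOLO) can be sketched as follows. The class and method names are hypothetical and only illustrate the ordering; they are not taken from the branch.

```java
import java.util.ArrayList;
import java.util.List;

class LevelVisitOrder {
    /**
     * Order in which levels are checked for compaction candidates:
     * first 0 .. molo - 1 ascending, then maxLevel .. molo descending.
     */
    static List<Integer> visitOrder(int molo, int maxLevel) {
        List<Integer> order = new ArrayList<>();
        for (int level = 0; level < molo; level++)
            order.add(level);
        for (int level = maxLevel; level >= molo; level--)
            order.add(level);
        return order;
    }

    public static void main(String[] args) {
        System.out.println(visitOrder(2, 5)); // [0, 1, 5, 4, 3, 2]
        System.out.println(visitOrder(0, 5)); // [5, 4, 3, 2, 1, 0]
    }
}
```

With MOLO=0 the first loop is empty, so the order reduces to a plain top-down scan of the levels.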
[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1
[ https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278788#comment-14278788 ]

Jonathan Ellis commented on CASSANDRA-7409:

What does your updated algorithm look like now?
[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1
[ https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278798#comment-14278798 ]

Jonathan Ellis commented on CASSANDRA-7409:

Also, how sensitive is this to the test duration? Is MOLO=5 the magic number because we generate about 5 L1's worth of data before the test ends?
[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1
[ https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277372#comment-14277372 ]

Carl Yeksigian commented on CASSANDRA-7409:

I've pushed up an updated branch at: https://github.com/carlyeks/cassandra/tree/overlapping-better-selection

The compaction selection has been updated so that it performs much better now. The biggest issue was selecting a single sstable for the overlapping compactions, instead of that one plus the overlapping ones.

|| || Operation Time || Compaction Time ||
| MOLO=0 | 3:52:46 | 0:21:04 |
| MOLO=2 | 3:45:52 | 0:37:50 |
| MOLO=5 | 3:42:59 | 0:03:17 |
| LCS w/ STCS | 3:48:14 | 0:50:24 |
| LCS w/o STCS | 3:50:38 | 1:05:02 |

The performance on spinning disk is also improved by allowing overlapping; here are the results of a read operation after running a large mixed read/write workload: http://cstar.datastax.com/graph?stats=e113f706-9b54-11e4-9f2c-42010af0688fmetric=op_rateoperation=2_usersmoothing=1show_aggregates=truexmin=0xmax=121.88ymin=0ymax=113984.2
[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1
[ https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14206385#comment-14206385 ]

Marcus Eriksson commented on CASSANDRA-7409:

Did you keep the logs? I guess we could estimate how much write amplification we do by summing up the amounts in the "Compacted ... X bytes to Y" lines.

Why do we pick candidates from the bottom now? Could we perhaps first check higher levels down to the MOLO-level and then bottom-up in the levels that allow overlap?
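A rough sketch of the estimate suggested here: sum the bytes-written figure from each "Compacted ... X bytes to Y" log line and divide by the amount of data originally flushed. The only thing taken from the thread is the "X bytes to Y" fragment; the exact log line layout, the thousands separators, the sample lines, and the `bytesFlushed` input are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CompactionLogSum {
    // Assumption: a compaction log line contains "<X> bytes to <Y>",
    // where both numbers may carry thousands separators.
    private static final Pattern BYTES = Pattern.compile("(\\d[\\d,]*) bytes to (\\d[\\d,]*)");

    /** Sums the <Y> (bytes written) figure over all lines that match the pattern. */
    static long totalBytesWritten(List<String> lines) {
        long total = 0;
        for (String line : lines) {
            Matcher m = BYTES.matcher(line);
            if (m.find())
                total += Long.parseLong(m.group(2).replace(",", ""));
        }
        return total;
    }

    /** Bytes rewritten by compaction per byte flushed (bytesFlushed is a hypothetical input). */
    static double writeAmplification(List<String> lines, long bytesFlushed) {
        return (double) totalBytesWritten(lines) / bytesFlushed;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines, not copied from a real Cassandra log.
        List<String> lines = Arrays.asList(
            "Compacted 4 sstables. 1,000 bytes to 800",
            "Compacted 2 sstables. 800 bytes to 700",
            "some unrelated line");
        System.out.println(totalBytesWritten(lines));       // 1500
        System.out.println(writeAmplification(lines, 500)); // 3.0
    }
}
```

Whether the flushed bytes themselves should count toward the ratio is a matter of definition; the sketch counts only bytes rewritten by compaction.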
[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1
[ https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206931#comment-14206931 ]

Carl Yeksigian commented on CASSANDRA-7409:
-------------------------------------------

It appears that the L2-L3 compactions are including everything in L3 (kind of like L0-L1 now). This shouldn't happen, so it seems like it isn't choosing the best sstables to compact together in L3. It is the same in L0-L1 in the new LCS run.

The reason that we pick candidates from the bottom is that we want to use the IO we have on making progress on pushing data through the levels. This means that with overlapping, we want to get as much data as possible out of L0 at each compaction, but we aren't wasting IO because we shouldn't do any rewriting of data from L1 until we have nothing else to compact.

There is going to be an issue when we have MOLO=0, because we can't do anything about that overlapping, so it makes sense to keep the old behaviour in at least that case.
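The candidate-selection orders discussed in this exchange can be compared with a small sketch. It only models the order in which levels are examined, not the choice of sstables within a level. The function name and the `strategy` labels are hypothetical; "hybrid" stands for the suggestion to check the strict levels from the top down to the MOLO level and then the overlapping levels from the bottom up.

```python
def level_scan_order(max_level, molo, strategy):
    """Order in which levels are checked for compaction candidates.

    strategy is one of:
      "top_down"  - existing LCS behaviour, highest level first
      "bottom_up" - behaviour on the branch, L0 first
      "hybrid"    - strict levels top-down, then overlapping levels bottom-up
    With molo == 0 the old top-down order is kept, as proposed in the thread.
    """
    levels = list(range(max_level + 1))
    if strategy == "top_down" or molo == 0:
        return levels[::-1]
    if strategy == "bottom_up":
        return levels
    if strategy == "hybrid":
        strict = [lvl for lvl in levels if lvl > molo]
        overlapping = [lvl for lvl in levels if lvl <= molo]
        return strict[::-1] + overlapping
    raise ValueError("unknown strategy: " + strategy)
```

For example, with five levels (L0 to L4) and MOLO = 2, the hybrid order is L4, L3, L0, L1, L2.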
[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1
[ https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200322#comment-14200322 ]

Carl Yeksigian commented on CASSANDRA-7409:
-------------------------------------------

It's the default settings -- no overlapping in L1, but the compaction candidates are selected slightly differently compared to before (starting at L0 and going up instead of the other way).

It's possible that the way this selection is happening is prolonging the compaction process tremendously; I hadn't seen the same behavior locally, so I'll need to figure out a different way to debug.
[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1
[ https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198665#comment-14198665 ]

Carl Yeksigian commented on CASSANDRA-7409:
-------------------------------------------

I've finished running a performance test; it's available [here|http://cstar.datastax.com/tests/id/162b4540-6386-11e4-a260-bc764e04482c].

The branches tested are:
- Overlapping LCS
- New LCS, which does not use STCS in L0, but selects the compactions in a different order
- Old LCS with STCS
- Old LCS without STCS

The time to complete the test for each:
|| || Overlapping LCS || New LCS || Old LCS/STCS || Old LCS ||
| Operations | 1:05:47 | 1:00:18 | 1:24:23 | 1:15:57 |
| Compactions to 0 | 4:59:21 | 9:56:11 | 1:12:59 | 1:42:33 |

Unfortunately, the improvements to LCS cause the compaction time (to get to 0 pending compactions) to be significantly worse. Going through the logs, it seems that there are many fewer timeouts generated by the new configurations, so it is providing better latency.
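To make the size of the regression easier to read, the durations reported in the table can be converted to seconds and expressed relative to a baseline. The sketch below uses only the figures from the table; taking "Old LCS/STCS" as the baseline is a choice made here for illustration.

```python
def to_seconds(hms):
    """Convert an "h:mm:ss" duration string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Durations copied from the table in the comment.
OPERATIONS = {
    "Overlapping LCS": "1:05:47",
    "New LCS": "1:00:18",
    "Old LCS/STCS": "1:24:23",
    "Old LCS": "1:15:57",
}
COMPACTIONS_TO_ZERO = {
    "Overlapping LCS": "4:59:21",
    "New LCS": "9:56:11",
    "Old LCS/STCS": "1:12:59",
    "Old LCS": "1:42:33",
}


def relative_to(times, baseline):
    """Ratio of each duration to the baseline branch's duration."""
    base = to_seconds(times[baseline])
    return {name: to_seconds(value) / base for name, value in times.items()}


if __name__ == "__main__":
    for label, times in (("Operations", OPERATIONS),
                         ("Compactions to 0", COMPACTIONS_TO_ZERO)):
        for name, ratio in relative_to(times, "Old LCS/STCS").items():
            print(f"{label:17} {name:16} {ratio:.2f}x")
```

On these numbers the new branches finish the operations phase faster than the baseline but take roughly four and eight times as long to reach zero pending compactions.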
[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1
[ https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199161#comment-14199161 ]

Jonathan Ellis commented on CASSANDRA-7409:
-------------------------------------------

Is new LCS just overlapping LCS + cassandra.disable_stcs_in_l0, or something else?

Intuitively I'm not sure what to make of Overlapping doing so much worse in getting to zero-pending than both LCS+STCS and LCS alone. Any insight there?
[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1
[ https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163408#comment-14163408 ]

Marcus Eriksson commented on CASSANDRA-7409:
--------------------------------------------

Do you think we could generalize and remove the {code}if (maxOverlappingLevel == 0){code} parts? Feels like we could get approximately the same behavior if we do a same-level compaction instead of STCS in L0?
[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1
[ https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163555#comment-14163555 ]

Carl Yeksigian commented on CASSANDRA-7409:
-------------------------------------------

The other change between the two modes is that LCS currently checks from the highest level down to find a compaction, while this goes from the lowest level up. We would get some benefits from doing an L0 same-level compaction, so I think that sounds like a good idea.

I fixed a bug that was preventing concurrent L0-L1 compactions, so the commit changed recently.

I'm going to start working with [~enigmacurry] to do performance testing on this.
[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1
[ https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144912#comment-14144912 ]

Carl Yeksigian commented on CASSANDRA-7409:
-------------------------------------------

This was the original purpose of the same-level compaction; it was supposed to place the results of the compaction into the same level. It seems that I broke it at some point, but with that correction, it should behave correctly for the last part.

For the earlier part, it makes sense to try to include the sstables which overlap the least in an uplevel compaction. This will keep the keys which are being written to a lot in a level which allows for overlapping.

I'm happy to change the overlap estimator, assuming we can figure out which one we'd like to use as our estimator going forward. If it's still going to be experimental, I think I'd rather just leave it as a really rough estimator of the overlap, and change it afterwards.

There are two measures against vanilla LCS that we want to test.
- L0 Compaction
  No reads, no writes. Take sstables and dump into L0.
  Metric: Time to 0 compactions remaining.
- Heavy write
  Heavy writes (such that LCS is overwhelmed), some reads.
  Metric: read 0.99

I expect that L0 compaction times should be similar between LCS w/ STCS and OCS, with OCS being slightly slower. Under the heavy write scenario, however, there should be a large benefit to using OCS.
[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1
[ https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119824#comment-14119824 ]

Marcus Eriksson commented on CASSANDRA-7409:
--------------------------------------------

I think the within-level compaction could be reworked a bit. Right now we find which sstables are overlapping the most, and compact those into the next level. Since sstables that overlap a lot are bound to contain the same keys, it means we are compacting hot partitions into the next level, where they are likely to overlap with a bunch of sstables again, causing another compaction into the next level.

If we instead try to guess how much overlap a set of sstables would cause in the next level, and pick the ones that would cause the *least* overlap, I think we could save ourselves a bunch of write amplification by keeping the hot keys low in the leveling. I guess we need to have a threshold so that the most overlapping sstables eventually get compacted together as well, but maybe they should stay in the same level they are in?

Approach (say MOLO = 2):
* L0 - L1 is kept the same; flushed L0 sstables are bound to overlap everything in L1 anyway
* Find a set of sstables in L1 that would create the least overlap in L2 and compact those together
* If we have many (4?) highly overlapping sstables in L1, compact those together, but make them stay in L1

We probably need a better metric for overlappiness here as well (CASSANDRA-6474 or by using the HyperLogLog component in CompactionMetadata).

WDYT, does it make sense? Very likely I'm overthinking this, and because each sstable contains a wide range of keys (at least in L1) it might not make a difference.
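The "least overlap in the next level" selection proposed above might look something like the following sketch. It is deliberately crude: sstables are modelled as closed `(first_token, last_token)` integer ranges, overlap is counted as the number of intersecting sstables rather than estimated from key cardinality (the comment suggests CASSANDRA-6474 or HyperLogLog for a better metric), and a single sstable is picked rather than a set. All names are hypothetical.

```python
def ranges_overlap(a, b):
    """True if two closed (first_token, last_token) ranges intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def overlap_count(candidate, next_level):
    """Number of sstables in the next level that the candidate intersects."""
    return sum(1 for sstable in next_level if ranges_overlap(candidate, sstable))


def pick_least_overlapping(level, next_level):
    """Candidate from `level` that would drag in the fewest next-level sstables.

    Ties are broken by position in `level` (the first minimum wins).
    """
    return min(level, key=lambda sstable: overlap_count(sstable, next_level))
```

With L1 = `[(0, 100), (40, 60), (200, 220)]` and L2 = `[(0, 30), (31, 70), (71, 120), (210, 300)]`, the wide sstable `(0, 100)` intersects three L2 sstables while `(40, 60)` intersects one, so `(40, 60)` is promoted first and the wide, presumably hot, range stays in L1.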
[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1
[ https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087722#comment-14087722 ]

Jonathan Ellis commented on CASSANDRA-7409:
-------------------------------------------

We'd want to compare read performance post-compaction as well, right?

We have some 8-disk machines that [~lyubent] can test concurrent compactors with. Can you specify exactly which scenario you want to test?
[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1
[ https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14086767#comment-14086767 ]

Carl Yeksigian commented on CASSANDRA-7409:
-------------------------------------------

I've updated the patch to combine OCS and LCS, and to run STCS in L0 if MOLO = 0.

I ran a test to compare the performance between LCS and OLCS, which uses the incremental backup and then runs compaction on the results.
- 3:04 to compact using LCS w/ STCS.
- 2:50 to compact using OLCS w/ MOLO = 2.

Another round of testing is needed, specifically around compactions which can take advantage of concurrent compactors. One benefit of OLCS is that concurrent compactors can be working between L0 and L1, up to the max overlapping level.
[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1
[ https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077543#comment-14077543 ]

Marcus Eriksson commented on CASSANDRA-7409:
--------------------------------------------

We should probably make this behave as standard LCS when MAX_OVERLAPPING_LEVEL_OPTION is 0 and not have a separate compaction strategy.

Benchmarks will be very interesting.
[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1
[ https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077849#comment-14077849 ]

Jonathan Ellis commented on CASSANDRA-7409:
-------------------------------------------

If you mean that we should add MOLO to standard LCS instead of adding a new strategy, then that's my first reaction too.
[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1
[ https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077856#comment-14077856 ]

Jonathan Ellis commented on CASSANDRA-7409:
-------------------------------------------

I don't see any show stoppers to this approach when comparing LCS/OCS and LM/OM.
[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1
[ https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077871#comment-14077871 ]

Carl Yeksigian commented on CASSANDRA-7409:
-------------------------------------------

No, it should be easy to do; I just wanted to keep it separate to be able to compare them quickly (and compare the code). I just need to remember to add back the STCS in L0 that was there before if MOLO = 0.
[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1
[ https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077156#comment-14077156 ]

Carl Yeksigian commented on CASSANDRA-7409:
-------------------------------------------

I have a first cut of this working now at https://github.com/carlyeks/cassandra/tree/overlapping

This adds a new compaction strategy called 'Overlapping', which operates mostly the same as 'Leveled' when max_overlapping_level is configured to 0, except L0 does not do any STCS. When max_overlapping_level is set to non-zero, it will compact without selecting non-overlapping sstables, and will not include any sstables from an upper level.

I also added a new nodetool command to list the sstables in each level for both leveled and overlapping.

I haven't benchmarked this strategy yet to compare with regular leveled; that's going to be what I work on next for this.