[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1

2016-01-19 Thread Carl Yeksigian (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106866#comment-15106866
 ] 

Carl Yeksigian commented on CASSANDRA-7409:
---

I've been working on this on and off; here is the latest
[branch|https://github.com/carlyeks/cassandra/commits/ticket/7409]. I think
some recent changes will improve the same issues addressed here: CASSANDRA-6696
allows more simultaneous compactions (the limit is the number of disks), and
CASSANDRA-10540 will improve that further (the limit is the number of ranges).

I still think this has merits, but in order to instrument this, I've focused on 
adding the additional logging support in CASSANDRA-10805, which has been useful 
in figuring out what exactly is going on with these compactions. I still 
haven't been able to find the cause of the poor performance in the L0 selection 
when MOLO = 0.

> Allow multiple overlapping sstables in L1
> -
>
> Key: CASSANDRA-7409
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7409
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Carl Yeksigian
>Assignee: Carl Yeksigian
>  Labels: compaction
> Fix For: 3.x
>
>
> Currently, when a normal L0 compaction takes place (not STCS), we take up to 
> MAX_COMPACTING_L0 L0 sstables and all of the overlapping L1 sstables and 
> compact them together. If we didn't have to deal with the overlapping L1
> sstables, we could compact a larger number of L0 sstables together into a set
> of non-overlapping L1 sstables.
> This could be done by deferring the invariant that L1 has no overlapping
> sstables. Going from L1 to L2, we would then be compacting together fewer
> overlapping sstables.
> When reading, we would lose the guarantee of one sstable per level (except L0),
> but the number of sstables can be bounded: once we have too many sets of
> sstables, either compact them back into the same level, or compact them up to
> the next level.
> This could be generalized to allow any level to be the maximum overlapping
> level for this strategy.
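To make the "this can be bounded" part concrete, here is a minimal Java sketch under simplifying assumptions: an sstable is reduced to a hypothetical TokenRange record with inclusive long bounds, and maxDepth counts how many sstables in one level cover the worst-case token, i.e. how many a read might have to consult in that level. A classic non-overlapping level has depth 1. The names needsCompaction and maxSets are illustrative only, not Cassandra APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model: an sstable is just an inclusive token range.
record TokenRange(long first, long last) {}

final class OverlapDepth {
    // Maximum number of sstables in one level covering any single token, i.e. the
    // worst-case number of sstables a read must consult in that level.
    // A classic non-overlapping LCS level has depth <= 1.
    static int maxDepth(List<TokenRange> level) {
        List<long[]> events = new ArrayList<>();
        for (TokenRange r : level) {
            events.add(new long[] {r.first(), +1}); // range opens
            events.add(new long[] {r.last(), -1});  // range closes (end is inclusive)
        }
        // Sort by token; at equal tokens process opens before closes because bounds are inclusive.
        events.sort((a, b) -> a[0] != b[0] ? Long.compare(a[0], b[0]) : Long.compare(b[1], a[1]));
        int depth = 0;
        int max = 0;
        for (long[] e : events) {
            depth += (int) e[1];
            max = Math.max(max, depth);
        }
        return max;
    }

    // Hypothetical bound: once reads could touch more than maxSets sstables in this
    // level, compact it (back into the same level or up into the next one).
    static boolean needsCompaction(List<TokenRange> level, int maxSets) {
        return maxDepth(level) > maxSets;
    }
}
```

A sweep over sorted range endpoints keeps this O(n log n) per level rather than comparing every pair of sstables.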



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1

2016-01-14 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098186#comment-15098186
 ] 

Sylvain Lebresne commented on CASSANDRA-7409:
-

Where are we with this? It feels like a shame to let all that code rot.


[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1

2015-05-09 Thread Alan Boudreault (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14536739#comment-14536739
 ] 

Alan Boudreault commented on CASSANDRA-7409:


Tests are done; no new blockers were encountered during the runs:
https://drive.google.com/drive/u/0/folders/0BwZ_GPM33j6KfktyN29kelQzd3NEYnNhTnpfajE2UDRwTTUtQkxwQVQ4YnpqaEMxSUk4TXM

We do see some poor performance for standard LCS in the Like and Temperature
scenarios. I will compare them with 2.1 to make sure it's not a new issue.


[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1

2015-05-06 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14530307#comment-14530307
 ] 

Marcus Eriksson commented on CASSANDRA-7409:


[~carlyeks] what is the status now? Waiting for another round of benchmarks?


[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1

2015-05-06 Thread Alan Boudreault (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14530563#comment-14530563
 ] 

Alan Boudreault commented on CASSANDRA-7409:


The last issue I had (CASSANDRA-9240) is resolved. The last 2 patterns will be
run by the end of the week.


[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1

2015-04-22 Thread Carl Yeksigian (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507724#comment-14507724
 ] 

Carl Yeksigian commented on CASSANDRA-7409:
---

I rebased and pushed a new 
[branch|https://github.com/carlyeks/cassandra/tree/overlapping-rebased] for 
these tests; there have been fixes on trunk which weren't included in the 
branch and were throwing off the results.


[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1

2015-04-17 Thread Alan Boudreault (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14500053#comment-14500053
 ] 

Alan Boudreault commented on CASSANDRA-7409:


All scenarios with basic patterns have been run. Same URL as above.


[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1

2015-04-14 Thread Alan Boudreault (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14494737#comment-14494737
 ] 

Alan Boudreault commented on CASSANDRA-7409:


FYI, 2 performance scenarios are done:
https://drive.google.com/drive/u/0/folders/0BwZ_GPM33j6KfktyN29kelQzd3NEYnNhTnpfajE2UDRwTTUtQkxwQVQ4YnpqaEMxSUk4TXM


[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1

2015-03-19 Thread Alan Boudreault (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14370108#comment-14370108
 ] 

Alan Boudreault commented on CASSANDRA-7409:


[~carlyeks] I am going to start working on those performance tests (stress
user profile, etc.).


[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1

2015-03-17 Thread Alan Boudreault (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365250#comment-14365250
 ] 

Alan Boudreault commented on CASSANDRA-7409:


Devs, I've written a first version of the test plan for the compaction 
strategies. See: 
https://github.com/riptano/cassandra-test-plans/wiki/Compaction-performance-test-plan

All comments/additions are welcome.


[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1

2015-02-22 Thread Carl Yeksigian (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14332345#comment-14332345
 ] 

Carl Yeksigian commented on CASSANDRA-7409:
---

I've pushed up an updated branch which addresses these concerns. I can rebase 
if it looks good.

The reason I used the sstable count instead of the total size in bytes is that
I'm trying to find a level which has a lot of small files. If a level is
oversized, it will go through a normal compaction, but if it has too many
sstables, nothing catches that.
It was originally meant for a situation like L0, where you write a lot of small
files that get compacted together into another small file, and the compaction
doesn't pull in other L1 files, which would have left either a smaller number
of files or a larger file.
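A minimal sketch of the difference between the two scores, assuming the usual LCS layout where level L may hold about 10^L target-sized sstables. LevelScores, sizeScore and countScore are hypothetical names for illustration, not the branch's code: a level full of tiny files stays far under its byte budget, yet its sstable count is well above what a full level of target-sized files would contain.

```java
import java.util.List;

final class LevelScores {
    static final int FANOUT = 10; // assumption: standard LCS fanout between levels

    // Assumed byte budget for a level: FANOUT^level * target sstable size.
    static long maxBytesForLevel(int level, long targetSstableBytes) {
        return (long) Math.pow(FANOUT, level) * targetSstableBytes;
    }

    // Size-based score: how full the level is in bytes (> 1.0 means oversized).
    static double sizeScore(List<Long> sstableSizes, int level, long targetSstableBytes) {
        long total = 0;
        for (long s : sstableSizes) {
            total += s;
        }
        return (double) total / maxBytesForLevel(level, targetSstableBytes);
    }

    // Count-based score (hypothetical): number of sstables versus the number a
    // full level of target-sized sstables would have (> 1.0 means too many files).
    static double countScore(List<Long> sstableSizes, int level) {
        return sstableSizes.size() / Math.pow(FANOUT, level);
    }
}
```

For example, 30 one-megabyte sstables in L1 with a 160 MB target size give a size score of about 0.02, so a size-based check never fires, while the count score is 3.0.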

I like the ideas for the improvements; both are definitely worth investigating.

I'll discuss a plan for testing this with [~enigmacurry] this week.


[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1

2015-02-11 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316241#comment-14316241
 ] 

Marcus Eriksson commented on CASSANDRA-7409:


bq. I'm a little suspicious of the MOLO=0 results
Me too; looking at the code and running a few short tests indicates that they
pick identical compaction candidates, so we should probably run a few more
tests before committing this.

* LCS.getScanners - different checks if level = 0 and level = mol - fold them 
up? (The comment above the level = 0 check is no longer valid - an sstable can 
never be in -1)
* When getting candidates for same-level compaction, we always start from the
smallest token; should we record the last token we picked and start from there?
* In getCandidatesForSameLevelCompaction, why are we including next level when 
level == maxOverlapping? Feels like we should be able to do an in-level 
compaction even though the next level is non-overlapping.
* In getCompactionCandidates, what is the reason the score is based on number 
of sstables instead of level size in total bytes?
* In getCompactionCandidates, this looks wrong to me: {{Set<SSTableReader>
candidates = Sets.union(Collections.singleton(sstable),
Sets.union(overlapping(sstable, getLevel(level)), overlapping(sstable,
getLevel(level + 1))));}} - we should grab all the sstables in {{level + 1}}
that overlap with any of the sstables we pick from {{level}}. Running stress
against this with an {{assert false;}} in LeveledManifest.add(...) when we
cannot add the sstable will expose any overlaps (the only valid case for this
is after an incremental repair, when we move sstables from the unrepaired
manifest to the repaired manifest).
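The fix suggested in the last bullet can be sketched in Java as follows. This is not the branch's code: SSTable here is a hypothetical record holding only a name and inclusive token bounds, and overlapping is a plain linear scan. The key change is that the next-level candidates are the union of overlaps for every sstable picked from level, not only for the seed sstable.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified sstable: a name plus inclusive token bounds.
record SSTable(String name, long first, long last) {
    boolean overlaps(SSTable other) {
        return first <= other.last() && other.first() <= last;
    }
}

final class CandidateSelection {
    // All sstables in the given level whose token range overlaps s (linear scan).
    static Set<SSTable> overlapping(SSTable s, List<SSTable> level) {
        Set<SSTable> result = new LinkedHashSet<>();
        for (SSTable t : level) {
            if (s.overlaps(t)) {
                result.add(t);
            }
        }
        return result;
    }

    // Seed sstable plus its overlaps in `level`, then every sstable in `nextLevel`
    // that overlaps ANY of those picks (not only the seed).
    static Set<SSTable> candidates(SSTable sstable, List<SSTable> level, List<SSTable> nextLevel) {
        Set<SSTable> picked = new LinkedHashSet<>();
        picked.add(sstable);
        picked.addAll(overlapping(sstable, level));

        Set<SSTable> result = new LinkedHashSet<>(picked);
        for (SSTable p : picked) {
            result.addAll(overlapping(p, nextLevel));
        }
        return result;
    }
}
```

With A=[0,10] and B=[8,20] in the overlapping level, a next-level sstable E=[16,25] overlaps B but not A, so the one-line expression quoted above would leave E out and create an overlap in level + 1, while this version includes it.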

Nits:
* Brace on newline in StorageService.getManifestDescription and 
LM.calculateOverlappingScore
* nodetool getmanifest description could be something more generic like get 
compaction manifest

Random thoughts/future improvements(?):
* What if we made the maximum overlapping level contain as much data as the
first non-overlapping level? Then the sstables would cover approximately the
same ranges and we could probably run more compactions in parallel between
those levels (it would probably increase write amplification though, so I'm
unsure whether there would be any benefit). We could also, in theory, bump
sstables to a higher level without compaction, i.e., say that we have
maxOverlappingLevel = 2 and we run a compaction with one L3 sstable and 10 L4
sstables; this creates a gap in level 3, and it could be possible to take an
L2 sstable and just bump it up to L3.
* Improve the overlap score using CompactionMetadata.cardinalityEstimator (in a 
new ticket though)
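The "bump without compaction" idea from the first bullet can be sketched as a gap check. Assumptions: the hypothetical Bounds record keeps only inclusive token bounds, and level size limits and any other manifest constraints are ignored. An sstable from the overlapping level can move up unchanged only if it overlaps nothing already in the target level, including sstables moved earlier in the same pass.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified sstable: inclusive token bounds only.
record Bounds(long first, long last) {
    boolean overlaps(Bounds o) {
        return first <= o.last() && o.first() <= last;
    }
}

final class Promotion {
    // True if s overlaps nothing in the (non-overlapping) target level.
    static boolean fitsInGap(Bounds s, List<Bounds> targetLevel) {
        for (Bounds t : targetLevel) {
            if (s.overlaps(t)) {
                return false;
            }
        }
        return true;
    }

    // Greedily picks sstables from the overlapping level that can be moved up unchanged.
    // Each accepted sstable is treated as occupying the target level, so two picks
    // never overlap each other. Level size limits are ignored in this sketch.
    static List<Bounds> bumpable(List<Bounds> overlappingLevel, List<Bounds> targetLevel) {
        List<Bounds> occupied = new ArrayList<>(targetLevel);
        List<Bounds> moved = new ArrayList<>();
        for (Bounds s : overlappingLevel) {
            if (fitsInGap(s, occupied)) {
                moved.add(s);
                occupied.add(s);
            }
        }
        return moved;
    }
}
```

Tracking already-moved sstables in occupied matters because the source level allows overlaps: two L2 sstables may each fit the same gap in L3, but not both at once.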



[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1

2015-01-15 Thread Carl Yeksigian (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278871#comment-14278871
 ] 

Carl Yeksigian commented on CASSANDRA-7409:
---

It works in much the same way as described before, except that:

- When promoting to a level which is not overlapping, the selection will take 
all overlapping sstables to compact, rather than just a single one. This is how 
it was supposed to work initially, but was actually hitting a different case 
instead.
- When selecting the level which needs to be compacted, it starts from 0 going
up to MOLO - 1, then goes from the max level down to MOLO. This means that we
try to compact any overlapping levels first, and if we don't find anything,
we fall back to compacting the way we did before.
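The level-selection order described in the second bullet can be sketched as follows. LevelOrder is a hypothetical helper, and needsCompaction is a placeholder predicate standing in for whatever scoring the branch actually uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

final class LevelOrder {
    // Order in which levels are examined: 0 .. molo-1 ascending (overlapping levels
    // first), then maxLevel down to molo (the previous, top-down behaviour).
    static List<Integer> order(int molo, int maxLevel) {
        List<Integer> levels = new ArrayList<>();
        for (int l = 0; l < molo; l++) {
            levels.add(l);
        }
        for (int l = maxLevel; l >= molo; l--) {
            levels.add(l);
        }
        return levels;
    }

    // First level, in that order, for which the hypothetical predicate reports
    // that compaction is needed; -1 if no level qualifies.
    static int pickLevel(int molo, int maxLevel, IntPredicate needsCompaction) {
        for (int l : order(molo, maxLevel)) {
            if (needsCompaction.test(l)) {
                return l;
            }
        }
        return -1;
    }
}
```

With MOLO=2 and a maximum level of 5, levels are examined in the order 0, 1, 5, 4, 3, 2; with MOLO=0 the order is simply from the highest level down to 0.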

I'm a little suspicious of the MOLO=0 results, now that I've specified the 
changes; it should more closely mirror the results I got from LCS w/o STCS.

It is sensitive to the length of the test; L5 allows as many overlaps as
necessary in this test. The levels shouldn't have any overlap, though; it won't
stop compactions until none of the levels have overlap. The L2 run is slower
because it does more I/O to keep the non-overlapping levels non-overlapping.


[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1

2015-01-15 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278788#comment-14278788
 ] 

Jonathan Ellis commented on CASSANDRA-7409:
---

What does your updated algorithm look like now?


[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1

2015-01-15 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278798#comment-14278798
 ] 

Jonathan Ellis commented on CASSANDRA-7409:
---

Also, how sensitive is this to the test duration?  Is MOLO=5 the magic number 
because we generate about 5 L1's worth of data before the test ends?


[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1

2015-01-14 Thread Carl Yeksigian (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277372#comment-14277372
 ] 

Carl Yeksigian commented on CASSANDRA-7409:
---

I've pushed up an updated branch at: 
https://github.com/carlyeks/cassandra/tree/overlapping-better-selection

The compaction selection has been updated so that it performs much better now.
The biggest issue was selecting a single sstable for the overlapping
compactions, instead of that one plus the ones overlapping it.

|| || Operation Time (h:mm:ss) || Compaction Time (h:mm:ss) ||
| MOLO=0 | 3:52:46 | 0:21:04 |
| MOLO=2 | 3:45:52 | 0:37:50 | 
| MOLO=5 | 3:42:59 | 0:03:17 |
| LCS w/ STCS | 3:48:14 | 0:50:24 |
| LCS w/o STCS | 3:50:38 | 1:05:02 |
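A quick way to compare rows in the table is to convert the h:mm:ss values to seconds. This small Java helper (hypothetical, for illustration only) computes the relative change against the LCS w/o STCS row; for MOLO=5 it gives roughly a 95% reduction in compaction time and about a 3% reduction in operation time.

```java
final class ResultsTable {
    // Parses an "h:mm:ss" duration into seconds.
    static long seconds(String hms) {
        String[] p = hms.split(":");
        return Long.parseLong(p[0]) * 3600 + Long.parseLong(p[1]) * 60 + Long.parseLong(p[2]);
    }

    // Relative change of candidate versus baseline, in percent (negative = faster).
    static double percentChange(String baseline, String candidate) {
        long a = seconds(baseline);
        long b = seconds(candidate);
        return 100.0 * (b - a) / a;
    }

    public static void main(String[] args) {
        // Baseline: LCS w/o STCS; candidate: MOLO=5 (values copied from the table above).
        System.out.printf("compaction: %.1f%%%n", percentChange("1:05:02", "0:03:17"));
        System.out.printf("operation:  %.1f%%%n", percentChange("3:50:38", "3:42:59"));
    }
}
```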

Allowing overlap also improves performance on spinning disks; here are the
results of a read operation after running a large mixed read/write
workload:
http://cstar.datastax.com/graph?stats=e113f706-9b54-11e4-9f2c-42010af0688fmetric=op_rateoperation=2_usersmoothing=1show_aggregates=truexmin=0xmax=121.88ymin=0ymax=113984.2


[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1

2014-11-11 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14206385#comment-14206385
 ] 

Marcus Eriksson commented on CASSANDRA-7409:


Did you keep the logs? I guess we could estimate how much write amplification 
we do by summing up the byte counts in the "Compacted ... X bytes to Y" log lines.
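A rough sketch of that estimate in Java. It assumes each compaction log line contains a fragment like "1,000,000 bytes to 900,000" with comma-grouped byte counts (the exact wording varies between Cassandra versions, so the regex is an assumption), and that the number of bytes flushed from memtables during the same run is measured separately. WriteAmpEstimate and its methods are hypothetical, not Cassandra APIs.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Cassandra: estimates write amplification
// from compaction log lines plus a separately measured flush volume.
final class WriteAmpEstimate {
    // Assumed log fragment: "Compacted ... 1,234,567 bytes to 1,100,000 ...".
    // The exact wording differs between Cassandra versions; adjust as needed.
    private static final Pattern COMPACTED =
            Pattern.compile("Compacted .*?([\\d,]+) bytes to ([\\d,]+)");

    private WriteAmpEstimate() {}

    /** Sums the output size Y over every line that matches the assumed pattern. */
    static long bytesWrittenByCompaction(List<String> logLines) {
        long total = 0;
        for (String line : logLines) {
            Matcher m = COMPACTED.matcher(line);
            if (m.find()) {
                total += Long.parseLong(m.group(2).replace(",", ""));
            }
        }
        return total;
    }

    /**
     * Total bytes written (one flush plus every compaction rewrite) divided by
     * bytes flushed; 1.0 would mean compaction never rewrote anything.
     */
    static double writeAmplification(long flushedBytes, long compactionBytesWritten) {
        if (flushedBytes <= 0) {
            throw new IllegalArgumentException("flushedBytes must be positive");
        }
        return (double) (flushedBytes + compactionBytesWritten) / flushedBytes;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "Compacted 4 sstables to [...].  1,000,000 bytes to 900,000 (~90% of original)",
                "Compacted 2 sstables to [...].  500,000 bytes to 450,000 (~90% of original)");
        long written = bytesWrittenByCompaction(lines);
        System.out.println(written + " bytes, WA ~ " + writeAmplification(1_000_000L, written));
    }
}
```

Counting the flush once and adding every compaction rewrite is one common convention for write amplification; dividing only the compaction bytes by the flushed bytes is another.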

Why do we pick candidates from the bottom now? Could we perhaps first check 
higher levels down to the MOLO-level and then bottom-up in the levels that 
allow overlap?
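The three candidate orders discussed in this thread can be sketched as follows: classic LCS checks from the highest level down, the patch starts at L0 and goes up, and the suggestion above checks the non-overlapping levels top-down before sweeping the overlap-allowed levels bottom-up. This Java sketch only produces level numbers; scoring and sstable selection are left out, and CandidateLevelOrder is a hypothetical name. It assumes L0 counts as an overlap-allowed level, since L0 sstables always overlap.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the order in which levels are examined when looking
// for the next compaction. Level numbers only; scoring is left out.
final class CandidateLevelOrder {
    private CandidateLevelOrder() {}

    /** Classic LCS: start at the highest level and work down to L0. */
    static List<Integer> topDown(int maxLevel) {
        List<Integer> order = new ArrayList<>();
        for (int level = maxLevel; level >= 0; level--) {
            order.add(level);
        }
        return order;
    }

    /** The patch as described: start at L0 and work up. */
    static List<Integer> bottomUp(int maxLevel) {
        List<Integer> order = new ArrayList<>();
        for (int level = 0; level <= maxLevel; level++) {
            order.add(level);
        }
        return order;
    }

    /**
     * The suggested hybrid: non-overlapping levels (above molo) top-down,
     * then the levels that allow overlap (L0 up to L{molo}) bottom-up.
     */
    static List<Integer> hybrid(int maxLevel, int molo) {
        if (molo < 0 || molo > maxLevel) {
            throw new IllegalArgumentException("molo must be in [0, maxLevel]");
        }
        List<Integer> order = new ArrayList<>();
        for (int level = maxLevel; level > molo; level--) {
            order.add(level);
        }
        for (int level = 0; level <= molo; level++) {
            order.add(level);
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(hybrid(5, 2)); // prints [5, 4, 3, 0, 1, 2]
    }
}
```

One property of the hybrid order: with MOLO = 0 it reduces to the classic top-down order, which fits the idea of keeping the old behaviour in that case.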



[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1

2014-11-11 Thread Carl Yeksigian (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206931#comment-14206931
 ] 

Carl Yeksigian commented on CASSANDRA-7409:
---

It appears that the L2-L3 compactions are including everything in L3 (kind of 
like L0-L1 now). This shouldn't happen, so it seems like it isn't choosing the 
best sstables to compact together in L3. It is the same in L0-L1 in the new 
LCS run.

The reason we pick candidates from the bottom is that we want to spend the IO 
we have on pushing data through the levels. With overlapping, this means we 
want to get as much data as possible out of L0 in each compaction, and we 
aren't wasting IO, because we shouldn't rewrite any data from L1 until we have 
nothing else to compact.

There is going to be an issue when we have MOLO=0, because we can't do anything 
about that overlapping, so it makes sense to keep the old behaviour in at least 
that case.



[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1

2014-11-06 Thread Carl Yeksigian (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200322#comment-14200322
 ] 

Carl Yeksigian commented on CASSANDRA-7409:
---

It's the default settings -- no overlapping in L1, but the compaction 
candidates are selected slightly differently compared to before (starting at L0 
and going up instead of the other way).

It's possible that the way this selection is happening is prolonging the 
compaction process tremendously; I haven't seen the same behavior locally, so 
I'll need to figure out a different way to debug.



[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1

2014-11-05 Thread Carl Yeksigian (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198665#comment-14198665
 ] 

Carl Yeksigian commented on CASSANDRA-7409:
---

I've finished running a performance test; it's available 
[here|http://cstar.datastax.com/tests/id/162b4540-6386-11e4-a260-bc764e04482c].

The branches tested are:
- Overlapping LCS
- New LCS, which does not use STCS in L0, but selects the compactions in a 
different order
- Old LCS with STCS
- Old LCS without STCS

The time (h:mm:ss) to complete the test for each:
|| Phase || Overlapping LCS || New LCS || Old LCS/STCS || Old LCS ||
| Operations | 1:05:47 | 1:00:18 | 1:24:23 | 1:15:57 |
| Compactions (to 0 pending) | 4:59:21 | 9:56:11 | 1:12:59 | 1:42:33 |

Unfortunately, the improvements to LCS make the compaction time (to get to 0 
pending compactions) significantly worse. Going through the logs, it seems 
that the new configurations generate far fewer timeouts, so they do provide 
better latency.



[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1

2014-11-05 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199161#comment-14199161
 ] 

Jonathan Ellis commented on CASSANDRA-7409:
---

Is new LCS just overlapping LCS + cassandra.disable_stcs_in_l0, or 
something else?

Intuitively I'm not sure what to make of Overlapping doing so much worse in 
getting to zero-pending than both LCS+STCS and LCS alone.  Any insight there?



[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1

2014-10-08 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163408#comment-14163408
 ] 

Marcus Eriksson commented on CASSANDRA-7409:


Do you think we could generalize and remove the {code}if (maxOverlappingLevel 
== 0){code} parts? Feels like we could get approximately the same behavior if 
we do a same-level compaction instead of STCS in L0?



[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1

2014-10-08 Thread Carl Yeksigian (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163555#comment-14163555
 ] 

Carl Yeksigian commented on CASSANDRA-7409:
---

The other change between the two modes is that LCS currently checks from the 
highest level down to find a compaction, while this patch goes from the lowest 
level up.

We would get some benefits from doing an L0 same-level compaction, so I think 
that sounds like a good idea.

I fixed a bug that was preventing concurrent L0-L1 compactions, so the commit 
changed recently. I'm going to start working with [~enigmacurry] to do 
performance testing on this.



[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1

2014-09-23 Thread Carl Yeksigian (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144912#comment-14144912
 ] 

Carl Yeksigian commented on CASSANDRA-7409:
---

This was the original purpose of the same level compaction; it was supposed to 
place the results of the compaction into the same level. It seems that I broke 
it at some point, but with that correction, it should behave correctly for the 
last part.

For the earlier part, it makes sense to try to include the sstables which 
overlap the least in an uplevel compaction. This will keep the keys which are 
being written to a lot in a level that allows overlapping.

I'm happy to change the overlap estimator, assuming we can figure out which one 
we'd like to say is going to be our estimator going forward. If it's still 
going to be experimental, I think I'd rather just leave it as a really rough 
estimator of the overlap, and change it afterwards.

There are two scenarios we want to test against vanilla LCS.

- L0 Compaction
  No reads, no writes. Take sstables and dump them into L0.
  Metric: Time to 0 compactions remaining.

- Heavy write
  Heavy writes (such that LCS is overwhelmed), some reads
  Metric: 99th percentile (0.99) read latency
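For the heavy-write scenario, the 0.99 read metric is the 99th percentile of read latency. A minimal nearest-rank sketch of that computation follows; stress tools may use a different percentile definition or histogram-based approximations, and LatencyPercentile is a hypothetical helper.

```java
import java.util.Arrays;

// Hypothetical helper: nearest-rank percentile over recorded read latencies.
// Stress tools may use a different percentile definition or histogram buckets.
final class LatencyPercentile {
    private LatencyPercentile() {}

    /**
     * Smallest recorded latency such that at least pct percent of all samples
     * are less than or equal to it (nearest-rank method). pct is in [1, 100].
     */
    static long percentile(long[] latenciesMicros, int pct) {
        if (latenciesMicros.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (pct < 1 || pct > 100) {
            throw new IllegalArgumentException("pct must be in [1, 100]");
        }
        long[] sorted = latenciesMicros.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        // 1-based rank = ceil(pct * n / 100), computed in integer arithmetic.
        int rank = (int) (((long) pct * sorted.length + 99) / 100);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        long[] samples = new long[100];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = (i + 1) * 100L; // 100us .. 10,000us
        }
        System.out.println("p99 = " + percentile(samples, 99) + "us"); // prints p99 = 9900us
    }
}
```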

I expect that L0 compaction times should be similar between LCS w/ STCS and 
OCS, with OCS being slightly slower. Under the heavy write scenario, however, 
there should be a large benefit to using OCS.




[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1

2014-09-03 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119824#comment-14119824
 ] 

Marcus Eriksson commented on CASSANDRA-7409:


I think the within-level compaction could be reworked a bit. Currently we find 
which sstables overlap the most and compact those into the next level.

Since sstables that overlap a lot are bound to contain the same keys, this 
means we are compacting hot partitions into the next level, where they are 
likely to overlap with a bunch of sstables again, causing another compaction 
into the next level.

If we instead try to guess how much overlap a set of sstables would cause in 
the next level, and pick the ones that would cause the *least* overlap, I think 
we could save ourselves a bunch of write amplification by keeping the hot keys 
low in the leveling.

I guess we need a threshold at which the most overlapping sstables are 
eventually compacted together as well, but maybe they should stay in the 
same level they are in?

Approach (say MOLO = 2):
* L0 -> L1 is kept the same; flushed L0 sstables are bound to overlap 
everything in L1 anyway
* Find a set of sstables in L1 that would create the least overlap in L2 and 
compact those together
* If we have many (4?) highly overlapping sstables in L1, compact those 
together, but keep them in L1
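A minimal sketch of the second and third steps, under simplifying assumptions: each sstable is reduced to an inclusive token range, "overlap in L2" is approximated by counting the L2 sstables a candidate's range intersects, and candidates are ranked one at a time rather than as a set. The threshold of 4 mirrors the tentative "(4?)" above. LeastOverlapSelector and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the "least overlap in the next level" idea.
// SSTables are reduced to inclusive token ranges; real code would use
// the sstables' first/last keys and a set-level overlap estimate.
final class LeastOverlapSelector {
    static final class SSTable {
        final String name;
        final long first;
        final long last;

        SSTable(String name, long first, long last) {
            this.name = name;
            this.first = first;
            this.last = last;
        }

        boolean intersects(SSTable other) {
            return first <= other.last && other.first <= last;
        }
    }

    private LeastOverlapSelector() {}

    /** How many sstables in `level` (other than `s` itself) intersect `s`. */
    static int overlapCount(SSTable s, List<SSTable> level) {
        int count = 0;
        for (SSTable other : level) {
            if (other != s && s.intersects(other)) {
                count++;
            }
        }
        return count;
    }

    /** Up to maxCount sstables from `source` that touch the fewest sstables in `next`. */
    static List<SSTable> leastOverlapping(List<SSTable> source, List<SSTable> next, int maxCount) {
        List<SSTable> sorted = new ArrayList<>(source);
        sorted.sort(Comparator.comparingInt(s -> overlapCount(s, next))); // stable sort
        return new ArrayList<>(sorted.subList(0, Math.min(maxCount, sorted.size())));
    }

    /** SSTables in `level` that overlap at least `threshold` others in the same level. */
    static List<SSTable> hotWithinLevel(List<SSTable> level, int threshold) {
        List<SSTable> hot = new ArrayList<>();
        for (SSTable s : level) {
            if (overlapCount(s, level) >= threshold) {
                hot.add(s);
            }
        }
        return hot;
    }

    public static void main(String[] args) {
        List<SSTable> l1 = List.of(new SSTable("a", 0, 10), new SSTable("b", 20, 30), new SSTable("c", 40, 50));
        List<SSTable> l2 = List.of(new SSTable("x", 0, 5), new SSTable("y", 6, 12), new SSTable("z", 45, 60));
        for (SSTable s : leastOverlapping(l1, l2, 2)) {
            System.out.println(s.name); // prints b, then c
        }
    }
}
```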

We probably need a better metric for overlappiness here as well (CASSANDRA-6474 
or by using the HyperLogLog component in CompactionMetadata)
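A sketch of a key-based overlap metric along those lines. To keep it short and exact, it uses plain key sets where a real implementation would use per-sstable cardinality estimators: with HyperLogLog, each set size becomes an estimate and the union becomes a merge of the sketches. OverlapMetric is a hypothetical name, and the ratio is one illustrative definition, not one taken from CASSANDRA-6474.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical overlap metric. Exact key sets stand in for per-sstable
// cardinality estimators: with HyperLogLog, |keys| becomes an estimate and the
// union becomes a merge of the sketches.
final class OverlapMetric {
    private OverlapMetric() {}

    /**
     * 1 - |union of keys| / sum of |keys per sstable|.
     * 0.0 means the sstables share no partition keys; the value grows as more
     * keys are duplicated across them (at most 1 - 1/n for n identical sstables).
     */
    static double overlapRatio(List<Set<String>> keysPerSSTable) {
        long total = 0;
        Set<String> union = new HashSet<>();
        for (Set<String> keys : keysPerSSTable) {
            total += keys.size();
            union.addAll(keys);
        }
        if (total == 0) {
            return 0.0;
        }
        return 1.0 - (double) union.size() / total;
    }

    public static void main(String[] args) {
        List<Set<String>> sstables = List.of(Set.of("k1", "k2", "k3"), Set.of("k3", "k4"));
        System.out.println(overlapRatio(sstables)); // about 0.2: 4 distinct keys out of 5
    }
}
```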

WDYT, does it make sense? Very likely I'm overthinking this, and since each 
sstable contains a wide range of keys (at least in L1), it might not make a 
difference.




[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1

2014-08-06 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087722#comment-14087722
 ] 

Jonathan Ellis commented on CASSANDRA-7409:
---

We'd want to compare read performance post-compaction as well, right?

We have some 8-disk machines that [~lyubent] can test concurrent compactors 
with. Can you specify exactly which scenario you want to test?



[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1

2014-08-05 Thread Carl Yeksigian (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14086767#comment-14086767
 ] 

Carl Yeksigian commented on CASSANDRA-7409:
---

I've updated the patch to combine OCS and LCS, and to run STCS in L0
if MOLO = 0.
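A sketch of how the combined strategy's option might be read and applied. The option name max_overlapping_level comes from the first cut of the patch, and a default of 0 matches the default of no overlapping in L1; the parsing, the validation and the OverlappingOptions class are assumptions, not the patch's code.

```java
import java.util.Map;

// Hypothetical option handling for LCS with a max overlapping level (MOLO).
// The option name comes from the thread; parsing and validation are assumptions.
final class OverlappingOptions {
    static final String MAX_OVERLAPPING_LEVEL_OPTION = "max_overlapping_level";

    final int maxOverlappingLevel;

    OverlappingOptions(Map<String, String> options) {
        // Default 0: no overlap above L0, i.e. the existing LCS behaviour.
        String raw = options.getOrDefault(MAX_OVERLAPPING_LEVEL_OPTION, "0");
        int molo;
        try {
            molo = Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                    MAX_OVERLAPPING_LEVEL_OPTION + " must be an integer: " + raw, e);
        }
        if (molo < 0) {
            throw new IllegalArgumentException(MAX_OVERLAPPING_LEVEL_OPTION + " must be >= 0: " + molo);
        }
        this.maxOverlappingLevel = molo;
    }

    /** STCS in L0 is kept only when MOLO = 0. */
    boolean useStcsInL0() {
        return maxOverlappingLevel == 0;
    }

    /** L0 always allows overlap; L1 up to L{MOLO} do as well. */
    boolean allowsOverlap(int level) {
        return level <= maxOverlappingLevel;
    }

    public static void main(String[] args) {
        OverlappingOptions opts = new OverlappingOptions(Map.of(MAX_OVERLAPPING_LEVEL_OPTION, "2"));
        System.out.println(opts.useStcsInL0() + " " + opts.allowsOverlap(2) + " " + opts.allowsOverlap(3));
        // prints false true false
    }
}
```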

I ran a test comparing the performance of LCS and OLCS, which uses the 
sstables from an incremental backup and then runs compaction on the
results.
- 3:04 to compact using LCS w/ STCS.
- 2:50 to compact using OLCS w/ MOLO = 2.

Another round of testing is needed, specifically around compactions
which can take advantage of concurrent compactors. One benefit of OLCS
is that concurrent compactors can be working between L0 and L1, up to
the max overlapping level.
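A sketch of why overlap-allowed levels help concurrency, under stated assumptions: a compaction is modelled as a target level, a set of input sstable names and the token range it covers; no sstable may feed two compactions at once; and only compactions writing to a level above MOLO need disjoint output ranges, because that level must stay non-overlapping. ConcurrentCompactionCheck and Task are hypothetical names.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical admission check for running compactions concurrently.
// Assumption: a level at or below MOLO may hold overlapping sstables, so only
// compactions whose target level is above MOLO must keep their output ranges apart.
final class ConcurrentCompactionCheck {
    static final class Task {
        final int targetLevel;
        final Set<String> inputs; // names of the sstables being compacted
        final long first;         // smallest token covered by the inputs
        final long last;          // largest token covered by the inputs

        Task(int targetLevel, Set<String> inputs, long first, long last) {
            this.targetLevel = targetLevel;
            this.inputs = inputs;
            this.first = first;
            this.last = last;
        }
    }

    private ConcurrentCompactionCheck() {}

    static boolean canStart(Task candidate, List<Task> running, int molo) {
        for (Task r : running) {
            // An sstable can only be an input to one compaction at a time.
            if (!Collections.disjoint(candidate.inputs, r.inputs)) {
                return false;
            }
            // Above MOLO the target level must stay non-overlapping, so two
            // compactions writing to it must cover disjoint token ranges.
            boolean sameTarget = r.targetLevel == candidate.targetLevel;
            boolean rangesIntersect = candidate.first <= r.last && r.first <= candidate.last;
            if (candidate.targetLevel > molo && sameTarget && rangesIntersect) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Task> running = List.of(new Task(1, Set.of("a", "b"), 0, 100));
        Task next = new Task(1, Set.of("c", "d"), 50, 150);
        System.out.println(canStart(next, running, 1)); // prints true: L1 may overlap
        System.out.println(canStart(next, running, 0)); // prints false: L1 must not overlap
    }
}
```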




[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1

2014-07-29 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077543#comment-14077543
 ] 

Marcus Eriksson commented on CASSANDRA-7409:


We should probably make this behave as standard LCS when 
MAX_OVERLAPPING_LEVEL_OPTION is 0 and not have a separate compaction strategy

Benchmarks will be very interesting



[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1

2014-07-29 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077849#comment-14077849
 ] 

Jonathan Ellis commented on CASSANDRA-7409:
---

If you mean that we should add MOLO to standard LCS instead of adding a new 
strategy, then that's my first reaction too.



[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1

2014-07-29 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077856#comment-14077856
 ] 

Jonathan Ellis commented on CASSANDRA-7409:
---

I don't see any showstoppers to this approach when comparing LCS/OCS and LM/OM.



[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1

2014-07-29 Thread Carl Yeksigian (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077871#comment-14077871
 ] 

Carl Yeksigian commented on CASSANDRA-7409:
---

No, it should be easy to do; I just wanted to keep it separate to be able to 
compare them quickly (and compare the code).

Just need to remember to add back the STCS in L0 that was there before if 
MOLO = 0.



[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1

2014-07-28 Thread Carl Yeksigian (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077156#comment-14077156
 ] 

Carl Yeksigian commented on CASSANDRA-7409:
---

I have a first cut of this working now at 
https://github.com/carlyeks/cassandra/tree/overlapping

This adds a new compaction strategy called 'Overlapping', which operates mostly 
the same as 'Leveled' when max_overlapping_level is configured to 0, except L0 
does not do any STCS. When max_overlapping_level is set to non-zero, it will 
compact without selecting non-overlapping sstables, and will not include any 
sstables from an upper level.
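The input selection that distinguishes the two modes can be sketched roughly as follows. The sketch reduces sstables to inclusive token ranges. It reads "will not include any sstables from an upper level" as follows: when the target level is allowed to overlap, no target-level sstables are pulled into the compaction. UplevelCandidates is a hypothetical name, not a class in the branch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of choosing the inputs for compacting sstables from one
// level into the next. SSTables are reduced to inclusive token ranges.
final class UplevelCandidates {
    static final class SSTable {
        final String name;
        final long first;
        final long last;

        SSTable(String name, long first, long last) {
            this.name = name;
            this.first = first;
            this.last = last;
        }

        boolean intersects(SSTable other) {
            return first <= other.last && other.first <= last;
        }
    }

    private UplevelCandidates() {}

    /**
     * Inputs for compacting `sources` into `targetLevel`.
     * Classic LCS (targetLevel > molo): also rewrite every target-level sstable
     * that intersects a source, so the target level stays non-overlapping.
     * Overlapping mode (targetLevel <= molo): the target level may overlap,
     * so only the sources are rewritten.
     */
    static List<SSTable> inputs(List<SSTable> sources, List<SSTable> targetLevelSSTables,
                                int targetLevel, int molo) {
        List<SSTable> inputs = new ArrayList<>(sources);
        if (targetLevel <= molo) {
            return inputs;
        }
        for (SSTable candidate : targetLevelSSTables) {
            for (SSTable source : sources) {
                if (source.intersects(candidate)) {
                    inputs.add(candidate);
                    break;
                }
            }
        }
        return inputs;
    }

    public static void main(String[] args) {
        List<SSTable> l0 = List.of(new SSTable("s1", 0, 10), new SSTable("s2", 5, 20));
        List<SSTable> l1 = List.of(new SSTable("n1", 0, 3), new SSTable("n2", 15, 30),
                new SSTable("n3", 40, 50));
        System.out.println(inputs(l0, l1, 1, 0).size()); // prints 4: s1, s2, n1, n2
        System.out.println(inputs(l0, l1, 1, 1).size()); // prints 2: s1, s2
    }
}
```

With MOLO = 0 this is the behaviour described in the issue (L0 sstables plus all overlapping L1 sstables); with MOLO >= 1, the L0 to L1 step no longer drags L1 data along, which is what lets more L0 sstables be compacted at once.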

Also, added a new nodetool command to list the sstables in each level for both 
leveled and overlapping. 

I haven't benchmarked this strategy yet to compare with regular leveled; that's 
going to be what I work on next for this.
