[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction

2015-01-07 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268393#comment-14268393
 ] 

Jonathan Ellis commented on CASSANDRA-7019:
---

Did you mark 7272 as a duplicate by mistake instead of this one?

 Major tombstone compaction
 --

 Key: CASSANDRA-7019
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7019
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
Assignee: Marcus Eriksson
  Labels: compaction
 Fix For: 3.0


 It should be possible to do a major tombstone compaction by including all 
 sstables, but writing them out 1:1, meaning that if you have 10 sstables 
 before, you will have 10 sstables after the compaction with the same data, 
 minus all the expired tombstones.
 We could do this in two ways:
 # a nodetool command that includes _all_ sstables
 # once we detect that an sstable has more than x% (20%?) expired tombstones, 
 we start one of these compactions, and include all overlapping sstables that 
 contain older data.
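The second option could be sketched roughly like this (a hypothetical sketch; the SSTable fields and the 20% threshold are illustrative stand-ins, not Cassandra's actual types):

```python
# Hypothetical sketch of option 2: trigger a tombstone compaction when an
# sstable's expired-tombstone ratio passes a threshold, and include all
# overlapping sstables that contain older data. Field names are illustrative.
from dataclasses import dataclass

TOMBSTONE_RATIO_THRESHOLD = 0.20  # the "x% (20%?)" from the description

@dataclass
class SSTable:
    name: str
    first_token: int
    last_token: int
    min_timestamp: int             # oldest data in this sstable
    expired_tombstone_ratio: float

def should_trigger(s: SSTable) -> bool:
    return s.expired_tombstone_ratio > TOMBSTONE_RATIO_THRESHOLD

def overlaps(a: SSTable, b: SSTable) -> bool:
    # token ranges intersect
    return a.first_token <= b.last_token and b.first_token <= a.last_token

def candidates(trigger: SSTable, all_sstables: list) -> list:
    # all overlapping sstables that contain older data than the trigger
    return [s for s in all_sstables
            if s is not trigger
            and overlaps(s, trigger)
            and s.min_timestamp < trigger.min_timestamp]
```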



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction

2014-12-17 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249979#comment-14249979
 ] 

T Jake Luciani commented on CASSANDRA-7019:
---

Just a note that this should also work on repaired sstables. As mentioned in 
CASSANDRA-7272, we repair the entire partition, so we will end up with N copies 
of a partition in the repaired sstables.


[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction

2014-12-05 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14235992#comment-14235992
 ] 

Jonathan Ellis commented on CASSANDRA-7019:
---

That probably fits how compaction thinks of its job better than trying to do 
it 1:1. +1 from me.


[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction

2014-12-02 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231398#comment-14231398
 ] 

Marcus Eriksson commented on CASSANDRA-7019:


WDYT, [~jbellis]? Should I finish up the patch as a major compaction for LCS?


[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction

2014-10-02 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156144#comment-14156144
 ] 

Marcus Eriksson commented on CASSANDRA-7019:


bq. Just rewrite the existing tables minus tombstones without merging or 
changing levels, as originally proposed

Yeah, this is probably the way to go; it just hurts to have the compacted 
partition in hand and then throw it away.



[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction

2014-10-02 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156277#comment-14156277
 ] 

Aleksey Yeschenko commented on CASSANDRA-7019:
--

I dislike option 1, for personal reasons: I was really hoping to use major 
tombstone compaction for CASSANDRA-7975. Without it, I don't see a direct 
solution for LCS counter tables.



[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction

2014-10-02 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156278#comment-14156278
 ] 

Marcus Eriksson commented on CASSANDRA-7019:


[~iamaleksey] we could get rid of the shards in the Reducer the same way we 
would get rid of tombstones (i.e., merge all counter shards and put them in a 
single sstable).

Though it would perhaps be nice to keep this as a major compaction for LCS in 
addition to option 1.


[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction

2014-10-02 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156595#comment-14156595
 ] 

Jeremiah Jordan commented on CASSANDRA-7019:


I think we should definitely keep the major-compact LCS option, and change 
STCS major compaction to this new way. If we also want a delete/tombstone-only 
compaction that just rewrites existing files without expired tombstones and 
deleted data (kind of like what half the people out there think cleanup 
currently does), that works too.

That mode is a little harder to implement if we want it to remove tombstones 
and dead data, though, since you need to rewrite multiple files at once. 
Without that, you can't remove all the old tombstones; you can only remove the 
ones we were being too conservative about throwing out because we do just a 
bloom filter check, not a full read, to see whether the tombstone still 
shadows existing data.
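The conservative check described here could be sketched as follows (illustrative names, not Cassandra's internals; a plain key set stands in for the bloom filter):

```python
# Hedged sketch: a tombstone past gc_grace can only be dropped if no
# overlapping sstable might still contain older data that it shadows.
# Cassandra answers "might contain" with a bloom filter; here a plain key
# set stands in for that check.
from dataclasses import dataclass

@dataclass
class SSTableInfo:
    keys: set           # stand-in for the bloom filter
    min_timestamp: int  # oldest data in this sstable

def can_drop_tombstone(partition_key, tombstone_ts, past_gc_grace,
                       overlapping):
    if not past_gc_grace:
        return False
    for s in overlapping:
        # A false positive here (as a real bloom filter can give) just
        # keeps the tombstone around longer: conservative, never wrong.
        if partition_key in s.keys and s.min_timestamp < tombstone_ts:
            return False
    return True
```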



[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction

2014-10-01 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14155144#comment-14155144
 ] 

Jonathan Ellis commented on CASSANDRA-7019:
---

bq. The problem with starting in high levels is that it will take a long time 
before that data gets included in a (minor) compaction.

But you already have that problem, just with 90% of your data instead of 100%.

IMO the two options that make the most sense are:

# Just rewrite the existing tables minus tombstones without merging or changing 
levels, as originally proposed
# Write all the sstables out, then pick a level for them when complete such 
that all the sstables fit in the level (and they don't overlap with anything 
flushed + compacted by other threads in the meantime)



[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction

2014-09-25 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14147823#comment-14147823
 ] 

Marcus Eriksson commented on CASSANDRA-7019:


branch here: https://github.com/krummas/cassandra/commits/marcuse/7019-2

triggered with nodetool compact -o ks cf

It writes fully compacted partitions: each partition will be in only one 
sstable. My first idea was to put the cells back in the corresponding files 
where they were found (minus tombstones), but it felt wrong not to write out 
the compacted partition when we have it.

LCS:
* creates an 'optimal' leveling - it takes all existing files, compacts them, 
and starts filling each level from L0 up
** note that (if we have token range 0 - 1000) L1 will get tokens 0-10, L2 
11-100 and L3 101-1000. I have not thought much about whether this is good or 
bad for future compactions.

STCS:
* calculates an 'optimal' distribution of sstables; currently it makes them 
50%, 25%, 12.5%, ... of total data size until the smallest sstable would be 
under 50MB, then puts all the rest in the last sstable. If anyone has a better 
sstable distribution, please let me know
** the sstables will be non-overlapping; it writes the biggest sstable first 
and continues with the rest once 50% of the data is in it
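The STCS distribution described above could be sketched like this (an illustrative sketch, not the patch's actual code; it assumes the 50MB floor mentioned):

```python
# Sketch of the STCS size distribution: halve the remaining data
# repeatedly (50%, 25%, 12.5%, ...) until the next slice would drop below
# 50MB, then put everything left into the last sstable.
MIN_SSTABLE_BYTES = 50 * 1024 * 1024

def stcs_distribution(total_bytes: int) -> list:
    sizes = []
    slice_bytes = total_bytes // 2
    remaining = total_bytes
    while slice_bytes >= MIN_SSTABLE_BYTES and slice_bytes < remaining:
        sizes.append(slice_bytes)
        remaining -= slice_bytes
        slice_bytes //= 2
    sizes.append(remaining)  # all the rest goes in the last sstable
    return sizes
```

For 400MB of data this yields slices of 200MB, 100MB, 50MB, and a final 50MB remainder, matching the 50%/25%/12.5% pattern with the sub-50MB tail folded into the last file.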


[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction

2014-09-25 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14147831#comment-14147831
 ] 

Jeremiah Jordan commented on CASSANDRA-7019:


Since this is going in 3.0, maybe we should make this the default nodetool 
compact. I don't know of any case where STCS's put-everything-in-one-file 
behavior is really what people want, and for LCS all we used to do is run the 
compaction task like normal. If we still want a way to kick off compaction for 
LCS, we could add a new nodetool checkcompaction command or something that 
just schedules the compaction manager to run (for both STCS and LCS). That is 
useful when someone changes compaction settings and there are no writes 
currently happening to the system, so making it an explicit command sounds 
right to me.



[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction

2014-09-25 Thread Carl Yeksigian (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14147843#comment-14147843
 ] 

Carl Yeksigian commented on CASSANDRA-7019:
---

For LCS, we might be artificially penalizing early tokens. What if we started 
at the highest level we are currently storing data in, instead of at L1? That 
is a good proxy for the size of the data we are currently storing, and it 
avoids unnecessarily recompacting data just because we placed it in such a low 
level.

I'm +1 on [~jjordan]'s proposal to change the default to this; I'd rather add 
an option to compact that starts minor compactions than add a new command.



[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction

2014-09-25 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14147892#comment-14147892
 ] 

sankalp kohli commented on CASSANDRA-7019:
--

[~carlyeks] Can you explain your idea about placing sstables? If the 
application is using up to, say, L4, should we fill L4, then L3, and so on?



[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction

2014-09-25 Thread Carl Yeksigian (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14147903#comment-14147903
 ] 

Carl Yeksigian commented on CASSANDRA-7019:
---

I was thinking L4, then L5 (as in this patch, currently). Ideally we would 
pick the level where all of the sstables would fit, but we don't know how many 
sstables the compaction will end up producing, so this seems like a 
compromise. It is similar to the thinking in CASSANDRA-6323.



[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction

2014-09-25 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14148058#comment-14148058
 ] 

Marcus Eriksson commented on CASSANDRA-7019:


The problem with starting in high levels is that it will take a long time 
before that data gets included in a (minor) compaction. This is basically a 
major compaction (like in current STCS).

The alternative to putting low tokens in lower levels is to write all levels 
at the same time and randomly distribute the tokens over the levels (putting 
1% in L1, 10% in L2, and 89% in L3), but I can't really see any difference 
compared to having the low tokens in one sstable; the number of overlapping 
tokens between a newly flushed file in L0 and L1 should be the same (if tokens 
are evenly distributed over the flushed sstable).
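The 1%/10%/89% split follows from LCS's 10x per-level growth; a quick illustrative calculation (arithmetic sketch, not Cassandra code):

```python
# With LCS each level holds roughly 10x the previous one (L1 ~10 sstables,
# L2 ~100, L3 ~1000 in relative terms). Filling from the bottom up, the
# fraction of total data landing in each level is its relative capacity
# divided by the total.
def level_fractions(n_levels: int) -> list:
    caps = [10 ** i for i in range(1, n_levels + 1)]  # relative capacities
    total = sum(caps)
    return [c / total for c in caps]
```

With three levels this gives roughly 0.9%, 9%, and 90%: close to the 1%/10%/89% figures quoted above.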



[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction

2014-09-25 Thread Carl Yeksigian (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14148186#comment-14148186
 ] 

Carl Yeksigian commented on CASSANDRA-7019:
---

I have no problem with making it consistent but arbitrary which tokens go 
into L1/L2; I just thought it would be better to put all of them in the same 
level, since they'll move there eventually. I think you're right, though: they 
would end up not being included in minor compactions, so it would continually 
require a major tombstone compaction.



[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction

2014-09-24 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14146662#comment-14146662
 ] 

sankalp kohli commented on CASSANDRA-7019:
--

[~krummas] Thanks for picking this up :). I think we can do other 
optimizations, like putting all tombstones in the last level so that they can 
be dropped easily once they are past gc grace. Once we have repair-aware gc 
grace, that will no longer be required.



[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction

2014-09-24 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14146685#comment-14146685
 ] 

Marcus Eriksson commented on CASSANDRA-7019:


[~kohlisankalp] I'll post a proof-of-concept patch for option 1 in the 
description tomorrow. The idea is basically to run a major compaction, but 
have the compaction strategy decide on an 'optimal' sstable distribution for 
the strategy instead of just creating one big sstable: for LCS it simply fills 
levels from level 1 and up, and for STCS it creates sstables where one has 50% 
of the data, one 25%, etc., until the sstables get too small.

This is mostly for the 'oh crap, we have a ton of tombstones and need to get 
rid of them' case, not for the day-to-day case; we need to figure out 
something more for that (like your idea, perhaps).



[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction

2014-07-14 Thread Alexey Plotnik (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061378#comment-14061378
 ] 

Alexey Plotnik commented on CASSANDRA-7019:
---

That's what we need. We have LCS and a lot of sstables, and the compaction 
process always retains a lot of tombstones. We need something like a 
super-compaction, or tombstone compaction, or whatever you want to call it: a 
procedure similar to cleanup, but for tombstone deletion. I like it.



[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction

2014-05-21 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14005197#comment-14005197
 ] 

sankalp kohli commented on CASSANDRA-7019:
--

I also like this idea. If you have IOPS to spare, why not compact across 
levels and get rid of the extra data? I think we should call it multilevel 
compaction. The number of tombstones is one way to trigger it.

