At the cost of really quite a lot of compaction, you can temporarily switch to SizeTiered, and when that is completely done (check each node), switch back to Leveled.
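Sketching that "laundry twice" switch in 1.2-era CQL3, with hypothetical keyspace/table names `myks`/`mytable` (the `sstable_size_in_mb` value is illustrative; schema changes are cluster-wide, so each ALTER is run once):

```shell
# Temporarily switch the table to size-tiered compaction; this triggers
# a full recompaction that can purge tombstones past gc_grace
echo "ALTER TABLE myks.mytable
      WITH compaction = {'class': 'SizeTieredCompactionStrategy'};" | cqlsh

# Wait until compactions have completely settled on EVERY node
nodetool compactionstats

# Then switch back to leveled compaction (the second laundry load)
echo "ALTER TABLE myks.mytable
      WITH compaction = {'class': 'LeveledCompactionStrategy',
                         'sstable_size_in_mb': 160};" | cqlsh
```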
It's like doing the laundry twice :) I've done this on CFs that were about 5 GB, but I don't see why it wouldn't work on larger ones.

ml

On Fri, Apr 11, 2014 at 1:33 PM, Paulo Ricardo Motta Gomes <paulo.mo...@chaordicsystems.com> wrote:

This thread is really informative, thanks for the good feedback.

My question is: is there a way to force tombstones to be cleared with LCS? Does scrub help in any case? Or would the only solution be to create a new CF and migrate all the data, if you intend to do a large CF cleanup?

Cheers,

On Fri, Apr 11, 2014 at 2:02 PM, Mark Reddy <mark.re...@boxever.com> wrote:

That's great Will; if you could update the thread with the actions you decide to take and the results, that would be great.

Mark

On Fri, Apr 11, 2014 at 5:53 PM, William Oberman <ober...@civicscience.com> wrote:

I've learned a *lot* from this thread. My thanks to all of the contributors!

Paulo: Good luck with LCS. I wish I could help there, but all of my CFs are SizeTiered (mostly as I'm on the same schema/same settings since 0.7...)

will

On Fri, Apr 11, 2014 at 12:14 PM, Mina Naguib <mina.nag...@adgear.com> wrote:

Levelled Compaction is a wholly different beast when it comes to tombstones.

The tombstones are inserted, like any other write really, at the lower levels in the leveldb hierarchy.

They are only removed after they have had the chance to "naturally" migrate upwards in the leveldb hierarchy to the highest level in your data store. How long that takes depends on:
1. The amount of data in your store and the number of levels your LCS strategy has
2. The amount of new writes entering the bottom funnel of your leveldb, forcing upwards compaction and combining

To give you an idea, I had a similar scenario and ran a (slow, throttled) delete job on my cluster around December-January.
Here's a graph of the disk space usage on one node. Notice the still-declining usage long after the cleanup job finished (sometime in January). I tend to think of tombstones in LCS as little bombs that get to explode much later in time:

http://mina.naguib.ca/images/tombstones-cassandra-LCS.jpg

On 2014-04-11, at 11:20 AM, Paulo Ricardo Motta Gomes <paulo.mo...@chaordicsystems.com> wrote:

I have a similar problem here. I deleted about 30% of a very large CF using LCS (about 80 GB per node), but my data still hasn't shrunk, even though I used 1 day for gc_grace_seconds. Would nodetool scrub help? Does nodetool scrub force a minor compaction?

Cheers,

Paulo

On Fri, Apr 11, 2014 at 12:12 PM, Mark Reddy <mark.re...@boxever.com> wrote:

Yes, running nodetool compact (major compaction) creates one large SSTable. This will mess up the heuristics of the SizeTiered strategy (is this the compaction strategy you are using?), leading to multiple 'small' SSTables alongside the single large SSTable, which results in increased read latency. You will incur the operational overhead of having to manage compactions if you wish to compact these smaller SSTables. For all these reasons it is generally advised to stay away from running compactions manually.

Assuming that this is a production environment and you want to keep everything running as smoothly as possible, I would reduce the gc_grace on the CF, allow automatic minor compactions to kick in, and then increase the gc_grace once again after the tombstones have been removed.

On Fri, Apr 11, 2014 at 3:44 PM, William Oberman <ober...@civicscience.com> wrote:

So, if I was impatient and just "wanted to make this happen now", I could:

1.) Change GCGraceSeconds of the CF to 0
2.) Run nodetool compact (*)
3.) Change GCGraceSeconds of the CF back to 10 days

Since I have ~900M tombstones, even if I miss a few due to impatience, I don't care *that* much, as I could re-run my cleanup tool against the now much smaller CF.

(*) A long, long time ago I seem to recall reading advice about "don't ever run nodetool compact", but I can't remember why. Is there any bad long-term consequence? Short term there are several:
- a heavy operation
- temporary 2x disk space
- one big SSTable afterwards
But moving forward, everything is OK, right? CommitLog/MemTable -> SSTables, minor compactions that merge SSTables, etc... The only flaw I can think of is that it will take forever until the SSTable minor compactions build up enough to consider including the big SSTable in a compaction, making it likely I'll have to self-manage compactions.

On Fri, Apr 11, 2014 at 10:31 AM, Mark Reddy <mark.re...@boxever.com> wrote:

Correct, a tombstone will only be removed after the gc_grace period has elapsed. The default value is set to 10 days, which allows a great deal of time for consistency to be achieved prior to deletion. If you are operationally confident that you can achieve consistency via anti-entropy repairs within a shorter period, you can always reduce that 10-day interval.

Mark

On Fri, Apr 11, 2014 at 3:16 PM, William Oberman <ober...@civicscience.com> wrote:

I'm seeing a lot of articles about a dependency between removing tombstones and GCGraceSeconds, which might be my problem (I just checked, and this CF has GCGraceSeconds of 10 days).
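William's three-step "make this happen now" sequence can be sketched as follows, again with hypothetical `myks`/`mytable` names (864000 seconds is the 10-day default; note Mark's warning that gc_grace = 0 removes the safety window for repairs, so a replica that missed the delete can resurrect data):

```shell
# 1.) Drop GCGraceSeconds to 0 so existing tombstones become purgeable now
echo "ALTER TABLE myks.mytable WITH gc_grace_seconds = 0;" | cqlsh

# 2.) Major compaction, run on each node: merges everything into one big
#     SSTable, dropping tombstones older than gc_grace along the way
nodetool compact myks mytable

# 3.) Restore the 10-day default once compaction has finished everywhere
echo "ALTER TABLE myks.mytable WITH gc_grace_seconds = 864000;" | cqlsh
```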
On Fri, Apr 11, 2014 at 10:10 AM, tommaso barbugli <tbarbu...@gmail.com> wrote:

Compaction should take care of it; for me it never worked, so I run nodetool compact on every node; that does it.

2014-04-11 16:05 GMT+02:00 William Oberman <ober...@civicscience.com>:

I'm wondering what will clear tombstoned rows: nodetool cleanup, nodetool repair, or time (as in just wait)?

I had a CF that was more or less storing session information. After some time, we decided that one piece of this information was pointless to track (and was 90%+ of the columns, and in 99% of those cases was ALL columns for a row). I wrote a process to remove all of those columns (which, again, in the vast majority of cases had the effect of removing the whole row).

This CF had ~1 billion rows, so I expect to be left with ~100M rows. After I did this mass delete, everything was the same size on disk (which I expected, knowing how tombstoning works). It wasn't 100% clear to me what to poke to cause compactions to clear the tombstones. First I tried nodetool cleanup on a candidate node, but afterwards the disk usage was the same. Then I tried nodetool repair on that same node, but again, disk usage is still the same. The CF has no snapshots.

So, am I misunderstanding something? Is there another operation to try? Do I have to "just wait"? I've only done cleanup/repair on one node. Do I have to run one or the other over all nodes to clear tombstones?

Cassandra 1.2.15 if it matters.

Thanks!

will

--
*Paulo Motta*

Chaordic | *Platform*
www.chaordic.com.br
+55 48 3232.3200
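For anyone landing on this thread later, a short sketch of how to observe what the thread concludes: `nodetool cleanup` only drops data a node no longer owns (after ring changes) and `repair` only synchronizes replicas; tombstones go away only via compaction once gc_grace has passed. The commands below are 1.2-era nodetool, with the same hypothetical `mytable` name:

```shell
# Per-CF disk usage and SSTable counts; compare before and after
# compactions run (the grep window size is arbitrary)
nodetool cfstats | grep -A 20 "Column Family: mytable"

# See whether the compaction machinery is actually doing anything
nodetool compactionstats

# Flush memtables so recent deletes land in SSTables and become
# candidates for tombstone collection in the next compaction
nodetool flush myks mytable
```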