I tried this; however, the doubling in disk space is not "temporary" as you state in your note. What am I missing?

On Fri, Apr 11, 2014 at 10:44 AM, William Oberman <ober...@civicscience.com> wrote:

> So, if I was impatient and just "wanted to make this happen now", I could:
>
> 1.) Change GCGraceSeconds of the CF to 0
> 2.) run nodetool compact (*)
> 3.) Change GCGraceSeconds of the CF back to 10 days
>
> Since I have ~900M tombstones, even if I miss a few due to impatience, I
> don't care *that* much as I could re-run my clean up tool against the now
> much smaller CF.
>
> (*) A long long time ago I seem to recall reading advice about "don't ever
> run nodetool compact", but I can't remember why. Is there any bad long
> term consequence? Short term there are several:
> -a heavy operation
> -temporary 2x disk space
> -one big SSTable afterwards
> But moving forward, everything is ok right? CommitLog/MemTable->SStables,
> minor compactions that merge SSTables, etc... The only flaw I can think of
> is it will take forever until the SSTable minor compactions build up enough
> to consider including the big SSTable in a compaction, making it likely
> I'll have to self manage compactions.
>
>
> On Fri, Apr 11, 2014 at 10:31 AM, Mark Reddy <mark.re...@boxever.com> wrote:
>
>> Correct, a tombstone will only be removed after gc_grace period has
>> elapsed. The default value is set to 10 days which allows a great deal of
>> time for consistency to be achieved prior to deletion. If you are
>> operationally confident that you can achieve consistency via anti-entropy
>> repairs within a shorter period you can always reduce that 10 day interval.
>>
>>
>> Mark
>>
>>
>> On Fri, Apr 11, 2014 at 3:16 PM, William Oberman <
>> ober...@civicscience.com> wrote:
>>
>>> I'm seeing a lot of articles about a dependency between removing
>>> tombstones and GCGraceSeconds, which might be my problem (I just checked,
>>> and this CF has GCGraceSeconds of 10 days).
>>>
>>>
>>> On Fri, Apr 11, 2014 at 10:10 AM, tommaso barbugli
>>> <tbarbu...@gmail.com> wrote:
>>>
>>>> compaction should take care of it; for me it never worked so I run
>>>> nodetool compaction on every node; that does it.
>>>>
>>>>
>>>> 2014-04-11 16:05 GMT+02:00 William Oberman <ober...@civicscience.com>:
>>>>
>>>>> I'm wondering what will clear tombstoned rows? nodetool cleanup,
>>>>> nodetool repair, or time (as in just wait)?
>>>>>
>>>>> I had a CF that was more or less storing session information. After
>>>>> some time, we decided that one piece of this information was pointless to
>>>>> track (and was 90%+ of the columns, and in 99% of those cases was ALL
>>>>> columns for a row). I wrote a process to remove all of those columns
>>>>> (which again in a vast majority of cases had the effect of removing the
>>>>> whole row).
>>>>>
>>>>> This CF had ~1 billion rows, so I expect to be left with ~100m rows.
>>>>> After I did this mass delete, everything was the same size on disk (which
>>>>> I expected, knowing how tombstoning works). It wasn't 100% clear to me
>>>>> what to poke to cause compactions to clear the tombstones. First I tried
>>>>> nodetool cleanup on a candidate node. But, afterwards the disk usage was
>>>>> the same. Then I tried nodetool repair on that same node. But again, disk
>>>>> usage is still the same. The CF has no snapshots.
>>>>>
>>>>> So, am I misunderstanding something? Is there another operation to
>>>>> try? Do I have to "just wait"? I've only done cleanup/repair on one node.
>>>>> Do I have to run one or the other over all nodes to clear tombstones?
>>>>>
>>>>> Cassandra 1.2.15 if it matters,
>>>>>
>>>>> Thanks!
>>>>>
>>>>> will
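
For anyone reading this thread later, the time condition Mark describes (a tombstone only becomes removable once gc_grace has elapsed) can be sketched in a few lines of Python. This is a simplified model under my own assumptions: the function and constant names are hypothetical and are not Cassandra APIs, and it covers only the gc_grace check, not the compaction that still has to rewrite the SSTables before any disk space comes back.

```python
# Illustrative sketch only: these names are hypothetical, not Cassandra APIs.
DAY_SECONDS = 24 * 60 * 60
# The 10-day GCGraceSeconds value mentioned in the thread, in seconds.
DEFAULT_GC_GRACE_SECONDS = 10 * DAY_SECONDS


def tombstone_is_purgeable(deleted_at, now, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """Return True once more than gc_grace_seconds have passed since the delete.

    Both timestamps are seconds since the epoch. This models the time
    condition only; a compaction still has to rewrite the SSTable that
    holds the tombstone before the space is reclaimed.
    """
    return now - deleted_at > gc_grace_seconds


if __name__ == "__main__":
    deleted_at = 1_397_225_100  # arbitrary example timestamp
    one_hour_later = deleted_at + 3600

    # With the 10-day grace period, a one-hour-old tombstone is kept.
    print(tombstone_is_purgeable(deleted_at, one_hour_later))
    # With the grace period set to 0 (step 1 in the quoted plan), it passes.
    print(tombstone_is_purgeable(deleted_at, one_hour_later, gc_grace_seconds=0))
```

With the grace period at 0, any tombstone written before "now" passes this check, which is why the quoted three-step plan expects a compaction run at that point to be able to drop them.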