If you have a data model with long-lived and frequently updated rows, you can get around the "all fragments" problem by running a user defined compaction.
Look for the CompactionManagerMBean on the JMX API:
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionManagerMBean.java#L67

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 5/03/2013, at 1:52 AM, Michal Michalski <mich...@opera.com> wrote:

> > I have read in the documentation that after a major compaction,
> > minor compactions are no longer automatically triggered.
> > Does this mean that I have to run nodetool compact regularly? Or
> > is there a way to get back to automatic minor compactions?
>
> I think it's one of the most confusing parts of the C* docs.
>
> There's nothing like a "switch" for minor compactions that gets magically
> turned off when you trigger a major compaction. Minor compactions won't get
> triggered automatically for _some_ time, because you'll only have one
> gargantuan SSTable, and until you get enough new (smaller) SSTables to get
> them compacted together (4 by default), no compactions will kick in.
>
> Of course you'll still have one huge SSTable and it will take a lot of time
> to get another 3 of similar size to get them compacted. I think that it will
> be a problem for your TTL-based data model, as you'll have tons of tombstones
> in the newer/smaller SSTables that you won't be able to compact together with
> the huge SSTable containing data.
>
> BTW: As far as I remember, there was an "external" tool (I don't remember the
> name) that allows splitting SSTables. I haven't used it, so I can't recommend
> it, but you may want to give it a try.
>
> M.
>
> On 05.03.2013 09:46, Matthias Zeilinger wrote:
>> Short question afterwards:
>>
>> I have read in the documentation that after a major compaction, minor
>> compactions are no longer automatically triggered.
>> Does this mean that I have to run nodetool compact regularly? Or is there
>> a way to get back to automatic minor compactions?
>>
>> Thx,
>>
>> Br,
>> Matthias Zeilinger
>> Production Operation – Shared Services
>>
>> P: +43 (0) 50 858-31185
>> M: +43 (0) 664 85-34459
>> E: matthias.zeilin...@bwinparty.com
>>
>> bwin.party services (Austria) GmbH
>> Marxergasse 1B
>> A-1030 Vienna
>>
>> www.bwinparty.com
>>
>>
>> -----Original Message-----
>> From: Matthias Zeilinger [mailto:matthias.zeilin...@bwinparty.com]
>> Sent: Tuesday, 5 March 2013 08:03
>> To: user@cassandra.apache.org
>> Subject: RE: old data / tombstones are not deleted after ttl
>>
>> Yes, it was a major compaction.
>> I know it's not a great solution, but I needed something to get rid of the
>> old data, because I ran out of disk space.
>>
>> Br,
>> Matthias Zeilinger
>>
>>
>> -----Original Message-----
>> From: Michal Michalski [mailto:mich...@opera.com]
>> Sent: Tuesday, 5 March 2013 07:47
>> To: user@cassandra.apache.org
>> Subject: Re: old data / tombstones are not deleted after ttl
>>
>> Was it a major compaction? I ask because it's definitely a solution that had
>> to work, but it's also a solution that, in general, probably no one here
>> would suggest you use.
>>
>> M.
>>
>> On 05.03.2013 07:08, Matthias Zeilinger wrote:
>>> Hi,
>>>
>>> I have done a manual compaction with nodetool and this worked.
>>> But thanks for the explanation of why it wasn't compacted.
>>>
>>> Br,
>>> Matthias Zeilinger
>>>
>>> From: Bryan Talbot [mailto:btal...@aeriagames.com]
>>> Sent: Monday, 4 March 2013 23:36
>>> To: user@cassandra.apache.org
>>> Subject: Re: old data / tombstones are not deleted after ttl
>>>
>>> Those older files won't be included in a compaction until there are
>>> min_compaction_threshold (4) files of that size. When you get another
>>> SSTable -Data.db file that is about 12-18GB then you'll have 4 and they will
>>> be compacted together into one new file. At that time, if there are any
>>> rows with only tombstones that are all older than gc_grace, the row will be
>>> removed (assuming the row exists exclusively in the 4 input SSTables).
>>> Columns with data that is more than TTL seconds old will be written with a
>>> tombstone. If the row does have column values in SSTables that are not
>>> being compacted, the row will not be removed.
>>>
>>>
>>> -Bryan
>>>
>>> On Sun, Mar 3, 2013 at 11:07 PM, Matthias Zeilinger
>>> <matthias.zeilin...@bwinparty.com> wrote:
>>> Hi,
>>>
>>> I'm running Cassandra 1.1.5 and have the following issue.
>>>
>>> I'm using a 10-day TTL on my CF. I can see a lot of tombstones in there,
>>> but they aren't deleted after compaction.
>>>
>>> I have tried a nodetool cleanup and also a restart of Cassandra, but
>>> nothing happened.
>>>
>>> total 61G
>>> drwxr-xr-x  2 cassandra dba  20K Mar  4 06:35 .
>>> drwxr-xr-x 10 cassandra dba 4.0K Dec 10 13:05 ..
>>> -rw-r--r-- 1 cassandra dba  15M Dec 15 22:04 whatever-he-1398-CompressionInfo.db
>>> -rw-r--r-- 1 cassandra dba  19G Dec 15 22:04 whatever-he-1398-Data.db
>>> -rw-r--r-- 1 cassandra dba  15M Dec 15 22:04 whatever-he-1398-Filter.db
>>> -rw-r--r-- 1 cassandra dba 357M Dec 15 22:04 whatever-he-1398-Index.db
>>> -rw-r--r-- 1 cassandra dba 4.3K Dec 15 22:04 whatever-he-1398-Statistics.db
>>> -rw-r--r-- 1 cassandra dba 9.5M Feb  6 15:45 whatever-he-5464-CompressionInfo.db
>>> -rw-r--r-- 1 cassandra dba  12G Feb  6 15:45 whatever-he-5464-Data.db
>>> -rw-r--r-- 1 cassandra dba  48M Feb  6 15:45 whatever-he-5464-Filter.db
>>> -rw-r--r-- 1 cassandra dba 736M Feb  6 15:45 whatever-he-5464-Index.db
>>> -rw-r--r-- 1 cassandra dba 4.3K Feb  6 15:45 whatever-he-5464-Statistics.db
>>> -rw-r--r-- 1 cassandra dba 9.7M Feb 21 19:13 whatever-he-6829-CompressionInfo.db
>>> -rw-r--r-- 1 cassandra dba  12G Feb 21 19:13 whatever-he-6829-Data.db
>>> -rw-r--r-- 1 cassandra dba  47M Feb 21 19:13 whatever-he-6829-Filter.db
>>> -rw-r--r-- 1 cassandra dba 792M Feb 21 19:13 whatever-he-6829-Index.db
>>> -rw-r--r-- 1 cassandra dba 4.3K Feb 21 19:13 whatever-he-6829-Statistics.db
>>> -rw-r--r-- 1 cassandra dba 3.7M Mar  1 10:46 whatever-he-7578-CompressionInfo.db
>>> -rw-r--r-- 1 cassandra dba 4.3G Mar  1 10:46 whatever-he-7578-Data.db
>>> -rw-r--r-- 1 cassandra dba  12M Mar  1 10:46 whatever-he-7578-Filter.db
>>> -rw-r--r-- 1 cassandra dba 274M Mar  1 10:46 whatever-he-7578-Index.db
>>> -rw-r--r-- 1 cassandra dba 4.3K Mar  1 10:46 whatever-he-7578-Statistics.db
>>> -rw-r--r-- 1 cassandra dba 3.6M Mar  1 11:21 whatever-he-7582-CompressionInfo.db
>>> -rw-r--r-- 1 cassandra dba 4.3G Mar  1 11:21 whatever-he-7582-Data.db
>>> -rw-r--r-- 1 cassandra dba 9.7M Mar  1 11:21 whatever-he-7582-Filter.db
>>> -rw-r--r-- 1 cassandra dba 236M Mar  1 11:21 whatever-he-7582-Index.db
>>> -rw-r--r-- 1 cassandra dba 4.3K Mar  1 11:21 whatever-he-7582-Statistics.db
>>> -rw-r--r-- 1 cassandra dba 3.7M Mar  3 12:13 whatever-he-7869-CompressionInfo.db
>>> -rw-r--r-- 1 cassandra dba 4.3G Mar  3 12:13 whatever-he-7869-Data.db
>>> -rw-r--r-- 1 cassandra dba 9.8M Mar  3 12:13 whatever-he-7869-Filter.db
>>> -rw-r--r-- 1 cassandra dba 239M Mar  3 12:13 whatever-he-7869-Index.db
>>> -rw-r--r-- 1 cassandra dba 4.3K Mar  3 12:13 whatever-he-7869-Statistics.db
>>> -rw-r--r-- 1 cassandra dba 924K Mar  3 18:02 whatever-he-7953-CompressionInfo.db
>>> -rw-r--r-- 1 cassandra dba 1.1G Mar  3 18:02 whatever-he-7953-Data.db
>>> -rw-r--r-- 1 cassandra dba 2.1M Mar  3 18:02 whatever-he-7953-Filter.db
>>> -rw-r--r-- 1 cassandra dba  51M Mar  3 18:02 whatever-he-7953-Index.db
>>> -rw-r--r-- 1 cassandra dba 4.3K Mar  3 18:02 whatever-he-7953-Statistics.db
>>> -rw-r--r-- 1 cassandra dba 231K Mar  3 20:06 whatever-he-7974-CompressionInfo.db
>>> -rw-r--r-- 1 cassandra dba 268M Mar  3 20:06 whatever-he-7974-Data.db
>>> -rw-r--r-- 1 cassandra dba 483K Mar  3 20:06 whatever-he-7974-Filter.db
>>> -rw-r--r-- 1 cassandra dba  12M Mar  3 20:06 whatever-he-7974-Index.db
>>> -rw-r--r-- 1 cassandra dba 4.3K Mar  3 20:06 whatever-he-7974-Statistics.db
>>> -rw-r--r-- 1 cassandra dba 116K Mar  4 06:28 whatever-he-8002-CompressionInfo.db
>>> -rw-r--r-- 1 cassandra dba 146M Mar  4 06:28 whatever-he-8002-Data.db
>>> -rw-r--r-- 1 cassandra dba 646K Mar  4 06:28 whatever-he-8002-Filter.db
>>> -rw-r--r-- 1 cassandra dba  16M Mar  4 06:28 whatever-he-8002-Index.db
>>> -rw-r--r-- 1 cassandra dba 4.3K Mar  4 06:28 whatever-he-8002-Statistics.db
>>> -rw-r--r-- 1 cassandra dba  58K Mar  4 06:28 whatever-he-8003-CompressionInfo.db
>>> -rw-r--r-- 1 cassandra dba  67M Mar  4 06:28 whatever-he-8003-Data.db
>>> -rw-r--r-- 1 cassandra dba 105K Mar  4 06:28 whatever-he-8003-Filter.db
>>> -rw-r--r-- 1 cassandra dba 2.5M Mar  4 06:28 whatever-he-8003-Index.db
>>> -rw-r--r-- 1 cassandra dba 4.3K Mar  4 06:28 whatever-he-8003-Statistics.db
>>> -rw-r--r-- 1 cassandra dba 230K Mar  4 06:30 whatever-he-8004-CompressionInfo.db
>>> -rw-r--r-- 1 cassandra dba 261M Mar  4 06:30 whatever-he-8004-Data.db
>>> -rw-r--r-- 1 cassandra dba 480K Mar  4 06:30 whatever-he-8004-Filter.db
>>> -rw-r--r-- 1 cassandra dba  12M Mar  4 06:30 whatever-he-8004-Index.db
>>> -rw-r--r-- 1 cassandra dba 4.3K Mar  4 06:30 whatever-he-8004-Statistics.db
>>> -rw-r--r-- 1 cassandra dba  15K Mar  4 06:30 whatever-he-8005-CompressionInfo.db
>>> -rw-r--r-- 1 cassandra dba  16M Mar  4 06:30 whatever-he-8005-Data.db
>>> -rw-r--r-- 1 cassandra dba  39K Mar  4 06:30 whatever-he-8005-Filter.db
>>> -rw-r--r-- 1 cassandra dba 944K Mar  4 06:30 whatever-he-8005-Index.db
>>> -rw-r--r-- 1 cassandra dba 4.3K Mar  4 06:30 whatever-he-8005-Statistics.db
>>> -rw-r--r-- 1 cassandra dba 5.0K Mar  4 06:35 whatever-he-8006-CompressionInfo.db
>>> -rw-r--r-- 1 cassandra dba 6.7M Mar  4 06:35 whatever-he-8006-Data.db
>>> -rw-r--r-- 1 cassandra dba  81K Mar  4 06:35 whatever-he-8006-Filter.db
>>> -rw-r--r-- 1 cassandra dba 2.0M Mar  4 06:35 whatever-he-8006-Index.db
>>> -rw-r--r-- 1 cassandra dba 4.3K Mar  4 06:35 whatever-he-8006-Statistics.db
>>>
>>> The things marked in red, I guess, are the old data, but they aren't
>>> deleted. As you can see on the date, they are older than 10 days.
>>>
>>> Is there any possibility to delete them?
>>>
>>>
>>> Here is also the schema of the CF:
>>> create column family whatever
>>>   with column_type = 'Standard'
>>>   and comparator = 'AsciiType'
>>>   and default_validation_class = 'AsciiType'
>>>   and key_validation_class = 'AsciiType'
>>>   and read_repair_chance = 0.0
>>>   and dclocal_read_repair_chance = 0.0
>>>   and gc_grace = 0
>>>   and min_compaction_threshold = 4
>>>   and max_compaction_threshold = 32
>>>   and replicate_on_write = false
>>>   and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
>>>   and caching = 'KEYS_ONLY'
>>>   and compression_options = {'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};
>>>
>>>
>>> Br,
>>> Matthias Zeilinger
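
A note on the min_compaction_threshold behaviour that Michal and Bryan describe in this thread. The Python sketch below is a simplified model of size-tiered bucketing, not Cassandra's actual code. The function names, the 0.5x to 1.5x "similar size" window, and the SSTable sizes are all assumptions chosen for illustration; only the threshold of 4 comes from the thread.

```python
def bucket_by_size(sizes_mb, low=0.5, high=1.5):
    """Group SSTable sizes into buckets of roughly similar size.

    A size joins the first bucket whose current average it lies within
    [low * avg, high * avg] of; otherwise it starts a new bucket.
    The 0.5 / 1.5 window is an assumption made for this sketch.
    """
    buckets = []
    for size in sorted(sizes_mb):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if low * avg <= size <= high * avg:
                bucket.append(size)
                break
        else:
            # No existing bucket is close enough in size.
            buckets.append([size])
    return buckets


def compaction_candidates(sizes_mb, min_threshold=4):
    """Return only the buckets that hold at least min_threshold SSTables."""
    return [b for b in bucket_by_size(sizes_mb) if len(b) >= min_threshold]


# Hypothetical sizes in MB: one huge SSTable left by a major compaction,
# plus three small ones flushed afterwards.
after_major = [61000, 150, 160, 140]
print(compaction_candidates(after_major))          # no bucket reaches 4 files
print(compaction_candidates(after_major + [155]))  # the four small files qualify
```

In this model the 61000 MB SSTable stays alone in its bucket until three more SSTables of comparable size exist, so any expired data or tombstones it holds are not rewritten by minor compactions in the meantime.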