[jira] [Commented] (CASSANDRA-6563) TTL histogram compactions not triggered at high Estimated droppable tombstones rate

2014-08-20 Thread Dan Hendry (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104386#comment-14104386
 ] 

Dan Hendry commented on CASSANDRA-6563:
---

I initially tried something similar to what was done here (adding the option to 
entirely skip the overlap check) but ran into the infinite compaction problem. 
Our cassandra nodes have a lot of data (4+TB/node) which churns quite quickly 
( one month) and the simple sstable age based heuristic was burning IO without 
actually getting rid of expired data (largely because individual compactions 
take so long).

Based on how the CompactionController.shouldPurge() is implemented, which seems 
to be how the compaction process *actually* decides if data can be dropped, I 
stared running this patch in production: 
https://github.com/kikinteractive/cassandra/compare/cassandra-1.2 . Basically 
it checks the sampled keys agains the bloom filters of overlapping SSTables to 
estimate overlap instead of trying to calculate it based on tokens. 

So far it seems to have been highly effective. Every tombstone compaction which 
has happened so far has resulted in sstable size reduction of at least the 
specified threshold. 

 TTL histogram compactions not triggered at high Estimated droppable 
 tombstones rate
 -

 Key: CASSANDRA-6563
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6563
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: 1.2.12ish
Reporter: Chris Burroughs
Assignee: Paulo Motta
 Fix For: 2.0.9

 Attachments: 1.2.16-CASSANDRA-6563-v2.txt, 
 1.2.16-CASSANDRA-6563-v3.txt, 1.2.16-CASSANDRA-6563.txt, 
 2.0-CASSANDRA-6563-v3.txt, 2.0.7-CASSANDRA-6563.txt, patch-v1-iostat.png, 
 patch-v1-range1.png, patch-v2-range3.png, patched-droppadble-ratio.png, 
 patched-storage-load.png, patched1-compacted-bytes.png, 
 patched2-compacted-bytes.png, unpatched-droppable-ratio.png, 
 unpatched-storage-load.png, unpatched1-compacted-bytes.png, 
 unpatched2-compacted-bytes.png


 I have several column families in a largish cluster where virtually all 
 columns are written with a (usually the same) TTL.  My understanding of 
 CASSANDRA-3442 is that sstables that have a high (  20%) estimated 
 percentage of droppable tombstones should be individually compacted.  This 
 does not appear to be occurring with size tired compaction.
 Example from one node:
 {noformat}
 $ ll /data/sstables/data/ks/Cf/*Data.db
 -rw-rw-r-- 31 cassandra cassandra 26651211757 Nov 26 22:59 
 /data/sstables/data/ks/Cf/ks-Cf-ic-295562-Data.db
 -rw-rw-r-- 31 cassandra cassandra  6272641818 Nov 27 02:51 
 /data/sstables/data/ks/Cf/ks-Cf-ic-296121-Data.db
 -rw-rw-r-- 31 cassandra cassandra  1814691996 Dec  4 21:50 
 /data/sstables/data/ks/Cf/ks-Cf-ic-320449-Data.db
 -rw-rw-r-- 30 cassandra cassandra 10909061157 Dec 11 17:31 
 /data/sstables/data/ks/Cf/ks-Cf-ic-340318-Data.db
 -rw-rw-r-- 29 cassandra cassandra   459508942 Dec 12 10:37 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342259-Data.db
 -rw-rw-r--  1 cassandra cassandra  336908 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342307-Data.db
 -rw-rw-r--  1 cassandra cassandra 2063935 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342309-Data.db
 -rw-rw-r--  1 cassandra cassandra 409 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342314-Data.db
 -rw-rw-r--  1 cassandra cassandra31180007 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342319-Data.db
 -rw-rw-r--  1 cassandra cassandra 2398345 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342322-Data.db
 -rw-rw-r--  1 cassandra cassandra   21095 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342331-Data.db
 -rw-rw-r--  1 cassandra cassandra   81454 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342335-Data.db
 -rw-rw-r--  1 cassandra cassandra 1063718 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342339-Data.db
 -rw-rw-r--  1 cassandra cassandra  127004 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342344-Data.db
 -rw-rw-r--  1 cassandra cassandra  146785 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342346-Data.db
 -rw-rw-r--  1 cassandra cassandra  697338 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342351-Data.db
 -rw-rw-r--  1 cassandra cassandra 3921428 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342367-Data.db
 -rw-rw-r--  1 cassandra cassandra  240332 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342370-Data.db
 -rw-rw-r--  1 cassandra cassandra   45669 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342374-Data.db
 -rw-rw-r--  1 cassandra cassandra53127549 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342375-Data.db
 -rw-rw-r-- 16 cassandra cassandra 12466853166 Dec 25 

[jira] [Commented] (CASSANDRA-6563) TTL histogram compactions not triggered at high Estimated droppable tombstones rate

2014-05-22 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14006208#comment-14006208
 ] 

Marcus Eriksson commented on CASSANDRA-6563:


bq. Should I file a new issue for this fix?
yes.

Thanks for the patch, could you rebase it on 2.0?

 TTL histogram compactions not triggered at high Estimated droppable 
 tombstones rate
 -

 Key: CASSANDRA-6563
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6563
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: 1.2.12ish
Reporter: Chris Burroughs
Assignee: Paulo Motta
 Fix For: 1.2.17, 2.0.8

 Attachments: 1.2.16-CASSANDRA-6563-v2.txt, 
 1.2.16-CASSANDRA-6563-v3.txt, 1.2.16-CASSANDRA-6563.txt, 
 2.0.7-CASSANDRA-6563.txt, patch-v1-iostat.png, patch-v1-range1.png, 
 patch-v2-range3.png, patched-droppadble-ratio.png, patched-storage-load.png, 
 patched1-compacted-bytes.png, patched2-compacted-bytes.png, 
 unpatched-droppable-ratio.png, unpatched-storage-load.png, 
 unpatched1-compacted-bytes.png, unpatched2-compacted-bytes.png


 I have several column families in a largish cluster where virtually all 
 columns are written with a (usually the same) TTL.  My understanding of 
 CASSANDRA-3442 is that sstables that have a high (  20%) estimated 
 percentage of droppable tombstones should be individually compacted.  This 
 does not appear to be occurring with size tired compaction.
 Example from one node:
 {noformat}
 $ ll /data/sstables/data/ks/Cf/*Data.db
 -rw-rw-r-- 31 cassandra cassandra 26651211757 Nov 26 22:59 
 /data/sstables/data/ks/Cf/ks-Cf-ic-295562-Data.db
 -rw-rw-r-- 31 cassandra cassandra  6272641818 Nov 27 02:51 
 /data/sstables/data/ks/Cf/ks-Cf-ic-296121-Data.db
 -rw-rw-r-- 31 cassandra cassandra  1814691996 Dec  4 21:50 
 /data/sstables/data/ks/Cf/ks-Cf-ic-320449-Data.db
 -rw-rw-r-- 30 cassandra cassandra 10909061157 Dec 11 17:31 
 /data/sstables/data/ks/Cf/ks-Cf-ic-340318-Data.db
 -rw-rw-r-- 29 cassandra cassandra   459508942 Dec 12 10:37 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342259-Data.db
 -rw-rw-r--  1 cassandra cassandra  336908 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342307-Data.db
 -rw-rw-r--  1 cassandra cassandra 2063935 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342309-Data.db
 -rw-rw-r--  1 cassandra cassandra 409 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342314-Data.db
 -rw-rw-r--  1 cassandra cassandra31180007 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342319-Data.db
 -rw-rw-r--  1 cassandra cassandra 2398345 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342322-Data.db
 -rw-rw-r--  1 cassandra cassandra   21095 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342331-Data.db
 -rw-rw-r--  1 cassandra cassandra   81454 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342335-Data.db
 -rw-rw-r--  1 cassandra cassandra 1063718 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342339-Data.db
 -rw-rw-r--  1 cassandra cassandra  127004 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342344-Data.db
 -rw-rw-r--  1 cassandra cassandra  146785 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342346-Data.db
 -rw-rw-r--  1 cassandra cassandra  697338 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342351-Data.db
 -rw-rw-r--  1 cassandra cassandra 3921428 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342367-Data.db
 -rw-rw-r--  1 cassandra cassandra  240332 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342370-Data.db
 -rw-rw-r--  1 cassandra cassandra   45669 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342374-Data.db
 -rw-rw-r--  1 cassandra cassandra53127549 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342375-Data.db
 -rw-rw-r-- 16 cassandra cassandra 12466853166 Dec 25 22:40 
 /data/sstables/data/ks/Cf/ks-Cf-ic-396473-Data.db
 -rw-rw-r-- 12 cassandra cassandra  3903237198 Dec 29 19:42 
 /data/sstables/data/ks/Cf/ks-Cf-ic-408926-Data.db
 -rw-rw-r--  7 cassandra cassandra  3692260987 Jan  3 08:25 
 /data/sstables/data/ks/Cf/ks-Cf-ic-427733-Data.db
 -rw-rw-r--  4 cassandra cassandra  3971403602 Jan  6 20:50 
 /data/sstables/data/ks/Cf/ks-Cf-ic-437537-Data.db
 -rw-rw-r--  3 cassandra cassandra  1007832224 Jan  7 15:19 
 /data/sstables/data/ks/Cf/ks-Cf-ic-440331-Data.db
 -rw-rw-r--  2 cassandra cassandra   896132537 Jan  8 11:05 
 /data/sstables/data/ks/Cf/ks-Cf-ic-447740-Data.db
 -rw-rw-r--  1 cassandra cassandra   963039096 Jan  9 04:59 
 /data/sstables/data/ks/Cf/ks-Cf-ic-449425-Data.db
 -rw-rw-r--  1 cassandra cassandra   232168351 Jan  9 10:14 
 /data/sstables/data/ks/Cf/ks-Cf-ic-450287-Data.db
 -rw-rw-r--  1 cassandra cassandra73126319 Jan  9 11:28 
 

[jira] [Commented] (CASSANDRA-6563) TTL histogram compactions not triggered at high Estimated droppable tombstones rate

2014-05-22 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14006597#comment-14006597
 ] 

Paulo Motta commented on CASSANDRA-6563:


2.0 patch here: 
https://issues.apache.org/jira/secure/attachment/12646410/2.0-CASSANDRA-6563-v3.txt

* Comments:
  * Due to the improvements to LCS in 2.0 I had to increase the TTL, sleep 
times and number of rows in testUncheckedTombstoneCompactionLeveledCompaction() 
to create an environment with overlapping-range sstables in multiple levels. 
Since this was mostly an educational test to show the benefits of the patch 
to LCS, and it was not totally deterministic, I decided to remove it in order 
not to burden the test suite. The main patch functionality is already well 
tested in testUncheckedTombstoneSizeTieredCompaction().

 TTL histogram compactions not triggered at high Estimated droppable 
 tombstones rate
 -

 Key: CASSANDRA-6563
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6563
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: 1.2.12ish
Reporter: Chris Burroughs
Assignee: Paulo Motta
 Fix For: 1.2.17, 2.0.9

 Attachments: 1.2.16-CASSANDRA-6563-v2.txt, 
 1.2.16-CASSANDRA-6563-v3.txt, 1.2.16-CASSANDRA-6563.txt, 
 2.0-CASSANDRA-6563-v3.txt, 2.0.7-CASSANDRA-6563.txt, patch-v1-iostat.png, 
 patch-v1-range1.png, patch-v2-range3.png, patched-droppadble-ratio.png, 
 patched-storage-load.png, patched1-compacted-bytes.png, 
 patched2-compacted-bytes.png, unpatched-droppable-ratio.png, 
 unpatched-storage-load.png, unpatched1-compacted-bytes.png, 
 unpatched2-compacted-bytes.png


 I have several column families in a largish cluster where virtually all 
 columns are written with a (usually the same) TTL.  My understanding of 
 CASSANDRA-3442 is that sstables that have a high (  20%) estimated 
 percentage of droppable tombstones should be individually compacted.  This 
 does not appear to be occurring with size tired compaction.
 Example from one node:
 {noformat}
 $ ll /data/sstables/data/ks/Cf/*Data.db
 -rw-rw-r-- 31 cassandra cassandra 26651211757 Nov 26 22:59 
 /data/sstables/data/ks/Cf/ks-Cf-ic-295562-Data.db
 -rw-rw-r-- 31 cassandra cassandra  6272641818 Nov 27 02:51 
 /data/sstables/data/ks/Cf/ks-Cf-ic-296121-Data.db
 -rw-rw-r-- 31 cassandra cassandra  1814691996 Dec  4 21:50 
 /data/sstables/data/ks/Cf/ks-Cf-ic-320449-Data.db
 -rw-rw-r-- 30 cassandra cassandra 10909061157 Dec 11 17:31 
 /data/sstables/data/ks/Cf/ks-Cf-ic-340318-Data.db
 -rw-rw-r-- 29 cassandra cassandra   459508942 Dec 12 10:37 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342259-Data.db
 -rw-rw-r--  1 cassandra cassandra  336908 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342307-Data.db
 -rw-rw-r--  1 cassandra cassandra 2063935 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342309-Data.db
 -rw-rw-r--  1 cassandra cassandra 409 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342314-Data.db
 -rw-rw-r--  1 cassandra cassandra31180007 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342319-Data.db
 -rw-rw-r--  1 cassandra cassandra 2398345 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342322-Data.db
 -rw-rw-r--  1 cassandra cassandra   21095 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342331-Data.db
 -rw-rw-r--  1 cassandra cassandra   81454 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342335-Data.db
 -rw-rw-r--  1 cassandra cassandra 1063718 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342339-Data.db
 -rw-rw-r--  1 cassandra cassandra  127004 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342344-Data.db
 -rw-rw-r--  1 cassandra cassandra  146785 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342346-Data.db
 -rw-rw-r--  1 cassandra cassandra  697338 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342351-Data.db
 -rw-rw-r--  1 cassandra cassandra 3921428 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342367-Data.db
 -rw-rw-r--  1 cassandra cassandra  240332 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342370-Data.db
 -rw-rw-r--  1 cassandra cassandra   45669 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342374-Data.db
 -rw-rw-r--  1 cassandra cassandra53127549 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342375-Data.db
 -rw-rw-r-- 16 cassandra cassandra 12466853166 Dec 25 22:40 
 /data/sstables/data/ks/Cf/ks-Cf-ic-396473-Data.db
 -rw-rw-r-- 12 cassandra cassandra  3903237198 Dec 29 19:42 
 /data/sstables/data/ks/Cf/ks-Cf-ic-408926-Data.db
 -rw-rw-r--  7 cassandra cassandra  3692260987 Jan  3 08:25 
 /data/sstables/data/ks/Cf/ks-Cf-ic-427733-Data.db
 -rw-rw-r--  4 cassandra cassandra  3971403602 Jan  6 20:50 
 

[jira] [Commented] (CASSANDRA-6563) TTL histogram compactions not triggered at high Estimated droppable tombstones rate

2014-05-21 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14005170#comment-14005170
 ] 

Paulo Motta commented on CASSANDRA-6563:


Please find the new version of the patch here: 
https://issues.apache.org/jira/secure/attachment/12646087/1.2.16-CASSANDRA-6563-v3.txt

Comments:
* I have initially implemented on 1.2-branch, since that's the version we're 
using currently. After it's reviewed and everything is fine I can port the 
patch 2.0-branch, since I don't think there'll be significant differences.
* I have called the property unchecked_tombstone_compaction (default: false), 
since aggressive_tombstone_compactions is too aggressive. :P
* I have added unit tests showing that tombstone compactions are not being 
properly executed (due to the check) when unchecked_tombstone_compaction is 
disabled, and are executed after the property is enabled, for both STCS and LCS.
* During the implementation, I found a minor bug in 
AbstractCompactionStrategy's constructor:
{code:java}
protected AbstractCompactionStrategy(ColumnFamilyStore cfs, MapString, 
String options)
{
assert cfs != null;
this.cfs = cfs;
this.options = options;
...
{code}
The fact that AbstractCompactionStrategy.options has the same reference of 
CFMetadata.compactionStrategyOptions, means that ColumnFamilyStore.reload() 
does not reload the compaction strategy when a compaction strategy option 
changes, due to the following piece of code:
{code:java}
private void maybeReloadCompactionStrategy()
{
// Check if there is a need for reloading
if 
(metadata.compactionStrategyClass.equals(compactionStrategy.getClass()) 
 
metadata.compactionStrategyOptions.equals(compactionStrategy.options)) 
//metadata.compactionStrategyOptions == compactionStrategy.options, so 
compaction is never reloaded
return;
{code}
I spotted this in my test, when I tried changing the value of 
unchecked_tombstone_compaction from false to true and calling 
ColumnFamilyStore.reload() was not reloading the compaction strategy. I don't 
know if ColumnFamilyStore.reload() is only called during tests, or also 
whenever the schema changes. In order to fix the bug, I made 
AbstractCompactionStrategy.options an ImmutableMap, so if 
CFMetadata.compactionStrategyOptions is updated, 
ColumnFamilyStore.maybeReloadCompactionStrategy will actually reload the 
compaction strategy:
{code:java}
protected AbstractCompactionStrategy(ColumnFamilyStore cfs, MapString, 
String options)
{
assert cfs != null;
this.cfs = cfs;
this.options = ImmutableMap.copyOf(options);
...
{code}
Should I file a new issue for this fix?

 TTL histogram compactions not triggered at high Estimated droppable 
 tombstones rate
 -

 Key: CASSANDRA-6563
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6563
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: 1.2.12ish
Reporter: Chris Burroughs
Assignee: Paulo Motta
 Fix For: 1.2.17, 2.0.8

 Attachments: 1.2.16-CASSANDRA-6563-v2.txt, 
 1.2.16-CASSANDRA-6563-v3.txt, 1.2.16-CASSANDRA-6563.txt, 
 2.0.7-CASSANDRA-6563.txt, patch-v1-iostat.png, patch-v1-range1.png, 
 patch-v2-range3.png, patched-droppadble-ratio.png, patched-storage-load.png, 
 patched1-compacted-bytes.png, patched2-compacted-bytes.png, 
 unpatched-droppable-ratio.png, unpatched-storage-load.png, 
 unpatched1-compacted-bytes.png, unpatched2-compacted-bytes.png


 I have several column families in a largish cluster where virtually all 
 columns are written with a (usually the same) TTL.  My understanding of 
 CASSANDRA-3442 is that sstables that have a high (  20%) estimated 
 percentage of droppable tombstones should be individually compacted.  This 
 does not appear to be occurring with size tired compaction.
 Example from one node:
 {noformat}
 $ ll /data/sstables/data/ks/Cf/*Data.db
 -rw-rw-r-- 31 cassandra cassandra 26651211757 Nov 26 22:59 
 /data/sstables/data/ks/Cf/ks-Cf-ic-295562-Data.db
 -rw-rw-r-- 31 cassandra cassandra  6272641818 Nov 27 02:51 
 /data/sstables/data/ks/Cf/ks-Cf-ic-296121-Data.db
 -rw-rw-r-- 31 cassandra cassandra  1814691996 Dec  4 21:50 
 /data/sstables/data/ks/Cf/ks-Cf-ic-320449-Data.db
 -rw-rw-r-- 30 cassandra cassandra 10909061157 Dec 11 17:31 
 /data/sstables/data/ks/Cf/ks-Cf-ic-340318-Data.db
 -rw-rw-r-- 29 cassandra cassandra   459508942 Dec 12 10:37 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342259-Data.db
 -rw-rw-r--  1 cassandra cassandra  336908 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342307-Data.db
 -rw-rw-r--  1 cassandra cassandra 2063935 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342309-Data.db
 

[jira] [Commented] (CASSANDRA-6563) TTL histogram compactions not triggered at high Estimated droppable tombstones rate

2014-05-20 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003134#comment-14003134
 ] 

Marcus Eriksson commented on CASSANDRA-6563:


[~pauloricardomg] I think making it configurable is the way to go, calling the 
config option something like aggressive_tombstone_compactions or something, 
with default off.

 TTL histogram compactions not triggered at high Estimated droppable 
 tombstones rate
 -

 Key: CASSANDRA-6563
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6563
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: 1.2.12ish
Reporter: Chris Burroughs
Assignee: Paulo Ricardo Motta Gomes
 Fix For: 1.2.17, 2.0.8

 Attachments: 1.2.16-CASSANDRA-6563-v2.txt, 1.2.16-CASSANDRA-6563.txt, 
 2.0.7-CASSANDRA-6563.txt, patch-v1-iostat.png, patch-v1-range1.png, 
 patch-v2-range3.png, patched-droppadble-ratio.png, patched-storage-load.png, 
 patched1-compacted-bytes.png, patched2-compacted-bytes.png, 
 unpatched-droppable-ratio.png, unpatched-storage-load.png, 
 unpatched1-compacted-bytes.png, unpatched2-compacted-bytes.png


 I have several column families in a largish cluster where virtually all 
 columns are written with a (usually the same) TTL.  My understanding of 
 CASSANDRA-3442 is that sstables that have a high (  20%) estimated 
 percentage of droppable tombstones should be individually compacted.  This 
 does not appear to be occurring with size tired compaction.
 Example from one node:
 {noformat}
 $ ll /data/sstables/data/ks/Cf/*Data.db
 -rw-rw-r-- 31 cassandra cassandra 26651211757 Nov 26 22:59 
 /data/sstables/data/ks/Cf/ks-Cf-ic-295562-Data.db
 -rw-rw-r-- 31 cassandra cassandra  6272641818 Nov 27 02:51 
 /data/sstables/data/ks/Cf/ks-Cf-ic-296121-Data.db
 -rw-rw-r-- 31 cassandra cassandra  1814691996 Dec  4 21:50 
 /data/sstables/data/ks/Cf/ks-Cf-ic-320449-Data.db
 -rw-rw-r-- 30 cassandra cassandra 10909061157 Dec 11 17:31 
 /data/sstables/data/ks/Cf/ks-Cf-ic-340318-Data.db
 -rw-rw-r-- 29 cassandra cassandra   459508942 Dec 12 10:37 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342259-Data.db
 -rw-rw-r--  1 cassandra cassandra  336908 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342307-Data.db
 -rw-rw-r--  1 cassandra cassandra 2063935 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342309-Data.db
 -rw-rw-r--  1 cassandra cassandra 409 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342314-Data.db
 -rw-rw-r--  1 cassandra cassandra31180007 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342319-Data.db
 -rw-rw-r--  1 cassandra cassandra 2398345 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342322-Data.db
 -rw-rw-r--  1 cassandra cassandra   21095 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342331-Data.db
 -rw-rw-r--  1 cassandra cassandra   81454 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342335-Data.db
 -rw-rw-r--  1 cassandra cassandra 1063718 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342339-Data.db
 -rw-rw-r--  1 cassandra cassandra  127004 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342344-Data.db
 -rw-rw-r--  1 cassandra cassandra  146785 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342346-Data.db
 -rw-rw-r--  1 cassandra cassandra  697338 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342351-Data.db
 -rw-rw-r--  1 cassandra cassandra 3921428 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342367-Data.db
 -rw-rw-r--  1 cassandra cassandra  240332 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342370-Data.db
 -rw-rw-r--  1 cassandra cassandra   45669 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342374-Data.db
 -rw-rw-r--  1 cassandra cassandra53127549 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342375-Data.db
 -rw-rw-r-- 16 cassandra cassandra 12466853166 Dec 25 22:40 
 /data/sstables/data/ks/Cf/ks-Cf-ic-396473-Data.db
 -rw-rw-r-- 12 cassandra cassandra  3903237198 Dec 29 19:42 
 /data/sstables/data/ks/Cf/ks-Cf-ic-408926-Data.db
 -rw-rw-r--  7 cassandra cassandra  3692260987 Jan  3 08:25 
 /data/sstables/data/ks/Cf/ks-Cf-ic-427733-Data.db
 -rw-rw-r--  4 cassandra cassandra  3971403602 Jan  6 20:50 
 /data/sstables/data/ks/Cf/ks-Cf-ic-437537-Data.db
 -rw-rw-r--  3 cassandra cassandra  1007832224 Jan  7 15:19 
 /data/sstables/data/ks/Cf/ks-Cf-ic-440331-Data.db
 -rw-rw-r--  2 cassandra cassandra   896132537 Jan  8 11:05 
 /data/sstables/data/ks/Cf/ks-Cf-ic-447740-Data.db
 -rw-rw-r--  1 cassandra cassandra   963039096 Jan  9 04:59 
 /data/sstables/data/ks/Cf/ks-Cf-ic-449425-Data.db
 -rw-rw-r--  1 cassandra cassandra   232168351 Jan  9 10:14 
 /data/sstables/data/ks/Cf/ks-Cf-ic-450287-Data.db
 -rw-rw-r--  1 cassandra cassandra

[jira] [Commented] (CASSANDRA-6563) TTL histogram compactions not triggered at high Estimated droppable tombstones rate

2014-05-16 Thread Paulo Ricardo Motta Gomes (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14000258#comment-14000258
 ] 

Paulo Ricardo Motta Gomes commented on CASSANDRA-6563:
--

Below I will present some live cluster analysis about 10 days after deploying 
the original patch that entirely removes the check for range overlap in 
worthDroppingTombstones(). 

*Analysis Description*

In our dataset we use both LCS and STCS, but most of the CFs are STCS. A 
significant portion of our dataset is comprised of append-only TTL-ed data, so 
a good match for tombstone compaction. Most of our large CFs with high 
droppable tombstone ratio use STCS, but there are a few that use LCS that also 
benefited from the patch. 

I deployed the patch in 2 different ranges with similar results. The metrics 
were collected between 1st of May and 16 of May, the nodes were patched on the 
7th of May. Used cassandra version was 1.2.16.
  
In the analysis I compare the total space used (Cassandra Load), tombstone 
Ratio, disk utilization (system disk xvbd util), total bytes compacted and 
system load (linux cpu). For the last three metrics I also calculate the 
integral of the metric to make it easier to compare the total amount during the 
period.

*Analysis*

Graphs: 
https://issues.apache.org/jira/secure/attachment/12645241/patch-v1-range1.png

Each graph compares the metrics of the patched node with it's previous neighbor 
and next neighbor, no VNODES is used. So, the first row in the figure is node 
N-1, the second row is node N (the patched node, marked with asterisk), and the 
third row is node N+1.

* *Cassandra load*: In the patched node, it's possible to see a sudden decrease 
of 7% of disk space when the patch was applied, due to the execution of single 
SSTable compactions. The growth rate of disk usage is also decreased after the 
patch, since tombstone are cleared more often. In the whole period, there was a 
1.2% disk space increase in the patched node, against about 10% growth on the 
unpatched nodes.

* *Tombstone ratio*: After the patch is applied, it's possible to see a 
decrease in the droppable tombstone ratio, that revolves around the default 
level of 20% after that. The droppable tombstone ratio of unpatched nodes 
remains high for most CFs, what indicates that tombstone compactions are not 
being triggered at all.

* *Disk utilization*: it's not possible to detect any change in the disk 
utilization pattern after the patch is applied, what might indicate the I/O is 
not affected by the patch, at least for our mixed dataset. I double checked the 
IOPS graph for the period and there was not even a slight sign of change in the 
I/O pattern after the patch was applied. 
(https://issues.apache.org/jira/secure/attachment/12645312/patch-v1-iostat.png)

* *Total Bytes compacted*: The number of compacted bytes in the patched node 
was about 17% higher in the period. About 7% due to the initial tombstones that 
were cleared and more 7% due to cleared tombstones after the patch was applied 
(the difference between the 2 nodes sizes). The remaining 3% can be attributed 
to unnecessary compactions + normal variations because of different node ranges.

* *System CPU Load*: Was not affected by the patch.

*Alternative Patch*

I implemented another version of the patch (v2) as suggested by [~krummas], 
that instead of dropping the overlap check entirely, it only performs the check 
for SSTables containing rows with smaller timestamp than the candidate SSTable 
(https://issues.apache.org/jira/secure/attachment/12645316/1.2.16-CASSANDRA-6563-v2.txt).
 

One week ago I deployed this alternative patch on 2 of our production nodes, 
and unfortunately loosing the checks did not achieve significant results. I 
added some debugging log to the code and what I verified is that despite 
reducing the number of sstables to compare with, even if only one SSTable has a 
column with an equal or lower timestamp to the candidate SSTable, the token 
ranges of these sstables always overlap because of the Random Partitioner. So, 
this supports the claim that even with loosen checks, the single-sstable 
tombstone compaction is almost never being triggered. At least on the use cases 
that could benefit from it.

The graphs for the alternative patch analysis can be found here: 
https://issues.apache.org/jira/secure/attachment/12645240/patch-v2-range3.png

 TTL histogram compactions not triggered at high Estimated droppable 
 tombstones rate
 -

 Key: CASSANDRA-6563
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6563
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: 1.2.12ish
Reporter: Chris Burroughs
Assignee: Paulo Ricardo Motta Gomes

[jira] [Commented] (CASSANDRA-6563) TTL histogram compactions not triggered at high Estimated droppable tombstones rate

2014-05-16 Thread Paulo Ricardo Motta Gomes (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14000289#comment-14000289
 ] 

Paulo Ricardo Motta Gomes commented on CASSANDRA-6563:
--

Based on the previous analysis I propose the following alternatives:

# Drop the sstable range overlap check and hope that the analysis can be 
generalized and I/O will not be significantly impacted by the dropping the 
check.
# Since the check almost never lets tombstone compactions run, we can drop the 
check add a new CF property: enable_tombstone_compactions: false (default), and 
at least allow admins to safely enable tombstone compaction for CFs they know 
is a good match without impacting I/O in other CFs.

 TTL histogram compactions not triggered at high Estimated droppable 
 tombstones rate
 -

 Key: CASSANDRA-6563
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6563
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: 1.2.12ish
Reporter: Chris Burroughs
Assignee: Paulo Ricardo Motta Gomes
 Fix For: 1.2.17, 2.0.8

 Attachments: 1.2.16-CASSANDRA-6563-v2.txt, 1.2.16-CASSANDRA-6563.txt, 
 2.0.7-CASSANDRA-6563.txt, patch-v1-iostat.png, patch-v1-range1.png, 
 patch-v2-range3.png, patched-droppadble-ratio.png, patched-storage-load.png, 
 patched1-compacted-bytes.png, patched2-compacted-bytes.png, 
 unpatched-droppable-ratio.png, unpatched-storage-load.png, 
 unpatched1-compacted-bytes.png, unpatched2-compacted-bytes.png


 I have several column families in a largish cluster where virtually all 
 columns are written with a (usually the same) TTL.  My understanding of 
 CASSANDRA-3442 is that sstables that have a high (  20%) estimated 
 percentage of droppable tombstones should be individually compacted.  This 
 does not appear to be occurring with size tired compaction.
 Example from one node:
 {noformat}
 $ ll /data/sstables/data/ks/Cf/*Data.db
 -rw-rw-r-- 31 cassandra cassandra 26651211757 Nov 26 22:59 
 /data/sstables/data/ks/Cf/ks-Cf-ic-295562-Data.db
 -rw-rw-r-- 31 cassandra cassandra  6272641818 Nov 27 02:51 
 /data/sstables/data/ks/Cf/ks-Cf-ic-296121-Data.db
 -rw-rw-r-- 31 cassandra cassandra  1814691996 Dec  4 21:50 
 /data/sstables/data/ks/Cf/ks-Cf-ic-320449-Data.db
 -rw-rw-r-- 30 cassandra cassandra 10909061157 Dec 11 17:31 
 /data/sstables/data/ks/Cf/ks-Cf-ic-340318-Data.db
 -rw-rw-r-- 29 cassandra cassandra   459508942 Dec 12 10:37 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342259-Data.db
 -rw-rw-r--  1 cassandra cassandra  336908 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342307-Data.db
 -rw-rw-r--  1 cassandra cassandra 2063935 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342309-Data.db
 -rw-rw-r--  1 cassandra cassandra 409 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342314-Data.db
 -rw-rw-r--  1 cassandra cassandra31180007 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342319-Data.db
 -rw-rw-r--  1 cassandra cassandra 2398345 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342322-Data.db
 -rw-rw-r--  1 cassandra cassandra   21095 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342331-Data.db
 -rw-rw-r--  1 cassandra cassandra   81454 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342335-Data.db
 -rw-rw-r--  1 cassandra cassandra 1063718 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342339-Data.db
 -rw-rw-r--  1 cassandra cassandra  127004 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342344-Data.db
 -rw-rw-r--  1 cassandra cassandra  146785 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342346-Data.db
 -rw-rw-r--  1 cassandra cassandra  697338 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342351-Data.db
 -rw-rw-r--  1 cassandra cassandra 3921428 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342367-Data.db
 -rw-rw-r--  1 cassandra cassandra  240332 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342370-Data.db
 -rw-rw-r--  1 cassandra cassandra   45669 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342374-Data.db
 -rw-rw-r--  1 cassandra cassandra53127549 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342375-Data.db
 -rw-rw-r-- 16 cassandra cassandra 12466853166 Dec 25 22:40 
 /data/sstables/data/ks/Cf/ks-Cf-ic-396473-Data.db
 -rw-rw-r-- 12 cassandra cassandra  3903237198 Dec 29 19:42 
 /data/sstables/data/ks/Cf/ks-Cf-ic-408926-Data.db
 -rw-rw-r--  7 cassandra cassandra  3692260987 Jan  3 08:25 
 /data/sstables/data/ks/Cf/ks-Cf-ic-427733-Data.db
 -rw-rw-r--  4 cassandra cassandra  3971403602 Jan  6 20:50 
 /data/sstables/data/ks/Cf/ks-Cf-ic-437537-Data.db
 -rw-rw-r--  3 cassandra cassandra  1007832224 Jan  7 15:19 
 /data/sstables/data/ks/Cf/ks-Cf-ic-440331-Data.db
 -rw-rw-r--  

[jira] [Commented] (CASSANDRA-6563) TTL histogram compactions not triggered at high Estimated droppable tombstones rate

2014-05-15 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13993480#comment-13993480
 ] 

Marcus Eriksson commented on CASSANDRA-6563:


have to say I don't really feel comfortable dropping these checks, we could 
start doing alot of extra unnecessary IO

A better solution would be to get a more accurate estimate on how much the 
sstables overlap, (CASSANDRA-6474)

What we could do now (in 2.0) is perhaps loosen checks a bit, for example, we 
should probably only check for overlap in sstables which contain data that is 
older than the data in one we want to compact, since those are the ones that 
can block dropping the tombstones.

 TTL histogram compactions not triggered at high Estimated droppable 
 tombstones rate
 -

 Key: CASSANDRA-6563
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6563
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: 1.2.12ish
Reporter: Chris Burroughs
Assignee: Paulo Ricardo Motta Gomes
 Fix For: 1.2.17, 2.0.8

 Attachments: 1.2.16-CASSANDRA-6563.txt, 2.0.7-CASSANDRA-6563.txt, 
 patched-droppadble-ratio.png, patched-storage-load.png, 
 patched1-compacted-bytes.png, patched2-compacted-bytes.png, 
 unpatched-droppable-ratio.png, unpatched-storage-load.png, 
 unpatched1-compacted-bytes.png, unpatched2-compacted-bytes.png


 I have several column families in a largish cluster where virtually all 
 columns are written with a (usually the same) TTL.  My understanding of 
 CASSANDRA-3442 is that sstables that have a high (  20%) estimated 
 percentage of droppable tombstones should be individually compacted.  This 
 does not appear to be occurring with size tired compaction.
 Example from one node:
 {noformat}
 $ ll /data/sstables/data/ks/Cf/*Data.db
 -rw-rw-r-- 31 cassandra cassandra 26651211757 Nov 26 22:59 
 /data/sstables/data/ks/Cf/ks-Cf-ic-295562-Data.db
 -rw-rw-r-- 31 cassandra cassandra  6272641818 Nov 27 02:51 
 /data/sstables/data/ks/Cf/ks-Cf-ic-296121-Data.db
 -rw-rw-r-- 31 cassandra cassandra  1814691996 Dec  4 21:50 
 /data/sstables/data/ks/Cf/ks-Cf-ic-320449-Data.db
 -rw-rw-r-- 30 cassandra cassandra 10909061157 Dec 11 17:31 
 /data/sstables/data/ks/Cf/ks-Cf-ic-340318-Data.db
 -rw-rw-r-- 29 cassandra cassandra   459508942 Dec 12 10:37 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342259-Data.db
 -rw-rw-r--  1 cassandra cassandra  336908 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342307-Data.db
 -rw-rw-r--  1 cassandra cassandra 2063935 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342309-Data.db
 -rw-rw-r--  1 cassandra cassandra 409 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342314-Data.db
 -rw-rw-r--  1 cassandra cassandra31180007 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342319-Data.db
 -rw-rw-r--  1 cassandra cassandra 2398345 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342322-Data.db
 -rw-rw-r--  1 cassandra cassandra   21095 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342331-Data.db
 -rw-rw-r--  1 cassandra cassandra   81454 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342335-Data.db
 -rw-rw-r--  1 cassandra cassandra 1063718 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342339-Data.db
 -rw-rw-r--  1 cassandra cassandra  127004 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342344-Data.db
 -rw-rw-r--  1 cassandra cassandra  146785 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342346-Data.db
 -rw-rw-r--  1 cassandra cassandra  697338 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342351-Data.db
 -rw-rw-r--  1 cassandra cassandra 3921428 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342367-Data.db
 -rw-rw-r--  1 cassandra cassandra  240332 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342370-Data.db
 -rw-rw-r--  1 cassandra cassandra   45669 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342374-Data.db
 -rw-rw-r--  1 cassandra cassandra53127549 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342375-Data.db
 -rw-rw-r-- 16 cassandra cassandra 12466853166 Dec 25 22:40 
 /data/sstables/data/ks/Cf/ks-Cf-ic-396473-Data.db
 -rw-rw-r-- 12 cassandra cassandra  3903237198 Dec 29 19:42 
 /data/sstables/data/ks/Cf/ks-Cf-ic-408926-Data.db
 -rw-rw-r--  7 cassandra cassandra  3692260987 Jan  3 08:25 
 /data/sstables/data/ks/Cf/ks-Cf-ic-427733-Data.db
 -rw-rw-r--  4 cassandra cassandra  3971403602 Jan  6 20:50 
 /data/sstables/data/ks/Cf/ks-Cf-ic-437537-Data.db
 -rw-rw-r--  3 cassandra cassandra  1007832224 Jan  7 15:19 
 /data/sstables/data/ks/Cf/ks-Cf-ic-440331-Data.db
 -rw-rw-r--  2 cassandra cassandra   896132537 Jan  8 11:05 
 /data/sstables/data/ks/Cf/ks-Cf-ic-447740-Data.db
 -rw-rw-r--  1 cassandra 

[jira] [Commented] (CASSANDRA-6563) TTL histogram compactions not triggered at high Estimated droppable tombstones rate

2014-05-15 Thread Paulo Ricardo Motta Gomes (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13993861#comment-13993861
 ] 

Paulo Ricardo Motta Gomes commented on CASSANDRA-6563:
--

You're right, dropping the checks increases necessary I/O but also unnecessary 
I/O. I made a comparison of the compacted bytes for 2 ranges (with and without 
patch). It's possible to see a huge increase of useful compactions, but also a 
relevant increase of useless compacted bytes (even though the compactions 
haven't stabilized yet).

Subtitle:

* compact+: compacted bytes from useful compactions (compaction ratio  100%)
* tombstone+: compacted bytes from useful tombstone compactions (compaction 
ratio  100%)
* compact-: compacted bytes from useless compactions (compaction ratio = 100%)
* tombstone-: compacted bytes from useless compactions (compaction ratio = 100%)

*Unpatched vs Patched compacted bytes* (range 1)

!unpatched1-compacted-bytes.png|align=left, height=50%, width=50%! 
!patched1-compacted-bytes.png|align=right, height=50%, width=50%!

*Unpatched vs Patched compacted bytes* (range 2)

!unpatched2-compacted-bytes.png|align=left, height=50%, width=50%! 
!patched2-compacted-bytes.png|align=right, height=50%, width=50%!

What we want here is to increase the number of useful tombstone compactions 
without increasing the number of useless ones. I will implement a new patch 
that only takes older sstables into account to check for overlap and compare 
the results with the previous patch to see if we can achieve good results. 
Since most of our tombstone data is time-series TTLed data, I believe that this 
will have a good effect.

 TTL histogram compactions not triggered at high Estimated droppable 
 tombstones rate
 -

 Key: CASSANDRA-6563
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6563
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: 1.2.12ish
Reporter: Chris Burroughs
Assignee: Paulo Ricardo Motta Gomes
 Fix For: 1.2.17, 2.0.8

 Attachments: 1.2.16-CASSANDRA-6563.txt, 2.0.7-CASSANDRA-6563.txt, 
 patched-droppadble-ratio.png, patched-storage-load.png, 
 patched1-compacted-bytes.png, patched2-compacted-bytes.png, 
 unpatched-droppable-ratio.png, unpatched-storage-load.png, 
 unpatched1-compacted-bytes.png, unpatched2-compacted-bytes.png


 I have several column families in a largish cluster where virtually all 
 columns are written with a (usually the same) TTL.  My understanding of 
 CASSANDRA-3442 is that sstables that have a high (  20%) estimated 
 percentage of droppable tombstones should be individually compacted.  This 
 does not appear to be occurring with size tired compaction.
 Example from one node:
 {noformat}
 $ ll /data/sstables/data/ks/Cf/*Data.db
 -rw-rw-r-- 31 cassandra cassandra 26651211757 Nov 26 22:59 
 /data/sstables/data/ks/Cf/ks-Cf-ic-295562-Data.db
 -rw-rw-r-- 31 cassandra cassandra  6272641818 Nov 27 02:51 
 /data/sstables/data/ks/Cf/ks-Cf-ic-296121-Data.db
 -rw-rw-r-- 31 cassandra cassandra  1814691996 Dec  4 21:50 
 /data/sstables/data/ks/Cf/ks-Cf-ic-320449-Data.db
 -rw-rw-r-- 30 cassandra cassandra 10909061157 Dec 11 17:31 
 /data/sstables/data/ks/Cf/ks-Cf-ic-340318-Data.db
 -rw-rw-r-- 29 cassandra cassandra   459508942 Dec 12 10:37 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342259-Data.db
 -rw-rw-r--  1 cassandra cassandra  336908 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342307-Data.db
 -rw-rw-r--  1 cassandra cassandra 2063935 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342309-Data.db
 -rw-rw-r--  1 cassandra cassandra 409 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342314-Data.db
 -rw-rw-r--  1 cassandra cassandra31180007 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342319-Data.db
 -rw-rw-r--  1 cassandra cassandra 2398345 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342322-Data.db
 -rw-rw-r--  1 cassandra cassandra   21095 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342331-Data.db
 -rw-rw-r--  1 cassandra cassandra   81454 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342335-Data.db
 -rw-rw-r--  1 cassandra cassandra 1063718 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342339-Data.db
 -rw-rw-r--  1 cassandra cassandra  127004 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342344-Data.db
 -rw-rw-r--  1 cassandra cassandra  146785 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342346-Data.db
 -rw-rw-r--  1 cassandra cassandra  697338 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342351-Data.db
 -rw-rw-r--  1 cassandra cassandra 3921428 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342367-Data.db
 -rw-rw-r--  1 cassandra cassandra  240332 Dec 12 

[jira] [Commented] (CASSANDRA-6563) TTL histogram compactions not triggered at high Estimated droppable tombstones rate

2014-05-12 Thread Paulo Ricardo Motta Gomes (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13992956#comment-13992956
 ] 

Paulo Ricardo Motta Gomes commented on CASSANDRA-6563:
--

The following graphs compare the droppable tombstone ratio vs storage load of 
an unpatched vs patched node. These are both for LCS and STCS CFs, so the bug 
is definitely affecting both, not only STCS.

*Droppable Tombstone Ratio (unpatched vs patched)*

!unpatched-droppable-ratio.png|align=left, height=50%, width=50%! 
!patched-droppadble-ratio.png|align=right, height=50%, width=50%!

*Live disk space used (unpatched vs patched)*

!unpatched-storage-load.png|align=left, height=50%, width=50%! 
!patched-storage-load.png|align=right, height=50%, width=50%!

 TTL histogram compactions not triggered at high Estimated droppable 
 tombstones rate
 -

 Key: CASSANDRA-6563
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6563
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: 1.2.12ish
Reporter: Chris Burroughs
Assignee: Paulo Ricardo Motta Gomes
 Fix For: 1.2.17, 2.0.8

 Attachments: 1.2.16-CASSANDRA-6563.txt, 2.0.7-CASSANDRA-6563.txt, 
 patched-droppadble-ratio.png, patched-storage-load.png, 
 unpatched-droppable-ratio.png, unpatched-storage-load.png


 I have several column families in a largish cluster where virtually all 
 columns are written with a (usually the same) TTL.  My understanding of 
 CASSANDRA-3442 is that sstables that have a high (  20%) estimated 
 percentage of droppable tombstones should be individually compacted.  This 
 does not appear to be occurring with size tired compaction.
 Example from one node:
 {noformat}
 $ ll /data/sstables/data/ks/Cf/*Data.db
 -rw-rw-r-- 31 cassandra cassandra 26651211757 Nov 26 22:59 
 /data/sstables/data/ks/Cf/ks-Cf-ic-295562-Data.db
 -rw-rw-r-- 31 cassandra cassandra  6272641818 Nov 27 02:51 
 /data/sstables/data/ks/Cf/ks-Cf-ic-296121-Data.db
 -rw-rw-r-- 31 cassandra cassandra  1814691996 Dec  4 21:50 
 /data/sstables/data/ks/Cf/ks-Cf-ic-320449-Data.db
 -rw-rw-r-- 30 cassandra cassandra 10909061157 Dec 11 17:31 
 /data/sstables/data/ks/Cf/ks-Cf-ic-340318-Data.db
 -rw-rw-r-- 29 cassandra cassandra   459508942 Dec 12 10:37 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342259-Data.db
 -rw-rw-r--  1 cassandra cassandra  336908 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342307-Data.db
 -rw-rw-r--  1 cassandra cassandra 2063935 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342309-Data.db
 -rw-rw-r--  1 cassandra cassandra 409 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342314-Data.db
 -rw-rw-r--  1 cassandra cassandra31180007 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342319-Data.db
 -rw-rw-r--  1 cassandra cassandra 2398345 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342322-Data.db
 -rw-rw-r--  1 cassandra cassandra   21095 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342331-Data.db
 -rw-rw-r--  1 cassandra cassandra   81454 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342335-Data.db
 -rw-rw-r--  1 cassandra cassandra 1063718 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342339-Data.db
 -rw-rw-r--  1 cassandra cassandra  127004 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342344-Data.db
 -rw-rw-r--  1 cassandra cassandra  146785 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342346-Data.db
 -rw-rw-r--  1 cassandra cassandra  697338 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342351-Data.db
 -rw-rw-r--  1 cassandra cassandra 3921428 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342367-Data.db
 -rw-rw-r--  1 cassandra cassandra  240332 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342370-Data.db
 -rw-rw-r--  1 cassandra cassandra   45669 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342374-Data.db
 -rw-rw-r--  1 cassandra cassandra53127549 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342375-Data.db
 -rw-rw-r-- 16 cassandra cassandra 12466853166 Dec 25 22:40 
 /data/sstables/data/ks/Cf/ks-Cf-ic-396473-Data.db
 -rw-rw-r-- 12 cassandra cassandra  3903237198 Dec 29 19:42 
 /data/sstables/data/ks/Cf/ks-Cf-ic-408926-Data.db
 -rw-rw-r--  7 cassandra cassandra  3692260987 Jan  3 08:25 
 /data/sstables/data/ks/Cf/ks-Cf-ic-427733-Data.db
 -rw-rw-r--  4 cassandra cassandra  3971403602 Jan  6 20:50 
 /data/sstables/data/ks/Cf/ks-Cf-ic-437537-Data.db
 -rw-rw-r--  3 cassandra cassandra  1007832224 Jan  7 15:19 
 /data/sstables/data/ks/Cf/ks-Cf-ic-440331-Data.db
 -rw-rw-r--  2 cassandra cassandra   896132537 Jan  8 11:05 
 /data/sstables/data/ks/Cf/ks-Cf-ic-447740-Data.db
 -rw-rw-r--  1 cassandra cassandra   963039096 Jan  9 04:59 
 

[jira] [Commented] (CASSANDRA-6563) TTL histogram compactions not triggered at high Estimated droppable tombstones rate

2014-05-11 Thread Paulo Ricardo Motta Gomes (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13992870#comment-13992870
 ] 

Paulo Ricardo Motta Gomes commented on CASSANDRA-6563:
--

Related issue: CASSANDRA-6654

 TTL histogram compactions not triggered at high Estimated droppable 
 tombstones rate
 -

 Key: CASSANDRA-6563
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6563
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: 1.2.12ish
Reporter: Chris Burroughs
Assignee: Paulo Ricardo Motta Gomes
 Fix For: 1.2.17, 2.0.8

 Attachments: 1.2.16-CASSANDRA-6563.txt, 2.0.7-CASSANDRA-6563.txt, 
 patched-droppadble-ratio.png, patched-storage-load.png, 
 unpatched-droppable-ratio.png, unpatched-storage-load.png


 I have several column families in a largish cluster where virtually all 
 columns are written with a (usually the same) TTL.  My understanding of 
 CASSANDRA-3442 is that sstables that have a high (  20%) estimated 
 percentage of droppable tombstones should be individually compacted.  This 
 does not appear to be occurring with size tired compaction.
 Example from one node:
 {noformat}
 $ ll /data/sstables/data/ks/Cf/*Data.db
 -rw-rw-r-- 31 cassandra cassandra 26651211757 Nov 26 22:59 
 /data/sstables/data/ks/Cf/ks-Cf-ic-295562-Data.db
 -rw-rw-r-- 31 cassandra cassandra  6272641818 Nov 27 02:51 
 /data/sstables/data/ks/Cf/ks-Cf-ic-296121-Data.db
 -rw-rw-r-- 31 cassandra cassandra  1814691996 Dec  4 21:50 
 /data/sstables/data/ks/Cf/ks-Cf-ic-320449-Data.db
 -rw-rw-r-- 30 cassandra cassandra 10909061157 Dec 11 17:31 
 /data/sstables/data/ks/Cf/ks-Cf-ic-340318-Data.db
 -rw-rw-r-- 29 cassandra cassandra   459508942 Dec 12 10:37 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342259-Data.db
 -rw-rw-r--  1 cassandra cassandra  336908 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342307-Data.db
 -rw-rw-r--  1 cassandra cassandra 2063935 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342309-Data.db
 -rw-rw-r--  1 cassandra cassandra 409 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342314-Data.db
 -rw-rw-r--  1 cassandra cassandra31180007 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342319-Data.db
 -rw-rw-r--  1 cassandra cassandra 2398345 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342322-Data.db
 -rw-rw-r--  1 cassandra cassandra   21095 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342331-Data.db
 -rw-rw-r--  1 cassandra cassandra   81454 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342335-Data.db
 -rw-rw-r--  1 cassandra cassandra 1063718 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342339-Data.db
 -rw-rw-r--  1 cassandra cassandra  127004 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342344-Data.db
 -rw-rw-r--  1 cassandra cassandra  146785 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342346-Data.db
 -rw-rw-r--  1 cassandra cassandra  697338 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342351-Data.db
 -rw-rw-r--  1 cassandra cassandra 3921428 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342367-Data.db
 -rw-rw-r--  1 cassandra cassandra  240332 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342370-Data.db
 -rw-rw-r--  1 cassandra cassandra   45669 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342374-Data.db
 -rw-rw-r--  1 cassandra cassandra53127549 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342375-Data.db
 -rw-rw-r-- 16 cassandra cassandra 12466853166 Dec 25 22:40 
 /data/sstables/data/ks/Cf/ks-Cf-ic-396473-Data.db
 -rw-rw-r-- 12 cassandra cassandra  3903237198 Dec 29 19:42 
 /data/sstables/data/ks/Cf/ks-Cf-ic-408926-Data.db
 -rw-rw-r--  7 cassandra cassandra  3692260987 Jan  3 08:25 
 /data/sstables/data/ks/Cf/ks-Cf-ic-427733-Data.db
 -rw-rw-r--  4 cassandra cassandra  3971403602 Jan  6 20:50 
 /data/sstables/data/ks/Cf/ks-Cf-ic-437537-Data.db
 -rw-rw-r--  3 cassandra cassandra  1007832224 Jan  7 15:19 
 /data/sstables/data/ks/Cf/ks-Cf-ic-440331-Data.db
 -rw-rw-r--  2 cassandra cassandra   896132537 Jan  8 11:05 
 /data/sstables/data/ks/Cf/ks-Cf-ic-447740-Data.db
 -rw-rw-r--  1 cassandra cassandra   963039096 Jan  9 04:59 
 /data/sstables/data/ks/Cf/ks-Cf-ic-449425-Data.db
 -rw-rw-r--  1 cassandra cassandra   232168351 Jan  9 10:14 
 /data/sstables/data/ks/Cf/ks-Cf-ic-450287-Data.db
 -rw-rw-r--  1 cassandra cassandra73126319 Jan  9 11:28 
 /data/sstables/data/ks/Cf/ks-Cf-ic-450307-Data.db
 -rw-rw-r--  1 cassandra cassandra40921916 Jan  9 12:08 
 /data/sstables/data/ks/Cf/ks-Cf-ic-450336-Data.db
 -rw-rw-r--  1 cassandra cassandra60881193 Jan  9 12:23 
 /data/sstables/data/ks/Cf/ks-Cf-ic-450341-Data.db
 -rw-rw-r--  1 cassandra cassandra4746 Jan  

[jira] [Commented] (CASSANDRA-6563) TTL histogram compactions not triggered at high Estimated droppable tombstones rate

2014-05-10 Thread Paulo Ricardo Motta Gomes (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13992809#comment-13992809
 ] 

Paulo Ricardo Motta Gomes commented on CASSANDRA-6563:
--

I've also hit this bug and instrumented the code to identify what was 
happening. Below is a detailed explanation of the issues that lead to this:

* CASSANDRA-3442 introduced the automatic tombstone removal feature, which was 
one of the main new features of C* 1.2 
(http://www.datastax.com/dev/blog/tombstone-removal-improvement-in-1-2).

* In CASSANDRA-4022 the hints column family entered a compaction loop due to 
the single sstable compaction introduced by CASSANDRA-3442 that was not 
clearing tombstones because compacted rows with a lower timestamp were present 
in other sstables. So, the same SSTable was being compated over and over due to 
a large tombstone ratio that never changed.
A new heuristic was introduced to estimate the number of keys of the candidate 
SSTable that overlaps with other SSTables to prevent CASSANDRA-4022. However, 
this heuristic uses the token range to estimate the key overlap, which can be 
an accurate estimation if an OrderedPartitioner is used, but not for a 
RandomPartitioner, since most SSTables will contain the whole node's range. The 
result is that this heuristic not only prevents CASSANDRA-4022 from happening, 
but in my opinion it also prevents any tombstone compaction to happen at all, 
since most of the times the token ranges of all SSTables overlap. Jonathan 
Ellis even noted in the ticket discussion, I'm worried that with SizeTiered 
compaction we'll have overlap in a lot of cases where we could still compact if 
we looked closer, but this worry was not taken further and the fix was 
integrated.

* Even with the previous fix, the compaction bug reappeared in CASSANDRA-4781. 
Sylvain Lebresne noted that even with the estimation we could always end up in 
a case where the estimate thinks there is enough droppable tombstones, but in 
practice all the droppable tombstones are in overlapping ranges, and suggested 
to skip the worthDroppingTombstone check for SStables that were compacted 
before tombstone_compaction_interval seconds. The estimation was update because 
it contained a minor bug and the check for tombstone_compaction_interval was 
added.

So, after this change the code has pretty much been untouched since 1.2.0 and 
hasn't caused any problems since then, in my opinion because the tombstone 
compaction was never being triggered because of the conservative estimation, as 
illustrated in this JIRA ticket and the following mailing list threads:

* https://www.mail-archive.com/user@cassandra.apache.org/msg29979.html
* http://www.mail-archive.com/user@cassandra.apache.org/msg36144.html
* https://www.mail-archive.com/user@cassandra.apache.org/msg31760.html
* https://www.mail-archive.com/user@cassandra.apache.org/msg35793.html

In my opinion, the tombstone_compaction_interval check is sufficient to prevent 
CASSANDRA-4022 and CASSANDRA-4781, since it will give time for rows to be 
merged and the tombstone compaction to execute succesfully, reducing the 
droppable tombstone ratio and avoiding the compaction loop. So, I propose 
removing altogether the heuristic to estimate the key overlap in order to 
decide if a sstable should be tombstone compacted, keeping only the droppable 
tombstone treshold and tombstone_compaction_interval to decide that.

I'm attaching the patch for 1.2, which I'm testing in one of our production 
servers, and will provide a patch for 2.0.X soon. 

We already gained about 10% disk space due to tombstone compactions that were 
not being triggered before the patch. I will post the graphs soon with the 
space savings and decrease in droppable tombstone ratios. So far we haven't 
noticed any compaction loop.

 TTL histogram compactions not triggered at high Estimated droppable 
 tombstones rate
 -

 Key: CASSANDRA-6563
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6563
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: 1.2.12ish
Reporter: Chris Burroughs
 Fix For: 1.2.17

 Attachments: 1.2.16-CASSANDRA-6563.txt


 I have several column families in a largish cluster where virtually all 
 columns are written with a (usually the same) TTL.  My understanding of 
 CASSANDRA-3442 is that sstables that have a high (  20%) estimated 
 percentage of droppable tombstones should be individually compacted.  This 
 does not appear to be occurring with size tired compaction.
 Example from one node:
 {noformat}
 $ ll /data/sstables/data/ks/Cf/*Data.db
 -rw-rw-r-- 31 cassandra cassandra 26651211757 Nov 26 22:59 
 /data/sstables/data/ks/Cf/ks-Cf-ic-295562-Data.db
 

[jira] [Commented] (CASSANDRA-6563) TTL histogram compactions not triggered at high Estimated droppable tombstones rate

2014-01-10 Thread Chris Burroughs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867958#comment-13867958
 ] 

Chris Burroughs commented on CASSANDRA-6563:


I think this may actually be CASSANDRA-6568

 TTL histogram compactions not triggered at high Estimated droppable 
 tombstones rate
 -

 Key: CASSANDRA-6563
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6563
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: 1.2.12ish
Reporter: Chris Burroughs

 I have several column families in a largish cluster where virtually all 
 columns are written with a (usually the same) TTL.  My understanding of 
 CASSANDRA-3442 is that sstables that have a high (  20%) estimated 
 percentage of droppable tombstones should be individually compacted.  This 
 does not appear to be occurring with size tired compaction.
 Example from one node:
 {noformat}
 $ ll /data/sstables/data/ks/Cf/*Data.db
 -rw-rw-r-- 31 cassandra cassandra 26651211757 Nov 26 22:59 
 /data/sstables/data/ks/Cf/ks-Cf-ic-295562-Data.db
 -rw-rw-r-- 31 cassandra cassandra  6272641818 Nov 27 02:51 
 /data/sstables/data/ks/Cf/ks-Cf-ic-296121-Data.db
 -rw-rw-r-- 31 cassandra cassandra  1814691996 Dec  4 21:50 
 /data/sstables/data/ks/Cf/ks-Cf-ic-320449-Data.db
 -rw-rw-r-- 30 cassandra cassandra 10909061157 Dec 11 17:31 
 /data/sstables/data/ks/Cf/ks-Cf-ic-340318-Data.db
 -rw-rw-r-- 29 cassandra cassandra   459508942 Dec 12 10:37 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342259-Data.db
 -rw-rw-r--  1 cassandra cassandra  336908 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342307-Data.db
 -rw-rw-r--  1 cassandra cassandra 2063935 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342309-Data.db
 -rw-rw-r--  1 cassandra cassandra 409 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342314-Data.db
 -rw-rw-r--  1 cassandra cassandra31180007 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342319-Data.db
 -rw-rw-r--  1 cassandra cassandra 2398345 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342322-Data.db
 -rw-rw-r--  1 cassandra cassandra   21095 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342331-Data.db
 -rw-rw-r--  1 cassandra cassandra   81454 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342335-Data.db
 -rw-rw-r--  1 cassandra cassandra 1063718 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342339-Data.db
 -rw-rw-r--  1 cassandra cassandra  127004 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342344-Data.db
 -rw-rw-r--  1 cassandra cassandra  146785 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342346-Data.db
 -rw-rw-r--  1 cassandra cassandra  697338 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342351-Data.db
 -rw-rw-r--  1 cassandra cassandra 3921428 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342367-Data.db
 -rw-rw-r--  1 cassandra cassandra  240332 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342370-Data.db
 -rw-rw-r--  1 cassandra cassandra   45669 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342374-Data.db
 -rw-rw-r--  1 cassandra cassandra53127549 Dec 12 12:03 
 /data/sstables/data/ks/Cf/ks-Cf-ic-342375-Data.db
 -rw-rw-r-- 16 cassandra cassandra 12466853166 Dec 25 22:40 
 /data/sstables/data/ks/Cf/ks-Cf-ic-396473-Data.db
 -rw-rw-r-- 12 cassandra cassandra  3903237198 Dec 29 19:42 
 /data/sstables/data/ks/Cf/ks-Cf-ic-408926-Data.db
 -rw-rw-r--  7 cassandra cassandra  3692260987 Jan  3 08:25 
 /data/sstables/data/ks/Cf/ks-Cf-ic-427733-Data.db
 -rw-rw-r--  4 cassandra cassandra  3971403602 Jan  6 20:50 
 /data/sstables/data/ks/Cf/ks-Cf-ic-437537-Data.db
 -rw-rw-r--  3 cassandra cassandra  1007832224 Jan  7 15:19 
 /data/sstables/data/ks/Cf/ks-Cf-ic-440331-Data.db
 -rw-rw-r--  2 cassandra cassandra   896132537 Jan  8 11:05 
 /data/sstables/data/ks/Cf/ks-Cf-ic-447740-Data.db
 -rw-rw-r--  1 cassandra cassandra   963039096 Jan  9 04:59 
 /data/sstables/data/ks/Cf/ks-Cf-ic-449425-Data.db
 -rw-rw-r--  1 cassandra cassandra   232168351 Jan  9 10:14 
 /data/sstables/data/ks/Cf/ks-Cf-ic-450287-Data.db
 -rw-rw-r--  1 cassandra cassandra73126319 Jan  9 11:28 
 /data/sstables/data/ks/Cf/ks-Cf-ic-450307-Data.db
 -rw-rw-r--  1 cassandra cassandra40921916 Jan  9 12:08 
 /data/sstables/data/ks/Cf/ks-Cf-ic-450336-Data.db
 -rw-rw-r--  1 cassandra cassandra60881193 Jan  9 12:23 
 /data/sstables/data/ks/Cf/ks-Cf-ic-450341-Data.db
 -rw-rw-r--  1 cassandra cassandra4746 Jan  9 12:23 
 /data/sstables/data/ks/Cf/ks-Cf-ic-450350-Data.db
 -rw-rw-r--  1 cassandra cassandra5769 Jan  9 12:23 
 /data/sstables/data/ks/Cf/ks-Cf-ic-450352-Data.db
 {noformat}
 {noformat}
 295562: Estimated droppable tombstones: 0.899035828535183
 296121: Estimated droppable