RE: Using DTCS, TTL but old SSTables not being removed

2015-09-15 Thread Jacques-Henri Berthemet
Hi,

Any idea when 2.2.2 will be released?
I see there are still 3 issues left to fix:
https://issues.apache.org/jira/browse/CASSANDRA/fixforversion/1219/?selectedTab=com.atlassian.jira.jira-projects-plugin:version-summary-panel


--
Jacques-Henri Berthemet


Re: Using DTCS, TTL but old SSTables not being removed

2015-09-13 Thread Jeff Jirsa
2.2.1 has a pretty significant bug in compaction: 
https://issues.apache.org/jira/browse/CASSANDRA-10270

That prevents it from compacting files after 60 minutes. It may or may not be 
the cause of the problem you’re seeing, but it looks possibly related, so you 
can try the workaround in that ticket to see if it helps.





On 9/13/15, 10:54 AM, "Phil Budne" wrote:

>Running Cassandra 2.2.1 on 3 nodes (on EC2, from Datastax AMI, then
>upgraded).  Inserting time-series data; All entries with TTL to expire
>3 hours after the "actual_time" of the observation.  Entries arrive
>with varied delay, and often in duplicate. Data is expiring (no longer
>visible from CQL), but old SSTables are not being removed (except on
>restart).
>
>CREATE KEYSPACE thing
>WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '2'}
>AND durable_writes = true;
>
>CREATE TABLE thing.thing_ia (
>id int,
>actual_time timestamp,
>data text,
>PRIMARY KEY (id, actual_time)
>) WITH CLUSTERING ORDER BY (actual_time ASC)
>AND bloom_filter_fp_chance = 0.01
>AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>AND comment = ''
>AND compaction = {'tombstone_threshold': '0.1', 
> 'tombstone_compaction_interval': '600', 'class': 
> 'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy'}
>AND compression = {'sstable_compression': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>AND dclocal_read_repair_chance = 0.1
>AND default_time_to_live = 0
>AND gc_grace_seconds = 60
>AND max_index_interval = 2048
>AND memtable_flush_period_in_ms = 0
>AND min_index_interval = 128
>AND read_repair_chance = 0.0
>AND speculative_retry = '99.0PERCENTILE';
>
>All times shown in UTC:
>
>$ python -c 'import time; print int(time.time())'
>1442166347
>
>$ date
>Sun Sep 13 17:46:19 UTC 2015
>
>$ cat ~/mmm.sh
>for x in la-*Data.db; do
>ls -l $x
>~/meta.sh $x >/tmp/mmm/$x
>head < /tmp/mmm/$x
>echo 
>grep Ances /tmp/mmm/$x
>echo ''
>done
>
>$ sh ~/mmm.sh 
>-rw-r--r-- 1 cassandra cassandra 31056032 Sep 12 05:41 la-203-big-Data.db
>SSTable: /raid0/cassandra/data/thing.thing_ia-.../la-203-big
>Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>Bloom Filter FP chance: 0.01
>Minimum timestamp: 1442025790163000
>Maximum timestamp: 1442034620451000
>SSTable max local deletion time: 1442045239
>Compression ratio: -1.0
>Estimated droppable tombstones: 0.946418951062831
>SSTable Level: 0
>Repaired at: 0
>
>Ancestors: [202]
>
>-rw-r--r-- 1 cassandra cassandra 23647585 Sep 12 06:09 la-204-big-Data.db
>SSTable: /raid0/cassandra/data/thing.thing_ia-.../la-204-big
>Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>Bloom Filter FP chance: 0.01
>Minimum timestamp: 1442034620472000
>Maximum timestamp: 1442038188419002
>SSTable max local deletion time: 1442073136
>Compression ratio: -1.0
>Estimated droppable tombstones: 0.9163514458998852
>SSTable Level: 0
>Repaired at: 0
>
>Ancestors: []
>
>-rw-r--r-- 1 cassandra cassandra 23456946 Sep 12 07:25 la-205-big-Data.db
>SSTable: /raid0/cassandra/data/thing.thing_ia-.../la-205-big
>Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>Bloom Filter FP chance: 0.01
>Minimum timestamp: 1442038188472000
>Maximum timestamp: 1442042703834001
>SSTable max local deletion time: 1442053303
>Compression ratio: -1.0
>Estimated droppable tombstones: 0.9442594560554178
>SSTable Level: 0
>Repaired at: 0
>
>Ancestors: []
>
>-rw-r--r-- 1 cassandra cassandra 23331024 Sep 12 08:11 la-206-big-Data.db
>SSTable: /raid0/cassandra/data/thing.thing_ia-.../la-206-big
>Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>Bloom Filter FP chance: 0.01
>Minimum timestamp: 1442042703845000
>Maximum timestamp: 1442045482391000
>SSTable max local deletion time: 1442056194
>Compression ratio: -1.0
>Estimated droppable tombstones: 0.922422134865437
>SSTable Level: 0
>Repaired at: 0
>
>Ancestors: []
>
>-rw-r--r-- 1 cassandra cassandra 23699494 Sep 12 09:11 la-207-big-Data.db
>SSTable: /raid0/cassandra/data/thing.thing_ia-.../la-207-big
>Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>Bloom Filter FP chance: 0.01
>Minimum timestamp: 1442045482398001
>Maximum timestamp: 144204909216
>SSTable max local deletion time: 1442059681
>Compression ratio: -1.0
>Estimated droppable tombstones: 0.9327568753815364
>SSTable Level: 0
>Repaired at: 0
>
>Ancestors: []
>
>-rw-r--r-- 1 cassandra cassandra 23900518 Sep 12 10:11 la-208-big-Data.db
>SSTable: /raid0/cassandra/data/thing.thing_ia-.../la-208-big
>Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>Bloom Filter FP chance: 0.01
>Minimum timestamp: 1442049092164001
>Maximum timestamp: 1442052684468000
>SSTable max local deletion time: 1442063293
>Compression ratio: -1.0
>Estimated droppable tombstones: 0.9249749035769007
>SSTable 

Re: Using DTCS, TTL but old SSTables not being removed

2015-09-13 Thread Phil Budne
Jeff Jirsa wrote:
> 2.2.1 has a pretty significant bug in compaction: 
> https://issues.apache.org/jira/browse/CASSANDRA-10270
>
> That prevents it from compacting files after 60 minutes. It may or
> may not be the cause of the problem you're seeing, but it looks
> possibly related, so you can try the workaround in that ticket to
> see if it helps.

Thanks! So far so good.

I've changed index_summary_resize_interval_in_minutes from 60 to -1
and restarted, and I'm still seeing

. CompactionController.java:153 - Dropping expired SSTable 

debug messages(*) more than an hour after the restart.

I'll know more as time goes by.

(*) I've set the logging level for
org.apache.cassandra.db.compaction.DateTieredCompactionStrategy and
org.apache.cassandra.db.compaction.CompactionController to DEBUG.
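
For anyone reading the metadata dumps quoted in this thread, here is a rough sketch of the arithmetic behind "expired but not removed". It compares each SSTable's "max local deletion time" plus gc_grace_seconds against the epoch time printed in the original post. The numbers are copied from the quoted output. The helper names (fully_expired, hours_overdue) are made up for illustration, and the check is a simplification: as far as I understand it, Cassandra also looks at overlapping SSTables before dropping one, which this ignores.

```python
# Values copied from the metadata output quoted earlier in this thread.
NOW = 1442166347        # epoch seconds printed in the original post
GC_GRACE_SECONDS = 60   # gc_grace_seconds from the table definition

# "SSTable max local deletion time" per SSTable, in epoch seconds.
max_local_deletion_time = {
    "la-203": 1442045239,
    "la-204": 1442073136,
    "la-205": 1442053303,
    "la-206": 1442056194,
    "la-207": 1442059681,
    "la-208": 1442063293,
}


def fully_expired(max_deletion, now=NOW, gc_grace=GC_GRACE_SECONDS):
    """Simplified check: the latest deletion time plus gc_grace is in the past.

    Hypothetical helper; it does not model Cassandra's overlap checks.
    """
    return max_deletion + gc_grace < now


def hours_overdue(max_deletion, now=NOW, gc_grace=GC_GRACE_SECONDS):
    """Hours elapsed since the SSTable became droppable under the check above."""
    return (now - (max_deletion + gc_grace)) / 3600.0


if __name__ == "__main__":
    for name, deletion_time in sorted(max_local_deletion_time.items()):
        print("%s expired=%s overdue_hours=%.1f" % (
            name, fully_expired(deletion_time), hours_overdue(deletion_time)))
```

By this simplified measure every SSTable listed was past its deletion time plus gc_grace when the listing was taken, which fits the observation that they were only removed on restart.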