Not sure if these are what Jeff was referring to, but as a workaround you can configure the following STCS compaction subproperties:

- min_threshold - set to 2 so that only 2 similar-sized sstables are required to trigger a minor compaction, instead of the default 4
- tombstone_threshold - set to 0.1 so that once the estimated ratio of droppable tombstones (tombstones and expired cells older than gc_grace_seconds) in an sstable exceeds 10%, Cassandra compacts that sstable on its own instead of waiting for the higher default ratio of 0.2
- unchecked_tombstone_compaction - set to true so that Cassandra runs single-sstable tombstone compactions without first checking how likely they are to actually drop tombstones
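If you script your schema changes, a minimal Python sketch like the one below renders the three subproperties into a single ALTER TABLE statement. The helper name stcs_workaround_cql and the keyspace/table names (metrics, ts_1min) are hypothetical; the output is plain CQL that you could paste into cqlsh or pass to session.execute() in the DataStax Python driver.

```python
def stcs_workaround_cql(keyspace: str, table: str,
                        min_threshold: int = 2,
                        tombstone_threshold: float = 0.1,
                        unchecked_tombstone_compaction: bool = True) -> str:
    """Build an ALTER TABLE statement with the STCS workaround subproperties.

    Hypothetical helper: it only renders CQL text, it does not talk to a cluster.
    """
    options = {
        "class": "SizeTieredCompactionStrategy",
        "min_threshold": str(min_threshold),
        "tombstone_threshold": str(tombstone_threshold),
        # CQL expects the lowercase string 'true' / 'false'
        "unchecked_tombstone_compaction": str(unchecked_tombstone_compaction).lower(),
    }
    # Render a CQL map literal: {'key': 'value', ...}
    body = ", ".join(f"'{key}': '{value}'" for key, value in options.items())
    return f"ALTER TABLE {keyspace}.{table} WITH compaction = {{{body}}};"


if __name__ == "__main__":
    print(stcs_workaround_cql("metrics", "ts_1min"))
```

Keep in mind that ALTER TABLE replaces the whole compaction map, so any other non-default STCS subproperties you already rely on need to be included in the statement as well.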
WARNING - For future reference, this is just a workaround. It isn't a fix for clusters with bad data models. Think of these settings as buying your cluster some breathing space while you redesign your data model.

Cheers!

On Thu, Aug 10, 2017 at 12:27 AM, Jeff Jirsa <jji...@gmail.com> wrote:

> The deleting compaction strategy from protectwise
> (https://github.com/protectwise/cassandra-util/blob/master/deleting-compaction-strategy/README.md)
> was written (I believe) to solve a similar problem - business based
> deletion rules to enable flexible TTLs. May want to glance at that.
>
> Other answers inline below
>
> --
> Jeff Jirsa
>
>
> On Aug 9, 2017, at 1:41 AM, Steinmaurer, Thomas <thomas.steinmau...@dynatrace.com> wrote:
>
> > Hello,
> >
> > Our top contributor from a data volume perspective is time series data.
> > We have been running with STCS since our initial production deployment in
> > 2014, with several clusters of varying size, currently at most 9 nodes per
> > cluster, with separate clusters in different AWS regions, on m4.xlarge /
> > EBS gp2 storage. We have gone through Cassandra versions from 1.2 up to
> > DSC 2.1.15, which we currently use and which will soon be replaced by
> > Apache Cassandra 2.1.18 across all deployments. We recently switched from
> > Thrift (Astyanax) to Native/CQL (DataStax driver). Overall we are pretty
> > happy with stability and the scale-out offering.
> >
> > We store time series data in different resolutions, from 1min up to 1day
> > aggregates per "time slot". Each resolution has its own column family /
> > table, and a periodic worker executes our business logic for time series
> > aging, e.g. 1min => 5min => … resolution, plus deletion in the source
> > resolutions according to our per-resolution retention policy. So
> > deletions happen much later (e.g. after at least 14 days).
> > We don't use TTLs on written time series data (in production; see the
> > TWCS testing below), so purging is handled exclusively by explicit
> > DELETEs in our aging business logic, which create tombstones.
> >
> > Naturally, with STCS and late explicit deletions / tombstones, it takes a
> > long time to finally reclaim disk space; even worse, we are now running a
> > major compaction every X weeks. We are currently also testing STCS with
> > min_threshold = 2 etc., but all in all this does not feel like a
> > long-term solution. I guess there is nothing else we are missing on the
> > configuration side with STCS? Single SSTable compaction might not kick in
> > either, because, checking with sstablemetadata, the estimated droppable
> > tombstones for our time series SSTables are pretty much 0.0 all the time.
> > I guess that's because we don't write with TTL?
>
> Or you aren't issuing deletes; explicit deletes past GCGS will cause that
> number to increase
>
> > TWCS caught my eye in 2015 I think, and even more at the Cassandra Summit
> > 2016 + other tombstone-related talks. Cassandra 3.0 is around 6 months
> > ahead for us, so initial testing was with 2.1.18 patched with TWCS from
> > GitHub.
> >
> > TWCS looks like exactly what we need: in our tests, once we start writing
> > with TTL we end up with a single SSTable per elapsed window, and data
> > (SSTables) older than TTL + grace period gets automatically removed from
> > disk. Even with out-of-order DELETEs from our business logic enabled,
> > purging SSTables does not seem to get stuck. Not sure if this is
> > expected. Writing with TTL is also a bit problematic in case our
> > retention policy changes, in general or for particular customers.
>
> Search for my Cassandra summit talk from 2016 - there are a few other
> compaction options you probably want to set to more aggressively trigger
> single sstable compaction to help unstick it.
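To make the 0.0 reading easier to reason about, here is a deliberately simplified Python model of the droppable-tombstone ratio and the tombstone_threshold check. Assumptions: each cell carries a hypothetical local_deletion_time (the time of the DELETE, or write time + TTL for expiring cells, None for live cells); the ratio is counted exactly rather than estimated from sstable metadata; and tombstone_compaction_interval and the overlap check that unchecked_tombstone_compaction skips are ignored. Cell, droppable_ratio and qualifies_for_tombstone_compaction are illustrative names, not Cassandra APIs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    # Seconds since epoch at which the cell became a tombstone (explicit
    # DELETE) or expires (write time + TTL); None for a live, non-expiring cell.
    local_deletion_time: Optional[int]


def droppable_ratio(cells: List[Cell], now: int, gc_grace_seconds: int) -> float:
    """Fraction of cells whose deletion/expiry is older than gc_grace_seconds."""
    if not cells:
        return 0.0
    gc_before = now - gc_grace_seconds
    droppable = sum(
        1
        for c in cells
        if c.local_deletion_time is not None and c.local_deletion_time < gc_before
    )
    return droppable / len(cells)


def qualifies_for_tombstone_compaction(
    cells: List[Cell], now: int, gc_grace_seconds: int, tombstone_threshold: float = 0.2
) -> bool:
    """Simplified single-sstable check: the ratio must exceed the threshold."""
    return droppable_ratio(cells, now, gc_grace_seconds) > tombstone_threshold


if __name__ == "__main__":
    now = 2_000_000
    gc_grace = 864_000  # 10 days, the default gc_grace_seconds
    live = [Cell(None) for _ in range(8)]
    old_deletes = [Cell(1_000_000) for _ in range(2)]     # older than gc_grace
    recent_deletes = [Cell(1_900_000) for _ in range(2)]  # still within gc_grace
    print(droppable_ratio(live, now, gc_grace))
    print(droppable_ratio(live + recent_deletes, now, gc_grace))
    print(droppable_ratio(live + old_deletes, now, gc_grace))
    print(qualifies_for_tombstone_compaction(live + old_deletes, now, gc_grace))
    print(qualifies_for_tombstone_compaction(live + old_deletes, now, gc_grace, 0.1))
```

In this model, live cells and deletes younger than gc_grace_seconds never count, so the ratio stays at 0.0 until explicit deletes pass GCGS. With a strict comparison, an sstable with exactly 20% droppable cells misses the default threshold of 0.2 but passes the 0.1 workaround setting.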
> > A few questions, as we need some short-term (with C* 2.1) and long-term
> > (with C* 3.0) mitigation:
> >
> > · With STCS, estimated droppable tombstones are always 0.0 (so no
> > automatic single SSTable compaction happens either): is this a matter of
> > not writing with TTL? If yes, would enabling TTL with STCS improve the
> > disk reclaim situation, because single SSTable compactions would then
> > kick in?
> >
> > · What are the semantics of "default_time_to_live" at table level?
> > From http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_expire_c.html:
> > "After the default_time_to_live TTL value has been exceed, Cassandra
> > tombstones the entire table". What does "entire table" mean?
>
> It probably means sstable, but even that isn't really accurate - that's a
> doc bug
>
> > Hopefully I don't end up with an empty table every time the TTL elapses?
> >
> > · Anything else I'm missing regarding STCS and reclaiming disk space
> > earlier in our TS use case?
>
> LCS rewrites much more aggressively on partition updates - if you can
> spare the IO it's likely going to be more efficient purging deleted data
> than STCS
>
> > · I know changing compaction is a matter of executing ALTER TABLE (or
> > temporarily via JMX for a single node), but as we have legacy data that
> > was written without TTL, I wonder if we may end up with stuck SSTables
> > again.
> >
> > · In case of stuck SSTables with any compaction strategy, what is the
> > best way to debug/analyze why they got stuck (overlapping etc.)?
>
> sstableexpiredblockers
>
> > Thanks a lot and sorry for the lengthy email.
> >
> > Thomas
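On the sstableexpiredblockers pointer: Cassandra can drop a whole sstable during compaction when it contains only expired data and tombstones, but only if that cannot resurrect data held in other sstables; the tool lists the sstables that block such drops. The Python sketch below models that idea under simplifying assumptions: all sstables are treated as overlapping (typical when every partition receives writes in every time window), timestamps are abstract integers, and SSTableInfo, NO_DELETION and find_expired_and_blockers are hypothetical names, not Cassandra APIs.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical sentinel: the sstable contains at least one cell that never expires.
NO_DELETION = 2**31 - 1


@dataclass(frozen=True)
class SSTableInfo:
    name: str
    min_timestamp: int            # oldest write timestamp in the sstable
    max_timestamp: int            # newest write timestamp in the sstable
    max_local_deletion_time: int  # latest tombstone/expiry time of any cell


def find_expired_and_blockers(
    sstables: List[SSTableInfo], gc_before: int
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Split fully expired sstables into droppable ones and blocked ones.

    Simplification: every sstable is assumed to overlap every other one.
    """
    # Fully expired: every cell is a tombstone or expired before gc_before.
    expired = [s for s in sstables if s.max_local_deletion_time < gc_before]
    live = [s for s in sstables if s.max_local_deletion_time >= gc_before]

    droppable: List[str] = []
    blocked: Dict[str, List[str]] = {}
    for cand in expired:
        # A live sstable whose oldest write is not newer than the candidate's
        # newest write may hold data shadowed by the candidate's tombstones,
        # so dropping the candidate could resurrect that data.
        blockers = [s.name for s in live if s.min_timestamp <= cand.max_timestamp]
        if blockers:
            blocked[cand.name] = blockers
        else:
            droppable.append(cand.name)
    return droppable, blocked


if __name__ == "__main__":
    legacy = SSTableInfo("legacy", 0, 50, NO_DELETION)  # written without TTL
    w1 = SSTableInfo("w1", 100, 200, 300)
    w2 = SSTableInfo("w2", 200, 300, 400)
    print(find_expired_and_blockers([legacy, w1, w2], gc_before=1000))
    print(find_expired_and_blockers([w1, w2], gc_before=1000))
```

In this model, a single legacy sstable written without TTL blocks every expired window whose data is newer than its oldest write, which is one way to end up with the stuck SSTables asked about above.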