Re: Adding new nodes to cluster to speedup pending compactions

2018-04-27 Thread Evelyn Smith
Hi Mikhail, There are a few ways to speed up compactions in the short term: - nodetool setcompactionthroughput 0: this will unthrottle compactions, but unthrottling compactions obviously puts you at risk of high latency while compactions are running. - nodetool setconcurrentcompactors 2 You

Determining active sstables and table dir

2018-04-27 Thread Carl Mueller
In cases where a table was dropped and re-added, there are now two table directories, with different UUIDs, that contain sstables. If you don't know which one is active, how do you determine which is the active table directory? I have tried cf_id from system.schema_columnfamilies and that can
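A minimal Python sketch for matching a table's schema id against candidate data directories. It assumes the on-disk naming convention <table_name>-<id as 32 hex digits, no dashes>, and that the id comes from the id column of system_schema.tables (Cassandra 3.0+) or from cf_id in system.schema_columnfamilies (2.x). The directory names and ids below are hypothetical; check the convention on your own version before deleting anything.

```python
import uuid
from typing import Iterable, Optional


def active_table_dir(table_name: str, table_id: str,
                     dir_names: Iterable[str]) -> Optional[str]:
    """Return the directory whose suffix matches the table's schema id.

    Assumes (not verified for every version) the naming convention
    <table_name>-<uuid hex without dashes>.
    """
    expected = f"{table_name}-{uuid.UUID(table_id).hex}"
    for name in dir_names:
        if name == expected:
            return name
    return None  # no directory matches the live schema id


dirs = [
    "events-0b3a2f10c5a911e8a1b2000000000001",  # hypothetical stale directory
    "events-7d9e4c20c5a911e8a1b2000000000002",  # hypothetical live directory
]
print(active_table_dir("events", "7d9e4c20-c5a9-11e8-a1b2-000000000002", dirs))
```

Comparing full directory names, rather than only the hex suffix, avoids false matches when two tables in the same keyspace happen to share a prefix.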

Re: Switching to TWCS

2018-04-27 Thread Jonathan Haddad
TWCS uses the max timestamp in an sstable to determine what to compact together, it won't anti-compact your data. The goal is to minimize I/O. You'll have to wait for all your mixed-timestamp sstable data to TTL out before TWCS's windowing kicks in optimally.
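A simplified Python model of the windowing described above, assuming a 3-day window and a hypothetical SSTable record holding min/max write timestamps in microseconds. It groups SSTables only by their max timestamp, so an SSTable mixing very old and fresh rows lands in the newest window. This is a sketch of the behaviour, not Cassandra's actual TimeWindowCompactionStrategy code.

```python
from collections import defaultdict
from dataclasses import dataclass

MICROS_PER_SECOND = 1_000_000
DAY_SECONDS = 86_400


@dataclass
class SSTable:
    # Hypothetical record; real values would come from SSTable metadata.
    name: str
    min_ts: int  # minimum write timestamp, microseconds since epoch
    max_ts: int  # maximum write timestamp, microseconds since epoch


def window_start(ts_micros: int, window_seconds: int) -> int:
    """Return the lower bound, in seconds, of the window containing ts_micros."""
    seconds = ts_micros // MICROS_PER_SECOND
    return seconds - (seconds % window_seconds)


def bucket_by_max_timestamp(sstables, window_seconds):
    """Group SSTable names by the window their max timestamp falls into.

    The min timestamp is deliberately ignored: nothing here splits old
    rows out of a mixed SSTable (no anti-compaction).
    """
    buckets = defaultdict(list)
    for sstable in sstables:
        buckets[window_start(sstable.max_ts, window_seconds)].append(sstable.name)
    return dict(buckets)


def micros(days: float) -> int:
    """Convert a day offset from an arbitrary epoch to microseconds."""
    return int(days * DAY_SECONDS * MICROS_PER_SECOND)


tables = [
    SSTable("mixed", min_ts=micros(0), max_ts=micros(30)),  # old and new rows
    SSTable("fresh", min_ts=micros(29), max_ts=micros(30.5)),
    SSTable("old", min_ts=micros(0), max_ts=micros(1)),
]
buckets = bucket_by_max_timestamp(tables, window_seconds=3 * DAY_SECONDS)
print(buckets)  # {2592000: ['mixed', 'fresh'], 0: ['old']}
```

The "mixed" SSTable is bucketed with "fresh" even though most of its rows are 30 days old, which is why windowing only becomes effective once such mixed data has expired.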

Re: Switching to TWCS

2018-04-27 Thread Pranay akula
Yes, the data is TTLed, but I don't think that's the criterion for TWCS. My understanding is that the data is divided into buckets based on the write timestamp. Thanks, Pranay On Fri, Apr 27, 2018, 1:17 PM Nitan Kainth wrote: > Is old data TTLed already? If not, then I don't

Re: Switching to TWCS

2018-04-27 Thread Nitan Kainth
Is the old data TTLed already? If not, then I don't think TWCS will know when to delete data. My understanding of TWCS is that data has to be written with a TTL. (Please correct me if I'm wrong.) Regards, Nitan K. Cassandra and Oracle Architect/SME DataStax Certified Cassandra expert Oracle 10g Certified
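To make the TTL question concrete, a simplified Python model of when an SSTable could be dropped as a whole instead of being compacted. The field names are hypothetical, and the model ignores Cassandra's extra condition that no overlapping SSTable holds older data that the expired cells might still shadow. In this model, an SSTable holding any data written without a TTL is never droppable as a whole.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_GC_GRACE_SECONDS = 864_000  # Cassandra's default gc_grace_seconds (10 days)


@dataclass
class SSTableExpiry:
    # Hypothetical summary of an SSTable's metadata.
    name: str
    # Latest time (epoch seconds) at which any cell in the SSTable expires;
    # None models an SSTable containing at least one cell without a TTL.
    max_local_deletion_time: Optional[int]


def fully_expired(sstable: SSTableExpiry, now: int,
                  gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> bool:
    """Simplified check: has every cell expired and gc_grace_seconds passed?"""
    if sstable.max_local_deletion_time is None:
        return False  # some data never expires, so the file can't be dropped whole
    return sstable.max_local_deletion_time < now - gc_grace_seconds


now = 1_000_000_000
print(fully_expired(SSTableExpiry("ttl_old", now - 11 * 86_400), now))
print(fully_expired(SSTableExpiry("no_ttl", None), now))
```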

Switching to TWCS

2018-04-27 Thread Pranay akula
Hi, We are testing a switch from SizeTiered to TimeWindow. I changed the compaction strategy on a table, with buckets of 3 days. After switching, when I checked the min and max timestamps on the sstables, I saw data older than the 3-day range, in my case 30-60 days. So when we switch from SizeTiered to TimeWindow,
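A rough Python check for the situation described above, assuming min and max write timestamps in microseconds per SSTable (for example, values read from the sstablemetadata tool; parsing is left out). Windows are aligned to the epoch here for simplicity, which is an assumption of this sketch. An SSTable written under SizeTiered whose range covers 60 days crosses many 3-day windows, and TWCS does not split it when the strategy is switched.

```python
DAY_SECONDS = 86_400
MICROS_PER_SECOND = 1_000_000


def window_index(ts_micros: int, window_days: int) -> int:
    """Index of the fixed-size window (counted from the epoch) containing ts_micros."""
    return (ts_micros // MICROS_PER_SECOND) // (window_days * DAY_SECONDS)


def spans_multiple_windows(min_ts_micros: int, max_ts_micros: int,
                           window_days: int = 3) -> bool:
    """True if an SSTable's [min, max] timestamp range crosses a window boundary."""
    return window_index(min_ts_micros, window_days) != window_index(max_ts_micros, window_days)


def age_days(ts_micros: int, now_micros: int) -> float:
    """How many days before now_micros the timestamp lies."""
    return (now_micros - ts_micros) / (DAY_SECONDS * MICROS_PER_SECOND)


now = 100 * DAY_SECONDS * MICROS_PER_SECOND  # arbitrary "now"
oldest = now - 60 * DAY_SECONDS * MICROS_PER_SECOND  # hypothetical min timestamp
print(spans_multiple_windows(oldest, now), age_days(oldest, now))
```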

Re: Adding new nodes to cluster to speedup pending compactions

2018-04-27 Thread Jonathan Haddad
Your compaction time won't improve immediately simply by adding nodes because the old data still needs to be cleaned up. What's your end goal? Why is having a spike in pending compaction tasks following a massive write an issue? Are you seeing a dip in performance, violating an SLA, or do you

Re: Adding new nodes to cluster to speedup pending compactions

2018-04-27 Thread Mikhail Tsaplin
The cluster has 5 nodes of AWS type d2.xlarge (32GB RAM, attached SSD disks), Cassandra 3.0.9. I increased compaction throughput from 16 to 200, and the active compaction remaining time decreased. What will happen if another node joins the cluster? Will the existing nodes move part of their SSTables to
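A back-of-the-envelope Python sketch of why raising the compaction throughput cap can shorten the remaining time. It assumes compaction is limited by the throttle rather than by disk or CPU, and treats the setcompactionthroughput value as MiB/s. The 100 GiB backlog is a hypothetical figure, not one from this cluster.

```python
MIB = 1024 * 1024


def estimated_seconds(remaining_bytes: int, throughput_mib_per_s: float) -> float:
    """Lower bound on the time to compact remaining_bytes under a given throttle.

    A throttle of 0 means "unthrottled" for setcompactionthroughput, so no
    throttle-based estimate applies in that case.
    """
    if throughput_mib_per_s <= 0:
        raise ValueError("throttle must be positive for a throttle-based estimate")
    return remaining_bytes / (throughput_mib_per_s * MIB)


remaining = 100 * 1024 * MIB  # hypothetical 100 GiB still to compact
for throttle in (16, 200):
    print(throttle, estimated_seconds(remaining, throttle))
```

It is only a lower bound: once disk or CPU becomes the bottleneck, raising the cap further stops helping.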

Re: Adding new nodes to cluster to speedup pending compactions

2018-04-27 Thread Nicolas Guyomar
Hi Mikhail, Could you please provide: - your cluster version/topology (number of nodes, CPU, RAM available, etc.) - what kind of underlying storage you are using - cfstats output using the -H option, because I'm never sure I'm converting bytes => GB correctly. You are storing 1 TB per node, so long-running compaction is not
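Since the bytes-to-GB conversion comes up, a small Python helper that formats a raw byte count with binary (1024-based) units. It is an illustrative sketch and does not claim to match nodetool -H output exactly. With binary units, a decimal terabyte (10^12 bytes) comes out at roughly 931 GiB.

```python
def human_readable(num_bytes: int) -> str:
    """Format a byte count using 1024-based units (B, KiB, MiB, GiB, TiB, PiB)."""
    if num_bytes < 0:
        raise ValueError("byte count must be non-negative")
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if size < 1024:
            # Whole bytes print without decimals; larger units get two decimals.
            return f"{num_bytes} B" if unit == "B" else f"{size:.2f} {unit}"
        size /= 1024
    return f"{size:.2f} PiB"


print(human_readable(10**12))  # a decimal terabyte
```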

Adding new nodes to cluster to speedup pending compactions

2018-04-27 Thread Mikhail Tsaplin
Hi, I have a five-node C* cluster suffering from a large number of pending compaction tasks: 1) 571; 2) 91; 3) 367; 4) 22; 5) 232. Initially, it held one big table (table_a). With Spark, I read that table, extended its data, and stored it in a second table, table_b. After this copying/extending process