Re: Adding new nodes to cluster to speedup pending compactions

2018-04-28 Thread Mikhail Tsaplin
Thanks, everybody. Setting the compaction throughput improved compaction performance. On the overloaded cluster, the number of SSTables dropped from ~16K to ~7K. This way I can wait until it stabilizes. PS: This task is a one-time process - I am upgrading Elasticsearch from v2 to v6 and once I have

Re: Adding new nodes to cluster to speedup pending compactions

2018-04-27 Thread Evelyn Smith
Hi Mikhail, There are a few ways to speed up compactions in the short term: - nodetool setcompactionthroughput 0 This will unthrottle compactions, but obviously unthrottling compactions puts you at risk of high latency while compactions are running. - nodetool setconcurrentcompactors 2 You
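The nodetool commands quoted above can be run like this (a sketch; the throttle value of 16 MB/s shown when restoring is only an example, pick whatever your cluster normally runs with):

```shell
# Unthrottle compaction cluster-wide (0 = no limit).
# Risk: read/write latency spikes while the backlog drains.
nodetool setcompactionthroughput 0

# Watch progress and estimated remaining time (-H = human-readable sizes)
nodetool compactionstats -H

# Restore a throttle once the backlog clears (value is in MB/s)
nodetool setcompactionthroughput 16
```

Note that setcompactionthroughput takes effect immediately but does not persist across a restart; the permanent setting lives in cassandra.yaml (compaction_throughput_mb_per_sec).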

Re: Adding new nodes to cluster to speedup pending compactions

2018-04-27 Thread Jonathan Haddad
Your compaction time won't improve immediately simply by adding nodes because the old data still needs to be cleaned up. What's your end goal? Why is having a spike in pending compaction tasks following a massive write an issue? Are you seeing a dip in performance, violating an SLA, or do you

Re: Adding new nodes to cluster to speedup pending compactions

2018-04-27 Thread Mikhail Tsaplin
The cluster has 5 nodes of the d2.xlarge AWS type (32 GB RAM, attached SSD disks), running Cassandra 3.0.9. Increased compaction throughput from 16 to 200 - active compaction remaining time decreased. What will happen if another node joins the cluster? Will the existing nodes move part of their SSTables to
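To the question of what happens when a node joins: the new node bootstraps by streaming its share of the token ranges from the existing nodes; the existing nodes keep their old copies until cleanup is run. A minimal sketch of watching and finishing that process:

```shell
# On an existing node: watch data streaming out to the joining node
nodetool netstats -H

# Confirm the new node has moved from UJ (joining) to UN (normal)
nodetool status

# Then, on each pre-existing node: drop data for ranges it no longer owns.
# Cleanup itself is a compaction, so it adds to the compaction load.
nodetool cleanup
```

Until cleanup runs, adding a node does not shrink the old nodes' on-disk data, which is why it gives no immediate relief for pending compactions.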

Re: Adding new nodes to cluster to speedup pending compactions

2018-04-27 Thread Nicolas Guyomar
Hi Mikhail, Could you please provide: - your cluster version/topology (number of nodes, CPU, RAM available, etc.) - what kind of underlying storage you are using - cfstats using the -H option, because I'm never sure I'm converting bytes => GB. You are storing 1 TB per node, so a long-running compaction is not
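The cfstats request above can be answered with something like the following (the keyspace/table names are placeholders, taken from the table names mentioned later in the thread):

```shell
# Per-table statistics with human-readable sizes (-H): space used,
# SSTable count, read/write latencies, etc.
# In Cassandra 3.x this command is cfstats; later versions rename it tablestats.
nodetool cfstats -H my_keyspace.table_a my_keyspace.table_b
```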

Adding new nodes to cluster to speedup pending compactions

2018-04-27 Thread Mikhail Tsaplin
Hi, I have a five-node C* cluster suffering from a big number of pending compaction tasks: 1) 571; 2) 91; 3) 367; 4) 22; 5) 232. Initially, it was holding one big table (table_a). With Spark, I read that table, extended its data, and stored it in a second table, table_b. After this copying/extending process
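The per-node pending-task counts quoted above can be collected like this (a sketch; these commands must be run against each node in turn):

```shell
# Pending and active compactions on the node nodetool connects to
nodetool compactionstats

# The CompactionExecutor row here also shows pending/active task counts
nodetool tpstats
```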