On Mon, Sep 11, 2017 at 4:48 PM, Paul Pollack <paul.poll...@klaviyo.com> wrote:
Hi,

We run a 48-node cluster that stores counts in wide rows. Each node uses roughly 1TB of a 2TB EBS gp2 drive for its data directory, with LeveledCompactionStrategy. We have been trying to bootstrap new nodes that use a RAID0 configuration over two 1TB EBS drives to raise the I/O throughput cap from 160 MB/s to 250 MB/s (AWS limits). Every time a node finishes streaming it is bombarded by a large number of compactions. We see CPU load on the new node spike extremely high while CPU load on all the other nodes in the cluster drops unreasonably low, and our app's write latency to this cluster averages 10 seconds or greater. We've already tried throttling compaction throughput to 1 MB/s, and we've always had concurrent_compactors set to 2, but the disk is still saturated. In every case we have had to shut down the Cassandra process on the new node to restore acceptable operation.

We're currently upgrading all of our clients to the 3.11.0 version of the DataStax Python driver, which will allow us to put our next newly bootstrapped node on a blacklist (a sketch of one way to do this is below). The hope is that if the new node doesn't accept writes, the rest of the cluster can serve them adequately (as is the case whenever we turn down the bootstrapping node), and the new node can finish its compactions.

We were also interested in hearing whether anyone has had much luck with the sstableofflinerelevel tool, and whether it is a reasonable approach for our issue (a dry-run sketch is below as well).

One of my colleagues found a post where a user with a similar issue found that their bloom filters had an extremely high false positive ratio. Although I didn't check that during any of these bootstrap attempts, it seems to me that with this many compactions to do we would likely observe the same thing (a quick way to check is also sketched below).

Would appreciate any guidance anyone can offer.

Thanks,
Paul
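
A minimal sketch of the driver-side blacklist mentioned above, assuming the DataStax Python driver (cassandra-driver) 3.x. The driver at 3.11.0 has no built-in blacklist policy, so one approach is to wrap the usual load balancing policy and report blacklisted hosts as IGNORED, which keeps the driver from opening connections or routing requests to them. The addresses and the local_dc value here are hypothetical.

    from cassandra.cluster import Cluster
    from cassandra.policies import DCAwareRoundRobinPolicy, HostDistance

    class BlacklistPolicy(DCAwareRoundRobinPolicy):
        """Delegate to DCAwareRoundRobinPolicy, but mark blacklisted
        hosts as IGNORED so they get no connections or requests."""

        def __init__(self, blacklist, *args, **kwargs):
            super(BlacklistPolicy, self).__init__(*args, **kwargs)
            self._blacklist = set(blacklist)

        def distance(self, host):
            # host.address is the node's address as a string.
            if host.address in self._blacklist:
                return HostDistance.IGNORED
            return super(BlacklistPolicy, self).distance(host)

    # Hypothetical addresses: keep the bootstrapping node out of the pool.
    policy = BlacklistPolicy(blacklist=['10.0.1.57'], local_dc='us-east')
    cluster = Cluster(contact_points=['10.0.1.10'],
                      load_balancing_policy=policy)
    session = cluster.connect()

Later driver releases ship a HostFilterPolicy that does essentially this with a predicate, but on 3.11.0 a small subclass like the one above is the straightforward route.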
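
On the bloom filter question, the false positive ratio is reported per table by nodetool. A rough sketch of checking it, assuming nodetool is on the PATH of the node being inspected; the keyspace and table names are placeholders, and the command is called cfstats rather than tablestats on older Cassandra versions:

    import subprocess

    def bloom_false_ratio(keyspace, table):
        """Parse 'Bloom filter false ratio' out of nodetool tablestats."""
        out = subprocess.check_output(
            ['nodetool', 'tablestats', '{}.{}'.format(keyspace, table)],
            universal_newlines=True)
        for line in out.splitlines():
            line = line.strip()
            if line.startswith('Bloom filter false ratio:'):
                return float(line.split(':', 1)[1])
        raise RuntimeError('Bloom filter false ratio not found in output')

    # Hypothetical keyspace/table names.
    print(bloom_false_ratio('counts_ks', 'counts_by_key'))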
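
And for anyone experimenting with sstableofflinerelevel: it must run with the Cassandra process stopped on that node, and it has a dry-run mode that prints the proposed new leveling without rewriting anything, which is the safe first step. A sketch, assuming the tool is on PATH and using the same placeholder names:

    import subprocess

    # --dry-run prints the proposed leveling without touching sstables;
    # run only while the Cassandra process on this node is stopped.
    subprocess.check_call(
        ['sstableofflinerelevel', '--dry-run', 'counts_ks', 'counts_by_key'])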