SSDs definitely help. I think JBOD works correctly in newer versions
(CASSANDRA-6696 separates data by token, so you don't have to rebuild
the whole node; not sure off the top of my head if there's followup

10 Gb networking is probably more useful if you use vnodes than single tokens
(likely to change significantly in future versions as streaming gets more
efficient).

Most people run fine on 8-16G heaps, roughly the same off-heap, and use the
rest for OS page cache (again, newer versions have some extra chunk
caches). Going from 128 to 256G of RAM is useful if the extra RAM will give
you some data access benefits (i.e. if you have ~200G of hot data, going
from 128 to 256G of RAM probably helps keep all the hot data in cache, and
that's a huge help; if you have 4T of data and it's randomly accessed, the
extra 128G of RAM is probably going to be mostly wasted).
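To make the RAM arithmetic concrete, here is a rough Python sketch. The function names and the 16G heap / 16G off-heap defaults are my own assumptions (the top of the 8-16G range mentioned above), and it ignores what the OS and other processes need, so treat the numbers as an upper bound on page cache.

```python
def page_cache_gb(total_ram_gb, heap_gb, offheap_gb):
    """RAM left over for the OS page cache after JVM heap and off-heap use."""
    return total_ram_gb - heap_gb - offheap_gb


def hot_data_fits(total_ram_gb, hot_data_gb, heap_gb=16, offheap_gb=16):
    """True if the hot data set could, at best, sit entirely in page cache."""
    return hot_data_gb <= page_cache_gb(total_ram_gb, heap_gb, offheap_gb)


# Compare the two machine sizes for ~200G of hot data.
for ram in (128, 256):
    cache = page_cache_gb(ram, 16, 16)
    print(f"{ram}G RAM: ~{cache}G page cache, 200G hot set fits: {hot_data_fits(ram, 200)}")
```

With ~200G of hot data, the 128G box leaves roughly 96G for page cache and the 256G box roughly 224G, which is why the upgrade helps in that case and not in the 4T random-access case.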
At a past employer we ran thousands of nodes with 8 virtual cores (probably
4 with hyper-threading), 4T of SSD, and 32G of RAM - lots of cheap'ish

For the original post, the baseline looks reasonable. Key cache may benefit
from being larger (since you're using the full 2G), or it may be that
you're invalidating entries frequently and so not benefiting as much - hard
to know which it is without knowing your access patterns. For all of those
values, you can glance at (
https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html ) for
help tuning - that post is pretty darn reasonable and well annotated.
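As a quick sanity check on the key cache numbers from the nodetool info output quoted below, the hit rate is just hits over requests, and size over capacity shows how full the cache is. A small Python sketch (the function name is mine, and nodetool's "recent hit rate" may be computed over a different window than the lifetime counters):

```python
def key_cache_summary(hits, requests, size_gb, capacity_gb):
    """Return (hit_rate, fill_fraction) from nodetool-info style counters."""
    hit_rate = hits / requests if requests else 0.0
    fill = size_gb / capacity_gb if capacity_gb else 0.0
    return hit_rate, fill


# Values copied from the quoted "Key Cache" line.
hit_rate, fill = key_cache_summary(71493172, 158411217, 1.99, 2.0)
print(f"hit rate {hit_rate:.3f}, cache fill {fill:.1%}")
```

A cache that is essentially full with a hit rate under 50% is consistent with either explanation above (too small, or churning), so it doesn't settle the question without knowing the access pattern.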
On Sat, Nov 4, 2017 at 6:28 PM, James Briggs <james.bri...@yahoo.com.invalid
> > I know that Cassandra is built for scale out on commodity hardware
> The term "commodity hardware" is not very useful, though the average
> server-class machine bought in 2017 can work.
> Netflix found that SSD helped greatly with compactions in production.
> Generally servers use 10 GB networking in 2017.
> 128 GB is commonly used, but I would use 256+ GB in new servers.
> I don't recommend the Cassandra JBOD configuration since losing
> one drive means rebuilding the node immediately, which many
> organizations aren't responsive enough to do.
> Thanks, James.
> Cassandra/MySQL DBA. Available in San Jose area or remote.
> cass_top: https://github.com/jamesbriggs/cassandra-top
> *From:* "Steinmaurer, Thomas" <thomas.steinmau...@dynatrace.com>
> *To:* "firstname.lastname@example.org" <email@example.com>
> *Sent:* Friday, November 3, 2017 6:34 AM
> *Subject:* cassandra.yaml configuration for large machines (scale up vs.
> scale out)
> I know that Cassandra is built for scale out on commodity hardware, but I
> wonder if anyone can share some experience when running Cassandra on rather
> capable machines.
> Let’s say we have a 3 node cluster with 128G RAM, 32 physical cores (16
> per CPU socket), Large Raid with Spinning Disks (so somewhere beyond 2000
> What are some recommended cassandra.yaml configuration / JVM settings,
> e.g. we have been using with something like that as a first baseline:
> · 31G heap, G1, -XX:MaxGCPauseMillis=2000
> · concurrent_compactors: 8
> · compaction_throughput_mb_per_sec: 128
> · key_cache_size_in_mb: 2048
> · concurrent_reads: 256
> · concurrent_writes: 256
> · native_transport_max_threads: 256
> Anything else we should add to our first baseline of settings?
> E.g. although we have a key cache of 2G, nodetool info gives me only 0.451
> as hit rate:
> Key Cache : entries 2919619, size 1.99 GB, capacity 2 GB,
> 71493172 hits, 158411217 requests, 0.451 recent hit rate, 14400 save period
> in seconds