Re: cassandra.yaml configuration for large machines (scale up vs. scale out)

2017-11-04 Thread Jeff Jirsa
SSDs definitely help. I think JBOD works correctly in newer versions
(CASSANDRA-6696 separates data by token, so you don't have to rebuild
the whole node; not sure off the top of my head if there are follow-up
tickets).
10 Gb networking is probably more useful if you use vnodes than single tokens
(likely to change significantly in future versions as streaming gets more
efficient).
Most people run fine on 8-16G heaps, roughly the same off-heap, and use the
rest for OS page cache (again, newer versions have some extra chunk
caches). Going from 128 to 256G of RAM is useful if the extra RAM will give
you some data access benefits (i.e., if you have ~200G of hot data, going
from 128->256G of RAM probably helps keep all the hot data in cache, and
that's a huge help; if you have 4T of data and it's randomly accessed, the
extra 128G of RAM is probably going to be mostly wasted).
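
The RAM arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration, not a sizing tool: the function names are made up for this example, the 16G heap and 16G off-heap figures are the upper end of the range mentioned above, and the 4G set aside for the OS itself is purely an assumption.

```python
def page_cache_budget_gb(total_ram_gb, heap_gb, offheap_gb, os_reserve_gb=4):
    """RAM left for the OS page cache after heap, off-heap and an OS reserve.

    os_reserve_gb is an assumed placeholder, not a measured value.
    """
    return max(total_ram_gb - heap_gb - offheap_gb - os_reserve_gb, 0)


def hot_data_fits(total_ram_gb, hot_data_gb, heap_gb=16, offheap_gb=16):
    """True if the hot data set could, in principle, stay in page cache."""
    return hot_data_gb <= page_cache_budget_gb(total_ram_gb, heap_gb, offheap_gb)


if __name__ == "__main__":
    for ram_gb in (128, 256):
        budget = page_cache_budget_gb(ram_gb, heap_gb=16, offheap_gb=16)
        print(f"{ram_gb}G RAM: ~{budget}G page cache, "
              f"200G hot set fits: {hot_data_fits(ram_gb, 200)}, "
              f"4T random access fits: {hot_data_fits(ram_gb, 4096)}")
```

Under these assumptions the ~200G hot set only fits once the machine has 256G, while a randomly accessed 4T data set does not fit in either case, which is the point made above.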

At a past employer we ran thousands of nodes with 8 virtual cores (probably
4 with hyper-threading), 4T of SSD, and 32G of RAM - lots of cheap-ish
machines.

For the original post, the baseline looks reasonable. Key cache may benefit
from being larger (since you're using the full 2G), or it may be that
you're invalidating frequently so you're not benefiting as much - hard to
know which it is without knowing your access patterns. For all of those
values, you can glance at (
https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html ) for
help tuning - that post is pretty darn reasonable and well annotated.
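
As a rough illustration of the key cache point, the numbers in the nodetool info line from the original post can be pulled apart with a short Python sketch. The regular expression only targets the exact line layout quoted in this thread (sizes reported in GB), the function name is made up for this example, and treating "GB" as 1024^3 bytes is an assumption.

```python
import re

# Line copied from the original post (re-joined onto one logical line).
KEY_CACHE_LINE = (
    "Key Cache  : entries 2919619, size 1.99 GB, capacity 2 GB, "
    "71493172 hits, 158411217 requests, 0.451 recent hit rate, "
    "14400 save period in seconds"
)

# Matches only the layout shown above; other units (MB, KB) are not handled.
PATTERN = re.compile(
    r"entries (?P<entries>\d+), size (?P<size>[\d.]+) GB, "
    r"capacity (?P<capacity>[\d.]+) GB, (?P<hits>\d+) hits, "
    r"(?P<requests>\d+) requests"
)


def summarize_key_cache(line):
    """Derive hit rate, fill ratio and average entry size from one line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("unrecognized key cache line")
    entries = int(match["entries"])
    size_gb = float(match["size"])
    capacity_gb = float(match["capacity"])
    return {
        "hit_rate": int(match["hits"]) / int(match["requests"]),
        "fill_ratio": size_gb / capacity_gb,
        # Assumes 1 GB = 1024**3 bytes.
        "bytes_per_entry": size_gb * 1024**3 / entries,
    }


if __name__ == "__main__":
    for name, value in summarize_key_cache(KEY_CACHE_LINE).items():
        print(f"{name}: {value:.3f}")
```

The cache comes out essentially full with a hit rate below one half. That is consistent with either reading above (cache too small, or frequent invalidation), so the numbers alone do not settle which one applies.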


On Sat, Nov 4, 2017 at 6:28 PM, James Briggs <james.bri...@yahoo.com.invalid
> wrote:

> > I know that Cassandra is built for scale out on commodity hardware
>
> The term "commodity hardware" is not very useful, though the average
> server-class machine bought in 2017 can work.
>
> Netflix found that SSD helped greatly with compactions in production.
> Generally servers use 10 Gb networking in 2017.
>
> 128 GB is commonly used, but I would use 256+ GB in new servers.
>
> I don't recommend the Cassandra JBOD configuration since losing
> one drive means rebuilding the node immediately, which many
> organizations aren't responsive enough to do.
>
> Thanks, James.
> --
> Cassandra/MySQL DBA. Available in San Jose area or remote.
> cass_top: https://github.com/jamesbriggs/cassandra-top
>
>
> --
> *From:* "Steinmaurer, Thomas" <thomas.steinmau...@dynatrace.com>
> *To:* "user@cassandra.apache.org" <user@cassandra.apache.org>
> *Sent:* Friday, November 3, 2017 6:34 AM
> *Subject:* cassandra.yaml configuration for large machines (scale up vs.
> scale out)
>
> Hello,
>
> I know that Cassandra is built for scale out on commodity hardware, but I
> wonder if anyone can share some experience when running Cassandra on rather
> capable machines.
>
> Let’s say we have a 3 node cluster with 128G RAM, 32 physical cores (16
> per CPU socket), Large Raid with Spinning Disks (so somewhere beyond 2000
> IOPS).
>
> What are some recommended cassandra.yaml configuration / JVM settings,
> e.g. we have been using with something like that as a first baseline:
> · 31G heap, G1, -XX:MaxGCPauseMillis=2000
> · concurrent_compactors: 8
> · compaction_throughput_mb_per_sec: 128
> · key_cache_size_in_mb: 2048
> · concurrent_reads: 256
> · concurrent_writes: 256
> · native_transport_max_threads: 256
>
> Anything else we should add to our first baseline of settings?
>
> E.g. although we have a key cache of 2G, nodetool info gives me only 0.451
> as hit rate:
>
> Key Cache  : entries 2919619, size 1.99 GB, capacity 2 GB,
> 71493172 hits, 158411217 requests, 0.451 recent hit rate, 14400 save period
> in seconds
>
>
> Thanks,
> Thomas
>
> The contents of this e-mail are intended for the named addressee only. It
> contains information that may be confidential. Unless you are the named
> addressee or an authorized designee, you may not copy or use it, or
> disclose it to anyone else. If you received it in error please notify us
> immediately and then destroy it. Dynatrace Austria GmbH (registration
> number FN 91482h) is a company registered in Linz whose registered office
> is at 4040 Linz, Austria, Freistädterstraße 313


Re: cassandra.yaml configuration for large machines (scale up vs. scale out)

2017-11-04 Thread James Briggs
> I know that Cassandra is built for scale out on commodity hardware
The term "commodity hardware" is not very useful, though the average
server-class machine bought in 2017 can work.

Netflix found that SSD helped greatly with compactions in production.
Generally servers use 10 Gb networking in 2017.

128 GB is commonly used, but I would use 256+ GB in new servers.

I don't recommend the Cassandra JBOD configuration since losing
one drive means rebuilding the node immediately, which many
organizations aren't responsive enough to do.

Thanks, James.
--
Cassandra/MySQL DBA. Available in San Jose area or remote.
cass_top: https://github.com/jamesbriggs/cassandra-top

From: "Steinmaurer, Thomas" <thomas.steinmau...@dynatrace.com>
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Sent: Friday, November 3, 2017 6:34 AM
Subject: cassandra.yaml configuration for large machines (scale up vs. scale out)

Hello,

I know that Cassandra is built for scale out on commodity hardware, but I
wonder if anyone can share some experience when running Cassandra on rather
capable machines.

Let’s say we have a 3 node cluster with 128G RAM, 32 physical cores (16 per
CPU socket), Large Raid with Spinning Disks (so somewhere beyond 2000 IOPS).

What are some recommended cassandra.yaml configuration / JVM settings, e.g. we
have been using with something like that as a first baseline:
· 31G heap, G1, -XX:MaxGCPauseMillis=2000
· concurrent_compactors: 8
· compaction_throughput_mb_per_sec: 128
· key_cache_size_in_mb: 2048
· concurrent_reads: 256
· concurrent_writes: 256
· native_transport_max_threads: 256

Anything else we should add to our first baseline of settings?

E.g. although we have a key cache of 2G, nodetool info gives me only 0.451 as
hit rate:

Key Cache  : entries 2919619, size 1.99 GB, capacity 2 GB, 71493172
hits, 158411217 requests, 0.451 recent hit rate, 14400 save period in seconds

Thanks,
Thomas

cassandra.yaml configuration for large machines (scale up vs. scale out)

2017-11-03 Thread Steinmaurer, Thomas
Hello,

I know that Cassandra is built for scale out on commodity hardware, but I 
wonder if anyone can share some experience when running Cassandra on rather 
capable machines.

Let's say we have a 3 node cluster with 128G RAM, 32 physical cores (16 per CPU 
socket), Large Raid with Spinning Disks (so somewhere beyond 2000 IOPS).

What are some recommended cassandra.yaml configuration / JVM settings? E.g. we
have been using something like the following as a first baseline:

* 31G heap, G1, -XX:MaxGCPauseMillis=2000

* concurrent_compactors: 8

* compaction_throughput_mb_per_sec: 128

* key_cache_size_in_mb: 2048

* concurrent_reads: 256

* concurrent_writes: 256

* native_transport_max_threads: 256

Anything else we should add to our first baseline of settings?

E.g. although we have a key cache of 2G, nodetool info gives me only 0.451 as 
hit rate:

Key Cache  : entries 2919619, size 1.99 GB, capacity 2 GB, 71493172 
hits, 158411217 requests, 0.451 recent hit rate, 14400 save period in seconds


Thanks,
Thomas
