Big old guess of something in the 1000's. Try benchmarking your workload and plugging the numbers in (my 50 ms per request is pretty high)...
- 8 cores * 8 writers per core = 64 concurrent writes; if each write request takes 50 ms, that's 1,280 max per sec
- 1 spindle * 16 readers per spindle = 16 concurrent reads; if each read request takes 50 ms, that's 320 max per sec

(reader and writer sizes from the help in conf/cassandra.yaml; a small script sketching this arithmetic follows the quoted thread below)

This is really just a guess; there are a lot more things going on in the system, and it gets even more complicated once it's turned on. But I know sometimes you just need to show you've thought about it :)

Hope that helps.

Aaron

On 25 Mar 2011, at 02:27, Brian Fitzpatrick wrote:

> Thanks for the tips on the replication factor. Any thoughts on the
> number of nodes in a cluster to support RF=3 with a workload of 400
> ops/sec (4-8K sized rows, 50/50 read/write)? Based on the "sweet
> spot" hardware referenced in the wiki (8-core, 16-32 GB RAM), what kind
> of ops/sec could I reasonably expect from each node? Just looking for
> a range to make some educated guesses.
>
> Thanks,
> Brian
>
> On Wed, Mar 23, 2011 at 9:04 PM, aaron morton <aa...@thelastpickle.com> wrote:
>> It really does depend on what your workload is like, and in the end it will
>> involve a certain amount of fudge factor.
>>
>> http://wiki.apache.org/cassandra/CassandraHardware provides some guidance.
>> http://wiki.apache.org/cassandra/MemtableThresholds can be used to get a
>> rough idea of the memory requirements. Note that secondary indexes are also
>> CFs with the same memory settings as the parent.
>>
>> With RF 3 you can afford to lose one replica for a token range and still be
>> available (assuming Quorum CL). With RF 5 you can lose 2 replicas and still
>> be available for the keys in that range.
>>
>> I've been careful to say "lose X replicas" because the other nodes in the
>> cluster don't count when considering an operation for a key. Two examples
>> with a 9-node cluster at RF 3: if you lose nodes 2 and 3 and they are
>> replicas for node 1, Quorum operations on keys in the range for node 1 will
>> fail (ranges for 2 and 3 will be OK). If you lose nodes 2 and 5, Quorum
>> operations will succeed for all keys.
>>
>> RF 3 is a reasonable starting point for some redundancy, RF 5 is more. After
>> that it's Web Scale (tm).
>>
>> Hope that helps
>> Aaron
>>
>> On 24 Mar 2011, at 04:04, Brian Fitzpatrick wrote:
>>
>> I'm going through the process of speccing out the hardware for a
>> Cassandra cluster. The relevant specs:
>>
>> - Support 460 operations/sec (50/50 read/write workload). Row sizes
>> range from 4 to 8 KB.
>> - Support 29 million objects for the first year
>> - Support 365 GB of storage for the first year, based on Cassandra tests
>> ((data + index + overhead) * replication factor of 3)
>>
>> I'm looking for advice on the node size for this cluster, the recommended
>> RAM per node, and whether RF=3 seems to be a good choice for general
>> availability and resistance to failure.
>>
>> I've looked at the YCSB benchmark paper and through the archives of this
>> mailing list for pointers, but I haven't found any general guidelines on
>> the recommended cluster size to support X operations/sec with Y data size
>> at a replication factor of Z that I could extrapolate from.
>>
>> Any and all recommendations appreciated.
>>
>> Thanks,
>> Brian
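For anyone who wants to plug their own numbers into the arithmetic at the top of Aaron's reply, here is a minimal Python sketch of that back-of-envelope model. It is only an illustration of the reasoning in the thread: the thread counts mirror the concurrent_writes (8 * cores) and concurrent_reads (16 * drives) guidance from conf/cassandra.yaml, the 50 ms per-request time is the deliberately pessimistic figure used above, and the quorum lines restate the RF discussion from the quoted mail; swap in your own benchmark results.

# Back-of-envelope Cassandra node sizing, following the model sketched above.
# All inputs are illustrative assumptions; replace them with measured numbers.

def max_ops_per_sec(concurrent_requests, latency_ms):
    # N requests in flight, each taking latency_ms, gives a throughput ceiling.
    return concurrent_requests * (1000.0 / latency_ms)

cores = 8
spindles = 1
writers = 8 * cores      # concurrent_writes guidance: 8 * number of cores
readers = 16 * spindles  # concurrent_reads guidance: 16 * number of drives

latency_ms = 50.0        # deliberately pessimistic per-request time, as above

print("max writes/sec ~ %d" % max_ops_per_sec(writers, latency_ms))  # ~1,280
print("max reads/sec  ~ %d" % max_ops_per_sec(readers, latency_ms))  # ~320

# Quorum fault tolerance per token range, per the RF discussion in the quoted mail.
for rf in (3, 5):
    quorum = rf // 2 + 1
    print("RF %d: quorum needs %d replicas, so %d can be lost per range"
          % (rf, quorum, rf - quorum))

With a 50/50 read/write mix, the spindle-bound read figure is usually the one that caps per-node throughput under this model, which is why benchmarking your own read latency matters most.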