Hi, 
one of Aaron's latest observations about the max load per Cassandra node caught 
my attention
> At ~840GB I'm probably running close
> to the max load I should have on a node,[AM] roughly 300GB to 400GB is the 
> max load
Since we currently have a Cassandra node with roughly 330GB of data, it seems 
like a good time for us to really understand what that limit is in our case. 
Also, a (maybe old) Stack Overflow question at 
http://stackoverflow.com/questions/4775388/how-much-data-per-node-in-cassandra-cluster
seems to suggest a higher limit per node.

Considering just the compaction issues, what factors do we need to account for 
to determine the max load?

* disk space
The DataStax Cassandra docs state (p. 97) that a major compaction "temporarily 
doubles disk space usage". Is it a safe estimate to say that the Cassandra 
machine needs roughly the same amount of free disk space as the current load of 
the node (rough sketch below), or are there other factors to consider?
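For reference, this is the back-of-the-envelope check I had in mind, assuming 
the "doubles disk space usage" note means free space on the data volume should 
be at least the current load. Just a Python sketch; the data directory path is 
only an example for our setup, not something prescribed by Cassandra.

    import os
    import shutil

    DATA_DIR = "/var/lib/cassandra/data"  # example path, adjust for your install

    def current_load_bytes(data_dir):
        """Sum the size of all files under the Cassandra data directory."""
        total = 0
        for root, _dirs, files in os.walk(data_dir):
            for name in files:
                path = os.path.join(root, name)
                try:
                    total += os.path.getsize(path)
                except OSError:
                    pass  # a file may disappear mid-walk, e.g. during compaction
        return total

    def enough_headroom_for_major_compaction(data_dir):
        # Assumed rule of thumb: free space >= current data load
        load = current_load_bytes(data_dir)
        free = shutil.disk_usage(data_dir).free
        print("load: %.1f GB, free: %.1f GB" % (load / 1e9, free / 1e9))
        return free >= load

    if __name__ == "__main__":
        if enough_headroom_for_major_compaction(DATA_DIR):
            print("looks safe to run a major compaction")
        else:
            print("not enough free disk space for a major compaction")

Does that match how others here size their disks, or is the real requirement 
lower (e.g. only the size of the largest column family being compacted)?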

* RAM
Does the amount of RAM in the machine (or dedicated to the Cassandra node) 
affect the speed/efficiency of the compaction process in any way?

* Performance degradation for overloaded nodes?
What kind of performance degradation can we expect from a Cassandra node that 
is "overloaded" (e.g. with 500GB or more of data)?


Thanks for the clarifications,
-- 
Filippo Diotalevi

