Hi,
one of Aaron's recent observations about the max load per Cassandra node caught my attention:

> At ~840GB I'm probably running close to the max load I should have on a node,
> [AM] roughly 300GB to 400GB is the max load

Since we currently have a Cassandra node with roughly 330GB of data, it looks like this is a good time for us to really understand what that limit is in our case. Also, a (maybe old) Stackoverflow question at http://stackoverflow.com/questions/4775388/how-much-data-per-node-in-cassandra-cluster seems to suggest a higher limit per node.
Considering just the compaction issues, what are the factors we need to account for when determining the max load?

* Disk space
The Datastax Cassandra docs state (pg 97) that a major compaction "temporarily doubles disk space usage". Is it a safe estimate to say that the Cassandra machine needs roughly the same amount of free disk space as the current load of the node, or are there other factors to consider? (A rough back-of-the-envelope calculation is sketched after my signature.)

* RAM
Does the amount of RAM in the machine (or dedicated to the Cassandra node) affect in any way the speed/efficiency of the compaction process?

* Performance degradation for overloaded nodes
What kind of performance degradation can we expect from a Cassandra node that is "overloaded" (e.g. with 500GB or more of data)?

Thanks for the clarifications,
--
Filippo Diotalevi
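
P.S. For concreteness, this is the back-of-the-envelope calculation I have in mind for the disk space point. It simply assumes the "temporarily doubles disk space usage" statement from the docs is the worst case; the free-space figure and the 100% headroom factor are my own illustrative assumptions, not numbers from the docs:

    # Rough sketch: is there enough free disk for a major compaction,
    # assuming it can temporarily need about as much extra space as the live data?
    node_load_gb = 330       # current data load on the node
    free_disk_gb = 400       # free space on the data volume (illustrative value)

    # "doubles disk space usage" -> headroom of roughly 100% of the current load
    required_headroom_gb = node_load_gb * 1.0

    if free_disk_gb >= required_headroom_gb:
        print("enough headroom for a major compaction (under this assumption)")
    else:
        print("short by roughly %d GB" % (required_headroom_gb - free_disk_gb))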