Does this "max load" correlate with replication factor? I.e., for a 3-node cluster with an RF of 3, should I be worried at {max load} x 3, or at the max load people generally mention?
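To make the question concrete, here is a rough sketch of the arithmetic I have in mind. It assumes that "load" means the on-disk size of a single node (replicas included), that data is spread evenly across the ring, and that a major compaction can temporarily double disk usage, as the DataStax docs quoted below state. The function names and the 330GB figure are illustrative only, not anything Cassandra provides.

```python
def per_node_load_gb(unique_data_gb, replication_factor, node_count):
    """Approximate on-disk load per node.

    Assumption: data is evenly distributed, so each node holds
    (unique data * RF / node count) worth of replicas.
    """
    if node_count <= 0 or replication_factor <= 0:
        raise ValueError("node_count and replication_factor must be positive")
    if replication_factor > node_count:
        raise ValueError("replication_factor cannot exceed node_count")
    return unique_data_gb * replication_factor / node_count


def disk_needed_for_major_compaction_gb(node_load_gb):
    """Disk a node needs if a major compaction temporarily doubles usage.

    Assumption: the worst case is current load plus an equal amount free.
    """
    return 2 * node_load_gb


# Hypothetical example: 330GB of unique data on a 3-node cluster.
rf3 = per_node_load_gb(330, 3, 3)  # every node holds a full copy
rf1 = per_node_load_gb(330, 1, 3)  # each node holds one third
print(rf3, rf1, disk_needed_for_major_compaction_gb(rf3))
```

Under these assumptions the per-node figure already includes the replicas, so with 3 nodes and RF 3 each node carries the whole data set rather than a third of it.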
On Thu, Jun 7, 2012 at 10:55 PM, Filippo Diotalevi <fili...@ntoklo.com> wrote:

> Hi,
> one of latest Aaron's observation about the max load per Cassandra node
> caught my attention
>
> At ~840GB I'm probably running close
> to the max load I should have on a node,
>
> [AM] roughly 300GB to 400GB is the max load
> Since we currently have a Cassandra node with roughly 330GB of data, it
> looks like that's a good time for us to really understand what's that limit
> in our case. Also, a (maybe old) Stackoverflow question at
> http://stackoverflow.com/questions/4775388/how-much-data-per-node-in-cassandra-cluster
> , seems to suggest a higher limit per node.
>
> Just considering the compaction issues, what are the factors we need to
> account to determine the max load?
>
> * disk space
> Datastax cassandra docs state (pg 97) that a major compaction "temporarily
> doubles disk space usage". Is it a safe estimate to say that the Cassandra
> machine needs to have roughly the same amount of free disk space as the
> current load of the Cassandra node, or are there any other factor to
> consider?
>
> * RAM
> Is the amount of RAM in the machine (or dedicated to the Cassandra node)
> affecting in any way the speed/efficiency of the compaction process?
>
> * Performance degradation for overloaded nodes?
> What kind of performance degradation can we expect for a Cassandra node
> which is "overloaded"? (f.i. with 500GB or more of data)
>
>
> Thanks for the clarifications,
> --
> Filippo Diotalevi

--
-Ben