On Tue, Oct 26, 2010 at 1:45 PM, Stu Hood <[email protected]> wrote:
> While the "adding virtual tokens/nodes to Cassandra" discussion is a good 
> one, there are a few factors that might delay (or remove?) the necessity of 
> adding that complexity:
>
> * In Cassandra 0.7, removing load from a node is fairly cheap: a bounded 
> number of reads are used to determine which portions of the large sorted data 
> files (sstables) to stream, followed by "sendfile" calls to deliver the data 
> to the destination
> * For a replication factor RF, RF nodes can send data to a new node: this 
> means that to have all existing N nodes in your cluster participate in adding 
> K nodes, you only need to add N / RF = K nodes per expansion: this is a much 
> easier factor to achieve than a power of 2.
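
To make the N / RF = K arithmetic above concrete, here is a minimal sketch. The function name and the cluster sizes are made up for illustration, and rounding up for sizes that do not divide evenly is an assumption; it only restates the formula and does not model actual replica placement.

```python
def nodes_per_expansion(existing_nodes, replication_factor):
    """K = N / RF: new nodes needed so every existing node streams to one."""
    if replication_factor < 1 or existing_nodes < replication_factor:
        raise ValueError("need existing_nodes >= replication_factor >= 1")
    # Each new node can be fed by RF existing nodes, so N existing nodes
    # are all kept busy by N / RF new nodes. Rounding up (ceiling division)
    # for uneven sizes is an assumption of this sketch.
    return -(-existing_nodes // replication_factor)


# Hypothetical cluster sizes, compared against doubling the cluster.
for n, rf in [(12, 3), (30, 3), (100, 5)]:
    k = nodes_per_expansion(n, rf)
    print("N=%d RF=%d -> add K=%d nodes (doubling would need %d)" % (n, rf, k, n))
```

For a 12-node cluster with RF = 3, this gives 4 new nodes per expansion rather than the 12 that doubling would require.
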
>
> While the added nodes will not be immediately balanced, there are some 
> possible improvements to our existing load-balancing facilities to better 
> handle unbalanced cases: see 
> https://issues.apache.org/jira/browse/CASSANDRA-1418
>
> Finally, virtual nodes are not a panacea: reviewing the papers on 
> https://issues.apache.org/jira/browse/CASSANDRA-192 suggests that they are 
> significantly more difficult to implement than our current solution.
>
> We haven't ruled virtual nodes out, but I think many of us are leaning toward 
> exploring improvements to our current architecture.
>
> Thanks,
> Stu
>
> -----Original Message-----
> From: "Greg Kim" <[email protected]>
> Sent: Tuesday, October 26, 2010 12:21pm
> To: "[email protected]" <[email protected]>
> Subject: Best practice for adding new nodes to ring
>
> Hi,
>
> I have a question regarding the best practices for adding new nodes to an 
> existing cluster.  From reading the following wiki: 
> http://wiki.apache.org/cassandra/Operations  -- I understand that when 
> creating a brand new cluster -- we can use the following to calculate the 
> initial token for each node to achieve balance in the ring:
>     def tokens(nodes):
>         for i in range(1, nodes + 1):
>             print(i * (2 ** 127 - 1) // nodes)
>
>
> My question is on the best practice for adding new nodes to an existing 
> cluster.  The recommendation in the wiki is basically to 
> compute new tokens for every node and assign them manually using the nodetool 
> command.  We're planning on running either 16GB or 32GB heaps on each of our 
> nodes, so token re-assignment for each node in the cluster sounds like a very 
> expensive operation especially in situations where we're adding new nodes to 
> handle scaling issues w/ the existing cluster.
>
> I'm a bit of a noob with Cassandra, so I wanted to see how others are 
> currently coping w/ this.  One option is to grow the cluster in powers of 2 
> and use bootstrapping w/ automatic token generation.  Is this an option that 
> people are using? (but this gets exponentially expensive when you already 
> have a large # of nodes)
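
The trade-off described here can be checked with the same token formula as the wiki snippet. This is a rough sketch: `balanced_tokens` and `tokens_to_move` are hypothetical helper names, and it assumes an existing node keeps its token whenever that token also appears in the new balanced layout.

```python
RING = 2 ** 127 - 1  # upper bound used by the wiki formula quoted in this thread


def balanced_tokens(nodes):
    """Evenly spaced tokens for a ring of `nodes` nodes (integer math)."""
    return [i * RING // nodes for i in range(1, nodes + 1)]


def tokens_to_move(old_size, new_size):
    """Count existing nodes whose token is absent from the new balanced layout."""
    new = set(balanced_tokens(new_size))
    return sum(1 for t in balanced_tokens(old_size) if t not in new)


# Hypothetical expansions of a 4-node ring.
for old, new in [(4, 8), (4, 6), (4, 5)]:
    print("%d -> %d nodes: %d existing tokens must move"
          % (old, new, tokens_to_move(old, new)))
```

Doubling from 4 to 8 leaves all four existing tokens in place, while growing to 6 moves two of them and growing to 5 moves three, which is why doubling avoids manual token re-assignment but costs more hardware at each step.
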
>
> Does anyone know why Cassandra doesn't use virtual tokens (e.g. each node 
> token creating 256 virtual node tokens in the ring)?  With virtual tokens, 
> adding new nodes to an existing cluster would significantly mitigate the 
> imbalance in the ring.
>
>
> Thanks
> gkim
>
>

One could implement "virtual nodes" by running multiple instances of
Cassandra on a single machine, each bound to a different IP address and
possibly each using a different physical disk.
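
Running several instances per machine effectively gives each machine several tokens. A rough simulation of how that affects balance is sketched below. The md5-derived tokens, the host names, and the ring size are all assumptions for illustration; this is not how Cassandra assigns tokens to nodes.

```python
import hashlib

RING = 2 ** 127  # size of the token space assumed for this sketch


def token_for(name):
    """Derive a pseudo-random token from a name (illustrative only)."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16) % RING


def ownership(machines, instances_per_machine):
    """Return {machine: tokens owned}; each instance owns (previous, own]."""
    ring = sorted(
        (token_for("%s/%d" % (m, i)), m)
        for m in machines
        for i in range(instances_per_machine)
    )
    owned = dict.fromkeys(machines, 0)
    for idx, (token, machine) in enumerate(ring):
        prev = ring[idx - 1][0]  # idx 0 wraps around to the last token
        owned[machine] += (token - prev) % RING
    return owned


machines = ["host%d" % n for n in range(6)]  # hypothetical machine names
for per_machine in (1, 16, 256):
    owned = ownership(machines, per_machine)
    spread = max(owned.values()) / float(min(owned.values()))
    print("%3d instance(s)/machine: max/min ownership = %.2f"
          % (per_machine, spread))
```

With more instances per machine, the max/min ratio should generally move toward 1, though the exact numbers depend on the hash values.
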

I can imagine this would cause some overhead and waste. However, since
current JVMs do not manage large heap sizes well, this is how I would
imagine running Cassandra on a "big iron"/mainframe machine with
128GB of RAM, 4 processors, and 48 disks.
