Rebalancing after adding a new node

James Heather Thu, 03 Sep 2015 01:32:50 -0700

Suppose I create a table with a billion rows, on a cluster with N nodes.
Then I want to increase performance, so I add a new node to the cluster.
Obviously the data is still stored on the first N nodes, and not on the new
one. Is there a way of redistributing the data (online) to take advantage
of the new node?


I realise the answer might depend on the configuration of the table. If
there are schemas that fit this notion well, and schemas that don't, I'd be
interested to know about that too.

(This will be running on CDH5, if that makes a difference.)

James
