Hi Manoj,

You need to run the HDFS balancer to redistribute data between nodes:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Balancer
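A minimal sketch of running the balancer from a cluster node (the threshold value of 10% is an illustrative choice, not a recommendation, and the bandwidth limit is optional):

```shell
# Optionally raise the per-datanode balancing bandwidth (bytes/sec)
# so the balancer can move data faster; 10 MB/s here is an example value.
hdfs dfsadmin -setBalancerBandwidth 10485760

# Run the balancer. -threshold is the allowed deviation (in percent)
# of each datanode's disk usage from the cluster average; the balancer
# moves blocks until every node is within that band, then exits.
hdfs balancer -threshold 10
```

The balancer can be stopped safely at any time (Ctrl-C) and re-run later; it only moves block replicas and does not affect running jobs beyond the bandwidth it consumes.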
> *dfs.datanode.fsdataset.volume.choosing.policy* have options 'Round
> Robin' or 'Available Space', are there any other configurations which
> need to be reviewed.

That option only controls how blocks are placed across the disks within a single node; it does not balance data between nodes.

Regards,
Akira

On 2/6/15 11:34, Manoj Venkatesh wrote:
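For reference, a sketch of how the Available Space policy is enabled in hdfs-site.xml (the threshold and preference values shown are the usual defaults, included here only as an illustration):

```xml
<!-- Prefer volumes (disks) with more free space when writing new blocks.
     This balances disks WITHIN a datanode, not across datanodes. -->
<property>
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
```

This is relevant when a node mixes old, full disks with newly added empty ones; for imbalance between whole nodes, the balancer is the right tool.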
Dear Hadoop experts,

I have a Hadoop cluster of 8 nodes; 6 were added during cluster creation, and 2 additional nodes were added later to increase disk and CPU capacity. What I see is that processing is shared among all the nodes, whereas storage is reaching capacity on the original 6 nodes while the newly added machines still have a relatively large amount of unoccupied storage. I was wondering whether there is an automated way (or any way) of redistributing data so that all the nodes are equally utilized.

I have checked the configuration parameter *dfs.datanode.fsdataset.volume.choosing.policy*, which has the options 'Round Robin' and 'Available Space'. Are there any other configurations that need to be reviewed?

Thanks,
Manoj
