Hi Manoj,

Existing data is not automatically redistributed when you add new DataNodes.
Take a look at the 'hdfs balancer' command, which can be run as a separate
administrative tool to rebalance data distribution across DataNodes. Note that
dfs.datanode.fsdataset.volume.choosing.policy only controls how each DataNode
chooses among its own local disks; it does not move data between nodes, so the
balancer is the tool you want here.
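
As a sketch (the threshold and bandwidth values below are only illustrative;
tune them for your cluster):

  # Optionally raise the per-DataNode balancing bandwidth first
  # (value is in bytes per second; 10 MB/s here is just an example):
  hdfs dfsadmin -setBalancerBandwidth 10485760

  # Run the balancer until every DataNode's utilization is within
  # 10 percentage points of the cluster average:
  hdfs balancer -threshold 10

The balancer runs until the cluster is within the threshold or no more blocks
can be moved, and it is safe to stop and rerun later.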


From: Manoj Venkatesh <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Friday, February 6, 2015 at 11:34 AM
To: "[email protected]" <[email protected]>
Subject: Adding datanodes to Hadoop cluster - Will data redistribute?

Dear Hadoop experts,

I have a Hadoop cluster of 8 nodes; 6 were added during cluster creation and 2
additional nodes were added later to increase disk and CPU capacity. What I see
is that processing is shared amongst all the nodes, whereas storage is reaching
capacity on the original 6 nodes while the newly added machines still have a
relatively large amount of unoccupied storage.

I was wondering if there is an automated way, or any way at all, of
redistributing data so that all the nodes are equally utilized. I have checked
the configuration parameter dfs.datanode.fsdataset.volume.choosing.policy,
which has the options 'Round Robin' and 'Available Space'; are there any other
configurations which need to be reviewed?

Thanks,
Manoj
