Hi Manoj,

You need to run the HDFS balancer to redistribute data between nodes:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Balancer
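A minimal sketch of running the balancer from a cluster node (the threshold value of 10% is an illustrative choice, not a recommendation, and the bandwidth limit is optional):

```shell
# Optionally raise the per-datanode balancing bandwidth (bytes/sec)
# so the balancer can move data faster; 10 MB/s here is an example value.
hdfs dfsadmin -setBalancerBandwidth 10485760

# Run the balancer. -threshold is the allowed deviation (in percent)
# of each datanode's disk usage from the cluster average; the balancer
# moves blocks until every node is within that band, then exits.
hdfs balancer -threshold 10
```

The balancer can be stopped safely at any time (Ctrl-C) and re-run later; it only moves block replicas and does not affect running jobs beyond the bandwidth it consumes.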
> *dfs.datanode.fsdataset.volume.choosing.policy* have options 'Round
> Robin' or 'Available Space', are there any other configurations which
> need to be reviewed.

That option only controls how blocks are placed across the disks within a single node; it does not balance data between nodes.

Regards,
Akira

On 2/6/15 11:34, Manoj Venkatesh wrote:
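For reference, a sketch of how the Available Space policy is enabled in hdfs-site.xml (the threshold and preference values shown are the usual defaults, included here only as an illustration):

```xml
<!-- Prefer volumes (disks) with more free space when writing new blocks.
     This balances disks WITHIN a datanode, not across datanodes. -->
<property>
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
```

This is relevant when a node mixes old, full disks with newly added empty ones; for imbalance between whole nodes, the balancer is the right tool.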
Dear Hadoop experts,

I have a Hadoop cluster of 8 nodes; 6 were added during cluster creation, and 2 additional nodes were added later to increase disk and CPU capacity. What I see is that processing is shared among all the nodes, whereas storage is reaching capacity on the original 6 nodes while the newly added machines still have a relatively large amount of unoccupied storage. I was wondering whether there is an automated way (or any way) of redistributing data so that all the nodes are equally utilized.

I have checked the configuration parameter *dfs.datanode.fsdataset.volume.choosing.policy*, which has the options 'Round Robin' and 'Available Space'. Are there any other configurations that need to be reviewed?

Thanks,
Manoj
