I had a similar question recently. Please check out the Balancer (http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Balancer); it rebalances the data across the DataNodes.
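For context, the Balancer (started with something like `hdfs balancer -threshold 10`) treats the cluster as balanced when every DataNode's utilization (used / capacity) is within `threshold` percentage points of the whole cluster's utilization. By contrast, `AvailableSpaceVolumeChoosingPolicy` only decides which local volume on a single DataNode receives a new replica, so it cannot move data between the full and the empty nodes. The sketch below is a simplified model of the Balancer's threshold check (the real tool also tracks "above/below average" nodes and plans the actual block moves). `DataNode`, `classify`, and the used-space figures are hypothetical, chosen to mirror the 5 x 900 GB and 11 x 1.1 TB nodes in the question.

```python
from dataclasses import dataclass


@dataclass
class DataNode:
    """Hypothetical stand-in for one DataNode's capacity report."""
    name: str
    capacity_gb: float
    used_gb: float

    @property
    def utilization(self) -> float:
        # Percentage of this node's capacity that is in use.
        return 100.0 * self.used_gb / self.capacity_gb


def classify(nodes, threshold_pct=10.0):
    """Label each node relative to cluster-wide utilization (simplified Balancer check)."""
    total_used = sum(n.used_gb for n in nodes)
    total_capacity = sum(n.capacity_gb for n in nodes)
    cluster_util = 100.0 * total_used / total_capacity

    labels = {}
    for n in nodes:
        diff = n.utilization - cluster_util
        if diff > threshold_pct:
            labels[n.name] = "over"      # candidate source of block moves
        elif diff < -threshold_pct:
            labels[n.name] = "under"     # candidate target of block moves
        else:
            labels[n.name] = "balanced"  # within the threshold band
    return cluster_util, labels


if __name__ == "__main__":
    # Hypothetical usage: an old node (5 x 900 GB) nearly full,
    # a new node (11 x 1.1 TB) less than half full.
    nodes = [DataNode("old-1", 4500, 4300), DataNode("new-1", 12100, 4800)]
    util, labels = classify(nodes)
    print(f"{util:.1f}", labels)  # prints: 54.8 {'old-1': 'over', 'new-1': 'under'}
```

Raising the threshold widens the "balanced" band, so fewer blocks get moved; lowering it spreads data more evenly at the cost of a longer Balancer run.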
- Manoj

From: Chen Song <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Wednesday, February 11, 2015 at 7:44 AM
To: "[email protected]" <[email protected]>
Subject: hadoop cluster with non-uniform disk spec

We have a Hadoop cluster of 500 nodes, but the nodes are not uniform in terms of disk space. Half of the racks are newer, with 11 volumes of 1.1 TB on each node, while the other half have 5 volumes of 900 GB on each node.

dfs.datanode.fsdataset.volume.choosing.policy is set to org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy.

We end up with half of the nodes full while the other half are underutilized. Is there a known solution for this problem?

Thank you for any suggestions.

--
Chen Song
