Hi,

You don't say so explicitly, but have you run the balancer?
http://hadoop.apache.org/docs/r1.0.4/commands_manual.html#balancer
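
In case it helps to see what the balancer is aiming for: per the commands manual linked above, it is started with `hadoop balancer [-threshold <threshold>]`, where the threshold is a percentage of disk capacity. As far as I understand it, a node counts as balanced when its used percentage is within that threshold of the cluster-wide used percentage. Below is a rough Python sketch of that check, not the balancer's actual code. The node names and figures are hypothetical, picked only to resemble the 12 TB and 72 TB nodes you describe, and the 10% threshold is assumed as the default.

```python
# Rough sketch of the balancer's "within threshold" check.
# Assumption: node names and capacities are made up for illustration;
# they are NOT taken from a real cluster report.

# (name, capacity in TB, used in TB)
nodes = [
    ("small-1", 12.0, 11.988),  # 99.9% used
    ("small-2", 12.0, 11.4),    # 95% used
    ("large-1", 72.0, 21.6),    # 30% used
    ("large-2", 72.0, 36.0),    # 50% used
]


def utilization_report(nodes, threshold=10.0):
    """Compare each node's used percentage with the cluster-wide one.

    Returns (cluster_pct, rows) where each row is
    (name, node_pct, deviation, status).
    """
    total_capacity = sum(capacity for _, capacity, _ in nodes)
    total_used = sum(used for _, _, used in nodes)
    cluster_pct = 100.0 * total_used / total_capacity

    rows = []
    for name, capacity, used in nodes:
        node_pct = 100.0 * used / capacity
        deviation = node_pct - cluster_pct
        if deviation > threshold:
            status = "over-utilized"
        elif deviation < -threshold:
            status = "under-utilized"
        else:
            status = "within threshold"
        rows.append((name, node_pct, deviation, status))
    return cluster_pct, rows


if __name__ == "__main__":
    cluster_pct, rows = utilization_report(nodes)
    print(f"cluster utilization: {cluster_pct:.1f}%")
    for name, node_pct, deviation, status in rows:
        print(f"{name}: {node_pct:.1f}% ({deviation:+.1f}) {status}")
```

The real per-node numbers for your cluster should be available from `hadoop dfsadmin -report`; plugging them in would show which nodes the balancer would try to move blocks away from.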

Regards

Bertrand

On Mon, Mar 18, 2013 at 10:01 PM, Tapas Sarangi <[email protected]> wrote:

> Hello,
>
> I am using one of the old legacy version (0.20) of hadoop for our cluster.
> We have scheduled for an upgrade to the newer version within a couple of
> months, but I would like to understand a couple of things before moving
> towards the upgrade plan.
>
> We have about 200 datanodes and some of them have larger storage than
> others. The storage for the datanodes varies between 12 TB to 72 TB.
>
> We found that the disk-used percentage is not symmetric through all the
> datanodes. For larger storage nodes the percentage of disk-space used is
> much lower than that of other nodes with smaller storage space. In larger
> storage nodes the percentage of used disk space varies, but on average
> about 30-50%. For the smaller storage nodes this number is as high as
> 99.9%. Is this expected ? If so, then we are not using a lot of the disk
> space effectively. Is this solved in a future release ?
>
> If no, I would like to know if there are any checks/debugs that one can
> do to find an improvement with the current version or upgrading hadoop
> should solve this problem.
>
> I am happy to provide additional information if needed.
>
> Thanks for any help.
>
> -Tapas
