What do you mean by the balancer being always active? It is meant to be run as a tool: within a given run it loops until the cluster is balanced, and then it always exits. The balancer does balance based on usage percentage, so it is probably what you're looking for/missing.
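
To make "balances based on usage percentage" concrete, here is a rough sketch of the idea, not the actual Balancer code. A node counts as balanced when its used percentage is within a threshold of the cluster-wide used percentage (the threshold is what you pass as `hadoop balancer -threshold <N>`; 10 is used below as an assumed default). The node names, sizes, and the `classify_nodes` helper are made up for illustration.

```python
def classify_nodes(nodes, threshold_pct=10.0):
    """Classify each node against the cluster-wide utilization.

    nodes: dict mapping a (hypothetical) node name to (used_tb, capacity_tb).
    Returns the cluster utilization in percent and a dict of
    name -> "over" | "under" | "within".
    """
    total_used = sum(used for used, _ in nodes.values())
    total_cap = sum(cap for _, cap in nodes.values())
    cluster_pct = 100.0 * total_used / total_cap

    status = {}
    for name, (used, cap) in nodes.items():
        node_pct = 100.0 * used / cap
        if node_pct > cluster_pct + threshold_pct:
            status[name] = "over"    # candidate source of blocks
        elif node_pct < cluster_pct - threshold_pct:
            status[name] = "under"   # candidate target for blocks
        else:
            status[name] = "within"  # already balanced for this threshold
    return cluster_pct, status


# Made-up numbers: two nearly full 12 TB nodes and one 72 TB node at 40%.
nodes = {
    "small-1": (11.9, 12.0),
    "small-2": (11.5, 12.0),
    "large-1": (28.8, 72.0),
}

cluster_pct, status = classify_nodes(nodes, threshold_pct=10.0)
print(f"cluster utilization: {cluster_pct:.1f}%")
for name, state in status.items():
    print(name, state)
```

With numbers like these, the small nodes come out as over-utilized and the large node as under-utilized, which is the situation a balancer run is supposed to even out by moving blocks from the former to the latter.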

On Tue, Mar 19, 2013 at 6:56 AM, Tapas Sarangi <[email protected]> wrote:
> Hi,
>
> On Mar 18, 2013, at 8:21 PM, 李洪忠 <[email protected]> wrote:
>
> Maybe you need to modify the rackware script to make the rack balance, ie,
> all the racks are the same size, on rack by 6 small nodes, one rack by 1
> large nodes.
> P.S.
> you need to reboot the cluster for rackware script modify.
>
>
> Like I mentioned earlier in my reply to Bertrand, we haven't considered rack
> awareness for the cluster, currently it is considered as just one rack. Can
> that be the problem ? I don't know…
>
> -Tapas
>
>
>
> On 2013/3/19 7:17, Bertrand Dechoux wrote:
>
> And by active, it means that it does actually stops by itself? Else it might
> mean that the throttling/limit might be an issue with regard to the data
> volume or velocity.
>
> What threshold is used?
>
> About the small and big datanodes, how are they distributed with regards to
> racks?
> About files, how is used the replication factor(s) and block size(s)?
>
> Surely trivial questions again.
>
> Bertrand
>
> On Mon, Mar 18, 2013 at 10:46 PM, Tapas Sarangi <[email protected]>
> wrote:
>>
>> Hi,
>>
>> Sorry about that, had it written, but thought it was obvious.
>> Yes, balancer is active and running on the namenode.
>>
>> -Tapas
>>
>> On Mar 18, 2013, at 4:43 PM, Bertrand Dechoux <[email protected]> wrote:
>>
>> Hi,
>>
>> It is not explicitly said but did you use the balancer?
>> http://hadoop.apache.org/docs/r1.0.4/commands_manual.html#balancer
>>
>> Regards
>>
>> Bertrand
>>
>> On Mon, Mar 18, 2013 at 10:01 PM, Tapas Sarangi <[email protected]>
>> wrote:
>>>
>>> Hello,
>>>
>>> I am using one of the old legacy version (0.20) of hadoop for our
>>> cluster. We have scheduled for an upgrade to the newer version within a
>>> couple of months, but I would like to understand a couple of things before
>>> moving towards the upgrade plan.
>>>
>>> We have about 200 datanodes and some of them have larger storage than
>>> others. The storage for the datanodes varies between 12 TB to 72 TB.
>>>
>>> We found that the disk-used percentage is not symmetric through all the
>>> datanodes. For larger storage nodes the percentage of disk-space used is
>>> much lower than that of other nodes with smaller storage space. In larger
>>> storage nodes the percentage of used disk space varies, but on average about
>>> 30-50%. For the smaller storage nodes this number is as high as 99.9%. Is
>>> this expected ? If so, then we are not using a lot of the disk space
>>> effectively. Is this solved in a future release ?
>>>
>>> If no, I would like to know if there are any checks/debugs that one can
>>> do to find an improvement with the current version or upgrading hadoop
>>> should solve this problem.
>>>
>>> I am happy to provide additional information if needed.
>>>
>>> Thanks for any help.
>>>
>>> -Tapas
>>>
>>
>
>
>
> --
> Bertrand Dechoux
>
>

--
Harsh J
