Maybe you need to modify the rack-aware (topology) script so that the racks are balanced, i.e. all the racks have roughly the same total size: for example, one rack with 6 small nodes and one rack with 1 large node.
P.S.
You need to restart the cluster for the rack-aware script change to take effect.
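
For illustration, a topology script (the executable configured via
topology.script.file.name) is just a program that receives datanode
IPs/hostnames as arguments and prints one rack path per argument. A
minimal sketch in Python, with made-up host and rack names chosen so
that each rack ends up with about the same total capacity:

    #!/usr/bin/env python
    # Rough sketch of a rack-aware topology script. Hadoop passes one or
    # more datanode IPs/hostnames as arguments and expects one rack path
    # per argument on stdout. The hosts and racks below are made up:
    # six small nodes in one rack vs. one large node in another, so the
    # rack capacities are roughly equal.
    import sys

    HOST_TO_RACK = {
        "small-node-01": "/rack-a",
        "small-node-02": "/rack-a",
        "small-node-03": "/rack-a",
        "small-node-04": "/rack-a",
        "small-node-05": "/rack-a",
        "small-node-06": "/rack-a",
        "large-node-01": "/rack-b",
    }

    for host in sys.argv[1:]:
        print(HOST_TO_RACK.get(host, "/default-rack"))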

On 2013/3/19 7:17, Bertrand Dechoux wrote:
And by active, do you mean that it actually stops by itself? Otherwise, the throttling/bandwidth limit might be an issue with regard to the data volume or velocity.

What threshold is used?

About the small and big datanodes, how are they distributed with regard to racks?
About the files, what replication factor(s) and block size(s) are used?

Surely trivial questions again.

Bertrand

On Mon, Mar 18, 2013 at 10:46 PM, Tapas Sarangi <[email protected]> wrote:

    Hi,

    Sorry about that, I should have mentioned it, but thought it was obvious.
    Yes, the balancer is active and running on the namenode.

    -Tapas

    On Mar 18, 2013, at 4:43 PM, Bertrand Dechoux <[email protected]> wrote:

    Hi,

    It is not explicitly said, but did you use the balancer?
    http://hadoop.apache.org/docs/r1.0.4/commands_manual.html#balancer

    Regards

    Bertrand

    On Mon, Mar 18, 2013 at 10:01 PM, Tapas Sarangi <[email protected]> wrote:

        Hello,

        I am using one of the old legacy versions (0.20) of Hadoop for
        our cluster. We have scheduled an upgrade to a newer version
        within a couple of months, but I would like to understand a
        couple of things before moving forward with the upgrade plan.

        We have about 200 datanodes and some of them have larger
        storage than others. The storage for the datanodes varies
        from 12 TB to 72 TB.

        We found that the disk-used percentage is not uniform across
        all the datanodes. For the larger storage nodes the percentage
        of disk space used is much lower than on the nodes with smaller
        storage. On the larger nodes the used percentage varies, but it
        averages about 30-50%. For the smaller storage nodes this number
        is as high as 99.9%. Is this expected? If so, then we are not
        using a lot of the disk space effectively. Is this solved in a
        future release?

        If not, I would like to know whether there are any checks or
        debugging steps one can do to improve this with the current
        version, or whether upgrading Hadoop should solve the problem.
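
        (For reference, a rough way to see the per-node skew is to parse
        the plain-text output of "hadoop dfsadmin -report". The sketch
        below assumes the 0.20-era report format with a "Name:" line and
        a "DFS Used%:" line for each datanode; adjust the parsing if your
        output differs.)

            #!/usr/bin/env python
            # Rough sketch only: list datanodes sorted by DFS used
            # percentage, parsed from "hadoop dfsadmin -report".
            # Format assumptions as noted above.
            import subprocess

            out = subprocess.check_output(["hadoop", "dfsadmin", "-report"]).decode()

            node, usage = None, []
            for line in out.splitlines():
                line = line.strip()
                if line.startswith("Name:"):
                    node = line.split(":", 1)[1].strip()
                elif line.startswith("DFS Used%:") and node is not None:
                    pct = float(line.split(":", 1)[1].strip().rstrip("%"))
                    usage.append((pct, node))
                    node = None

            for pct, node in sorted(usage, reverse=True):
                print("%6.1f%%  %s" % (pct, node))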

        I am happy to provide additional information if needed.

        Thanks for any help.

        -Tapas





--
Bertrand Dechoux
