Re: disk used percentage is not symmetric on datanodes (balancer)

Harsh J Mon, 18 Mar 2013 21:51:19 -0700

What do you mean when you say the balancer is always active? It is meant
to be run as a tool: it loops until the cluster is balanced for that run,
and then it always exits. The balancer does balance based on usage
percentage, so that is probably what you're looking for/missing.
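
To make "based on usage percentage" concrete, here is a minimal sketch of
the idea, not the actual Balancer code. The function name classify, the
sample capacities, and the 10% threshold are assumptions for illustration
(the threshold is what the balancer's -threshold option expresses), and
the real tool distinguishes more categories than these three.

```python
def classify(nodes, threshold=10.0):
    """Label each node relative to the cluster-wide average utilization.

    nodes: dict mapping node name -> (used_tb, capacity_tb).
    threshold: allowed deviation from the average, in percentage points
               (assumed value, for illustration only).
    Returns (average_pct, {name: label}).
    """
    total_used = sum(used for used, _ in nodes.values())
    total_cap = sum(cap for _, cap in nodes.values())
    avg = 100.0 * total_used / total_cap

    labels = {}
    for name, (used, cap) in nodes.items():
        pct = 100.0 * used / cap
        if pct > avg + threshold:
            labels[name] = "over-utilized"    # candidate source of block moves
        elif pct < avg - threshold:
            labels[name] = "under-utilized"   # candidate target of block moves
        else:
            labels[name] = "balanced"
    return avg, labels


# Hypothetical three-node cluster mixing 12 TB and 72 TB datanodes.
nodes = {
    "small-1": (11.9, 12.0),
    "small-2": (6.0, 12.0),
    "large-1": (21.6, 72.0),
}
avg, labels = classify(nodes)
```

The point of the sketch is that the comparison is on percentages, not on
absolute bytes: a nearly full 12 TB node counts as over-utilized even
though the 72 TB node holds more data.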


On Tue, Mar 19, 2013 at 6:56 AM, Tapas Sarangi <[email protected]> wrote:
> Hi,
>
> On Mar 18, 2013, at 8:21 PM, 李洪忠 <[email protected]> wrote:
>
> Maybe you need to modify the rack-awareness script to balance the racks,
> i.e., make all the racks the same size: one rack of 6 small nodes, one
> rack of 1 large node.
> P.S.
> You need to restart the cluster for the rack-awareness script change to
> take effect.
>
>
> Like I mentioned earlier in my reply to Bertrand, we haven't considered rack
> awareness for the cluster, currently it is considered as just one rack. Can
> that be the problem ? I don't know…
>
> -Tapas
>
>
>
> On 2013/3/19 7:17, Bertrand Dechoux wrote:
>
> And by active, do you mean that it actually stops by itself? Otherwise,
> the throttling/limit might be an issue with regard to the data volume or
> velocity.
>
> What threshold is used?
>
> About the small and big datanodes, how are they distributed with regards to
> racks?
> About the files, what replication factor(s) and block size(s) are used?
>
> Surely trivial questions again.
>
> Bertrand
>
> On Mon, Mar 18, 2013 at 10:46 PM, Tapas Sarangi <[email protected]>
> wrote:
>>
>> Hi,
>>
>> Sorry about that, had it written, but thought it was obvious.
>> Yes, balancer is active and running on the namenode.
>>
>> -Tapas
>>
>> On Mar 18, 2013, at 4:43 PM, Bertrand Dechoux <[email protected]> wrote:
>>
>> Hi,
>>
>> It is not explicitly said but did you use the balancer?
>> http://hadoop.apache.org/docs/r1.0.4/commands_manual.html#balancer
>>
>> Regards
>>
>> Bertrand
>>
>> On Mon, Mar 18, 2013 at 10:01 PM, Tapas Sarangi <[email protected]>
>> wrote:
>>>
>>> Hello,
>>>
>>> I am using one of the old legacy versions (0.20) of Hadoop for our
>>> cluster. We have scheduled an upgrade to a newer version within a
>>> couple of months, but I would like to understand a couple of things before
>>> moving ahead with the upgrade plan.
>>>
>>> We have about 200 datanodes and some of them have larger storage than
>>> others. The storage for the datanodes varies between 12 TB and 72 TB.
>>>
>>> We found that the disk-used percentage is not symmetric through all the
>>> datanodes. For larger storage nodes the percentage of disk-space used is
>>> much lower than that of other nodes with smaller storage space. In larger
>>> storage nodes the percentage of used disk space varies, but on average about
>>> 30-50%. For the smaller storage nodes this number is as high as 99.9%. Is
>>> this expected ? If so, then we are not using a lot of the disk space
>>> effectively. Is this solved in a future release ?
>>>
>>> If not, I would like to know whether there are any checks or debugging
>>> steps one can do to improve this with the current version, or whether
>>> upgrading Hadoop should solve this problem.
>>>
>>> I am happy to provide additional information if needed.
>>>
>>> Thanks for any help.
>>>
>>> -Tapas
>>>
>>
>
>
>
> --
> Bertrand Dechoux
>
>
>



-- 
Harsh J
