Hi,

On Mar 18, 2013, at 8:21 PM, 李洪忠 <[email protected]> wrote:

> Maybe you need to modify the rack-awareness (topology) script to make the 
> racks balanced, i.e. all the racks the same size: one rack with 6 small 
> nodes, one rack with 1 large node. 
> P.S.
> You need to restart the cluster for the topology script modification to 
> take effect.

Like I mentioned earlier in my reply to Bertrand, we haven't considered rack 
awareness for this cluster; currently the whole cluster is treated as a 
single rack. Could that be the problem? I don't know…
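
If it does turn out to matter, my understanding is that rack awareness 
boils down to a small topology script plus pointing the 
topology.script.file.name property in core-site.xml at it. A minimal 
sketch, with made-up IP ranges and rack names just to show the shape of it:

    #!/bin/bash
    # Hadoop invokes the topology script with one or more datanode
    # IPs/hostnames as arguments and expects one rack path per line
    # on stdout, in the same order.
    for node in "$@"; do
      case "$node" in
        192.168.1.*) echo "/rack-small-1" ;;  # hypothetical small-node rack
        192.168.2.*) echo "/rack-large-1" ;;  # hypothetical large-node rack
        *)           echo "/default-rack" ;;
      esac
    done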

-Tapas


>   
> On 2013/3/19 7:17, Bertrand Dechoux wrote:
>> And by "active", do you mean that it actually stops by itself? Otherwise 
>> the throttling/limit might be an issue with regard to the data volume or 
>> velocity.
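>> 
>> (For what it's worth, the balancer's data movement is capped by the 
>> dfs.balance.bandwidthPerSec property, 1 MB/s per datanode by default if 
>> I remember correctly, which can easily be too low for it to ever catch 
>> up on a busy cluster.)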
>> 
>> What threshold is used?
>> 
>> About the small and big datanodes, how are they distributed across the 
>> racks?
>> About files, how are the replication factor(s) and block size(s) used?
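>> 
>> (Something like "hadoop fsck / -files -blocks -racks" would show the 
>> replication factor and block placement per file, including which racks 
>> the replicas land on.)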
>> 
>> Surely trivial questions again.
>> 
>> Bertrand
>> 
>> On Mon, Mar 18, 2013 at 10:46 PM, Tapas Sarangi <[email protected]> 
>> wrote:
>> Hi,
>> 
>> Sorry about that, I had it written down, but thought it was obvious. 
>> Yes, balancer is active and running on the namenode.
>> 
>> -Tapas
>> 
>> On Mar 18, 2013, at 4:43 PM, Bertrand Dechoux <[email protected]> wrote:
>> 
>>> Hi,
>>> 
>>> It is not explicitly said but did you use the balancer?
>>> http://hadoop.apache.org/docs/r1.0.4/commands_manual.html#balancer
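>>> 
>>> A typical invocation looks something like:
>>> 
>>>     hadoop balancer -threshold 10
>>> 
>>> where the threshold is a percentage of capacity (10 is the default): 
>>> the balancer keeps moving blocks until each datanode's utilization is 
>>> within that many percentage points of the cluster average, or until it 
>>> can make no further progress.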
>>> 
>>> Regards
>>> 
>>> Bertrand
>>> 
>>> On Mon, Mar 18, 2013 at 10:01 PM, Tapas Sarangi <[email protected]> 
>>> wrote:
>>> Hello,
>>> 
>>> We are using an old legacy version (0.20) of Hadoop for our cluster. We 
>>> have scheduled an upgrade to a newer version within a couple of months, 
>>> but I would like to understand a couple of things before moving forward 
>>> with the upgrade plan.
>>> 
>>> We have about 200 datanodes, and some of them have larger storage than 
>>> others. The storage per datanode varies between 12 TB and 72 TB.
>>> 
>>> We found that the disk-used percentage is not uniform across the 
>>> datanodes. On nodes with larger storage the percentage of disk space 
>>> used is much lower than on nodes with smaller storage. On the larger 
>>> storage nodes the used percentage varies, but averages about 30-50%. On 
>>> the smaller storage nodes it is as high as 99.9%. Is this expected? If 
>>> so, we are not using a lot of the disk space effectively. Is this 
>>> solved in a future release?
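>>> 
>>> (For reference, the per-datanode usage numbers can be pulled with 
>>> something like:
>>> 
>>>     hadoop dfsadmin -report
>>> 
>>> which prints configured capacity, DFS used and DFS used % for each 
>>> datanode.)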
>>> 
>>> If not, I would like to know whether there are any checks/debugging 
>>> steps one can perform to improve things on the current version, or 
>>> whether upgrading Hadoop should solve this problem.
>>> 
>>> I am happy to provide additional information if needed.
>>> 
>>> Thanks for any help.
>>> 
>>> -Tapas
>>> 
>> 
>> 
>> 
>> 
>> -- 
>> Bertrand Dechoux
> 
