Thanks. Does this need a restart of Hadoop on the nodes where this modification is made?
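For reference, dfs.datanode.du.reserved is a per-datanode setting in hdfs-site.xml; the value below is only an illustrative figure, not a recommendation. As far as I know the datanode only reads it at startup, so only the datanode process on the modified nodes would need a restart, not the whole cluster.

    <!-- hdfs-site.xml on each smaller datanode -->
    <property>
      <name>dfs.datanode.du.reserved</name>
      <!-- bytes reserved per volume for non-HDFS use; 1 TB here is just an example -->
      <value>1099511627776</value>
    </property>

A restart of a single node would then be something along the lines of hadoop-daemon.sh stop datanode followed by hadoop-daemon.sh start datanode on that machine.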
-----
On Mar 24, 2013, at 8:06 PM, Jamal B <[email protected]> wrote:

> dfs.datanode.du.reserved
>
> You could tweak that param on the smaller nodes to "force" the flow of
> blocks to other nodes. A short-term hack at best, but it should help the
> situation a bit.
>
> On Mar 24, 2013 7:09 PM, "Tapas Sarangi" <[email protected]> wrote:
>> On Mar 24, 2013, at 4:34 PM, Jamal B <[email protected]> wrote:
>>> It shouldn't cause further problems since most of your small nodes are
>>> already at their capacity. You could set or increase the dfs reserved
>>> property on your smaller nodes to force the flow of blocks onto the
>>> larger nodes.
>>
>> Thanks. Can you please specify which dfs properties we can set or modify
>> to force the flow of blocks towards the larger nodes rather than the
>> smaller nodes?
>>
>> -----
>>
>> On Mar 24, 2013 4:45 PM, "Tapas Sarangi" <[email protected]> wrote:
>> Hi,
>>
>> Thanks for the idea, I will give this a try and report back.
>>
>> My worry is: if we decommission a small node (one at a time), will it
>> move the data to the larger nodes or choke other smaller nodes? In
>> principle it should distribute the blocks; the point is that it is not
>> distributing the way we expect it to, so do you think this may cause
>> further problems?
>>
>> ---------
>>
>> On Mar 24, 2013, at 3:37 PM, Jamal B <[email protected]> wrote:
>>
>>> Then I think the only way around this would be to decommission the
>>> smaller nodes, one at a time, and ensure that the blocks are moved to
>>> the larger nodes. And once complete, bring the smaller nodes back in,
>>> but maybe only after you tweak the rack topology to match your disk
>>> layout more than your network layout, to compensate for the unbalanced
>>> nodes.
>>>
>>> Just my 2 cents
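As an aside, decommissioning a datanode in this vintage of Hadoop is normally driven by an exclude file plus a dfsadmin call; the node's blocks are re-replicated to the remaining (larger) nodes before it is marked decommissioned. A rough sketch, with a purely hypothetical hostname and exclude-file path:

    # hdfs-site.xml must already point dfs.hosts.exclude at this file
    echo "g12.example.com" >> /etc/hadoop/conf/dfs.exclude

    # make the namenode re-read the include/exclude lists; the node then shows
    # "Decommission in progress" on dfsnodelist.jsp until its blocks are copied off
    hadoop dfsadmin -refreshNodes

Once the node is empty it can be removed from the exclude file and restarted to rejoin the cluster.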
>>> On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi <[email protected]> wrote:
>>> Thanks. We have a 1-1 configuration of drives and folders on all the
>>> datanodes.
>>>
>>> -Tapas
>>>
>>> On Mar 24, 2013, at 3:29 PM, Jamal B <[email protected]> wrote:
>>>
>>>> On both types of nodes, what is your dfs.data.dir set to? Does it
>>>> specify multiple folders on the same set of drives, or is it 1-1
>>>> between folder and drive? If it's set to multiple folders on the same
>>>> drives, it is probably multiplying the amount of "available capacity"
>>>> incorrectly, in that it assumes a 1-1 relationship between folder and
>>>> the total capacity of the drive.
>>>>
>>>> On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi <[email protected]> wrote:
>>>> Yes, thanks for pointing that out, but I already know that it is
>>>> completing the balancing when exiting; otherwise it shouldn't exit.
>>>> Your answer doesn't solve the problem I mentioned earlier in my
>>>> message: 'hdfs' is stalling and hadoop is not writing unless space is
>>>> cleared up from the cluster, even though "df" shows the cluster has
>>>> about 500 TB of free space.
>>>>
>>>> -------
>>>>
>>>> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்)
>>>> <[email protected]> wrote:
>>>>
>>>>> -setBalancerBandwidth <bandwidth in bytes per second>
>>>>>
>>>>> So the value is bytes per second. If it is running and exiting, it
>>>>> means it has completed the balancing.
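For concreteness, the bandwidth cap can be raised either persistently via dfs.balance.bandwidthPerSec in hdfs-site.xml or on the fly with the dfsadmin option quoted above; the figure below (100 MB/s) is only an example, not a recommendation:

    # value is in bytes per second; 104857600 = 100 MB/s
    hadoop dfsadmin -setBalancerBandwidth 104857600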
>>>>> On 24 March 2013 11:32, Tapas Sarangi <[email protected]> wrote:
>>>>> Yes, we are running the balancer, though a balancer process runs for
>>>>> almost a day or more before exiting and starting over. The current
>>>>> dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume that's
>>>>> bytes, so about 2 gigabytes/sec. Shouldn't that be reasonable? If it
>>>>> is in bits then we have a problem. What's the unit for
>>>>> "dfs.balance.bandwidthPerSec"?
>>>>>
>>>>> -----
>>>>>
>>>>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்)
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> Are you running the balancer? If the balancer is running and it is
>>>>>> slow, try increasing the balancer bandwidth.
>>>>>>
>>>>>> On 24 March 2013 09:21, Tapas Sarangi <[email protected]> wrote:
>>>>>> Thanks for the follow-up. I don't know whether an attachment will
>>>>>> pass through this mailing list, but I am attaching a pdf that
>>>>>> contains the usage of all live nodes.
>>>>>>
>>>>>> All nodes whose names start with the letter "g" are the ones with
>>>>>> smaller storage space, whereas nodes starting with the letter "s"
>>>>>> have larger storage space. As you will see, most of the "gXX" nodes
>>>>>> are completely full whereas the "sXX" nodes have a lot of unused
>>>>>> space.
>>>>>>
>>>>>> Recently we have been facing a crisis frequently: 'hdfs' goes into a
>>>>>> mode where it is not able to write any further, even though the
>>>>>> total space available in the cluster is about 500 TB. We believe
>>>>>> this has something to do with the way it is balancing the nodes, but
>>>>>> we don't understand the problem yet. Maybe the attached PDF will help
>>>>>> some of you (experts) see what is going wrong here...
>>>>>>
>>>>>> Thanks
>>>>>> ------
>>>>>>
>>>>>>> The balancer knows about topology, but when it calculates balancing
>>>>>>> it operates only on nodes, not on racks. You can see how it works
>>>>>>> in Balancer.java, in BalancerDatanode, around line 509.
>>>>>>>
>>>>>>> I was wrong about the 350 TB / 35 TB figures; it is calculated this
>>>>>>> way:
>>>>>>>
>>>>>>> For example:
>>>>>>> cluster_capacity = 3.5 PB
>>>>>>> cluster_dfsused = 2 PB
>>>>>>>
>>>>>>> avgutil = cluster_dfsused / cluster_capacity * 100 = 57.14% of the
>>>>>>> cluster capacity used.
>>>>>>> Then we know the average node utilization
>>>>>>> (node_dfsused / node_capacity * 100). The balancer considers
>>>>>>> everything fine if avgutil + 10 > node_utilization >= avgutil - 10.
>>>>>>>
>>>>>>> The ideal case is that every node uses avgutil of its capacity, but
>>>>>>> for a 12 TB node that is only about 6.5 TB and for a 72 TB node it
>>>>>>> is about 40 TB.
>>>>>>>
>>>>>>> The balancer can't help you.
>>>>>>>
>>>>>>> Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE
>>>>>>> if you can.
>>>>>>>
>>>>>>>> In the ideal case with replication factor 2, with two nodes of
>>>>>>>> 12 TB and 72 TB you will be able to have only 12 TB of replicated
>>>>>>>> data.
>>>>>>>
>>>>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB
>>>>>>> and 72 TB, but not true for more than two nodes in the cluster.
>>>>>>>
>>>>>>>> The best way, in my opinion, is to use multiple racks. Nodes in a
>>>>>>>> rack must have identical capacity, and the racks must have
>>>>>>>> identical capacity. For example:
>>>>>>>>
>>>>>>>> rack1: 1 node with 72 TB
>>>>>>>> rack2: 6 nodes with 12 TB
>>>>>>>> rack3: 3 nodes with 24 TB
>>>>>>>>
>>>>>>>> It helps with balancing, because the duplicated block must be on
>>>>>>>> another rack.
>>>>>>>
>>>>>>> The same question I asked earlier in this message: do multiple
>>>>>>> racks with the default threshold for the balancer minimize the
>>>>>>> difference between racks?
>>>>>>>
>>>>>>>> Why did you select HDFS? Maybe Lustre, CephFS, or something else
>>>>>>>> is a better choice.
>>>>>>>
>>>>>>> It wasn't my decision, and I probably can't change it now. I am new
>>>>>>> to this cluster and trying to understand a few issues. I will
>>>>>>> explore other options as you mentioned.
>>>>>>>
>>>>>>> --
>>>>>>> http://balajin.net/blog
>>>>>>> http://flic.kr/balajijegan
>>>>>
>>>>> --
>>>>> http://balajin.net/blog
>>>>> http://flic.kr/balajijegan
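To make the ±10 band above concrete: that 10 is the balancer's threshold parameter (percentage points around the cluster-wide average utilization), and it can be tightened on the command line. The value 5 below is only an example:

    # try to bring every datanode within 5% of the cluster-wide average
    # utilization instead of the default 10%
    hadoop balancer -threshold 5

As noted above, though, a tighter threshold alone may not fix a structural imbalance between small and large nodes.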
