Re: disk used percentage is not symmetric on datanodes (balancer)

Jamal B Sun, 24 Mar 2013 18:06:47 -0700

dfs.datanode.du.reserved

You could tweak that param on the smaller nodes to "force" the flow of
blocks to other nodes. It is a short-term hack at best, but it should help
the situation a bit.
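
As a rough illustration of how that value could be sized: dfs.datanode.du.reserved is given in bytes per volume, so to keep HDFS from filling a volume beyond some percentage you would reserve the remainder. This is only a sketch. The helper names and the 80% cap on a 2 TB volume are made up for the example and are not part of Hadoop, and non-DFS files on the same volume will shift the real numbers.

```python
# Hypothetical helpers, not part of Hadoop: size dfs.datanode.du.reserved
# so that HDFS leaves a fixed share of each volume unused.


def reserved_bytes_for_cap(volume_capacity_bytes: int, max_dfs_percent: int) -> int:
    """Bytes to reserve so DFS data can fill at most max_dfs_percent of a volume."""
    if not 0 <= max_dfs_percent <= 100:
        raise ValueError("max_dfs_percent must be between 0 and 100")
    # Integer arithmetic avoids floating-point rounding on large byte counts.
    return volume_capacity_bytes * (100 - max_dfs_percent) // 100


def render_property(reserved_bytes: int) -> str:
    """Render the value as an hdfs-site.xml <property> entry."""
    return (
        "<property>\n"
        "  <name>dfs.datanode.du.reserved</name>\n"
        f"  <value>{reserved_bytes}</value>\n"
        "</property>"
    )


if __name__ == "__main__":
    # Assumed example: a 2 TB volume on a small node, capped at 80% DFS usage.
    reserved = reserved_bytes_for_cap(2 * 10**12, 80)
    print(render_property(reserved))
```

The printed entry would go into hdfs-site.xml on the smaller nodes only, followed by a datanode restart.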
On Mar 24, 2013 7:09 PM, "Tapas Sarangi" <[email protected]> wrote:


>
> On Mar 24, 2013, at 4:34 PM, Jamal B <[email protected]> wrote:
>
> It shouldn't cause further problems since most of your small nodes are
> already at their capacity.  You could set or increase the dfs reserved
> property on your smaller nodes to force the flow of blocks onto the larger
> nodes.
>
>
> Thanks.  Can you please specify which dfs properties we can set or
> modify to direct the flow of blocks towards the larger nodes rather
> than the smaller nodes?
>
> -----
>
>
>
>
>
>
> On Mar 24, 2013 4:45 PM, "Tapas Sarangi" <[email protected]> wrote:
>
>> Hi,
>>
>> Thanks for the idea, I will give this a try and report back.
>>
>> My worry is: if we decommission a small node (one at a time), will it move
>> the data to the larger nodes or choke other smaller nodes? In principle it
>> should distribute the blocks, but it is not distributing them the way
>> we expect it to, so do you think this may cause further problems?
>>
>> ---------
>>
>> On Mar 24, 2013, at 3:37 PM, Jamal B <[email protected]> wrote:
>>
>> Then I think the only way around this would be to decommission the
>> smaller nodes, one at a time, and ensure that the blocks are moved to the
>> larger nodes.
>>
>> And once complete, bring back in the smaller nodes, but maybe only after
>> you tweak the rack topology to match your disk layout more than network
>> layout to compensate for the unbalanced nodes.
>>
>>
>> Just my 2 cents
>>
>>
>> On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi
>> <[email protected]> wrote:
>>
>>> Thanks. We have a 1-1 configuration of drives and folders in all the
>>> datanodes.
>>>
>>> -Tapas
>>>
>>> On Mar 24, 2013, at 3:29 PM, Jamal B <[email protected]> wrote:
>>>
>>> On both types of nodes, what is your dfs.data.dir set to? Does it
>>> specify multiple folders on the same set of drives, or is it 1-1 between
>>> folder and drive?  If it's set to multiple folders on the same drives, it
>>> is probably multiplying the amount of "available capacity" incorrectly in
>>> that it assumes a 1-1 relationship between folder and total capacity of the
>>> drive.
>>>
>>>
>>> On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi
>>> <[email protected]> wrote:
>>>
>>>> Yes, thanks for pointing that out, but I already know that it has
>>>> completed the balancing when it exits; otherwise it shouldn't exit.
>>>> Your answer doesn't solve the problem I mentioned earlier in my
>>>> message. 'hdfs' is stalling and hadoop is not writing unless space is
>>>> cleared up from the cluster even though "df" shows the cluster has about
>>>> 500 TB of free space.
>>>>
>>>> -------
>>>>
>>>>
>>>> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>>>> [email protected]> wrote:
>>>>
>>>>  -setBalancerBandwidth <bandwidth in bytes per second>
>>>>
>>>> So the value is in bytes per second. If it is running and exiting, it
>>>> means it has completed the balancing.
>>>>
>>>>
>>>> On 24 March 2013 11:32, Tapas Sarangi <[email protected]> wrote:
>>>>
>>>>> Yes, we are running balancer, though a balancer process runs for
>>>>> almost a day or more before exiting and starting over.
>>>>> The current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume
>>>>> that's bytes, so about 2 gigabytes/sec. Shouldn't that be reasonable? If
>>>>> it is in bits then we have a problem.
>>>>> What's the unit for "dfs.balance.bandwidthPerSec" ?
>>>>>
>>>>> -----
>>>>>
>>>>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>>>>> [email protected]> wrote:
>>>>>
>>>>> Are you running the balancer? If the balancer is running and it is slow,
>>>>> try increasing the balancer bandwidth.
>>>>>
>>>>>
>>>>> On 24 March 2013 09:21, Tapas Sarangi <[email protected]> wrote:
>>>>>
>>>>>> Thanks for the follow up. I don't know whether the attachment will pass
>>>>>> through this mailing list, but I am attaching a PDF that contains the
>>>>>> usage of all live nodes.
>>>>>>
>>>>>> All nodes starting with the letter "g" are the ones with smaller storage
>>>>>> space, whereas nodes starting with the letter "s" have larger storage
>>>>>> space. As you will see, most of the "gXX" nodes are completely full
>>>>>> whereas "sXX" nodes have a lot of unused space.
>>>>>>
>>>>>> Recently, we have been facing a crisis frequently, as 'hdfs' goes into a
>>>>>> mode where it is not able to write any further even though the total
>>>>>> space available in the cluster is about 500 TB. We believe this has
>>>>>> something to do with the way it is balancing the nodes, but we don't
>>>>>> understand the problem yet. Maybe the attached PDF will help some of you
>>>>>> (experts) to see what is going wrong here...
>>>>>>
>>>>>> Thanks
>>>>>> ------
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> The balancer knows about topology, but when it calculates balancing it
>>>>>> operates only on nodes, not on racks.
>>>>>> You can see how it works in Balancer.java, in BalancerDatanode, at about
>>>>>> line 509.
>>>>>>
>>>>>> I was wrong about 350 TB and 35 TB; it calculates it this way:
>>>>>>
>>>>>> For example:
>>>>>> cluster_capacity = 3.5 PB
>>>>>> cluster_dfsused = 2 PB
>>>>>>
>>>>>> avgutil = cluster_dfsused / cluster_capacity * 100 = 57.14% of cluster
>>>>>> capacity used.
>>>>>> Then we know the utilization of each node
>>>>>> (node_dfsused / node_capacity * 100). The balancer thinks all is good if
>>>>>> avgutil + 10 > node_utilization >= avgutil - 10.
>>>>>>
>>>>>> In the ideal case every node uses avgutil of its capacity, but for a
>>>>>> 12 TB node that is only about 6.9 TB, and for a 72 TB node it is about
>>>>>> 41 TB.
>>>>>>
>>>>>> The balancer can't help you.
>>>>>>
>>>>>> Show me
>>>>>> http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if
>>>>>> you can.
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>  In the ideal case, with replication factor 2 and two nodes of 12 TB
>>>>>>> and 72 TB, you will be able to store only 12 TB of replicated data.
>>>>>>>
>>>>>>>
>>>>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB
>>>>>>> and 72 TB, but not true for more than two nodes in the cluster.
>>>>>>>
>>>>>>>
>>>>>>> The best way, in my opinion, is to use multiple racks. Nodes in a rack
>>>>>>> must have identical capacity. Racks must have identical capacity.
>>>>>>> For example:
>>>>>>>
>>>>>>> rack1: 1 node with 72 TB
>>>>>>> rack2: 6 nodes with 12 TB
>>>>>>> rack3: 3 nodes with 24 TB
>>>>>>>
>>>>>>> It helps with balancing, because a duplicated block must be on another
>>>>>>> rack.
>>>>>>>
>>>>>>>
>>>>>>> The same question I asked earlier in this message: do multiple
>>>>>>> racks with the default threshold for the balancer minimize the
>>>>>>> difference between racks?
>>>>>>>
>>>>>>> Why did you select HDFS? Maybe Lustre, CephFS, or another file system
>>>>>>> is a better choice.
>>>>>>>
>>>>>>>
>>>>>>> It wasn't my decision, and I probably can't change it now. I am new
>>>>>>> to this cluster and trying to understand a few issues. I will explore
>>>>>>> other options as you mentioned.
>>>>>>>
>>>>>>> --
>>>>>>> http://balajin.net/blog
>>>>>>> http://flic.kr/balajijegan
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> http://balajin.net/blog
>>>> http://flic.kr/balajijegan
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
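
For reference, the balancer arithmetic quoted above can be sketched in a few lines. This only illustrates the figures given in the thread (3.5 PB capacity, 2 PB used, a threshold of 10). The function names are made up, and this is not the logic of Balancer.java itself.

```python
# Illustrative only: function names are hypothetical, figures are the ones
# quoted in the thread (3.5 PB capacity, 2 PB used, threshold of 10).


def avg_utilization(cluster_dfs_used: float, cluster_capacity: float) -> float:
    """Cluster-wide DFS usage as a percentage of total capacity."""
    return cluster_dfs_used / cluster_capacity * 100


def is_within_threshold(node_util: float, avg_util: float, threshold: float = 10.0) -> bool:
    """True if avg_util - threshold <= node_util < avg_util + threshold."""
    return avg_util - threshold <= node_util < avg_util + threshold


def ideal_dfs_used(node_capacity: float, avg_util: float) -> float:
    """DFS usage a node would have if it sat exactly at the cluster average."""
    return node_capacity * avg_util / 100


if __name__ == "__main__":
    avg = avg_utilization(2.0, 3.5)  # both values in PB
    print(f"average utilization: {avg:.2f}%")
    for capacity_tb in (12, 72):
        used = ideal_dfs_used(capacity_tb, avg)
        print(f"{capacity_tb} TB node at the average: {used:.2f} TB used")
    # A completely full small node falls outside the window.
    print("full node within threshold?", is_within_threshold(100.0, avg))
```

On those figures a completely full small node sits far above the window, so the balancer would keep treating it as over-utilized.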
