dfs.datanode.du.reserved

You could tweak that parameter on the smaller nodes to "force" the flow of
blocks to other nodes. A short-term hack at best, but it should help the
situation a bit.

On Mar 24, 2013 7:09 PM, "Tapas Sarangi" <[email protected]> wrote:
>
> On Mar 24, 2013, at 4:34 PM, Jamal B <[email protected]> wrote:
>
> It shouldn't cause further problems since most of your small nodes are
> already at their capacity. You could set or increase the dfs reserved
> property on your smaller nodes to force the flow of blocks onto the larger
> nodes.
>
>
> Thanks. Can you please specify which dfs properties we can set or modify
> to direct the flow of blocks towards the larger nodes rather than the
> smaller nodes?
>
> -----
>
> On Mar 24, 2013 4:45 PM, "Tapas Sarangi" <[email protected]> wrote:
>
>> Hi,
>>
>> Thanks for the idea, I will give this a try and report back.
>>
>> My worry is: if we decommission a small node (one at a time), will it move
>> the data to larger nodes or choke other smaller nodes? In principle it
>> should distribute the blocks; the point is that it is not distributing
>> them the way we expect it to, so do you think this may cause further
>> problems?
>>
>> ---------
>>
>> On Mar 24, 2013, at 3:37 PM, Jamal B <[email protected]> wrote:
>>
>> Then I think the only way around this would be to decommission the
>> smaller nodes, one at a time, and ensure that the blocks are moved to the
>> larger nodes.
>>
>> And once complete, bring back in the smaller nodes, but maybe only after
>> you tweak the rack topology to match your disk layout more than network
>> layout to compensate for the unbalanced nodes.
>>
>>
>> Just my 2 cents
>>
>>
>> On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi
>> <[email protected]> wrote:
>>
>>> Thanks. We have a 1-1 configuration of drives and folders in all the
>>> datanodes.
>>>
>>> -Tapas
>>>
>>> On Mar 24, 2013, at 3:29 PM, Jamal B <[email protected]> wrote:
>>>
>>> On both types of nodes, what is your dfs.data.dir set to? Does it
>>> specify multiple folders on the same set of drives, or is it 1-1 between
>>> folder and drive?
>>> If it's set to multiple folders on the same drives, it is probably
>>> multiplying the amount of "available capacity" incorrectly, in that it
>>> assumes a 1-1 relationship between folder and total capacity of the
>>> drive.
>>>
>>>
>>> On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi
>>> <[email protected]> wrote:
>>>
>>>> Yes, thanks for pointing that out, but I already know that it is
>>>> completing the balancing when exiting; otherwise it shouldn't exit.
>>>> Your answer doesn't solve the problem I mentioned earlier in my
>>>> message. 'hdfs' is stalling and hadoop is not writing unless space is
>>>> cleared up from the cluster, even though "df" shows the cluster has about
>>>> 500 TB of free space.
>>>>
>>>> -------
>>>>
>>>>
>>>> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>>>> [email protected]> wrote:
>>>>
>>>> -setBalancerBandwidth <bandwidth in bytes per second>
>>>>
>>>> So the value is bytes per second. If it is running and exiting, it means
>>>> it has completed the balancing.
>>>>
>>>>
>>>> On 24 March 2013 11:32, Tapas Sarangi <[email protected]> wrote:
>>>>
>>>>> Yes, we are running the balancer, though a balancer process runs for
>>>>> almost a day or more before exiting and starting over.
>>>>> The current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume
>>>>> that's bytes, so about 2 gigabytes/sec. Shouldn't that be reasonable? If
>>>>> it is in bits then we have a problem.
>>>>> What's the unit for "dfs.balance.bandwidthPerSec"?
>>>>>
>>>>> -----
>>>>>
>>>>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>>>>> [email protected]> wrote:
>>>>>
>>>>> Are you running the balancer? If the balancer is running and it is slow,
>>>>> try increasing the balancer bandwidth.
>>>>>
>>>>>
>>>>> On 24 March 2013 09:21, Tapas Sarangi <[email protected]> wrote:
>>>>>
>>>>>> Thanks for the follow up.
>>>>>> I don't know whether the attachment will pass through this mailing
>>>>>> list, but I am attaching a PDF that contains the usage of all live
>>>>>> nodes.
>>>>>>
>>>>>> All nodes starting with letter "g" are the ones with smaller storage
>>>>>> space, whereas nodes starting with letter "s" have larger storage
>>>>>> space. As you will see, most of the "gXX" nodes are completely full
>>>>>> whereas "sXX" nodes have a lot of unused space.
>>>>>>
>>>>>> Recently, we have been facing crises frequently, as 'hdfs' goes into a
>>>>>> mode where it is not able to write any further even though the total
>>>>>> space available in the cluster is about 500 TB. We believe this has
>>>>>> something to do with the way it is balancing the nodes, but don't
>>>>>> understand the problem yet. Maybe the attached PDF will help some of
>>>>>> you (experts) to see what is going wrong here...
>>>>>>
>>>>>> Thanks
>>>>>> ------
>>>>>>
>>>>>> The balancer knows about topology, but when it calculates balancing it
>>>>>> operates only with nodes, not with racks.
>>>>>> You can see how it works in Balancer.java, in BalancerDatanode, around
>>>>>> line 509.
>>>>>>
>>>>>> I was wrong about 350 TB and 35 TB; it calculates it this way:
>>>>>>
>>>>>> For example:
>>>>>> cluster_capacity=3.5PB
>>>>>> cluster_dfsused=2PB
>>>>>>
>>>>>> avgutil=cluster_dfsused/cluster_capacity*100=57.14% used cluster
>>>>>> capacity
>>>>>> Then we know the avg node utilization
>>>>>> (node_dfsused/node_capacity*100). The balancer thinks that all is good
>>>>>> if avgutil+10 > node_utilization >= avgutil-10.
>>>>>>
>>>>>> The ideal case is that every node uses avgutil of its capacity, but for
>>>>>> a 12 TB node that's only 6.5 TB and for a 72 TB node it's about 40 TB.
>>>>>>
>>>>>> The balancer can't help you.
>>>>>>
>>>>>> Show me
>>>>>> http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if
>>>>>> you can.
>>>>>>
>>>>>>> In the ideal case with replication factor 2, with two nodes of 12 TB
>>>>>>> and 72 TB, you will be able to have only 12 TB of replicated data.
>>>>>>>
>>>>>>>
>>>>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB
>>>>>>> and 72 TB, but not true for more than two nodes in the cluster.
>>>>>>>
>>>>>>>
>>>>>>> The best way, in my opinion, is to use multiple racks. Nodes in a rack
>>>>>>> must have identical capacity. Racks must have identical capacity.
>>>>>>> For example:
>>>>>>>
>>>>>>> rack1: 1 node with 72 TB
>>>>>>> rack2: 6 nodes with 12 TB
>>>>>>> rack3: 3 nodes with 24 TB
>>>>>>>
>>>>>>> It helps with balancing, because a duplicated block must be on another
>>>>>>> rack.
>>>>>>>
>>>>>>>
>>>>>>> The same question I asked earlier in this message: do multiple
>>>>>>> racks with the default threshold for the balancer minimize the
>>>>>>> difference between racks?
>>>>>>>
>>>>>>> Why did you select hdfs? Maybe lustre, cephfs, or another is a better
>>>>>>> choice.
>>>>>>>
>>>>>>>
>>>>>>> It wasn't my decision, and I probably can't change it now. I am new
>>>>>>> to this cluster and trying to understand a few issues. I will explore
>>>>>>> other options as you mentioned.
>>>>>>>
>>>>>>> --
>>>>>>> http://balajin.net/blog
>>>>>>> http://flic.kr/balajijegan
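
For anyone following along, the balancer arithmetic quoted in this thread can be sketched in a few lines of Python. This is only an illustration of the rule as described in the message above (avgutil+10 > node_utilization >= avgutil-10), not the actual Balancer.java logic. The function names are made up for this sketch, the 10-point threshold is the default mentioned in the thread, and 1 PB is assumed to be 1000 TB.

```python
# Illustrative sketch of the balancer rule as described in the thread.
# All names here are hypothetical; this is not code from Hadoop.

def average_utilization(cluster_dfs_used, cluster_capacity):
    """Cluster-wide utilization in percent (same units for both arguments)."""
    return cluster_dfs_used / cluster_capacity * 100.0

def is_within_threshold(node_dfs_used, node_capacity, avg_util, threshold=10.0):
    """True if the node falls in [avg_util - threshold, avg_util + threshold)."""
    node_util = node_dfs_used / node_capacity * 100.0
    return avg_util - threshold <= node_util < avg_util + threshold

def ideal_usage(node_capacity, avg_util):
    """Space a node would hold if it sat exactly at the cluster average."""
    return node_capacity * avg_util / 100.0

# Numbers from the thread, in TB (assuming 1 PB = 1000 TB).
avg = average_utilization(2000.0, 3500.0)
print(f"avgutil = {avg:.2f}%")
for capacity in (12.0, 72.0):
    print(f"{capacity:.0f} TB node at avgutil holds {ideal_usage(capacity, avg):.2f} TB")
```

With these inputs a completely full 12 TB node is outside the band while a 72 TB node holding 40 TB is inside it, which matches the picture described above of full "gXX" nodes next to "sXX" nodes with plenty of room.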
