Hi Akash,

You can read about the Balancer here:
https://apache.github.io/hadoop/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Balancer

HADOOP-1652 (https://issues.apache.org/jira/browse/HADOOP-1652) has more details, and the documents attached to that ticket are worth reading.

For the code, a good place to start exploring is:
https://github.com/apache/hadoop/blob/rel/release-3.3.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java#L473-L479
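
To make the pairing idea more concrete, here is a rough Python sketch of the grouping described in the Cloudera page linked in the question. It is not the actual Balancer code. The names (Node, classify, pair) and the greedy budget handling are my own simplifications for illustration, and it ignores racks, storage types, block sizes and bandwidth limits. It assumes the default threshold of 10 percent. The thing to notice is that the goal is to bring every node within the threshold of the cluster average, not exactly onto the average, so only the nodes far from the average need to move much data.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a datanode; units are arbitrary."""
    name: str
    used: float
    capacity: float

    @property
    def utilization(self) -> float:
        # Utilization as a percentage of this node's capacity.
        return 100.0 * self.used / self.capacity


def classify(nodes, threshold=10.0):
    """Split nodes into four groups relative to the cluster average."""
    avg = 100.0 * sum(n.used for n in nodes) / sum(n.capacity for n in nodes)
    groups = {"over": [], "above": [], "below": [], "under": []}
    for n in nodes:
        u = n.utilization
        if u > avg + threshold:
            groups["over"].append(n)     # over-utilized
        elif u > avg:
            groups["above"].append(n)    # above average, within threshold
        elif u >= avg - threshold:
            groups["below"].append(n)    # below average, within threshold
        else:
            groups["under"].append(n)    # under-utilized
    return avg, groups


def pair(nodes, threshold=10.0):
    """Greedy pairing in priority order (a simplification, not HDFS's code)."""
    avg, groups = classify(nodes, threshold)
    # Assumption: each node may send or receive at most its distance
    # from the cluster average, expressed in the same units as `used`.
    budget = {n.name: abs(n.utilization - avg) / 100.0 * n.capacity
              for n in nodes}
    plan = []
    priority = (("over", "under"), ("over", "below"), ("above", "under"))
    for src_group, dst_group in priority:
        for src in groups[src_group]:
            for dst in groups[dst_group]:
                amount = min(budget[src.name], budget[dst.name])
                if amount > 0:
                    plan.append((src.name, dst.name, amount))
                    budget[src.name] -= amount
                    budget[dst.name] -= amount
    return plan


if __name__ == "__main__":
    cluster = [Node("A", 90, 100), Node("B", 10, 100),
               Node("C", 55, 100), Node("D", 45, 100)]
    print(pair(cluster))  # [('A', 'B', 40.0)]
```

In this toy cluster the average is 50%, so only A (90%) and B (10%) fall outside the 10-point band and get paired. C and D are already within the threshold and are left alone.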

-Ayush

On Sun, 5 Nov 2023 at 22:33, Akash Jain <jain...@icloud.com.invalid> wrote:
>
> Hello,
>
> For my project, I am analyzing an algorithm to balance disk usage across
> thousands of storage nodes in different availability zones.
>
> Let’s say:
>
> Availability zone 1
> Disk usage for data of customer 1 is 70%
> Disk usage for data of customer 2 is 10%
>
> Availability zone 2
> Disk usage for data of customer 1 is 30%
> Disk usage for data of customer 2 is 90%
>
> and so forth…
>
> Clearly, in the above example, customer 1's data has much higher data
> locality in AZ1 than in AZ2. Similarly, customer 2's data has more data
> locality in AZ2 than in AZ1.
>
> In an ideal world, the customers' data would look something like this:
>
> Availability zone 1
> Disk usage for data of customer 1 is 50%
> Disk usage for data of customer 2 is 50%
>
> Availability zone 2
> Disk usage for data of customer 1 is 50%
> Disk usage for data of customer 2 is 50%
>
> HDFS Balancer looks related, however I have some questions:
>
> 1. Why does the algorithm try to pair an over-utilized node with an
> under-utilized one instead of having every node hold the average amount
> of data?
> (https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/data-storage/content/step_2__storage_group_pairing.html)
>
> 2. Where can I find more algorithmic details of how the pairing happens?
>
> 3. Is this the only balancing algorithm supported by HDFS?
>
> Thanks