I got the same error after adding -asService to the command line (metrics source already exists). The only difference is that it retries every 5 minutes:
2025-03-09 15:05:04,542 INFO balancer.Balancer: Finished one round, will wait for 5.0 minutes for next round

That does not seem like a good workaround: my cluster has hundreds of TB to rebalance whenever I add a data node, and I don't remember having such issues when I was using Hadoop 2.9.1.

Is there any issue with the balancer in recent Hadoop versions?

Thanks,

Sébastien

On Sun, Mar 9, 2025 at 16:02, Sébastien Rebecchi <srebec...@kameleoon.com> wrote:

> OK, I can try then, hoping it will help.
> By the way, even if it works, it does not explain this metrics exception.
> Any idea how to solve this? I can't find a way to delete that metrics source in
> any Hadoop doc.
>
> Thanks
>
> Sébastien.
>
> On Sun, Mar 9, 2025 at 15:39, Zhanghaobo <hfutzhan...@163.com> wrote:
>
>> Got it, you can run it as a service and see what happens.
>>
>> ---- Replied Message ----
>> From Sébastien Rebecchi <srebec...@kameleoon.com>
>> Date 03/09/2025 22:22
>> To Zhanghaobo <hfutzhan...@163.com>
>> Cc user@hadoop.apache.org, hdfs-...@hadoop.apache.org
>> Subject Re: Can not run HDFS balancer cause metrics already exists
>>
>> Hi Zhanghaobo,
>>
>> Thanks for the message.
>>
>> No, I don't run it as a service. As I said, the command line is the following:
>>
>> hdfs balancer -Ddfs.balancer.movedWinWidth=5400000
>> -Ddfs.balancer.moverThreads=1000 -Ddfs.balancer.dispatcherThreads=200
>> -Ddfs.datanode.balance.max.concurrent.moves=50
>> -Ddfs.datanode.balance.bandwidthPerSec=100m
>> -Ddfs.balancer.max-size-to-move=10737418240 -threshold 1
>>
>> Also, no other balancer is running concurrently on any other node.
>>
>> Sébastien
>>
>> On Sun, Mar 9, 2025 at 13:57, Zhanghaobo <hfutzhan...@163.com> wrote:
>>
>>> Hi, @Sébastien Rebecchi
>>> I don't have more details about how you start the balancer. Did you use
>>> -asService?
>>>
>>>
>>> ---- Replied Message ----
>>> From Sébastien Rebecchi <srebec...@kameleoon.com.INVALID>
>>> Date 3/9/2025 18:03
>>> To <user@hadoop.apache.org>, <hdfs-...@hadoop.apache.org>
>>> Subject Re: Can not run HDFS balancer cause metrics already exists
>>>
>>> Hello
>>>
>>> Could anyone help with this, please?
>>> The situation is still the same after several days.
>>> Here are some more details:
>>> - Hadoop version: 3.4.1
>>> - balancer command line: hdfs balancer
>>> -Ddfs.balancer.movedWinWidth=5400000 -Ddfs.balancer.moverThreads=1000
>>> -Ddfs.balancer.dispatcherThreads=200
>>> -Ddfs.datanode.balance.max.concurrent.moves=50
>>> -Ddfs.datanode.balance.bandwidthPerSec=100m
>>> -Ddfs.balancer.max-size-to-move=10737418240 -threshold 1
>>>
>>> Thank you
>>>
>>>
>>> On Tue, Mar 4, 2025 at 16:59, Sébastien Rebecchi <srebec...@kameleoon.com>
>>> wrote:
>>>
>>>> Hello
>>>>
>>>> After adding a new node to my HDFS cluster, I tried running the
>>>> balancer, but it always fails with the following error, even after retrying
>>>> multiple times during the day, and even after restarting the name node.
>>>> What should I do to unblock this?
>>>>
>>>> Thanks,
>>>>
>>>> Sébastien
>>>>
>>>>
>>>> ERROR balancer.Balancer: Exiting balancer due an exception
>>>> org.apache.hadoop.metrics2.MetricsException: Metrics source
>>>> Balancer-{HERE REPLACE BY CLUSTER'S BLOCK POOL ID} already exists!
>>>> at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
>>>> at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
>>>> at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
>>>> at org.apache.hadoop.hdfs.server.balancer.BalancerMetrics.create(BalancerMetrics.java:52)
>>>> at org.apache.hadoop.hdfs.server.balancer.Balancer.<init>(Balancer.java:362)
>>>> at org.apache.hadoop.hdfs.server.balancer.Balancer.doBalance(Balancer.java:824)
>>>> at org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:868)
>>>> at org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:975)
>>>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
>>>> at org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:1133)
>>>>
>>>