There is no /system/balancer.id file; the /system directory is empty
when the balancer is not running.
When I run the balancer, I see this file being created, but when the
balancer exits the file is deleted properly and /system is empty again.
Each time, the balancer moves only a small amount of data (around 10 GB; see
the example below) and then exits after about 2 minutes with the same
MetricsException.
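
For reference, the 10 GB moved per iteration is the same number as the dfs.balancer.max-size-to-move value passed on the command line (10737418240 bytes), so the small per-iteration volume may simply reflect that cap rather than a separate problem. The Java sketch below only converts the raw values from the log into readable units; the class and method names (BalancerUnits, bytesToGiB, bytesToMiB, millisToMinutes) are hypothetical helpers, not part of Hadoop.

```java
/** Converts raw values printed in the balancer log into readable units. */
final class BalancerUnits {
    static final long MIB = 1024L * 1024L;
    static final long GIB = 1024L * MIB;

    static long bytesToGiB(long bytes) { return bytes / GIB; }
    static long bytesToMiB(long bytes) { return bytes / MIB; }
    static long millisToMinutes(long millis) { return millis / 60_000L; }

    public static void main(String[] args) {
        // Values copied from the balancer log lines in this message.
        System.out.println("dfs.balancer.max-size-to-move = " + bytesToGiB(10737418240L) + " GiB");
        System.out.println("dfs.balancer.getBlocks.size = " + bytesToGiB(2147483648L) + " GiB");
        System.out.println("dfs.datanode.balance.bandwidthPerSec = " + bytesToMiB(104857600L) + " MiB/s");
        System.out.println("dfs.blocksize = " + bytesToMiB(134217728L) + " MiB");
        System.out.println("dfs.balancer.movedWinWidth = " + millisToMinutes(5400000L) + " min");
    }
}
```

Read this way, the logged settings are 10 GiB per move cap, 2 GiB per getBlocks call, 100 MiB/s bandwidth, 128 MiB blocks, and a 90 minute moved-block window.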

2025-03-10 10:25:52,266 INFO balancer.Dispatcher: Total bytes (blocks)
moved in this iteration 10.00 GB (169)
Mar 10, 2025, 10:25:52 AM  0  10.00 GB  25.69 TB  10 GB  169  hdfs://{HIDDEN HOSTNAME HERE}:54310
2025-03-10 10:26:01,270 INFO balancer.Balancer:
dfs.namenode.get-blocks.max-qps = 20 (default=20)
2025-03-10 10:26:01,270 INFO balancer.Balancer: dfs.balancer.movedWinWidth
= 5400000 (default=5400000)
2025-03-10 10:26:01,270 INFO balancer.Balancer: dfs.balancer.moverThreads =
1000 (default=1000)
2025-03-10 10:26:01,271 INFO balancer.Balancer:
dfs.balancer.dispatcherThreads = 200 (default=200)
2025-03-10 10:26:01,271 INFO balancer.Balancer: dfs.balancer.getBlocks.size
= 2147483648 (default=2147483648)
2025-03-10 10:26:01,271 INFO balancer.Balancer:
dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)
2025-03-10 10:26:01,271 INFO balancer.Balancer:
dfs.datanode.balance.max.concurrent.moves = 50 (default=100)
2025-03-10 10:26:01,271 INFO balancer.Balancer:
dfs.datanode.balance.bandwidthPerSec = 104857600 (default=104857600)
2025-03-10 10:26:01,271 INFO balancer.Balancer:
dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
2025-03-10 10:26:01,271 INFO balancer.Balancer: dfs.blocksize = 134217728
(default=134217728)
Mar 10, 2025, 10:26:01 AM Balancing took 1.8892166666666668 minutes
2025-03-10 10:26:01,301 ERROR balancer.Balancer: Exiting balancer due an
exception
org.apache.hadoop.metrics2.MetricsException: Metrics source
Balancer-BP-716662839-{HIDDEN IP HERE}-1737639021855 already exists!
        at
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
        at
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
        at
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
        at
org.apache.hadoop.hdfs.server.balancer.BalancerMetrics.create(BalancerMetrics.java:52)
        at
org.apache.hadoop.hdfs.server.balancer.Balancer.<init>(Balancer.java:362)
        at
org.apache.hadoop.hdfs.server.balancer.Balancer.doBalance(Balancer.java:824)
        at
org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:868)
        at
org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:975)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
        at
org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:1133)
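
One possible reading of the trace above: iteration 0 completes, the configuration is logged again at 10:26:01, and the exception is thrown from the Balancer constructor through BalancerMetrics.create, which would be consistent with the same source name being registered a second time inside one process. The Java sketch below is a toy model of that pattern only, not Hadoop code; ToyMetricsRegistry, ToyBalancerIterations and the "BP-example" id are hypothetical names invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy stand-in for a metrics registry. This is NOT Hadoop's implementation. */
final class ToyMetricsRegistry {
    private final Map<String, Object> sources = new HashMap<>();

    /** Registers a source and rejects a name that is already taken. */
    void register(String name, Object source) {
        if (sources.putIfAbsent(name, source) != null) {
            throw new IllegalStateException("Metrics source " + name + " already exists!");
        }
    }

    /** Frees the name so a later register() with the same name succeeds. */
    void unregister(String name) {
        sources.remove(name);
    }
}

final class ToyBalancerIterations {
    /**
     * Simulates several balancing rounds that each register a source under
     * the same name. Returns how many rounds ran before registration failed.
     */
    static int run(ToyMetricsRegistry registry, String blockPoolId,
                   int iterations, boolean unregisterAfterEach) {
        String name = "Balancer-" + blockPoolId;
        int completed = 0;
        for (int i = 0; i < iterations; i++) {
            try {
                registry.register(name, new Object());
            } catch (IllegalStateException e) {
                return completed; // the second registration of the same name fails
            }
            completed++;
            if (unregisterAfterEach) {
                registry.unregister(name);
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        // Without cleanup only the first round succeeds; with cleanup all three do.
        System.out.println(run(new ToyMetricsRegistry(), "BP-example", 3, false));
        System.out.println(run(new ToyMetricsRegistry(), "BP-example", 3, true));
    }
}
```

In this toy model the run stops after one round unless the name is released between rounds, which matches the observed symptom of one successful iteration followed by the "already exists" error. Whether the real balancer behaves this way in 3.4.1 is an assumption that would need to be checked against the Hadoop source.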

On Mon, Mar 10, 2025 at 00:32, Zhanghaobo <hfutzhan...@163.com> wrote:

> Try deleting /system/balancer.id, and search the NameNode logs for any
> ERROR or WARN entries.
>
> ---- Replied Message ----
> From Sébastien Rebecchi <srebec...@kameleoon.com.INVALID>
> Date 3/9/2025 23:08
> To Zhanghaobo <hfutzhan...@163.com>
> Cc hadoop-user-maillist <user@hadoop.apache.org>,
> hdfs-dev <hdfs-...@hadoop.apache.org>
> Subject Re: Can not run HDFS balancer cause metrics already exists
> I got the same error (metrics already exists) after adding -asService to
> the command line; the only difference is that it retries every 5 minutes:
>
> 2025-03-09 15:05:04,542 INFO balancer.Balancer: Finished one round, will
> wait for 5.0 minutes for next round
>
> That does not seem like a good workaround: my cluster has hundreds of TB
> to rebalance when a data node is added, and I don't remember having such
> issues when I was using Hadoop 2.9.1.
> Is there a known issue with the balancer in recent Hadoop versions?
>
> Thanks,
> Sébastien
>
> On Sun, Mar 9, 2025 at 16:02, Sébastien Rebecchi <srebec...@kameleoon.com>
> wrote:
>
>> OK, I can try that, hoping it will help.
>> By the way, even if it works, it does not explain this metrics exception.
>> Any idea how to solve this? I can't find a way to delete that metrics
>> source in any Hadoop documentation.
>>
>> Thanks
>>
>> Sébastien.
>>
>> On Sun, Mar 9, 2025 at 15:39, Zhanghaobo <hfutzhan...@163.com> wrote:
>>
>>> Got it. You can run it as a service and see what happens.
>>>
>>> ---- Replied Message ----
>>> From Sébastien Rebecchi <srebec...@kameleoon.com>
>>> Date 03/09/2025 22:22
>>> To Zhanghaobo <hfutzhan...@163.com>
>>> Cc user@hadoop.apache.org, hdfs-...@hadoop.apache.org
>>> Subject Re: Can not run HDFS balancer cause metrics already exists
>>> Hi Zhanghaobo,
>>>
>>> Thanks for the message.
>>>
>>> No, I don't run it as a service. As I said, the command line is: hdfs
>>> balancer -Ddfs.balancer.movedWinWidth=5400000
>>> -Ddfs.balancer.moverThreads=1000 -Ddfs.balancer.dispatcherThreads=200
>>> -Ddfs.datanode.balance.max.concurrent.moves=50
>>> -Ddfs.datanode.balance.bandwidthPerSec=100m
>>> -Ddfs.balancer.max-size-to-move=10737418240 -threshold 1
>>>
>>> Also no other balancer is running concurrently on any other node.
>>>
>>> Sébastien
>>>
>>> On Sun, Mar 9, 2025 at 13:57, Zhanghaobo <hfutzhan...@163.com> wrote:
>>>
>>>>
>>>> Hi, @Sébastien Rebecchi
>>>> I don't have many details about how you start the balancer. Did you use
>>>> -asService?
>>>>
>>>>
>>>> ---- Replied Message ----
>>>> From Sébastien Rebecchi <srebec...@kameleoon.com.INVALID>
>>>> Date 3/9/2025 18:03
>>>> To <user@hadoop.apache.org>, <hdfs-...@hadoop.apache.org>
>>>> Subject Re: Can not run HDFS balancer cause metrics already exists
>>>> Hello
>>>>
>>>> Could anyone help with this, please?
>>>> The situation is still the same after several days.
>>>> Some additional details:
>>>> - Hadoop version: 3.4.1
>>>> - balancer command line: hdfs balancer
>>>> -Ddfs.balancer.movedWinWidth=5400000 -Ddfs.balancer.moverThreads=1000
>>>> -Ddfs.balancer.dispatcherThreads=200
>>>> -Ddfs.datanode.balance.max.concurrent.moves=50
>>>> -Ddfs.datanode.balance.bandwidthPerSec=100m
>>>> -Ddfs.balancer.max-size-to-move=10737418240 -threshold 1
>>>>
>>>> Thank you
>>>>
>>>>
>>>> On Tue, Mar 4, 2025, 16:59, Sébastien Rebecchi <srebec...@kameleoon.com>
>>>> wrote:
>>>>
>>>>> Hello
>>>>>
>>>>> After adding a new node to my HDFS cluster, I tried running the
>>>>> balancer, but it always fails with the following error, even after
>>>>> retrying multiple times during the day, and even after restarting the
>>>>> NameNode.
>>>>> What should I do to unblock it?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Sébastien
>>>>>
>>>>>
>>>>> ERROR balancer.Balancer: Exiting balancer due an exception
>>>>> org.apache.hadoop.metrics2.MetricsException: Metrics source
>>>>> Balancer-{HERE REPLACE BY CLUSTER'S BLOCK POOL ID} already exists!
>>>>>         at
>>>>> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
>>>>>         at
>>>>> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
>>>>>         at
>>>>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.server.balancer.BalancerMetrics.create(BalancerMetrics.java:52)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.server.balancer.Balancer.<init>(Balancer.java:362)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.server.balancer.Balancer.doBalance(Balancer.java:824)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:868)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:975)
>>>>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:1133)
>>>>>
>>>>
