yunjiong zhao created HDFS-11377:
------------------------------------

             Summary: Balancer hung due to "No mover threads available"
                 Key: HDFS-11377
                 URL: https://issues.apache.org/jira/browse/HDFS-11377
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 2.7.3
            Reporter: yunjiong zhao
            Assignee: yunjiong zhao


When running the balancer on a large cluster with more than 3000 Datanodes, it 
may hang due to "No mover threads available".
The stack trace shows the main thread waiting indefinitely, as below.
{code}
"main" #1 prio=5 os_prio=0 tid=0x00007ff6cc014800 nid=0x6b2c waiting on 
condition [0x00007ff6d1bad000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hdfs.server.balancer.Dispatcher.waitForMoveCompletion(Dispatcher.java:1043)
        at org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchBlockMoves(Dispatcher.java:1017)
        at org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchAndCheckContinue(Dispatcher.java:981)
        at org.apache.hadoop.hdfs.server.balancer.Balancer.runOneIteration(Balancer.java:611)
        at org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:663)
        at org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:776)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:905)
{code}

The log contains many WARN messages about "No mover threads available".
{quote}
2017-01-26 15:36:40,085 WARN org.apache.hadoop.hdfs.server.balancer.Dispatcher: No mover threads available: skip moving blk_13700554102_1112815018180 with size=268435456 from 10.115.67.137:50010:DISK to 10.140.21.55:50010:DISK through 10.115.67.137:50010
2017-01-26 15:36:40,085 WARN org.apache.hadoop.hdfs.server.balancer.Dispatcher: No mover threads available: skip moving blk_4009558842_1103118359883 with size=268435456 from 10.115.67.137:50010:DISK to 10.140.21.55:50010:DISK through 10.115.67.137:50010
2017-01-26 15:36:40,085 WARN org.apache.hadoop.hdfs.server.balancer.Dispatcher: No mover threads available: skip moving blk_13881956058_1112996460026 with size=133509566 from 10.115.67.137:50010:DISK to 10.140.21.55:50010:DISK through 10.115.67.36:50010
{quote}

What happens here is that, when no mover threads are available, 
DDatanode.isPendingQEmpty() keeps returning false, so 
Dispatcher.waitForMoveCompletion() never returns and the Balancer hangs.
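
To make the failure mode concrete, here is a minimal, self-contained sketch. It is not the actual Dispatcher code: the class MoverHangSketch, the PendingQueue type, and the cleanUpOnSkip flag are all hypothetical names introduced for illustration. It assumes that a move is registered as pending before a mover thread is requested, and that a skipped move is never withdrawn. The cleanUpOnSkip branch shows one conceivable remedy and is not necessarily how this issue should be or was fixed.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical model of the hang; none of these names come from Hadoop.
class MoverHangSketch {

    /** Stand-in for a datanode's queue of scheduled-but-unfinished moves. */
    static final class PendingQueue {
        private final Queue<String> pending = new ArrayDeque<>();
        void add(String block) { pending.add(block); }
        void remove(String block) { pending.remove(block); }
        boolean isEmpty() { return pending.isEmpty(); }
    }

    /**
     * Registers one move, then polls the queue at most maxPolls times.
     * moverAvailable=false models the "No mover threads available" case.
     * cleanUpOnSkip=true withdraws a skipped move from the queue (assumed remedy).
     * Returns true if the wait loop would finish, false if it would still be waiting.
     */
    public static boolean waitCompletes(boolean moverAvailable,
                                        boolean cleanUpOnSkip,
                                        int maxPolls) {
        PendingQueue queue = new PendingQueue();
        String block = "blk_1";
        queue.add(block);            // the move is registered as pending

        if (moverAvailable) {
            queue.remove(block);     // a mover ran and completed the move
        } else if (cleanUpOnSkip) {
            queue.remove(block);     // the skipped move is withdrawn
        }                            // otherwise: skipped, but still pending

        // Bounded stand-in for the unbounded sleep-and-poll wait loop.
        for (int i = 0; i < maxPolls; i++) {
            if (queue.isEmpty()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(waitCompletes(true, false, 5));   // true
        System.out.println(waitCompletes(false, false, 5));  // false
        System.out.println(waitCompletes(false, true, 5));   // true
    }
}
```

In the second call the queue never drains, which corresponds to the main thread sleeping in waitForMoveCompletion in the stack trace above. The real loop has no poll limit, so it would wait forever.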



