[jira] [Updated] (HDFS-11377) Balancer hung due to no available mover threads
[ https://issues.apache.org/jira/browse/HDFS-11377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HDFS-11377:
---------------------------------------
    Target Version/s: 2.9.0, 2.7.4, 3.0.0-alpha3  (was: 2.9.0, 3.0.0-alpha3)
       Fix Version/s: 2.8.2
                      2.7.4

Merged this into branch-2.8 and branch-2.7. Changing fix version.

> Balancer hung due to no available mover threads
> -----------------------------------------------
>
>                 Key: HDFS-11377
>                 URL: https://issues.apache.org/jira/browse/HDFS-11377
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer & mover
>    Affects Versions: 2.7.3
>            Reporter: yunjiong zhao
>            Assignee: yunjiong zhao
>             Fix For: 2.9.0, 2.7.4, 3.0.0-alpha3, 2.8.2
>
>         Attachments: HDFS-11377.001.patch, HDFS-11377.002.patch
>
>
> When running the balancer on a large cluster with more than 3000 Datanodes, it may hang due to "No mover threads available".
> The stack trace shows the main thread waiting forever, as below.
> {code}
> "main" #1 prio=5 os_prio=0 tid=0x7ff6cc014800 nid=0x6b2c waiting on condition [0x7ff6d1bad000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>         at java.lang.Thread.sleep(Native Method)
>         at org.apache.hadoop.hdfs.server.balancer.Dispatcher.waitForMoveCompletion(Dispatcher.java:1043)
>         at org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchBlockMoves(Dispatcher.java:1017)
>         at org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchAndCheckContinue(Dispatcher.java:981)
>         at org.apache.hadoop.hdfs.server.balancer.Balancer.runOneIteration(Balancer.java:611)
>         at org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:663)
>         at org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:776)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>         at org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:905)
> {code}
> The log contains many WARN messages about "No mover threads available".
> {quote}
> 2017-01-26 15:36:40,085 WARN org.apache.hadoop.hdfs.server.balancer.Dispatcher: No mover threads available: skip moving blk_13700554102_1112815018180 with size=268435456 from 10.115.67.137:50010:DISK to 10.140.21.55:50010:DISK through 10.115.67.137:50010
> 2017-01-26 15:36:40,085 WARN org.apache.hadoop.hdfs.server.balancer.Dispatcher: No mover threads available: skip moving blk_4009558842_1103118359883 with size=268435456 from 10.115.67.137:50010:DISK to 10.140.21.55:50010:DISK through 10.115.67.137:50010
> 2017-01-26 15:36:40,085 WARN org.apache.hadoop.hdfs.server.balancer.Dispatcher: No mover threads available: skip moving blk_13881956058_1112996460026 with size=133509566 from 10.115.67.137:50010:DISK to 10.140.21.55:50010:DISK through 10.115.67.36:50010
> {quote}
> What happens here is that, when no mover threads are available, DDatanode.isPendingQEmpty() keeps returning false, so the Balancer hangs.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
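The failure mode in the description can be modelled in a few lines of Java. This is only a sketch under stated assumptions, not the real Dispatcher code: `DDatanodeModel`, `skipMove`, the `removeOnSkip` flag and the bounded poll count are hypothetical stand-ins, and the only names taken from the issue are `isPendingQEmpty`, `removePendingBlock` and `waitForMoveCompletion`. It assumes a skipped move stays registered as pending unless something removes it, which is consistent with the description but is not quoted from the patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model of the hang; not the real Dispatcher code. */
class PendingQueueHangSketch {

  /** Stand-in for a per-datanode tracker of scheduled (pending) block moves. */
  static class DDatanodeModel {
    private final List<String> pendings = new ArrayList<>();

    synchronized void addPendingBlock(String block) {
      pendings.add(block);
    }

    synchronized boolean removePendingBlock(String block) {
      return pendings.remove(block);
    }

    synchronized boolean isPendingQEmpty() {
      return pendings.isEmpty();
    }
  }

  /**
   * Models the "No mover threads available: skip moving ..." path.
   * The removeOnSkip flag is an assumption used to compare both behaviours.
   */
  static void skipMove(DDatanodeModel target, String block, boolean removeOnSkip) {
    if (removeOnSkip) {
      target.removePendingBlock(block);
    }
    // Otherwise the skipped move is left in the pending queue.
  }

  /**
   * Bounded version of the wait loop so the sketch terminates.
   * Returns the number of polls before the queue drained, or -1 if it never did.
   */
  static int waitForMoveCompletion(DDatanodeModel target, int maxPolls) {
    for (int i = 0; i < maxPolls; i++) {
      if (target.isPendingQEmpty()) {
        return i;
      }
    }
    return -1;
  }

  static int simulate(boolean removeOnSkip) {
    DDatanodeModel target = new DDatanodeModel();
    target.addPendingBlock("blk_1");
    skipMove(target, "blk_1", removeOnSkip);
    return waitForMoveCompletion(target, 5);
  }

  public static void main(String[] args) {
    System.out.println("skip without removal: " + simulate(false)); // -1, never drains
    System.out.println("skip with removal:    " + simulate(true));  // 0, drains at once
  }
}
```

In the real Balancer the wait loop has no poll limit, so the first case corresponds to the main thread sleeping and re-checking indefinitely, as in the stack trace.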
[jira] [Updated] (HDFS-11377) Balancer hung due to no available mover threads
[ https://issues.apache.org/jira/browse/HDFS-11377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yiqun Lin updated HDFS-11377:
-----------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

The remove operation should be safe since the method {{removePendingBlock}} is {{synchronized}}. The failed test is not related.

Committed to trunk and branch-2. Thanks [~zhaoyunjiong] for the contribution and thanks [~manojg] for the review!

> Balancer hung due to no available mover threads
> -----------------------------------------------
>
>                 Key: HDFS-11377
>                 URL: https://issues.apache.org/jira/browse/HDFS-11377
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer & mover
>    Affects Versions: 2.7.3
>            Reporter: yunjiong zhao
>            Assignee: yunjiong zhao
>             Fix For: 2.9.0, 3.0.0-alpha3
>
>         Attachments: HDFS-11377.001.patch, HDFS-11377.002.patch
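The point about {{removePendingBlock}} being {{synchronized}} can be illustrated with a small, self-contained Java sketch. Everything here except the method name `removePendingBlock` is hypothetical (the `PendingTracker` class, the integer move ids, the thread and move counts); it only shows that concurrent removals through a synchronized method leave a plain `ArrayList` in a consistent state, not how the actual patch is written.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration of removing pending moves from several threads. */
class SynchronizedRemoveSketch {

  /** Stand-in for the per-datanode pending list; all access is synchronized. */
  static class PendingTracker {
    private final List<Integer> pendings = new ArrayList<>();

    synchronized void add(int id) {
      pendings.add(id);
    }

    synchronized boolean removePendingBlock(int id) {
      // Integer.valueOf selects remove(Object) rather than remove(int index).
      return pendings.remove(Integer.valueOf(id));
    }

    synchronized int size() {
      return pendings.size();
    }
  }

  /** Registers `moves` pending ids, removes each from a thread pool, returns what is left. */
  static int drainConcurrently(int moves, int threads) {
    PendingTracker tracker = new PendingTracker();
    for (int i = 0; i < moves; i++) {
      tracker.add(i);
    }
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int i = 0; i < moves; i++) {
      final int id = i;
      pool.execute(() -> tracker.removePendingBlock(id));
    }
    pool.shutdown();
    try {
      if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
        throw new IllegalStateException("removals did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return tracker.size();
  }

  public static void main(String[] args) {
    System.out.println("remaining pending moves: " + drainConcurrently(1000, 8));
  }
}
```

Because every mutation goes through the tracker's monitor, each id is removed exactly once and the list ends up empty regardless of how the removals interleave.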
[jira] [Updated] (HDFS-11377) Balancer hung due to no available mover threads
[ https://issues.apache.org/jira/browse/HDFS-11377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yiqun Lin updated HDFS-11377:
-----------------------------
     Hadoop Flags: Reviewed
    Fix Version/s: 3.0.0-alpha3
                   2.9.0

> Balancer hung due to no available mover threads
> -----------------------------------------------
>
>                 Key: HDFS-11377
>                 URL: https://issues.apache.org/jira/browse/HDFS-11377
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer & mover
>    Affects Versions: 2.7.3
>            Reporter: yunjiong zhao
>            Assignee: yunjiong zhao
>             Fix For: 2.9.0, 3.0.0-alpha3
>
>         Attachments: HDFS-11377.001.patch, HDFS-11377.002.patch
[jira] [Updated] (HDFS-11377) Balancer hung due to no available mover threads
[ https://issues.apache.org/jira/browse/HDFS-11377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

yunjiong zhao updated HDFS-11377:
---------------------------------
    Attachment: HDFS-11377.002.patch

Removed the unused variable MAX_NO_PENDING_MOVE_ITERATIONS.
Thanks [~linyiqun] for your time.

> Balancer hung due to no available mover threads
> -----------------------------------------------
>
>                 Key: HDFS-11377
>                 URL: https://issues.apache.org/jira/browse/HDFS-11377
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer & mover
>    Affects Versions: 2.7.3
>            Reporter: yunjiong zhao
>            Assignee: yunjiong zhao
>         Attachments: HDFS-11377.001.patch, HDFS-11377.002.patch
[jira] [Updated] (HDFS-11377) Balancer hung due to no available mover threads
[ https://issues.apache.org/jira/browse/HDFS-11377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yiqun Lin updated HDFS-11377:
-----------------------------
    Target Version/s: 2.9.0, 3.0.0-alpha3
         Component/s: balancer & mover
             Summary: Balancer hung due to no available mover threads  (was: Balancer hung due to "No mover threads available")

> Balancer hung due to no available mover threads
> -----------------------------------------------
>
>                 Key: HDFS-11377
>                 URL: https://issues.apache.org/jira/browse/HDFS-11377
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer & mover
>    Affects Versions: 2.7.3
>            Reporter: yunjiong zhao
>            Assignee: yunjiong zhao
>         Attachments: HDFS-11377.001.patch