[jira] [Updated] (HDFS-11377) Balancer hung due to no available mover threads

2017-05-22 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-11377:
---
Target Version/s: 2.9.0, 2.7.4, 3.0.0-alpha3  (was: 2.9.0, 3.0.0-alpha3)
   Fix Version/s: 2.8.2
  2.7.4

Merged this into branch-2.8 and branch-2.7. Changing fix version.

> Balancer hung due to no available mover threads
> ---
>
> Key: HDFS-11377
> URL: https://issues.apache.org/jira/browse/HDFS-11377
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: balancer & mover
> Affects Versions: 2.7.3
> Reporter: yunjiong zhao
> Assignee: yunjiong zhao
> Fix For: 2.9.0, 2.7.4, 3.0.0-alpha3, 2.8.2
>
> Attachments: HDFS-11377.001.patch, HDFS-11377.002.patch
>
>
> When the balancer runs on a large cluster with more than 3000 DataNodes,
> it can hang due to "No mover threads available".
> The stack trace shows the main thread waiting forever, as below.
> {code}
> "main" #1 prio=5 os_prio=0 tid=0x7ff6cc014800 nid=0x6b2c waiting on condition [0x7ff6d1bad000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>         at java.lang.Thread.sleep(Native Method)
>         at org.apache.hadoop.hdfs.server.balancer.Dispatcher.waitForMoveCompletion(Dispatcher.java:1043)
>         at org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchBlockMoves(Dispatcher.java:1017)
>         at org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchAndCheckContinue(Dispatcher.java:981)
>         at org.apache.hadoop.hdfs.server.balancer.Balancer.runOneIteration(Balancer.java:611)
>         at org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:663)
>         at org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:776)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>         at org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:905)
> {code}
> The log contains many WARN messages about "No mover threads available".
> {quote}
> 2017-01-26 15:36:40,085 WARN org.apache.hadoop.hdfs.server.balancer.Dispatcher: No mover threads available: skip moving blk_13700554102_1112815018180 with size=268435456 from 10.115.67.137:50010:DISK to 10.140.21.55:50010:DISK through 10.115.67.137:50010
> 2017-01-26 15:36:40,085 WARN org.apache.hadoop.hdfs.server.balancer.Dispatcher: No mover threads available: skip moving blk_4009558842_1103118359883 with size=268435456 from 10.115.67.137:50010:DISK to 10.140.21.55:50010:DISK through 10.115.67.137:50010
> 2017-01-26 15:36:40,085 WARN org.apache.hadoop.hdfs.server.balancer.Dispatcher: No mover threads available: skip moving blk_13881956058_1112996460026 with size=133509566 from 10.115.67.137:50010:DISK to 10.140.21.55:50010:DISK through 10.115.67.36:50010
> {quote}
> What happens here is that when no mover threads are available,
> DDatanode.isPendingQEmpty() keeps returning false, so the Balancer hangs.
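
The hang described above can be illustrated with a small, self-contained model. This is only a sketch under assumptions, not the actual Dispatcher code: the class `PendingMoveSketch`, its nested `Target` class, and the `threadAvailable` and `cleanUpOnSkip` parameters are hypothetical names made up for illustration. It assumes that a skipped move leaves its entry in the target's pending queue, and that clearing the entry on the skip path (as the later mention of {{removePendingBlock}} in this thread suggests) lets the wait loop finish. The real wait loop sleeps without a bound; the sketch caps the number of polls so that it terminates.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Simplified, hypothetical model of the hang; not the real Dispatcher. */
class PendingMoveSketch {

    /** Stand-in for a target datanode's queue of pending block moves. */
    static class Target {
        private final Queue<String> pending = new ArrayDeque<>();

        synchronized void addPendingBlock(String blk) { pending.add(blk); }

        synchronized boolean removePendingBlock(String blk) { return pending.remove(blk); }

        synchronized boolean isPendingQEmpty() { return pending.isEmpty(); }
    }

    /**
     * Registers one move on the target.
     * threadAvailable models whether the mover pool accepted the task;
     * cleanUpOnSkip toggles the assumed cleanup on the skip path.
     */
    static void scheduleMove(Target t, String blk,
                             boolean threadAvailable, boolean cleanUpOnSkip) {
        t.addPendingBlock(blk);
        if (threadAvailable) {
            // A mover thread would copy the block, then clear the entry.
            t.removePendingBlock(blk);
        } else if (cleanUpOnSkip) {
            // "No mover threads available": clear the entry ourselves.
            t.removePendingBlock(blk);
        }
        // Otherwise the entry stays in the queue and nothing ever removes it.
    }

    /** Polls at most maxPolls times; returns true if the queue drained. */
    static boolean waitForMoveCompletion(Target t, int maxPolls) {
        for (int i = 0; i < maxPolls; i++) {
            if (t.isPendingQEmpty()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Target leaky = new Target();
        scheduleMove(leaky, "blk_1", false, false);
        System.out.println("without cleanup drained: " + waitForMoveCompletion(leaky, 3));

        Target fixed = new Target();
        scheduleMove(fixed, "blk_1", false, true);
        System.out.println("with cleanup drained: " + waitForMoveCompletion(fixed, 3));
    }
}
```

In this model the first case never drains, which corresponds to the main thread sleeping in waitForMoveCompletion indefinitely, while the second case drains on the first poll.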



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11377) Balancer hung due to no available mover threads

2017-02-05 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-11377:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

The remove operation should be safe since the method {{removePendingBlock}} is
{{synchronized}}. The failed test is unrelated. Committed to trunk and
branch-2. Thanks [~zhaoyunjiong] for the contribution and thanks [~manojg] for
the review!
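
For readers unfamiliar with why a {{synchronized}} method makes the removal safe, here is a minimal sketch. It is not the Hadoop code: the class `SynchronizedRemoveSketch`, the nested `PendingList`, and the helper `addThenRemoveConcurrently` are hypothetical, and a plain `ArrayList` stands in for whatever collection the real class uses. Several threads add and remove entries at the same time, and because every access goes through a synchronized method, the list ends up empty with no lost updates.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of a synchronized remove; not the Hadoop class. */
class SynchronizedRemoveSketch {

    /** A pending list whose every access is guarded by the object's monitor. */
    static class PendingList {
        private final List<String> pending = new ArrayList<>();

        synchronized void add(String blk) { pending.add(blk); }

        synchronized boolean removePendingBlock(String blk) { return pending.remove(blk); }

        synchronized int size() { return pending.size(); }
    }

    /** Runs several threads that each add and then remove their own entries. */
    static int addThenRemoveConcurrently(int threads, int perThread) {
        PendingList list = new PendingList();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    String blk = "blk_" + id + "_" + i;
                    list.add(blk);
                    list.removePendingBlock(blk);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
        // Every add was paired with a remove under the same lock.
        return list.size();
    }

    public static void main(String[] args) {
        System.out.println("remaining entries: " + addThenRemoveConcurrently(4, 1000));
    }
}
```

Without the synchronized keyword, concurrent structural changes to the `ArrayList` would be a data race and the final size would not be reliable.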




[jira] [Updated] (HDFS-11377) Balancer hung due to no available mover threads

2017-02-05 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-11377:
-
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0-alpha3
   2.9.0




[jira] [Updated] (HDFS-11377) Balancer hung due to no available mover threads

2017-02-01 Thread yunjiong zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yunjiong zhao updated HDFS-11377:
-
Attachment: HDFS-11377.002.patch

Removed the unused variable MAX_NO_PENDING_MOVE_ITERATIONS.
Thanks [~linyiqun] for your time.




[jira] [Updated] (HDFS-11377) Balancer hung due to no available mover threads

2017-01-31 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-11377:
-
Target Version/s: 2.9.0, 3.0.0-alpha3
 Component/s: balancer & mover
 Summary: Balancer hung due to no available mover threads  (was: Balancer hung due to "No mover threads available")
