[jira] [Updated] (HDFS-13174) hdfs mover -p /path times out after 20 min

2020-10-24 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-13174:
---
Fix Version/s: 2.10.2

> hdfs mover -p /path times out after 20 min
> --
>
> Key: HDFS-13174
> URL: https://issues.apache.org/jira/browse/HDFS-13174
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.8.0, 2.7.4, 3.0.0-alpha2
>Reporter: István Fajth
>Assignee: István Fajth
>Priority: Major
> Fix For: 3.2.0, 3.1.1, 3.0.4, 2.10.2
>
> Attachments: HDFS-13174.001.patch, HDFS-13174.002.patch, 
> HDFS-13174.003.patch, HDFS-13174.004.patch, HDFS-13174.005.patch
>
>
> HDFS-11015 introduced an iteration timeout in the Dispatcher.Source class, 
> which is checked while dispatching the moves that the Balancer and the 
> Mover perform. This timeout is hardwired to 20 minutes.
> The Balancer works in iterations, and even if an iteration times out, the 
> Balancer keeps running and does another iteration; it fails only if no 
> moves happened in a few iterations.
> The Mover, on the other hand, does not have iterations, so if moving a path 
> runs for more than 20 minutes, and there are moves decided and enqueued 
> between two DataNodes, then after 20 minutes the Mover stops with the 
> following exception reported to the console (line numbers might differ, as 
> this exception came from a CDH5.12.1 installation).
>  java.io.IOException: Block move timed out
>      at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.receiveResponse(Dispatcher.java:382)
>      at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:328)
>      at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2500(Dispatcher.java:186)
>      at org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:956)
>      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>      at java.lang.Thread.run(Thread.java:748)
>  
> Note that this issue does not come up if all blocks can be moved within the 
> DataNodes, without having to move a block to another DataNode.
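
The difference described above between a caller that iterates and a caller that makes a single pass can be sketched as follows. This is a simplified illustration, not the actual Dispatcher code: the class, the method names, and the abstract "ticks" standing in for the 20-minute limit are all hypothetical, and a move that misses the deadline is simply left in the queue instead of raising the IOException shown in the stack trace.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class IterationTimeoutSketch {
    // Hypothetical stand-in for the hardwired limit, in abstract ticks (one tick per move).
    static final long MAX_ITERATION_TICKS = 20;

    // Dispatches queued moves until the queue is empty or the per-pass deadline is reached.
    // Returns the number of moves completed in this pass.
    static int dispatch(Deque<String> pending, long maxTicks) {
        int done = 0;
        long elapsed = 0;
        while (!pending.isEmpty() && elapsed < maxTicks) {
            pending.poll();
            elapsed++;
            done++;
        }
        return done;
    }

    // Balancer-like caller: repeats passes and gives up only after several passes without progress.
    static int runWithIterations(Deque<String> pending, int maxIdlePasses) {
        int total = 0;
        int idle = 0;
        while (!pending.isEmpty() && idle < maxIdlePasses) {
            int moved = dispatch(pending, MAX_ITERATION_TICKS);
            total += moved;
            idle = (moved == 0) ? idle + 1 : 0;
        }
        return total;
    }

    // Mover-like caller: a single pass, so whatever remains after the deadline is never moved.
    static int runSinglePass(Deque<String> pending) {
        return dispatch(pending, MAX_ITERATION_TICKS);
    }

    // Builds a queue of n placeholder moves.
    static Deque<String> queueOf(int n) {
        Deque<String> q = new ArrayDeque<>();
        for (int i = 0; i < n; i++) {
            q.add("move-" + i);
        }
        return q;
    }

    public static void main(String[] args) {
        System.out.println("with iterations: " + runWithIterations(queueOf(50), 5));
        System.out.println("single pass:     " + runSinglePass(queueOf(50)));
    }
}
```

With 50 queued moves and a limit of 20 per pass, the iterating caller finishes all of them over three passes, while the single-pass caller stops after the first 20.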



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13174) hdfs mover -p /path times out after 20 min

2018-06-15 Thread Wei-Chiu Chuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-13174:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks Istvan for the contribution!

> hdfs mover -p /path times out after 20 min
> --
>
> Key: HDFS-13174
> URL: https://issues.apache.org/jira/browse/HDFS-13174
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.8.0, 2.7.4, 3.0.0-alpha2
>Reporter: Istvan Fajth
>Assignee: Istvan Fajth
>Priority: Major
> Fix For: 3.2.0, 3.1.1, 3.0.4
>
> Attachments: HDFS-13174.001.patch, HDFS-13174.002.patch, 
> HDFS-13174.003.patch, HDFS-13174.004.patch, HDFS-13174.005.patch



[jira] [Updated] (HDFS-13174) hdfs mover -p /path times out after 20 min

2018-06-15 Thread Wei-Chiu Chuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-13174:
---
Fix Version/s: 3.0.4
   3.1.1
   3.2.0

> hdfs mover -p /path times out after 20 min
> --
>
> Key: HDFS-13174
> URL: https://issues.apache.org/jira/browse/HDFS-13174
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.8.0, 2.7.4, 3.0.0-alpha2
>Reporter: Istvan Fajth
>Assignee: Istvan Fajth
>Priority: Major
> Fix For: 3.2.0, 3.1.1, 3.0.4
>
> Attachments: HDFS-13174.001.patch, HDFS-13174.002.patch, 
> HDFS-13174.003.patch, HDFS-13174.004.patch, HDFS-13174.005.patch



[jira] [Updated] (HDFS-13174) hdfs mover -p /path times out after 20 min

2018-06-14 Thread Istvan Fajth (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Fajth updated HDFS-13174:

Attachment: HDFS-13174.005.patch

> hdfs mover -p /path times out after 20 min
> --
>
> Key: HDFS-13174
> URL: https://issues.apache.org/jira/browse/HDFS-13174
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.8.0, 2.7.4, 3.0.0-alpha2
>Reporter: Istvan Fajth
>Assignee: Istvan Fajth
>Priority: Major
> Attachments: HDFS-13174.001.patch, HDFS-13174.002.patch, 
> HDFS-13174.003.patch, HDFS-13174.004.patch, HDFS-13174.005.patch



[jira] [Updated] (HDFS-13174) hdfs mover -p /path times out after 20 min

2018-06-11 Thread Istvan Fajth (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Fajth updated HDFS-13174:

Attachment: HDFS-13174.004.patch

> hdfs mover -p /path times out after 20 min
> --
>
> Key: HDFS-13174
> URL: https://issues.apache.org/jira/browse/HDFS-13174
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.8.0, 2.7.4, 3.0.0-alpha2
>Reporter: Istvan Fajth
>Assignee: Istvan Fajth
>Priority: Major
> Attachments: HDFS-13174.001.patch, HDFS-13174.002.patch, 
> HDFS-13174.003.patch, HDFS-13174.004.patch



[jira] [Updated] (HDFS-13174) hdfs mover -p /path times out after 20 min

2018-06-08 Thread Istvan Fajth (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Fajth updated HDFS-13174:

Status: Patch Available  (was: Open)

Added a new patch after a unit test failure revealed that I had been too lazy 
to run the test even once. Running it would have shown that, having changed the 
number of DataNodes, I should have changed the blocks' replication factor as well.

> hdfs mover -p /path times out after 20 min
> --
>
> Key: HDFS-13174
> URL: https://issues.apache.org/jira/browse/HDFS-13174
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 3.0.0-alpha2, 2.7.4, 2.8.0
>Reporter: Istvan Fajth
>Assignee: Istvan Fajth
>Priority: Major
> Attachments: HDFS-13174.001.patch, HDFS-13174.002.patch, 
> HDFS-13174.003.patch



[jira] [Updated] (HDFS-13174) hdfs mover -p /path times out after 20 min

2018-06-08 Thread Istvan Fajth (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Fajth updated HDFS-13174:

Attachment: HDFS-13174.003.patch

> hdfs mover -p /path times out after 20 min
> --
>
> Key: HDFS-13174
> URL: https://issues.apache.org/jira/browse/HDFS-13174
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.8.0, 2.7.4, 3.0.0-alpha2
>Reporter: Istvan Fajth
>Assignee: Istvan Fajth
>Priority: Major
> Attachments: HDFS-13174.001.patch, HDFS-13174.002.patch, 
> HDFS-13174.003.patch



[jira] [Updated] (HDFS-13174) hdfs mover -p /path times out after 20 min

2018-06-08 Thread Istvan Fajth (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Fajth updated HDFS-13174:

Status: Open  (was: Patch Available)

> hdfs mover -p /path times out after 20 min
> --
>
> Key: HDFS-13174
> URL: https://issues.apache.org/jira/browse/HDFS-13174
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 3.0.0-alpha2, 2.7.4, 2.8.0
>Reporter: Istvan Fajth
>Assignee: Istvan Fajth
>Priority: Major
> Attachments: HDFS-13174.001.patch, HDFS-13174.002.patch



[jira] [Updated] (HDFS-13174) hdfs mover -p /path times out after 20 min

2018-06-08 Thread Istvan Fajth (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Fajth updated HDFS-13174:

Status: Patch Available  (was: Open)

Hello [~jojochuang],

thank you very much for the review. I have attached a new version of the patch 
that addresses the code-related issues you found.

Let me address the questions you raised as well:
 - about ignoring the general timeout:
 The main problem with MAX_ITERATION_TIME for the Mover is that the Mover does 
not have iterations, so if we want an overall timeout for the Mover, we can 
either introduce iterations for the Mover as well, or just ignore this timeout. 
The patch does the latter, which I think is sufficient because other timeouts 
still apply. The problem appears when the move is between nodes; for that case 
we have a connection timeout set to HdfsConstants.READ_TIMEOUT (60 sec), and 5 
times the HdfsConstants.READ_TIMEOUT value set as the general socket timeout, 
so after 5 minutes we abandon a move to a DataNode that does not respond. (See 
Dispatcher.java lines 365-367 and 373 after applying my patch.) DataNodes should 
respond twice within the time set via 
HdfsClientConfigKeys.DFS_CLIENT_SOCKET_TIMEOUT_KEY (default 60 seconds), so I 
think this is a reasonable amount of time for a Mover thread to stay stuck on a 
failing DataNode. These delays are the ones that fail the iteration in the 
Balancer; when that happens, the Balancer cleans up all the work already 
scheduled for the given iteration and starts a new one. As the Mover does not 
have iterations, the MAX_ITERATION_TIME check that I have removed failed the 
Mover in the same scenario.

 - about the flakiness of the test:
 I just added a note to the test in the current patch. After starting up the 
cluster, the following happens between the two time checks: the blocks are read 
for the DNs, the Balancer decides to move two blocks, and it schedules the two 
block moves. This is quite a few operations. The Balancer should fail after 
running for 2 seconds, so at the 3rd heartbeat, at the 3-second mark. That 
leaves us 500ms for the scheduling and for getting the result from the DN. In my 
environment the total runtime detected is under 3100ms, so I felt safe leaving 
500ms for slower or busier environments, but if needed we can either remove 
this assertion or increase the time to be more on the safe side. I am against 
removing the time check, as that would leave us not testing the timeout at all, 
only that the Balancer stopped the iteration in a state where there were still 
moves in progress.

Let me know your thoughts on the approach I took, and please also check the new 
patch. Thank you!
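
To make the two remaining timeouts concrete, here is a minimal sketch using plain java.net.Socket. It is not the code from the patch: the class name, the constant names, and the helper method are hypothetical, and the 60-second value is only assumed from the figure quoted above for HdfsConstants.READ_TIMEOUT.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class MoveSocketTimeouts {
    // Assumption: mirrors the 60-second value quoted for HdfsConstants.READ_TIMEOUT.
    static final int READ_TIMEOUT_MS = 60 * 1000;
    // The general socket timeout is described as five times that value, i.e. 5 minutes.
    static final int SOCKET_TIMEOUT_MS = 5 * READ_TIMEOUT_MS;

    // Opens a socket to the target, bounding both the connect and every later blocking read.
    static Socket openMoveSocket(InetSocketAddress target) throws IOException {
        Socket sock = new Socket();
        try {
            // Fails with SocketTimeoutException if no connection is made within 60 seconds.
            sock.connect(target, READ_TIMEOUT_MS);
            // Any read that blocks for more than 5 minutes throws SocketTimeoutException.
            sock.setSoTimeout(SOCKET_TIMEOUT_MS);
            return sock;
        } catch (IOException e) {
            sock.close();
            throw e;
        }
    }

    public static void main(String[] args) {
        System.out.println("connect timeout: " + READ_TIMEOUT_MS / 1000 + " s");
        System.out.println("read timeout:    " + SOCKET_TIMEOUT_MS / 1000 + " s");
    }
}
```

Because every blocking read is bounded this way, a move to an unresponsive DataNode is abandoned on its own after 5 minutes, which is why an additional overall limit is not strictly needed for a caller without iterations.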

> hdfs mover -p /path times out after 20 min
> --
>
> Key: HDFS-13174
> URL: https://issues.apache.org/jira/browse/HDFS-13174
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 3.0.0-alpha2, 2.7.4, 2.8.0
>Reporter: Istvan Fajth
>Assignee: Istvan Fajth
>Priority: Major
> Attachments: HDFS-13174.001.patch, HDFS-13174.002.patch

[jira] [Updated] (HDFS-13174) hdfs mover -p /path times out after 20 min

2018-06-08 Thread Istvan Fajth (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Fajth updated HDFS-13174:

Attachment: HDFS-13174.002.patch

> hdfs mover -p /path times out after 20 min
> --
>
> Key: HDFS-13174
> URL: https://issues.apache.org/jira/browse/HDFS-13174
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.8.0, 2.7.4, 3.0.0-alpha2
>Reporter: Istvan Fajth
>Assignee: Istvan Fajth
>Priority: Major
> Attachments: HDFS-13174.001.patch, HDFS-13174.002.patch



[jira] [Updated] (HDFS-13174) hdfs mover -p /path times out after 20 min

2018-06-08 Thread Istvan Fajth (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Fajth updated HDFS-13174:

Attachment: (was: HDFS-13174.002.patch)

> hdfs mover -p /path times out after 20 min
> --
>
> Key: HDFS-13174
> URL: https://issues.apache.org/jira/browse/HDFS-13174
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.8.0, 2.7.4, 3.0.0-alpha2
>Reporter: Istvan Fajth
>Assignee: Istvan Fajth
>Priority: Major
> Attachments: HDFS-13174.001.patch



[jira] [Updated] (HDFS-13174) hdfs mover -p /path times out after 20 min

2018-06-08 Thread Istvan Fajth (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Fajth updated HDFS-13174:

Status: Open  (was: Patch Available)

> hdfs mover -p /path times out after 20 min
> --
>
> Key: HDFS-13174
> URL: https://issues.apache.org/jira/browse/HDFS-13174
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 3.0.0-alpha2, 2.7.4, 2.8.0
>Reporter: Istvan Fajth
>Assignee: Istvan Fajth
>Priority: Major
> Attachments: HDFS-13174.001.patch, HDFS-13174.002.patch
>
>
> In HDFS-11015 there is an iteration timeout introduced in Dispatcher.Source 
> class, that is checked during dispatching the moves that the Balancer and the 
> Mover does. This timeout is hardwired to 20 minutes.
> In the Balancer we have iterations, and even if an iteration is timing out 
> the Balancer runs further and does an other iteration before it fails if 
> there were no moves happened in a few iterations.
> The Mover on the other hand does not have iterations, so if moving a path 
> runs for more than 20 minutes, and there are moves decided and enqueued 
> between two DataNode, after 20 minutes Mover will stop with the following 
> exception reported to the console (lines might differ as this exception came 
> from a CDH5.12.1 installation).
>  java.io.IOException: Block move timed out
>  at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.receiveResponse(Dispatcher.java:382)
>  at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:328)
>  at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2500(Dispatcher.java:186)
>  at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:956)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  
> Note that this issue is not coming up if all blocks can be moved inside the 
> DataNodes without having to move the block to an other DataNode.
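
The difference between the two tools described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the Hadoop Dispatcher code: the class name, method names, and the simulated clock are hypothetical, and each move is assumed to take a fixed amount of time. It shows why a per-iteration limit is harmless for a tool that retries in iterations but fatal for a tool that makes a single pass.

```java
import java.io.IOException;

// Hypothetical simulation; none of these names come from Hadoop.
final class IterationTimeoutSketch {
    // 20 minutes, the hardwired limit mentioned in the issue description.
    static final long MAX_ITERATION_MS = 20L * 60 * 1000;

    // True once the simulated elapsed time exceeds the limit.
    static boolean isIterationOver(long elapsedMs, long maxIterationMs) {
        return elapsedMs > maxIterationMs;
    }

    // Mover-like behaviour: one pass, so hitting the limit fails the run.
    static int moverStyle(int pendingMoves, long moveDurationMs, long maxIterationMs)
            throws IOException {
        long elapsedMs = 0;
        int completed = 0;
        while (completed < pendingMoves) {
            if (isIterationOver(elapsedMs, maxIterationMs)) {
                throw new IOException("Block move timed out");
            }
            elapsedMs += moveDurationMs; // simulate the time one move takes
            completed++;
        }
        return completed;
    }

    // Balancer-like behaviour: a timed-out iteration is followed by a new one.
    // Stopping after a single iteration without moves is a simplification.
    static int balancerStyle(int pendingMoves, long moveDurationMs, long maxIterationMs) {
        int iterations = 0;
        while (pendingMoves > 0) {
            long elapsedMs = 0; // the clock restarts with every iteration
            int movedThisIteration = 0;
            while (pendingMoves > 0 && !isIterationOver(elapsedMs, maxIterationMs)) {
                elapsedMs += moveDurationMs;
                pendingMoves--;
                movedThisIteration++;
            }
            iterations++;
            if (movedThisIteration == 0) {
                break; // no progress, give up
            }
        }
        return iterations;
    }

    public static void main(String[] args) {
        long fiveMinutes = 5L * 60 * 1000;
        System.out.println("balancer iterations: "
                + balancerStyle(10, fiveMinutes, MAX_ITERATION_MS));
        try {
            moverStyle(10, fiveMinutes, MAX_ITERATION_MS);
        } catch (IOException e) {
            System.out.println("mover failed: " + e.getMessage());
        }
    }
}
```

With ten simulated moves of five minutes each, the Balancer-like loop finishes in two iterations, while the Mover-like single pass throws once the 20-minute limit has passed.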






[jira] [Updated] (HDFS-13174) hdfs mover -p /path times out after 20 min

2018-06-08 Thread Istvan Fajth (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Fajth updated HDFS-13174:

Attachment: HDFS-13174.002.patch

> hdfs mover -p /path times out after 20 min
> --
>
> Key: HDFS-13174
> URL: https://issues.apache.org/jira/browse/HDFS-13174
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: balancer & mover
> Affects Versions: 2.8.0, 2.7.4, 3.0.0-alpha2
> Reporter: Istvan Fajth
> Assignee: Istvan Fajth
> Priority: Major
> Attachments: HDFS-13174.001.patch, HDFS-13174.002.patch
>
>
> HDFS-11015 introduced an iteration timeout in the Dispatcher.Source class,
> which is checked while dispatching the moves that the Balancer and the Mover
> perform. This timeout is hardwired to 20 minutes.
> The Balancer works in iterations: even if an iteration times out, the
> Balancer keeps running and starts another iteration, and it fails only if
> no moves happened in a few iterations.
> The Mover, on the other hand, does not have iterations, so if moving a path
> runs for more than 20 minutes and there are moves decided and enqueued
> between two DataNodes, the Mover stops after 20 minutes with the following
> exception reported to the console (line numbers might differ, as this
> exception came from a CDH5.12.1 installation).
>  java.io.IOException: Block move timed out
>  at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.receiveResponse(Dispatcher.java:382)
>  at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:328)
>  at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2500(Dispatcher.java:186)
>  at org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:956)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>
> Note that this issue does not come up if all blocks can be moved inside the
> DataNodes without having to move a block to another DataNode.






[jira] [Updated] (HDFS-13174) hdfs mover -p /path times out after 20 min

2018-06-02 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HDFS-13174:
--
Target Version/s: 2.7.8  (was: 2.7.7)

> hdfs mover -p /path times out after 20 min
> --
>
> Key: HDFS-13174
> URL: https://issues.apache.org/jira/browse/HDFS-13174
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: balancer & mover
> Affects Versions: 2.8.0, 2.7.4, 3.0.0-alpha2
> Reporter: Istvan Fajth
> Assignee: Istvan Fajth
> Priority: Major
> Attachments: HDFS-13174.001.patch
>
>
> HDFS-11015 introduced an iteration timeout in the Dispatcher.Source class,
> which is checked while dispatching the moves that the Balancer and the Mover
> perform. This timeout is hardwired to 20 minutes.
> The Balancer works in iterations: even if an iteration times out, the
> Balancer keeps running and starts another iteration, and it fails only if
> no moves happened in a few iterations.
> The Mover, on the other hand, does not have iterations, so if moving a path
> runs for more than 20 minutes and there are moves decided and enqueued
> between two DataNodes, the Mover stops after 20 minutes with the following
> exception reported to the console (line numbers might differ, as this
> exception came from a CDH5.12.1 installation).
>  java.io.IOException: Block move timed out
>  at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.receiveResponse(Dispatcher.java:382)
>  at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:328)
>  at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2500(Dispatcher.java:186)
>  at org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:956)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>
> Note that this issue does not come up if all blocks can be moved inside the
> DataNodes without having to move a block to another DataNode.






[jira] [Updated] (HDFS-13174) hdfs mover -p /path times out after 20 min

2018-05-02 Thread Istvan Fajth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Fajth updated HDFS-13174:

Release Note: 
The Mover could fail after 20+ minutes if a block move between two DataNodes 
was enqueued for that long, because of an internal constant that was 
introduced for the Balancer but affected the Mover as well.
After the patch, the internal constant can be configured with the 
dfs.balancer.max-iteration-time parameter and affects only the Balancer. The 
default is 20 minutes.
  Status: Patch Available  (was: Open)
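
The release note above names a configurable parameter with a 20-minute default. A minimal sketch of reading such a setting with a fallback is shown here. It deliberately uses plain java.util.Properties rather than Hadoop's own configuration classes, so the class and method names are hypothetical, and treating the value as milliseconds is an assumption not stated in the release note.

```java
import java.util.Properties;

// Hypothetical helper; it does not use or mirror Hadoop's configuration API.
final class BalancerTimeoutConfigSketch {
    // Property name as given in the release note.
    static final String MAX_ITERATION_TIME_KEY = "dfs.balancer.max-iteration-time";
    // 20 minutes, the default from the release note; milliseconds are assumed.
    static final long MAX_ITERATION_TIME_DEFAULT_MS = 20L * 60 * 1000;

    // Returns the configured limit, or the default when the key is unset or blank.
    static long maxIterationTimeMs(Properties conf) {
        String raw = conf.getProperty(MAX_ITERATION_TIME_KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return MAX_ITERATION_TIME_DEFAULT_MS;
        }
        return Long.parseLong(raw.trim());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(maxIterationTimeMs(conf)); // falls back to the default
        conf.setProperty(MAX_ITERATION_TIME_KEY, "3600000"); // one hour
        System.out.println(maxIterationTimeMs(conf));
    }
}
```

According to the release note, the setting affects only the Balancer after the patch, so raising it would not change how long the Mover runs.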

> hdfs mover -p /path times out after 20 min
> --
>
> Key: HDFS-13174
> URL: https://issues.apache.org/jira/browse/HDFS-13174
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: balancer & mover
> Affects Versions: 3.0.0-alpha2, 2.7.4, 2.8.0
> Reporter: Istvan Fajth
> Assignee: Istvan Fajth
> Priority: Major
> Attachments: HDFS-13174.001.patch
>
>
> HDFS-11015 introduced an iteration timeout in the Dispatcher.Source class,
> which is checked while dispatching the moves that the Balancer and the Mover
> perform. This timeout is hardwired to 20 minutes.
> The Balancer works in iterations: even if an iteration times out, the
> Balancer keeps running and starts another iteration, and it fails only if
> no moves happened in a few iterations.
> The Mover, on the other hand, does not have iterations, so if moving a path
> runs for more than 20 minutes and there are moves decided and enqueued
> between two DataNodes, the Mover stops after 20 minutes with the following
> exception reported to the console (line numbers might differ, as this
> exception came from a CDH5.12.1 installation).
>  java.io.IOException: Block move timed out
>  at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.receiveResponse(Dispatcher.java:382)
>  at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:328)
>  at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2500(Dispatcher.java:186)
>  at org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:956)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>
> Note that this issue does not come up if all blocks can be moved inside the
> DataNodes without having to move a block to another DataNode.






[jira] [Updated] (HDFS-13174) hdfs mover -p /path times out after 20 min

2018-05-02 Thread Istvan Fajth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Fajth updated HDFS-13174:

Description: 
HDFS-11015 introduced an iteration timeout in the Dispatcher.Source class, 
which is checked while dispatching the moves that the Balancer and the Mover 
perform. This timeout is hardwired to 20 minutes.

The Balancer works in iterations: even if an iteration times out, the Balancer 
keeps running and starts another iteration, and it fails only if no moves 
happened in a few iterations.

The Mover, on the other hand, does not have iterations, so if moving a path 
runs for more than 20 minutes and there are moves decided and enqueued between 
two DataNodes, the Mover stops after 20 minutes with the following exception 
reported to the console (line numbers might differ, as this exception came 
from a CDH5.12.1 installation).
 java.io.IOException: Block move timed out
 at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.receiveResponse(Dispatcher.java:382)
 at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:328)
 at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2500(Dispatcher.java:186)
 at org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:956)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)

Note that this issue does not come up if all blocks can be moved inside the 
DataNodes without having to move a block to another DataNode.

  was:
HDFS-11015 introduced an iteration timeout in the Dispatcher.Source class, 
which is checked while dispatching the moves that the Balancer and the Mover 
perform. This timeout is hardwired to 20 minutes.

The Balancer works in iterations: even if an iteration times out, the Balancer 
keeps running and starts another iteration, and it fails only if no moves 
happened in a few iterations.

The Mover, on the other hand, does not have iterations, so if moving a path 
runs for more than 20 minutes, the Mover stops after 20 minutes with the 
following exception reported to the console (line numbers might differ, as 
this exception came from a CDH5.12.1 installation):
java.io.IOException: Block move timed out
at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.receiveResponse(Dispatcher.java:382)
at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:328)
at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2500(Dispatcher.java:186)
at org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:956)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)


> hdfs mover -p /path times out after 20 min
> --
>
> Key: HDFS-13174
> URL: https://issues.apache.org/jira/browse/HDFS-13174
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: balancer & mover
> Affects Versions: 2.8.0, 2.7.4, 3.0.0-alpha2
> Reporter: Istvan Fajth
> Assignee: Istvan Fajth
> Priority: Major
> Attachments: HDFS-13174.001.patch
>
>
> HDFS-11015 introduced an iteration timeout in the Dispatcher.Source class,
> which is checked while dispatching the moves that the Balancer and the Mover
> perform. This timeout is hardwired to 20 minutes.
> The Balancer works in iterations: even if an iteration times out, the
> Balancer keeps running and starts another iteration, and it fails only if
> no moves happened in a few iterations.
> The Mover, on the other hand, does not have iterations, so if moving a path
> runs for more than 20 minutes and there are moves decided and enqueued
> between two DataNodes, the Mover stops after 20 minutes with the following
> exception reported to the console (line numbers might differ, as this
> exception came from a CDH5.12.1 installation).
>  java.io.IOException: Block move timed out
>  at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.receiveResponse(Dispatcher.java:382)
>  at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:328)
>  at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2500(Dispatcher.java:186)
>  at org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:956)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at 

[jira] [Updated] (HDFS-13174) hdfs mover -p /path times out after 20 min

2018-05-02 Thread Istvan Fajth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Fajth updated HDFS-13174:

Attachment: HDFS-13174.001.patch

> hdfs mover -p /path times out after 20 min
> --
>
> Key: HDFS-13174
> URL: https://issues.apache.org/jira/browse/HDFS-13174
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: balancer & mover
> Affects Versions: 2.8.0, 2.7.4, 3.0.0-alpha2
> Reporter: Istvan Fajth
> Assignee: Istvan Fajth
> Priority: Major
> Attachments: HDFS-13174.001.patch
>
>
> HDFS-11015 introduced an iteration timeout in the Dispatcher.Source class,
> which is checked while dispatching the moves that the Balancer and the Mover
> perform. This timeout is hardwired to 20 minutes.
> The Balancer works in iterations: even if an iteration times out, the
> Balancer keeps running and starts another iteration, and it fails only if
> no moves happened in a few iterations.
> The Mover, on the other hand, does not have iterations, so if moving a path
> runs for more than 20 minutes, the Mover stops after 20 minutes with the
> following exception reported to the console (line numbers might differ, as
> this exception came from a CDH5.12.1 installation):
> java.io.IOException: Block move timed out
> at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.receiveResponse(Dispatcher.java:382)
> at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:328)
> at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2500(Dispatcher.java:186)
> at org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:956)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)






[jira] [Updated] (HDFS-13174) hdfs mover -p /path times out after 20 min

2018-04-02 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-13174:
-
Target Version/s: 3.0.1, 2.8.4, 2.7.6, 2.9.2  (was: 2.9.1, 3.0.1, 2.8.4, 
2.7.6)

> hdfs mover -p /path times out after 20 min
> --
>
> Key: HDFS-13174
> URL: https://issues.apache.org/jira/browse/HDFS-13174
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: balancer & mover
> Affects Versions: 2.8.0, 2.7.4, 3.0.0-alpha2
> Reporter: Istvan Fajth
> Assignee: Istvan Fajth
> Priority: Major
>
> HDFS-11015 introduced an iteration timeout in the Dispatcher.Source class,
> which is checked while dispatching the moves that the Balancer and the Mover
> perform. This timeout is hardwired to 20 minutes.
> The Balancer works in iterations: even if an iteration times out, the
> Balancer keeps running and starts another iteration, and it fails only if
> no moves happened in a few iterations.
> The Mover, on the other hand, does not have iterations, so if moving a path
> runs for more than 20 minutes, the Mover stops after 20 minutes with the
> following exception reported to the console (line numbers might differ, as
> this exception came from a CDH5.12.1 installation):
> java.io.IOException: Block move timed out
> at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.receiveResponse(Dispatcher.java:382)
> at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:328)
> at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2500(Dispatcher.java:186)
> at org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:956)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)






[jira] [Updated] (HDFS-13174) hdfs mover -p /path times out after 20 min

2018-03-20 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13174:
--
Target Version/s: 2.9.1, 3.0.1, 2.8.4, 2.7.6  (was: 3.1.0, 2.9.1, 3.0.1, 
2.8.4, 2.7.6)

> hdfs mover -p /path times out after 20 min
> --
>
> Key: HDFS-13174
> URL: https://issues.apache.org/jira/browse/HDFS-13174
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: balancer & mover
> Affects Versions: 2.8.0, 2.7.4, 3.0.0-alpha2
> Reporter: Istvan Fajth
> Assignee: Istvan Fajth
> Priority: Major
>
> HDFS-11015 introduced an iteration timeout in the Dispatcher.Source class,
> which is checked while dispatching the moves that the Balancer and the Mover
> perform. This timeout is hardwired to 20 minutes.
> The Balancer works in iterations: even if an iteration times out, the
> Balancer keeps running and starts another iteration, and it fails only if
> no moves happened in a few iterations.
> The Mover, on the other hand, does not have iterations, so if moving a path
> runs for more than 20 minutes, the Mover stops after 20 minutes with the
> following exception reported to the console (line numbers might differ, as
> this exception came from a CDH5.12.1 installation):
> java.io.IOException: Block move timed out
> at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.receiveResponse(Dispatcher.java:382)
> at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:328)
> at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2500(Dispatcher.java:186)
> at org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:956)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)






[jira] [Updated] (HDFS-13174) hdfs mover -p /path times out after 20 min

2018-03-19 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated HDFS-13174:
--
Target Version/s: 3.1.0, 2.9.1, 3.0.1, 2.8.4, 2.7.6
   Fix Version/s: (was: 2.7.6)
  (was: 2.8.4)
  (was: 3.0.1)
  (was: 3.1.0)

> hdfs mover -p /path times out after 20 min
> --
>
> Key: HDFS-13174
> URL: https://issues.apache.org/jira/browse/HDFS-13174
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: balancer & mover
> Affects Versions: 2.8.0, 2.7.4, 3.0.0-alpha2
> Reporter: Istvan Fajth
> Assignee: Istvan Fajth
> Priority: Major
>
> HDFS-11015 introduced an iteration timeout in the Dispatcher.Source class,
> which is checked while dispatching the moves that the Balancer and the Mover
> perform. This timeout is hardwired to 20 minutes.
> The Balancer works in iterations: even if an iteration times out, the
> Balancer keeps running and starts another iteration, and it fails only if
> no moves happened in a few iterations.
> The Mover, on the other hand, does not have iterations, so if moving a path
> runs for more than 20 minutes, the Mover stops after 20 minutes with the
> following exception reported to the console (line numbers might differ, as
> this exception came from a CDH5.12.1 installation):
> java.io.IOException: Block move timed out
> at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.receiveResponse(Dispatcher.java:382)
> at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:328)
> at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2500(Dispatcher.java:186)
> at org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:956)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)


