[jira] [Commented] (HDFS-13174) hdfs mover -p /path times out after 20 min
[ https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17220203#comment-17220203 ] Konstantin Shvachko commented on HDFS-13174:

This actually fixes a pretty bad bug, which makes balancing very slow on a large cluster. The problem is that while the Balancer cancels unfinished moves after the 20-minute iteration time, the Dispatcher keeps scheduling new moves. The canceling thread eventually wins, but the race can go on for a long time. I have seen iterations lasting from 1 up to 10 hours because canceling cannot finish. The problem is present in all versions starting from 2.7.4. I will backport it to branch 2.10 if there are no objections. There are two easy-to-fix conflicts. I ran all Balancer and Mover unit tests and tested the fix on a production cluster.

> hdfs mover -p /path times out after 20 min
> --
>
> Key: HDFS-13174
> URL: https://issues.apache.org/jira/browse/HDFS-13174
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: balancer mover
> Affects Versions: 2.8.0, 2.7.4, 3.0.0-alpha2
> Reporter: István Fajth
> Assignee: István Fajth
> Priority: Major
> Fix For: 3.2.0, 3.1.1, 3.0.4
>
> Attachments: HDFS-13174.001.patch, HDFS-13174.002.patch,
> HDFS-13174.003.patch, HDFS-13174.004.patch, HDFS-13174.005.patch
>
>
> HDFS-11015 introduced an iteration timeout in the Dispatcher.Source
> class, which is checked while dispatching the moves that the Balancer and the
> Mover perform. This timeout is hardwired to 20 minutes.
> In the Balancer we have iterations, and even if an iteration times out,
> the Balancer continues and runs another iteration; it only fails if
> no moves happened in a few iterations.
> The Mover, on the other hand, does not have iterations, so if moving a path
> runs for more than 20 minutes, and there are moves decided and enqueued
> between two DataNodes, after 20 minutes the Mover will stop with the following
> exception reported to the console (line numbers might differ, as this exception
> came from a CDH5.12.1 installation).
> java.io.IOException: Block move timed out
> at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.receiveResponse(Dispatcher.java:382)
> at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:328)
> at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2500(Dispatcher.java:186)
> at org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:956)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
>
> Note that this issue does not come up if all blocks can be moved within the
> DataNodes without having to move a block to another DataNode.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
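The per-iteration timeout and the cancel/schedule race described above can be pictured with a small Java sketch. It assumes a hypothetical `IterationDeadline` class (not Hadoop's `Dispatcher` API) driven by a fake clock, and it checks the deadline before scheduling each new move, which is one way to keep new work from racing against cancellation. Whether the committed patch does exactly this is not confirmed here; the 20-minute constant only mirrors the hardwired value the issue mentions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;

// Hypothetical sketch: IterationDeadline and scheduleUntilDeadline are
// illustrative names only, not part of Hadoop's Dispatcher API.
public class IterationDeadline {
  // The issue states the timeout was hardwired to 20 minutes.
  static final long DEFAULT_MAX_ITERATION_MS = 20L * 60 * 1000;

  private final LongSupplier clockMs; // injected so the sketch is testable
  private final long maxIterationMs;
  private final long startMs;

  IterationDeadline(LongSupplier clockMs, long maxIterationMs) {
    this.clockMs = clockMs;
    this.maxIterationMs = maxIterationMs;
    this.startMs = clockMs.getAsLong();
  }

  // True once the configured iteration budget has been used up.
  boolean isOver() {
    return clockMs.getAsLong() - startMs >= maxIterationMs;
  }

  // Schedules pending moves one at a time, checking the deadline before each.
  // After the deadline passes no new moves are scheduled, so canceling the
  // outstanding ones does not compete with freshly scheduled work.
  static int scheduleUntilDeadline(Deque<String> pending, IterationDeadline d,
                                   Runnable onSchedule) {
    int scheduled = 0;
    while (!pending.isEmpty() && !d.isOver()) {
      pending.poll();
      onSchedule.run();
      scheduled++;
    }
    return scheduled;
  }

  public static void main(String[] args) {
    long[] now = {0L}; // fake clock in milliseconds
    IterationDeadline d = new IterationDeadline(() -> now[0], 3_500L);
    Deque<String> pending = new ArrayDeque<>();
    for (int i = 0; i < 10; i++) {
      pending.add("blk_" + i);
    }
    // Pretend each scheduled move advances the clock by one second.
    int n = scheduleUntilDeadline(pending, d, () -> now[0] += 1_000L);
    System.out.println("scheduled=" + n + " left=" + pending.size());
  }
}
```

With a 3.5-second budget and one simulated second per move, `main` prints `scheduled=4 left=6`. For a single-pass tool like the Mover there is no later iteration to pick up the six leftover moves, which is the situation the issue describes.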
[jira] [Commented] (HDFS-13174) hdfs mover -p /path times out after 20 min
[ https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514356#comment-16514356 ] Hudson commented on HDFS-13174:

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14437 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14437/])
HDFS-13174. hdfs mover -p /path times out after 20 min. Contributed by (weichiu: rev c966a3837af1c1a1c4a441f491b0d76d5c9e5d78)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/mover/TestMover.java
[jira] [Commented] (HDFS-13174) hdfs mover -p /path times out after 20 min
[ https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514331#comment-16514331 ] Wei-Chiu Chuang commented on HDFS-13174:

Thanks [~pifta] for the insight. Here's the relevant code:
{code:title=TestBalancer#testMaxIterationTime}
// set client socket timeout to have an IN_PROGRESS notification back from
// the DataNode about the copy in every second.
conf.setLong(DFSConfigKeys.DFS_CLIENT_SOCKET_TIMEOUT_KEY, 2000L);
{code}
and
{code:title=BlockReceiver#(constructor)}
// For replaceBlock() calls response should be sent to avoid socketTimeout
// at clients. So sending with the interval of 0.5 * socketTimeout
final long readTimeout = datanode.getDnConf().socketTimeout;
this.responseInterval = (long) (readTimeout * 0.5);
{code}
Patch v4 makes sense to me. +1.
Patch v5 actually failed the shaded client build, most likely because of the dependency.
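The "every second" in the TestBalancer comment follows from the BlockReceiver formula quoted above: the response interval is half the socket timeout, so 2000 ms gives 1000 ms. A minimal Java sketch of that arithmetic, using hypothetical helper names (`responseIntervalMs`, `notificationsDuring`) and an idealized model of exactly one IN_PROGRESS notification per full interval:

```java
// Hypothetical helpers illustrating the quoted BlockReceiver arithmetic;
// these are not Hadoop APIs.
public class ResponseIntervalSketch {
  // Mirrors: this.responseInterval = (long) (readTimeout * 0.5);
  static long responseIntervalMs(long socketTimeoutMs) {
    return (long) (socketTimeoutMs * 0.5);
  }

  // Idealized count of IN_PROGRESS notifications sent while a block copy
  // of the given duration is in flight (one per full interval).
  static long notificationsDuring(long moveDurationMs, long socketTimeoutMs) {
    return moveDurationMs / responseIntervalMs(socketTimeoutMs);
  }

  public static void main(String[] args) {
    long testSocketTimeoutMs = 2000L; // value set in TestBalancer#testMaxIterationTime
    System.out.println(responseIntervalMs(testSocketTimeoutMs));          // 1000
    System.out.println(notificationsDuring(3_500L, testSocketTimeoutMs)); // 3
  }
}
```

Halving the timeout means a client waiting on a long copy hears from the DataNode at least once per timeout window, which is why the test shortens the socket timeout rather than waiting for real 20-minute iterations.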
[jira] [Commented] (HDFS-13174) hdfs mover -p /path times out after 20 min
[ https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512277#comment-16512277 ] genericqa commented on HDFS-13174:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 23s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 26m 54s | trunk passed |
| +1 | compile | 1m 1s | trunk passed |
| +1 | checkstyle | 0m 13s | trunk passed |
| +1 | mvnsite | 1m 13s | trunk passed |
| -1 | shadedclient | 3m 36s | branch has errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 13s | trunk passed |
| +1 | javadoc | 0m 50s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 3s | the patch passed |
| +1 | compile | 0m 52s | the patch passed |
| +1 | javac | 0m 52s | the patch passed |
| +1 | checkstyle | 0m 9s | the patch passed |
| +1 | mvnsite | 0m 59s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| -1 | shadedclient | 2m 12s | patch has errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 0s | the patch passed |
| +1 | javadoc | 0m 42s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 94m 58s | hadoop-hdfs in the patch passed. |
| +1 | asflicense | 0m 27s | The patch does not generate ASF License warnings. |
| | | 139m 54s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HDFS-13174 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12927795/HDFS-13174.005.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux dc0c165241e6 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 9119b3c |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/24438/testReport/ |
| Max. process+thread count | 3159 (vs. ulimit of 1) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/24438/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-13174) hdfs mover -p /path times out after 20 min
[ https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512097#comment-16512097 ] Istvan Fajth commented on HDFS-13174:

Hi [~jojochuang], I can fix those warnings, although I do not completely agree on that one; the underlying problem, though, belongs in a few other tickets. Let me quickly explain.

The constant I am using there is DFS_CLIENT_SOCKET_TIMEOUT_KEY, which is deprecated in the DFSConfigKeys class and was moved to the HdfsClientConfigKeys class by the patch in HDFS-8803. The aim of HDFS-8803 is to move client configurations to the hdfs-client module. However, this has led to a situation where the hadoop-hdfs module depends on the hadoop-hdfs-client module (not just because of this one constant), and the hadoop-hdfs-client module also contains classes from the org.apache.hadoop.hdfs.server packages.

What is interesting about this particular constant is that, through the DNConf class, the setting directly affects how the DataNode works, which tells me that this is not a client-only setting. With the current dependency setup it seems normal to use the configuration key from the client module, but in the long run I do not think this is good practice: client configurations and classes should not affect the server side and vice versa, and these kinds of things should go into a common ancestor in the dependency chain. I am certain that this is way beyond the scope of this ticket; I just wanted to share the rationale behind my first decision.

Anyway, I am adding a new patch that switches to the non-deprecated version of the constant, to conform with the current state of the project, and I will try to push this refactoring forward in the appropriate tickets as I have time.
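The javac warnings come from referencing a constant that DFSConfigKeys keeps only as a deprecated alias after HDFS-8803 moved it to HdfsClientConfigKeys. The sketch below mirrors that layout with hypothetical classes (`ClientKeysSketch`, `ServerKeysSketch`) so it runs without Hadoop on the classpath; only the property name `dfs.client.socket-timeout` is taken from Hadoop. It is an illustration of the deprecated-alias pattern, not the real Hadoop source.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the client-side keys class (cf. HdfsClientConfigKeys).
final class ClientKeysSketch {
  static final String SOCKET_TIMEOUT_KEY = "dfs.client.socket-timeout";

  private ClientKeysSketch() {}
}

// Hypothetical stand-in for the server-side keys class (cf. DFSConfigKeys),
// which keeps the old name only as a deprecated alias.
final class ServerKeysSketch {
  /** @deprecated use {@link ClientKeysSketch#SOCKET_TIMEOUT_KEY} instead. */
  @Deprecated
  static final String SOCKET_TIMEOUT_KEY = ClientKeysSketch.SOCKET_TIMEOUT_KEY;

  private ServerKeysSketch() {}
}

public class DeprecatedKeySketch {
  public static void main(String[] args) {
    Map<String, Long> conf = new HashMap<>();
    // Using ServerKeysSketch.SOCKET_TIMEOUT_KEY here would resolve to the same
    // string but draw a [deprecation] warning when compiled with
    // -Xlint:deprecation; the client-side constant does not.
    conf.put(ClientKeysSketch.SOCKET_TIMEOUT_KEY, 2000L);
    System.out.println(conf.get("dfs.client.socket-timeout"));
  }
}
```

Because both constants resolve to the same property name, switching the tests to the client-side constant changes nothing at runtime; it only removes the compile-time warning.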
[jira] [Commented] (HDFS-13174) hdfs mover -p /path times out after 20 min
[ https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511487#comment-16511487 ] Wei-Chiu Chuang commented on HDFS-13174:

[~pifta] I'm really, really sorry about missing this: would you please also take the time to fix the javac warnings?
{quote}
[WARNING] /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java:[1598,30] [deprecation] DFS_CLIENT_SOCKET_TIMEOUT_KEY in DFSConfigKeys has been deprecated
[WARNING] /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/mover/TestMover.java:[705,30] [deprecation] DFS_CLIENT_SOCKET_TIMEOUT_KEY in DFSConfigKeys has been deprecated
{quote}
[jira] [Commented] (HDFS-13174) hdfs mover -p /path times out after 20 min
[ https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509000#comment-16509000 ] genericqa commented on HDFS-13174:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 34s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 29m 32s | trunk passed |
| +1 | compile | 0m 59s | trunk passed |
| +1 | checkstyle | 0m 12s | trunk passed |
| +1 | mvnsite | 1m 5s | trunk passed |
| +1 | shadedclient | 12m 29s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 11s | trunk passed |
| +1 | javadoc | 0m 47s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 6s | the patch passed |
| +1 | compile | 0m 56s | the patch passed |
| -1 | javac | 0m 56s | hadoop-hdfs-project_hadoop-hdfs generated 2 new + 531 unchanged - 0 fixed = 533 total (was 531) |
| +1 | checkstyle | 0m 9s | the patch passed |
| +1 | mvnsite | 1m 2s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 2s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 12m 29s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 8s | the patch passed |
| +1 | javadoc | 0m 46s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 111m 44s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 57s | The patch does not generate ASF License warnings. |
| | | 179m 40s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency |
| | hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA |
| | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
| | hadoop.hdfs.server.namenode.TestCacheDirectives |
| | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HDFS-13174 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12927385/HDFS-13174.004.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux 09869c065405 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 2e5cfe6 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| javac |
[jira] [Commented] (HDFS-13174) hdfs mover -p /path times out after 20 min
[ https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508779#comment-16508779 ] Istvan Fajth commented on HDFS-13174:

Hi [~jojochuang], good catch on that old habit of mine. I have attached a new patch (004) that changes the two lines using System.currentTimeMillis() to use Time.monotonicNow().
[jira] [Commented] (HDFS-13174) hdfs mover -p /path times out after 20 min
[ https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508712#comment-16508712 ] Wei-Chiu Chuang commented on HDFS-13174:

BTW, in Hadoop we usually use Time.monotonicNow() to measure time intervals.
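The reason for this convention is that System.currentTimeMillis() follows the wall clock, which can jump when the system time is adjusted, while System.nanoTime() is intended only for measuring elapsed time and is not tied to wall-clock time. To my understanding, Hadoop's Time.monotonicNow() is a millisecond wrapper over System.nanoTime(); the sketch below reimplements that idea with a hypothetical `monotonicNowMs()` helper so it runs without Hadoop, and should not be read as Hadoop's own code.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper mimicking the idea behind Time.monotonicNow();
// not Hadoop's implementation.
public class MonotonicIntervalSketch {
  // Milliseconds from an arbitrary origin; only differences are meaningful.
  static long monotonicNowMs() {
    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime());
  }

  // Elapsed milliseconds since a start value obtained from monotonicNowMs().
  static long elapsedSince(long startMs) {
    return monotonicNowMs() - startMs;
  }

  public static void main(String[] args) throws InterruptedException {
    long start = monotonicNowMs();
    Thread.sleep(50);
    // Unlike an interval computed from System.currentTimeMillis(), this value
    // is not affected by wall-clock adjustments (e.g. NTP or manual changes)
    // made while the interval is being measured.
    System.out.println("elapsed ms: " + elapsedSince(start));
  }
}
```

For an iteration timeout such as the Dispatcher's, a wall-clock jump could otherwise make an iteration appear to end early or never end, which is why interval checks are usually written against a monotonic source.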
[jira] [Commented] (HDFS-13174) hdfs mover -p /path times out after 20 min
[ https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508711#comment-16508711 ] Wei-Chiu Chuang commented on HDFS-13174:

Thanks [~pifta], +1. Good call on HdfsConstants.READ_TIMEOUT and HdfsClientConfigKeys.DFS_CLIENT_SOCKET_TIMEOUT_KEY; that makes sense to me. I'm fine with leaving the per-iteration test timeout at 3.5 seconds. We can update it if it becomes a source of flakiness.
> java.io.IOException: Block move timed out > at > org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.receiveResponse(Dispatcher.java:382) > at > org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:328) > at > org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2500(Dispatcher.java:186) > at > org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:956) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > > Note that this issue is not coming up if all blocks can be moved inside the > DataNodes without having to move the block to an other DataNode. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
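The suggestion to revisit the 3.5 s per-iteration bound if it turns flaky is easiest to act on when the bound lives in a single named constant. A minimal sketch, assuming nothing about the real TestBalancer beyond that bound; IterationRuntimeCheck, timeMillis and checkRuntime are hypothetical names, and Thread.sleep stands in for a real Balancer iteration.

```java
// Hypothetical helper, not taken from TestBalancer: it times a task and checks the
// elapsed wall-clock time against a bound kept in one named constant, so the bound
// can be relaxed in a single place if the check turns out to be flaky.
class IterationRuntimeCheck {
    // The 3.5 s per-iteration bound discussed for the Balancer test.
    static final long MAX_ITERATION_RUNTIME_MS = 3500L;

    /** Runs the task and returns its elapsed wall-clock time in milliseconds. */
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    /** Throws if the measured runtime reaches or exceeds the bound (same as runtime < bound). */
    static void checkRuntime(long runtimeMs, long boundMs) {
        if (runtimeMs >= boundMs) {
            throw new AssertionError("Unexpected iteration runtime: " + runtimeMs
                + "ms >= " + boundMs + "ms");
        }
    }

    public static void main(String[] args) {
        long runtime = timeMillis(() -> {
            try {
                Thread.sleep(100); // stand-in for one bounded Balancer iteration
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        checkRuntime(runtime, MAX_ITERATION_RUNTIME_MS);
        System.out.println("Iteration finished in " + runtime + " ms");
    }
}
```

Keeping the bound in one constant makes relaxing it a one-line change, although the check still depends on wall-clock time on the build machine.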
[jira] [Commented] (HDFS-13174) hdfs mover -p /path times out after 20 min
[ https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506758#comment-16506758 ]

genericqa commented on HDFS-13174:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 27s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 26m 37s | trunk passed |
| +1 | compile | 0m 59s | trunk passed |
| +1 | checkstyle | 0m 13s | trunk passed |
| +1 | mvnsite | 1m 2s | trunk passed |
| +1 | shadedclient | 12m 7s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 52s | trunk passed |
| +1 | javadoc | 0m 47s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 1s | the patch passed |
| +1 | compile | 0m 54s | the patch passed |
| -1 | javac | 0m 54s | hadoop-hdfs-project_hadoop-hdfs generated 2 new + 531 unchanged - 0 fixed = 533 total (was 531) |
| +1 | checkstyle | 0m 9s | the patch passed |
| +1 | mvnsite | 1m 0s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 11m 57s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 59s | the patch passed |
| +1 | javadoc | 0m 46s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 95m 12s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 30s | The patch does not generate ASF License warnings. |
| | | 157m 59s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestPersistBlocks |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HDFS-13174 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12927137/HDFS-13174.003.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux 9df82821f270 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / cf41083 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| javac | https://builds.apache.org/job/PreCommit-HDFS-Build/24414/artifact/out/diff-compile-javac-hadoop-hdfs-project_hadoop-hdfs.txt |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/24414/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results |
[jira] [Commented] (HDFS-13174) hdfs mover -p /path times out after 20 min
[ https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506175#comment-16506175 ]

genericqa commented on HDFS-13174:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 29s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 26m 48s | trunk passed |
| +1 | compile | 0m 58s | trunk passed |
| +1 | checkstyle | 0m 13s | trunk passed |
| +1 | mvnsite | 1m 5s | trunk passed |
| +1 | shadedclient | 12m 12s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 57s | trunk passed |
| +1 | javadoc | 0m 47s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 1s | the patch passed |
| +1 | compile | 0m 54s | the patch passed |
| -1 | javac | 0m 54s | hadoop-hdfs-project_hadoop-hdfs generated 2 new + 531 unchanged - 0 fixed = 533 total (was 531) |
| +1 | checkstyle | 0m 9s | the patch passed |
| +1 | mvnsite | 0m 59s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 11m 58s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 2s | the patch passed |
| +1 | javadoc | 0m 45s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 97m 47s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 30s | The patch does not generate ASF License warnings. |
| | | 160m 55s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.mover.TestMover |
| | hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics |
| | hadoop.hdfs.client.impl.TestBlockReaderLocal |
| | hadoop.hdfs.server.namenode.ha.TestPendingCorruptDnMessages |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HDFS-13174 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12927047/HDFS-13174.002.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux 33872a2162be 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / c42dcc7 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| javac |
[jira] [Commented] (HDFS-13174) hdfs mover -p /path times out after 20 min
[ https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491416#comment-16491416 ]

Wei-Chiu Chuang commented on HDFS-13174:

Sorry for my really late review.

To summarize: after the patch, the Mover ignores the per-iteration timeout. For the Balancer, the timeout is configurable via dfs.balancer.max-iteration-time, with a default of 20 minutes. I've been pondering whether it makes sense to ignore the timeout entirely for the Mover, since that means the Mover may be prone to hung nodes.

h2. TestMover

There is a duplicate line in the test: conf.setLong(DFSConfigKeys.DFS_BALANCER_MAX_ITERATION_TIME_KEY, 200L);

file.toString() is redundant; file is already a String:
{code}
new String[]{"-p", file.toString()});
{code}

The test setup can be simplified to just two DataNodes:
{code}
final MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
    .numDataNodes(2)
    .storageTypes(
        new StorageType[][] {{StorageType.DISK, StorageType.ARCHIVE},
            {StorageType.DISK, StorageType.ARCHIVE}})
    .build();
{code}

h2. TestBalancer

I am concerned this may become a source of flaky tests in the future:
{code}
assertTrue("Unexpected iteration runtime: " + runtime + "ms > 3.5s",
    runtime < 3500);
{code}
On my laptop it took a little more than 3 seconds. On a busy box this could take longer.
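The behavior summarized in this review can be sketched as follows. This assumes a plain Map as a stand-in for Hadoop's Configuration, treats the configured value as milliseconds, and models "the Mover ignores the per-iteration timeout" as a limit of 0 meaning unlimited. Only the key name dfs.balancer.max-iteration-time and the 20-minute default come from this thread; IterationLimit, Tool and maxIterationTimeMs are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of how each tool could choose its iteration limit. A plain Map
// stands in for Hadoop's Configuration; the value is assumed to be in milliseconds.
class IterationLimit {
    static final String MAX_ITERATION_TIME_KEY = "dfs.balancer.max-iteration-time";
    static final long MAX_ITERATION_TIME_DEFAULT_MS = 20L * 60 * 1000; // 20 minutes

    enum Tool { BALANCER, MOVER }

    /** Returns the per-iteration limit in milliseconds; 0 means "no limit". */
    static long maxIterationTimeMs(Tool tool, Map<String, String> conf) {
        if (tool == Tool.MOVER) {
            return 0L; // the Mover has no iterations, so it gets no per-iteration deadline
        }
        String value = conf.get(MAX_ITERATION_TIME_KEY);
        return value == null ? MAX_ITERATION_TIME_DEFAULT_MS : Long.parseLong(value.trim());
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(maxIterationTimeMs(Tool.BALANCER, conf)); // 1200000 (default)
        conf.put(MAX_ITERATION_TIME_KEY, "200");
        System.out.println(maxIterationTimeMs(Tool.BALANCER, conf)); // 200
        System.out.println(maxIterationTimeMs(Tool.MOVER, conf));    // 0
    }
}
```

The last case is the trade-off raised in the review: with no limit, a Mover run has no deadline that would cut short a wait on a hung DataNode.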
[jira] [Commented] (HDFS-13174) hdfs mover -p /path times out after 20 min
[ https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466611#comment-16466611 ]

Wei-Chiu Chuang commented on HDFS-13174:

Thanks for raising the issue, [~pifta]. The description makes sense to me. I'll review the patch.
[jira] [Commented] (HDFS-13174) hdfs mover -p /path times out after 20 min
[ https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16460906#comment-16460906 ]

genericqa commented on HDFS-13174:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 23s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 23m 29s | trunk passed |
| +1 | compile | 0m 49s | trunk passed |
| +1 | checkstyle | 0m 51s | trunk passed |
| +1 | mvnsite | 0m 56s | trunk passed |
| +1 | shadedclient | 10m 47s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 38s | trunk passed |
| +1 | javadoc | 0m 46s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 55s | the patch passed |
| +1 | compile | 0m 48s | the patch passed |
| -1 | javac | 0m 48s | hadoop-hdfs-project_hadoop-hdfs generated 2 new + 531 unchanged - 0 fixed = 533 total (was 531) |
| -0 | checkstyle | 0m 51s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 5 new + 752 unchanged - 1 fixed = 757 total (was 753) |
| +1 | mvnsite | 0m 53s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 9m 52s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 48s | the patch passed |
| +1 | javadoc | 0m 44s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 111m 31s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 22s | The patch does not generate ASF License warnings. |
| | | 167m 14s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
| | hadoop.hdfs.web.TestWebHdfsTimeouts |
| | hadoop.hdfs.client.impl.TestBlockReaderLocal |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HDFS-13174 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12921555/HDFS-13174.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux 98292cd14a30 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e07156e |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| javac |
[jira] [Commented] (HDFS-13174) hdfs mover -p /path times out after 20 min
[ https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16460711#comment-16460711 ]

Istvan Fajth commented on HDFS-13174:

Attaching a patch for review.

The patch contains some refactoring to make the iteration time configurable. I have added a configuration for the Balancer to control the maximum iteration time. It seemed reasonable, although it might not need to be exposed; in this initial patch I have exposed it.

I added a test for the Balancer to check that the maximum iteration time is respected. To make the test run in a reasonable time frame with a reasonable amount of resources, I had to use the deprecated DFSConfigKeys.DFS_CLIENT_SOCKET_TIMEOUT_KEY. I am not sure, but if there is a better way to control how often the DN gets back to the client to keep the connection alive, I would be glad to know it. This was the only way I found to affect that: the newly introduced HdfsClientConfigKeys.DFS_CLIENT_SOCKET_TIMEOUT_KEY is not visible in the test package, and I did not find a way to tune the same setting in the DN.

I added another test for the Balancer: if you set the newly added Dispatcher constructor parameter to a value higher than 0, for example 200L, the test fails because no blocks are moved, as the block moves time out. This was the case with the previous constant.

I am updating the Jira description as well, as I learned a few things about the issue.
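The idea in this patch description, an iteration limit passed in through a constructor rather than a hardwired constant, can be sketched like this. IterationDeadline is a hypothetical class, not the Dispatcher change itself; it assumes that a limit of 0 or less disables the check, which is consistent with the test only failing once the parameter is set above 0.

```java
// Hypothetical sketch of the idea in the patch description: the iteration limit is
// passed in through a constructor instead of being a hardwired constant, and a
// value <= 0 disables the check. This is not the actual Dispatcher code.
class IterationDeadline {
    private final long maxIterationTimeMs;
    private final long iterationStartMs;

    IterationDeadline(long maxIterationTimeMs, long iterationStartMs) {
        this.maxIterationTimeMs = maxIterationTimeMs;
        this.iterationStartMs = iterationStartMs;
    }

    /** True once a positive limit has elapsed; never true when the limit is <= 0. */
    boolean isIterationOver(long nowMs) {
        return maxIterationTimeMs > 0 && nowMs - iterationStartMs > maxIterationTimeMs;
    }

    public static void main(String[] args) {
        long start = 0L;
        long oneSecondLater = 1_000L;
        // With a tiny limit such as 200 ms, a move that needs a second is cut off.
        System.out.println(new IterationDeadline(200L, start).isIterationOver(oneSecondLater)); // true
        // With 0 the check is disabled, so the same move is allowed to finish.
        System.out.println(new IterationDeadline(0L, start).isIterationOver(oneSecondLater));   // false
    }
}
```

A 200 ms limit cuts off a move that needs a second, which mirrors how setting such a small value makes the block moves in a test time out; with 0 the check never fires.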
[jira] [Commented] (HDFS-13174) hdfs mover -p /path times out after 20 min
[ https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405746#comment-16405746 ]

Junping Du commented on HDFS-13174:

Dropping the fix version, as no patch has been committed yet.