[jira] [Updated] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1366: - Attachment: YARN-1366.7.patch AM should implement Resync with the ApplicationMasterService instead of shutting down - Key: YARN-1366 URL: https://issues.apache.org/jira/browse/YARN-1366 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Rohith Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, YARN-1366.4.patch, YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.7.patch, YARN-1366.patch, YARN-1366.prototype.patch, YARN-1366.prototype.patch The ApplicationMasterService currently sends a resync response to which the AM responds by shutting down. The AM behavior is expected to change to resyncing with the RM instead. Resync means resetting the allocate RPC sequence number to 0, and the AM should resend its entire outstanding request to the RM. Note that if the AM is making its first allocate call to the RM then things should proceed as normal without needing a resync. The RM will return all containers that have completed since the RM last synced with the AM. Some container completions may be reported more than once. -- This message was sent by Atlassian JIRA (v6.2#6252)
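The resync contract described above is easy to get subtly wrong on the AM side, so here is a minimal, self-contained sketch of the intended behaviour. Everything in it is a plain-Java stand-in (the RmStub class and the use of IllegalStateException as the resync signal are illustrative inventions, not the AMRMClient API or the attached patch): on resync the AM resets its allocate sequence number to 0, resends every outstanding ask, and tolerates duplicate container-completion reports.
{code}
import java.util.ArrayList;
import java.util.List;

public class ResyncSketch {
  /** Stand-in for the RM side of allocate(); rejects an out-of-sync responseId. */
  static class RmStub {
    int expectedResponseId = 0;
    List<String> allocate(int responseId, List<String> asks) {
      if (responseId != expectedResponseId) {
        throw new IllegalStateException("resync");   // stands in for the RESYNC signal
      }
      expectedResponseId++;
      return new ArrayList<String>();                // completed containers (may repeat entries)
    }
  }

  public static void main(String[] args) {
    RmStub rm = new RmStub();                        // freshly restarted RM: expects responseId 0
    List<String> outstanding = new ArrayList<String>();
    outstanding.add("ask-container-1");
    outstanding.add("ask-container-2");
    int responseId = 5;                              // the AM believes it is at heartbeat 5
    try {
      rm.allocate(responseId, outstanding);
    } catch (IllegalStateException resync) {
      responseId = 0;                                // reset the allocate sequence number to 0
      List<String> completed = rm.allocate(responseId, outstanding); // resend ALL outstanding asks
      System.out.println("resynced, resent " + outstanding.size()
          + " asks, got " + completed.size() + " completions (duplicates possible)");
    }
  }
}
{code}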
[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047072#comment-14047072 ] Hadoop QA commented on YARN-1366: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12653045/YARN-1366.7.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4132//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4132//console This message is automatically generated. AM should implement Resync with the ApplicationMasterService instead of shutting down - Key: YARN-1366 URL: https://issues.apache.org/jira/browse/YARN-1366 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Rohith Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, YARN-1366.4.patch, YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.7.patch, YARN-1366.patch, YARN-1366.prototype.patch, YARN-1366.prototype.patch The ApplicationMasterService currently sends a resync response to which the AM responds by shutting down. The AM behavior is expected to change to calling resyncing with the RM. Resync means resetting the allocate RPC sequence number to 0 and the AM should send its entire outstanding request to the RM. Note that if the AM is making its first allocate call to the RM then things should proceed like normal without needing a resync. The RM will return all containers that have completed since the RM last synced with the AM. Some container completions may be reported more than once. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2231) Provide feature to limit MRJob's stdout/stderr size
huozhanfeng created YARN-2231: - Summary: Provide feature to limit MRJob's stdout/stderr size Key: YARN-2231 URL: https://issues.apache.org/jira/browse/YARN-2231 Project: Hadoop YARN Issue Type: Improvement Components: log-aggregation, nodemanager Affects Versions: 2.3.0 Environment: CentOS release 5.8 (Final) Reporter: huozhanfeng When an MR job prints too much stdout or stderr log, the disk fills up, which now affects our platform management. I have modified org.apache.hadoop.mapred.MapReduceChildJVM.java (adapting code from org.apache.hadoop.mapred.TaskLog) to generate the launch command as follows: exec /bin/bash -c ( $JAVA_HOME/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx1024m -Djava.io.tmpdir=$PWD/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02 -Dyarn.app.container.log.filesize=10240 -Dhadoop.root.logger=DEBUG,CLA org.apache.hadoop.mapred.YarnChild 10.106.24.108 53911 attempt_1403930653208_0003_m_00_0 2 | tail -c 102 > /logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02/stdout ; exit $PIPESTATUS ) 2>&1 | tail -c 10240 > /logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02/stderr ; exit $PIPESTATUS But it doesn't take effect. Then, when I use export YARN_NODEMANAGER_OPTS=-Xdebug -Xrunjdwp:transport=dt_socket,address=8788,server=y,suspend=y to debug the NodeManager, I find that if I set breakpoints at org.apache.hadoop.util.Shell (line 450: process = builder.start()) and org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch (line 161: List<String> newCmds = new ArrayList<String>(command.size())), the command works. I suspect a concurrency problem is causing the piped shell command not to work properly. It matters, and I need your help. my email: huozhanf...@gmail.com thanks -- This message was sent by Atlassian JIRA (v6.2#6252)
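The core idea of the request is the shell pattern above: piping the task's output through tail -c <limit> bounds the size of the on-disk file, and exit $PIPESTATUS preserves the task's own exit code rather than tail's. A small hedged demo of that pattern (assuming a POSIX shell is available; the seq command, file name and limit are stand-ins, not YARN code):
{code}
import java.io.File;
import java.io.IOException;

public class TailCapDemo {
  public static void main(String[] args) throws IOException, InterruptedException {
    long limit = 10240;                               // mirrors yarn.app.container.log.filesize
    File stdout = new File("stdout.capped");          // stand-in for the container stdout file
    String cmd = "( seq 1 100000 ) | tail -c " + limit + " > " + stdout.getAbsolutePath()
        + " ; exit $PIPESTATUS";                      // propagate the task's exit code, not tail's
    Process p = new ProcessBuilder("/bin/bash", "-c", cmd).inheritIO().start();
    int rc = p.waitFor();
    System.out.println("exit=" + rc + ", capped size=" + stdout.length()
        + " bytes (limit " + limit + ")");
  }
}
{code}
Running this produces a file no larger than the limit even though the piped command prints far more, which is exactly the capping behaviour the reporter wants for container logs.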
[jira] [Updated] (YARN-2231) Provide feature to limit MRJob's stdout/stderr size
[ https://issues.apache.org/jira/browse/YARN-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huozhanfeng updated YARN-2231: -- Description: When an MR job prints too much stdout or stderr log, the disk fills up, which now affects our platform management. I have modified org.apache.hadoop.mapred.MapReduceChildJVM.java (adapting code from org.apache.hadoop.mapred.TaskLog) to generate the launch command as follows: exec /bin/bash -c ( $JAVA_HOME/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx1024m -Djava.io.tmpdir=$PWD/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02 -Dyarn.app.container.log.filesize=10240 -Dhadoop.root.logger=DEBUG,CLA org.apache.hadoop.mapred.YarnChild 10.106.24.108 53911 attempt_1403930653208_0003_m_00_0 2 | tail -c 102 > /logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02/stdout ; exit $PIPESTATUS ) 2>&1 | tail -c 10240 > /logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02/stderr ; exit $PIPESTATUS But it doesn't take effect. Then, when I use export YARN_NODEMANAGER_OPTS=-Xdebug -Xrunjdwp:transport=dt_socket,address=8788,server=y,suspend=y to debug the NodeManager, I find that if I set breakpoints at org.apache.hadoop.util.Shell (line 450: process = builder.start()) and org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch (line 161: List<String> newCmds = new ArrayList<String>(command.size())), the command works. I suspect a concurrency problem is causing the piped shell command not to work properly. It matters, and I need your help. my email: huozhanf...@gmail.com thanks was: When an MR job prints too much stdout or stderr log, the disk fills up, which now affects our platform management. I have modified org.apache.hadoop.mapred.MapReduceChildJVM.java (adapting code from org.apache.hadoop.mapred.TaskLog) to generate the launch command as follows: exec /bin/bash -c ( $JAVA_HOME/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx1024m -Djava.io.tmpdir=$PWD/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02 -Dyarn.app.container.log.filesize=10240 -Dhadoop.root.logger=DEBUG,CLA org.apache.hadoop.mapred.YarnChild 10.106.24.108 53911 attempt_1403930653208_0003_m_00_0 2 | tail -c 102 > /logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02/stdout ; exit $PIPESTATUS ) 2>&1 | tail -c 10240 > /logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02/stderr ; exit $PIPESTATUS But it doesn't take effect. Then, when I use export YARN_NODEMANAGER_OPTS=-Xdebug -Xrunjdwp:transport=dt_socket,address=8788,server=y,suspend=y to debug the NodeManager, I find that if I set breakpoints at org.apache.hadoop.util.Shell (line 450: process = builder.start()) and org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch (line 161: List<String> newCmds = new ArrayList<String>(command.size())), the command works. I suspect a concurrency problem is causing the piped shell command not to work properly. It matters, and I need your help.
my email: huozhanf...@gmail.com thanks Provide feature to limit MRJob's stdout/stderr size Key: YARN-2231 URL: https://issues.apache.org/jira/browse/YARN-2231 Project: Hadoop YARN Issue Type: Improvement Components: log-aggregation, nodemanager Affects Versions: 2.3.0 Environment: CentOS release 5.8 (Final) Reporter: huozhanfeng Labels: features When an MR job prints too much stdout or stderr log, the disk fills up, which now affects our platform management. I have modified org.apache.hadoop.mapred.MapReduceChildJVM.java (adapting code from org.apache.hadoop.mapred.TaskLog) to generate the launch command as follows: exec /bin/bash -c ( $JAVA_HOME/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx1024m -Djava.io.tmpdir=$PWD/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02 -Dyarn.app.container.log.filesize=10240 -Dhadoop.root.logger=DEBUG,CLA org.apache.hadoop.mapred.YarnChild 10.106.24.108 53911 attempt_1403930653208_0003_m_00_0 2 | tail -c 102 > /logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02/stdout ; exit $PIPESTATUS ) 2>&1 | tail -c 10240 > /logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02/stderr ; exit $PIPESTATUS But it doesn't take
[jira] [Commented] (YARN-2231) Provide feature to limit MRJob's stdout/stderr size
[ https://issues.apache.org/jira/browse/YARN-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047081#comment-14047081 ] huozhanfeng commented on YARN-2231: ---
Index: MapReduceChildJVM.java
===================================================================
--- MapReduceChildJVM.java (revision 1387)
+++ MapReduceChildJVM.java (revision 1388)
@@ -37,6 +37,7 @@
 @SuppressWarnings("deprecation")
 public class MapReduceChildJVM {
+  private static final String tailCommand = "tail";
   private static String getTaskLogFile(LogName filter) {
     return ApplicationConstants.LOG_DIR_EXPANSION_VAR + Path.SEPARATOR +
         filter.toString();
@@ -161,9 +162,12 @@
     TaskAttemptID attemptID = task.getTaskID();
     JobConf conf = task.conf;
-
+    long logSize = TaskLog.getTaskLogLength(conf);
+
     Vector<String> vargs = new Vector<String>(8);
-
+    if (logSize > 0) {
+      vargs.add("(");
+    }
     vargs.add(Environment.JAVA_HOME.$() + "/bin/java");
     // Add child (task) java-vm options.
@@ -206,7 +210,6 @@
     vargs.add("-Djava.io.tmpdir=" + childTmpDir);
     // Setup the log4j prop
-    long logSize = TaskLog.getTaskLogLength(conf);
     setupLog4jProperties(task, vargs, logSize);
     if (conf.getProfileEnabled()) {
@@ -229,8 +232,22 @@
     // Finally add the jvmID
     vargs.add(String.valueOf(jvmID.getId()));
-    vargs.add("1>" + getTaskLogFile(TaskLog.LogName.STDOUT));
-    vargs.add("2>" + getTaskLogFile(TaskLog.LogName.STDERR));
+    if (logSize > 0) {
+      vargs.add("|");
+      vargs.add(tailCommand);
+      vargs.add("-c");
+      vargs.add(String.valueOf(logSize));
+      vargs.add(">" + getTaskLogFile(TaskLog.LogName.STDOUT));
+      vargs.add("; exit $PIPESTATUS ) 2>&1 |");
+      vargs.add(tailCommand);
+      vargs.add("-c");
+      vargs.add(String.valueOf(logSize));
+      vargs.add(">" + getTaskLogFile(TaskLog.LogName.STDERR));
+      vargs.add("; exit $PIPESTATUS");
+    } else {
+      vargs.add("1>" + getTaskLogFile(TaskLog.LogName.STDOUT));
+      vargs.add("2>" + getTaskLogFile(TaskLog.LogName.STDERR));
+    }
     // Final commmand
     StringBuilder mergedCommand = new StringBuilder();
Provide feature to limit MRJob's stdout/stderr size Key: YARN-2231 URL: https://issues.apache.org/jira/browse/YARN-2231 Project: Hadoop YARN Issue Type: Improvement Components: log-aggregation, nodemanager Affects Versions: 2.3.0 Environment: CentOS release 5.8 (Final) Reporter: huozhanfeng Labels: features When an MR job prints too much stdout or stderr log, the disk fills up, which now affects our platform management. I have modified org.apache.hadoop.mapred.MapReduceChildJVM.java (adapting code from org.apache.hadoop.mapred.TaskLog) to generate the launch command as follows: exec /bin/bash -c ( $JAVA_HOME/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx1024m -Djava.io.tmpdir=$PWD/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02 -Dyarn.app.container.log.filesize=10240 -Dhadoop.root.logger=DEBUG,CLA org.apache.hadoop.mapred.YarnChild 10.106.24.108 53911 attempt_1403930653208_0003_m_00_0 2 | tail -c 102 > /logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02/stdout ; exit $PIPESTATUS ) 2>&1 | tail -c 10240 > /logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02/stderr ; exit $PIPESTATUS But it doesn't take effect. Then, when I use export YARN_NODEMANAGER_OPTS=-Xdebug -Xrunjdwp:transport=dt_socket,address=8788,server=y,suspend=y to debug the NodeManager, I find that if I set breakpoints at org.apache.hadoop.util.Shell (line 450: process = builder.start()) and org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch (line 161: List<String> newCmds = new ArrayList<String>(command.size())), the command works. I suspect a concurrency problem is causing the piped shell command not to work properly. It matters, and I need your help. my email: huozhanf...@gmail.com thanks -- This message was sent by Atlassian JIRA (v6.2#6252)
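The diff above builds the pipe and redirection as individual tokens and then merges them into one string. A simplified stand-in (not the Hadoop code) of that final merge step, which also illustrates why the result only caps the logs if the merged string is later handed to a shell:
{code}
import java.util.Arrays;
import java.util.List;

public class MergedCommandDemo {
  public static void main(String[] args) {
    // Illustrative tokens only; the real vargs hold the full java command line.
    List<String> vargs = Arrays.asList(
        "echo", "hello-from-task", "2",                 // program, args, jvm id
        "|", "tail", "-c", "10240", ">", "stdout",      // cap stdout via tail
        "; exit $PIPESTATUS");
    StringBuilder mergedCommand = new StringBuilder();  // mirrors the diff's mergedCommand
    for (String v : vargs) {
      mergedCommand.append(v).append(" ");
    }
    // "|" and ">" only act as pipe/redirection if this single string is executed
    // through a shell (e.g. /bin/bash -c "<mergedCommand>"); passed directly to
    // exec(), they are just ordinary arguments and the capping has no effect.
    System.out.println(mergedCommand.toString().trim());
  }
}
{code}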
[jira] [Updated] (YARN-2231) Provide feature to limit MRJob's stdout/stderr size
[ https://issues.apache.org/jira/browse/YARN-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huozhanfeng updated YARN-2231: -- Description: When an MR job prints too much stdout or stderr log, the disk fills up, which now affects our platform management. I have modified org.apache.hadoop.mapred.MapReduceChildJVM.java (adapting code from org.apache.hadoop.mapred.TaskLog) to generate the launch command as follows: exec /bin/bash -c ( $JAVA_HOME/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx1024m -Djava.io.tmpdir=$PWD/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02 -Dyarn.app.container.log.filesize=10240 -Dhadoop.root.logger=DEBUG,CLA org.apache.hadoop.mapred.YarnChild ${test_IP} 53911 attempt_1403930653208_0003_m_00_0 2 | tail -c 102 > /logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02/stdout ; exit $PIPESTATUS ) 2>&1 | tail -c 10240 > /logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02/stderr ; exit $PIPESTATUS But it doesn't take effect. Then, when I use export YARN_NODEMANAGER_OPTS=-Xdebug -Xrunjdwp:transport=dt_socket,address=8788,server=y,suspend=y to debug the NodeManager, I find that if I set breakpoints at org.apache.hadoop.util.Shell (line 450: process = builder.start()) and org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch (line 161: List<String> newCmds = new ArrayList<String>(command.size())), the command works. I suspect a concurrency problem is causing the piped shell command not to work properly. It matters, and I need your help. my email: huozhanf...@gmail.com thanks was: When an MR job prints too much stdout or stderr log, the disk fills up, which now affects our platform management. I have modified org.apache.hadoop.mapred.MapReduceChildJVM.java (adapting code from org.apache.hadoop.mapred.TaskLog) to generate the launch command as follows: exec /bin/bash -c ( $JAVA_HOME/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx1024m -Djava.io.tmpdir=$PWD/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02 -Dyarn.app.container.log.filesize=10240 -Dhadoop.root.logger=DEBUG,CLA org.apache.hadoop.mapred.YarnChild 10.106.24.108 53911 attempt_1403930653208_0003_m_00_0 2 | tail -c 102 > /logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02/stdout ; exit $PIPESTATUS ) 2>&1 | tail -c 10240 > /logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02/stderr ; exit $PIPESTATUS But it doesn't take effect. Then, when I use export YARN_NODEMANAGER_OPTS=-Xdebug -Xrunjdwp:transport=dt_socket,address=8788,server=y,suspend=y to debug the NodeManager, I find that if I set breakpoints at org.apache.hadoop.util.Shell (line 450: process = builder.start()) and org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch (line 161: List<String> newCmds = new ArrayList<String>(command.size())), the command works. I suspect a concurrency problem is causing the piped shell command not to work properly. It matters, and I need your help.
my email: huozhanf...@gmail.com thanks Provide feature to limit MRJob's stdout/stderr size Key: YARN-2231 URL: https://issues.apache.org/jira/browse/YARN-2231 Project: Hadoop YARN Issue Type: Improvement Components: log-aggregation, nodemanager Affects Versions: 2.3.0 Environment: CentOS release 5.8 (Final) Reporter: huozhanfeng Labels: features When an MR job prints too much stdout or stderr log, the disk fills up, which now affects our platform management. I have modified org.apache.hadoop.mapred.MapReduceChildJVM.java (adapting code from org.apache.hadoop.mapred.TaskLog) to generate the launch command as follows: exec /bin/bash -c ( $JAVA_HOME/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx1024m -Djava.io.tmpdir=$PWD/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02 -Dyarn.app.container.log.filesize=10240 -Dhadoop.root.logger=DEBUG,CLA org.apache.hadoop.mapred.YarnChild ${test_IP} 53911 attempt_1403930653208_0003_m_00_0 2 | tail -c 102 > /logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02/stdout ; exit $PIPESTATUS ) 2>&1 | tail -c 10240 > /logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02/stderr ; exit $PIPESTATUS But it doesn't take effect.
[jira] [Updated] (YARN-2231) Provide feature to limit MRJob's stdout/stderr size
[ https://issues.apache.org/jira/browse/YARN-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huozhanfeng updated YARN-2231: -- Description: When an MR job prints too much stdout or stderr log, the disk fills up, which now affects our platform management. I have modified org.apache.hadoop.mapred.MapReduceChildJVM.java (adapting code from org.apache.hadoop.mapred.TaskLog) to generate the launch command as follows: exec /bin/bash -c ( $JAVA_HOME/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx1024m -Djava.io.tmpdir=$PWD/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02 -Dyarn.app.container.log.filesize=10240 -Dhadoop.root.logger=DEBUG,CLA org.apache.hadoop.mapred.YarnChild $test_IP 53911 attempt_1403930653208_0003_m_00_0 2 | tail -c 102 > /logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02/stdout ; exit $PIPESTATUS ) 2>&1 | tail -c 10240 > /logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02/stderr ; exit $PIPESTATUS But it doesn't take effect. Then, when I use export YARN_NODEMANAGER_OPTS=-Xdebug -Xrunjdwp:transport=dt_socket,address=8788,server=y,suspend=y to debug the NodeManager, I find that if I set breakpoints at org.apache.hadoop.util.Shell (line 450: process = builder.start()) and org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch (line 161: List<String> newCmds = new ArrayList<String>(command.size())), the command works. I suspect a concurrency problem is causing the piped shell command not to work properly. It matters, and I need your help. my email: huozhanf...@gmail.com thanks was: When an MR job prints too much stdout or stderr log, the disk fills up, which now affects our platform management. I have modified org.apache.hadoop.mapred.MapReduceChildJVM.java (adapting code from org.apache.hadoop.mapred.TaskLog) to generate the launch command as follows: exec /bin/bash -c ( $JAVA_HOME/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx1024m -Djava.io.tmpdir=$PWD/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02 -Dyarn.app.container.log.filesize=10240 -Dhadoop.root.logger=DEBUG,CLA org.apache.hadoop.mapred.YarnChild ${test_IP} 53911 attempt_1403930653208_0003_m_00_0 2 | tail -c 102 > /logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02/stdout ; exit $PIPESTATUS ) 2>&1 | tail -c 10240 > /logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02/stderr ; exit $PIPESTATUS But it doesn't take effect. Then, when I use export YARN_NODEMANAGER_OPTS=-Xdebug -Xrunjdwp:transport=dt_socket,address=8788,server=y,suspend=y to debug the NodeManager, I find that if I set breakpoints at org.apache.hadoop.util.Shell (line 450: process = builder.start()) and org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch (line 161: List<String> newCmds = new ArrayList<String>(command.size())), the command works. I suspect a concurrency problem is causing the piped shell command not to work properly. It matters, and I need your help.
my email: huozhanf...@gmail.com thanks Provide feature to limit MRJob's stdout/stderr size Key: YARN-2231 URL: https://issues.apache.org/jira/browse/YARN-2231 Project: Hadoop YARN Issue Type: Improvement Components: log-aggregation, nodemanager Affects Versions: 2.3.0 Environment: CentOS release 5.8 (Final) Reporter: huozhanfeng Labels: features When an MR job prints too much stdout or stderr log, the disk fills up, which now affects our platform management. I have modified org.apache.hadoop.mapred.MapReduceChildJVM.java (adapting code from org.apache.hadoop.mapred.TaskLog) to generate the launch command as follows: exec /bin/bash -c ( $JAVA_HOME/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx1024m -Djava.io.tmpdir=$PWD/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02 -Dyarn.app.container.log.filesize=10240 -Dhadoop.root.logger=DEBUG,CLA org.apache.hadoop.mapred.YarnChild $test_IP 53911 attempt_1403930653208_0003_m_00_0 2 | tail -c 102 > /logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02/stdout ; exit $PIPESTATUS ) 2>&1 | tail -c 10240 > /logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02/stderr ; exit $PIPESTATUS But it doesn't take effect. And
[jira] [Commented] (YARN-614) Separate AM failures from hardware failure or YARN error and do not count them to AM retry count
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047096#comment-14047096 ] Hudson commented on YARN-614: - FAILURE: Integrated in Hadoop-Yarn-trunk #598 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/598/]) YARN-614. Changed ResourceManager to not count disk failure, node loss and RM restart towards app failures. Contributed by Xuan Gong (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1606407) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttempt.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java Separate AM failures from hardware failure or YARN error and do not count them to AM retry count Key: YARN-614 URL: https://issues.apache.org/jira/browse/YARN-614 Project: Hadoop YARN Issue Type: Improvement Reporter: Bikas Saha Assignee: Xuan Gong Fix For: 2.5.0 Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch, YARN-614.10.patch, YARN-614.11.patch, YARN-614.12.patch, YARN-614.13.patch, YARN-614.7.patch, YARN-614.8.patch, YARN-614.9.patch Attempts can fail due to a large number of user errors and they should not be retried unnecessarily. The only reason YARN should retry an attempt is when the hardware fails or YARN has an error. NM failing, lost NM and NM disk errors are the hardware errors that come to mind. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-679) add an entry point that can start any Yarn service
[ https://issues.apache.org/jira/browse/YARN-679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-679: Attachment: YARN-679-003.patch Revised patch # uses reflection to load the HDFS and YARN configurations if present -so forcing in their resources # uses {{GenericOptionsParser}} to parse the options -so the command line is now consistent with ToolRunner. (There's one extra constraint -that all configs resolve to valid paths in the filesystem) # {{GenericOptionsParser}} adds a flag to indicate whether or not the parse worked...until now it looks like an invalid set of generic options could still get handed down to the tool add an entry point that can start any Yarn service -- Key: YARN-679 URL: https://issues.apache.org/jira/browse/YARN-679 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.4.0 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-679-001.patch, YARN-679-002.patch, YARN-679-002.patch, YARN-679-003.patch, org.apache.hadoop.servic...mon 3.0.0-SNAPSHOT API).pdf Time Spent: 72h Remaining Estimate: 0h There's no need to write separate .main classes for every Yarn service, given that the startup mechanism should be identical: create, init, start, wait for stopped -with an interrupt handler to trigger a clean shutdown on a control-c interrupt. Provide one that takes any classname, and a list of config files/options -- This message was sent by Atlassian JIRA (v6.2#6252)
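The entry-point idea above (create, init, start, wait for stopped, with a Ctrl-C handler for clean shutdown) can be sketched in a few lines. This is a minimal stand-in only: the SimpleService interface and class layout are inventions for illustration, not org.apache.hadoop.service.Service, GenericOptionsParser, or the attached patch.
{code}
import java.util.concurrent.CountDownLatch;

interface SimpleService {
  void init(String[] confArgs);
  void start();
  void stop();
}

public class ServiceLauncher {
  public static void main(String[] args) throws Exception {
    if (args.length < 1) {
      System.err.println("usage: ServiceLauncher <service-classname> [config options...]");
      System.exit(1);
    }
    // Reflectively load whatever service class was named on the command line.
    SimpleService svc = (SimpleService) Class.forName(args[0])
        .getDeclaredConstructor().newInstance();
    String[] rest = new String[args.length - 1];
    System.arraycopy(args, 1, rest, 0, rest.length);

    CountDownLatch stopped = new CountDownLatch(1);
    // Interrupt handler: Ctrl-C triggers a clean stop before the JVM exits.
    Runtime.getRuntime().addShutdownHook(new Thread(() -> {
      svc.stop();
      stopped.countDown();
    }));

    svc.init(rest);   // create/init with the remaining config files/options
    svc.start();      // start
    stopped.await();  // wait for stopped
  }
}
{code}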
[jira] [Commented] (YARN-2065) AM cannot create new containers after restart-NM token from previous attempt used
[ https://issues.apache.org/jira/browse/YARN-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047111#comment-14047111 ] Steve Loughran commented on YARN-2065: -- I'll try to run my code against this patch this week AM cannot create new containers after restart-NM token from previous attempt used - Key: YARN-2065 URL: https://issues.apache.org/jira/browse/YARN-2065 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Steve Loughran Assignee: Jian He Attachments: YARN-2065.1.patch Slider AM Restart failing (SLIDER-34). The AM comes back up, but it cannot create new containers. The Slider minicluster test {{TestKilledAM}} can replicate this reliably -it kills the AM, then kills a container while the AM is down, which triggers a reallocation of a container, leading to this failure. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-679) add an entry point that can start any Yarn service
[ https://issues.apache.org/jira/browse/YARN-679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047118#comment-14047118 ] Hadoop QA commented on YARN-679: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12653051/YARN-679-003.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 31 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1267 javac compiler warnings (more than the trunk's current 1258 warnings). {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 5 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/4133//artifact/trunk/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common: org.apache.hadoop.service.launcher.TestServiceLaunchedRunning org.apache.hadoop.service.launcher.TestServiceLaunchNoArgsAllowed {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4133//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4133//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4133//console This message is automatically generated. add an entry point that can start any Yarn service -- Key: YARN-679 URL: https://issues.apache.org/jira/browse/YARN-679 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.4.0 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-679-001.patch, YARN-679-002.patch, YARN-679-002.patch, YARN-679-003.patch, org.apache.hadoop.servic...mon 3.0.0-SNAPSHOT API).pdf Time Spent: 72h Remaining Estimate: 0h There's no need to write separate .main classes for every Yarn service, given that the startup mechanism should be identical: create, init, start, wait for stopped -with an interrupt handler to trigger a clean shutdown on a control-c interrrupt. Provide one that takes any classname, and a list of config files/options -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-614) Separate AM failures from hardware failure or YARN error and do not count them to AM retry count
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047127#comment-14047127 ] Hudson commented on YARN-614: - FAILURE: Integrated in Hadoop-Hdfs-trunk #1789 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1789/]) YARN-614. Changed ResourceManager to not count disk failure, node loss and RM restart towards app failures. Contributed by Xuan Gong (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1606407) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttempt.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java Separate AM failures from hardware failure or YARN error and do not count them to AM retry count Key: YARN-614 URL: https://issues.apache.org/jira/browse/YARN-614 Project: Hadoop YARN Issue Type: Improvement Reporter: Bikas Saha Assignee: Xuan Gong Fix For: 2.5.0 Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch, YARN-614.10.patch, YARN-614.11.patch, YARN-614.12.patch, YARN-614.13.patch, YARN-614.7.patch, YARN-614.8.patch, YARN-614.9.patch Attempts can fail due to a large number of user errors and they should not be retried unnecessarily. The only reason YARN should retry an attempt is when the hardware fails or YARN has an error. NM failing, lost NM and NM disk errors are the hardware errors that come to mind. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-614) Separate AM failures from hardware failure or YARN error and do not count them to AM retry count
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047137#comment-14047137 ] Hudson commented on YARN-614: - FAILURE: Integrated in Hadoop-Mapreduce-trunk #1816 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1816/]) YARN-614. Changed ResourceManager to not count disk failure, node loss and RM restart towards app failures. Contributed by Xuan Gong (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1606407) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttempt.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java Separate AM failures from hardware failure or YARN error and do not count them to AM retry count Key: YARN-614 URL: https://issues.apache.org/jira/browse/YARN-614 Project: Hadoop YARN Issue Type: Improvement Reporter: Bikas Saha Assignee: Xuan Gong Fix For: 2.5.0 Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch, YARN-614.10.patch, YARN-614.11.patch, YARN-614.12.patch, YARN-614.13.patch, YARN-614.7.patch, YARN-614.8.patch, YARN-614.9.patch Attempts can fail due to a large number of user errors and they should not be retried unnecessarily. The only reason YARN should retry an attempt is when the hardware fails or YARN has an error. NM failing, lost NM and NM disk errors are the hardware errors that come to mind. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2065) AM cannot create new containers after restart-NM token from previous attempt used
[ https://issues.apache.org/jira/browse/YARN-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-2065: - Attachment: YARN-2065-002.patch This is the previous patch, in sync with trunk AM cannot create new containers after restart-NM token from previous attempt used - Key: YARN-2065 URL: https://issues.apache.org/jira/browse/YARN-2065 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Steve Loughran Assignee: Jian He Attachments: YARN-2065-002.patch, YARN-2065.1.patch Slider AM Restart failing (SLIDER-34). The AM comes back up, but it cannot create new containers. The Slider minicluster test {{TestKilledAM}} can replicate this reliably -it kills the AM, then kills a container while the AM is down, which triggers a reallocation of a container, leading to this failure. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2065) AM cannot create new containers after restart-NM token from previous attempt used
[ https://issues.apache.org/jira/browse/YARN-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047171#comment-14047171 ] Hadoop QA commented on YARN-2065: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12653066/YARN-2065-002.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4134//console This message is automatically generated. AM cannot create new containers after restart-NM token from previous attempt used - Key: YARN-2065 URL: https://issues.apache.org/jira/browse/YARN-2065 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Steve Loughran Assignee: Jian He Attachments: YARN-2065-002.patch, YARN-2065.1.patch Slider AM Restart failing (SLIDER-34). The AM comes back up, but it cannot create new containers. The Slider minicluster test {{TestKilledAM}} can replicate this reliably -it kills the AM, then kills a container while the AM is down, which triggers a reallocation of a container, leading to this failure. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047202#comment-14047202 ] Hudson commented on YARN-2052: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5799 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5799/]) YARN-2052. Embedded an epoch number in container id to ensure the uniqueness of container id after RM restarts. Contributed by Tsuyoshi OZAWA (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1606557) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/NullRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/Epoch.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/EpochPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerApplicationAttempt.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSSchedulerApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestMaxRunningAppsEnforcer.java ContainerId creation after work preserving restart is broken Key: YARN-2052 URL: https://issues.apache.org/jira/browse/YARN-2052 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047207#comment-14047207 ] Vinod Kumar Vavilapalli commented on YARN-2052: --- Shouldn't epoch have a default value in the proto file? What is the default if it isn't provided? Thinking from backwards compatibility point of view.. ContainerId creation after work preserving restart is broken Key: YARN-2052 URL: https://issues.apache.org/jira/browse/YARN-2052 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Fix For: 2.5.0 Attachments: YARN-2052.1.patch, YARN-2052.10.patch, YARN-2052.11.patch, YARN-2052.12.patch, YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch, YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, YARN-2052.8.patch, YARN-2052.9.patch, YARN-2052.9.patch Container ids are made unique by using the app identifier and appending a monotonically increasing sequence number to it. Since container creation is a high churn activity the RM does not store the sequence number per app. So after restart it does not know what the new sequence number should be for new allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047210#comment-14047210 ] Jian He commented on YARN-2052: --- bq. For numeric types, the default value is zero. Copied from the protocol buffers guide. In this case, it should be fine; we can explicitly add it too if needed. ContainerId creation after work preserving restart is broken Key: YARN-2052 URL: https://issues.apache.org/jira/browse/YARN-2052 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Fix For: 2.5.0 Attachments: YARN-2052.1.patch, YARN-2052.10.patch, YARN-2052.11.patch, YARN-2052.12.patch, YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch, YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, YARN-2052.8.patch, YARN-2052.9.patch, YARN-2052.9.patch Container ids are made unique by using the app identifier and appending a monotonically increasing sequence number to it. Since container creation is a high-churn activity, the RM does not store the sequence number per app. So after restart it does not know what the new sequence number should be for new allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
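A quick illustration of the backward-compatibility point being discussed: a missing numeric proto field reads back as 0, so ids minted before the epoch field existed decode unchanged, while a bumped epoch keeps post-restart ids from colliding. The bit split below is purely illustrative and not YARN's actual container-id encoding.
{code}
public class EpochIdDemo {
  // Fold a restart epoch into the high bits of an id (illustrative 40/24 split).
  static long containerId(long epoch, long sequence) {
    return (epoch << 40) | sequence;
  }

  public static void main(String[] args) {
    long beforeFeature = containerId(0, 7);   // field absent -> proto default 0, id unchanged
    long afterRestart  = containerId(1, 7);   // RM restarted once, same sequence number
    System.out.println("epoch 0: " + beforeFeature);  // equals the pre-epoch id 7
    System.out.println("epoch 1: " + afterRestart);   // distinct, so no collision after restart
  }
}
{code}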
[jira] [Updated] (YARN-2142) Add one service to check the nodes' TRUST status
[ https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anders updated YARN-2142: - Attachment: trust002.patch Modified the XML file for testing. Add one service to check the nodes' TRUST status - Key: YARN-2142 URL: https://issues.apache.org/jira/browse/YARN-2142 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager, scheduler Affects Versions: 2.2.0 Environment: OS:Ubuntu 13.04; JAVA:OpenJDK 7u51-2.4.4-0 Reporter: anders Priority: Minor Labels: patch Fix For: 2.2.0 Attachments: test.patch, trust.patch, trust.patch, trust.patch, trust001.patch, trust002.patch Original Estimate: 1m Remaining Estimate: 1m Because of our critical computing environment, we must test every node's TRUST status in the cluster (we can get the TRUST status from the API of the OAT server), so I added this feature to Hadoop's scheduling. Through the TRUST check service, a node can get its own TRUST status and then, through the heartbeat, send the TRUST status to the resource manager for scheduling. In the scheduling step, if a node's TRUST status is 'false', it will be abandoned until its TRUST status turns to 'true'. ***The logic of this feature is similar to the node's health check service. -- This message was sent by Atlassian JIRA (v6.2#6252)
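For readers unfamiliar with the proposal, the shape of such a checker resembles the existing node health-check service: poll a trust source periodically, cache the result, and let the heartbeat report it. The sketch below is a stand-in only (the Supplier stands in for the OAT server API call, which is assumed here; this is not the attached patch).
{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class TrustStatusChecker {
  private volatile boolean trusted = true;
  private final Supplier<Boolean> oatQuery;   // stand-in for the OAT server API call
  private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

  public TrustStatusChecker(Supplier<Boolean> oatQuery) {
    this.oatQuery = oatQuery;
  }

  public void start(long intervalSeconds) {
    // Refresh the cached TRUST status periodically.
    timer.scheduleAtFixedRate(() -> trusted = oatQuery.get(), 0, intervalSeconds, TimeUnit.SECONDS);
  }

  // Read at heartbeat time; a scheduler would skip the node while this is false.
  public boolean isTrusted() {
    return trusted;
  }

  public void stop() {
    timer.shutdownNow();
  }

  public static void main(String[] args) throws InterruptedException {
    TrustStatusChecker checker = new TrustStatusChecker(() -> Math.random() > 0.5); // fake OAT query
    checker.start(1);
    Thread.sleep(3000);
    System.out.println("node trusted: " + checker.isTrusted());
    checker.stop();
  }
}
{code}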
[jira] [Commented] (YARN-2142) Add one service to check the nodes' TRUST status
[ https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047348#comment-14047348 ] Hadoop QA commented on YARN-2142: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12653086/trust002.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color}. The applied patch generated 1266 javac compiler warnings (more than the trunk's current 1258 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The test build failed in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4135//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4135//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4135//console This message is automatically generated. Add one service to check the nodes' TRUST status - Key: YARN-2142 URL: https://issues.apache.org/jira/browse/YARN-2142 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager, scheduler Affects Versions: 2.2.0 Environment: OS:Ubuntu 13.04; JAVA:OpenJDK 7u51-2.4.4-0 Reporter: anders Priority: Minor Labels: patch Fix For: 2.2.0 Attachments: test.patch, trust.patch, trust.patch, trust.patch, trust001.patch, trust002.patch Original Estimate: 1m Remaining Estimate: 1m Because of critical computing environment ,we must test every node's TRUST status in the cluster (We can get the TRUST status by the API of OAT sever),So I add this feature into hadoop's schedule . By the TRUST check service ,node can get the TRUST status of itself, then through the heartbeat ,send the TRUST status to resource manager for scheduling. In the scheduling step,if the node's TRUST status is 'false', it will be abandoned until it's TRUST status turn to 'true'. ***The logic of this feature is similar to node's health checkservice. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2181) Add preemption info to RM Web UI
[ https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2181: - Description: We need add preemption info to RM web page to make administrator/user get more understanding about preemption happened on app, etc. (was: We need add preemption info to RM web page to make administrator/user get more understanding about preemption happened on app/queue, etc. ) Add preemption info to RM Web UI Key: YARN-2181 URL: https://issues.apache.org/jira/browse/YARN-2181 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, application page.png, queue page.png We need add preemption info to RM web page to make administrator/user get more understanding about preemption happened on app, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2181) Add preemption info to RM Web UI
[ https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047353#comment-14047353 ] Wangda Tan commented on YARN-2181: -- The previous comment, bq. We can address preemption info in separated JIRA. should be We can address preemption info *of queues* in separated JIRA. Add preemption info to RM Web UI Key: YARN-2181 URL: https://issues.apache.org/jira/browse/YARN-2181 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, application page.png, queue page.png We need add preemption info to RM web page to make administrator/user get more understanding about preemption happened on app, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2181) Add preemption info to RM Web UI
[ https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2181: - Attachment: YARN-2181.patch Offline discussed with [~jianhe], we decided to remove queue metrics from RM web UI because we cannot make metrics info consistent on queue page / app page, (it is possible that sum of preempted resource from apps under a queue is not equal to preempted resource in a queue). We can address preemption info in separated JIRA. Add preemption info to RM Web UI Key: YARN-2181 URL: https://issues.apache.org/jira/browse/YARN-2181 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, application page.png, queue page.png We need add preemption info to RM web page to make administrator/user get more understanding about preemption happened on app, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2181) Add preemption info to RM Web UI
[ https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047354#comment-14047354 ] Hadoop QA commented on YARN-2181: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12653089/YARN-2181.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4136//console This message is automatically generated. Add preemption info to RM Web UI Key: YARN-2181 URL: https://issues.apache.org/jira/browse/YARN-2181 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, application page.png, queue page.png We need add preemption info to RM web page to make administrator/user get more understanding about preemption happened on app, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2181) Add preemption info to RM Web UI
[ https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2181: - Attachment: YARN-2181.patch Rebased against trunk Add preemption info to RM Web UI Key: YARN-2181 URL: https://issues.apache.org/jira/browse/YARN-2181 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, application page.png, queue page.png We need add preemption info to RM web page to make administrator/user get more understanding about preemption happened on app, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2209) Replace allocate#resync command with ApplicationMasterNotRegisteredException to indicate AM to re-register on RM restart
[ https://issues.apache.org/jira/browse/YARN-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2209: -- Attachment: YARN-2209.1.patch Patch to replace the AM_RESYNC command with ApplicationMasterNotRegisteredException, and the AM_SHUTDOWN command with ApplicationNotFoundException. Replace allocate#resync command with ApplicationMasterNotRegisteredException to indicate AM to re-register on RM restart Key: YARN-2209 URL: https://issues.apache.org/jira/browse/YARN-2209 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Attachments: YARN-2209.1.patch YARN-1365 introduced an ApplicationMasterNotRegisteredException to indicate application to re-register on RM restart. we should do the same for AMS#allocate call also. -- This message was sent by Atlassian JIRA (v6.2#6252)
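For illustration only, a minimal sketch of how an AM-side allocate call might react once the RM signals these conditions with exceptions instead of the AM_RESYNC/AM_SHUTDOWN commands. This is not taken from YARN-2209.1.patch; the ResyncAwareAllocator class and its reRegisterWithRM(), resendOutstandingRequests(), and shutDown() hooks are hypothetical placeholders for whatever the AM already does on resync/shutdown.
{code}
import org.apache.hadoop.yarn.api.ApplicationMasterProtocol;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateRequest;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.exceptions.ApplicationMasterNotRegisteredException;
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;

/** Sketch only; not the YARN-2209 patch. */
public abstract class ResyncAwareAllocator {
  private final ApplicationMasterProtocol rmClient;

  protected ResyncAwareAllocator(ApplicationMasterProtocol rmClient) {
    this.rmClient = rmClient;
  }

  public AllocateResponse allocateWithRecovery(AllocateRequest request) throws Exception {
    try {
      return rmClient.allocate(request);
    } catch (ApplicationMasterNotRegisteredException e) {
      // RM restarted and no longer knows this AM's registration:
      // re-register, resend all outstanding requests, then retry.
      reRegisterWithRM();
      resendOutstandingRequests();
      return rmClient.allocate(request);
    } catch (ApplicationNotFoundException e) {
      // RM does not know the application at all: shut the AM down.
      shutDown();
      throw e;
    }
  }

  // Hypothetical hooks implemented by the concrete AM.
  protected abstract void reRegisterWithRM() throws Exception;
  protected abstract void resendOutstandingRequests() throws Exception;
  protected abstract void shutDown();
}
{code}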
[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047366#comment-14047366 ] Jian He commented on YARN-1366: --- Thanks for working on the patch! Some comments: - isApplicationMasterRegistered is actually not an argument; maybe throw ApplicationMasterNotRegisteredException in this case? {code} Preconditions.checkArgument(isApplicationMasterRegistered, "Application Master is trying to unregister before registering."); {code} - pom.xml format: use spaces instead of tabs {code} + <dependency> +   <groupId>org.apache.hadoop</groupId> +   <artifactId>hadoop-yarn-common</artifactId> +   <type>test-jar</type> +   <scope>test</scope> + </dependency> {code} - testAMRMClientResendsRequestsOnRMRestart does not seem to test re-sending pendingReleases across RM restart, because the pending releases appear to have already been decremented to zero before the restart happens. - Not related to this JIRA: the current ApplicationMasterService does not allow multiple registers. An application may want to update its tracking URL etc. Should we make AMS accept multiple registers? {code} Preconditions.checkArgument(!isApplicationMasterRegistered, "ApplicationMaster is already registered"); {code} AM should implement Resync with the ApplicationMasterService instead of shutting down - Key: YARN-1366 URL: https://issues.apache.org/jira/browse/YARN-1366 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Rohith Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, YARN-1366.4.patch, YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.7.patch, YARN-1366.patch, YARN-1366.prototype.patch, YARN-1366.prototype.patch The ApplicationMasterService currently sends a resync response to which the AM responds by shutting down. The AM behavior is expected to change to calling resyncing with the RM. Resync means resetting the allocate RPC sequence number to 0 and the AM should send its entire outstanding request to the RM. Note that if the AM is making its first allocate call to the RM then things should proceed like normal without needing a resync. The RM will return all containers that have completed since the RM last synced with the AM. Some container completions may be reported more than once. -- This message was sent by Atlassian JIRA (v6.2#6252)
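A minimal sketch of the first review suggestion above (illustrative only, not the committed change): since isApplicationMasterRegistered is service state rather than a caller-supplied argument, throw the YARN exception instead of using Preconditions.checkArgument. The wrapper class and the checkRegisteredBeforeUnregister name are hypothetical.
{code}
import org.apache.hadoop.yarn.exceptions.ApplicationMasterNotRegisteredException;

final class UnregisterPrecondition {
  /**
   * Fail the unregister request with a YARN exception (rather than an
   * IllegalArgumentException from Preconditions) when the AM has not
   * registered yet.
   */
  static void checkRegisteredBeforeUnregister(boolean isApplicationMasterRegistered)
      throws ApplicationMasterNotRegisteredException {
    if (!isApplicationMasterRegistered) {
      throw new ApplicationMasterNotRegisteredException(
          "Application Master is trying to unregister before registering.");
    }
  }
}
{code}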
[jira] [Updated] (YARN-2142) Add one service to check the nodes' TRUST status
[ https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anders updated YARN-2142: - Component/s: webapp Add one service to check the nodes' TRUST status - Key: YARN-2142 URL: https://issues.apache.org/jira/browse/YARN-2142 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager, scheduler, webapp Affects Versions: 2.2.0 Environment: OS:Ubuntu 13.04; JAVA:OpenJDK 7u51-2.4.4-0 Reporter: anders Priority: Minor Labels: patch Fix For: 2.2.0 Attachments: test.patch, trust.patch, trust.patch, trust.patch, trust001.patch, trust002.patch Original Estimate: 1m Remaining Estimate: 1m Because of the critical computing environment, we must check every node's TRUST status in the cluster (we can get the TRUST status from the OAT server's API), so I added this feature to Hadoop's scheduling. Through the TRUST check service, a node can get its own TRUST status and then send it to the ResourceManager via the heartbeat for scheduling. In the scheduling step, if a node's TRUST status is 'false', the node is skipped until its TRUST status turns to 'true'. The logic of this feature is similar to the node health check service. -- This message was sent by Atlassian JIRA (v6.2#6252)
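As a rough, standalone illustration of the flow described above (this is not the attached patch): a NodeManager-side checker periodically asks an OAT server for the node's TRUST status, and the scheduler skips the node while the value is false. OatClient, its getTrustStatus() method, and the heartbeat wiring are hypothetical placeholders for the actual OAT server API and NM/RM integration.
{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class TrustStatusChecker {
  private final AtomicBoolean trusted = new AtomicBoolean(true);
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  /** Hypothetical client for the OAT server's TRUST status API. */
  public interface OatClient {
    boolean getTrustStatus();
  }

  public void start(OatClient oatClient, long intervalSeconds) {
    // Periodically refresh the node's TRUST status from the OAT server,
    // analogous to the periodic node health check.
    scheduler.scheduleAtFixedRate(
        () -> trusted.set(oatClient.getTrustStatus()),
        0, intervalSeconds, TimeUnit.SECONDS);
  }

  /** Value to piggyback on the NM heartbeat; the scheduler skips the node when false. */
  public boolean isTrusted() {
    return trusted.get();
  }

  public void stop() {
    scheduler.shutdownNow();
  }
}
{code}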
[jira] [Commented] (YARN-2181) Add preemption info to RM Web UI
[ https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047384#comment-14047384 ] Hadoop QA commented on YARN-2181: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12653092/YARN-2181.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4137//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4137//console This message is automatically generated. Add preemption info to RM Web UI Key: YARN-2181 URL: https://issues.apache.org/jira/browse/YARN-2181 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, application page.png, queue page.png We need add preemption info to RM web page to make administrator/user get more understanding about preemption happened on app, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2209) Replace allocate#resync command with ApplicationMasterNotRegisteredException to indicate AM to re-register on RM restart
[ https://issues.apache.org/jira/browse/YARN-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047402#comment-14047402 ] Hadoop QA commented on YARN-2209: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12653095/YARN-2209.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1261 javac compiler warnings (more than the trunk's current 1258 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4138//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4138//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4138//console This message is automatically generated. Replace allocate#resync command with ApplicationMasterNotRegisteredException to indicate AM to re-register on RM restart Key: YARN-2209 URL: https://issues.apache.org/jira/browse/YARN-2209 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Attachments: YARN-2209.1.patch YARN-1365 introduced an ApplicationMasterNotRegisteredException to indicate application to re-register on RM restart. we should do the same for AMS#allocate call also. -- This message was sent by Atlassian JIRA (v6.2#6252)