[jira] [Created] (YARN-2231) Provide feature to limit MRJob's stdout/stderr size
huozhanfeng created YARN-2231:
---------------------------------

             Summary: Provide feature to limit MRJob's stdout/stderr size
                 Key: YARN-2231
                 URL: https://issues.apache.org/jira/browse/YARN-2231
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: log-aggregation, nodemanager
    Affects Versions: 2.3.0
         Environment: CentOS release 5.8 (Final)
            Reporter: huozhanfeng

When an MR job prints too much stdout or stderr output, the disk fills up. This is now affecting our platform's manageability.

I have modified org.apache.hadoop.mapred.MapReduceChildJVM.java (borrowing from org.apache.hadoop.mapred.TaskLog) so that it generates a launch command like the following:

exec /bin/bash -c "( $JAVA_HOME/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx1024m -Djava.io.tmpdir=$PWD/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02 -Dyarn.app.container.log.filesize=10240 -Dhadoop.root.logger=DEBUG,CLA org.apache.hadoop.mapred.YarnChild 10.106.24.108 53911 attempt_1403930653208_0003_m_00_0 2 | tail -c 10240 > /logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02/stdout ; exit $PIPESTATUS ) 2>&1 | tail -c 10240 > /logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02/stderr ; exit $PIPESTATUS"

But it doesn't take effect.

Then, while debugging the NodeManager with export YARN_NODEMANAGER_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,address=8788,server=y,suspend=y", I found that if I set breakpoints at org.apache.hadoop.util.Shell (line 450: process = builder.start()) and org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch (line 161: List<String> newCmds = new ArrayList<String>(command.size())), the command does work. I suspect a concurrency problem that keeps the piped shell command from executing properly.

This matters to us, and I need your help.

my email: huozhanf...@gmail.com

thanks

--
This message was sent by Atlassian JIRA
(v6.2#6252)
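The launch command above relies on a standard shell pattern: pipe the task's output through `tail -c` so only the last N bytes reach the log file, then recover the task's own exit code from `$PIPESTATUS` (a plain pipeline's exit status would be `tail`'s, not the JVM's). A minimal standalone sketch of that pattern, with a made-up task and paths (the real limit comes from yarn.app.container.log.filesize):

```shell
#!/bin/bash
# Minimal sketch of the capping pattern: pipe output through `tail -c` to
# keep only the last N bytes, and recover the real exit code via PIPESTATUS.
# LOG_LIMIT and the paths are illustrative, not the real YARN values.
LOG_LIMIT=10240
LOG_DIR=$(mktemp -d)

# A stand-in "task" that prints far more than the limit and exits with 3.
( seq 1 100000; exit 3 ) | tail -c "$LOG_LIMIT" > "$LOG_DIR/stdout"
status=${PIPESTATUS[0]}   # exit code of the task, not of tail; PIPESTATUS
                          # must be read immediately after the pipeline

echo "task exit code: $status"
echo "captured bytes: $(wc -c < "$LOG_DIR/stdout")"
```

Note that PIPESTATUS is bash-specific, and that unsubscripted `$PIPESTATUS` (as in the generated command) expands to its first element, which is why `exit $PIPESTATUS` returns the JVM's status rather than tail's.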
[jira] [Updated] (YARN-2231) Provide feature to limit MRJob's stdout/stderr size
[ https://issues.apache.org/jira/browse/YARN-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

huozhanfeng updated YARN-2231:
------------------------------
    Labels: features
Description:

When an MR job prints too much stdout or stderr output, the disk fills up. This is now affecting our platform's manageability.

I have modified org.apache.hadoop.mapred.MapReduceChildJVM.java (borrowing from org.apache.hadoop.mapred.TaskLog) so that it generates a launch command like the following:

exec /bin/bash -c "( $JAVA_HOME/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx1024m -Djava.io.tmpdir=$PWD/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02 -Dyarn.app.container.log.filesize=10240 -Dhadoop.root.logger=DEBUG,CLA org.apache.hadoop.mapred.YarnChild 10.106.24.108 53911 attempt_1403930653208_0003_m_00_0 2 | tail -c 10240 > /logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02/stdout ; exit $PIPESTATUS ) 2>&1 | tail -c 10240 > /logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02/stderr ; exit $PIPESTATUS"

But it doesn't take effect.

Then, while debugging the NodeManager with export YARN_NODEMANAGER_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,address=8788,server=y,suspend=y", I found that if I set breakpoints at org.apache.hadoop.util.Shell (line 450: process = builder.start()) and org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch (line 161: List<String> newCmds = new ArrayList<String>(command.size())), the command does work. I suspect a concurrency problem that keeps the piped shell command from executing properly.

This matters to us, and I need your help.

my email: huozhanf...@gmail.com

thanks
[jira] [Commented] (YARN-2231) Provide feature to limit MRJob's stdout/stderr size
[ https://issues.apache.org/jira/browse/YARN-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047081#comment-14047081 ]

huozhanfeng commented on YARN-2231:
-----------------------------------

Index: MapReduceChildJVM.java
===================================================================
--- MapReduceChildJVM.java (revision 1387)
+++ MapReduceChildJVM.java (revision 1388)
@@ -37,6 +37,7 @@
 @SuppressWarnings("deprecation")
 public class MapReduceChildJVM {
+  private static final String tailCommand = "tail";
   private static String getTaskLogFile(LogName filter) {
     return ApplicationConstants.LOG_DIR_EXPANSION_VAR + Path.SEPARATOR +
         filter.toString();
@@ -161,9 +162,12 @@
     TaskAttemptID attemptID = task.getTaskID();
     JobConf conf = task.conf;
-
+    long logSize = TaskLog.getTaskLogLength(conf);
+
     Vector<String> vargs = new Vector<String>(8);
-
+    if (logSize > 0) {
+      vargs.add("(");
+    }
     vargs.add(Environment.JAVA_HOME.$() + "/bin/java");

     // Add child (task) java-vm options.
@@ -206,7 +210,6 @@
     vargs.add("-Djava.io.tmpdir=" + childTmpDir);

     // Setup the log4j prop
-    long logSize = TaskLog.getTaskLogLength(conf);
     setupLog4jProperties(task, vargs, logSize);

     if (conf.getProfileEnabled()) {
@@ -229,8 +232,22 @@
     // Finally add the jvmID
     vargs.add(String.valueOf(jvmID.getId()));

-    vargs.add("1>" + getTaskLogFile(TaskLog.LogName.STDOUT));
-    vargs.add("2>" + getTaskLogFile(TaskLog.LogName.STDERR));
+    if (logSize > 0) {
+      vargs.add("|");
+      vargs.add(tailCommand);
+      vargs.add("-c");
+      vargs.add(String.valueOf(logSize));
+      vargs.add(">" + getTaskLogFile(TaskLog.LogName.STDOUT));
+      vargs.add("; exit $PIPESTATUS ) 2>&1 |");
+      vargs.add(tailCommand);
+      vargs.add("-c");
+      vargs.add(String.valueOf(logSize));
+      vargs.add(">" + getTaskLogFile(TaskLog.LogName.STDERR));
+      vargs.add("; exit $PIPESTATUS");
+    } else {
+      vargs.add("1>" + getTaskLogFile(TaskLog.LogName.STDOUT));
+      vargs.add("2>" + getTaskLogFile(TaskLog.LogName.STDERR));
+    }

     // Final command
     StringBuilder mergedCommand = new StringBuilder();
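The patch above builds a nested pipeline so that stdout and stderr are capped independently: the inner pipe consumes the task's stdout, so by the time the outer `2>&1 |` applies, only stderr remains on the subshell's streams. A standalone sketch of that shape, with a made-up task, limit, and paths:

```shell
#!/bin/bash
# Sketch of the two-stage pipeline the patch emits (illustrative values):
# the inner pipe caps stdout; the outer pipe, fed via 2>&1 *after* stdout
# has already been consumed, caps only stderr.
LIMIT=64                  # stands in for logSize
DIR=$(mktemp -d)

( { printf 'x%.0s' {1..200}          # a chatty "task": 200 bytes to stdout,
    printf 'y%.0s' {1..200} >&2      # 200 bytes to stderr, nonzero exit
    exit 5
  } | tail -c "$LIMIT" > "$DIR/stdout"
  exit "${PIPESTATUS[0]}"            # propagate the task's own exit code
) 2>&1 | tail -c "$LIMIT" > "$DIR/stderr"

echo "task exit: ${PIPESTATUS[0]}"
echo "stdout bytes: $(wc -c < "$DIR/stdout")"
echo "stderr bytes: $(wc -c < "$DIR/stderr")"
```

Both log files end up at most LIMIT bytes, and the outer `${PIPESTATUS[0]}` still reports the task's exit code (5 here), which is what the generated `; exit $PIPESTATUS` clauses preserve.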
[jira] [Updated] (YARN-2231) Provide feature to limit MRJob's stdout/stderr size
[ https://issues.apache.org/jira/browse/YARN-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

huozhanfeng updated YARN-2231:
------------------------------
Description:

When an MR job prints too much stdout or stderr output, the disk fills up. This is now affecting our platform's manageability.

I have modified org.apache.hadoop.mapred.MapReduceChildJVM.java (borrowing from org.apache.hadoop.mapred.TaskLog) so that it generates a launch command like the following:

exec /bin/bash -c "( $JAVA_HOME/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx1024m -Djava.io.tmpdir=$PWD/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02 -Dyarn.app.container.log.filesize=10240 -Dhadoop.root.logger=DEBUG,CLA org.apache.hadoop.mapred.YarnChild ${test_IP} 53911 attempt_1403930653208_0003_m_00_0 2 | tail -c 10240 > /logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02/stdout ; exit $PIPESTATUS ) 2>&1 | tail -c 10240 > /logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02/stderr ; exit $PIPESTATUS"

But it doesn't take effect.

Then, while debugging the NodeManager with export YARN_NODEMANAGER_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,address=8788,server=y,suspend=y", I found that if I set breakpoints at org.apache.hadoop.util.Shell (line 450: process = builder.start()) and org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch (line 161: List<String> newCmds = new ArrayList<String>(command.size())), the command does work. I suspect a concurrency problem that keeps the piped shell command from executing properly.

This matters to us, and I need your help.

my email: huozhanf...@gmail.com

thanks