[ https://issues.apache.org/jira/browse/YARN-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
huozhanfeng updated YARN-2231:
------------------------------
    Description:
When an MR job prints too much to stdout or stderr, the disk fills up. This is now affecting our platform management.

I have modified org.apache.hadoop.mapred.MapReduceChildJVM.java (derived from org.apache.hadoop.mapred.TaskLog) to generate the launch command as follows:

exec /bin/bash -c "( $JAVA_HOME/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx1024m -Djava.io.tmpdir=$PWD/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_000002 -Dyarn.app.container.log.filesize=10240 -Dhadoop.root.logger=DEBUG,CLA org.apache.hadoop.mapred.YarnChild $test_IP 53911 attempt_1403930653208_0003_m_000000_0 2 | tail -c 102 >/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_000002/stdout ; exit $PIPESTATUS ) 2>&1 | tail -c 10240 >/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_000002/stderr ; exit $PIPESTATUS "

But it doesn't take effect.

Then, when I use "export YARN_NODEMANAGER_OPTS=-Xdebug -Xrunjdwp:transport=dt_socket,address=8788,server=y,suspend=y" to debug the NodeManager, I find that when I set breakpoints at org.apache.hadoop.util.Shell (line 450: process = builder.start()) and org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch (line 161: List<String> newCmds = new ArrayList<String>(command.size())), the command works.

I suspect a concurrency problem is preventing the shell pipeline from working properly. This matters to us, and I need your help.

my email: huozhanf...@gmail.com

thanks

  was:
When an MR job prints too much to stdout or stderr, the disk fills up. This is now affecting our platform management.

I have modified org.apache.hadoop.mapred.MapReduceChildJVM.java (derived from org.apache.hadoop.mapred.TaskLog) to generate the launch command as follows:

exec /bin/bash -c "( $JAVA_HOME/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx1024m -Djava.io.tmpdir=$PWD/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_000002 -Dyarn.app.container.log.filesize=10240 -Dhadoop.root.logger=DEBUG,CLA org.apache.hadoop.mapred.YarnChild ${test_IP} 53911 attempt_1403930653208_0003_m_000000_0 2 | tail -c 102 >/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_000002/stdout ; exit $PIPESTATUS ) 2>&1 | tail -c 10240 >/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_000002/stderr ; exit $PIPESTATUS "

But it doesn't take effect.

Then, when I use "export YARN_NODEMANAGER_OPTS=-Xdebug -Xrunjdwp:transport=dt_socket,address=8788,server=y,suspend=y" to debug the NodeManager, I find that when I set breakpoints at org.apache.hadoop.util.Shell (line 450: process = builder.start()) and org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch (line 161: List<String> newCmds = new ArrayList<String>(command.size())), the command works.

I suspect a concurrency problem is preventing the shell pipeline from working properly. This matters to us, and I need your help.

my email: huozhanf...@gmail.com

thanks


> Provide feature to limit MRJob's stdout/stderr size
> ----------------------------------------------------
>
>                 Key: YARN-2231
>                 URL: https://issues.apache.org/jira/browse/YARN-2231
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: log-aggregation, nodemanager
>    Affects Versions: 2.3.0
>         Environment: CentOS release 5.8 (Final)
>            Reporter: huozhanfeng
>              Labels: features
>
> When an MR job prints too much to stdout or stderr, the disk fills up.
> This is now affecting our platform management.
> I have modified org.apache.hadoop.mapred.MapReduceChildJVM.java (derived
> from org.apache.hadoop.mapred.TaskLog) to generate the launch command
> as follows:
> exec /bin/bash -c "( $JAVA_HOME/bin/java -Djava.net.preferIPv4Stack=true
> -Dhadoop.metrics.log.level=WARN -Xmx1024m -Djava.io.tmpdir=$PWD/tmp
> -Dlog4j.configuration=container-log4j.properties
> -Dyarn.app.container.log.dir=/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_000002
> -Dyarn.app.container.log.filesize=10240 -Dhadoop.root.logger=DEBUG,CLA
> org.apache.hadoop.mapred.YarnChild $test_IP 53911
> attempt_1403930653208_0003_m_000000_0 2 | tail -c 102
> >/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_000002/stdout
> ; exit $PIPESTATUS ) 2>&1 | tail -c 10240
> >/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_000002/stderr
> ; exit $PIPESTATUS "
> But it doesn't take effect.
> Then, when I use "export YARN_NODEMANAGER_OPTS=-Xdebug
> -Xrunjdwp:transport=dt_socket,address=8788,server=y,suspend=y" to debug the
> NodeManager, I find that when I set breakpoints at
> org.apache.hadoop.util.Shell (line 450: process = builder.start()) and
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch (line
> 161: List<String> newCmds = new ArrayList<String>(command.size())), the command
> works.
> I suspect a concurrency problem is preventing the shell pipeline from working
> properly. This matters to us, and I need your help.
> my email: huozhanf...@gmail.com
> thanks



--
This message was sent by Atlassian JIRA (v6.2#6252)
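
For reference, the capping idea in the reported command (pipe each stream through tail -c and pass the child's exit code back out) can be sketched in plain bash as below. This is only a minimal sketch, not the patch from the issue: run_capped, noisy, and the temporary file names are hypothetical, and it assumes bash (for the PIPESTATUS array) and a tail that supports -c.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): run a command while keeping only
# the last LIMIT bytes of its stdout and of its stderr.
# Usage: run_capped LIMIT STDOUT_FILE STDERR_FILE command [args...]
run_capped() {
  local limit="$1" out="$2" err="$3"
  shift 3
  # Inner pipeline: the command's stdout is capped and written to $out, and
  # the subshell exits with the command's own status.
  # Outer pipeline: whatever reaches the subshell's stderr is merged into the
  # pipe and capped into $err.
  ( "$@" | tail -c "$limit" > "$out"; exit "${PIPESTATUS[0]}" ) 2>&1 \
    | tail -c "$limit" > "$err"
  # PIPESTATUS[0] is now the subshell's status, i.e. the command's status.
  return "${PIPESTATUS[0]}"
}

# Hypothetical noisy task: 10 bytes to each stream, then exit status 3.
noisy() {
  printf 'abcdefghij'
  printf '0123456789' >&2
  return 3
}

tmpdir="$(mktemp -d)"
status=0
run_capped 4 "$tmpdir/stdout" "$tmpdir/stderr" noisy || status=$?

echo "status=$status"                    # prints status=3
echo "stdout=$(cat "$tmpdir/stdout")"    # prints stdout=ghij
echo "stderr=$(cat "$tmpdir/stderr")"    # prints stderr=6789
```

The sketch reads "${PIPESTATUS[0]}" immediately after each pipeline, because the array is overwritten by the next command that runs.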