huozhanfeng created YARN-2231:
---------------------------------
Summary: Provide feature to limit MRJob's stdout/stderr size
Key: YARN-2231
URL: https://issues.apache.org/jira/browse/YARN-2231
Project: Hadoop YARN
Issue Type: Improvement
Components: log-aggregation, nodemanager
Affects Versions: 2.3.0
Environment: CentOS release 5.8 (Final)
Reporter: huozhanfeng
When an MRJob prints too much to stdout or stderr, the disk fills up. This is
now affecting our platform management.
I have modified org.apache.hadoop.mapred.MapReduceChildJVM.java (from
[email protected]) so that it generates the following launch command:
exec /bin/bash -c "( $JAVA_HOME/bin/java -Djava.net.preferIPv4Stack=true
-Dhadoop.metrics.log.level=WARN -Xmx1024m -Djava.io.tmpdir=$PWD/tmp
-Dlog4j.configuration=container-log4j.properties
-Dyarn.app.container.log.dir=/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_000002
-Dyarn.app.container.log.filesize=10240 -Dhadoop.root.logger=DEBUG,CLA
org.apache.hadoop.mapred.YarnChild 10.106.24.108 53911
attempt_1403930653208_0003_m_000000_0 2 | tail -c 102
>/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_000002/stdout
; exit $PIPESTATUS ) 2>&1 | tail -c 10240
>/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_000002/stderr
; exit $PIPESTATUS "
But it doesn't take effect.
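For reference, the capping technique the command above relies on can be tried outside YARN. The following is only a minimal sketch, assuming bash. The function name run_capped and its arguments are hypothetical and are not part of Hadoop. It keeps the last N bytes of each stream and returns the wrapped command's own exit status.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Hadoop: run a command, keep only the last
# out_limit bytes of its stdout and the last err_limit bytes of its stderr,
# and return the command's own exit status.
run_capped() {
  local out_limit=$1 err_limit=$2 out_file=$3 err_file=$4
  shift 4
  # Inner pipeline: the command's stdout is capped by tail and written to
  # out_file; the subshell then exits with the command's status.
  # Outer pipeline: the subshell's stderr is redirected into the pipe and
  # capped by the second tail. Nothing else reaches the subshell's stdout.
  ( "$@" | tail -c "$out_limit" > "$out_file"; exit "${PIPESTATUS[0]}" ) 2>&1 \
    | tail -c "$err_limit" > "$err_file"
  # PIPESTATUS[0] is now the exit status of the subshell above.
  return "${PIPESTATUS[0]}"
}
```

For example, run_capped 10240 10240 stdout.log stderr.log some_command would keep at most 10240 bytes per stream. Note that tail -c reading from a pipe writes nothing until its input reaches end of file, so both files stay empty while the command is still running.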
Then, while debugging the NodeManager with "export YARN_NODEMANAGER_OPTS=-Xdebug
-Xrunjdwp:transport=dt_socket,address=8788,server=y,suspend=y", I found that
the command does work when I set breakpoints at
org.apache.hadoop.util.Shell (line 450: process = builder.start()) and
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch
(line 161: List<String> newCmds = new ArrayList<String>(command.size())).
I suspect a concurrency problem that prevents the shell pipeline from working
properly. This matters to us, and I need your help.
my email: [email protected]
thanks