[jira] [Updated] (YARN-2231) Provide feature to limit MRJob's stdout/stderr size

2014-06-29 Thread huozhanfeng (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huozhanfeng updated YARN-2231:
--

Description: 
When an MR job prints too much stdout or stderr output, it fills up the disk. This 
now affects the manageability of our platform.

I have modified org.apache.hadoop.mapred.MapReduceChildJVM.java (borrowing the 
approach from org.apache.hadoop.mapred.TaskLog) so that it generates the launch 
command as follows:

exec /bin/bash -c "( $JAVA_HOME/bin/java -Djava.net.preferIPv4Stack=true 
-Dhadoop.metrics.log.level=WARN -Xmx1024m -Djava.io.tmpdir=$PWD/tmp 
-Dlog4j.configuration=container-log4j.properties 
-Dyarn.app.container.log.dir=/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02 
-Dyarn.app.container.log.filesize=10240 -Dhadoop.root.logger=DEBUG,CLA 
org.apache.hadoop.mapred.YarnChild 10.106.24.108 53911 
attempt_1403930653208_0003_m_00_0 2 | tail -c 102 > 
/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02/stdout 
; exit $PIPESTATUS ) 2>&1 | tail -c 10240 > 
/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_02/stderr 
; exit $PIPESTATUS"

But it doesn't take effect.
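For what it's worth, the truncation idea itself works in isolation. A minimal 
sketch (hypothetical byte limit and a temp file standing in for the real 
container log paths):

```shell
#!/bin/bash
# Pipe a noisy process through `tail -c` so at most LIMIT bytes reach disk.
LIMIT=10240
out=$(mktemp)

# A stand-in for the child JVM: emits 20000 bytes on stdout.
( printf 'x%.0s' $(seq 1 20000) ) | tail -c "$LIMIT" > "$out"

wc -c < "$out"   # prints 10240: only the last LIMIT bytes were kept
rm -f "$out"
```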

Then, when I use export YARN_NODEMANAGER_OPTS=-Xdebug 
-Xrunjdwp:transport=dt_socket,address=8788,server=y,suspend=y to debug the 
NodeManager, I find that if I set breakpoints at 
org.apache.hadoop.util.Shell (line 450: process = builder.start()) and 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch 
(line 161: List<String> newCmds = new ArrayList<String>(command.size())), the 
command does work.

I suspect a concurrency problem causes the piped shell command not to run 
properly. This matters to us, and I need your help.
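For reference, the exit $PIPESTATUS plumbing the generated command relies on 
does preserve the upstream exit code when run under bash; a quick check 
(hypothetical exit code 3 standing in for a failed child JVM):

```shell
#!/bin/bash
# `tail` itself exits 0, but $PIPESTATUS (i.e. ${PIPESTATUS[0]}) still holds
# the producer's exit code, so the subshell can re-raise it.
( sh -c 'echo some output; exit 3' | tail -c 6 ; exit $PIPESTATUS )
echo "exit code: $?"   # prints "exit code: 3"
```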

my email: huozhanf...@gmail.com

thanks


 Provide feature to limit MRJob's stdout/stderr size
 

 Key: YARN-2231
 URL: https://issues.apache.org/jira/browse/YARN-2231
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: log-aggregation, nodemanager
Affects Versions: 2.3.0
 Environment: CentOS release 5.8 (Final)
Reporter: huozhanfeng
  Labels: features

