Varun Vasudev updated YARN-4309:
    Attachment: YARN-4309.006.patch

Uploaded a new patch to address [~sidharta-s]'s comments.

[~leftnoteasy] - 
bq. Since debug information fetch script (like copy script and list files) is 
at the end of launch_container.sh, is it possible that a container is killed so 
such script cannot be executed?

It's not at the end; it runs just before the actual container process is
launched, so if we reach the stage where we are ready to call
launch_container.sh, it should almost always run. This is what the relevant
lines from launch_container.sh look like with the patch:

echo "broken symlinks(find -L . -maxdepth 5 -type l -ls):"
find -L . -maxdepth 5 -type l -ls
exec /bin/bash -c "$JAVA_HOME/bin/java -Djava.io.tmpdir=$PWD/tmp -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog  -Xmx1024m
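The point about placement can be illustrated with a small, self-contained shell sketch (not taken from the patch): `exec` replaces the running shell with the new process, so commands placed before it run, while anything after it never does. The function name `demo_launch` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: why debug commands must come *before* exec in a launch script.
# `exec` replaces the current shell with the new program, so any line after
# it in the script never runs.

demo_launch() {
  # Run the "launch script" in a child bash so exec only replaces that child.
  bash -c '
    echo "debug: listing files"   # runs: placed before exec
    exec echo "container process" # the shell is replaced by echo here
    echo "debug: never printed"   # unreachable: the shell no longer exists
  '
}

demo_launch
```

Only the line before `exec` and the exec'd command itself produce output, which is why the debug commands in launch_container.sh sit immediately above the final `exec` line.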

> Add debug information to application logs when a container fails
> ----------------------------------------------------------------
>                 Key: YARN-4309
>                 URL: https://issues.apache.org/jira/browse/YARN-4309
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Varun Vasudev
>            Assignee: Varun Vasudev
>         Attachments: YARN-4309.001.patch, YARN-4309.002.patch, 
> YARN-4309.003.patch, YARN-4309.004.patch, YARN-4309.005.patch, 
> YARN-4309.006.patch
> Sometimes when a container fails, it can be pretty hard to figure out why it 
> failed.
> My proposal is that if a container fails, we collect information about the 
> container local dir and dump it into the container log dir. Ideally, I'd like 
> to tar up the directory entirely, but I'm not sure of the security and space 
> implications of such an approach. At the very least, we can list all the files
> in the container local dir and dump the contents of launch_container.sh into
> the container log dir.
> When log aggregation occurs, all this information will automatically get 
> collected and make debugging such failures much easier.
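A minimal shell sketch of the proposal, assuming the container's working directory and log directory are passed in as arguments. The function `dump_container_debug_info` and the `directory.info` file name are illustrative assumptions, not the NodeManager's actual implementation.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: copy the launch script and a listing of the container's
# local working directory into its log directory, so that log aggregation
# collects them. Function name, arguments and "directory.info" are assumptions.

dump_container_debug_info() {
  local work_dir="$1"   # container local (working) directory
  local log_dir="$2"    # container log directory
  local info="$log_dir/directory.info"

  # Keep a copy of the exact launch script that was run.
  if [ -f "$work_dir/launch_container.sh" ]; then
    cp "$work_dir/launch_container.sh" "$log_dir/launch_container.sh"
  fi

  # Record the directory contents; stderr (e.g. unreadable entries) goes to
  # the same file so nothing is lost.
  {
    echo "ls -l:"
    ls -l "$work_dir"
    echo "find -L . -maxdepth 5 -ls:"
    (cd "$work_dir" && find -L . -maxdepth 5 -ls)
    # With -L, -type l only matches links whose targets cannot be resolved.
    echo "broken symlinks(find -L . -maxdepth 5 -type l -ls):"
    (cd "$work_dir" && find -L . -maxdepth 5 -type l -ls)
  } > "$info" 2>&1
}
```

Because the launch script ends with `exec`, a call like this would have to be placed before that final `exec` line for it to run.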

This message was sent by Atlassian JIRA
