[ 
https://issues.apache.org/jira/browse/YARN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15035122#comment-15035122
 ] 

Sidharta Seethana commented on YARN-4309:
-----------------------------------------

[~vvasudev], I took a look at the patch. A couple of comments:

* Could you clarify why the debugging information gathering in 
DockerContainerExecutor.writeLaunchEnv is not guarded by a config check? The 
new test you added uses DefaultContainerExecutor so it looks like this was 
missed. 
* There seem to be minor line-spacing inconsistencies in the new test 
function in TestContainerLaunch.java.

Apart from these, assuming it is safe to list user directory contents (as 
already discussed on this JIRA), the patch seems good to me.  Thanks for this 
patch - I expect the launch_container.sh copy to be particularly useful for 
debugging purposes.

> Add debug information to application logs when a container fails
> ----------------------------------------------------------------
>
>                 Key: YARN-4309
>                 URL: https://issues.apache.org/jira/browse/YARN-4309
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Varun Vasudev
>            Assignee: Varun Vasudev
>         Attachments: YARN-4309.001.patch, YARN-4309.002.patch, 
> YARN-4309.003.patch
>
>
> Sometimes when a container fails, it can be pretty hard to figure out why it 
> failed.
> My proposal is that if a container fails, we collect information about the 
> container local dir and dump it into the container log dir. Ideally, I'd like 
> to tar up the directory entirely, but I'm not sure of the security and space 
> implications of such an approach. At the very least, we can list all the files 
> in the container local dir, and dump the contents of launch_container.sh (into 
> the container log dir).
> When log aggregation occurs, all this information will automatically get 
> collected and make debugging such failures much easier.
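
The proposal quoted above, together with the config guard raised in the review comment, can be sketched roughly as follows. This is a minimal illustration using only the JDK and is not the code in the attached patches: the class name ContainerDebugInfo, the dump method, the debugEnabled flag, and the directory.info file name are all hypothetical, chosen for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of YARN: illustrates the idea only.
public class ContainerDebugInfo {
    // Assumed name for the file that receives the directory listing.
    static final String LISTING_FILE = "directory.info";
    static final String LAUNCH_SCRIPT = "launch_container.sh";

    /**
     * Writes a listing of localDir and a copy of the launch script into
     * logDir. Does nothing unless debugEnabled is true, which stands in
     * for the config check discussed in the review.
     * Returns true if debug information was written.
     */
    static boolean dump(Path localDir, Path logDir, boolean debugEnabled)
            throws IOException {
        if (!debugEnabled) {
            return false; // guard: skip all work when the feature is off
        }
        Files.createDirectories(logDir);

        // Collect every path under the container local dir, relative to it.
        List<String> entries;
        try (Stream<Path> walk = Files.walk(localDir)) {
            entries = walk.filter(p -> !p.equals(localDir))
                    .map(p -> localDir.relativize(p).toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
        Files.write(logDir.resolve(LISTING_FILE), entries);

        // Copy the launch script next to the container logs, if present.
        Path script = localDir.resolve(LAUNCH_SCRIPT);
        if (Files.isRegularFile(script)) {
            Files.copy(script, logDir.resolve(LAUNCH_SCRIPT),
                    StandardCopyOption.REPLACE_EXISTING);
        }
        return true;
    }
}
```

Because both files end up in the container log dir, log aggregation would pick them up without any extra step. Applying the same guard in every executor's launch path is what keeps the behaviour consistent across executors.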



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
