Jason Lowe commented on YARN-4309:

bq.  It's definitely private-to-the-user data though and full of information 
Yeah, agree with Allen that it's dicey to publish everything there.  For 
example, the security tokens for the container are stored in one of the local 
container files, and we do not want that stored in HDFS and accessible by the 
jobhistoryserver user or the ATS user.  The nodemanager goes out of its way, 
via the container-executor, to make sure user-private files are not visible 
even to the nodemanager user.

The launch script should be OK and is really the most valuable thing there for 
debugging startup failures.  Almost everything in that script is derived from 
what's in the configs, and the configs are already stored in HDFS or the ATS.

> Add debug information to application logs when a container fails
> ----------------------------------------------------------------
>                 Key: YARN-4309
>                 URL: https://issues.apache.org/jira/browse/YARN-4309
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>            Reporter: Varun Vasudev
>            Assignee: Varun Vasudev
> Sometimes when a container fails, it can be pretty hard to figure out why it 
> failed.
> My proposal is that if a container fails, we collect information about the 
> container local dir and dump it into the container log dir. Ideally, I'd like 
> to tar up the directory entirely, but I'm not sure of the security and space 
> implications of such an approach. At the very least, we can list all the files 
> in the container local dir and dump the contents of launch_container.sh into 
> the container log dir.
> When log aggregation occurs, all this information will automatically get 
> collected and make debugging such failures much easier.
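
A minimal sketch of the narrowed proposal discussed above: write a listing of
the container local dir into the container log dir and copy only
launch_container.sh, so that user-private files such as the container's token
file are named in the listing but never copied where log aggregation would
publish them. It is written in Python purely for illustration and is not the
NodeManager's code. The function name dump_container_debug_info and the
listing file name directory.info are hypothetical.

```python
import os
import shutil

# Hypothetical names for illustration only; not taken from the YARN codebase.
LISTING_FILE = "directory.info"
LAUNCH_SCRIPT = "launch_container.sh"


def dump_container_debug_info(container_local_dir, container_log_dir):
    """Write a file listing of the container local dir into the log dir and
    copy only the launch script alongside it.

    Every other file (for example a token file) appears in the listing by
    relative path and size, but its contents are never copied.
    """
    os.makedirs(container_log_dir, exist_ok=True)
    listing_path = os.path.join(container_log_dir, LISTING_FILE)
    with open(listing_path, "w") as out:
        for root, dirs, files in os.walk(container_local_dir):
            dirs.sort()  # deterministic traversal order
            for name in sorted(files):
                path = os.path.join(root, name)
                rel = os.path.relpath(path, container_local_dir)
                # lstat so symlinked (possibly dangling) resources do not raise
                out.write(f"{rel}\t{os.lstat(path).st_size}\n")
    script = os.path.join(container_local_dir, LAUNCH_SCRIPT)
    if os.path.isfile(script):
        shutil.copy(script, os.path.join(container_log_dir, LAUNCH_SCRIPT))
    return listing_path
```

The allow-list design (copy one known-safe file, only list the rest) follows
the concern that anything copied into the log dir ends up readable by the
jobhistoryserver and ATS users after aggregation.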

This message was sent by Atlassian JIRA
