Varun Vasudev created YARN-4309:
-----------------------------------
Summary: Add debug information to application logs when a
container fails
Key: YARN-4309
URL: https://issues.apache.org/jira/browse/YARN-4309
Project: Hadoop YARN
Issue Type: Improvement
Components: nodemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Sometimes when a container fails, it can be pretty hard to figure out why it
failed.
My proposal is that if a container fails, we collect information about the
container local dir and dump it into the container log dir. Ideally, I'd like
to tar up the directory entirely, but I'm not sure of the security and space
implications of such an approach. At the very least, we can write a listing of
all the files in the container local dir, along with the contents of
launch_container.sh, into the container log dir.
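The minimal version of this proposal (a file listing plus a copy of
launch_container.sh) could look roughly like the sketch below. It is plain
java.nio code, not NodeManager code: the class name ContainerDebugInfo, the
dump method, and the output file name local_dir_listing.txt are all
hypothetical, and it assumes the caller already knows the container local dir
and the container log dir. Symlinks are recorded but not followed, so only
names, types and sizes end up in the log dir.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, for illustration only; not part of YARN.
final class ContainerDebugInfo {
    // Hypothetical name for the listing written into the container log dir.
    static final String LISTING_FILE = "local_dir_listing.txt";
    static final String LAUNCH_SCRIPT = "launch_container.sh";

    // Writes a listing of localDir and a copy of the launch script into logDir.
    static void dump(Path localDir, Path logDir) throws IOException {
        Files.createDirectories(logDir);

        // Collect every entry under the local dir; Files.walk does not
        // follow symlinks unless asked to.
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(localDir)) {
            entries = walk.sorted().collect(Collectors.toList());
        }

        // One line per entry: type (d/l/f), size in bytes, relative path.
        try (BufferedWriter out = Files.newBufferedWriter(
                logDir.resolve(LISTING_FILE), StandardCharsets.UTF_8)) {
            for (Path p : entries) {
                BasicFileAttributes attrs = Files.readAttributes(
                        p, BasicFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
                char type = attrs.isDirectory() ? 'd'
                        : attrs.isSymbolicLink() ? 'l' : 'f';
                String rel = p.equals(localDir)
                        ? "." : localDir.relativize(p).toString();
                out.write(String.format("%c %12d %s%n", type, attrs.size(), rel));
            }
        }

        // Copy the launch script, if present, next to the container logs.
        Path script = localDir.resolve(LAUNCH_SCRIPT);
        if (Files.isRegularFile(script)) {
            Files.copy(script, logDir.resolve(LAUNCH_SCRIPT),
                    StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

Calling ContainerDebugInfo.dump(localDir, logDir) once after a non-zero exit
would be enough for this sketch. Writing only a listing, rather than a tarball,
keeps the space cost small and avoids copying localized data into the logs.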
When log aggregation occurs, all this information will automatically be
collected along with the container logs, making such failures much easier to
debug.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)