[ https://issues.apache.org/jira/browse/YARN-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15734060#comment-15734060 ]
Miklos Szegedi commented on YARN-5987:
--------------------------------------
The way I would implement this is to let the administrator specify
NM_SAVE_DEBUG_INFO_COMMAND and NM_SAVE_DEBUG_INFO_TIMEOUT_SEC. The command is
called when a container is preempted. If the timer expires before the command
finishes, the command is cancelled. The command can contain the placeholders
{{PID}} and {{LOG_DIR}}, which are replaced with the actual values. The
container executor needs to impersonate the container user in case YARN is
running as a different user than the container. Ideally, the container launch
context would also carry a flag that says whether to apply the feature to the
current running application, so that we do not collect dumps for all
applications unnecessarily.
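
To make the proposal concrete, here is a minimal sketch of the placeholder
substitution and the timeout handling. It is written in Python for brevity and
is not NodeManager code. The function names, the jmap example command, and the
30 second value are assumptions for illustration; only the two setting names
and the {{PID}} / {{LOG_DIR}} placeholders come from the comment above.
Impersonation and the per-application flag are not covered.

```python
import shlex
import subprocess

# Hypothetical values for the proposed NM_SAVE_DEBUG_INFO_COMMAND and
# NM_SAVE_DEBUG_INFO_TIMEOUT_SEC settings (illustrative, not defaults).
SAVE_DEBUG_INFO_COMMAND = "jmap -dump:format=b,file={{LOG_DIR}}/heap.hprof {{PID}}"
SAVE_DEBUG_INFO_TIMEOUT_SEC = 30


def build_argv(template, pid, log_dir):
    """Split the configured command, then fill in {{PID}} and {{LOG_DIR}}.

    Splitting before substitution keeps a log directory that contains
    spaces inside a single argument.
    """
    return [
        arg.replace("{{PID}}", str(pid)).replace("{{LOG_DIR}}", log_dir)
        for arg in shlex.split(template)
    ]


def save_debug_info(template, pid, log_dir, timeout_sec):
    """Run the command; return True if it finished before the timeout."""
    argv = build_argv(template, pid, log_dir)
    try:
        subprocess.run(argv, timeout=timeout_sec, check=False)
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising, which
        # corresponds to "the command is cancelled" in the proposal.
        return False
```

A caller would invoke save_debug_info(SAVE_DEBUG_INFO_COMMAND, pid, log_dir,
SAVE_DEBUG_INFO_TIMEOUT_SEC) just before the container is killed. The sketch
runs the command without a shell, so the placeholder values cannot inject
additional shell syntax.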
> NM configured command to collect heap dump of preempted container
> -----------------------------------------------------------------
>
> Key: YARN-5987
> URL: https://issues.apache.org/jira/browse/YARN-5987
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Miklos Szegedi
> Assignee: Miklos Szegedi
>
> The node manager can kill a container if it exceeds the assigned memory
> limits. It would be nice to have a configuration entry to set up a command
> that can collect additional debug information if needed. The collected
> information can be used for root cause analysis.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)