[ https://issues.apache.org/jira/browse/YARN-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Miklos Szegedi updated YARN-5987:
---------------------------------
Attachment: YARN-5987.000.patch
Since this is a new feature, it applies only if the container is killed due to
resource constraints.
We do not hard-code the command to execute. Instead, a configuration property
specifies a custom one. This gives customers more flexibility: YARN
applications may be native, Java, etc., and the command to collect debug
information varies depending on the environment.
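A minimal Java sketch of the property-driven approach, for illustration only. The property name `yarn.nodemanager.container-kill.debug-command` and the `%pid%` placeholder are hypothetical. The patch may name the property and pass the process ID differently. The template is split on whitespace, so quoted arguments are not supported in this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Sketch: resolve an admin-configured debug command for a container process. */
final class DebugCommandResolver {
  // Hypothetical property name; the real name in the patch may differ.
  static final String DEBUG_COMMAND_KEY = "yarn.nodemanager.container-kill.debug-command";
  // Hypothetical placeholder that is replaced with the container's process ID.
  static final String PID_PLACEHOLDER = "%pid%";

  /** Returns the argument vector to run, or empty if the feature is not configured. */
  static Optional<List<String>> resolve(Map<String, String> conf, String pid) {
    String template = conf.get(DEBUG_COMMAND_KEY);
    if (template == null || template.trim().isEmpty()) {
      return Optional.empty(); // not configured: nothing is collected
    }
    List<String> argv = new ArrayList<>();
    // Naive whitespace tokenization; each token gets the PID substituted.
    for (String token : template.trim().split("\\s+")) {
      argv.add(token.replace(PID_PLACEHOLDER, pid));
    }
    return Optional.of(argv);
  }
}
```

For example, an administrator might configure `jmap -dump:format=b,file=/tmp/heap-%pid%.hprof %pid%` on a cluster that runs Java applications, or `gcore %pid%` for native ones. The NodeManager code stays the same in both cases.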
The name of the property reflects the decision that currently only kill
transitions trigger the call. Saving debug information may be triggered from
other locations in the future.
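The trigger rule can be sketched as a single hook that fires only on a resource-limit kill. `ExitReason`, `DebugCollector`, and `onContainerStopping` are hypothetical names that do not come from the YARN code base. Keeping collection behind one `saveDebugInformation` call is what would let other call sites be added later.

```java
/** Sketch: only a resource-limit kill triggers debug collection, for now. */
final class KillTransitionHook {
  // Hypothetical reasons a container may stop; not the actual YARN enum.
  enum ExitReason { COMPLETED, KILLED_BY_APP, KILLED_EXCEEDED_MEMORY }

  // Hypothetical collector; other transitions could call it in the future.
  interface DebugCollector {
    void saveDebugInformation(String containerId);
  }

  /** Returns true if debug information was requested for this transition. */
  static boolean onContainerStopping(String containerId, ExitReason reason,
                                     DebugCollector collector) {
    if (reason != ExitReason.KILLED_EXCEEDED_MEMORY) {
      return false; // other transitions do not trigger the call yet
    }
    // Collect before the kill is carried out, while the process still exists.
    collector.saveDebugInformation(containerId);
    return true;
  }
}
```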
We do not change the Linux container executor (LCE). We have to create a
configuration property for the default container executor anyway. If we used
this property in the LCE, we could not simply pass its value to a SUID-enabled
binary to execute, for security reasons. The alternative would be to add the
command to a whitelist in the LCE configuration file, which means the command
would be listed twice in the configuration. I believe that specifying a
separate SUID-enabled shell script is less work for the administrator than
changing the LCE to support custom commands.
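The rejected whitelist alternative can be sketched as follows to show why the command would end up listed twice. The NodeManager supplies the command from its own configuration, and the privileged side runs it only if it also appears in an administrator-controlled list. The real LCE is a native binary. This Java version only models the check logic. The key `debug.commands.allowed` and the comma-separated format are hypothetical.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Sketch of the rejected option: a whitelist in the LCE's own config file. */
final class LceWhitelist {
  // Hypothetical key in the LCE's root-owned configuration file.
  static final String ALLOWED_KEY = "debug.commands.allowed";

  /** Parses a comma-separated list of approved commands. */
  static Set<String> parse(Map<String, String> lceConf) {
    Set<String> allowed = new HashSet<>();
    String raw = lceConf.get(ALLOWED_KEY);
    if (raw == null) {
      return allowed;
    }
    for (String entry : raw.split(",")) {
      String trimmed = entry.trim();
      if (!trimmed.isEmpty()) {
        allowed.add(trimmed);
      }
    }
    return allowed;
  }

  /** The privileged side runs nothing the administrator did not list explicitly. */
  static boolean isAllowed(String requestedByNm, Map<String, String> lceConf) {
    return requestedByNm != null && parse(lceConf).contains(requestedByNm.trim());
  }
}
```

The same command string has to appear both in the NodeManager property and under `ALLOWED_KEY`. A separate SUID-enabled script avoids that duplication.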
There is an open question whether we want to restrict the feature with an
application setting. Setting the command property only enables the
functionality; the command is not carried out by default. Dump files are
potentially very large and may contain sensitive customer data. For this
reason, it is better to make the actual call only if the application client
requested it. What do you think?
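The proposed opt-in rule is a simple conjunction. The administrator enables the feature by setting the command property, and collection happens only if the application also asks for it. This is a minimal sketch. Both key names are hypothetical, and the sketch models the application's request as a plain string map.

```java
import java.util.Map;

/** Sketch: collect only if the admin enabled it AND the application opted in. */
final class DebugCollectionPolicy {
  // Hypothetical NodeManager property; setting it enables the feature.
  static final String NM_COMMAND_KEY = "yarn.nodemanager.container-kill.debug-command";
  // Hypothetical per-application flag set by the client.
  static final String APP_OPT_IN_KEY = "collect-debug-on-kill";

  static boolean shouldCollect(Map<String, String> nmConf, Map<String, String> appSettings) {
    String command = nmConf.get(NM_COMMAND_KEY);
    boolean enabledByAdmin = command != null && !command.trim().isEmpty();
    // Boolean.parseBoolean(null) is false, so a missing flag means "no".
    boolean requestedByApp = Boolean.parseBoolean(appSettings.get(APP_OPT_IN_KEY));
    return enabledByAdmin && requestedByApp;
  }
}
```

Defaulting to "no" on the application side keeps potentially large dumps with sensitive data from being written unless someone explicitly asked for them.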
> NM configured command to collect heap dump of preempted container
> -----------------------------------------------------------------
>
> Key: YARN-5987
> URL: https://issues.apache.org/jira/browse/YARN-5987
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Miklos Szegedi
> Assignee: Miklos Szegedi
> Attachments: YARN-5987.000.patch
>
>
> The node manager can kill a container if it exceeds its assigned memory
> limits. It would be nice to have a configuration entry to set up a command
> that can collect additional debug information if needed. The collected
> information can be used for root cause analysis.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)