[ 
https://issues.apache.org/jira/browse/YARN-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-5987:
---------------------------------
    Attachment: YARN-5987.000.patch

Since this is a new feature, it applies only if the container is killed due to 
resource constraints.
We do not hard-code the command to execute; a property specifies a custom one 
instead. This gives customers more flexibility, since YARN applications may be 
native, Java, etc., and the command to collect debug information varies 
depending on the environment.
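
As an illustration of the property-driven approach (not the code in the 
attached patch), here is a minimal Python sketch. The property key, the 
`{pid}` placeholder, and both function names are hypothetical and chosen only 
for this example.

```python
import shlex
import subprocess

# Hypothetical property key; the real name is defined by the patch, not here.
DEBUG_COMMAND_KEY = "example.nodemanager.container-kill.debug-command"


def build_debug_command(conf, pid):
    """Return the argv list for the configured command, or None if unset."""
    template = conf.get(DEBUG_COMMAND_KEY, "").strip()
    if not template:
        return None
    # Split first, then substitute, so the pid is never re-parsed as shell syntax.
    return [arg.replace("{pid}", str(pid)) for arg in shlex.split(template)]


def collect_debug_info(conf, pid, timeout_sec=30):
    """Run the configured command; return its exit code, or None if nothing ran."""
    argv = build_debug_command(conf, pid)
    if argv is None:
        return None
    return subprocess.run(argv, timeout=timeout_sec).returncode
```

With this shape, an administrator of a Java-heavy cluster might set the value 
to something like `jmap -dump:format=b,file=/tmp/heap-{pid}.hprof {pid}`, 
while a cluster running native applications might use a `gcore` invocation 
instead. Neither case requires a code change.
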
The property name reflects the decision that, currently, only kill transitions 
trigger the call. Saving debug information may be triggered from other 
locations in the future.
We do not change the Linux container executor (LCE). We have to create a 
configuration property for the default container executor anyway. If we used 
this property in LCE, we could not simply pass the value to a SUID-enabled app 
to execute, for security reasons. The option in this case would be to add the 
command to a whitelist in the LCE configuration file, which means the command 
would be listed twice in the configuration. I believe specifying a separate 
shell script with SUID enabled is less work for the administrator than changing 
LCE to support custom commands.
There is a question of whether we want to restrict the feature by an 
application setting. The command property just enables the functionality; the 
command is not carried out by default. Dump files are potentially very big and 
may contain sensitive customer data. For this reason, it is better to make the 
actual call only if the application client requested it. What do you think?
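
If the per-application opt-in is adopted, the resulting decision could be 
sketched as below. This is only an illustration of the three conditions 
discussed above; `KillEvent`, its fields, and `should_collect_debug_info` are 
hypothetical names, not part of YARN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KillEvent:
    # True when the container was killed for exceeding its resource limits.
    killed_for_resource_limit: bool
    # Hypothetical per-application flag: the client asked for debug collection.
    app_requested_dump: bool


def should_collect_debug_info(command, event):
    """Collect only if the command is configured, the kill was caused by
    resource limits, and the application opted in."""
    configured = bool(command and command.strip())
    return (configured
            and event.killed_for_resource_limit
            and event.app_requested_dump)
```

Under this sketch, a configured command alone never produces a dump: an 
application that did not opt in is killed as before, and no potentially large 
or sensitive file is written.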

> NM configured command to collect heap dump of preempted container
> -----------------------------------------------------------------
>
>                 Key: YARN-5987
>                 URL: https://issues.apache.org/jira/browse/YARN-5987
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Miklos Szegedi
>            Assignee: Miklos Szegedi
>         Attachments: YARN-5987.000.patch
>
>
> The node manager can kill a container if it exceeds the assigned memory 
> limits. It would be nice to have a configuration entry to set up a command 
> that can collect additional debug information if needed. The collected 
> information can be used for root cause analysis.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

