[ https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13764007#comment-13764007 ]
Xuan Gong commented on YARN-867:
--------------------------------
bq. I think we should handle AuxServicesEventType.APPLICATION_INIT and the stop
event in Application and not container. That should simplify THIS patch a lot.
I do not see the benefit.
When any aux service fails for a container, we need to fail that container. If
we handle the AuxServicesEventType in Application, then Application eventually
has to tell that particular container (not all containers) to
exit_with_failure. That ends up in the same code path as handling the event in
the container directly. If there is no difference in the outcome, why increase
the event traffic for the application?
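To make the event-count argument concrete, here is a rough sketch of the two
routes. Every name in it (the class, the Event type, the event-type strings
other than exit_with_failure, and the IDs) is hypothetical and only for
illustration; it is not taken from the patch or from the NodeManager code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: none of these types exist in YARN under these names.
class AuxFailureRoutingSketch {

    // A dispatched event: who receives it and what kind of event it is.
    static final class Event {
        final String target;
        final String type;

        Event(String target, String type) {
            this.target = target;
            this.type = type;
        }
    }

    // Route A: the aux-service failure is delivered straight to the container.
    static List<Event> handleInContainer(String containerId) {
        List<Event> dispatched = new ArrayList<>();
        dispatched.add(new Event(containerId, "EXIT_WITH_FAILURE"));
        return dispatched;
    }

    // Route B: the failure goes to the application first, which then has to
    // forward it to the one affected container.
    static List<Event> handleInApplication(String appId, String containerId) {
        List<Event> dispatched = new ArrayList<>();
        dispatched.add(new Event(appId, "AUX_SERVICE_FAILED"));
        dispatched.add(new Event(containerId, "EXIT_WITH_FAILURE"));
        return dispatched;
    }

    public static void main(String[] args) {
        List<Event> a = handleInContainer("container_01");
        List<Event> b = handleInApplication("application_01", "container_01");
        System.out.println("container route events:   " + a.size());
        System.out.println("application route events: " + b.size());
    }
}
```

In this sketch both routes end with the same event on the same container; the
application route only adds one more hop in front of it.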
> Isolation of failures in aux services
> --------------------------------------
>
> Key: YARN-867
> URL: https://issues.apache.org/jira/browse/YARN-867
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Reporter: Hitesh Shah
> Assignee: Xuan Gong
> Priority: Critical
> Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch,
> YARN-867.sampleCode.2.patch
>
>
> Today, a malicious application can bring down the NM by sending bad data to a
> service. For example, sending data to the ShuffleService that results in
> any non-IOException will cause the NM's async dispatcher to exit, as the
> service's INIT APP event is not handled properly.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira