[
https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785373#comment-13785373
]
Bikas Saha commented on YARN-867:
---------------------------------
bq. Using the above, at the very least, we can catch issues related to
mis-configured NMs where the shuffle service is not configured. This is way
simpler, as it could be done as a simple synchronous check when handling the
startContainers rpc call. This could be targeted to 2.1.2/2.2.0.
@hitesh, I agree. In that case, shall we re-target this jira to 2.3 and
use YARN-1256 to fix the misconfigured service and exception logging?
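
For illustration only, the synchronous check proposed above could look roughly
like the following sketch. It is written in Python rather than the NM's Java,
and every name in it (check_aux_services, start_container,
InvalidAuxServiceError, the "mapreduce_shuffle" service name) is a hypothetical
stand-in, not the actual NodeManager API.

```python
class InvalidAuxServiceError(Exception):
    """Raised when a container asks for an aux service the node does not run."""


def check_aux_services(requested_service_data, configured_services):
    """Synchronously reject a start request naming an unconfigured aux service.

    requested_service_data: dict mapping service name -> opaque bytes payload.
    configured_services: set of service names enabled on this node.
    """
    missing = sorted(name for name in requested_service_data
                     if name not in configured_services)
    if missing:
        raise InvalidAuxServiceError(
            "aux service(s) not configured on this node: " + ", ".join(missing))


def start_container(requested_service_data, configured_services):
    # Hypothetical stand-in for the RPC handler: validate first, then launch.
    check_aux_services(requested_service_data, configured_services)
    return "LAUNCH_SCHEDULED"
```

Because the check runs inside the request handler, a misconfigured node fails
the caller immediately with a descriptive error instead of failing later inside
the dispatcher.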
> Isolation of failures in aux services
> --------------------------------------
>
> Key: YARN-867
> URL: https://issues.apache.org/jira/browse/YARN-867
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Reporter: Hitesh Shah
> Assignee: Xuan Gong
> Priority: Critical
> Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch,
> YARN-867.4.patch, YARN-867.5.patch, YARN-867.6.patch,
> YARN-867.sampleCode.2.patch
>
>
> Today, a malicious application can bring down the NM by sending bad data to a
> service. For example, sending data to the ShuffleService such that it results
> in any non-IOException will cause the NM's async dispatcher to exit, as the
> service's INIT APP event is not handled properly.
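
The failure mode described above, and one way to isolate it, can be sketched as
follows. This is a minimal Python illustration of the idea, not the attached
patches or the NM's Java code. ShuffleLikeService, handle_app_init, and
dispatch_all are hypothetical names. The sketch assumes the dispatcher should
log and skip a failing event rather than exit.

```python
import logging

log = logging.getLogger("aux_service_sketch")


class ShuffleLikeService:
    """Hypothetical aux service that fails on malformed payloads."""

    def __init__(self):
        self.initialized_apps = []

    def initialize_application(self, app_id, payload):
        if not payload:
            # Deliberately not an I/O error, mirroring the "non-IOException" case.
            raise ValueError("malformed service data for " + app_id)
        self.initialized_apps.append(app_id)


def handle_app_init(services, service_name, app_id, payload):
    """Deliver one app-init event; return True on success, False on failure."""
    service = services.get(service_name)
    if service is None:
        log.warning("no aux service named %s; dropping event for %s",
                    service_name, app_id)
        return False
    try:
        service.initialize_application(app_id, payload)
        return True
    except Exception:
        # Catch everything the service raises so the dispatcher loop survives.
        log.exception("aux service %s failed for %s", service_name, app_id)
        return False


def dispatch_all(services, events):
    """Process every (service_name, app_id, payload) event; return failed app ids."""
    failed = []
    for service_name, app_id, payload in events:
        if not handle_app_init(services, service_name, app_id, payload):
            failed.append(app_id)
    return failed
```

With this structure, one application's bad payload only marks that application
as failed. Events for other applications are still delivered, and the loop keeps
running.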