[
https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784717#comment-13784717
]
Xuan Gong commented on YARN-867:
--------------------------------
bq.Probably have 1 try catch instead of multiple.
Fixed. The code now uses a single try/catch block.
bq.Can we rename AUXSERVICE_FAIL to AUXSERVICE_ERROR since the service probably
hasn't failed.
Done
bq.TestAuxService needs an addition for the new code
Added a new test case in TestAuxService
bq.TestContainer - new test can be made simpler by not mocking
AuxServiceHandler and instead sending the failed event directly like it's done
for other tests there.
Fixed
bq.In AuxService.handle(APPLICATION_INIT) and other places like that, if the
service does not exist, we should fail too.
Done
bq.Probably we can ignore the error here since the container has already failed.
I think we still need this transition. The container can move from the NEW
state to ContainerState.LOCALIZATION_FAILED, and the AuxService is triggered to
handle APPLICATION_INIT at that time. If there is any exception, we send a
ContainerExitEvent with ContainerEventType.CONTAINER_EXITED_WITH_FAILURE to the
container. It is quite possible that the container will already be in the
LOCALIZATION_FAILED state when it starts to process this event, so we should
handle it.
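To illustrate why the transition matters, here is a minimal sketch of a
table-driven state machine. It is not the NodeManager's ContainerImpl: the
class name, the reduced set of states, the INIT_FAILED and CLEANUP_DONE events,
and the use of IllegalStateException for unregistered transitions are all
assumptions made for the example. Only LOCALIZATION_FAILED and
CONTAINER_EXITED_WITH_FAILURE are taken from the discussion above.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, simplified model of a container state machine.
final class ContainerTransitionSketch {
    enum State { NEW, LOCALIZATION_FAILED, DONE }
    enum Event { INIT_FAILED, CONTAINER_EXITED_WITH_FAILURE, CLEANUP_DONE }

    private final Map<State, Map<Event, State>> table =
        new EnumMap<State, Map<Event, State>>(State.class);
    private State current = State.NEW;

    // The flag toggles the transition under discussion so both cases can be compared.
    ContainerTransitionSketch(boolean handleExitAfterLocalizationFailed) {
        addTransition(State.NEW, Event.INIT_FAILED, State.LOCALIZATION_FAILED);
        addTransition(State.LOCALIZATION_FAILED, Event.CLEANUP_DONE, State.DONE);
        if (handleExitAfterLocalizationFailed) {
            // The container has already failed, so the late event changes nothing,
            // but it must still be a registered transition.
            addTransition(State.LOCALIZATION_FAILED,
                          Event.CONTAINER_EXITED_WITH_FAILURE,
                          State.LOCALIZATION_FAILED);
        }
    }

    private void addTransition(State from, Event on, State to) {
        Map<Event, State> row = table.get(from);
        if (row == null) {
            row = new EnumMap<Event, State>(Event.class);
            table.put(from, row);
        }
        row.put(on, to);
    }

    // Applies the event, or throws if no transition is registered for it.
    State handle(Event event) {
        Map<Event, State> row = table.get(current);
        State next = (row == null) ? null : row.get(event);
        if (next == null) {
            throw new IllegalStateException(
                "Invalid event " + event + " in state " + current);
        }
        current = next;
        return current;
    }
}
```

With the flag set to true, a CONTAINER_EXITED_WITH_FAILURE that arrives after
INIT_FAILED leaves the sketch in LOCALIZATION_FAILED. With the flag set to
false, the same event is rejected as an invalid transition.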
> Isolation of failures in aux services
> --------------------------------------
>
> Key: YARN-867
> URL: https://issues.apache.org/jira/browse/YARN-867
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Reporter: Hitesh Shah
> Assignee: Xuan Gong
> Priority: Critical
> Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch,
> YARN-867.4.patch, YARN-867.sampleCode.2.patch
>
>
> Today, a malicious application can bring down the NM by sending bad data to a
> service. For example, sending data to the ShuffleService such that it results
> in any non-IOException will cause the NM's async dispatcher to exit, as the
> service's INIT APP event is not handled properly.
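
The failure isolation described in this issue could be sketched roughly as
follows. This is an illustration and not the attached patch: the
AuxDispatchSketch class, its nested AuxService interface, the method
signatures, and the string-based error record are all hypothetical. Catching
Exception rather than a narrower type is also an assumption. Only the
AUXSERVICE_ERROR name, the single try/catch, and the failure on a missing
service come from the review discussion.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of isolating aux-service failures from the dispatcher.
final class AuxDispatchSketch {
    // Stand-in for an auxiliary service such as the ShuffleService.
    interface AuxService {
        void initializeApplication(String appId, byte[] payload) throws Exception;
    }

    // Errors reported on behalf of failing services, instead of crashing.
    final List<String> reportedErrors = new ArrayList<String>();
    private final Map<String, AuxService> services = new HashMap<String, AuxService>();

    void register(String name, AuxService service) {
        services.put(name, service);
    }

    // Handles one application-init event. Any exception, including the one
    // raised for a missing service, is caught by a single try/catch and turned
    // into an error record, so the caller's event loop keeps running.
    boolean handleApplicationInit(String serviceName, String appId, byte[] payload) {
        try {
            AuxService service = services.get(serviceName);
            if (service == null) {
                throw new IllegalArgumentException("Unknown aux service: " + serviceName);
            }
            service.initializeApplication(appId, payload);
            return true;
        } catch (Exception e) {
            reportedErrors.add("AUXSERVICE_ERROR " + serviceName + " " + appId + ": " + e);
            return false;
        }
    }
}
```

In this sketch a service that throws a RuntimeException on bad input only
produces one error record for the affected application, and later events for
other applications are still processed.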