[ https://issues.apache.org/jira/browse/YARN-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Lowe updated YARN-4051:
-----------------------------
Summary: ContainerKillEvent lost when container is still recovering and
application finishes (was: ContainerKillEvent is lost when container is In
New State and is recovering)
Thanks for updating the patch!
I'm OK with fixing the lost kill-from-AM event in a separate JIRA, but I
adjusted the headline of this one to avoid confusion.
Should we use NMNotYetReadyException in the case where the AM tries to kill a
container still recovering? We already throw it in similar situations where
the NM isn't ready to handle the request.
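To make the question concrete, here is a minimal sketch of rejecting a kill request while the container is still recovering, so the caller can retry instead of having the kill silently dropped. Everything in it is an assumption for illustration: `ContainerManager`, `startRecovery`, `finishRecovery`, and `wasKilled` are hypothetical names, and `NotYetReadyException` is a local stand-in, not the YARN exception class.

```java
import java.util.HashSet;
import java.util.Set;

final class KillWhileRecoveringSketch {
    // Local stand-in for the "not yet ready" exception discussed above.
    // It is NOT the YARN class; it only mimics the idea.
    static final class NotYetReadyException extends Exception {
        NotYetReadyException(String message) {
            super(message);
        }
    }

    // Hypothetical, heavily simplified container manager.
    static final class ContainerManager {
        private final Set<String> recovering = new HashSet<>();
        private final Set<String> killed = new HashSet<>();

        void startRecovery(String containerId) {
            recovering.add(containerId);
        }

        void finishRecovery(String containerId) {
            recovering.remove(containerId);
        }

        void stopContainer(String containerId) throws NotYetReadyException {
            if (recovering.contains(containerId)) {
                // Reject the request rather than accept a kill that could be lost.
                throw new NotYetReadyException(
                    "Container " + containerId + " is still recovering; retry later");
            }
            killed.add(containerId);
        }

        boolean wasKilled(String containerId) {
            return killed.contains(containerId);
        }
    }

    public static void main(String[] args) {
        ContainerManager manager = new ContainerManager();
        manager.startRecovery("container_1");
        try {
            manager.stopContainer("container_1");
        } catch (NotYetReadyException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        manager.finishRecovery("container_1");
        try {
            manager.stopContainer("container_1");
            System.out.println("killed: " + manager.wasKilled("container_1"));
        } catch (NotYetReadyException e) {
            System.out.println("unexpected: " + e.getMessage());
        }
    }
}
```

The point of the sketch is only that the caller gets an explicit, retryable signal; whether the AM side already retries on this exception would need to be checked against the actual client code.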
Nits:
- ",because " should be " because "
- ContainerImpl#isRecovering should check recoveredStatus before the container
state, since recoveredStatus is the cheaper check and will often let us skip
the state check and its corresponding lock acquisition.
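A rough sketch of the ordering suggested in the last nit, not the patch itself. It assumes that "recovering" means a recovered status other than REQUESTED combined with the NEW state, and that reading the state takes a read lock; the enums, the `Container` class, and the `lockAcquisitions` counter are hypothetical stand-ins, not the YARN types.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class RecoveryCheckSketch {
    // Hypothetical stand-ins; names and values are assumptions, not the YARN types.
    enum RecoveredStatus { REQUESTED, LAUNCHED, COMPLETED }
    enum State { NEW, RUNNING, DONE }

    static final class Container {
        private final RecoveredStatus recoveredStatus; // set once at construction
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        private final AtomicInteger lockAcquisitions = new AtomicInteger(); // demo only
        private State state = State.NEW;

        Container(RecoveredStatus recoveredStatus) {
            this.recoveredStatus = recoveredStatus;
        }

        State getState() {
            lock.readLock().lock();
            try {
                lockAcquisitions.incrementAndGet();
                return state;
            } finally {
                lock.readLock().unlock();
            }
        }

        void setState(State newState) {
            lock.writeLock().lock();
            try {
                state = newState;
            } finally {
                lock.writeLock().unlock();
            }
        }

        boolean isRecovering() {
            // Cheap final-field read first; && short-circuits, so the locked
            // state read is skipped whenever the container was not recovered.
            return recoveredStatus != RecoveredStatus.REQUESTED
                && getState() == State.NEW;
        }

        int lockAcquisitions() {
            return lockAcquisitions.get();
        }
    }

    public static void main(String[] args) {
        Container fresh = new Container(RecoveredStatus.REQUESTED);
        Container recovered = new Container(RecoveredStatus.LAUNCHED);
        // prints "false locks=0"
        System.out.println(fresh.isRecovering() + " locks=" + fresh.lockAcquisitions());
        // prints "true locks=1"
        System.out.println(recovered.isRecovering() + " locks=" + recovered.lockAcquisitions());
    }
}
```

With the checks in the opposite order, every call would take the read lock, including calls for containers that were never recovered.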
> ContainerKillEvent lost when container is still recovering and application
> finishes
> -----------------------------------------------------------------------------------
>
> Key: YARN-4051
> URL: https://issues.apache.org/jira/browse/YARN-4051
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Reporter: sandflee
> Assignee: sandflee
> Priority: Critical
> Attachments: YARN-4051.01.patch, YARN-4051.02.patch,
> YARN-4051.03.patch, YARN-4051.04.patch, YARN-4051.05.patch,
> YARN-4051.06.patch, YARN-4051.07.patch
>
>
> As in YARN-4050, the NM event dispatcher is blocked and the container is in
> the New state. When we finish the application, the container is still alive
> even after the NM event dispatcher is unblocked.