[
https://issues.apache.org/jira/browse/YARN-7591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16277731#comment-16277731
]
Wangda Tan commented on YARN-7591:
----------------------------------
Thanks [~Tao Yang], this makes sense to me.
Only one minor suggestion:
{code}
if (allocation.getAllocateFromReservedContainer() == null) {
  return false;
}
{code}
Could you add a comment above this {{if}} check so that in the future we can
more easily remember why the check is needed?
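
For illustration only, a commented version of the check might look roughly like the sketch below. The classes here ({{CommitCheckSketch}}, {{RmContainer}}, {{Allocation}}) are simplified, hypothetical stand-ins rather than the real {{FiCaSchedulerApp}} types, and the comment wording is just one possible phrasing based on scenario (3) of the description.

```java
public class CommitCheckSketch {
    /** Hypothetical stand-in for RMContainer. */
    static final class RmContainer {
        final String id;
        RmContainer(String id) { this.id = id; }
    }

    /** Hypothetical stand-in for an allocation proposal; fromReserved may be null. */
    static final class Allocation {
        final RmContainer fromReserved;
        Allocation(RmContainer fromReserved) { this.fromReserved = fromReserved; }
        RmContainer getAllocateFromReservedContainer() { return fromReserved; }
    }

    /** Returns true if the proposal may be committed, false if it must be rejected. */
    static boolean commonCheck(Allocation allocation, RmContainer reservedContainerOnNode) {
        if (reservedContainerOnNode != null) {
            // The node already holds a reservation, but this proposal does not
            // allocate from a reserved container. This can happen when proposals
            // from different async-scheduling threads are committed out of order.
            // Reject the proposal instead of dereferencing null below.
            if (allocation.getAllocateFromReservedContainer() == null) {
                return false;
            }
            // The proposal refers to a reservation other than the one on the node.
            if (allocation.getAllocateFromReservedContainer() != reservedContainerOnNode) {
                return false;
            }
        }
        return true;
    }
}
```

In this sketch a proposal is only accepted on a node with a reservation when it allocates from exactly that reserved container.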
> NPE in async-scheduling mode of CapacityScheduler
> -------------------------------------------------
>
> Key: YARN-7591
> URL: https://issues.apache.org/jira/browse/YARN-7591
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 3.0.0-alpha4, 2.9.1
> Reporter: Tao Yang
> Assignee: Tao Yang
> Priority: Critical
> Attachments: YARN-7591.001.patch
>
>
> Currently, in async-scheduling mode of CapacityScheduler, an NPE may be raised
> in the special scenarios below.
> (1) The user is removed after its last application finishes, so an NPE may
> be raised when an async-scheduling thread reads from the user object without
> a null check.
> (2) An NPE may be raised when trying to fulfill a reservation for a finished
> application in {{CapacityScheduler#allocateContainerOnSingleNode}}.
> {code}
> RMContainer reservedContainer = node.getReservedContainer();
> if (reservedContainer != null) {
>   FiCaSchedulerApp reservedApplication = getCurrentAttemptForContainer(
>       reservedContainer.getContainerId());
>   // NPE here: reservedApplication could be null after this application
>   // finished
>   // Try to fulfill the reservation
>   LOG.info(
>       "Trying to fulfill reservation for application "
>           + reservedApplication.getApplicationId()
>           + " on node: " + node.getNodeID());
> {code}
> (3) If proposal1 (allocate containerX on node1) and proposal2 (reserve
> containerY on node1) were generated by different async-scheduling threads
> around the same time and proposal2 was submitted before proposal1, an NPE
> is raised when trying to submit proposal2 in
> {{FiCaSchedulerApp#commonCheckContainerAllocation}}.
> {code}
> if (reservedContainerOnNode != null) {
>   // NPE here: allocation.getAllocateFromReservedContainer() should be
>   // null for proposal2 in this case
>   RMContainer fromReservedContainer =
>       allocation.getAllocateFromReservedContainer().getRmContainer();
>   if (fromReservedContainer != reservedContainerOnNode) {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug(
>           "Try to allocate from a non-existed reserved container");
>     }
>     return false;
>   }
> }
> {code}
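
As a rough illustration of scenario (2), one way to avoid the NPE is to look up the application first and skip the reservation when the lookup returns null. The sketch below is self-contained and uses hypothetical stand-ins ({{ReservationGuardSketch}}, plain string ids, a map acting as the registry of live applications); it is not the real CapacityScheduler code and not necessarily what the attached patch does.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ReservationGuardSketch {
    // Hypothetical registry: container id -> id of the live application owning it.
    private final Map<String, String> liveAppByContainer = new ConcurrentHashMap<>();

    void register(String containerId, String appId) {
        liveAppByContainer.put(containerId, appId);
    }

    /** Drops every container entry of a finished application. */
    void finishApplication(String appId) {
        liveAppByContainer.values().removeIf(appId::equals);
    }

    /** Returns the owning application id, or null once the application has finished. */
    String getCurrentAttemptForContainer(String containerId) {
        return liveAppByContainer.get(containerId);
    }

    /** Returns true if fulfillment was attempted, false if it was skipped. */
    boolean tryFulfillReservation(String reservedContainerId) {
        if (reservedContainerId == null) {
            return false; // nothing is reserved on this node
        }
        String reservedApplication = getCurrentAttemptForContainer(reservedContainerId);
        if (reservedApplication == null) {
            // The application finished after the reservation was made;
            // skip instead of dereferencing null in the log statement.
            return false;
        }
        System.out.println(
            "Trying to fulfill reservation for application " + reservedApplication);
        return true;
    }
}
```

The guard runs before the log statement, which is where the original snippet dereferences {{reservedApplication}}.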