Tao Yang created YARN-8575:
------------------------------

             Summary: CapacityScheduler should check node state before committing reserve/allocate proposals
                 Key: YARN-8575
                 URL: https://issues.apache.org/jira/browse/YARN-8575
             Project: Hadoop YARN
          Issue Type: Bug
          Components: capacityscheduler
    Affects Versions: 3.2.0, 3.1.2
            Reporter: Tao Yang
            Assignee: Tao Yang


Recently we found a new error as follows: 
{noformat}
ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: node to unreserve doesn't exist, nodeid: host1:45454
{noformat}
Steps to reproduce:
(1) Create a reserve proposal for app1 on node1.
(2) node1 is successfully decommissioned and removed from the node tracker.
(3) Try to commit this outdated reserve proposal; it will be accepted and 
applied.
This error may occur after some NMs are decommissioned. The application that 
prints the error log will always have a reserved container on the non-existent 
(decommissioned) NM, and its pending request will never be satisfied.
To solve this problem, the scheduler should check the node state in 
FiCaSchedulerApp#accept to avoid committing outdated proposals on unusable 
nodes. 
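
A minimal sketch of the proposed check, replaying the three reproduction steps. 
This is not the actual patch or the real YARN code: NodeStateCheckSketch, 
NodeTracker, Proposal, and the accept(...) signature are hypothetical stand-ins 
chosen for illustration. The only assumption carried over from the description 
is that a proposal should be rejected when its target node is no longer tracked 
or is not in a usable state.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins for illustration; these are NOT the real YARN classes.
final class NodeStateCheckSketch {

    enum NodeState { RUNNING, DECOMMISSIONED }

    /** Simplified node tracker keyed by node id, e.g. "host1:45454". */
    static final class NodeTracker {
        private final Map<String, NodeState> nodes = new ConcurrentHashMap<>();

        void addNode(String nodeId) { nodes.put(nodeId, NodeState.RUNNING); }

        /** Marks a tracked node as decommissioned without removing it. */
        void markDecommissioned(String nodeId) {
            nodes.computeIfPresent(nodeId, (id, s) -> NodeState.DECOMMISSIONED);
        }

        void removeNode(String nodeId) { nodes.remove(nodeId); }

        /** A node is usable only if it is still tracked and RUNNING. */
        boolean isUsable(String nodeId) {
            return nodes.get(nodeId) == NodeState.RUNNING;
        }
    }

    /** A reserve/allocate proposal created before the commit phase. */
    static final class Proposal {
        final String appId;
        final String nodeId;

        Proposal(String appId, String nodeId) {
            this.appId = appId;
            this.nodeId = nodeId;
        }
    }

    /** Rejects outdated proposals whose target node is gone or unusable. */
    static boolean accept(Proposal proposal, NodeTracker tracker) {
        return tracker.isUsable(proposal.nodeId);
    }

    public static void main(String[] args) {
        NodeTracker tracker = new NodeTracker();
        tracker.addNode("host1:45454");

        // (1) Create a reserve proposal for app1 on node1.
        Proposal proposal = new Proposal("app1", "host1:45454");
        System.out.println(accept(proposal, tracker)); // true

        // (2) node1 is decommissioned and removed from the node tracker.
        tracker.markDecommissioned("host1:45454");
        tracker.removeNode("host1:45454");

        // (3) Committing the outdated proposal is now rejected.
        System.out.println(accept(proposal, tracker)); // false
    }
}
```

With a check of this kind at accept time, the outdated proposal from step (3) 
would be dropped instead of leaving a reservation on a node that no longer 
exists.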



