[ 
https://issues.apache.org/jira/browse/YARN-10831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhengbo Li updated YARN-10831:
------------------------------
    Priority: Minor  (was: Major)

> Allow checking over-commit after reconnect event
> ------------------------------------------------
>
>                 Key: YARN-10831
>                 URL: https://issues.apache.org/jira/browse/YARN-10831
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Zhengbo Li
>            Priority: Minor
>
> Currently the container over-commit check is skipped after a node reconnect 
> event, because the "timeout" period always defaults to -1, which makes the 
> `signalContainersIfOvercommitted` method skip the check:
> [line link](https://github.com/apache/hadoop/blob/03cfc852791c14fad39db4e5b14104a276c08e59/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java#L1260)
> However, in our case, because of the type of VM we use, a node's resources can 
> change after a reconnect event. Its CPU cores or memory may shrink, leaving 
> the containers already running on it over-committed. Therefore we should allow 
> configuring the timeout period for the reconnect event to a value other than 
> -1, so that the over-commit check is performed.
>  
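
The behaviour described above can be sketched as a small standalone model. This is not the Hadoop code: `OvercommitCheckSketch`, `Node`, and `wouldSignalContainers` are hypothetical names, and the sketch assumes (as the description implies) that a negative timeout means "not set" and that the check returns early in that case.

```java
// Simplified, hypothetical model of the over-commit check; not the real
// AbstractYarnScheduler implementation.
public class OvercommitCheckSketch {

    /** Assumed sentinel: a timeout of -1 means no over-commit timeout is set. */
    public static final int NO_TIMEOUT = -1;

    /** Hypothetical stand-in for a scheduler node after a reconnect. */
    public static final class Node {
        final int totalVcores;         // capacity reported on reconnect
        final int allocatedVcores;     // vcores held by running containers
        final int overcommitTimeoutMs; // NO_TIMEOUT or a value >= 0

        public Node(int totalVcores, int allocatedVcores, int overcommitTimeoutMs) {
            this.totalVcores = totalVcores;
            this.allocatedVcores = allocatedVcores;
            this.overcommitTimeoutMs = overcommitTimeoutMs;
        }

        boolean isOvercommitted() {
            return allocatedVcores > totalVcores;
        }
    }

    /**
     * Returns true only if the check runs and finds the node over-committed.
     * With a negative timeout the check is skipped entirely.
     */
    public static boolean wouldSignalContainers(Node node) {
        if (node.overcommitTimeoutMs < 0) {
            return false; // no timeout set: check skipped, nothing is signalled
        }
        return node.isOvercommitted();
    }

    public static void main(String[] args) {
        // Node shrank from 8 to 4 vcores while 8 vcores are still allocated.
        Node defaultTimeout = new Node(4, 8, NO_TIMEOUT);
        Node configuredTimeout = new Node(4, 8, 10_000);

        System.out.println("default timeout:    " + wouldSignalContainers(defaultTimeout));
        System.out.println("configured timeout: " + wouldSignalContainers(configuredTimeout));
    }
}
```

In this model the shrunken node goes unnoticed with the default timeout, and is caught once the reconnect path is given a non-negative timeout, which is the change the issue asks for.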



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
