[ https://issues.apache.org/jira/browse/YARN-6568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Konstantin Shvachko updated YARN-6568:
--------------------------------------
    Fix Version/s:     (was: 2.7.4)

> A queue which runs a long time job couldn't acquire any container for long time.
> --------------------------------------------------------------------------------
>
>                 Key: YARN-6568
>                 URL: https://issues.apache.org/jira/browse/YARN-6568
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.7.1
>         Environment: CentOS 7.1
>            Reporter: zhengchenyu
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> In our cluster, we found that some applications could not acquire any
> container for a long time. (Note: we use FairScheduler with FairSharePolicy.)
> First, I found an unreasonable configuration: we had set minRes=maxRes. As a
> result, some applications stayed pending for a long time, and we killed some
> large applications to work around the problem. After we changed this
> configuration, the problem eased.
> But the problem is not completely solved. In our cluster, I found that
> applications in some queues that request only a few containers still stay
> pending for a long time.
> I reproduced this in a test cluster. I submitted a DistributedShell
> application, which runs many loo applications, to queueA; then I submitted my
> own YARN application, which requests and releases containers constantly, to
> queueB. From that point on, any applications submitted to queueA stay pending!
> We believe this is a problem with FairSharePolicy: it takes the queue's
> request into account. So after the queues are sorted, queues that have few
> requests are always ordered last.
> We know that once the AM container is launched, the request will increase,
> but FairSharePolicy can't distinguish which request is the AM request. I think
> that if the AM container is assigned, the problem is solved.
> Our colleagues discussed this problem. We recommend setting a timeout for
> each queue, meaning the length of time for which the queue has not been
> assigned a container. If the timeout expires, we move this queue to the first
> place in the queue list.
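
The timeout proposal at the end of the description can be sketched roughly as follows. This is only an illustration under stated assumptions, not a patch against FairScheduler: QueueInfo, BY_DEMAND_DESC, isStarved, starvedFirst and order are hypothetical names, the "larger demand first" base ordering is a deliberately simplified stand-in for the demand-sensitive sort the reporter describes (it is not the real FairSharePolicy comparator), and the timeout value is arbitrary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only: QueueInfo and every method here are hypothetical
// stand-ins, not classes or methods from the Hadoop FairScheduler code base.
final class StarvationAwareOrdering {

    /** Hypothetical, simplified view of a queue as seen by the scheduler. */
    static final class QueueInfo {
        final String name;
        final long demand;             // outstanding request, in arbitrary units
        final long lastAssignedMillis; // when this queue last received a container

        QueueInfo(String name, long demand, long lastAssignedMillis) {
            this.name = name;
            this.demand = demand;
            this.lastAssignedMillis = lastAssignedMillis;
        }
    }

    /** Assumed stand-in for a demand-sensitive ordering: larger demand sorts first. */
    static final Comparator<QueueInfo> BY_DEMAND_DESC =
        (a, b) -> Long.compare(b.demand, a.demand);

    /** A queue counts as starved if it wants resources but got none within the timeout. */
    static boolean isStarved(QueueInfo q, long nowMillis, long timeoutMillis) {
        return q.demand > 0 && nowMillis - q.lastAssignedMillis >= timeoutMillis;
    }

    /** Starved queues sort ahead of all others; ties fall back to the base ordering. */
    static Comparator<QueueInfo> starvedFirst(
            Comparator<QueueInfo> base, long nowMillis, long timeoutMillis) {
        return (a, b) -> {
            boolean starvedA = isStarved(a, nowMillis, timeoutMillis);
            boolean starvedB = isStarved(b, nowMillis, timeoutMillis);
            if (starvedA != starvedB) {
                return starvedA ? -1 : 1;
            }
            return base.compare(a, b);
        };
    }

    /** Returns queue names in the order in which they would be offered containers. */
    static List<String> order(List<QueueInfo> queues, long nowMillis, long timeoutMillis) {
        List<QueueInfo> sorted = new ArrayList<>(queues);
        sorted.sort(starvedFirst(BY_DEMAND_DESC, nowMillis, timeoutMillis));
        List<String> names = new ArrayList<>();
        for (QueueInfo q : sorted) {
            names.add(q.name);
        }
        return names;
    }

    public static void main(String[] args) {
        long now = 100_000L;
        List<QueueInfo> queues = Arrays.asList(
            new QueueInfo("queueA", 1, 10_000L),   // small demand, nothing assigned for 90 s
            new QueueInfo("queueB", 50, 99_000L)); // large demand, served 1 s ago
        System.out.println(order(queues, now, 60_000L));
    }
}
```

With a 60 s timeout, queueA has waited 90 s and is moved ahead of queueB even though its demand is much smaller; with a timeout longer than 90 s the base ordering applies and queueB stays first. A real implementation would also have to decide how the timeout is configured and where the last-assignment time is recorded, which this sketch leaves open.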
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org