[
https://issues.apache.org/jira/browse/YARN-6568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Konstantin Shvachko updated YARN-6568:
--------------------------------------
Fix Version/s: (was: 2.7.4)
> A queue which runs a long time job couldn't acquire any container for long
> time.
> --------------------------------------------------------------------------------
>
> Key: YARN-6568
> URL: https://issues.apache.org/jira/browse/YARN-6568
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler
> Affects Versions: 2.7.1
> Environment: CentOS 7.1
> Reporter: zhengchenyu
> Original Estimate: 1m
> Remaining Estimate: 1m
>
> In our cluster, we found that some applications could not acquire any
> container for a long time. (Note: we use FairScheduler with FairSharePolicy.)
> First, I found an unreasonable configuration: we had set minRes=maxRes. As a
> result, some applications stayed pending for a long time, and we killed some
> large applications to work around it. After we changed this configuration,
> the problem eased.
> But the problem is not completely solved. In our cluster, I found that
> applications in some queues that request only a few containers stay pending
> for a long time.
> I reproduced this in a test cluster. I submitted a DistributedShell
> application that runs many loop applications to queueA, then submitted my own
> YARN application, which requests and releases containers constantly, to
> queueB. From then on, any applications submitted to queueA stay pending!
> We believe this is a problem in FairSharePolicy: it considers the requests of
> each queue. So after the queues are sorted, queues that have few requests are
> always ordered last.
> We know that once the AM container is launched, the requests will increase.
> But FairSharePolicy can't distinguish which request is the AM request. I think
> that if the AM container is assigned, the problem is solved.
> We discussed this problem with our colleagues. We recommend setting a timeout
> for each queue, measured as the length of time the queue has gone without
> being assigned a container. When the timeout expires, we move this queue to
> the first place in the queue list.
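
The timeout proposal above could be sketched roughly as follows. This is a toy model only, not FairScheduler code: QueueInfo, policyKey, lastAssignedMillis, withTimeout and order are all hypothetical names, and policyKey merely stands in for whatever value the real policy sorts on (smaller sorts earlier).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model only: none of these types exist in Hadoop YARN.
class StarvationTimeoutSketch {

    /** Minimal stand-in for a scheduler queue (hypothetical). */
    static final class QueueInfo {
        final String name;
        final double policyKey;         // placeholder for the policy's sort key
        final long lastAssignedMillis;  // when the queue last got a container

        QueueInfo(String name, double policyKey, long lastAssignedMillis) {
            this.name = name;
            this.policyKey = policyKey;
            this.lastAssignedMillis = lastAssignedMillis;
        }
    }

    /** Placeholder for the policy's normal ordering: smaller key first. */
    static final Comparator<QueueInfo> BASE =
        Comparator.comparingDouble((QueueInfo q) -> q.policyKey);

    /** Queues unassigned for at least timeoutMillis sort ahead of all others. */
    static Comparator<QueueInfo> withTimeout(long nowMillis, long timeoutMillis) {
        // Key is "still within timeout"; false sorts before true,
        // so timed-out queues come first.
        Comparator<QueueInfo> timedOutFirst = Comparator.comparing(
            (QueueInfo q) -> nowMillis - q.lastAssignedMillis < timeoutMillis);
        return timedOutFirst.thenComparing(BASE);
    }

    /** Returns the queue names in the order they would be offered containers. */
    static List<String> order(List<QueueInfo> queues, long nowMillis, long timeoutMillis) {
        List<QueueInfo> sorted = new ArrayList<>(queues);
        sorted.sort(withTimeout(nowMillis, timeoutMillis));
        List<String> names = new ArrayList<>();
        for (QueueInfo q : sorted) {
            names.add(q.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<QueueInfo> queues = new ArrayList<>();
        queues.add(new QueueInfo("queueA", 5.0, 0L));     // sorted last by the base order
        queues.add(new QueueInfo("queueB", 1.0, 9_000L)); // recently assigned
        // queueA has waited 10 s, which exceeds the 5 s timeout, so it moves first.
        System.out.println(order(queues, 10_000L, 5_000L));
        // With a 60 s timeout nothing has timed out and the base order applies.
        System.out.println(order(queues, 10_000L, 60_000L));
    }
}
```

In this sketch a timed-out queue only jumps ahead for ordering purposes; how the timeout would be configured and where the last-assignment time would be tracked are left open.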