[ 
https://issues.apache.org/jira/browse/YARN-6568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16002155#comment-16002155
 ] 

zhengchenyu commented on YARN-6568:
-----------------------------------

[~yufeigu]
First, in FairSharePolicy.FairShareComparator.compare, weights only take effect 
when both queues are not hungry, or when comparing two applications. Even where 
weights do take effect, it is painful for an admin to change them dynamically. 

Second, as I said, queueA has only one request, but it is an AM request. queueB 
has many requests, none of which is an AM request. If both queues are hungry 
(meaning both queues' minRes values are large enough), the queues are sorted by 
the variable 'minShareRatio' (in FairSharePolicy.java). In this case, 
minShareRatio = resourceUsage / (resourceUsage + request). The minShareRatio of 
queueA is therefore always bigger than that of queueB, so queueA cannot acquire 
resources and stays pending for a long time. But we know that once the AM 
container launches, more requests will arrive. So I think we need either to 
handle AM requests specially or to set a timeout for pending queues. I lean 
toward the latter because it is easier. 
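
A minimal sketch of this ordering, assuming both queues are hungry and that 
minShareRatio reduces to the formula above. QueueSketch, MinShareRatioSketch, 
and the memory figures are hypothetical illustrations, not FairScheduler classes 
or measured values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a queue; not a FairScheduler class.
final class QueueSketch {
    final String name;
    final long usageMb;   // resources currently in use
    final long pendingMb; // outstanding (not yet satisfied) requests

    QueueSketch(String name, long usageMb, long pendingMb) {
        this.name = name;
        this.usageMb = usageMb;
        this.pendingMb = pendingMb;
    }

    // resourceUsage / (resourceUsage + request), assuming minRes is large
    // enough that it does not cap the denominator.
    double minShareRatio() {
        return (double) usageMb / Math.max(1L, usageMb + pendingMb);
    }
}

final class MinShareRatioSketch {
    // Orders queues by ascending minShareRatio: the smallest ratio is served first.
    static List<String> order(List<QueueSketch> queues) {
        List<QueueSketch> sorted = new ArrayList<>(queues);
        sorted.sort(Comparator.comparingDouble(QueueSketch::minShareRatio));
        List<String> names = new ArrayList<>();
        for (QueueSketch q : sorted) {
            names.add(q.name);
        }
        return names;
    }

    public static void main(String[] args) {
        // queueA: busy queue whose only pending request is one small AM container.
        QueueSketch queueA = new QueueSketch("queueA", 8192, 1024);
        // queueB: same usage, but many pending non-AM requests.
        QueueSketch queueB = new QueueSketch("queueB", 8192, 102400);
        System.out.println(order(Arrays.asList(queueA, queueB))); // [queueB, queueA]
    }
}
```

With these illustrative numbers queueA's ratio is about 0.89 and queueB's is 
about 0.07, so queueB is served first for as long as its requests stay large. 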



> A queue which runs a long time job couldn't acquire any container for long 
> time.
> --------------------------------------------------------------------------------
>
>                 Key: YARN-6568
>                 URL: https://issues.apache.org/jira/browse/YARN-6568
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.7.1
>         Environment: CentOS 7.1
>            Reporter: zhengchenyu
>             Fix For: 2.7.4
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> In our cluster, we found that some applications could not acquire any 
> container for a long time. (Note: we use FairSharePolicy and FairScheduler.)
> First, I found an unreasonable configuration: we had set minRes=maxRes. As a 
> result, some applications stayed pending for a long time, and we killed some 
> large applications to work around it. After we changed this configuration, 
> the problem eased. 
> But the problem is not completely solved. In our cluster, I found that 
> applications in some queues that request few containers stay pending for a 
> long time. 
> I reproduced this in a test cluster. I submitted a DistributedShell 
> application that runs many loo applications to queueA, then I submitted my 
> own YARN application, which constantly requests and releases containers, to 
> queueB. From then on, any application submitted to queueA stays pending!
> We believe this is a problem in FairSharePolicy: it takes each queue's 
> requests into account. So after the queues are sorted, queues that have few 
> requests are always ordered last.
> We know that once the AM container is launched, the requests will increase, 
> but FairSharePolicy cannot distinguish which request is an AM request. I 
> think that if the AM container is assigned, the problem is solved. 
> My colleagues and I discussed this problem. We recommend setting a timeout 
> for each queue, meaning the length of time for which the queue has not been 
> assigned any container. If the timeout expires, we move this queue to the 
> first place in the queue list. 
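
The timeout proposal in the description above could be sketched roughly as 
follows. This is only an illustration under stated assumptions: 
StarvationTimeoutSketch, QueueState, lastAssignedMs, and timeoutMs are 
hypothetical names, not existing FairScheduler code or configuration, and the 
ratio is passed in rather than computed from real queue state.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class StarvationTimeoutSketch {
    // Hypothetical per-queue state; not a FairScheduler class.
    static final class QueueState {
        final String name;
        final double minShareRatio; // usage / (usage + request)
        final long lastAssignedMs;  // when the queue last received a container

        QueueState(String name, double minShareRatio, long lastAssignedMs) {
            this.name = name;
            this.minShareRatio = minShareRatio;
            this.lastAssignedMs = lastAssignedMs;
        }
    }

    // Queues that have gone unassigned for longer than timeoutMs move to the
    // front; all others keep the usual ascending minShareRatio order.
    static List<String> order(List<QueueState> queues, long nowMs, long timeoutMs) {
        Comparator<QueueState> timedOutFirst =
            Comparator.comparingInt((QueueState q) ->
                nowMs - q.lastAssignedMs > timeoutMs ? 0 : 1);
        List<QueueState> sorted = new ArrayList<>(queues);
        sorted.sort(timedOutFirst.thenComparingDouble(q -> q.minShareRatio));
        List<String> names = new ArrayList<>();
        for (QueueState q : sorted) {
            names.add(q.name);
        }
        return names;
    }
}
```

Among queues that have all timed out, this sketch still falls back to 
minShareRatio; ordering them by how long they have waited would be another 
reasonable choice.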



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
