[ 
https://issues.apache.org/jira/browse/YARN-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-4104:
------------------------------
    Description: 
We have more than a thousand queues and several hundred tenants in a busy 
cluster. We get a lot of complaints/questions from the owners/operators of 
queues, such as "Why can't my queue/app get resources for a long while?"

It's really hard to answer such questions.

So we added a diagnostic REST endpoint 
"/ws/v1/cluster/schedule/dryrun/{parentQueueName}" which returns the children 
of that queue, sorted according to the queue's 
SchedulingPolicy.getComparator(). All scheduling parameters of the children 
are also displayed, such as minShare, usage, demand, weight, priority, etc.
Usually we just call "/ws/v1/cluster/schedule/dryrun/root", and the result 
answers such questions by itself.
I feel it's really useful for multi-tenant clusters, and hope it can be 
merged into the mainline.

  was:
We have more than a thousand queues and several hundred tenants in a busy 
cluster. We get a lot of complaints/questions from the owners/operators of 
queues, such as "Why can't my queue/app get resources for a long while?"

It's really hard to answer such questions.

So we added a diagnostic REST endpoint 
"/ws/v1/cluster/schedule/dryrun/{parentQueueName}" which returns the children 
of that queue, sorted according to the queue's 
SchedulingPolicy.getComparator(). All scheduling parameters of the children 
are also displayed, such as minShare, usage, demand, weight, priority, etc.
Usually we just call "/ws/v1/cluster/schedule/root", and the result 
answers such questions by itself.
I feel it's really useful for multi-tenant clusters, and hope it can be 
merged into the mainline.


> dryrun of schedule for diagnostic and tenant's complain
> -------------------------------------------------------
>
>                 Key: YARN-4104
>                 URL: https://issues.apache.org/jira/browse/YARN-4104
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: scheduler
>            Reporter: Hong Zhiguo
>            Assignee: Hong Zhiguo
>            Priority: Minor
>
> We have more than a thousand queues and several hundred tenants in a busy 
> cluster. We get a lot of complaints/questions from the owners/operators of 
> queues, such as "Why can't my queue/app get resources for a long while?"
> It's really hard to answer such questions.
> So we added a diagnostic REST endpoint 
> "/ws/v1/cluster/schedule/dryrun/{parentQueueName}" which returns the 
> children of that queue, sorted according to the queue's 
> SchedulingPolicy.getComparator(). All scheduling parameters of the children 
> are also displayed, such as minShare, usage, demand, weight, priority, etc.
> Usually we just call "/ws/v1/cluster/schedule/dryrun/root", and the result 
> answers such questions by itself.
> I feel it's really useful for multi-tenant clusters, and hope it can be 
> merged into the mainline.
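
To illustrate the idea, here is a minimal sketch of what such a dry run 
computes: take a snapshot of the children's scheduling parameters and sort 
it with a comparator. This is not the attached patch and not the real 
FairScheduler code. The class DryRunSketch, the QueueInfo type and the 
simplified ordering (queues below their minShare first, then lowest 
usage-to-weight ratio, then name) are assumptions made for illustration; a 
real implementation would use the comparator returned by the parent queue's 
SchedulingPolicy.getComparator().

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only: all names here are hypothetical, not YARN classes.
final class DryRunSketch {

    /** Hypothetical snapshot of one child queue's scheduling parameters. */
    static final class QueueInfo {
        final String name;
        final long minShare; // guaranteed resource, e.g. memory in MB
        final long usage;    // resource currently in use
        final long demand;   // resource currently requested
        final double weight; // assumed to be > 0

        QueueInfo(String name, long minShare, long usage, long demand,
                  double weight) {
            this.name = name;
            this.minShare = minShare;
            this.usage = usage;
            this.demand = demand;
            this.weight = weight;
        }

        /** A queue is "needy" while it is below min(minShare, demand). */
        boolean isNeedy() {
            return usage < Math.min(minShare, demand);
        }

        double usageToWeight() {
            return usage / weight;
        }
    }

    // Simplified stand-in for a policy comparator (an assumption):
    // needy queues first, then lower usage/weight, then name as tie-break.
    static final Comparator<QueueInfo> SIMPLIFIED_ORDER =
        Comparator.comparing((QueueInfo q) -> !q.isNeedy())
                  .thenComparingDouble(QueueInfo::usageToWeight)
                  .thenComparing(q -> q.name);

    /** Returns child names in the order the scheduler would visit them. */
    static List<String> dryRun(List<QueueInfo> children) {
        List<QueueInfo> sorted = new ArrayList<>(children); // keep input intact
        sorted.sort(SIMPLIFIED_ORDER);
        List<String> names = new ArrayList<>();
        for (QueueInfo q : sorted) {
            names.add(q.name);
        }
        return names;
    }
}
```

A queue that shows up near the end of such a list, together with its 
displayed parameters, makes it easier to see why it is not being offered 
resources at the moment.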



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
