Hong Zhiguo created YARN-4104:
---------------------------------
Summary: Dry run of scheduling for diagnostics and tenant complaints
Key: YARN-4104
URL: https://issues.apache.org/jira/browse/YARN-4104
Project: Hadoop YARN
Issue Type: Improvement
Components: scheduler
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
We have more than a thousand queues and several hundred tenants in a busy
cluster. We get a lot of complaints and questions from queue owners and
operators, such as "Why can't my queue/app get resources for such a long time?"
It's really hard to answer such questions.
So we added a diagnostic REST endpoint
"/ws/v1/cluster/schedule/dryrun/{parentQueueName}", which returns the children
of the given parent queue, sorted according to that queue's
SchedulingPolicy.getComparator(). All scheduling parameters of the children
are also displayed, such as minShare, usage, demand, weight, and priority.
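
As an illustration only (this is not the actual patch), the ordering that such
an endpoint reports could be sketched roughly as below. QueueInfo, the
simplified comparator, and dryRunOrder are hypothetical names made up for this
sketch. The real endpoint would use whatever SchedulingPolicy.getComparator()
returns for the parent queue.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class DryRunSketch {
    // Hypothetical snapshot of one child queue's scheduling parameters.
    static final class QueueInfo {
        final String name;
        final long minShare;
        final long usage;
        final long demand;
        final double weight; // assumed to be > 0

        QueueInfo(String name, long minShare, long usage, long demand, double weight) {
            this.name = name;
            this.minShare = minShare;
            this.usage = usage;
            this.demand = demand;
            this.weight = weight;
        }
    }

    // A queue is "needy" here when it is running below its minShare.
    static boolean needy(QueueInfo q) {
        return q.usage < q.minShare;
    }

    // Simplified stand-in for a policy comparator (an assumption, not YARN code):
    // needy queues sort first, ordered by usage/minShare; the remaining queues
    // are ordered by usage/weight; ties are broken by name.
    static final Comparator<QueueInfo> COMPARATOR = (a, b) -> {
        boolean na = needy(a);
        boolean nb = needy(b);
        if (na != nb) {
            return na ? -1 : 1;
        }
        double ka = na ? (double) a.usage / a.minShare : a.usage / a.weight;
        double kb = nb ? (double) b.usage / b.minShare : b.usage / b.weight;
        int c = Double.compare(ka, kb);
        return c != 0 ? c : a.name.compareTo(b.name);
    };

    // Returns the child queue names in the order the scheduler would visit them.
    static List<String> dryRunOrder(List<QueueInfo> children) {
        List<QueueInfo> sorted = new ArrayList<>(children); // leave the input untouched
        sorted.sort(COMPARATOR);
        List<String> names = new ArrayList<>();
        for (QueueInfo q : sorted) {
            names.add(q.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<QueueInfo> children = new ArrayList<>();
        children.add(new QueueInfo("a", 100, 150, 200, 1.0));
        children.add(new QueueInfo("b", 100, 20, 300, 1.0));
        children.add(new QueueInfo("c", 0, 50, 50, 2.0));
        System.out.println(dryRunOrder(children));
    }
}
```

With these made-up numbers, queue "b" is the only one below its minShare, so
it sorts first. That is the kind of answer a queue owner is looking for when
asking why another queue is served ahead of theirs.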
Usually we just call it on the root queue ("/ws/v1/cluster/schedule/root"),
and the result answers the question by itself.
I feel it's really useful for multi-tenant clusters, and hope it can be
merged into the mainline.