[jira] [Commented] (YARN-4104) dryrun of schedule for diagnostic and tenant's complain

2018-07-11 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541087#comment-16541087
 ] 

Weiwei Yang commented on YARN-4104:
---

I like this idea too, but it seems this one has become obsolete :(

 

> dryrun of schedule for diagnostic and tenant's complain
> ---
>
> Key: YARN-4104
> URL: https://issues.apache.org/jira/browse/YARN-4104
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Hong Zhiguo
>Assignee: Hong Zhiguo
>Priority: Minor
>
> We have more than a thousand queues and several hundred tenants in a busy 
> cluster. We get a lot of complaints and questions from queue owners and 
> operators asking "Why can't my queue/app get resources for a long while?"
> It's really hard to answer such questions.
> So we added a diagnostic REST endpoint 
> "/ws/v1/cluster/schedule/dryrun/{parentQueueName}" which returns the list of 
> its children, sorted according to its SchedulingPolicy.getComparator().
> All scheduling parameters of the children are also displayed, such as 
> minShare, usage, demand, weight, priority, etc.
> Usually we just call "/ws/v1/cluster/schedule/dryrun/root", and the result 
> answers the questions by itself.
> I feel it's really useful for multi-tenant clusters, and hope it can be 
> merged into the mainline.
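
The ordering step described above can be illustrated with a minimal sketch. This is not the code from the proposed patch: the real ordering comes from the queue's SchedulingPolicy.getComparator(), whereas the comparator below is a simplified stand-in loosely modelled on fair-share ordering (queues below their min share first, then lower usage-to-weight ratio). It looks at memory only, and the names `Child`, `compare` and `dry_run` are hypothetical.

```python
from functools import cmp_to_key
from typing import List, NamedTuple


class Child(NamedTuple):
    """Scheduling parameters of one child queue (memory only, for brevity)."""
    name: str
    min_share: int
    usage: int
    weight: float  # assumed to be > 0


def compare(a: Child, b: Child) -> int:
    """Return a negative number if `a` should be offered resources before `b`.

    Simplified assumption: queues using less than their min share come first,
    ordered by usage/min_share; the rest are ordered by usage/weight.
    """
    a_needy = a.usage < a.min_share
    b_needy = b.usage < b.min_share
    if a_needy and not b_needy:
        return -1
    if b_needy and not a_needy:
        return 1
    if a_needy and b_needy:
        key_a = a.usage / max(a.min_share, 1)
        key_b = b.usage / max(b.min_share, 1)
    else:
        key_a = a.usage / a.weight
        key_b = b.usage / b.weight
    if key_a != key_b:
        return -1 if key_a < key_b else 1
    # Fall back to the queue name so the order is deterministic.
    return (a.name > b.name) - (a.name < b.name)


def dry_run(children: List[Child]) -> List[Child]:
    """Return the children in the order the comparator would visit them."""
    return sorted(children, key=cmp_to_key(compare))
```

Calling `dry_run` on the children of a parent queue gives the kind of ranked list the endpoint is described as returning, without assigning any resources.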



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4104) dryrun of schedule for diagnostic and tenant's complain

2015-09-01 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14726646#comment-14726646
 ] 

Rohith Sharma K S commented on YARN-4104:
-

Thanks [~zhiguohong] for bringing up the customer pain point. YARN-4091 is 
trying to make debuggability easier, so I am making this a subtask of YARN-4091.

> dryrun of schedule for diagnostic and tenant's complain
> ---
>
> Key: YARN-4104
> URL: https://issues.apache.org/jira/browse/YARN-4104
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Hong Zhiguo
>Assignee: Hong Zhiguo
>Priority: Minor
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4104) dryrun of schedule for diagnostic and tenant's complain

2015-09-01 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14726780#comment-14726780
 ] 

Hong Zhiguo commented on YARN-4104:
---

For better human readability, the output is plain text.
{code}
0001 root.g_isd_999 min(6143,1) max(12288,4) dem(12288,4) use(52194816,15652) weight=1
0002 root.g_ieg_ttlz_ttlz_import_tdbank min(61439,19) max(1228800,400) dem(13056,6) use(1536,1) weight=800
0003 root.g_ieg_wegalaxy_wegalaxy_import_tdbank min(61439,19) max(1228800,400) dem(13056,6) use(1536,1) weight=800
0004 root.safety_cloud min(18432000,6000) max(18432000,6000) dem(18432000,6000) use(10585088,5169) weight=6000
0005 root.g_teg_datacompress min(6144000,2000) max(12288000,4000) dem(52224,27) use(32256,18) weight=2400
0006 root.g_input_output_hlw min(368639,119) max(1474560,480) dem(20480,19) use(13312,12) weight=800
0007 root.g_ecc_express_ecc_express min(1474559,479) max(5898240,1920) dem(9472,4) use(6272,3) weight=832
0008 root.g_iegv2_datacompress min(6144000,2000) max(12288000,4000) dem(46080,24) use(34560,19) weight=2400
0009 root.g_raid_datacompress min(6144000,2000) max(12288000,4000) dem(65280,35) use(52992,29) weight=2400
0010 root.g_input_output_ieg_tdbank min(2764799,899) max(11059200,3600) dem(177408,90) use(145152,76) weight=1210
0011 root.g_ieg_iegpdata_idata_subject_analysis min(1228799,459) max(9830400,3680) dem(1372928,601) use(1022720,449) weight=814
...
{code}
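
For scripting against this output, a small parser could look like the sketch below. The field layout is inferred only from the sample above; reading each pair as (memory, vcores) is an assumption, and `QueueRow` and `parse_dryrun` are hypothetical names that are not part of the patch.

```python
import re
from typing import List, NamedTuple, Tuple

Pair = Tuple[int, int]  # assumed to be (memory, vcores)


class QueueRow(NamedTuple):
    rank: int
    name: str
    min_share: Pair
    max_share: Pair
    demand: Pair
    usage: Pair
    weight: float


_PAIR = r"\((\d+),(\d+)\)"
_ROW = re.compile(
    rf"(\d+)\s+(\S+)\s+min{_PAIR}\s+max{_PAIR}\s+dem{_PAIR}\s+use{_PAIR}"
    r"\s+weight=(\d+(?:\.\d+)?)"
)


def parse_dryrun(text: str) -> List[QueueRow]:
    """Parse the plain-text dry-run listing into one QueueRow per queue."""
    # Mail clients may wrap a record across lines, so collapse all
    # whitespace and scan the whole text instead of going line by line.
    flat = " ".join(text.split())
    rows = []
    for m in _ROW.finditer(flat):
        g = m.groups()
        rows.append(QueueRow(
            rank=int(g[0]),
            name=g[1],
            min_share=(int(g[2]), int(g[3])),
            max_share=(int(g[4]), int(g[5])),
            demand=(int(g[6]), int(g[7])),
            usage=(int(g[8]), int(g[9])),
            weight=float(g[10]),
        ))
    return rows
```

Lines that do not match the record pattern, such as the trailing "...", are skipped.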



[jira] [Commented] (YARN-4104) dryrun of schedule for diagnostic and tenant's complain

2015-09-01 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14726697#comment-14726697
 ] 

Hong Zhiguo commented on YARN-4104:
---

It only works for the Fair Scheduler at the moment, because that is the only 
scheduler we use. But it would be easy to support other schedulers.
Can I create several third-level subtasks under this one?



[jira] [Commented] (YARN-4104) dryrun of schedule for diagnostic and tenant's complain

2015-09-01 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14726688#comment-14726688
 ] 

Karthik Kambatla commented on YARN-4104:


Sounds very useful. Does this work for any scheduler or is it specific to one 
of the schedulers? 



[jira] [Commented] (YARN-4104) dryrun of schedule for diagnostic and tenant's complain

2015-09-01 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14726701#comment-14726701
 ] 

Rohith Sharma K S commented on YARN-4104:
-

[~zhiguohong] would you mind attaching a sample of the REST output, i.e. for 
{{/ws/v1/cluster/schedule/dryrun/root}}?
