[jira] [Commented] (YARN-4104) dryrun of schedule for diagnostic and tenant's complain
[ https://issues.apache.org/jira/browse/YARN-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541087#comment-16541087 ]

Weiwei Yang commented on YARN-4104:
---
I like this idea too, but it seems this one has become obsolete :(

> dryrun of schedule for diagnostic and tenant's complain
> ---
>
> Key: YARN-4104
> URL: https://issues.apache.org/jira/browse/YARN-4104
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: scheduler
> Reporter: Hong Zhiguo
> Assignee: Hong Zhiguo
> Priority: Minor
>
> We have more than a thousand queues and several hundred tenants in a busy
> cluster. We get a lot of complaints/questions from owners/operators of queues
> about "Why can't my queue/app get resources for a long while?"
> It's really hard to answer such questions.
> So we added a diagnostic REST endpoint
> "/ws/v1/cluster/schedule/dryrun/{parentQueueName}" which returns the list of
> the parent queue's children, sorted according to its
> SchedulingPolicy.getComparator().
> All scheduling parameters of the children are also displayed, such as
> minShare, usage, demand, weight, priority etc.
> Usually we just call "/ws/v1/cluster/schedule/dryrun/root", and the result
> answers the questions by itself.
> I feel it's really useful for multi-tenant clusters, and hope it could be
> merged into the mainline.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
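As an illustration of the idea in the description, the sketch below sorts the children of a parent queue with a pluggable comparator and returns them in the order they would be offered resources. It is a simplified, memory-only stand-in loosely modelled on fair-share ordering, not Hadoop's implementation of SchedulingPolicy.getComparator(). The names QueueSnapshot, simplified_fair_compare and dry_run are hypothetical, and the queue values are invented for the example.

```python
from dataclasses import dataclass
from functools import cmp_to_key


@dataclass(frozen=True)
class QueueSnapshot:
    """Hypothetical read-only view of one child queue (memory only)."""
    name: str
    min_share: int
    demand: int
    usage: int
    weight: float  # assumed to be > 0


def simplified_fair_compare(a: QueueSnapshot, b: QueueSnapshot) -> int:
    """Return a negative number if `a` should be served before `b`.

    Assumption: queues below their effective min share come first; among
    those, the lower usage/minShare ratio wins; otherwise the lower
    usage/weight ratio wins; ties are broken by name.
    """
    a_min = min(a.min_share, a.demand)
    b_min = min(b.min_share, b.demand)
    a_needy = a.usage < a_min
    b_needy = b.usage < b_min
    if a_needy != b_needy:
        return -1 if a_needy else 1
    if a_needy:
        a_key = a.usage / max(a_min, 1)
        b_key = b.usage / max(b_min, 1)
    else:
        a_key = a.usage / a.weight
        b_key = b.usage / b.weight
    if a_key != b_key:
        return -1 if a_key < b_key else 1
    return (a.name > b.name) - (a.name < b.name)


def dry_run(children, compare=simplified_fair_compare):
    """Sort a copy of `children`; nothing is actually scheduled."""
    return sorted(children, key=cmp_to_key(compare))


children = [
    QueueSnapshot("root.a", min_share=100, demand=500, usage=400, weight=1.0),
    QueueSnapshot("root.b", min_share=200, demand=500, usage=50, weight=1.0),
    QueueSnapshot("root.c", min_share=100, demand=500, usage=300, weight=2.0),
]
ordered = [q.name for q in dry_run(children)]
print(ordered)
```

Here root.b sorts first because it is the only queue below its min share, and root.c precedes root.a because its usage-to-weight ratio is lower. Passing the comparator as a parameter mirrors the point that the ordering comes from the queue's scheduling policy rather than from the endpoint itself.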
[jira] [Commented] (YARN-4104) dryrun of schedule for diagnostic and tenant's complain
[ https://issues.apache.org/jira/browse/YARN-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14726646#comment-14726646 ]

Rohith Sharma K S commented on YARN-4104:
-
Thanks [~zhiguohong] for bringing up the customer pain point. YARN-4091 is trying to make debuggability easier, so I am making this a subtask of YARN-4091.
[jira] [Commented] (YARN-4104) dryrun of schedule for diagnostic and tenant's complain
[ https://issues.apache.org/jira/browse/YARN-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14726780#comment-14726780 ]

Hong Zhiguo commented on YARN-4104:
---
For better human readability, it's plain text.
{code}
0001 root.g_isd_999 min(6143,1) max(12288,4) dem(12288,4) use(52194816,15652) weight=1
0002 root.g_ieg_ttlz_ttlz_import_tdbank min(61439,19) max(1228800,400) dem(13056,6) use(1536,1) weight=800
0003 root.g_ieg_wegalaxy_wegalaxy_import_tdbank min(61439,19) max(1228800,400) dem(13056,6) use(1536,1) weight=800
0004 root.safety_cloud min(18432000,6000) max(18432000,6000) dem(18432000,6000) use(10585088,5169) weight=6000
0005 root.g_teg_datacompress min(6144000,2000) max(12288000,4000) dem(52224,27) use(32256,18) weight=2400
0006 root.g_input_output_hlw min(368639,119) max(1474560,480) dem(20480,19) use(13312,12) weight=800
0007 root.g_ecc_express_ecc_express min(1474559,479) max(5898240,1920) dem(9472,4) use(6272,3) weight=832
0008 root.g_iegv2_datacompress min(6144000,2000) max(12288000,4000) dem(46080,24) use(34560,19) weight=2400
0009 root.g_raid_datacompress min(6144000,2000) max(12288000,4000) dem(65280,35) use(52992,29) weight=2400
0010 root.g_input_output_ieg_tdbank min(2764799,899) max(11059200,3600) dem(177408,90) use(145152,76) weight=1210
0011 root.g_ieg_iegpdata_idata_subject_analysis min(1228799,459) max(9830400,3680) dem(1372928,601) use(1022720,449) weight=814
...
{code}
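Because the sample output is plain text, a small parser can turn each row into structured data for further checks. The following is a minimal sketch based only on the rows shown in this message; the field order is taken from the sample, while the reading of each pair as (memory, vcores) is an assumption, and Resource, DryRunRow, parse_row and unmet_demand are hypothetical names.

```python
import re
from typing import NamedTuple


class Resource(NamedTuple):
    memory: int  # assumption: first number of each pair is memory
    vcores: int  # assumption: second number of each pair is vcores


class DryRunRow(NamedTuple):
    rank: int
    queue: str
    min_share: Resource
    max_share: Resource
    demand: Resource
    usage: Resource
    weight: float


_PAIR = r"\((\d+),(\d+)\)"
_ROW = re.compile(
    r"(\d+)\s+(\S+)"
    r"\s+min" + _PAIR +
    r"\s+max" + _PAIR +
    r"\s+dem" + _PAIR +
    r"\s+use" + _PAIR +
    r"\s+weight=(\d+(?:\.\d+)?)"
)


def parse_row(line: str) -> DryRunRow:
    """Parse one row of the plain-text listing shown in the comment."""
    match = _ROW.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"unrecognised dry-run row: {line!r}")
    groups = match.groups()
    nums = [int(value) for value in groups[2:10]]
    return DryRunRow(
        rank=int(groups[0]),
        queue=groups[1],
        min_share=Resource(*nums[0:2]),
        max_share=Resource(*nums[2:4]),
        demand=Resource(*nums[4:6]),
        usage=Resource(*nums[6:8]),
        weight=float(groups[10]),
    )


def unmet_demand(row: DryRunRow) -> Resource:
    """Demand not yet covered by usage, floored at zero per dimension."""
    return Resource(
        max(row.demand.memory - row.usage.memory, 0),
        max(row.demand.vcores - row.usage.vcores, 0),
    )


sample = ("0002 root.g_ieg_ttlz_ttlz_import_tdbank min(61439,19) "
          "max(1228800,400) dem(13056,6) use(1536,1) weight=800")
row = parse_row(sample)
print(row.queue, unmet_demand(row))
```

The "..." continuation line in the listing does not match the pattern, so a caller would skip it or catch the ValueError.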
[jira] [Commented] (YARN-4104) dryrun of schedule for diagnostic and tenant's complain
[ https://issues.apache.org/jira/browse/YARN-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14726697#comment-14726697 ]

Hong Zhiguo commented on YARN-4104:
---
It only works for the fair scheduler at the moment, because that is the only scheduler we use, but it would be easy to support other schedulers. Can I create several third-level subtasks under this one?
[jira] [Commented] (YARN-4104) dryrun of schedule for diagnostic and tenant's complain
[ https://issues.apache.org/jira/browse/YARN-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14726688#comment-14726688 ]

Karthik Kambatla commented on YARN-4104:
---
Sounds very useful. Does this work for any scheduler or is it specific to one of the schedulers?
[jira] [Commented] (YARN-4104) dryrun of schedule for diagnostic and tenant's complain
[ https://issues.apache.org/jira/browse/YARN-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14726701#comment-14726701 ]

Rohith Sharma K S commented on YARN-4104:
-
[~zhiguohong] would you mind attaching the sample REST output, i.e. for {{/ws/v1/cluster/schedule/dryrun/root}}?
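For readers who want to see what requesting that output might look like, here is a minimal sketch. The path is the one proposed in this issue and is not known to exist in mainline Hadoop; the base address rm.example.com:8088 and the helper names dryrun_url and fetch_dryrun are assumptions made for the example.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Path proposed in this issue; it is not a documented mainline endpoint.
DRYRUN_PATH = "/ws/v1/cluster/schedule/dryrun/"


def dryrun_url(rm_base: str, parent_queue: str = "root") -> str:
    """Build the dry-run URL for one parent queue."""
    return rm_base.rstrip("/") + DRYRUN_PATH + quote(parent_queue, safe="")


def fetch_dryrun(rm_base: str, parent_queue: str = "root",
                 timeout: float = 10.0) -> str:
    """Fetch the plain-text listing; requires a ResourceManager that
    actually carries the patch discussed in this issue."""
    with urlopen(dryrun_url(rm_base, parent_queue), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Hypothetical ResourceManager web address, for illustration only.
url = dryrun_url("http://rm.example.com:8088/")
print(url)
```

Only dryrun_url is exercised here; fetch_dryrun is left uncalled because it needs a live cluster with the endpoint deployed.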