[jira] [Commented] (YARN-5448) Resource in Cluster Metrics is not sum of resources in all nodes of all partitions
[ https://issues.apache.org/jira/browse/YARN-5448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405320#comment-15405320 ]

Naganarasimha G R commented on YARN-5448:
-----------------------------------------

Thanks for sharing your thoughts [~wangda],

bq. Sorry, I am not quite sure about this. Could you explain?

What I meant was that these additional non-usable resource columns in the cluster metrics table will be useful only when there is a configuration error; once it is corrected, they are not of much use. Basically, their purpose will be almost nil if the cluster is configured correctly. One alternative I can think of is to show these columns only when partitions are not mapped to queues, and to hide them if the value is zero. Thoughts?

bq. which can help answer questions like "why I cannot fully utilize the cluster".

One viewpoint I had on this was captured in the above [comment | https://issues.apache.org/jira/browse/YARN-5448?focusedCommentId=15399248=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15399248], but again it is a viewpoint and debatable, so I don't have any hard objection to having it.

bq. It's better to add non-usable nodes as a separate col, but to me it may not be a full replacement for total non-usable resources.

Maybe I did not get the rationale for why {{"total non-usable resources"}} would be better than {{"non-usable nodes"}}. Could you elaborate on your view?

> Resource in Cluster Metrics is not sum of resources in all nodes of all partitions
> ----------------------------------------------------------------------------------
>
>                 Key: YARN-5448
>                 URL: https://issues.apache.org/jira/browse/YARN-5448
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, resourcemanager, webapp
>    Affects Versions: 2.7.2
>            Reporter: Naganarasimha G R
>            Assignee: Naganarasimha G R
>         Attachments: NodesPage.png, schedulerPage.png
>
>
> Currently the resource info in Cluster Metrics is derived from Queue Metrics's *available resource + allocated resource*. Hence, if some nodes belong to a partition that is not associated with any queue, the capacity scheduler partition hierarchy shows those nodes' resources under the partition, but Cluster Metrics doesn't show them.
> Apart from this, the Metrics overview table is also shown on the Nodes page. So if we show resource info from Queue Metrics, the user will not be able to correlate it with the nodes listed there. (I have attached images for the same.)
> IIUC, the idea of not showing these resources in the *Metrics overview table* is to highlight that the configuration is not proper. This needs to be somehow conveyed through the partition-by-queue-hierarchy chart.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5448) Resource in Cluster Metrics is not sum of resources in all nodes of all partitions
[ https://issues.apache.org/jira/browse/YARN-5448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402616#comment-15402616 ]

Wangda Tan commented on YARN-5448:
----------------------------------

[~Naganarasimha],

bq. 1. This is a one-off wrong-configuration scenario; if we add additional columns for it, then in general usage (after correction) they are not of much use.

Sorry, I am not quite sure about this. Could you explain?

bq. 2. Currently there are only two resources being monitored (cpu and memory); what about other resources that we want to add in future? Adding multiple columns for each resource for this purpose doesn't seem good.

These are all legacy issues of the existing UI; in the new UI it will be easier to show:
a. For each resource type, a pie chart of (used / available / non-usable), OR
b. A vector of (used / available / non-usable).
I think a. is more intuitive. To keep it simple, we can consider only mem / vcores for now; adding two cols doesn't sound like a problem to me.

bq. 3. Admin can immediately notice that there are some labels which are not configured or part of cluster resource : Well, it's anyway related to the Scheduler page; if the admin sees a warning in the partition-by-queue-hierarchy chart, IMHO it should be sufficient.

To me it's not sufficient; we should have a way to show the total resources in all states (used / available-and-usable / non-usable) in a single place, which can help answer questions like "why I cannot fully utilize the cluster".

bq. One other approach I could think of: instead of mentioning the quantity of total resource under each type (usable and unusable), how about the number of unusable (/ unallocatable) nodes?

It's better to add non-usable nodes as a separate col, but to me it may not be a full replacement for total non-usable resources.

Thoughts?
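To make the (used / available / non-usable) vector in option b. concrete, here is a minimal sketch. It is illustrative Python, not YARN code: the `Node` record, the partition name `gpu`, and the `queue_partitions` set are hypothetical, and "non-usable" is assumed to mean capacity on nodes whose partition is not accessible to any queue.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical node record; not a YARN class."""
    partition: str  # "" stands for the default partition
    total: dict     # resource type -> capacity
    used: dict      # resource type -> amount currently allocated


def resource_vector(nodes, queue_partitions):
    """Return {resource type: (used, available, non_usable)}.

    Assumption: capacity on a node whose partition is not accessible
    to any queue is counted as non-usable.
    """
    vector = {}
    for node in nodes:
        usable = node.partition in queue_partitions
        for rtype, capacity in node.total.items():
            used, available, non_usable = vector.get(rtype, (0, 0, 0))
            if usable:
                allocated = node.used.get(rtype, 0)
                used += allocated
                available += capacity - allocated
            else:
                non_usable += capacity
            vector[rtype] = (used, available, non_usable)
    return vector


nodes = [
    Node("", {"memory_mb": 8192, "vcores": 8}, {"memory_mb": 2048, "vcores": 2}),
    Node("gpu", {"memory_mb": 16384, "vcores": 16}, {}),
]
# Only the default partition is mapped to a queue in this made-up setup.
vector = resource_vector(nodes, queue_partitions={""})
```

Because the result is keyed by resource type, a new resource type would show up as one more entry in the vector rather than as a new set of columns, which is the concern raised in point 2.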
[jira] [Commented] (YARN-5448) Resource in Cluster Metrics is not sum of resources in all nodes of all partitions
[ https://issues.apache.org/jira/browse/YARN-5448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401408#comment-15401408 ]

Naganarasimha G R commented on YARN-5448:
-----------------------------------------

Thanks for sharing your thoughts [~wangda],

From my side, I had a few points which I mentioned in my earlier [comment|https://issues.apache.org/jira/browse/YARN-5448?focusedCommentId=15398991=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15398991]; I hope you have considered them.
One other approach I could think of: instead of mentioning the quantity of total resource under each type (usable and unusable), how about the number of unusable (/ unallocatable) nodes?
[jira] [Commented] (YARN-5448) Resource in Cluster Metrics is not sum of resources in all nodes of all partitions
[ https://issues.apache.org/jira/browse/YARN-5448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15399248#comment-15399248 ]

Naganarasimha G R commented on YARN-5448:
-----------------------------------------

[~sunilg]

bq. But resource allocation is only some % of X (not for the full cluster).

Suppose the node is mapped to an exclusive partition and users are asking for *other partitions*; then, irrespective of whether this partition is accessible to any queue, *resource allocation is only some % of X*. And if users try to submit an app when the partition is *not* mapped, application submission fails anyway with an appropriate exception. So it will not come as a surprise to the admin, and it is not something that will go unnoticed.

bq. But I am not sure how these labels can be made visible enough to the user in the scheduler UI to convey the issue. Could you share how it would appear?

I was planning to show the partition information in a different color (red) with a tooltip indicating "the partition is not assigned to any leaf queue".
[jira] [Commented] (YARN-5448) Resource in Cluster Metrics is not sum of resources in all nodes of all partitions
[ https://issues.apache.org/jira/browse/YARN-5448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15399069#comment-15399069 ]

Sunil G commented on YARN-5448:
-------------------------------

From a use case perspective, two questions will pop up from users:
# I have *N* nodes in my cluster, each configured with *M* GB of resource. But the cluster resource in the web UI is not showing *N x M* resource. (Current behavior)
# I have *X* resource in my cluster and the cluster resource in the web UI displays the same. But resource allocation is only some % of *X* (not for the full cluster).

The 2nd question will pop up later, after fixing in the way proposed here. I think these two are debatable, and we can choose which one we want to answer.

As for the second part of your proposal, you are planning to show such labels in the scheduler UI. But I am not sure how these labels can be made visible enough to the user in the scheduler UI to convey the issue. Could you share how it would appear?

Hence I thought we could have a column like "Non usable cluster resource", and we can hide it if labels are not enabled.

As mentioned, we can wait for other folks to pitch in too.
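The first use case can be illustrated with a small worked example. It is plain Python with made-up numbers, not YARN code; the node list, the partition names `x` and `y`, the queue names, and the assumption that the displayed total only accounts for partitions accessible to at least one queue are all illustrative.

```python
# Hypothetical cluster: N = 4 nodes of M = 8 GB each, over three partitions.
NODE_MEMORY_GB = 8
node_partitions = ["", "", "x", "y"]  # "" is the default partition

# Hypothetical queue configuration: partition "y" is mapped to no queue.
queue_accessible_partitions = {
    "root.a": {"", "x"},
    "root.b": {""},
}


def sum_of_all_nodes(partitions, memory_per_node_gb):
    """N x M: what an admin expects when counting active nodes."""
    return len(partitions) * memory_per_node_gb


def total_seen_by_queues(partitions, memory_per_node_gb, queue_partitions):
    """Total when only partitions reachable from some queue are counted."""
    reachable = set().union(*queue_partitions.values())
    return sum(memory_per_node_gb for p in partitions if p in reachable)


expected_gb = sum_of_all_nodes(node_partitions, NODE_MEMORY_GB)
displayed_gb = total_seen_by_queues(
    node_partitions, NODE_MEMORY_GB, queue_accessible_partitions)
non_usable_gb = expected_gb - displayed_gb
print(expected_gb, displayed_gb, non_usable_gb)
```

In this setup the gap between the two totals is exactly the capacity of the node in the unmapped partition, which is the quantity a "Non usable cluster resource" column would report.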
[jira] [Commented] (YARN-5448) Resource in Cluster Metrics is not sum of resources in all nodes of all partitions
[ https://issues.apache.org/jira/browse/YARN-5448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15398991#comment-15398991 ]

Naganarasimha G R commented on YARN-5448:
-----------------------------------------

Thanks [~sunilg] for your thoughts,
Well yes, it's based on perception and it's debatable, but my two cents against your approach are:
# This is a one-off wrong-configuration scenario; if we add additional columns for it, then in general usage (after correction) they are not of much use.
# Currently there are only two resources being monitored (cpu and memory); what about other resources that we want to add in future? Adding multiple columns for each resource for this purpose doesn't seem good.
# ??Admin can immediately notice that there are some labels which are not configured or part of cluster resource?? : Well, it's anyway related to the Scheduler page; if the admin sees a warning in the *partition-by-queue-hierarchy chart*, IMHO it should be sufficient.

I would like to get the opinion of others too.
[jira] [Commented] (YARN-5448) Resource in Cluster Metrics is not sum of resources in all nodes of all partitions
[ https://issues.apache.org/jira/browse/YARN-5448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15398962#comment-15398962 ]

Sunil G commented on YARN-5448:
-------------------------------

bq. Metrics overview table needs to show the resources of all the active NMs

[~Naganarasimha Garla]. I think it's better to show non-configured label resources as a separate column in the metrics table. A few advantages:
- Admin can immediately notice that there are some labels which are not configured or part of cluster resource, and hence cannot be used. Any action can be taken with one look at the metrics table. As I see it, admins may choose not to configure a label for reasons like a) some labels need to be taken out of rotation, b) it may be a configuration mistake, etc.
- With the proposed approach, if we add non-configured label resources to the cluster resources, we need to go to the scheduler page to get details. Yes, it's a matter of perception of how the data is seen. I think it's better if the metrics are available separately, up front in the cluster metrics table.
- I think we can also have the scheduler show some info message for unused or non-configured labels. It will definitely help with in-depth analysis.

Thoughts?
[jira] [Commented] (YARN-5448) Resource in Cluster Metrics is not sum of resources in all nodes of all partitions
[ https://issues.apache.org/jira/browse/YARN-5448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15398927#comment-15398927 ]

Naganarasimha G R commented on YARN-5448:
-----------------------------------------

As per discussion with [~wangda], we had considered the following options:
# *Metrics overview table* needs to show the resources of all the active NMs
# In the *partition-by-queue-hierarchy chart*, we can show the non-usable partition (wi) and, under the partition, show "the partition is not assigned to any queue". Or the other option is to show the partition resource in a different font/color (red) with a tooltip saying "the partition is not assigned to any queue"

Thoughts?
CC [~sunilg],[~brahma], [~kanaka] & [~bibinchundatt]