[jira] [Commented] (YARN-2293) Scoring for NMs to identify a better candidate to launch AMs
[ https://issues.apache.org/jira/browse/YARN-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526441#comment-14526441 ]

Sunil G commented on YARN-2293:
-------------------------------

Hi [~zjshen]
This work has moved to YARN-2005, and I will share a basic prototype there soon. This issue can be marked as a duplicate of YARN-2005.

Scoring for NMs to identify a better candidate to launch AMs
------------------------------------------------------------

Key: YARN-2293
URL: https://issues.apache.org/jira/browse/YARN-2293
Project: Hadoop YARN
Issue Type: Improvement
Components: nodemanager, resourcemanager
Reporter: Sunil G
Assignee: Sunil G

The container exit status reported by an NM indicates why the container failed. Sometimes the cause is a container launch problem on the NM itself. In a heterogeneous cluster, machines with weak hardware may cause more failures, so it would be better to launch AMs there less often.
To be clear, container failures caused by a buggy job should not decrease a node's score.
If the RM maintains a score for each NM based on container exit status, then NMs with better scores can be preferred for launching AMs.
Thoughts?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-2293) Scoring for NMs to identify a better candidate to launch AMs
[ https://issues.apache.org/jira/browse/YARN-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524229#comment-14524229 ]

Zhijie Shen commented on YARN-2293:
-----------------------------------

Are we still interested in this improvement? If not, we can close this JIRA as Won't Fix.
[jira] [Commented] (YARN-2293) Scoring for NMs to identify a better candidate to launch AMs
[ https://issues.apache.org/jira/browse/YARN-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14182514#comment-14182514 ]

sna commented on YARN-2293:
---------------------------

Have you achieved the target?
[jira] [Commented] (YARN-2293) Scoring for NMs to identify a better candidate to launch AMs
[ https://issues.apache.org/jira/browse/YARN-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063772#comment-14063772 ]

Sunil G commented on YARN-2293:
-------------------------------

Thank you [~jlowe] for the update.
Yes, if this is treated as blacklisting, it falls generally under YARN-2005. But as you said, in the extreme case even the lowest-scored NM can eventually get an AM.
I will continue to share our thoughts and approach for this part on YARN-2005.
[jira] [Commented] (YARN-2293) Scoring for NMs to identify a better candidate to launch AMs
[ https://issues.apache.org/jira/browse/YARN-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062530#comment-14062530 ]

Jason Lowe commented on YARN-2293:
----------------------------------

This sounds very similar to YARN-2005, if a bit more general. This approach sounds like it could support a gray area for NMs where it really doesn't like to launch AMs on a node but may choose to do so anyway if that's the only place it can find.

It may be more fruitful to continue this discussion over on YARN-2005 and hash through how exit status would map to scoring adjustments, how the score would affect scheduling, and work through various corner cases.
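
For readers following the thread, here is a minimal Java sketch of the scoring idea from the issue description and of the "gray area" Jason Lowe describes. It is not code from YARN or from any patch on YARN-2293 or YARN-2005. The class NodeScoreSketch, the ExitCause enum, and the score and penalty constants are all hypothetical names and values chosen for illustration. The sketch assumes that container exit statuses have already been classified as node-caused or application-caused, which is one of the open questions the thread leaves to YARN-2005.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class NodeScoreSketch {
    /** Hypothetical classification of a container exit; not a YARN type. */
    public enum ExitCause { SUCCESS, NODE_FAULT, APP_FAULT }

    // Assumed values for illustration only; the thread proposes no numbers.
    public static final int INITIAL_SCORE = 100;
    public static final int NODE_FAULT_PENALTY = 20;
    public static final int SUCCESS_REWARD = 5;

    private final Map<String, Integer> scores = new HashMap<>();

    /** Returns the current score of a node; unseen nodes start at INITIAL_SCORE. */
    public int score(String node) {
        return scores.getOrDefault(node, INITIAL_SCORE);
    }

    /** Adjusts a node's score after one of its containers exits. */
    public void recordExit(String node, ExitCause cause) {
        int current = score(node);
        switch (cause) {
            case NODE_FAULT:
                // Failures attributed to the node lower its score, floored at 0.
                scores.put(node, Math.max(0, current - NODE_FAULT_PENALTY));
                break;
            case SUCCESS:
                // Successful containers let a node recover, capped at the initial score.
                scores.put(node, Math.min(INITIAL_SCORE, current + SUCCESS_REWARD));
                break;
            default:
                // APP_FAULT: a buggy job must not decrease the node's score.
                break;
        }
    }

    /**
     * Picks the highest-scored candidate for an AM. A low score never excludes
     * a node outright, so the lowest-scored NM is still chosen when it is the
     * only candidate.
     */
    public Optional<String> pickNodeForAM(List<String> candidates) {
        String best = null;
        for (String node : candidates) {
            if (best == null || score(node) > score(best)) {
                best = node;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        NodeScoreSketch rm = new NodeScoreSketch();
        rm.recordExit("nm1", ExitCause.NODE_FAULT);
        rm.recordExit("nm2", ExitCause.APP_FAULT);
        System.out.println(rm.pickNodeForAM(List.of("nm1", "nm2")));
        System.out.println(rm.pickNodeForAM(List.of("nm1")));
    }
}
```

Because the score only ranks candidates and never filters them, this differs from a hard blacklist: a poorly scored node is avoided while better nodes are available but can still host an AM as a last resort.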