[jira] [Resolved] (YARN-11667) Federation: ResourceRequestComparator occurs NPE when using low version of hadoop submit application
[ https://issues.apache.org/jira/browse/YARN-11667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

qiuliang resolved YARN-11667.
-----------------------------
    Resolution: Won't Do

> Federation: ResourceRequestComparator occurs NPE when using low version of hadoop submit application
> -----------------------------------------------------------------------------------------------------
>
>                 Key: YARN-11667
>                 URL: https://issues.apache.org/jira/browse/YARN-11667
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: amrmproxy
>    Affects Versions: 3.4.0
>            Reporter: qiuliang
>            Priority: Major
>              Labels: pull-request-available
>
> When an application is submitted with a lower version of Hadoop, the ResourceRequest built by the AM has no ExecutionTypeRequest. After the ResourceRequest is submitted to AMRMProxy, an NPE occurs when AMRMProxy rebuilds the AllocateRequest to add the ResourceRequest to its ask list.
[jira] [Updated] (YARN-11667) Federation: ResourceRequestComparator occurs NPE when using low version of hadoop submit application
[ https://issues.apache.org/jira/browse/YARN-11667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

qiuliang updated YARN-11667:
----------------------------
    External issue URL:   (was: https://github.com/apache/hadoop/pull/6648)

> Federation: ResourceRequestComparator occurs NPE when using low version of hadoop submit application
> -----------------------------------------------------------------------------------------------------
>
>                 Key: YARN-11667
>                 URL: https://issues.apache.org/jira/browse/YARN-11667
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: amrmproxy
>    Affects Versions: 3.4.0
>            Reporter: qiuliang
>            Priority: Major
>
> When an application is submitted with a lower version of Hadoop, the ResourceRequest built by the AM has no ExecutionTypeRequest. After the ResourceRequest is submitted to AMRMProxy, an NPE occurs when AMRMProxy rebuilds the AllocateRequest to add the ResourceRequest to its ask list.
[jira] [Updated] (YARN-11667) Federation: ResourceRequestComparator occurs NPE when using low version of hadoop submit application
[ https://issues.apache.org/jira/browse/YARN-11667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

qiuliang updated YARN-11667:
----------------------------
    External issue URL: https://github.com/apache/hadoop/pull/6648

> Federation: ResourceRequestComparator occurs NPE when using low version of hadoop submit application
> -----------------------------------------------------------------------------------------------------
>
>                 Key: YARN-11667
>                 URL: https://issues.apache.org/jira/browse/YARN-11667
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: amrmproxy
>    Affects Versions: 3.4.0
>            Reporter: qiuliang
>            Priority: Major
>
> When an application is submitted with a lower version of Hadoop, the ResourceRequest built by the AM has no ExecutionTypeRequest. After the ResourceRequest is submitted to AMRMProxy, an NPE occurs when AMRMProxy rebuilds the AllocateRequest to add the ResourceRequest to its ask list.
[jira] [Created] (YARN-11667) Federation: ResourceRequestComparator occurs NPE when using low version of hadoop submit application
qiuliang created YARN-11667:
--------------------------------

             Summary: Federation: ResourceRequestComparator occurs NPE when using low version of hadoop submit application
                 Key: YARN-11667
                 URL: https://issues.apache.org/jira/browse/YARN-11667
             Project: Hadoop YARN
          Issue Type: Bug
          Components: amrmproxy
    Affects Versions: 3.4.0
            Reporter: qiuliang


When an application is submitted with a lower version of Hadoop, the ResourceRequest built by the AM has no ExecutionTypeRequest. After the ResourceRequest is submitted to AMRMProxy, an NPE occurs when AMRMProxy rebuilds the AllocateRequest to add the ResourceRequest to its ask list.
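To make the failure mode concrete: a comparator dereferences an optional field that older clients leave unset. The sketch below is illustrative only, not the actual Hadoop code or the fix proposed in the pull request; it shows a null-safe comparison that treats a missing ExecutionTypeRequest as the default (GUARANTEED) request.

    import java.util.Comparator;
    import org.apache.hadoop.yarn.api.records.ExecutionTypeRequest;
    import org.apache.hadoop.yarn.api.records.ResourceRequest;

    // Hypothetical null-safe comparison over the execution-type field.
    public class NullSafeExecutionTypeComparator
        implements Comparator<ResourceRequest> {
      @Override
      public int compare(ResourceRequest r1, ResourceRequest r2) {
        ExecutionTypeRequest e1 = r1.getExecutionTypeRequest();
        ExecutionTypeRequest e2 = r2.getExecutionTypeRequest();
        // Requests built by older clients may leave this field unset; fall
        // back to the default (GUARANTEED) instead of dereferencing null.
        if (e1 == null) {
          e1 = ExecutionTypeRequest.newInstance();
        }
        if (e2 == null) {
          e2 = ExecutionTypeRequest.newInstance();
        }
        return e1.getExecutionType().compareTo(e2.getExecutionType());
      }
    }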
[jira] [Commented] (YARN-7599) [GPG] ApplicationCleaner in Global Policy Generator
[ https://issues.apache.org/jira/browse/YARN-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936320#comment-16936320 ]

qiuliang commented on YARN-7599:
--------------------------------

Hello [~botong], thank you very much, the patch is helpful to us. We have run into a problem, though: when an application's diagnostics contain illegal characters, the XML returned by http://rm-http-address:port/ws/v1/cluster/apps contains those illegal characters. The Router could not parse this XML, which caused GPG to delete all applications in the Federation state store. Could you please give us some suggestions on this problem?

> [GPG] ApplicationCleaner in Global Policy Generator
> ---------------------------------------------------
>
>                 Key: YARN-7599
>                 URL: https://issues.apache.org/jira/browse/YARN-7599
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Botong Huang
>            Assignee: Botong Huang
>            Priority: Minor
>              Labels: federation, gpg
>         Attachments: YARN-7599-YARN-7402.v1.patch, YARN-7599-YARN-7402.v2.patch, YARN-7599-YARN-7402.v3.patch, YARN-7599-YARN-7402.v4.patch, YARN-7599-YARN-7402.v5.patch, YARN-7599-YARN-7402.v6.patch, YARN-7599-YARN-7402.v7.patch, YARN-7599-YARN-7402.v8.patch
>
>
> In Federation, we need a cleanup service for the StateStore as well as the Yarn Registry. For the former, we need to remove old application records. For the latter, failed and killed applications might leave records in the Yarn Registry (see YARN-6128). We plan to do both cleanup tasks in the ApplicationCleaner in GPG.
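On the illegal-character problem above: one common workaround is to strip characters that are invalid in XML 1.0 from the diagnostics before the document is parsed or served. A sketch, under the assumption that you control one side of the exchange (the class below is hypothetical, not part of Hadoop):

    // Removes characters outside the XML 1.0 valid range, which is what
    // typically breaks an XML parse of /ws/v1/cluster/apps output.
    public final class XmlSanitizer {
      private XmlSanitizer() {
      }

      public static String stripInvalidXmlChars(String s) {
        if (s == null) {
          return null;
        }
        return s.replaceAll(
            "[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD]", "");
      }
    }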
[jira] [Commented] (YARN-9384) FederationInterceptorREST should broadcast KillApplication to all sub clusters
[ https://issues.apache.org/jira/browse/YARN-9384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896096#comment-16896096 ]

qiuliang commented on YARN-9384:
--------------------------------

Hi [~mzhang], how do you solve this problem when executing "yarn application -kill"?

> FederationInterceptorREST should broadcast KillApplication to all sub clusters
> -------------------------------------------------------------------------------
>
>                 Key: YARN-9384
>                 URL: https://issues.apache.org/jira/browse/YARN-9384
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: federation
>            Reporter: Zhang
>            Priority: Major
>              Labels: federation
>         Attachments: YARN-9384.2.patch, YARN-9384.patch
>
>
> Today a KillApplication request from the user client only goes to the home cluster. As a result, the containers in secondary clusters continue running for 10~15 minutes (the UAM heartbeat timeout).
> This is not a favorable user experience, especially when a user has a streaming job and sometimes needs to restart it (for example, to update its config or resources). In this case, containers created by the new job and the old job can run at the same time.
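For context, the idea in this issue is to fan the kill out to every sub-cluster rather than only the home cluster. A rough sketch of that idea follows; the per-sub-cluster client map is assumed to be built elsewhere, and this is not the actual patch:

    import java.io.IOException;
    import java.util.Map;
    import org.apache.hadoop.yarn.api.ApplicationClientProtocol;
    import org.apache.hadoop.yarn.api.protocolrecords.KillApplicationRequest;
    import org.apache.hadoop.yarn.api.records.ApplicationId;
    import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;
    import org.apache.hadoop.yarn.exceptions.YarnException;

    public final class KillBroadcaster {
      private KillBroadcaster() {
      }

      // Send the kill to every sub-cluster; a sub-cluster that never saw the
      // application replies with ApplicationNotFoundException, which is safe
      // to ignore here.
      public static void broadcastKill(ApplicationId appId,
          Map<String, ApplicationClientProtocol> subClusterClients)
          throws YarnException, IOException {
        KillApplicationRequest request =
            KillApplicationRequest.newInstance(appId);
        for (Map.Entry<String, ApplicationClientProtocol> e
            : subClusterClients.entrySet()) {
          try {
            e.getValue().forceKillApplication(request);
          } catch (ApplicationNotFoundException ignored) {
            // This sub-cluster has no record of the application.
          }
        }
      }
    }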
[jira] [Commented] (YARN-9586) [QA] Need more doc for yarn.federation.policy-manager-params when LoadBasedRouterPolicy is used
[ https://issues.apache.org/jira/browse/YARN-9586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16895211#comment-16895211 ]

qiuliang commented on YARN-9586:
--------------------------------

Hi [~shenyinjie], I still don't know how to write this JSON file. Can you give me an example? Thank you~

> [QA] Need more doc for yarn.federation.policy-manager-params when LoadBasedRouterPolicy is used
> ------------------------------------------------------------------------------------------------
>
>                 Key: YARN-9586
>                 URL: https://issues.apache.org/jira/browse/YARN-9586
>             Project: Hadoop YARN
>          Issue Type: Wish
>          Components: federation
>            Reporter: Shen Yinjie
>            Priority: Major
>
> We picked LoadBasedRouterPolicy for YARN federation, but had no idea what to set for yarn.federation.policy-manager-params. Is there a demo config or a more detailed description for this?
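For what it's worth, the weighted policy managers serialize their parameters as a WeightedPolicyInfo JSON document. Below is a hedged example of the expected shape; the sub-cluster ids (SC-1, SC-2) and weights are placeholders, and the exact schema should be verified against the Federation documentation for your Hadoop version:

    // Hypothetical value for yarn.federation.policy-manager-params; the
    // entry layout mirrors the WeightedPolicyInfo JSON form shown in the
    // Hadoop Federation docs. Ids and weights are made up.
    String policyManagerParams =
        "{\"routerPolicyWeights\":{\"entry\":["
            + "{\"key\":{\"id\":\"SC-1\"},\"value\":\"0.7\"},"
            + "{\"key\":{\"id\":\"SC-2\"},\"value\":\"0.3\"}]},"
            + "\"amrmPolicyWeights\":{\"entry\":["
            + "{\"key\":{\"id\":\"SC-1\"},\"value\":\"0.6\"},"
            + "{\"key\":{\"id\":\"SC-2\"},\"value\":\"0.4\"}]},"
            + "\"headroomAlpha\":\"1.0\"}";

As far as I understand, LoadBasedRouterPolicy treats sub-clusters with a non-zero router weight as candidates and then routes by reported load, so which entries are present matters more than the exact weight values.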
[jira] [Commented] (YARN-9437) RMNodeImpls occupy too much memory and causes RM GC to take a long time
[ https://issues.apache.org/jira/browse/YARN-9437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833205#comment-16833205 ]

qiuliang commented on YARN-9437:
--------------------------------

Hi [~eepayne], [~cheersyang], could you please check my earlier comment and share your thoughts? Thank you.

> RMNodeImpls occupy too much memory and causes RM GC to take a long time
> ------------------------------------------------------------------------
>
>                 Key: YARN-9437
>                 URL: https://issues.apache.org/jira/browse/YARN-9437
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.9.1
>            Reporter: qiuliang
>            Priority: Minor
>         Attachments: 1.png, 2.png, 3.png, YARN-9437-v1.txt
>
>
> We use hadoop-2.9.1 in our production environment with 1600+ nodes. 95.63% of RM memory is occupied by RMNodeImpl. Analysis of the RM memory found that each RMNodeImpl occupies approximately 14 MB. The reason is that there are 130,000+ completedContainers in each RMNodeImpl that have not been released.
[jira] [Comment Edited] (YARN-9437) RMNodeImpls occupy too much memory and causes RM GC to take a long time
[ https://issues.apache.org/jira/browse/YARN-9437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16811738#comment-16811738 ]

qiuliang edited comment on YARN-9437 at 5/5/19 3:35 AM:
--------------------------------------------------------

According to my understanding, there are two cases that may cause the completedContainers in RMNodeImpl to not be released.

1. When RMAppAttemptImpl receives a CONTAINER_FINISHED event (not for the amContainer), it adds the container to justFinishedContainers. When processing the AM heartbeat, RMAppAttemptImpl first sends the containers in finishedContainersSentToAM to the NMs, and RMNodeImpl removes these containers from completedContainers. It then transfers the containers in justFinishedContainers to finishedContainersSentToAM and waits for the next AM heartbeat to send them to the NMs. If RMAppAttemptImpl receives the AM unregistration event while justFinishedContainers is not empty, the containers in justFinishedContainers may never get the opportunity to be transferred to finishedContainersSentToAM, so they are never sent to the NMs, and RMNodeImpl never releases them.

2. When RMAppAttemptImpl is in a final state and receives a CONTAINER_FINISHED event, it just adds the container to justFinishedContainers and does not send it to the NM.

For the first case, my idea is that when RMAppAttemptImpl handles the amContainer-finished event, the containers in justFinishedContainers should be transferred to finishedContainersSentToAM and sent to the NMs along with the amContainer; I am not sure whether this has any other impact. For the second case, when RMAppAttemptImpl is in a final state and receives a CONTAINER_FINISHED event, these containers could be sent directly to the NM, but I am worried that this would generate many events.

was (Author: qiuliang988):
As I understand it, there are two cases that may cause the completedContainers in RMNodeImpl to not be released.

1. When RMAppAttemptImpl receives a CONTAINER_FINISHED event (not for the amContainer), it adds the container to justFinishedContainers. When processing the AM heartbeat, RMAppAttemptImpl first sends the containers in finishedContainersSentToAM to the NMs, and RMNodeImpl removes these containers from completedContainers. It then transfers the containers in justFinishedContainers to finishedContainersSentToAM and waits for the next AM heartbeat to send them to the NMs. If RMAppAttemptImpl receives the AM unregistration event while justFinishedContainers is not empty, the containers in justFinishedContainers may never get the opportunity to be transferred to finishedContainersSentToAM, so they are never sent to the NMs, and RMNodeImpl never releases them.

2. When RMAppAttemptImpl is in a final state and receives a CONTAINER_FINISHED event, it just adds the container to justFinishedContainers and does not send it to the NM.

For the first case, my idea is that when RMAppAttemptImpl handles the amContainer-finished event, the containers in justFinishedContainers should be transferred to finishedContainersSentToAM and sent to the NMs along with the amContainer; I am not sure whether this has any other impact. For the second case, when RMAppAttemptImpl is in a final state and receives a CONTAINER_FINISHED event, these containers could be sent directly to the NM, but I am worried that this would generate many events.

> RMNodeImpls occupy too much memory and causes RM GC to take a long time
> ------------------------------------------------------------------------
>
>                 Key: YARN-9437
>                 URL: https://issues.apache.org/jira/browse/YARN-9437
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.9.1
>            Reporter: qiuliang
>            Priority: Minor
>         Attachments: 1.png, 2.png, 3.png, YARN-9437-v1.txt
>
>
> We use hadoop-2.9.1 in our production environment with 1600+ nodes. 95.63% of RM memory is occupied by RMNodeImpl. Analysis of the RM memory found that each RMNodeImpl occupies approximately 14 MB. The reason is that there are 130,000+ completedContainers in each RMNodeImpl that have not been released.
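To make the case-1 proposal above concrete, here is a rough sketch in the spirit of RMAppAttemptImpl's existing sendFinishedContainersToNM logic. The surrounding class is omitted: the fields justFinishedContainers, finishedContainersSentToAM and eventHandler live in RMAppAttemptImpl, and toContainerIds is an assumed helper mapping ContainerStatus to ContainerId. This is not a tested patch.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import org.apache.hadoop.yarn.api.records.ContainerStatus;
    import org.apache.hadoop.yarn.api.records.NodeId;
    import org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeFinishedContainersPulledByAMEvent;

    // Sketch only: when the AM container finishes, flush the finished
    // containers the AM never acknowledged out to their NodeManagers, so
    // each RMNodeImpl can drop them from its completedContainers map.
    private void flushJustFinishedContainersToNM() {
      for (Map.Entry<NodeId, List<ContainerStatus>> entry
          : justFinishedContainers.entrySet()) {
        NodeId nodeId = entry.getKey();
        List<ContainerStatus> statuses = entry.getValue();
        finishedContainersSentToAM
            .computeIfAbsent(nodeId, k -> new ArrayList<>())
            .addAll(statuses);
        // Lets the node release its bookkeeping for these containers.
        eventHandler.handle(new RMNodeFinishedContainersPulledByAMEvent(
            nodeId, toContainerIds(statuses)));
      }
      justFinishedContainers.clear();
    }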
[jira] [Updated] (YARN-9437) RMNodeImpls occupy too much memory and causes RM GC to take a long time
[ https://issues.apache.org/jira/browse/YARN-9437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

qiuliang updated YARN-9437:
---------------------------
    Attachment: YARN-9437-v1.txt

> RMNodeImpls occupy too much memory and causes RM GC to take a long time
> ------------------------------------------------------------------------
>
>                 Key: YARN-9437
>                 URL: https://issues.apache.org/jira/browse/YARN-9437
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.9.1
>            Reporter: qiuliang
>            Priority: Minor
>         Attachments: 1.png, 2.png, 3.png, YARN-9437-v1.txt
>
>
> We use hadoop-2.9.1 in our production environment with 1600+ nodes. 95.63% of RM memory is occupied by RMNodeImpl. Analysis of the RM memory found that each RMNodeImpl occupies approximately 14 MB. The reason is that there are 130,000+ completedContainers in each RMNodeImpl that have not been released.
[jira] [Updated] (YARN-9437) RMNodeImpls occupy too much memory and causes RM GC to take a long time
[ https://issues.apache.org/jira/browse/YARN-9437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

qiuliang updated YARN-9437:
---------------------------
    Attachment:   (was: YARN_9437-v1.txt)

> RMNodeImpls occupy too much memory and causes RM GC to take a long time
> ------------------------------------------------------------------------
>
>                 Key: YARN-9437
>                 URL: https://issues.apache.org/jira/browse/YARN-9437
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.9.1
>            Reporter: qiuliang
>            Priority: Minor
>         Attachments: 1.png, 2.png, 3.png
>
>
> We use hadoop-2.9.1 in our production environment with 1600+ nodes. 95.63% of RM memory is occupied by RMNodeImpl. Analysis of the RM memory found that each RMNodeImpl occupies approximately 14 MB. The reason is that there are 130,000+ completedContainers in each RMNodeImpl that have not been released.
[jira] [Updated] (YARN-9437) RMNodeImpls occupy too much memory and causes RM GC to take a long time
[ https://issues.apache.org/jira/browse/YARN-9437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

qiuliang updated YARN-9437:
---------------------------
    Attachment: YARN_9437-v1.txt

> RMNodeImpls occupy too much memory and causes RM GC to take a long time
> ------------------------------------------------------------------------
>
>                 Key: YARN-9437
>                 URL: https://issues.apache.org/jira/browse/YARN-9437
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.9.1
>            Reporter: qiuliang
>            Priority: Minor
>         Attachments: 1.png, 2.png, 3.png
>
>
> We use hadoop-2.9.1 in our production environment with 1600+ nodes. 95.63% of RM memory is occupied by RMNodeImpl. Analysis of the RM memory found that each RMNodeImpl occupies approximately 14 MB. The reason is that there are 130,000+ completedContainers in each RMNodeImpl that have not been released.
[jira] [Commented] (YARN-9437) RMNodeImpls occupy too much memory and causes RM GC to take a long time
[ https://issues.apache.org/jira/browse/YARN-9437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16811738#comment-16811738 ]

qiuliang commented on YARN-9437:
--------------------------------

As I understand it, there are two cases that may cause the completedContainers in RMNodeImpl to not be released.

1. When RMAppAttemptImpl receives a CONTAINER_FINISHED event (not for the amContainer), it adds the container to justFinishedContainers. When processing the AM heartbeat, RMAppAttemptImpl first sends the containers in finishedContainersSentToAM to the NMs, and RMNodeImpl removes these containers from completedContainers. It then transfers the containers in justFinishedContainers to finishedContainersSentToAM and waits for the next AM heartbeat to send them to the NMs. If RMAppAttemptImpl receives the AM unregistration event while justFinishedContainers is not empty, the containers in justFinishedContainers may never get the opportunity to be transferred to finishedContainersSentToAM, so they are never sent to the NMs, and RMNodeImpl never releases them.

2. When RMAppAttemptImpl is in a final state and receives a CONTAINER_FINISHED event, it just adds the container to justFinishedContainers and does not send it to the NM.

For the first case, my idea is that when RMAppAttemptImpl handles the amContainer-finished event, the containers in justFinishedContainers should be transferred to finishedContainersSentToAM and sent to the NMs along with the amContainer; I am not sure whether this has any other impact. For the second case, when RMAppAttemptImpl is in a final state and receives a CONTAINER_FINISHED event, these containers could be sent directly to the NM, but I am worried that this would generate many events.

> RMNodeImpls occupy too much memory and causes RM GC to take a long time
> ------------------------------------------------------------------------
>
>                 Key: YARN-9437
>                 URL: https://issues.apache.org/jira/browse/YARN-9437
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.9.1
>            Reporter: qiuliang
>            Priority: Minor
>         Attachments: 1.png, 2.png, 3.png
>
>
> We use hadoop-2.9.1 in our production environment with 1600+ nodes. 95.63% of RM memory is occupied by RMNodeImpl. Analysis of the RM memory found that each RMNodeImpl occupies approximately 14 MB. The reason is that there are 130,000+ completedContainers in each RMNodeImpl that have not been released.
[jira] [Updated] (YARN-9437) RMNodeImpls occupy too much memory and causes RM GC to take a long time
[ https://issues.apache.org/jira/browse/YARN-9437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

qiuliang updated YARN-9437:
---------------------------
    Description:
We use hadoop-2.9.1 in our production environment with 1600+ nodes. 95.63% of RM memory is occupied by RMNodeImpl. Analysis of the RM memory found that each RMNodeImpl occupies approximately 14 MB. The reason is that there are 130,000+ completedContainers in each RMNodeImpl that have not been released.

  was:
We use hadoop-2.9.1 in our production environment with 1600+ nodes. 95.63% of RM memory is occupied by RMNodeImpl. Analysis of RM memory found that each RMNodeImpl has approximately 14M. The reason is that there is a 13W+ completedcontainers in each RMNodeImpl that has not been released.

> RMNodeImpls occupy too much memory and causes RM GC to take a long time
> ------------------------------------------------------------------------
>
>                 Key: YARN-9437
>                 URL: https://issues.apache.org/jira/browse/YARN-9437
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.9.1
>            Reporter: qiuliang
>            Priority: Blocker
>         Attachments: 1.png, 2.png, 3.png
>
>
> We use hadoop-2.9.1 in our production environment with 1600+ nodes. 95.63% of RM memory is occupied by RMNodeImpl. Analysis of the RM memory found that each RMNodeImpl occupies approximately 14 MB. The reason is that there are 130,000+ completedContainers in each RMNodeImpl that have not been released.
[jira] [Created] (YARN-9437) RMNodeImpls occupy too much memory and causes RM GC to take a long time
qiuliang created YARN-9437:
------------------------------

             Summary: RMNodeImpls occupy too much memory and causes RM GC to take a long time
                 Key: YARN-9437
                 URL: https://issues.apache.org/jira/browse/YARN-9437
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
    Affects Versions: 2.9.1
            Reporter: qiuliang
         Attachments: 1.png, 2.png, 3.png


We use hadoop-2.9.1 in our production environment with 1600+ nodes. 95.63% of RM memory is occupied by RMNodeImpl. Analysis of RM memory found that each RMNodeImpl has approximately 14M. The reason is that there is a 13W+ completedcontainers in each RMNodeImpl that has not been released.
[jira] [Reopened] (YARN-8978) For fair scheduler, application with higher priority should also get priority resources for running AM
[ https://issues.apache.org/jira/browse/YARN-8978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

qiuliang reopened YARN-8978:
----------------------------

> For fair scheduler, application with higher priority should also get priority resources for running AM
> -------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-8978
>                 URL: https://issues.apache.org/jira/browse/YARN-8978
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: fairscheduler
>            Reporter: qiuliang
>            Priority: Major
>         Attachments: YARN-8978.001.patch
>
>
> In order to allow important applications to run earlier, we use priority scheduling in the fair scheduler, and FairSharePolicy uses YARN-6307. Consider this situation: there are two applications with different priorities in the same queue, and both are accepted. Both applications are demanding and hungry when dispatched to the queue. Next, the use-to-weight ratio is calculated. Since the used resources of both applications are 0, the ratio is also 0 for both, so priority has no effect in this case. Low-priority applications may get resources to run their AM earlier than high-priority applications.
[jira] [Updated] (YARN-8978) For fair scheduler, application with higher priority should also get priority resources for running AM
[ https://issues.apache.org/jira/browse/YARN-8978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

qiuliang updated YARN-8978:
---------------------------
    Release Note:   (was:
index 0ef90a1..d2b5ad7 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
@@ -169,8 +169,13 @@ private int compareFairShareUsage(Schedulable s1, Schedulable s2,
     double useToWeightRatio1;
     double useToWeightRatio2;
     if (weight1 > 0.0 && weight2 > 0.0) {
-      useToWeightRatio1 = resourceUsage1.getMemorySize() / weight1;
-      useToWeightRatio2 = resourceUsage2.getMemorySize() / weight2;
+      if (resourceUsage1.getMemorySize() == 0 && resourceUsage2.getMemorySize() == 0) {
+        useToWeightRatio1 = ONE.getMemorySize() / weight1;
+        useToWeightRatio2 = ONE.getMemorySize() / weight2;
+      } else {
+        useToWeightRatio1 = resourceUsage1.getMemorySize() / weight1;
+        useToWeightRatio2 = resourceUsage2.getMemorySize() / weight2;
+      }
     } else { // Either weight1 or weight2 equals to 0
       if (weight1 == weight2) {
         // If they have same weight, just compare usage
)

> For fair scheduler, application with higher priority should also get priority resources for running AM
> -------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-8978
>                 URL: https://issues.apache.org/jira/browse/YARN-8978
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: fairscheduler
>            Reporter: qiuliang
>            Priority: Major
>
> In order to allow important applications to run earlier, we use priority scheduling in the fair scheduler, and FairSharePolicy uses YARN-6307. Consider this situation: there are two applications with different priorities in the same queue, and both are accepted. Both applications are demanding and hungry when dispatched to the queue. Next, the use-to-weight ratio is calculated. Since the used resources of both applications are 0, the ratio is also 0 for both, so priority has no effect in this case. Low-priority applications may get resources to run their AM earlier than high-priority applications.
[jira] [Updated] (YARN-8978) For fair scheduler, application with higher priority should also get priority resources for running AM
[ https://issues.apache.org/jira/browse/YARN-8978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

qiuliang updated YARN-8978:
---------------------------
           Flags:   (was: Patch)
          Labels:   (was: patch)
    Release Note:
index 0ef90a1..d2b5ad7 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
@@ -169,8 +169,13 @@ private int compareFairShareUsage(Schedulable s1, Schedulable s2,
     double useToWeightRatio1;
     double useToWeightRatio2;
     if (weight1 > 0.0 && weight2 > 0.0) {
-      useToWeightRatio1 = resourceUsage1.getMemorySize() / weight1;
-      useToWeightRatio2 = resourceUsage2.getMemorySize() / weight2;
+      if (resourceUsage1.getMemorySize() == 0 && resourceUsage2.getMemorySize() == 0) {
+        useToWeightRatio1 = ONE.getMemorySize() / weight1;
+        useToWeightRatio2 = ONE.getMemorySize() / weight2;
+      } else {
+        useToWeightRatio1 = resourceUsage1.getMemorySize() / weight1;
+        useToWeightRatio2 = resourceUsage2.getMemorySize() / weight2;
+      }
     } else { // Either weight1 or weight2 equals to 0
       if (weight1 == weight2) {
         // If they have same weight, just compare usage

  was:
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
index 0ef90a1..d2b5ad7 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
@@ -169,8 +169,13 @@ private int compareFairShareUsage(Schedulable s1, Schedulable s2,
     double useToWeightRatio1;
     double useToWeightRatio2;
     if (weight1 > 0.0 && weight2 > 0.0) {
-      useToWeightRatio1 = resourceUsage1.getMemorySize() / weight1;
-      useToWeightRatio2 = resourceUsage2.getMemorySize() / weight2;
+      if (resourceUsage1.getMemorySize() == 0 && resourceUsage2.getMemorySize() == 0) {
+        useToWeightRatio1 = ONE.getMemorySize() / weight1;
+        useToWeightRatio2 = ONE.getMemorySize() / weight2;
+      } else {
+        useToWeightRatio1 = resourceUsage1.getMemorySize() / weight1;
+        useToWeightRatio2 = resourceUsage2.getMemorySize() / weight2;
+      }
     } else { // Either weight1 or weight2 equals to 0
       if (weight1 == weight2) {
         // If they have same weight, just compare usage

> For fair scheduler, application with higher priority should also get priority resources for running AM
> -------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-8978
>                 URL: https://issues.apache.org/jira/browse/YARN-8978
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: fairscheduler
>            Reporter: qiuliang
>            Priority: Major
>
> In order to allow important applications to run earlier, we use priority scheduling in the fair scheduler, and FairSharePolicy uses YARN-6307. Consider this situation: there are two applications with different priorities in the same queue, and both are accepted. Both applications are demanding and hungry when dispatched to the queue. Next, the use-to-weight ratio is calculated. Since the used resources of both applications are 0, the ratio is also 0 for both, so priority has no effect in this case. Low-priority applications may get resources to run their AM earlier than high-priority applications.
[jira] [Created] (YARN-8978) For fair scheduler, application with higher priority should also get priority resources for running AM
qiuliang created YARN-8978:
------------------------------

             Summary: For fair scheduler, application with higher priority should also get priority resources for running AM
                 Key: YARN-8978
                 URL: https://issues.apache.org/jira/browse/YARN-8978
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: fairscheduler
            Reporter: qiuliang


In order to allow important applications to run earlier, we use priority scheduling in the fair scheduler, and FairSharePolicy uses YARN-6307. Consider this situation: there are two applications with different priorities in the same queue, and both are accepted. Both applications are demanding and hungry when dispatched to the queue. Next, the use-to-weight ratio is calculated. Since the used resources of both applications are 0, the ratio is also 0 for both, so priority has no effect in this case. Low-priority applications may get resources to run their AM earlier than high-priority applications.
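A tiny worked example of the tie described above (all numbers made up):

    // Both ratios collapse to 0 when nothing has been allocated yet, so the
    // priority-derived weights never influence the comparison.
    double weight1 = 4.0;  // high-priority application
    double weight2 = 1.0;  // low-priority application
    long memoryUsed = 0L;  // neither application has any containers yet

    double ratio1 = memoryUsed / weight1;  // 0.0
    double ratio2 = memoryUsed / weight2;  // 0.0 -> comparator sees a tie
    // The patch shown earlier in this thread substitutes a unit resource
    // (ONE) for zero usage, so the ratios become 1/weight and the
    // higher-weight (higher-priority) application sorts first.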
[jira] [Commented] (YARN-4879) Enhance Allocate Protocol to Identify Requests Explicitly
[ https://issues.apache.org/jira/browse/YARN-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16597011#comment-16597011 ]

qiuliang commented on YARN-4879:
--------------------------------

Thanks for putting the doc together with all the details. I have a question: why is the number for Rack2 in Req2 3? Node1 and Node4 are on Rack1 and Node5 is on Rack2, so the number for Rack1 should be 4 and for Rack2 should be 2. Where am I wrong?

> Enhance Allocate Protocol to Identify Requests Explicitly
> ----------------------------------------------------------
>
>                 Key: YARN-4879
>                 URL: https://issues.apache.org/jira/browse/YARN-4879
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: applications, resourcemanager
>            Reporter: Subru Krishnan
>            Assignee: Subru Krishnan
>            Priority: Major
>             Fix For: 2.9.0, 3.0.0-beta1
>
>         Attachments: SimpleAllocateProtocolProposal-v1.pdf, SimpleAllocateProtocolProposal-v2.pdf
>
>
> For legacy reasons, the current allocate protocol expects expanded requests, which represent the cumulative request for any change in resource constraints. This is not only very difficult to comprehend but also makes it impossible for the scheduler to associate container allocations with the original requests. The problem is amplified by the fact that the expansion is managed by the AMRMClient, which makes it cumbersome for non-Java clients, as they all have to replicate the non-trivial logic. In this JIRA, we propose an enhancement to the Allocate Protocol to allow AMs to identify requests explicitly.