[jira] [Resolved] (YARN-11667) Federation: ResourceRequestComparator occurs NPE when using low version of hadoop submit application

2024-03-21 Thread qiuliang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

qiuliang resolved YARN-11667.
-
Resolution: Won't Do

> Federation: ResourceRequestComparator occurs NPE when using low version of 
> hadoop submit application
> 
>
> Key: YARN-11667
> URL: https://issues.apache.org/jira/browse/YARN-11667
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: amrmproxy
>Affects Versions: 3.4.0
>Reporter: qiuliang
>Priority: Major
>  Labels: pull-request-available
>
> When an application is submitted with an older (lower-version) Hadoop client, 
> the ResourceRequest built by the AM has no ExecutionTypeRequest. After the 
> ResourceRequest is submitted to AMRMProxy, an NPE occurs when AMRMProxy 
> rebuilds the AllocateRequest to add the ResourceRequest to its ask.






[jira] [Updated] (YARN-11667) Federation: ResourceRequestComparator occurs NPE when using low version of hadoop submit application

2024-03-21 Thread qiuliang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

qiuliang updated YARN-11667:

External issue URL:   (was: https://github.com/apache/hadoop/pull/6648)

> Federation: ResourceRequestComparator occurs NPE when using low version of 
> hadoop submit application
> 
>
> Key: YARN-11667
> URL: https://issues.apache.org/jira/browse/YARN-11667
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: amrmproxy
>Affects Versions: 3.4.0
>Reporter: qiuliang
>Priority: Major
>
> When an application is submitted with an older (lower-version) Hadoop client, 
> the ResourceRequest built by the AM has no ExecutionTypeRequest. After the 
> ResourceRequest is submitted to AMRMProxy, an NPE occurs when AMRMProxy 
> rebuilds the AllocateRequest to add the ResourceRequest to its ask.






[jira] [Updated] (YARN-11667) Federation: ResourceRequestComparator occurs NPE when using low version of hadoop submit application

2024-03-21 Thread qiuliang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

qiuliang updated YARN-11667:

External issue URL: https://github.com/apache/hadoop/pull/6648

> Federation: ResourceRequestComparator occurs NPE when using low version of 
> hadoop submit application
> 
>
> Key: YARN-11667
> URL: https://issues.apache.org/jira/browse/YARN-11667
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: amrmproxy
>Affects Versions: 3.4.0
>Reporter: qiuliang
>Priority: Major
>
> When an application is submitted with an older (lower-version) Hadoop client, 
> the ResourceRequest built by the AM has no ExecutionTypeRequest. After the 
> ResourceRequest is submitted to AMRMProxy, an NPE occurs when AMRMProxy 
> rebuilds the AllocateRequest to add the ResourceRequest to its ask.






[jira] [Created] (YARN-11667) Federation: ResourceRequestComparator occurs NPE when using low version of hadoop submit application

2024-03-20 Thread qiuliang (Jira)
qiuliang created YARN-11667:
---

 Summary: Federation: ResourceRequestComparator occurs NPE when 
using low version of hadoop submit application
 Key: YARN-11667
 URL: https://issues.apache.org/jira/browse/YARN-11667
 Project: Hadoop YARN
  Issue Type: Bug
  Components: amrmproxy
Affects Versions: 3.4.0
Reporter: qiuliang


When an application is submitted with an older (lower-version) Hadoop client, 
the ResourceRequest built by the AM has no ExecutionTypeRequest. After the 
ResourceRequest is submitted to AMRMProxy, an NPE occurs when AMRMProxy 
rebuilds the AllocateRequest to add the ResourceRequest to its ask.
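
For illustration only: this issue was resolved as Won't Do, so the sketch below 
is not a committed fix. It only shows the kind of null-safe ordering the report 
is about, assuming the ResourceRequest, ExecutionTypeRequest and ExecutionType 
types from the YARN API; the helper class name is hypothetical.

import java.util.Comparator;

import org.apache.hadoop.yarn.api.records.ExecutionTypeRequest;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

// Hypothetical helper, not the actual ResourceRequestComparator: orders
// ResourceRequests by their ExecutionTypeRequest while tolerating null,
// which is what requests built by older clients may carry.
public final class NullSafeExecutionTypeOrder {

  private NullSafeExecutionTypeOrder() {
  }

  // Treat a missing ExecutionTypeRequest as smaller than any present one, so
  // sorting or merging the ask never dereferences a null field.
  public static final Comparator<ResourceRequest> BY_EXECUTION_TYPE =
      Comparator.comparing(
          ResourceRequest::getExecutionTypeRequest,
          Comparator.nullsFirst(
              Comparator.comparing(ExecutionTypeRequest::getExecutionType)));
}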






[jira] [Commented] (YARN-7599) [GPG] ApplicationCleaner in Global Policy Generator

2019-09-23 Thread qiuliang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936320#comment-16936320
 ] 

qiuliang commented on YARN-7599:


Hello [~botong], thank you very much, the patch is helpful to us.
Now I have a problem. When an application's diagnostics contain illegal 
characters, the XML returned by 
http://rm-http-address:port/ws/v1/cluster/apps also contains those illegal 
characters. The Router cannot parse this XML, which caused GPG to delete all 
applications in the federation state store. Could you please give us some 
suggestions on this problem?
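
A minimal, self-contained sketch of one possible mitigation, assuming the root 
cause is characters that are illegal in XML 1.0 ending up in the diagnostics 
string; the helper below is hypothetical and not part of the patch discussed in 
this issue.

// Hypothetical helper: drop code points that are not allowed in XML 1.0, so a
// diagnostics string can be serialized into a response that a standard XML
// parser (such as the one used on the Router side) can read back.
public final class XmlSanitizer {

  private XmlSanitizer() {
  }

  public static String stripInvalidXmlChars(String input) {
    StringBuilder out = new StringBuilder(input.length());
    int i = 0;
    while (i < input.length()) {
      int cp = input.codePointAt(i);
      i += Character.charCount(cp);
      // Legal XML 1.0 ranges: TAB, LF, CR, and the Unicode ranges below.
      boolean legal = cp == 0x9 || cp == 0xA || cp == 0xD
          || (cp >= 0x20 && cp <= 0xD7FF)
          || (cp >= 0xE000 && cp <= 0xFFFD)
          || (cp >= 0x10000 && cp <= 0x10FFFF);
      if (legal) {
        out.appendCodePoint(cp);
      }
    }
    return out.toString();
  }
}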

> [GPG] ApplicationCleaner in Global Policy Generator
> ---
>
> Key: YARN-7599
> URL: https://issues.apache.org/jira/browse/YARN-7599
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Minor
>  Labels: federation, gpg
> Attachments: YARN-7599-YARN-7402.v1.patch, 
> YARN-7599-YARN-7402.v2.patch, YARN-7599-YARN-7402.v3.patch, 
> YARN-7599-YARN-7402.v4.patch, YARN-7599-YARN-7402.v5.patch, 
> YARN-7599-YARN-7402.v6.patch, YARN-7599-YARN-7402.v7.patch, 
> YARN-7599-YARN-7402.v8.patch
>
>
> In Federation, we need a cleanup service for the StateStore as well as the Yarn 
> Registry. For the former, we need to remove old application records. For the 
> latter, failed and killed applications might leave records in the Yarn 
> Registry (see YARN-6128). We plan to do both cleanups in the 
> ApplicationCleaner in GPG.






[jira] [Commented] (YARN-9384) FederationInterceptorREST should broadcast KillApplication to all sub clusters

2019-07-30 Thread qiuliang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16896096#comment-16896096
 ] 

qiuliang commented on YARN-9384:


Hi [~mzhang], how do you solve this problem when executing "yarn application 
-kill"?

> FederationInterceptorREST should broadcast KillApplication to all sub clusters
> --
>
> Key: YARN-9384
> URL: https://issues.apache.org/jira/browse/YARN-9384
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Reporter: Zhang
>Priority: Major
>  Labels: federation
> Attachments: YARN-9384.2.patch, YARN-9384.patch
>
>
> Today the KillApplication request from the user client only goes to the home 
> cluster. As a result, the containers in the secondary clusters continue running 
> for 10~15 minutes (the UAM heartbeat timeout).
> This is not a favorable user experience, especially when a user has a streaming 
> job and sometimes needs to restart it (for example to update its config or 
> resources). In that case, containers created by the new job and the old job can 
> run at the same time.
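
For illustration, a hedged sketch of the broadcast idea; the interfaces below 
are made up for this example and are not the FederationInterceptorREST or 
federation facade API.

import java.util.List;

// Hypothetical sketch: send the KillApplication request to every known
// sub-cluster, not only the home one, so secondary clusters do not keep the
// containers alive until the UAM heartbeat times out.
public class KillBroadcastSketch {

  public interface SubClusterClient {
    String getSubClusterId();

    void killApplication(String applicationId) throws Exception;
  }

  public static void broadcastKill(String applicationId,
      List<SubClusterClient> subClusters) {
    for (SubClusterClient sc : subClusters) {
      try {
        sc.killApplication(applicationId);
      } catch (Exception e) {
        // Best-effort broadcast: a failure against one sub-cluster should not
        // prevent the kill from reaching the others.
        System.err.println("Kill failed on " + sc.getSubClusterId() + ": " + e);
      }
    }
  }
}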






[jira] [Commented] (YARN-9586) [QA] Need more doc for yarn.federation.policy-manager-params when LoadBasedRouterPolicy is used

2019-07-29 Thread qiuliang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895211#comment-16895211
 ] 

qiuliang commented on YARN-9586:


Hi [~shenyinjie], I still don't know how to write this json file. Can you give 
me an example? Thank you~

> [QA] Need more doc for yarn.federation.policy-manager-params when 
> LoadBasedRouterPolicy is used
> ---
>
> Key: YARN-9586
> URL: https://issues.apache.org/jira/browse/YARN-9586
> Project: Hadoop YARN
>  Issue Type: Wish
>  Components: federation
>Reporter: Shen Yinjie
>Priority: Major
>
> We picked LoadBasedRouterPolicy for YARN federation, but had no idea what to 
> set yarn.federation.policy-manager-params to. Is there a demo config or a more 
> detailed description for this?






[jira] [Commented] (YARN-9437) RMNodeImpls occupy too much memory and causes RM GC to take a long time

2019-05-04 Thread qiuliang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833205#comment-16833205
 ] 

qiuliang commented on YARN-9437:


Hi [~eepayne], [~cheersyang], could you please check my earlier comment and 
share your thoughts?
Thank you.

> RMNodeImpls occupy too much memory and causes RM GC to take a long time
> ---
>
> Key: YARN-9437
> URL: https://issues.apache.org/jira/browse/YARN-9437
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.1
>Reporter: qiuliang
>Priority: Minor
> Attachments: 1.png, 2.png, 3.png, YARN-9437-v1.txt
>
>
> We use hadoop-2.9.1 in our production environment with 1600+ nodes. 95.63% of 
> RM memory is occupied by RMNodeImpl instances. Analysis of the RM heap found 
> that each RMNodeImpl holds approximately 14 MB. The reason is that 130,000+ 
> completedContainers entries in each RMNodeImpl have not been released.






[jira] [Comment Edited] (YARN-9437) RMNodeImpls occupy too much memory and causes RM GC to take a long time

2019-05-04 Thread qiuliang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811738#comment-16811738
 ] 

qiuliang edited comment on YARN-9437 at 5/5/19 3:35 AM:


According to my understanding, there are two cases that may cause the 
completedContainers in RMNodeImpl to never be released.
1. When RMAppAttemptImpl receives a CONTAINER_FINISHED event (not for the 
amContainer), it adds that container to justFinishedContainers. When processing 
the AM heartbeat, RMAppAttemptImpl first sends the containers in 
finishedContainersSentToAM to the NM, and RMNodeImpl removes those containers 
from completedContainers. It then transfers the containers in 
justFinishedContainers to finishedContainersSentToAM and waits for the next AM 
heartbeat to send them to the NM. If RMAppAttemptImpl receives the AM 
unregistration event while justFinishedContainers is not empty, those 
containers may never get the chance to be transferred to 
finishedContainersSentToAM, so they are never sent to the NM and RMNodeImpl 
never releases them.
2. When RMAppAttemptImpl is in a final state and receives a CONTAINER_FINISHED 
event, it only adds the container to justFinishedContainers and never sends it 
to the NM.
For the first case, my idea is that when RMAppAttemptImpl handles the 
amContainer-finished event, the containers in justFinishedContainers should be 
transferred to finishedContainersSentToAM and sent to the NM along with the 
amContainer; I am not sure whether this has any other impact. For the second 
case, when RMAppAttemptImpl is in a final state and receives a 
CONTAINER_FINISHED event, the containers could be sent directly to the NM, but 
I am worried that this would generate many events.


was (Author: qiuliang988):
As I understand it, there are two cases that may cause the completedContainers 
in RMNodeImpl to not be released.
1. When RMAppAttemptImpl receives the CONTAINER_FINISHED(not amContainer) 
event, it will add this container to justFinishedContainers. When processing 
the AM heartbeat, RMAppAttemptImpl first sends the container in 
finishedContainersSentToAM to NM, and RMNodeImpl also removes these containers 
from the completedContainers. Then transfer the containers in 
justFinishedContainers to finishedContainersSentToAM and wait for the next AM 
heartbeat to send these containers to NM. If RMAppAttemptImpl accepts the event 
of AM unregistration, justFinishedContainers is not empty, then the container 
in justFinishedContainers may not have the opportunity to transfer to 
finishedContainersSentToAM, so that these containers are not sent to NM, and 
RMNodeImpl does not release these containers.
2. When RMAppAttemptImpl is in the final state and receives the 
CONTAINER_FINISHED event, just add this container to justFinishedContainers and 
not send it to NM.
For the first case, my idea is that when RMAppAttemptImpl handles the 
amContainer finished event, the container in justFinishedContainers is 
transferred to finishedContainersSentToAM and sent to NM along with 
amContainer. I am not sure if there is any other impact. For the second case, 
when RMAppAttemptImpl is in the final state and receives the CONTAINER_FINISHED 
event, these containers are sent directly to NM, but I am worried that this 
will generate many events.

> RMNodeImpls occupy too much memory and causes RM GC to take a long time
> ---
>
> Key: YARN-9437
> URL: https://issues.apache.org/jira/browse/YARN-9437
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.1
>Reporter: qiuliang
>Priority: Minor
> Attachments: 1.png, 2.png, 3.png, YARN-9437-v1.txt
>
>
> We use hadoop-2.9.1 in our production environment with 1600+ nodes. 95.63% of 
> RM memory is occupied by RMNodeImpl instances. Analysis of the RM heap found 
> that each RMNodeImpl holds approximately 14 MB. The reason is that 130,000+ 
> completedContainers entries in each RMNodeImpl have not been released.






[jira] [Updated] (YARN-9437) RMNodeImpls occupy too much memory and causes RM GC to take a long time

2019-04-14 Thread qiuliang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

qiuliang updated YARN-9437:
---
Attachment: YARN-9437-v1.txt

> RMNodeImpls occupy too much memory and causes RM GC to take a long time
> ---
>
> Key: YARN-9437
> URL: https://issues.apache.org/jira/browse/YARN-9437
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.1
>Reporter: qiuliang
>Priority: Minor
> Attachments: 1.png, 2.png, 3.png, YARN-9437-v1.txt
>
>
> We use hadoop-2.9.1 in our production environment with 1600+ nodes. 95.63% of 
> RM memory is occupied by RMNodeImpl instances. Analysis of the RM heap found 
> that each RMNodeImpl holds approximately 14 MB. The reason is that 130,000+ 
> completedContainers entries in each RMNodeImpl have not been released.






[jira] [Updated] (YARN-9437) RMNodeImpls occupy too much memory and causes RM GC to take a long time

2019-04-14 Thread qiuliang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

qiuliang updated YARN-9437:
---
Attachment: (was: YARN_9437-v1.txt)

> RMNodeImpls occupy too much memory and causes RM GC to take a long time
> ---
>
> Key: YARN-9437
> URL: https://issues.apache.org/jira/browse/YARN-9437
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.1
>Reporter: qiuliang
>Priority: Minor
> Attachments: 1.png, 2.png, 3.png
>
>
> We use hadoop-2.9.1 in our production environment with 1600+ nodes. 95.63% of 
> RM memory is occupied by RMNodeImpl instances. Analysis of the RM heap found 
> that each RMNodeImpl holds approximately 14 MB. The reason is that 130,000+ 
> completedContainers entries in each RMNodeImpl have not been released.






[jira] [Updated] (YARN-9437) RMNodeImpls occupy too much memory and causes RM GC to take a long time

2019-04-14 Thread qiuliang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

qiuliang updated YARN-9437:
---
Attachment: YARN_9437-v1.txt

> RMNodeImpls occupy too much memory and causes RM GC to take a long time
> ---
>
> Key: YARN-9437
> URL: https://issues.apache.org/jira/browse/YARN-9437
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.1
>Reporter: qiuliang
>Priority: Minor
> Attachments: 1.png, 2.png, 3.png
>
>
> We use hadoop-2.9.1 in our production environment with 1600+ nodes. 95.63% of 
> RM memory is occupied by RMNodeImpl instances. Analysis of the RM heap found 
> that each RMNodeImpl holds approximately 14 MB. The reason is that 130,000+ 
> completedContainers entries in each RMNodeImpl have not been released.






[jira] [Commented] (YARN-9437) RMNodeImpls occupy too much memory and causes RM GC to take a long time

2019-04-07 Thread qiuliang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811738#comment-16811738
 ] 

qiuliang commented on YARN-9437:


As I understand it, there are two cases that may cause the completedContainers 
in RMNodeImpl to never be released.
1. When RMAppAttemptImpl receives a CONTAINER_FINISHED event (not for the 
amContainer), it adds that container to justFinishedContainers. When processing 
the AM heartbeat, RMAppAttemptImpl first sends the containers in 
finishedContainersSentToAM to the NM, and RMNodeImpl removes those containers 
from completedContainers. It then transfers the containers in 
justFinishedContainers to finishedContainersSentToAM and waits for the next AM 
heartbeat to send them to the NM. If RMAppAttemptImpl receives the AM 
unregistration event while justFinishedContainers is not empty, those 
containers may never get the chance to be transferred to 
finishedContainersSentToAM, so they are never sent to the NM and RMNodeImpl 
never releases them.
2. When RMAppAttemptImpl is in a final state and receives a CONTAINER_FINISHED 
event, it only adds the container to justFinishedContainers and never sends it 
to the NM.
For the first case, my idea is that when RMAppAttemptImpl handles the 
amContainer-finished event, the containers in justFinishedContainers should be 
transferred to finishedContainersSentToAM and sent to the NM along with the 
amContainer; I am not sure whether this has any other impact. For the second 
case, when RMAppAttemptImpl is in a final state and receives a 
CONTAINER_FINISHED event, the containers could be sent directly to the NM, but 
I am worried that this would generate many events.
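
A self-contained toy sketch of the first idea above; the field and method names 
only mirror the RMAppAttemptImpl bookkeeping described in this comment and are 
hypothetical, not the actual ResourceManager code.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the proposed handling for case 1: when the AM container
// finishes, drain whatever is still parked in justFinishedContainers into
// finishedContainersSentToAM and acknowledge it to the nodes in one pass,
// so the node side can drop those entries from completedContainers.
class AttemptContainerBookkeeping {

  interface AckToNode {
    void containersPulledByAm(String nodeId, List<String> containerIds);
  }

  final Map<String, List<String>> justFinishedContainers = new HashMap<>();
  final Map<String, List<String>> finishedContainersSentToAM = new HashMap<>();

  void onAmContainerFinished(AckToNode ack) {
    // Do not wait for an AM heartbeat that will never come after
    // unregistration: move everything pending to the "sent" map now.
    for (Map.Entry<String, List<String>> e : justFinishedContainers.entrySet()) {
      finishedContainersSentToAM
          .computeIfAbsent(e.getKey(), k -> new ArrayList<>())
          .addAll(e.getValue());
    }
    justFinishedContainers.clear();
    // Acknowledge per node rather than per container, to limit the number
    // of events generated (the concern raised for case 2 above).
    finishedContainersSentToAM.forEach(ack::containersPulledByAm);
    finishedContainersSentToAM.clear();
  }
}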

> RMNodeImpls occupy too much memory and causes RM GC to take a long time
> ---
>
> Key: YARN-9437
> URL: https://issues.apache.org/jira/browse/YARN-9437
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.1
>Reporter: qiuliang
>Priority: Minor
> Attachments: 1.png, 2.png, 3.png
>
>
> We use hadoop-2.9.1 in our production environment with 1600+ nodes. 95.63% of 
> RM memory is occupied by RMNodeImpl instances. Analysis of the RM heap found 
> that each RMNodeImpl holds approximately 14 MB. The reason is that 130,000+ 
> completedContainers entries in each RMNodeImpl have not been released.






[jira] [Updated] (YARN-9437) RMNodeImpls occupy too much memory and causes RM GC to take a long time

2019-04-04 Thread qiuliang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

qiuliang updated YARN-9437:
---
Description: We use hadoop-2.9.1 in our production environment with 1600+ 
nodes. 95.63% of RM memory is occupied by RMNodeImpl instances. Analysis of the 
RM heap found that each RMNodeImpl holds approximately 14 MB. The reason is 
that 130,000+ completedContainers entries in each RMNodeImpl have not been 
released.  
(was: We use hadoop-2.9.1 in our production environment with 1600+ nodes. 
95.63% of RM memory is occupied by RMNodeImpl instances. Analysis of the RM 
heap found that each RMNodeImpl holds approximately 14 MB. The reason is that 
13W+ (130,000+) completedContainers entries in each RMNodeImpl have not been 
released.)

> RMNodeImpls occupy too much memory and causes RM GC to take a long time
> ---
>
> Key: YARN-9437
> URL: https://issues.apache.org/jira/browse/YARN-9437
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.1
>Reporter: qiuliang
>Priority: Blocker
> Attachments: 1.png, 2.png, 3.png
>
>
> We use hadoop-2.9.1 in our production environment with 1600+ nodes. 95.63% of 
> RM memory is occupied by RMNodeImpl instances. Analysis of the RM heap found 
> that each RMNodeImpl holds approximately 14 MB. The reason is that 130,000+ 
> completedContainers entries in each RMNodeImpl have not been released.






[jira] [Created] (YARN-9437) RMNodeImpls occupy too much memory and causes RM GC to take a long time

2019-04-03 Thread qiuliang (JIRA)
qiuliang created YARN-9437:
--

 Summary: RMNodeImpls occupy too much memory and causes RM GC to 
take a long time
 Key: YARN-9437
 URL: https://issues.apache.org/jira/browse/YARN-9437
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.9.1
Reporter: qiuliang
 Attachments: 1.png, 2.png, 3.png

We use hadoop-2.9.1 in our production environment with 1600+ nodes. 95.63% of 
RM memory is occupied by RMNodeImpl instances. Analysis of the RM heap found 
that each RMNodeImpl holds approximately 14 MB. The reason is that 13W+ 
(130,000+) completedContainers entries in each RMNodeImpl have not been 
released.






[jira] [Reopened] (YARN-8978) For fair scheduler, application with higher priority should also get priority resources for running AM

2018-11-08 Thread qiuliang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

qiuliang reopened YARN-8978:


> For fair scheduler, application with higher priority should also get priority 
> resources for running AM
> --
>
> Key: YARN-8978
> URL: https://issues.apache.org/jira/browse/YARN-8978
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: qiuliang
>Priority: Major
> Attachments: YARN-8978.001.patch
>
>
> In order to allow important applications to run earlier, we use priority 
> scheduling in the fair scheduler, and FairSharePolicy uses YARN-6307. Consider 
> this situation: two applications with different priorities are in the same 
> queue and both are accepted. Both applications are demanding and hungry when 
> they are dispatched to the queue. Next, the use-to-weight ratio is calculated. 
> Since the used resources of both applications are 0, both ratios are also 0, 
> so the priority has no effect in this case: a low-priority application may get 
> resources to run its AM earlier than a high-priority application.






[jira] [Updated] (YARN-8978) For fair scheduler, application with higher priority should also get priority resources for running AM

2018-11-06 Thread qiuliang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

qiuliang updated YARN-8978:
---
Release Note:   (was:
index 0ef90a1..d2b5ad7 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
@@ -169,8 +169,13 @@ private int compareFairShareUsage(Schedulable s1, Schedulable s2,
     double useToWeightRatio1;
     double useToWeightRatio2;
     if (weight1 > 0.0 && weight2 > 0.0) {
-      useToWeightRatio1 = resourceUsage1.getMemorySize() / weight1;
-      useToWeightRatio2 = resourceUsage2.getMemorySize() / weight2;
+      if (resourceUsage1.getMemorySize() == 0 && resourceUsage2.getMemorySize() == 0) {
+        useToWeightRatio1 = ONE.getMemorySize() / weight1;
+        useToWeightRatio2 = ONE.getMemorySize() / weight2;
+      } else {
+        useToWeightRatio1 = resourceUsage1.getMemorySize() / weight1;
+        useToWeightRatio2 = resourceUsage2.getMemorySize() / weight2;
+      }
     } else { // Either weight1 or weight2 equals to 0
       if (weight1 == weight2) {
         // If they have same weight, just compare usage
)

> For fair scheduler, application with higher priority should also get priority 
> resources for running AM
> --
>
> Key: YARN-8978
> URL: https://issues.apache.org/jira/browse/YARN-8978
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: qiuliang
>Priority: Major
>
> In order to allow important applications to run earlier, we use priority 
> scheduling in the fair scheduler, and FairSharePolicy uses YARN-6307. Consider 
> this situation: two applications with different priorities are in the same 
> queue and both are accepted. Both applications are demanding and hungry when 
> they are dispatched to the queue. Next, the use-to-weight ratio is calculated. 
> Since the used resources of both applications are 0, both ratios are also 0, 
> so the priority has no effect in this case: a low-priority application may get 
> resources to run its AM earlier than a high-priority application.






[jira] [Updated] (YARN-8978) For fair scheduler, application with higher priority should also get priority resources for running AM

2018-11-06 Thread qiuliang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

qiuliang updated YARN-8978:
---
   Flags:   (was: Patch)
  Labels:   (was: patch)
Release Note: 
index 0ef90a1..d2b5ad7 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
@@ -169,8 +169,13 @@ private int compareFairShareUsage(Schedulable s1, Schedulable s2,
     double useToWeightRatio1;
     double useToWeightRatio2;
     if (weight1 > 0.0 && weight2 > 0.0) {
-      useToWeightRatio1 = resourceUsage1.getMemorySize() / weight1;
-      useToWeightRatio2 = resourceUsage2.getMemorySize() / weight2;
+      if (resourceUsage1.getMemorySize() == 0 && resourceUsage2.getMemorySize() == 0) {
+        useToWeightRatio1 = ONE.getMemorySize() / weight1;
+        useToWeightRatio2 = ONE.getMemorySize() / weight2;
+      } else {
+        useToWeightRatio1 = resourceUsage1.getMemorySize() / weight1;
+        useToWeightRatio2 = resourceUsage2.getMemorySize() / weight2;
+      }
     } else { // Either weight1 or weight2 equals to 0
       if (weight1 == weight2) {
         // If they have same weight, just compare usage


  was:
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
index 0ef90a1..d2b5ad7 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
@@ -169,8 +169,13 @@ private int compareFairShareUsage(Schedulable s1, Schedulable s2,
     double useToWeightRatio1;
     double useToWeightRatio2;
     if (weight1 > 0.0 && weight2 > 0.0) {
-      useToWeightRatio1 = resourceUsage1.getMemorySize() / weight1;
-      useToWeightRatio2 = resourceUsage2.getMemorySize() / weight2;
+      if (resourceUsage1.getMemorySize() == 0 && resourceUsage2.getMemorySize() == 0) {
+        useToWeightRatio1 = ONE.getMemorySize() / weight1;
+        useToWeightRatio2 = ONE.getMemorySize() / weight2;
+      } else {
+        useToWeightRatio1 = resourceUsage1.getMemorySize() / weight1;
+        useToWeightRatio2 = resourceUsage2.getMemorySize() / weight2;
+      }
     } else { // Either weight1 or weight2 equals to 0
       if (weight1 == weight2) {
         // If they have same weight, just compare usage



> For fair scheduler, application with higher priority should also get priority 
> resources for running AM
> --
>
> Key: YARN-8978
> URL: https://issues.apache.org/jira/browse/YARN-8978
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: qiuliang
>Priority: Major
>
> In order to allow important applications to run earlier, we use priority 
> scheduling in the fair scheduler, and FairSharePolicy uses YARN-6307. Consider 
> this situation: two applications with different priorities are in the same 
> queue and both are accepted. Both applications are demanding and hungry when 
> they are dispatched to the queue. Next, the use-to-weight ratio is calculated. 
> Since the used resources of both applications are 0, both ratios are also 0, 
> so the priority has no effect in this case: a low-priority application may get 
> resources to run its AM earlier than a high-priority application.






[jira] [Created] (YARN-8978) For fair scheduler, application with higher priority should also get priority resources for running AM

2018-11-06 Thread qiuliang (JIRA)
qiuliang created YARN-8978:
--

 Summary: For fair scheduler, application with higher priority 
should also get priority resources for running AM
 Key: YARN-8978
 URL: https://issues.apache.org/jira/browse/YARN-8978
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: qiuliang


In order to allow important applications to run earlier, we use priority 
scheduling in the fair scheduler, and FairSharePolicy uses YARN-6307. Consider 
this situation: two applications with different priorities are in the same 
queue and both are accepted. Both applications are demanding and hungry when 
they are dispatched to the queue. Next, the use-to-weight ratio is calculated. 
Since the used resources of both applications are 0, both ratios are also 0, 
so the priority has no effect in this case: a low-priority application may get 
resources to run its AM earlier than a high-priority application.
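
A tiny numeric illustration of the effect described above; the weights below 
are made-up stand-ins for priority-derived weights, not values taken from this 
issue.

// Made-up weights: a higher-priority app with weight 4.0 and a lower-priority
// app with weight 1.0, both with zero usage because neither AM has started.
public class WeightRatioExample {
  public static void main(String[] args) {
    double weightHigh = 4.0;
    double weightLow = 1.0;
    long usedMemory = 0L;

    // Current comparison: both ratios are 0.0, so the comparator cannot
    // distinguish the two apps and priority never influences who runs first.
    System.out.println((usedMemory / weightHigh) + " vs " + (usedMemory / weightLow));

    // With the change proposed in this issue (see the FairSharePolicy diff in
    // the later updates): substitute a unit resource when both usages are 0,
    // so the higher-weight app compares as "less used" and is scheduled first.
    long one = 1L;
    System.out.println((one / weightHigh) + " vs " + (one / weightLow)); // 0.25 vs 1.0
  }
}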






[jira] [Commented] (YARN-4879) Enhance Allocate Protocol to Identify Requests Explicitly

2018-08-29 Thread qiuliang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597011#comment-16597011
 ] 

qiuliang commented on YARN-4879:


Thanks for putting the doc together with all the details. I have a question: 
why is the number for Rack2 in Req2 equal to 3? Node1 and Node4 are on Rack1 
and Node5 is on Rack2, so the number for Rack1 should be 4 and for Rack2 should 
be 2. Where am I wrong?

> Enhance Allocate Protocol to Identify Requests Explicitly
> -
>
> Key: YARN-4879
> URL: https://issues.apache.org/jira/browse/YARN-4879
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
>Priority: Major
> Fix For: 2.9.0, 3.0.0-beta1
>
> Attachments: SimpleAllocateProtocolProposal-v1.pdf, 
> SimpleAllocateProtocolProposal-v2.pdf
>
>
> For legacy reasons, the current allocate protocol expects expanded requests 
> which represent the cumulative request for any change in resource 
> constraints. This is not only very difficult to comprehend but makes it 
> impossible for the scheduler to associate container allocations to the 
> original requests. This problem is amplified by the fact that the expansion 
> is managed by the AMRMClient which makes it cumbersome for non-Java clients 
> as they all have to replicate the non-trivial logic. In this JIRA, we are 
> proposing enhancement to the Allocate Protocol to allow AMs to identify 
> requests explicitly.  


