[jira] [Commented] (YARN-6740) Federation Router (hiding multiple RMs for ApplicationClientProtocol) phase 2

2019-06-28 Thread hunshenshi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875351#comment-16875351
 ] 

hunshenshi commented on YARN-6740:
--

Thanks [~abmodi] [~giovanni.fumarola]

> Federation Router (hiding multiple RMs for ApplicationClientProtocol) phase 2
> -
>
> Key: YARN-6740
> URL: https://issues.apache.org/jira/browse/YARN-6740
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Abhishek Modi
>Priority: Major
>
> This JIRA tracks the implementation of the layer for routing 
> ApplicationClientProtocol requests to the appropriate RM(s) in a federated 
> YARN cluster.
> Under YARN-3659 we only implemented getNewApplication, submitApplication, 
> forceKillApplication and getApplicationReport to execute applications E2E.
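For readers new to the Router, here is a minimal sketch of the routing idea (hypothetical stand-in types, not the actual clientrm interceptor classes): each client call is forwarded to the RM of the application's home sub-cluster.

{code:java}
// Illustrative only: stand-ins for an ApplicationClientProtocol proxy and Router state.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

interface RmProxySketch {
  String getApplicationReport(String applicationId);
}

class RouterSketch {
  private final Map<String, RmProxySketch> rmBySubCluster = new ConcurrentHashMap<>();
  private final Map<String, String> homeSubClusterByApp = new ConcurrentHashMap<>();

  // Each ApplicationClientProtocol method is routed to the RM that owns the application.
  String getApplicationReport(String applicationId) {
    String home = homeSubClusterByApp.get(applicationId);
    RmProxySketch rm = (home == null) ? null : rmBySubCluster.get(home);
    if (rm == null) {
      throw new IllegalStateException("Unknown home sub-cluster for " + applicationId);
    }
    return rm.getApplicationReport(applicationId);
  }
}
{code}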






[jira] [Commented] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority

2019-06-28 Thread hunshenshi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875350#comment-16875350
 ] 

hunshenshi commented on YARN-9655:
--

Sure, I added a UT in TestFederationInterceptor#testAllocateResponse.

Thanks for the review, [~cheersyang].

> AllocateResponse in FederationInterceptor lost  applicationPriority
> ---
>
> Key: YARN-9655
> URL: https://issues.apache.org/jira/browse/YARN-9655
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Assignee: hunshenshi
>Priority: Major
>
> In YARN Federation mode using FederationInterceptor, when submitting an 
> application, the AM will report an error.
> {code:java}
> 2019-06-25 11:44:00,977 ERROR [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
> java.lang.NullPointerException at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.handleJobPriorityChange(RMContainerAllocator.java:1025)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:880)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:286)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$AllocatorRunnable.run(RMCommunicator.java:280)
>  at java.lang.Thread.run(Thread.java:748)
> {code}
> The reason is that applicationPriority is lost.
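To make the failure mode concrete, a minimal, self-contained sketch (hypothetical stand-in types, not the actual FederationInterceptor or AllocateResponse code) of a merge step that preserves the priority carried by the home sub-cluster's response; without that copy, the MR AM dereferences a null priority in handleJobPriorityChange and hits the NPE above.

{code:java}
// Illustrative only: stand-in types, not Hadoop's AllocateResponse/FederationInterceptor.
import java.util.Arrays;
import java.util.List;

class ResponseSketch {
  Integer applicationPriority;   // null models the field being dropped
  ResponseSketch(Integer priority) { this.applicationPriority = priority; }
}

class MergeSketch {
  // Merge per-sub-cluster responses; copy the priority from whichever response carries it.
  static ResponseSketch merge(List<ResponseSketch> perSubCluster) {
    ResponseSketch merged = new ResponseSketch(null);
    for (ResponseSketch r : perSubCluster) {
      if (merged.applicationPriority == null && r.applicationPriority != null) {
        merged.applicationPriority = r.applicationPriority;
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    ResponseSketch home = new ResponseSketch(2);        // home RM reports priority 2
    ResponseSketch secondary = new ResponseSketch(null);
    ResponseSketch merged = merge(Arrays.asList(home, secondary));
    System.out.println("merged priority = " + merged.applicationPriority); // 2, not null
  }
}
{code}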






[jira] [Commented] (YARN-9562) Add Java changes for the new RuncContainerRuntime

2019-06-28 Thread Eric Badger (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875283#comment-16875283
 ] 

Eric Badger commented on YARN-9562:
---

Attaching patch 001 as an initial patch to give everyone a sense of how the 
patch will look for the most part. Currently, the {{RuncContainerRuntime}} is 
using Docker configs, but we have decided to split the config into docker and 
runc separately. So this will need to change in future versions of the patch. 
Additionally, there are currently no unit tests. An entire suite of new tests 
will need to be written before this can be committed. Because of this, I'm not 
going to submit the patch until a later revision.

Other than that, I'm happy to hear feedback from others while I fix up the 
patch, add unit tests, etc.

To try this out (along with the C code changes from YARN-9561), you will need 
to create squashfs layers from all of the layers of a docker image and upload 
them to the layers directory specified by the configs. The image config will go 
in the config directory, and the manifest in the manifests directory. There 
is also some magic that needs to be done in relation to whiteout and opaque 
files in the docker image, but you can probably get your image to run without 
dealing with those. I have a tool that does the whole conversion, but that 
isn't yet ready to put up for review because there are some bits of code that 
rely on internal changes that haven't been made to the apache codebase. If 
you'd like, I could try and put that up before focusing on the unit tests for 
this JIRA as well as YARN-9561.

> Add Java changes for the new RuncContainerRuntime
> -
>
> Key: YARN-9562
> URL: https://issues.apache.org/jira/browse/YARN-9562
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9562.001.patch
>
>
> This JIRA will be used to add the Java changes for the new 
> RuncContainerRuntime. This will work off of YARN-9560 to use much of the 
> existing DockerLinuxContainerRuntime code once it is moved up into an 
> abstract class that can be extended. 






[jira] [Updated] (YARN-9562) Add Java changes for the new RuncContainerRuntime

2019-06-28 Thread Eric Badger (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-9562:
--
Attachment: YARN-9562.001.patch

> Add Java changes for the new RuncContainerRuntime
> -
>
> Key: YARN-9562
> URL: https://issues.apache.org/jira/browse/YARN-9562
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9562.001.patch
>
>
> This JIRA will be used to add the Java changes for the new 
> RuncContainerRuntime. This will work off of YARN-9560 to use much of the 
> existing DockerLinuxContainerRuntime code once it is moved up into an 
> abstract class that can be extended. 






[jira] [Commented] (YARN-9560) Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime

2019-06-28 Thread Eric Badger (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875252#comment-16875252
 ] 

Eric Badger commented on YARN-9560:
---

Thanks [~eyang], [~Jim_Brennan], [~ccondit], [~shaneku...@gmail.com] for the 
patience and help with this patch! I'll put up an initial patch for YARN-9562 
soon.

> Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime
> ---
>
> Key: YARN-9560
> URL: https://issues.apache.org/jira/browse/YARN-9560
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
> Fix For: 3.3.0
>
> Attachments: YARN-9560.001.patch, YARN-9560.002.patch, 
> YARN-9560.003.patch, YARN-9560.004.patch, YARN-9560.005.patch, 
> YARN-9560.006.patch, YARN-9560.007.patch, YARN-9560.008.patch, 
> YARN-9560.009.patch, YARN-9560.010.patch, YARN-9560.011.patch, 
> YARN-9560.012.patch, YARN-9560.013.patch
>
>
> Since the new RuncContainerRuntime will be using a lot of the same code as 
> DockerLinuxContainerRuntime, it would be good to move a bunch of the 
> DockerLinuxContainerRuntime code up a level to an abstract class that both of 
> the runtimes can extend. 
> The new structure will look like:
> {noformat}
> OCIContainerRuntime (abstract class)
>   - DockerLinuxContainerRuntime
>   - RuncContainerRuntime
> {noformat}
> This JIRA should only change the structure of the code, not the actual 
> semantics
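As a rough picture of that structure, here is an illustrative skeleton only (placeholder method names, not the committed class signatures; the env-variable value for runC is an assumption since only Docker exists today):

{code:java}
// Illustrative skeleton of the intended hierarchy; names are placeholders.
abstract class OCIContainerRuntimeSketch {
  // Shared OCI behavior (env handling, mount validation, etc.) lives here.
  abstract boolean isRuntimeRequested(java.util.Map<String, String> env);

  void launchContainer(java.util.Map<String, String> env) {
    if (!isRuntimeRequested(env)) {
      throw new IllegalArgumentException("runtime not requested for this container");
    }
    // common launch plumbing shared by both runtimes would go here
  }
}

class DockerRuntimeSketch extends OCIContainerRuntimeSketch {
  @Override
  boolean isRuntimeRequested(java.util.Map<String, String> env) {
    return "docker".equals(env.get("YARN_CONTAINER_RUNTIME_TYPE"));
  }
}

class RuncRuntimeSketch extends OCIContainerRuntimeSketch {
  @Override
  boolean isRuntimeRequested(java.util.Map<String, String> env) {
    return "runc".equals(env.get("YARN_CONTAINER_RUNTIME_TYPE"));
  }
}
{code}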






[jira] [Commented] (YARN-9560) Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime

2019-06-28 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875235#comment-16875235
 ] 

Hudson commented on YARN-9560:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16838 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/16838/])
YARN-9560. Restructure DockerLinuxContainerRuntime to extend (eyang: rev 
29465bf169a7e348a4f32265083450faf66d5631)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/GpuResourceHandlerImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerCleanup.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DockerLinuxContainerRuntime.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/OCIContainerRuntime.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/TestDockerContainerRuntime.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/deviceframework/DeviceResourceHandlerImpl.java


> Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime
> ---
>
> Key: YARN-9560
> URL: https://issues.apache.org/jira/browse/YARN-9560
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
> Attachments: YARN-9560.001.patch, YARN-9560.002.patch, 
> YARN-9560.003.patch, YARN-9560.004.patch, YARN-9560.005.patch, 
> YARN-9560.006.patch, YARN-9560.007.patch, YARN-9560.008.patch, 
> YARN-9560.009.patch, YARN-9560.010.patch, YARN-9560.011.patch, 
> YARN-9560.012.patch, YARN-9560.013.patch
>
>
> Since the new RuncContainerRuntime will be using a lot of the same code as 
> DockerLinuxContainerRuntime, it would be good to move a bunch of the 
> DockerLinuxContainerRuntime code up a level to an abstract class that both of 
> the runtimes can extend. 
> The new structure will look like:
> {noformat}
> OCIContainerRuntime (abstract class)
>   - DockerLinuxContainerRuntime
>   - RuncContainerRuntime
> {noformat}
> This JIRA should only change the structure of the code, not the actual 
> semantics






[jira] [Commented] (YARN-9560) Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime

2019-06-28 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875217#comment-16875217
 ] 

Eric Yang commented on YARN-9560:
-

+1 on patch 013.  Will commit shortly.

> Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime
> ---
>
> Key: YARN-9560
> URL: https://issues.apache.org/jira/browse/YARN-9560
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
> Attachments: YARN-9560.001.patch, YARN-9560.002.patch, 
> YARN-9560.003.patch, YARN-9560.004.patch, YARN-9560.005.patch, 
> YARN-9560.006.patch, YARN-9560.007.patch, YARN-9560.008.patch, 
> YARN-9560.009.patch, YARN-9560.010.patch, YARN-9560.011.patch, 
> YARN-9560.012.patch, YARN-9560.013.patch
>
>
> Since the new RuncContainerRuntime will be using a lot of the same code as 
> DockerLinuxContainerRuntime, it would be good to move a bunch of the 
> DockerLinuxContainerRuntime code up a level to an abstract class that both of 
> the runtimes can extend. 
> The new structure will look like:
> {noformat}
> OCIContainerRuntime (abstract class)
>   - DockerLinuxContainerRuntime
>   - RuncContainerRuntime
> {noformat}
> This JIRA should only change the structure of the code, not the actual 
> semantics






[jira] [Comment Edited] (YARN-9581) Fix WebAppUtils#getRMWebAppURLWithScheme ignores rm2

2019-06-28 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875157#comment-16875157
 ] 

Eric Yang edited comment on YARN-9581 at 6/28/19 8:37 PM:
--

[~Prabhu Joseph] Thank you for the patch.  I just committed addendum patch 001 
to trunk and branch-3.2.  Fixed version remains the same.


was (Author: eyang):
[~Prabhu Joseph] Thank you for the patch.  I just committed addendum patch 001 
to trunk.

> Fix WebAppUtils#getRMWebAppURLWithScheme ignores rm2
> 
>
> Key: YARN-9581
> URL: https://issues.apache.org/jira/browse/YARN-9581
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0, 3.2.1
>
> Attachments: YARN-9581-001.patch, YARN-9581-002.patch, 
> YARN-9581-003.patch, YARN-9581-004.patch, YARN-9581-005.patch, 
> YARN-9581-006.patch, YARN-9581-007.patch, YARN-9581.addendum-001.patch
>
>
> The yarn logs command fails for a running job in an RM HA setup where rm2 is 
> active and rm1 is down.
> {code}
> hrt_qa@prabhuYarn:~> /usr/hdp/current/hadoop-yarn-client/bin/yarn  logs 
> -applicationId application_1558613472348_0004 -am 1
> 19/05/24 18:04:49 INFO client.AHSProxy: Connecting to Application History 
> server at prabhuYarn/172.27.23.55:10200
> 19/05/24 18:04:50 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
> to rm2
> Unable to get AM container informations for the 
> application:application_1558613472348_0004
> java.io.IOException: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> Error while authenticating with endpoint: 
> https://prabhuYarn:8090/ws/v1/cluster/apps/application_1558613472348_0004/appattempts
> Can not get AMContainers logs for the 
> application:application_1558613472348_0004 with the appOwner:hrt_qa
> {code}
> LogsCli getRMWebAppURLWithoutScheme only checks the first entry in the RM list 
> yarn.resourcemanager.ha.rm-ids.
> {code}
> yarnConfig.set(YarnConfiguration.RM_HA_ID, rmIds.get(0));
> {code}
> SchedConfCli also fails 
> {code}
> [ambari-qa@pjosephdocker-3 ~]$ yarn  schedulerconf -update 
> root.default:maximum-capacity=90
> Exception in thread "main" com.sun.jersey.api.client.ClientHandlerException: 
> java.net.ConnectException: Connection refused (Connection refused)
>   at 
> com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155)
>   at com.sun.jersey.api.client.Client.handle(Client.java:652)
>   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
> {code}
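To illustrate the general direction of the fix (a sketch with stand-in types, not the actual WebAppUtils/LogsCLI change): instead of always using rmIds.get(0), iterate over all configured RM web addresses and use the first one that answers.

{code:java}
// Illustrative only: stand-in lookup, not the real YarnConfiguration/WebAppUtils API.
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;

class RmWebAppUrlSketch {
  // Try each RM web address derived from yarn.resourcemanager.ha.rm-ids and return
  // the first one that responds, so rm2 is found even when rm1 is down.
  static String pickReachableRmUrl(List<String> rmWebAddresses) throws IOException {
    for (String address : rmWebAddresses) {
      HttpURLConnection conn =
          (HttpURLConnection) new URL(address + "/ws/v1/cluster/info").openConnection();
      conn.setConnectTimeout(2000);
      try {
        conn.connect();
        return address;
      } catch (IOException connectFailed) {
        // this RM is unreachable; fall through and try the next one
      } finally {
        conn.disconnect();
      }
    }
    throw new IOException("No reachable RM web address among " + rmWebAddresses);
  }
}
{code}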






[jira] [Commented] (YARN-9581) Fix WebAppUtils#getRMWebAppURLWithScheme ignores rm2

2019-06-28 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875156#comment-16875156
 ] 

Hudson commented on YARN-9581:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16836 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/16836/])
YARN-9581. Add support for get multiple RM webapp URLs.(eyang: rev 
f02b0e19940dc6fc1e19258a40db37d1eed89d21)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util/WebAppUtils.java


> Fix WebAppUtils#getRMWebAppURLWithScheme ignores rm2
> 
>
> Key: YARN-9581
> URL: https://issues.apache.org/jira/browse/YARN-9581
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0, 3.2.1
>
> Attachments: YARN-9581-001.patch, YARN-9581-002.patch, 
> YARN-9581-003.patch, YARN-9581-004.patch, YARN-9581-005.patch, 
> YARN-9581-006.patch, YARN-9581-007.patch, YARN-9581.addendum-001.patch
>
>
> The yarn logs command fails for a running job in an RM HA setup where rm2 is 
> active and rm1 is down.
> {code}
> hrt_qa@prabhuYarn:~> /usr/hdp/current/hadoop-yarn-client/bin/yarn  logs 
> -applicationId application_1558613472348_0004 -am 1
> 19/05/24 18:04:49 INFO client.AHSProxy: Connecting to Application History 
> server at prabhuYarn/172.27.23.55:10200
> 19/05/24 18:04:50 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
> to rm2
> Unable to get AM container informations for the 
> application:application_1558613472348_0004
> java.io.IOException: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> Error while authenticating with endpoint: 
> https://prabhuYarn:8090/ws/v1/cluster/apps/application_1558613472348_0004/appattempts
> Can not get AMContainers logs for the 
> application:application_1558613472348_0004 with the appOwner:hrt_qa
> {code}
> LogsCli getRMWebAppURLWithoutScheme only checks the first entry in the RM list 
> yarn.resourcemanager.ha.rm-ids.
> {code}
> yarnConfig.set(YarnConfiguration.RM_HA_ID, rmIds.get(0));
> {code}
> SchedConfCli also fails 
> {code}
> [ambari-qa@pjosephdocker-3 ~]$ yarn  schedulerconf -update 
> root.default:maximum-capacity=90
> Exception in thread "main" com.sun.jersey.api.client.ClientHandlerException: 
> java.net.ConnectException: Connection refused (Connection refused)
>   at 
> com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155)
>   at com.sun.jersey.api.client.Client.handle(Client.java:652)
>   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
> {code}






[jira] [Commented] (YARN-9581) Fix WebAppUtils#getRMWebAppURLWithScheme ignores rm2

2019-06-28 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875153#comment-16875153
 ] 

Eric Yang commented on YARN-9581:
-

+1 for addendum patch 001.

> Fix WebAppUtils#getRMWebAppURLWithScheme ignores rm2
> 
>
> Key: YARN-9581
> URL: https://issues.apache.org/jira/browse/YARN-9581
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0, 3.2.1
>
> Attachments: YARN-9581-001.patch, YARN-9581-002.patch, 
> YARN-9581-003.patch, YARN-9581-004.patch, YARN-9581-005.patch, 
> YARN-9581-006.patch, YARN-9581-007.patch, YARN-9581.addendum-001.patch
>
>
> The yarn logs command fails for a running job in an RM HA setup where rm2 is 
> active and rm1 is down.
> {code}
> hrt_qa@prabhuYarn:~> /usr/hdp/current/hadoop-yarn-client/bin/yarn  logs 
> -applicationId application_1558613472348_0004 -am 1
> 19/05/24 18:04:49 INFO client.AHSProxy: Connecting to Application History 
> server at prabhuYarn/172.27.23.55:10200
> 19/05/24 18:04:50 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
> to rm2
> Unable to get AM container informations for the 
> application:application_1558613472348_0004
> java.io.IOException: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> Error while authenticating with endpoint: 
> https://prabhuYarn:8090/ws/v1/cluster/apps/application_1558613472348_0004/appattempts
> Can not get AMContainers logs for the 
> application:application_1558613472348_0004 with the appOwner:hrt_qa
> {code}
> LogsCli getRMWebAppURLWithoutScheme only checks the first entry in the RM list 
> yarn.resourcemanager.ha.rm-ids.
> {code}
> yarnConfig.set(YarnConfiguration.RM_HA_ID, rmIds.get(0));
> {code}
> SchedConfCli also fails 
> {code}
> [ambari-qa@pjosephdocker-3 ~]$ yarn  schedulerconf -update 
> root.default:maximum-capacity=90
> Exception in thread "main" com.sun.jersey.api.client.ClientHandlerException: 
> java.net.ConnectException: Connection refused (Connection refused)
>   at 
> com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155)
>   at com.sun.jersey.api.client.Client.handle(Client.java:652)
>   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
> {code}






[jira] [Commented] (YARN-9560) Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime

2019-06-28 Thread Jim Brennan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875124#comment-16875124
 ] 

Jim Brennan commented on YARN-9560:
---

Thanks for all the updates [~ebadger]!  I am also +1 on patch 013 (non-binding).


> Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime
> ---
>
> Key: YARN-9560
> URL: https://issues.apache.org/jira/browse/YARN-9560
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
> Attachments: YARN-9560.001.patch, YARN-9560.002.patch, 
> YARN-9560.003.patch, YARN-9560.004.patch, YARN-9560.005.patch, 
> YARN-9560.006.patch, YARN-9560.007.patch, YARN-9560.008.patch, 
> YARN-9560.009.patch, YARN-9560.010.patch, YARN-9560.011.patch, 
> YARN-9560.012.patch, YARN-9560.013.patch
>
>
> Since the new RuncContainerRuntime will be using a lot of the same code as 
> DockerLinuxContainerRuntime, it would be good to move a bunch of the 
> DockerLinuxContainerRuntime code up a level to an abstract class that both of 
> the runtimes can extend. 
> The new structure will look like:
> {noformat}
> OCIContainerRuntime (abstract class)
>   - DockerLinuxContainerRuntime
>   - RuncContainerRuntime
> {noformat}
> This JIRA should only change the structure of the code, not the actual 
> semantics






[jira] [Updated] (YARN-9656) Plugin to avoid scheduling jobs on node which are not in "schedulable" state, but are healthy otherwise.

2019-06-28 Thread Prashant Golash (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Golash updated YARN-9656:
--
Affects Version/s: 2.9.1

> Plugin to avoid scheduling jobs on node which are not in "schedulable" state, 
> but are healthy otherwise.
> 
>
> Key: YARN-9656
> URL: https://issues.apache.org/jira/browse/YARN-9656
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.9.1, 3.1.2
>Reporter: Prashant Golash
>Priority: Major
>
> Creating this JIRA to get ideas from the community on whether this is 
> something helpful that can be done in YARN. Sometimes nodes go into a bad 
> state, e.g. due to hardware problems (bad I/O, failing fan). In other 
> scenarios, if CGroups are not enabled, nodes may be running very high on CPU 
> and the jobs scheduled on them will suffer.
>  
> The idea is three-fold:
>  # Gather relevant metrics from NodeManagers and publish them in some form 
> (e.g. an exclude file).
>  # The RM loads the files and puts the nodes on a blacklist.
>  # Once a node becomes good again, it can be put back on the whitelist.
> Various optimizations can be done here, but I would like to understand if 
> this is something which could be helpful as an upstream feature in YARN.
>  
>  






[jira] [Created] (YARN-9656) Plugin to avoid scheduling jobs on node which are not in "schedulable" state, but are healthy otherwise.

2019-06-28 Thread Prashant Golash (JIRA)
Prashant Golash created YARN-9656:
-

 Summary: Plugin to avoid scheduling jobs on node which are not in 
"schedulable" state, but are healthy otherwise.
 Key: YARN-9656
 URL: https://issues.apache.org/jira/browse/YARN-9656
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, resourcemanager
Affects Versions: 3.1.2
Reporter: Prashant Golash


Creating this JIRA to get ideas from the community on whether this is something 
helpful that can be done in YARN. Sometimes nodes go into a bad state, e.g. due 
to hardware problems (bad I/O, failing fan). In other scenarios, if CGroups are 
not enabled, nodes may be running very high on CPU and the jobs scheduled on 
them will suffer.

 

The idea is three-fold:
 # Gather relevant metrics from NodeManagers and publish them in some form 
(e.g. an exclude file; see the sketch below).
 # The RM loads the files and puts the nodes on a blacklist.
 # Once a node becomes good again, it can be put back on the whitelist.

Various optimizations can be done here, but I would like to understand if this 
is something which could be helpful as an upstream feature in YARN.
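Purely to make the proposal concrete, a small sketch (hypothetical exclude-file format and names, nothing from an existing YARN API) of how the RM side might consume such an exclude file:

{code:java}
// Illustrative only: hypothetical exclude-file format, one unschedulable host per line.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

class UnschedulableNodesSketch {
  // The RM would periodically reload this set and skip these hosts during scheduling,
  // without marking them unhealthy.
  static Set<String> loadExcludedHosts(Path excludeFile) throws IOException {
    if (!Files.exists(excludeFile)) {
      return new HashSet<>();
    }
    return Files.readAllLines(excludeFile).stream()
        .map(String::trim)
        .filter(line -> !line.isEmpty() && !line.startsWith("#"))
        .collect(Collectors.toSet());
  }

  public static void main(String[] args) throws IOException {
    Set<String> excluded = loadExcludedHosts(Paths.get("/tmp/yarn-unschedulable-nodes"));
    System.out.println("Nodes currently excluded from scheduling: " + excluded);
  }
}
{code}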

 

 






[jira] [Commented] (YARN-6740) Federation Router (hiding multiple RMs for ApplicationClientProtocol) phase 2

2019-06-28 Thread Giovanni Matteo Fumarola (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875094#comment-16875094
 ] 

Giovanni Matteo Fumarola commented on YARN-6740:


Only 6-7 methods were implemented across the related JIRAs. There are still 
methods that need to be implemented.

> Federation Router (hiding multiple RMs for ApplicationClientProtocol) phase 2
> -
>
> Key: YARN-6740
> URL: https://issues.apache.org/jira/browse/YARN-6740
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Abhishek Modi
>Priority: Major
>
> This JIRA tracks the implementation of the layer for routing 
> ApplicationClientProtocol requests to the appropriate RM(s) in a federated 
> YARN cluster.
> Under YARN-3659 we only implemented getNewApplication, submitApplication, 
> forceKillApplication and getApplicationReport to execute applications E2E.






[jira] [Commented] (YARN-9560) Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime

2019-06-28 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875089#comment-16875089
 ] 

Shane Kumpf commented on YARN-9560:
---

Thanks for the patch and explanation, [~ebadger]. It is a similar pattern to 
what we do in the delegating runtime. I tested out the patch and it looks good 
to me. The failing unit test looks unrelated. I'm +1 on patch 013.

> Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime
> ---
>
> Key: YARN-9560
> URL: https://issues.apache.org/jira/browse/YARN-9560
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
> Attachments: YARN-9560.001.patch, YARN-9560.002.patch, 
> YARN-9560.003.patch, YARN-9560.004.patch, YARN-9560.005.patch, 
> YARN-9560.006.patch, YARN-9560.007.patch, YARN-9560.008.patch, 
> YARN-9560.009.patch, YARN-9560.010.patch, YARN-9560.011.patch, 
> YARN-9560.012.patch, YARN-9560.013.patch
>
>
> Since the new RuncContainerRuntime will be using a lot of the same code as 
> DockerLinuxContainerRuntime, it would be good to move a bunch of the 
> DockerLinuxContainerRuntime code up a level to an abstract class that both of 
> the runtimes can extend. 
> The new structure will look like:
> {noformat}
> OCIContainerRuntime (abstract class)
>   - DockerLinuxContainerRuntime
>   - RuncContainerRuntime
> {noformat}
> This JIRA should only change the structure of the code, not the actual 
> semantics






[jira] [Commented] (YARN-9560) Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime

2019-06-28 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875081#comment-16875081
 ] 

Hadoop QA commented on YARN-9560:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
 5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 51s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 0 new + 22 unchanged - 2 fixed = 22 total (was 24) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 31s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 21m  4s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 72m  5s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.amrmproxy.TestFederationInterceptor |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9560 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12973188/YARN-9560.013.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 8d2acf393598 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / cbae241 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/24334/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24334/testReport/ |
| Max. process+thread count | 413 (vs. ulimit of 1) |
| modules | C: 

[jira] [Commented] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority

2019-06-28 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875040#comment-16875040
 ] 

Hudson commented on YARN-9655:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16833 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/16833/])
Revert "YARN-9655. AllocateResponse in FederationInterceptor lost (wwei: rev 
f09c31a97e1646a1089e87d859040ebfe0c047f5)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/FederationInterceptor.java


> AllocateResponse in FederationInterceptor lost  applicationPriority
> ---
>
> Key: YARN-9655
> URL: https://issues.apache.org/jira/browse/YARN-9655
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Assignee: hunshenshi
>Priority: Major
>
> In YARN Federation mode using FederationInterceptor, when submitting an 
> application, the AM will report an error.
> {code:java}
> 2019-06-25 11:44:00,977 ERROR [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
> java.lang.NullPointerException at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.handleJobPriorityChange(RMContainerAllocator.java:1025)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:880)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:286)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$AllocatorRunnable.run(RMCommunicator.java:280)
>  at java.lang.Thread.run(Thread.java:748)
> {code}
> The reason is that applicationPriority is lost.






[jira] [Commented] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority

2019-06-28 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875037#comment-16875037
 ] 

Weiwei Yang commented on YARN-9655:
---

Oops. The fix was simple, which made me overlook that there is no UT for this; 
let me revert the commit for now.

[~hunhun], can you help add a UT to cover this NPE issue?

Thanks

> AllocateResponse in FederationInterceptor lost  applicationPriority
> ---
>
> Key: YARN-9655
> URL: https://issues.apache.org/jira/browse/YARN-9655
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Assignee: hunshenshi
>Priority: Major
>
> In YARN Federation mode using FederationInterceptor, when submitting an 
> application, the AM will report an error.
> {code:java}
> 2019-06-25 11:44:00,977 ERROR [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
> java.lang.NullPointerException at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.handleJobPriorityChange(RMContainerAllocator.java:1025)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:880)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:286)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$AllocatorRunnable.run(RMCommunicator.java:280)
>  at java.lang.Thread.run(Thread.java:748)
> {code}
> The reason is that applicationPriority is lost.






[jira] [Commented] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority

2019-06-28 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875035#comment-16875035
 ] 

Weiwei Yang commented on YARN-9655:
---

+1. committing shortly.

> AllocateResponse in FederationInterceptor lost  applicationPriority
> ---
>
> Key: YARN-9655
> URL: https://issues.apache.org/jira/browse/YARN-9655
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Assignee: hunshenshi
>Priority: Major
>
> In YARN Federation mode using FederationInterceptor, when submitting an 
> application, the AM will report an error.
> {code:java}
> 2019-06-25 11:44:00,977 ERROR [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
> java.lang.NullPointerException at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.handleJobPriorityChange(RMContainerAllocator.java:1025)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:880)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:286)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$AllocatorRunnable.run(RMCommunicator.java:280)
>  at java.lang.Thread.run(Thread.java:748)
> {code}
> The reason is that applicationPriority is lost.






[jira] [Commented] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority

2019-06-28 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875030#comment-16875030
 ] 

Hudson commented on YARN-9655:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16832 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/16832/])
YARN-9655. AllocateResponse in FederationInterceptor lost (wwei: rev 
5e7caf128719aac7d16d0efc8334b3b5a4b01e89)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/FederationInterceptor.java


> AllocateResponse in FederationInterceptor lost  applicationPriority
> ---
>
> Key: YARN-9655
> URL: https://issues.apache.org/jira/browse/YARN-9655
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Assignee: hunshenshi
>Priority: Major
>
> In YARN Federation mode using FederationInterceptor, when submitting an 
> application, the AM will report an error.
> {code:java}
> 2019-06-25 11:44:00,977 ERROR [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
> java.lang.NullPointerException at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.handleJobPriorityChange(RMContainerAllocator.java:1025)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:880)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:286)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$AllocatorRunnable.run(RMCommunicator.java:280)
>  at java.lang.Thread.run(Thread.java:748)
> {code}
> The reason is that applicationPriority is lost.






[jira] [Commented] (YARN-9560) Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime

2019-06-28 Thread Craig Condit (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875020#comment-16875020
 ] 

Craig Condit commented on YARN-9560:


+1 on patch 013 (non-binding).

> Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime
> ---
>
> Key: YARN-9560
> URL: https://issues.apache.org/jira/browse/YARN-9560
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
> Attachments: YARN-9560.001.patch, YARN-9560.002.patch, 
> YARN-9560.003.patch, YARN-9560.004.patch, YARN-9560.005.patch, 
> YARN-9560.006.patch, YARN-9560.007.patch, YARN-9560.008.patch, 
> YARN-9560.009.patch, YARN-9560.010.patch, YARN-9560.011.patch, 
> YARN-9560.012.patch, YARN-9560.013.patch
>
>
> Since the new RuncContainerRuntime will be using a lot of the same code as 
> DockerLinuxContainerRuntime, it would be good to move a bunch of the 
> DockerLinuxContainerRuntime code up a level to an abstract class that both of 
> the runtimes can extend. 
> The new structure will look like:
> {noformat}
> OCIContainerRuntime (abstract class)
>   - DockerLinuxContainerRuntime
>   - RuncContainerRuntime
> {noformat}
> This JIRA should only change the structure of the code, not the actual 
> semantics






[jira] [Commented] (YARN-9560) Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime

2019-06-28 Thread Eric Badger (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875018#comment-16875018
 ] 

Eric Badger commented on YARN-9560:
---

Patch 013 addresses the checkstyle issues in the meantime.

> Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime
> ---
>
> Key: YARN-9560
> URL: https://issues.apache.org/jira/browse/YARN-9560
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
> Attachments: YARN-9560.001.patch, YARN-9560.002.patch, 
> YARN-9560.003.patch, YARN-9560.004.patch, YARN-9560.005.patch, 
> YARN-9560.006.patch, YARN-9560.007.patch, YARN-9560.008.patch, 
> YARN-9560.009.patch, YARN-9560.010.patch, YARN-9560.011.patch, 
> YARN-9560.012.patch, YARN-9560.013.patch
>
>
> Since the new RuncContainerRuntime will be using a lot of the same code as 
> DockerLinuxContainerRuntime, it would be good to move a bunch of the 
> DockerLinuxContainerRuntime code up a level to an abstract class that both of 
> the runtimes can extend. 
> The new structure will look like:
> {noformat}
> OCIContainerRuntime (abstract class)
>   - DockerLinuxContainerRuntime
>   - RuncContainerRuntime
> {noformat}
> This JIRA should only change the structure of the code, not the actual 
> semantics






[jira] [Updated] (YARN-9560) Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime

2019-06-28 Thread Eric Badger (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-9560:
--
Attachment: YARN-9560.013.patch

> Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime
> ---
>
> Key: YARN-9560
> URL: https://issues.apache.org/jira/browse/YARN-9560
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
> Attachments: YARN-9560.001.patch, YARN-9560.002.patch, 
> YARN-9560.003.patch, YARN-9560.004.patch, YARN-9560.005.patch, 
> YARN-9560.006.patch, YARN-9560.007.patch, YARN-9560.008.patch, 
> YARN-9560.009.patch, YARN-9560.010.patch, YARN-9560.011.patch, 
> YARN-9560.012.patch, YARN-9560.013.patch
>
>
> Since the new RuncContainerRuntime will be using a lot of the same code as 
> DockerLinuxContainerRuntime, it would be good to move a bunch of the 
> DockerLinuxContainerRuntime code up a level to an abstract class that both of 
> the runtimes can extend. 
> The new structure will look like:
> {noformat}
> OCIContainerRuntime (abstract class)
>   - DockerLinuxContainerRuntime
>   - RuncContainerRuntime
> {noformat}
> This JIRA should only change the structure of the code, not the actual 
> semantics






[jira] [Commented] (YARN-9560) Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime

2019-06-28 Thread Eric Badger (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875010#comment-16875010
 ] 

Eric Badger commented on YARN-9560:
---

The reason for having {{isDockerContainerRequested()}} in 
{{OCIContainerRuntime}} is so that we can tell whether either a Docker container 
_or_ a Runc container was requested. The classes that call 
{{isOCICompliantContainerRequested()}} are calling it from a static context; 
they don't know whether the container is docker, runc, or something else. For 
now, since there are only Docker containers, the logic of 
{{isDockerContainerRequested()}} is identical to 
{{isOCICompliantContainerRequested()}}. However, once Runc is added, the logic 
of {{isOCICompliantContainerRequested()}} will be a logical OR of 
{{isDockerContainerRequested()}} and {{isRuncContainerRequested()}}. I thought 
this would be cleaner than changing all of the invocations of 
{{isDockerContainerRequested()}} and making those logical ORs, and to me it 
makes more sense to let the subclasses define the logic around whether a docker 
container or runc container is requested. If you have a better idea, let me 
know. 
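In other words, an illustrative sketch of the intended static check once runC support lands (not the current committed code; the "runc" env value is an assumption about the future RuncContainerRuntime):

{code:java}
// Illustrative sketch of the planned relationship between the static checks.
import java.util.Map;

final class RuntimeRequestSketch {
  static boolean isDockerContainerRequested(Map<String, String> env) {
    return "docker".equals(env.get("YARN_CONTAINER_RUNTIME_TYPE"));
  }

  static boolean isRuncContainerRequested(Map<String, String> env) {
    return "runc".equals(env.get("YARN_CONTAINER_RUNTIME_TYPE"));
  }

  // Callers that only need to know "is this any OCI-compliant runtime?" use the OR,
  // so they never have to care which concrete runtime was requested.
  static boolean isOCICompliantContainerRequested(Map<String, String> env) {
    return isDockerContainerRequested(env) || isRuncContainerRequested(env);
  }
}
{code}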

> Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime
> ---
>
> Key: YARN-9560
> URL: https://issues.apache.org/jira/browse/YARN-9560
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
> Attachments: YARN-9560.001.patch, YARN-9560.002.patch, 
> YARN-9560.003.patch, YARN-9560.004.patch, YARN-9560.005.patch, 
> YARN-9560.006.patch, YARN-9560.007.patch, YARN-9560.008.patch, 
> YARN-9560.009.patch, YARN-9560.010.patch, YARN-9560.011.patch, 
> YARN-9560.012.patch
>
>
> Since the new RuncContainerRuntime will be using a lot of the same code as 
> DockerLinuxContainerRuntime, it would be good to move a bunch of the 
> DockerLinuxContainerRuntime code up a level to an abstract class that both of 
> the runtimes can extend. 
> The new structure will look like:
> {noformat}
> OCIContainerRuntime (abstract class)
>   - DockerLinuxContainerRuntime
>   - RuncContainerRuntime
> {noformat}
> This JIRA should only change the structure of the code, not the actual 
> semantics






[jira] [Assigned] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority

2019-06-28 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang reassigned YARN-9655:
-

Assignee: hunshenshi

> AllocateResponse in FederationInterceptor lost  applicationPriority
> ---
>
> Key: YARN-9655
> URL: https://issues.apache.org/jira/browse/YARN-9655
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Assignee: hunshenshi
>Priority: Major
>
> In YARN Federation mode using FederationInterceptor, when submitting 
> application, am will report an error.
> {code:java}
> 2019-06-25 11:44:00,977 ERROR [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
> java.lang.NullPointerException at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.handleJobPriorityChange(RMContainerAllocator.java:1025)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:880)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:286)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$AllocatorRunnable.run(RMCommunicator.java:280)
>  at java.lang.Thread.run(Thread.java:748)
> {code}
> The reason is that applicationPriority is lost.






[jira] [Commented] (YARN-9623) Auto adjust max queue length of app activities to make sure activities on all nodes can be covered

2019-06-28 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874994#comment-16874994
 ] 

Weiwei Yang commented on YARN-9623:
---

Pushed to trunk, thanks for the contribution [~Tao Yang].

> Auto adjust max queue length of app activities to make sure activities on all 
> nodes can be covered
> --
>
> Key: YARN-9623
> URL: https://issues.apache.org/jira/browse/YARN-9623
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9623.001.patch, YARN-9623.002.patch
>
>
> Currently we can use the configuration entry 
> "yarn.resourcemanager.activities-manager.app-activities.max-queue-length" to 
> control the max queue length of app activities, but in some scenarios this 
> configuration may need to be updated as the cluster grows. Moreover, it's 
> better for users to be able to ignore that conf, so it should be auto-adjusted 
> internally.
>  There are some differences among the scheduling modes:
>  * multi-node placement disabled
>  ** Heartbeat-driven scheduling: the max queue length of app activities should 
> not be less than the number of nodes; since nodes are not always visited in 
> order, we should leave some room for misordering, for example by guaranteeing 
> that the max queue length is not less than 1.2 * numNodes.
>  ** Async scheduling: every async scheduling thread goes through all nodes in 
> order, so in this mode we should guarantee that the max queue length is 
> numThreads * numNodes.
>  * multi-node placement enabled: activities on all nodes can be involved in a 
> single app allocation, therefore there's no need to adjust for this mode.
> To sum up, we can adjust the max queue length of app activities like this:
> {code}
> int configuredMaxQueueLength;
> int maxQueueLength;
> serviceInit(){
>   ...
>   configuredMaxQueueLength = ...; //read configured max queue length
>   maxQueueLength = configuredMaxQueueLength; //take configured value as 
> default
> }
> CleanupThread#run(){
>   ...
>   if (multiNodeDisabled) {
> if (asyncSchedulingEnabled) {
>maxQueueLength = max(configuredMaxQueueLength, numSchedulingThreads * 
> numNodes);
> } else {
>maxQueueLength = max(configuredMaxQueueLength, 1.2 * numNodes);
> }
>   } else if (maxQueueLength != configuredMaxQueueLength) {
> maxQueueLength = configuredMaxQueueLength;
>   }
> }
> {code}
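A compact, runnable restatement of that adjustment with explicit types (illustrative only; the real logic lives in the ActivitiesManager cleanup thread):

{code:java}
// Illustrative restatement of the pseudo code above.
class AppActivitiesQueueLengthSketch {
  static int adjustedMaxQueueLength(int configuredMax, int numNodes,
      int numSchedulingThreads, boolean multiNodeEnabled, boolean asyncEnabled) {
    if (multiNodeEnabled) {
      return configuredMax;                         // no adjustment needed in this mode
    }
    int required = asyncEnabled
        ? numSchedulingThreads * numNodes           // every async thread walks all nodes
        : (int) Math.ceil(1.2 * numNodes);          // heartbeat driven: room for misorder
    return Math.max(configuredMax, required);
  }

  public static void main(String[] args) {
    System.out.println(adjustedMaxQueueLength(1000, 5000, 4, false, true));  // 20000
    System.out.println(adjustedMaxQueueLength(1000, 5000, 4, false, false)); // 6000
    System.out.println(adjustedMaxQueueLength(1000, 5000, 4, true, true));   // 1000
  }
}
{code}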






[jira] [Commented] (YARN-9623) Auto adjust max queue length of app activities to make sure activities on all nodes can be covered

2019-06-28 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874995#comment-16874995
 ] 

Hudson commented on YARN-9623:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16831 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/16831/])
YARN-9623. Auto adjust max queue length of app activities to make sure (wwei: 
rev cbae2413201bc470b5f16421ea69d1cd9edb64a8)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/activities/ActivitiesManager.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/activities/TestActivitiesManager.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java


> Auto adjust max queue length of app activities to make sure activities on all 
> nodes can be covered
> --
>
> Key: YARN-9623
> URL: https://issues.apache.org/jira/browse/YARN-9623
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9623.001.patch, YARN-9623.002.patch
>
>
> Currently we can use the configuration entry 
> "yarn.resourcemanager.activities-manager.app-activities.max-queue-length" to 
> control the max queue length of app activities, but in some scenarios this 
> configuration may need to be updated as the cluster grows. Moreover, it's 
> better for users to be able to ignore that conf, so it should be auto-adjusted 
> internally.
>  There are some differences among the scheduling modes:
>  * multi-node placement disabled
>  ** Heartbeat-driven scheduling: the max queue length of app activities should 
> not be less than the number of nodes; since nodes are not always visited in 
> order, we should leave some room for misordering, for example by guaranteeing 
> that the max queue length is not less than 1.2 * numNodes.
>  ** Async scheduling: every async scheduling thread goes through all nodes in 
> order, so in this mode we should guarantee that the max queue length is 
> numThreads * numNodes.
>  * multi-node placement enabled: activities on all nodes can be involved in a 
> single app allocation, therefore there's no need to adjust for this mode.
> To sum up, we can adjust the max queue length of app activities like this:
> {code}
> int configuredMaxQueueLength;
> int maxQueueLength;
> serviceInit(){
>   ...
>   configuredMaxQueueLength = ...; //read configured max queue length
>   maxQueueLength = configuredMaxQueueLength; //take configured value as 
> default
> }
> CleanupThread#run(){
>   ...
>   if (multiNodeDisabled) {
> if (asyncSchedulingEnabled) {
>maxQueueLength = max(configuredMaxQueueLength, numSchedulingThreads * 
> numNodes);
> } else {
>maxQueueLength = max(configuredMaxQueueLength, 1.2 * numNodes);
> }
>   } else if (maxQueueLength != configuredMaxQueueLength) {
> maxQueueLength = configuredMaxQueueLength;
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9623) Auto adjust max queue length of app activities to make sure activities on all nodes can be covered

2019-06-28 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874991#comment-16874991
 ] 

Weiwei Yang commented on YARN-9623:
---

+1, committing shortly.

> Auto adjust max queue length of app activities to make sure activities on all 
> nodes can be covered
> --
>
> Key: YARN-9623
> URL: https://issues.apache.org/jira/browse/YARN-9623
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9623.001.patch, YARN-9623.002.patch
>
>
> Currently we can use the configuration entry 
> "yarn.resourcemanager.activities-manager.app-activities.max-queue-length" to 
> control the max queue length of app activities, but in some scenarios this 
> configuration may need to be updated in a growing cluster. Moreover, it's 
> better for users not to have to tune that conf, therefore it should be 
> auto-adjusted internally.
>  There are some differences among the scheduling modes:
>  * multi-node placement disabled
>  ** Heartbeat-driven scheduling: the max queue length of app activities should 
> not be less than the number of nodes; since node heartbeats do not always 
> arrive in order, we should leave some headroom, for example, guaranteeing 
> that the max queue length is not less than 1.2 * numNodes.
>  ** Async scheduling: every async scheduling thread goes through all nodes in 
> order, so in this mode we should guarantee that the max queue length is 
> numThreads * numNodes.
>  * multi-node placement enabled: activities on all nodes can be involved in a 
> single app allocation, therefore there's no need to adjust for this mode.
> To sum up, we can adjust the max queue length of app activities like this:
> {code}
> int configuredMaxQueueLength;
> int maxQueueLength;
> serviceInit(){
>   ...
>   configuredMaxQueueLength = ...; //read configured max queue length
>   maxQueueLength = configuredMaxQueueLength; //take configured value as 
> default
> }
> CleanupThread#run(){
>   ...
>   if (multiNodeDisabled) {
> if (asyncSchedulingEnabled) {
>maxQueueLength = max(configuredMaxQueueLength, numSchedulingThreads * 
> numNodes);
> } else {
>maxQueueLength = max(configuredMaxQueueLength, 1.2 * numNodes);
> }
>   } else if (maxQueueLength != configuredMaxQueueLength) {
> maxQueueLength = configuredMaxQueueLength;
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9623) Auto adjust max queue length of app activities to make sure activities on all nodes can be covered

2019-06-28 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874917#comment-16874917
 ] 

Hadoop QA commented on YARN-9623:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  5m 
12s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m  8s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
44s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
15s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 38s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
48s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
56s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
46s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 84m 48s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
42s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}176m 56s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9623 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12973152/YARN-9623.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux 58c6b042f9e0 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 

[jira] [Commented] (YARN-9581) Fix WebAppUtils#getRMWebAppURLWithScheme ignores rm2

2019-06-28 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874881#comment-16874881
 ] 

Prabhu Joseph commented on YARN-9581:
-

[~eyang] This Jira fixes the two-RM HA case. I have submitted an addendum patch 
for multiple RMs. Could you review [^YARN-9581.addendum-001.patch] when you get 
time?

> Fix WebAppUtils#getRMWebAppURLWithScheme ignores rm2
> 
>
> Key: YARN-9581
> URL: https://issues.apache.org/jira/browse/YARN-9581
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0, 3.2.1
>
> Attachments: YARN-9581-001.patch, YARN-9581-002.patch, 
> YARN-9581-003.patch, YARN-9581-004.patch, YARN-9581-005.patch, 
> YARN-9581-006.patch, YARN-9581-007.patch, YARN-9581.addendum-001.patch
>
>
> The yarn logs command fails for a running job in an RM HA setup when rm2 is 
> active and rm1 is down.
> {code}
> hrt_qa@prabhuYarn:~> /usr/hdp/current/hadoop-yarn-client/bin/yarn  logs 
> -applicationId application_1558613472348_0004 -am 1
> 19/05/24 18:04:49 INFO client.AHSProxy: Connecting to Application History 
> server at prabhuYarn/172.27.23.55:10200
> 19/05/24 18:04:50 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
> to rm2
> Unable to get AM container informations for the 
> application:application_1558613472348_0004
> java.io.IOException: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> Error while authenticating with endpoint: 
> https://prabhuYarn:8090/ws/v1/cluster/apps/application_1558613472348_0004/appattempts
> Can not get AMContainers logs for the 
> application:application_1558613472348_0004 with the appOwner:hrt_qa
> {code}
> LogsCLI getRMWebAppURLWithoutScheme only checks the first entry of the RM 
> list yarn.resourcemanager.ha.rm-ids:
> {code}
> yarnConfig.set(YarnConfiguration.RM_HA_ID, rmIds.get(0));
> {code}
> SchedConfCLI also fails:
> {code}
> [ambari-qa@pjosephdocker-3 ~]$ yarn  schedulerconf -update 
> root.default:maximum-capacity=90
> Exception in thread "main" com.sun.jersey.api.client.ClientHandlerException: 
> java.net.ConnectException: Connection refused (Connection refused)
>   at 
> com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155)
>   at com.sun.jersey.api.client.Client.handle(Client.java:652)
>   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
> {code}
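
A minimal standalone sketch of the intended fix follows (illustrative code, not the actual LogsCLI change): instead of pinning yarn.resourcemanager.ha.id to the first entry of yarn.resourcemanager.ha.rm-ids, probe each RM web address and use the first one that responds. The host names and probe path are made up.

{code:java}
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Arrays;
import java.util.List;

// Sketch only: try each RM web address and keep the first reachable one.
public class ActiveRmWebAddressFinder {

  static boolean isReachable(String baseUrl) {
    try {
      HttpURLConnection conn =
          (HttpURLConnection) new URL(baseUrl + "/ws/v1/cluster/info").openConnection();
      conn.setConnectTimeout(2000);
      conn.setReadTimeout(2000);
      return conn.getResponseCode() == 200;
    } catch (Exception e) {
      return false;
    }
  }

  static String findActive(List<String> rmWebAddresses) {
    for (String addr : rmWebAddresses) {
      if (isReachable(addr)) {
        // first RM that responds; a fuller check could inspect the haState field
        return addr;
      }
    }
    throw new IllegalStateException("No reachable ResourceManager web address");
  }

  public static void main(String[] args) {
    // addresses would normally come from yarn.resourcemanager.webapp.address.<rm-id>
    List<String> candidates = Arrays.asList(
        "https://rm1.example.com:8090", "https://rm2.example.com:8090");
    System.out.println(findActive(candidates));
  }
}
{code}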



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-9625) UI2 - No link to a queue on the Queues page for Fair Scheduler

2019-06-28 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth reassigned YARN-9625:


Assignee: Zoltan Siegl

> UI2 - No link to a queue on the Queues page for Fair Scheduler
> --
>
> Key: YARN-9625
> URL: https://issues.apache.org/jira/browse/YARN-9625
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Charan Hebri
>Assignee: Zoltan Siegl
>Priority: Major
> Attachments: Capacity_scheduler_page.png, Fair_scheduler_page.png
>
>
> When the scheduler is set to 'Capacity Scheduler', the Queues page has a tab 
> on the right with a link for each queue that provides running app 
> information for that queue. For 'Fair Scheduler' there is no such link. 
> Screenshots for both schedulers are attached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7621) Support submitting apps with queue path for CapacityScheduler

2019-06-28 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874840#comment-16874840
 ] 

Tao Yang commented on YARN-7621:


Attached v2 patch rebased from trunk. 
[~cheersyang], could you please help to review this patch? Thanks.

> Support submitting apps with queue path for CapacityScheduler
> -
>
> Key: YARN-7621
> URL: https://issues.apache.org/jira/browse/YARN-7621
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Minor
> Attachments: YARN-7621.001.patch, YARN-7621.002.patch
>
>
> Currently there is a difference in the queue definition in 
> ApplicationSubmissionContext between CapacityScheduler and FairScheduler: 
> FairScheduler needs the queue path but CapacityScheduler needs the queue name. 
> There is no doubt about the correctness of the queue definition for 
> CapacityScheduler, because it does not allow duplicate leaf queue names, but 
> it makes it hard to switch between FairScheduler and CapacityScheduler. I 
> propose to support submitting apps with a queue path for CapacityScheduler to 
> make the interface clearer and the scheduler switch smoother.
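
To make the difference concrete, a tiny illustrative snippet follows (not YARN code; the queue path {{root.engineering.spark}} is made up): FairScheduler accepts the full path as the submitted queue string, while CapacityScheduler today expects only the leaf name.

{code:java}
// Illustrative only: how a full queue path relates to the leaf queue name
// that CapacityScheduler currently expects.
public class QueuePathExample {
  static String leafName(String queue) {
    int idx = queue.lastIndexOf('.');
    return idx < 0 ? queue : queue.substring(idx + 1);
  }

  public static void main(String[] args) {
    String path = "root.engineering.spark"; // FairScheduler-style queue path (made up)
    System.out.println(leafName(path));     // "spark" -- what CapacityScheduler expects today
  }
}
{code}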



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7621) Support submitting apps with queue path for CapacityScheduler

2019-06-28 Thread Tao Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-7621:
---
Attachment: YARN-7621.002.patch

> Support submitting apps with queue path for CapacityScheduler
> -
>
> Key: YARN-7621
> URL: https://issues.apache.org/jira/browse/YARN-7621
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Minor
> Attachments: YARN-7621.001.patch, YARN-7621.002.patch
>
>
> Currently there is a difference in the queue definition in 
> ApplicationSubmissionContext between CapacityScheduler and FairScheduler: 
> FairScheduler needs the queue path but CapacityScheduler needs the queue name. 
> There is no doubt about the correctness of the queue definition for 
> CapacityScheduler, because it does not allow duplicate leaf queue names, but 
> it makes it hard to switch between FairScheduler and CapacityScheduler. I 
> propose to support submitting apps with a queue path for CapacityScheduler to 
> make the interface clearer and the scheduler switch smoother.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9623) Auto adjust max queue length of app activities to make sure activities on all nodes can be covered

2019-06-28 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874825#comment-16874825
 ] 

Tao Yang commented on YARN-9623:


Thanks [~cheersyang] for your comments.
{quote}
If this configuration is set, then the value should be enforced for the queue 
size and disable the auto-adjustment. Can you add that logic?
{quote}
Currently the configuration 
{{yarn.resourcemanager.activities-manager.app-activities.max-queue-length}} is 
still there and can be seen as the lower bound: the max queue length of app 
activities can only be adjusted to a value larger than it. I think this should 
make sense to us as well.

Attached the v2 patch, adding a volatile modifier to appActivitiesMaxQueueLength 
so that updates are seen by other threads as soon as possible.
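
To illustrate why a plain volatile field is sufficient here, a small standalone sketch follows (toy code, not ActivitiesManager itself): there is a single writer thread, and the readers never perform a read-modify-write on the field, so visibility is the only concern.

{code:java}
// Toy illustration of the visibility argument (illustrative names only).
public class VolatileLimitDemo {
  private volatile int appActivitiesMaxQueueLength = 100;

  void cleanupThreadUpdates(int newLength) {
    // single writer: a plain volatile store publishes the new value
    appActivitiesMaxQueueLength = newLength;
  }

  boolean readerShouldTrim(int currentQueueSize) {
    // readers never modify the field, they only compare against it
    return currentQueueSize > appActivitiesMaxQueueLength;
  }

  public static void main(String[] args) {
    VolatileLimitDemo demo = new VolatileLimitDemo();
    demo.cleanupThreadUpdates(600);
    System.out.println(demo.readerShouldTrim(1000)); // true
  }
}
{code}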

> Auto adjust max queue length of app activities to make sure activities on all 
> nodes can be covered
> --
>
> Key: YARN-9623
> URL: https://issues.apache.org/jira/browse/YARN-9623
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9623.001.patch, YARN-9623.002.patch
>
>
> Currently we can use the configuration entry 
> "yarn.resourcemanager.activities-manager.app-activities.max-queue-length" to 
> control the max queue length of app activities, but in some scenarios this 
> configuration may need to be updated in a growing cluster. Moreover, it's 
> better for users not to have to tune that conf, therefore it should be 
> auto-adjusted internally.
>  There are some differences among the scheduling modes:
>  * multi-node placement disabled
>  ** Heartbeat-driven scheduling: the max queue length of app activities should 
> not be less than the number of nodes; since node heartbeats do not always 
> arrive in order, we should leave some headroom, for example, guaranteeing 
> that the max queue length is not less than 1.2 * numNodes.
>  ** Async scheduling: every async scheduling thread goes through all nodes in 
> order, so in this mode we should guarantee that the max queue length is 
> numThreads * numNodes.
>  * multi-node placement enabled: activities on all nodes can be involved in a 
> single app allocation, therefore there's no need to adjust for this mode.
> To sum up, we can adjust the max queue length of app activities like this:
> {code}
> int configuredMaxQueueLength;
> int maxQueueLength;
> serviceInit(){
>   ...
>   configuredMaxQueueLength = ...; //read configured max queue length
>   maxQueueLength = configuredMaxQueueLength; //take configured value as 
> default
> }
> CleanupThread#run(){
>   ...
>   if (multiNodeDisabled) {
> if (asyncSchedulingEnabled) {
>maxQueueLength = max(configuredMaxQueueLength, numSchedulingThreads * 
> numNodes);
> } else {
>maxQueueLength = max(configuredMaxQueueLength, 1.2 * numNodes);
> }
>   } else if (maxQueueLength != configuredMaxQueueLength) {
> maxQueueLength = configuredMaxQueueLength;
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9623) Auto adjust max queue length of app activities to make sure activities on all nodes can be covered

2019-06-28 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874001#comment-16874001
 ] 

Tao Yang edited comment on YARN-9623 at 6/28/19 10:07 AM:
--

Thanks [~cheersyang] for the feedback.
{quote}
However, the activity manager should be a general service, it should not be 
depending on CS's configuration.
{quote}
Yes, I had this concern before, but required number of app activities is indeed 
decided by a specific scheduler and even a specific scheduling policy inside 
the scheduler.  So the patch did the same as some general services like 
QueueACLsManager/SchedulerPlacementProcessor/... (using {{if scheduler 
instanceof CapacityScheduler}}). The specific scheduler can be ignored unless 
we just set maxQueueLength to max(configuredMaxQueueLength, 1.2 * numOfNodes), 
this may somehow waste a lot in a large cluster with multi-nodes placement 
enabled. Thoughts?

{quote}
Another thing is appActivitiesMaxQueueLength, do we need to make it atomic 
because it is being modified in another thread.
{quote}
There's no need to make it atomic since there are no ordering or 
compound-update requirements, but volatile is necessary for this variable.


was (Author: tao yang):
Thanks [~cheersyang] for the feedback.
{quote}
However, the activity manager should be a general service, it should not be 
depending on CS's configuration.
{quote}
Yes, I had this concern before, but required number of app activities is indeed 
decided by a specific scheduler and even a specific scheduling policy inside 
the scheduler.  So the patch did the same as some general services like 
QueueACLsManager/SchedulerPlacementProcessor/... (using {{if scheduler 
instanceof CapacityScheduler}}). The specific scheduler can be ignored unless 
we just set maxQueueLength to max(configuredMaxQueueLength, 1.2 * numOfNodes), 
this may somehow waste a lot in a large cluster with multi-nodes placement 
enabled. Thoughts?

{quote}
Another thing is appActivitiesMaxQueueLength, do we need to make it atomic 
because it is being modified in another thread.
{quote}
There's no need to make it atomic since there are no ordering or 
compound-update requirements, but violate is necessary for this variable.

> Auto adjust max queue length of app activities to make sure activities on all 
> nodes can be covered
> --
>
> Key: YARN-9623
> URL: https://issues.apache.org/jira/browse/YARN-9623
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9623.001.patch, YARN-9623.002.patch
>
>
> Currently we can use the configuration entry 
> "yarn.resourcemanager.activities-manager.app-activities.max-queue-length" to 
> control the max queue length of app activities, but in some scenarios this 
> configuration may need to be updated in a growing cluster. Moreover, it's 
> better for users not to have to tune that conf, therefore it should be 
> auto-adjusted internally.
>  There are some differences among the scheduling modes:
>  * multi-node placement disabled
>  ** Heartbeat-driven scheduling: the max queue length of app activities should 
> not be less than the number of nodes; since node heartbeats do not always 
> arrive in order, we should leave some headroom, for example, guaranteeing 
> that the max queue length is not less than 1.2 * numNodes.
>  ** Async scheduling: every async scheduling thread goes through all nodes in 
> order, so in this mode we should guarantee that the max queue length is 
> numThreads * numNodes.
>  * multi-node placement enabled: activities on all nodes can be involved in a 
> single app allocation, therefore there's no need to adjust for this mode.
> To sum up, we can adjust the max queue length of app activities like this:
> {code}
> int configuredMaxQueueLength;
> int maxQueueLength;
> serviceInit(){
>   ...
>   configuredMaxQueueLength = ...; //read configured max queue length
>   maxQueueLength = configuredMaxQueueLength; //take configured value as 
> default
> }
> CleanupThread#run(){
>   ...
>   if (multiNodeDisabled) {
> if (asyncSchedulingEnabled) {
>maxQueueLength = max(configuredMaxQueueLength, numSchedulingThreads * 
> numNodes);
> } else {
>maxQueueLength = max(configuredMaxQueueLength, 1.2 * numNodes);
> }
>   } else if (maxQueueLength != configuredMaxQueueLength) {
> maxQueueLength = configuredMaxQueueLength;
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9623) Auto adjust max queue length of app activities to make sure activities on all nodes can be covered

2019-06-28 Thread Tao Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-9623:
---
Attachment: YARN-9623.002.patch

> Auto adjust max queue length of app activities to make sure activities on all 
> nodes can be covered
> --
>
> Key: YARN-9623
> URL: https://issues.apache.org/jira/browse/YARN-9623
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9623.001.patch, YARN-9623.002.patch
>
>
> Currently we can use the configuration entry 
> "yarn.resourcemanager.activities-manager.app-activities.max-queue-length" to 
> control the max queue length of app activities, but in some scenarios this 
> configuration may need to be updated in a growing cluster. Moreover, it's 
> better for users not to have to tune that conf, therefore it should be 
> auto-adjusted internally.
>  There are some differences among the scheduling modes:
>  * multi-node placement disabled
>  ** Heartbeat-driven scheduling: the max queue length of app activities should 
> not be less than the number of nodes; since node heartbeats do not always 
> arrive in order, we should leave some headroom, for example, guaranteeing 
> that the max queue length is not less than 1.2 * numNodes.
>  ** Async scheduling: every async scheduling thread goes through all nodes in 
> order, so in this mode we should guarantee that the max queue length is 
> numThreads * numNodes.
>  * multi-node placement enabled: activities on all nodes can be involved in a 
> single app allocation, therefore there's no need to adjust for this mode.
> To sum up, we can adjust the max queue length of app activities like this:
> {code}
> int configuredMaxQueueLength;
> int maxQueueLength;
> serviceInit(){
>   ...
>   configuredMaxQueueLength = ...; //read configured max queue length
>   maxQueueLength = configuredMaxQueueLength; //take configured value as 
> default
> }
> CleanupThread#run(){
>   ...
>   if (multiNodeDisabled) {
> if (asyncSchedulingEnabled) {
>maxQueueLength = max(configuredMaxQueueLength, numSchedulingThreads * 
> numNodes);
> } else {
>maxQueueLength = max(configuredMaxQueueLength, 1.2 * numNodes);
> }
>   } else if (maxQueueLength != configuredMaxQueueLength) {
> maxQueueLength = configuredMaxQueueLength;
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-9625) UI2 - No link to a queue on the Queues page for Fair Scheduler

2019-06-28 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth reassigned YARN-9625:


Assignee: (was: Szilard Nemeth)

> UI2 - No link to a queue on the Queues page for Fair Scheduler
> --
>
> Key: YARN-9625
> URL: https://issues.apache.org/jira/browse/YARN-9625
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Charan Hebri
>Priority: Major
> Attachments: Capacity_scheduler_page.png, Fair_scheduler_page.png
>
>
> When the scheduler is set to 'Capacity Scheduler', the Queues page has a tab 
> on the right with a link for each queue that provides running app 
> information for that queue. For 'Fair Scheduler' there is no such link. 
> Screenshots for both schedulers are attached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-9626) UI2 - Fair scheduler queue apps page issues

2019-06-28 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth reassigned YARN-9626:


Assignee: (was: Szilard Nemeth)

> UI2 - Fair scheduler queue apps page issues
> ---
>
> Key: YARN-9626
> URL: https://issues.apache.org/jira/browse/YARN-9626
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Charan Hebri
>Priority: Major
> Attachments: Fair_scheduler_apps_page.png
>
>
> There are a few issues with the apps page for a queue when Fair Scheduler is 
> used.
>  * Labels like configured capacity, configured max capacity etc. (marked in 
> the attached image) are not needed as they are specific to Capacity Scheduler.
>  * Steady fair memory, used memory and maximum memory are actual values but 
> are shown as percentages.
>  * Formatting of Pending, Allocated, Reserved Containers values is not 
> correct (shown in the attached screenshot)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9640) Slow event processing could cause too many attempt unregister events

2019-06-28 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874723#comment-16874723
 ] 

Zhankun Tang commented on YARN-9640:


[~bibinchundatt], yeah. agree.

> Slow event processing could cause too many attempt unregister events
> 
>
> Key: YARN-9640
> URL: https://issues.apache.org/jira/browse/YARN-9640
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
>  Labels: scalability
> Attachments: YARN-9640.001.patch, YARN-9640.002.patch, 
> YARN-9640.003.patch
>
>
> During verification on one of our test clusters we found that the number of 
> attempt unregister events was about 300k+.
>  # All AM containers completed.
>  # AMRMClientImpl sends finishApplicationMaster.
>  # AMRMClient polls the finish status every 100ms using a 
> finishApplicationMaster request.
>  # AMRMClientImpl#unregisterApplicationMaster
> {code:java}
>   while (true) {
> FinishApplicationMasterResponse response =
> rmClient.finishApplicationMaster(request);
> if (response.getIsUnregistered()) {
>   break;
> }
> LOG.info("Waiting for application to be successfully unregistered.");
> Thread.sleep(100);
>   }
> {code}
>  # The ApplicationMasterService finishApplicationMaster interface sends an 
> unregister event on every status update.
> We should send the unregister event only once, cache that it has been sent, 
> and return a not-yet-unregistered response to the AM for subsequent polls 
> instead of overloading the event queue.
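
A minimal standalone sketch of the dedup idea follows (illustrative names; this is not the ApplicationMasterService code): remember per attempt whether an unregister event has already been dispatched, so repeated finishApplicationMaster polls from the AM do not enqueue new events.

{code:java}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: dedupe attempt unregister events before they hit the dispatcher.
public class UnregisterEventDeduper {

  private final Set<String> unregisterDispatched = ConcurrentHashMap.newKeySet();

  /** Returns true if an unregister event should be put on the dispatcher queue. */
  boolean shouldDispatch(String attemptId) {
    // add() is atomic: only the first caller for an attempt gets true
    return unregisterDispatched.add(attemptId);
  }

  public static void main(String[] args) {
    UnregisterEventDeduper deduper = new UnregisterEventDeduper();
    System.out.println(deduper.shouldDispatch("appattempt_1_0001_000001")); // true  -> enqueue event
    System.out.println(deduper.shouldDispatch("appattempt_1_0001_000001")); // false -> just reply "not yet unregistered"
  }
}
{code}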



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9480) createAppDir() in LogAggregationService shouldn't block dispatcher thread of ContainerManagerImpl

2019-06-28 Thread liyakun (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874719#comment-16874719
 ] 

liyakun commented on YARN-9480:
---

[~tangzhankun] please help add [~Yunyao Zhang] as a contributor; he will 
contribute to this issue.

> createAppDir() in LogAggregationService shouldn't block dispatcher thread of 
> ContainerManagerImpl
> -
>
> Key: YARN-9480
> URL: https://issues.apache.org/jira/browse/YARN-9480
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: liyakun
>Assignee: liyakun
>Priority: Major
>
> At present, when startContainers() is called and the NM does not yet know the 
> application, it enters the INIT_APPLICATION step. In the application init 
> step, createAppDir() is executed, and it is a blocking operation.
> createAppDir() needs to interact with an external file system, so it is bound 
> by the SLA of that file system. Once the external file system shows high 
> latency, the NM dispatcher thread of ContainerManagerImpl gets stuck. (In 
> fact, I have seen a case where the NM was stuck here for more than an hour.)
> I think it would be more reasonable to move createAppDir() to the actual 
> log-upload time (in other threads). Also, according to the logRetentionPolicy, 
> many of the containers may never get to this step, which would save a lot of 
> interactions with the external file system.
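
A minimal standalone sketch of that idea follows (illustrative names and paths, not the LogAggregationService code): the app dir is created lazily, at most once, from the upload thread rather than from the dispatcher thread.

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch only: defer app-dir creation to the first log upload for the app.
public class LazyAppDirCreator {

  private final ConcurrentMap<String, Path> createdDirs = new ConcurrentHashMap<>();

  /** Called from the upload thread, not from the dispatcher thread. */
  Path ensureAppDir(String appId) {
    return createdDirs.computeIfAbsent(appId, id -> {
      try {
        // stand-in for the remote-FS mkdirs call in createAppDir()
        return Files.createDirectories(Paths.get("/tmp/app-logs", id));
      } catch (IOException e) {
        throw new RuntimeException("Failed to create log dir for " + id, e);
      }
    });
  }

  public static void main(String[] args) {
    LazyAppDirCreator creator = new LazyAppDirCreator();
    System.out.println(creator.ensureAppDir("application_1561700000000_0001"));
  }
}
{code}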



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9480) createAppDir() in LogAggregationService shouldn't block dispatcher thread of ContainerManagerImpl

2019-06-28 Thread Yunyao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874717#comment-16874717
 ] 

Yunyao Zhang commented on YARN-9480:


please assign to me.

> createAppDir() in LogAggregationService shouldn't block dispatcher thread of 
> ContainerManagerImpl
> -
>
> Key: YARN-9480
> URL: https://issues.apache.org/jira/browse/YARN-9480
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: liyakun
>Assignee: liyakun
>Priority: Major
>
> At present, when startContainers() is called and the NM does not yet know the 
> application, it enters the INIT_APPLICATION step. In the application init 
> step, createAppDir() is executed, and it is a blocking operation.
> createAppDir() needs to interact with an external file system, so it is bound 
> by the SLA of that file system. Once the external file system shows high 
> latency, the NM dispatcher thread of ContainerManagerImpl gets stuck. (In 
> fact, I have seen a case where the NM was stuck here for more than an hour.)
> I think it would be more reasonable to move createAppDir() to the actual 
> log-upload time (in other threads). Also, according to the logRetentionPolicy, 
> many of the containers may never get to this step, which would save a lot of 
> interactions with the external file system.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org