[jira] [Commented] (YARN-6160) Create an agent-less docker-less provider in the native services framework

2017-05-16 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013524#comment-16013524
 ] 

Jian He commented on YARN-6160:
---

lgtm. For the findbugs warning, what is doClientInstall supposed to do? 
If it's not required any more, maybe we can just remove that code. 
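For context, the FindBugs warning reported by the QA run further below is a "dead store" in 
SliderClient.doClientInstall. A minimal sketch of that pattern, using made-up class and method 
names rather than the actual Slider code:

{code:java}
// Illustrative sketch only -- hypothetical class and names, not the actual
// SliderClient code. FindBugs flags a "dead store" when a local variable is
// assigned but never read afterwards.
public class DeadStoreExample {

  static class Configuration { }

  static Configuration loadConfiguration() {
    return new Configuration();
  }

  // Before: 'config' is assigned and then never used -> dead store warning.
  int doClientInstallBefore() {
    Configuration config = loadConfiguration(); // dead store
    return 0;
  }

  // After: either use the loaded value or drop the assignment entirely.
  int doClientInstallAfter() {
    return 0;
  }
}
{code}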

> Create an agent-less docker-less provider in the native services framework
> --
>
> Key: YARN-6160
> URL: https://issues.apache.org/jira/browse/YARN-6160
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
> Fix For: yarn-native-services
>
> Attachments: YARN-6160-yarn-native-services.001.patch, 
> YARN-6160-yarn-native-services.002.patch, 
> YARN-6160-yarn-native-services.003.patch
>
>
> The goal of the agent-less docker-less provider is to be able to use the YARN 
> native services framework when Docker is not installed or other methods of 
> app resource installation are preferable.






[jira] [Commented] (YARN-6535) Program needs to exit when SLS finishes.

2017-05-16 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013476#comment-16013476
 ] 

Yufei Gu commented on YARN-6535:


Thanks [~rkanter] for the review and commit. Thanks [~wangda] for the review.

> Program needs to exit when SLS finishes. 
> -
>
> Key: YARN-6535
> URL: https://issues.apache.org/jira/browse/YARN-6535
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Affects Versions: 3.0.0-alpha2
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Fix For: 3.0.0-alpha3
>
> Attachments: YARN-6535.001.patch, YARN-6535.002.patch, 
> YARN-6535.003.patch
>
>
> The program needs to exit when SLS finishes, except in unit tests.






[jira] [Updated] (YARN-6611) ResourceTypes should be renamed

2017-05-16 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-6611:
--
Issue Type: Sub-task  (was: Improvement)
Parent: YARN-3926

> ResourceTypes should be renamed
> ---
>
> Key: YARN-6611
> URL: https://issues.apache.org/jira/browse/YARN-6611
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: YARN-3926
>Reporter: Daniel Templeton
>
> {{ResourceTypes}} is too close to the unrelated {{ResourceType}} class.  
> Maybe {{ResourceClass}} would be better?






[jira] [Commented] (YARN-6611) ResourceTypes should be renamed

2017-05-16 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013435#comment-16013435
 ] 

Sunil G commented on YARN-6611:
---

[~templedf]
{{ResourceTypes}} is present in the YARN-3926 branch. I think you are filing 
this ticket against the resource-profile branch, correct?
I am converting this ticket to a sub-task under YARN-3926. Please revert if 
that is not correct.

{{ResourceClass}} is better, although the Class suffix seems a bit odd too. 
Maybe something like {{ResourceCategory}} would work.

> ResourceTypes should be renamed
> ---
>
> Key: YARN-6611
> URL: https://issues.apache.org/jira/browse/YARN-6611
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: YARN-3926
>Reporter: Daniel Templeton
>
> {{ResourceTypes}} is too close to the unrelated {{ResourceType}} class.  
> Maybe {{ResourceClass}} would be better?






[jira] [Commented] (YARN-6409) RM does not blacklist node for AM launch failures

2017-05-16 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013424#comment-16013424
 ] 

Rohith Sharma K S commented on YARN-6409:
-

IMO, I have a concern about blacklisting an AM node on SocketTimeoutException. 
It would simply cause a *capable/healthy node* to be blacklisted. The NM might 
be too busy at a particular point in time, but it is without any doubt still 
capable of launching containers. 
There is NO way to purge blacklisted nodes unless the blacklisted node count 
reaches the threshold, so on SocketTimeoutException many healthy nodes could 
end up blacklisted. 

I would like to hear opinions from other community folks who worked on the AM 
blacklisting feature. 
cc: [~jlowe] [~vvasudev] [~vinodkv] [~sunilg] [~djp]
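To illustrate the trade-off being discussed, here is a minimal, hypothetical sketch of a 
launch-failure check that treats SocketTimeoutException differently from other errors. The 
class, method, and flag are invented for illustration; this is not the actual RMAppAttemptImpl 
logic or the attached patch.

{code:java}
import java.net.SocketTimeoutException;

// Hypothetical sketch, not the actual RMAppAttemptImpl logic: one way to
// decide whether an AM launch failure should count toward node blacklisting.
public class AmLaunchBlacklistPolicy {

  /**
   * Returns true if the launch failure should count toward blacklisting the
   * node. A SocketTimeoutException may only mean the NM was momentarily busy,
   * so a policy could choose not to blacklist on it (the concern raised above).
   */
  public boolean shouldCountTowardsNodeBlacklisting(Throwable launchFailure,
      boolean blacklistOnTimeout) {
    Throwable cause = launchFailure;
    while (cause != null) {
      if (cause instanceof SocketTimeoutException) {
        return blacklistOnTimeout;
      }
      cause = cause.getCause();
    }
    // Other launch failures (e.g. container setup errors) count by default.
    return true;
  }
}
{code}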

> RM does not blacklist node for AM launch failures
> -
>
> Key: YARN-6409
> URL: https://issues.apache.org/jira/browse/YARN-6409
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha2
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-6409.00.patch, YARN-6409.01.patch, 
> YARN-6409.02.patch, YARN-6409.03.patch
>
>
> Currently, node blacklisting upon AM failures only handles failures that 
> happen after the AM container is launched (see 
> RMAppAttemptImpl.shouldCountTowardsNodeBlacklisting()).  However, AM launch 
> can also fail if the NM where the AM container is allocated goes 
> unresponsive.  Because that case is not handled, the scheduler may continue 
> to allocate AM containers on the same NM for subsequent app attempts. 
> {code}
> Application application_1478721503753_0870 failed 2 times due to Error 
> launching appattempt_1478721503753_0870_02. Got exception: 
> java.io.IOException: Failed on local exception: java.io.IOException: 
> java.net.SocketTimeoutException: 6 millis timeout while waiting for 
> channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
> local=/17.111.179.113:46702 remote=*.me.com/17.111.178.125:8041]; Host 
> Details : local host is: "*.me.com/17.111.179.113"; destination host is: 
> "*.me.com":8041; 
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772) 
> at org.apache.hadoop.ipc.Client.call(Client.java:1475) 
> at org.apache.hadoop.ipc.Client.call(Client.java:1408) 
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>  
> at com.sun.proxy.$Proxy86.startContainers(Unknown Source) 
> at 
> org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
>  
> at sun.reflect.GeneratedMethodAccessor155.invoke(Unknown Source) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  
> at java.lang.reflect.Method.invoke(Method.java:497) 
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
>  
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>  
> at com.sun.proxy.$Proxy87.startContainers(Unknown Source) 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:120)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:256)
>  
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  
> at java.lang.Thread.run(Thread.java:745) 
> Caused by: java.io.IOException: java.net.SocketTimeoutException: 6 millis 
> timeout while waiting for channel to be ready for read. ch : 
> java.nio.channels.SocketChannel[connected local=/17.111.179.113:46702 
> remote=*.me.com/17.111.178.125:8041] 
> at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:687) 
> at java.security.AccessController.doPrivileged(Native Method) 
> at javax.security.auth.Subject.doAs(Subject.java:422) 
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
>  
> at 
> org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:650)
>  
> at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:738) 
> at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375) 
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1524) 
> at org.apache.hadoop.ipc.Client.call(Client.java:1447) 
> ... 15 more 
> Caused by: java.net.SocketTimeoutException: 6 millis timeout while 
> waiting for channel to be ready for read. ch : 
> java.nio.channels.SocketChannel[connected local=/17.111.179.113:46702 
> remote=*.me.com/17.111.178.125:8041] 
> at 
> 

[jira] [Commented] (YARN-6447) Provide container sandbox policies for groups

2017-05-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013375#comment-16013375
 ] 

Hudson commented on YARN-6447:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11740 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11740/])
YARN-6447. Provide container sandbox policies for groups (gphillips via 
rkanter) (rkanter: rev 18c494a00c8ead768f3a868b450dceea485559df)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/TestJavaSandboxLinuxContainerRuntime.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java


> Provide container sandbox policies for groups 
> --
>
> Key: YARN-6447
> URL: https://issues.apache.org/jira/browse/YARN-6447
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, yarn
>Affects Versions: 3.0.0-alpha3
>Reporter: Greg Phillips
>Assignee: Greg Phillips
>Priority: Minor
> Fix For: 3.0.0-alpha3
>
> Attachments: YARN-6447.001.patch, YARN-6447.002.patch, 
> YARN-6447.003.patch
>
>
> Currently the container sandbox feature 
> ([YARN-5280|https://issues.apache.org/jira/browse/YARN-5280]) allows YARN 
> administrators to use one Java Security Manager policy file to limit the 
> permissions granted to YARN containers.  It would be useful to allow for 
> different policy files to be used based on groups.
> For example, an administrator may want to ensure standard users who write 
> applications for the MapReduce or Tez frameworks are not allowed to open 
> arbitrary network connections within their data processing code.  Users who 
> are designing ETL pipelines, however, may need to open sockets to extract 
> data from external sources.  By assigning these sets of users to different 
> groups and setting specific policies for each group, you can assert 
> fine-grained control over the permissions granted to each Java-based 
> container across a YARN cluster.
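As a rough illustration of the per-group idea described above, the sketch below resolves a 
policy file from a user's groups. The class and the map-based lookup are assumptions made for 
illustration only, not the actual JavaSandboxLinuxContainerRuntime configuration keys or code.

{code:java}
import java.util.List;
import java.util.Map;

// Hypothetical sketch of per-group policy selection; names are made up.
public class GroupPolicyResolver {

  private final Map<String, String> groupToPolicyFile;
  private final String defaultPolicyFile;

  public GroupPolicyResolver(Map<String, String> groupToPolicyFile,
      String defaultPolicyFile) {
    this.groupToPolicyFile = groupToPolicyFile;
    this.defaultPolicyFile = defaultPolicyFile;
  }

  /** Pick the first group that has a dedicated policy file, else the default. */
  public String resolvePolicyFile(List<String> userGroups) {
    for (String group : userGroups) {
      String policy = groupToPolicyFile.get(group);
      if (policy != null) {
        return policy;
      }
    }
    return defaultPolicyFile;
  }
}
{code}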






[jira] [Commented] (YARN-6535) Program needs to exit when SLS finishes.

2017-05-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013374#comment-16013374
 ] 

Hudson commented on YARN-6535:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11740 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11740/])
YARN-6535. Program needs to exit when SLS finishes. (yufeigu via rkanter) 
(rkanter: rev 101852ca11ed4a9c4d4664c6c797fa7173dc59ae)
* (edit) 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/SLSRunner.java


> Program needs to exit when SLS finishes. 
> -
>
> Key: YARN-6535
> URL: https://issues.apache.org/jira/browse/YARN-6535
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Affects Versions: 3.0.0-alpha2
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Fix For: 3.0.0-alpha3
>
> Attachments: YARN-6535.001.patch, YARN-6535.002.patch, 
> YARN-6535.003.patch
>
>
> The program needs to exit when SLS finishes, except in unit tests.






[jira] [Commented] (YARN-6608) Backport all SLS improvements from trunk to branch-2

2017-05-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013364#comment-16013364
 ] 

Hadoop QA commented on YARN-6608:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
28s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 16 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
 5s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
20s{color} | {color:green} branch-2 passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
19s{color} | {color:green} branch-2 passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
27s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
21s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
36s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} branch-2 passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} branch-2 passed with JDK v1.7.0_121 {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m  
8s{color} | {color:red} hadoop-sls in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m  
7s{color} | {color:red} hadoop-sls in the patch failed with JDK v1.8.0_131. 
{color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m  7s{color} 
| {color:red} hadoop-sls in the patch failed with JDK v1.8.0_131. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m  
8s{color} | {color:red} hadoop-sls in the patch failed with JDK v1.7.0_121. 
{color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m  8s{color} 
| {color:red} hadoop-sls in the patch failed with JDK v1.7.0_121. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 8s{color} | {color:green} hadoop-tools/hadoop-sls: The patch generated 0 new + 
0 unchanged - 238 fixed = 0 total (was 238) {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m  
7s{color} | {color:red} hadoop-sls in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvneclipse {color} | {color:red}  0m  
6s{color} | {color:red} hadoop-sls in the patch failed. {color} |
| {color:red}-1{color} | {color:red} shellcheck {color} | {color:red}  0m  
8s{color} | {color:red} The patch generated 2 new + 487 unchanged - 22 fixed = 
489 total (was 509) {color} |
| {color:orange}-0{color} | {color:orange} shelldocs {color} | {color:orange}  
0m  9s{color} | {color:orange} The patch generated 16 new + 46 unchanged - 0 
fixed = 62 total (was 46) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
1s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
1s{color} | {color:red} The patch 1 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
3s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m  
6s{color} | {color:red} hadoop-sls in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m  
6s{color} | {color:red} hadoop-sls in the patch failed with JDK v1.8.0_131. 
{color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m  
6s{color} | {color:red} hadoop-sls in the patch failed with JDK v1.7.0_121. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m  6s{color} 
| {color:red} hadoop-sls in the patch failed with JDK v1.7.0_121. {color} |
| {color:blue}0{color} | {color:blue} asflicense {color} | {color:blue}  0m  
9s{color} | {color:blue} ASF License check generated no output? 

[jira] [Updated] (YARN-6613) Update json validation for new native services providers

2017-05-16 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-6613:
-
Attachment: YARN-6613-yarn-native-services.001.patch

This patch applies on top of YARN-6160.

> Update json validation for new native services providers
> 
>
> Key: YARN-6613
> URL: https://issues.apache.org/jira/browse/YARN-6613
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
> Fix For: yarn-native-services
>
> Attachments: YARN-6613-yarn-native-services.001.patch
>
>
> YARN-6160 started some work enabling different validation for each native 
> services provider. The validation done in 
> ServiceApiUtil#validateApplicationPayload needs to be updated accordingly. This 
> validation should also be updated to handle the APPLICATION artifact type, 
> which does not have an associated provider.
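A minimal sketch of the kind of per-artifact-type validation described above, assuming a 
simplified enum of artifact types and an invented validator interface; it is not the actual 
ServiceApiUtil code.

{code:java}
import java.util.Map;

// Illustrative sketch only -- not the actual ServiceApiUtil code. It shows
// validation that branches per artifact type, with APPLICATION handled
// directly because it has no provider of its own.
public class PayloadValidator {

  enum ArtifactType { DOCKER, TARBALL, APPLICATION }

  interface ProviderValidator {
    void validate(String artifactId);
  }

  private final Map<ArtifactType, ProviderValidator> providerValidators;

  public PayloadValidator(Map<ArtifactType, ProviderValidator> providerValidators) {
    this.providerValidators = providerValidators;
  }

  public void validateApplicationPayload(ArtifactType type, String artifactId) {
    if (type == ArtifactType.APPLICATION) {
      // No associated provider: only check that the referenced app is named.
      if (artifactId == null || artifactId.isEmpty()) {
        throw new IllegalArgumentException("APPLICATION artifact needs an id");
      }
      return;
    }
    ProviderValidator v = providerValidators.get(type);
    if (v == null) {
      throw new IllegalArgumentException("No provider for artifact type " + type);
    }
    v.validate(artifactId);
  }
}
{code}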






[jira] [Commented] (YARN-6447) Provide container sandbox policies for groups

2017-05-16 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013327#comment-16013327
 ] 

Robert Kanter commented on YARN-6447:
-

+1

> Provide container sandbox policies for groups 
> --
>
> Key: YARN-6447
> URL: https://issues.apache.org/jira/browse/YARN-6447
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, yarn
>Affects Versions: 3.0.0-alpha3
>Reporter: Greg Phillips
>Assignee: Greg Phillips
>Priority: Minor
> Attachments: YARN-6447.001.patch, YARN-6447.002.patch, 
> YARN-6447.003.patch
>
>
> Currently the container sandbox feature 
> ([YARN-5280|https://issues.apache.org/jira/browse/YARN-5280]) allows YARN 
> administrators to use one Java Security Manager policy file to limit the 
> permissions granted to YARN containers.  It would be useful to allow for 
> different policy files to be used based on groups.
> For example, an administrator may want to ensure standard users who write 
> applications for the MapReduce or Tez frameworks are not allowed to open 
> arbitrary network connections within their data processing code.  Users who 
> are designing ETL pipelines, however, may need to open sockets to extract 
> data from external sources.  By assigning these sets of users to different 
> groups and setting specific policies for each group, you can assert 
> fine-grained control over the permissions granted to each Java-based 
> container across a YARN cluster.






[jira] [Commented] (YARN-6409) RM does not blacklist node for AM launch failures

2017-05-16 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013321#comment-16013321
 ] 

Robert Kanter commented on YARN-6409:
-

[~rohithsharma], any comments?

> RM does not blacklist node for AM launch failures
> -
>
> Key: YARN-6409
> URL: https://issues.apache.org/jira/browse/YARN-6409
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha2
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-6409.00.patch, YARN-6409.01.patch, 
> YARN-6409.02.patch, YARN-6409.03.patch
>
>
> Currently, node blacklisting upon AM failures only handles failures that 
> happen after the AM container is launched (see 
> RMAppAttemptImpl.shouldCountTowardsNodeBlacklisting()).  However, AM launch 
> can also fail if the NM where the AM container is allocated goes 
> unresponsive.  Because that case is not handled, the scheduler may continue 
> to allocate AM containers on the same NM for subsequent app attempts. 
> {code}
> Application application_1478721503753_0870 failed 2 times due to Error 
> launching appattempt_1478721503753_0870_02. Got exception: 
> java.io.IOException: Failed on local exception: java.io.IOException: 
> java.net.SocketTimeoutException: 6 millis timeout while waiting for 
> channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
> local=/17.111.179.113:46702 remote=*.me.com/17.111.178.125:8041]; Host 
> Details : local host is: "*.me.com/17.111.179.113"; destination host is: 
> "*.me.com":8041; 
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772) 
> at org.apache.hadoop.ipc.Client.call(Client.java:1475) 
> at org.apache.hadoop.ipc.Client.call(Client.java:1408) 
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>  
> at com.sun.proxy.$Proxy86.startContainers(Unknown Source) 
> at 
> org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
>  
> at sun.reflect.GeneratedMethodAccessor155.invoke(Unknown Source) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  
> at java.lang.reflect.Method.invoke(Method.java:497) 
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
>  
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>  
> at com.sun.proxy.$Proxy87.startContainers(Unknown Source) 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:120)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:256)
>  
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  
> at java.lang.Thread.run(Thread.java:745) 
> Caused by: java.io.IOException: java.net.SocketTimeoutException: 6 millis 
> timeout while waiting for channel to be ready for read. ch : 
> java.nio.channels.SocketChannel[connected local=/17.111.179.113:46702 
> remote=*.me.com/17.111.178.125:8041] 
> at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:687) 
> at java.security.AccessController.doPrivileged(Native Method) 
> at javax.security.auth.Subject.doAs(Subject.java:422) 
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
>  
> at 
> org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:650)
>  
> at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:738) 
> at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375) 
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1524) 
> at org.apache.hadoop.ipc.Client.call(Client.java:1447) 
> ... 15 more 
> Caused by: java.net.SocketTimeoutException: 6 millis timeout while 
> waiting for channel to be ready for read. ch : 
> java.nio.channels.SocketChannel[connected local=/17.111.179.113:46702 
> remote=*.me.com/17.111.178.125:8041] 
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) 
> at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) 
> at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) 
> at java.io.FilterInputStream.read(FilterInputStream.java:133) 
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) 
> at java.io.BufferedInputStream.read(BufferedInputStream.java:265) 
> at java.io.DataInputStream.readInt(DataInputStream.java:387) 
> at 
> org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:367) 
> at 
> 

[jira] [Commented] (YARN-6409) RM does not blacklist node for AM launch failures

2017-05-16 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013320#comment-16013320
 ] 

Robert Kanter commented on YARN-6409:
-

I imagine the depth will vary depending on the timing and other factors of the 
error.  A depth of 3 sounds reasonable and we can always increase it later if 
we see that it's not enough.

+1

> RM does not blacklist node for AM launch failures
> -
>
> Key: YARN-6409
> URL: https://issues.apache.org/jira/browse/YARN-6409
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha2
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-6409.00.patch, YARN-6409.01.patch, 
> YARN-6409.02.patch, YARN-6409.03.patch
>
>
> Currently, node blacklisting upon AM failures only handles failures that 
> happen after the AM container is launched (see 
> RMAppAttemptImpl.shouldCountTowardsNodeBlacklisting()).  However, AM launch 
> can also fail if the NM where the AM container is allocated goes 
> unresponsive.  Because that case is not handled, the scheduler may continue 
> to allocate AM containers on the same NM for subsequent app attempts. 
> {code}
> Application application_1478721503753_0870 failed 2 times due to Error 
> launching appattempt_1478721503753_0870_02. Got exception: 
> java.io.IOException: Failed on local exception: java.io.IOException: 
> java.net.SocketTimeoutException: 6 millis timeout while waiting for 
> channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
> local=/17.111.179.113:46702 remote=*.me.com/17.111.178.125:8041]; Host 
> Details : local host is: "*.me.com/17.111.179.113"; destination host is: 
> "*.me.com":8041; 
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772) 
> at org.apache.hadoop.ipc.Client.call(Client.java:1475) 
> at org.apache.hadoop.ipc.Client.call(Client.java:1408) 
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>  
> at com.sun.proxy.$Proxy86.startContainers(Unknown Source) 
> at 
> org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
>  
> at sun.reflect.GeneratedMethodAccessor155.invoke(Unknown Source) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  
> at java.lang.reflect.Method.invoke(Method.java:497) 
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
>  
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>  
> at com.sun.proxy.$Proxy87.startContainers(Unknown Source) 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:120)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:256)
>  
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  
> at java.lang.Thread.run(Thread.java:745) 
> Caused by: java.io.IOException: java.net.SocketTimeoutException: 6 millis 
> timeout while waiting for channel to be ready for read. ch : 
> java.nio.channels.SocketChannel[connected local=/17.111.179.113:46702 
> remote=*.me.com/17.111.178.125:8041] 
> at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:687) 
> at java.security.AccessController.doPrivileged(Native Method) 
> at javax.security.auth.Subject.doAs(Subject.java:422) 
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
>  
> at 
> org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:650)
>  
> at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:738) 
> at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375) 
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1524) 
> at org.apache.hadoop.ipc.Client.call(Client.java:1447) 
> ... 15 more 
> Caused by: java.net.SocketTimeoutException: 6 millis timeout while 
> waiting for channel to be ready for read. ch : 
> java.nio.channels.SocketChannel[connected local=/17.111.179.113:46702 
> remote=*.me.com/17.111.178.125:8041] 
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) 
> at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) 
> at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) 
> at java.io.FilterInputStream.read(FilterInputStream.java:133) 
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) 
> at java.io.BufferedInputStream.read(BufferedInputStream.java:265) 
> at 

[jira] [Commented] (YARN-5330) SharingPolicy enhancements required to support recurring reservations in the YARN ReservationSystem

2017-05-16 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013276#comment-16013276
 ] 

Carlo Curino commented on YARN-5330:


(The patch depends on YARN-5328, hence the issues.) 

The patch makes small updates to the policies and completely reworks the test 
infrastructure: the tests are now parameterized, and coverage is broadly 
increased.
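As a rough idea of what a parameterized test setup looks like, here is a minimal JUnit 4 sketch 
with placeholder policy names and values; it is not the actual SharingPolicy test code from the 
patch.

{code:java}
import static org.junit.Assert.assertTrue;

import java.util.Arrays;
import java.util.Collection;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

// Hypothetical JUnit 4 parameterized test sketch -- the policy names and
// bounds are placeholders, not the actual SharingPolicy tests.
@RunWith(Parameterized.class)
public class TestSharingPolicySketch {

  @Parameters(name = "{0}")
  public static Collection<Object[]> policies() {
    return Arrays.asList(new Object[][] {
        { "NoOverCommitPolicy", 1.0 },
        { "CapacityOverTimePolicy", 0.5 },
    });
  }

  private final String policyName;
  private final double maxAverageCapacity;

  public TestSharingPolicySketch(String policyName, double maxAverageCapacity) {
    this.policyName = policyName;
    this.maxAverageCapacity = maxAverageCapacity;
  }

  @Test
  public void testCapacityBoundIsPositive() {
    assertTrue(policyName + " should have a positive bound",
        maxAverageCapacity > 0);
  }
}
{code}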

> SharingPolicy enhancements required to support recurring reservations in the 
> YARN ReservationSystem
> ---
>
> Key: YARN-5330
> URL: https://issues.apache.org/jira/browse/YARN-5330
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Subru Krishnan
>Assignee: Carlo Curino
> Attachments: YARN-5330.v0.patch
>
>
> YARN-5326 proposes adding native support for recurring reservations in the 
> YARN ReservationSystem. This JIRA is a sub-task to track the changes required 
> in SharingPolicy to accomplish it. Please refer to the design doc in the 
> parent JIRA for details.






[jira] [Updated] (YARN-5330) SharingPolicy enhancements required to support recurring reservations in the YARN ReservationSystem

2017-05-16 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-5330:
---
Attachment: YARN-5330.v0.patch

> SharingPolicy enhancements required to support recurring reservations in the 
> YARN ReservationSystem
> ---
>
> Key: YARN-5330
> URL: https://issues.apache.org/jira/browse/YARN-5330
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Subru Krishnan
>Assignee: Carlo Curino
> Attachments: YARN-5330.v0.patch
>
>
> YARN-5326 proposes adding native support for recurring reservations in the 
> YARN ReservationSystem. This JIRA is a sub-task to track the changes required 
> in SharingPolicy to accomplish it. Please refer to the design doc in the 
> parent JIRA for details.






[jira] [Updated] (YARN-5328) InMemoryPlan enhancements required to support recurring reservations in the YARN ReservationSystem

2017-05-16 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-5328:
---
Attachment: YARN-5328-v4.patch

(adding a test)

> InMemoryPlan enhancements required to support recurring reservations in the 
> YARN ReservationSystem
> --
>
> Key: YARN-5328
> URL: https://issues.apache.org/jira/browse/YARN-5328
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Attachments: YARN-5328-v1.patch, YARN-5328-v2.patch, 
> YARN-5328-v3.patch, YARN-5328-v4.patch
>
>
> YARN-5326 proposes adding native support for recurring reservations in the 
> YARN ReservationSystem. This JIRA is a sub-task to track the changes required 
> in InMemoryPlan to accomplish it. Please refer to the design doc in the 
> parent JIRA for details.






[jira] [Commented] (YARN-5328) InMemoryPlan enhancements required to support recurring reservations in the YARN ReservationSystem

2017-05-16 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013271#comment-16013271
 ] 

Carlo Curino commented on YARN-5328:


[~subru] I have updated the patch to address the fix above and a couple of 
other things I found while working on YARN-5330 (the first user of this 
patch). I now need you to "review back" this change :-)
In particular, I am assuming that the RLE passed in when we add an allocation 
to an {{InMemoryPlan}} (near line 203) is already "unrolled" within the LCM 
(aka maxPeriodicity). 
Does this make sense to you?
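A small sketch of what "unrolled within the LCM (aka maxPeriodicity)" means, using a simple 
step type in place of the real RLESparseResourceAllocation; this is purely illustrative, not 
the InMemoryPlan code under review.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: repeat one period of a periodic allocation until
// maxPeriodicity, which is assumed to be a multiple (e.g. the LCM) of the
// allocation's own period.
public class PeriodicUnroll {

  /** One step of an allocation: [start, end) with a fixed resource amount. */
  public static final class Step {
    final long start;
    final long end;
    final long amount;

    Step(long start, long end, long amount) {
      this.start = start;
      this.end = end;
      this.amount = amount;
    }

    @Override
    public String toString() {
      return "[" + start + "," + end + ")=" + amount;
    }
  }

  public static List<Step> unroll(List<Step> onePeriod, long period,
      long maxPeriodicity) {
    List<Step> unrolled = new ArrayList<>();
    for (long offset = 0; offset < maxPeriodicity; offset += period) {
      for (Step s : onePeriod) {
        unrolled.add(new Step(s.start + offset, s.end + offset, s.amount));
      }
    }
    return unrolled;
  }
}
{code}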

> InMemoryPlan enhancements required to support recurring reservations in the 
> YARN ReservationSystem
> --
>
> Key: YARN-5328
> URL: https://issues.apache.org/jira/browse/YARN-5328
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Attachments: YARN-5328-v1.patch, YARN-5328-v2.patch, 
> YARN-5328-v3.patch
>
>
> YARN-5326 proposes adding native support for recurring reservations in the 
> YARN ReservationSystem. This JIRA is a sub-task to track the changes required 
> in InMemoryPlan to accomplish it. Please refer to the design doc in the 
> parent JIRA for details.






[jira] [Updated] (YARN-5328) InMemoryPlan enhancements required to support recurring reservations in the YARN ReservationSystem

2017-05-16 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-5328:
---
Attachment: YARN-5328-v3.patch

> InMemoryPlan enhancements required to support recurring reservations in the 
> YARN ReservationSystem
> --
>
> Key: YARN-5328
> URL: https://issues.apache.org/jira/browse/YARN-5328
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Attachments: YARN-5328-v1.patch, YARN-5328-v2.patch, 
> YARN-5328-v3.patch
>
>
> YARN-5326 proposes adding native support for recurring reservations in the 
> YARN ReservationSystem. This JIRA is a sub-task to track the changes required 
> in InMemoryPlan to accomplish it. Please refer to the design doc in the 
> parent JIRA for details.






[jira] [Updated] (YARN-6614) Deprecate DistributedSchedulingProtocol and add required fields directly to ApplicationMasterProtocol

2017-05-16 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-6614:
--
Description: 
The {{DistributedSchedulingProtocol}} was initially designed as a wrapper 
protocol over the {{ApplicationMasterProtocol}}.

This JIRA proposes to deprecate the protocol itself and move the extra fields 
of the {{RegisterDistributedSchedulingAMResponse}} and 
{{DistributedSchedulingAllocateResponse}} to the 
{{RegisterApplicationMasterResponse}} and {{AllocateResponse}} respectively.

This will simplify the code quite a bit and make it easier to expose it as a 
preprocessor.

  was:
The {{DistributedSchedulingProtocol}} was initially designed as a wrapper 
protocol over the {{ApplicationMasterProtocol}}.

This JIRA proposes to deprecate the protocol itself and move the extra fields 
of the {{RegisterDistributedSchedulingAMResponse}} and 
{{DistributedSchedulingAllocateResponse}} to the 
{{RegisterApplicationMasterResponse}} and {{AllocateResponse}} respectively.

This will simplify the code quite a bit and make it reimplement the feature as 
a preprocessor.


> Deprecate DistributedSchedulingProtocol and add required fields directly to 
> ApplicationMasterProtocol
> -
>
> Key: YARN-6614
> URL: https://issues.apache.org/jira/browse/YARN-6614
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>
> The {{DistributedSchedulingProtocol}} was initially designed as a wrapper 
> protocol over the {{ApplicationMasterProtocol}}.
> This JIRA proposes to deprecate the protocol itself and move the extra fields 
> of the {{RegisterDistributedSchedulingAMResponse}} and 
> {{DistributedSchedulingAllocateResponse}} to the 
> {{RegisterApplicationMasterResponse}} and {{AllocateResponse}} respectively.
> This will simplify the code quite a bit and make it easier to expose it as a 
> preprocessor.
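A simplified sketch of the proposed direction, with a formerly wrapper-only field carried 
directly on the regular allocate response. The types and field names below are placeholders 
for illustration, not the actual protobuf-backed YARN API.

{code:java}
import java.util.Collections;
import java.util.List;

// Simplified, hypothetical sketch: extra distributed-scheduling data rides on
// the regular allocate response instead of a separate wrapper protocol.
public class AllocateResponseSketch {

  public static final class RemoteNode {
    final String nodeId;
    RemoteNode(String nodeId) { this.nodeId = nodeId; }
  }

  // Regular AM-protocol payload (existing fields elided for brevity).
  private List<RemoteNode> nodesForScheduling = Collections.emptyList();

  /** Field formerly carried only by the distributed-scheduling wrapper. */
  public List<RemoteNode> getNodesForScheduling() {
    return nodesForScheduling;
  }

  public void setNodesForScheduling(List<RemoteNode> nodes) {
    this.nodesForScheduling = nodes;
  }
}
{code}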






[jira] [Created] (YARN-6614) Deprecate DistributedSchedulingProtocol and add required fields directly to ApplicationMasterProtocol

2017-05-16 Thread Arun Suresh (JIRA)
Arun Suresh created YARN-6614:
-

 Summary: Deprecate DistributedSchedulingProtocol and add required 
fields directly to ApplicationMasterProtocol
 Key: YARN-6614
 URL: https://issues.apache.org/jira/browse/YARN-6614
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Arun Suresh
Assignee: Arun Suresh


The {{DistributedSchedulingProtocol}} was initially designed as a wrapper 
protocol over the {{ApplicationMasterProtocol}}.

This JIRA proposes to deprecate the protocol itself and move the extra fields 
of the {{RegisterDistributedSchedulingAMResponse}} and 
{{DistributedSchedulingAllocateResponse}} to the 
{{RegisterApplicationMasterResponse}} and {{AllocateResponse}} respectively.

This will simplify the code quite a bit and make it reimplement the feature as 
a preprocessor.






[jira] [Commented] (YARN-6160) Create an agent-less docker-less provider in the native services framework

2017-05-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013216#comment-16013216
 ] 

Hadoop QA commented on YARN-6160:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
56s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
41s{color} | {color:green} yarn-native-services passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
33s{color} | {color:green} yarn-native-services passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} yarn-native-services passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} yarn-native-services passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
23s{color} | {color:green} yarn-native-services passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
3s{color} | {color:green} yarn-native-services passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
15s{color} | {color:green} yarn-native-services passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 21s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-slider/hadoop-yarn-slider-core:
 The patch generated 29 new + 452 unchanged - 42 fixed = 481 total (was 494) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
0s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
14s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-slider/hadoop-yarn-slider-core
 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
48s{color} | {color:green} hadoop-yarn-slider-core in the patch passed. {color} 
|
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 28m 16s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | 
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-slider/hadoop-yarn-slider-core
 |
|  |  Dead store to config in 
org.apache.slider.client.SliderClient.doClientInstall(ActionClientArgs)  At 
SliderClient.java:org.apache.slider.client.SliderClient.doClientInstall(ActionClientArgs)
  At SliderClient.java:[line 1172] |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ac17dc |
| JIRA Issue | YARN-6160 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12868404/YARN-6160-yarn-native-services.003.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  xml  |
| uname | Linux 15dd2829bd6b 3.13.0-108-generic #155-Ubuntu SMP Wed Jan 11 
16:58:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | yarn-native-services / 8c32344 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 

[jira] [Updated] (YARN-6160) Create an agent-less docker-less provider in the native services framework

2017-05-16 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-6160:
-
Attachment: YARN-6160-yarn-native-services.003.patch

Thanks for the review, [~jianhe]! Here is a new patch that removes the 
slider.xml config file that is no longer used and properly initializes the 
providers list in SliderAppMaster. I opened YARN-6613 for the follow-on 
validation work.

> Create an agent-less docker-less provider in the native services framework
> --
>
> Key: YARN-6160
> URL: https://issues.apache.org/jira/browse/YARN-6160
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
> Fix For: yarn-native-services
>
> Attachments: YARN-6160-yarn-native-services.001.patch, 
> YARN-6160-yarn-native-services.002.patch, 
> YARN-6160-yarn-native-services.003.patch
>
>
> The goal of the agent-less docker-less provider is to be able to use the YARN 
> native services framework when Docker is not installed or other methods of 
> app resource installation are preferable.






[jira] [Created] (YARN-6613) Update json validation for new native services providers

2017-05-16 Thread Billie Rinaldi (JIRA)
Billie Rinaldi created YARN-6613:


 Summary: Update json validation for new native services providers
 Key: YARN-6613
 URL: https://issues.apache.org/jira/browse/YARN-6613
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
 Fix For: yarn-native-services


YARN-6160 started some work enabling different validation for each native 
services provider. The validation done in 
ServiceApiUtil#validateApplicationPayload needs to be updated accordingly. This 
validation should also be updated to handle the APPLICATION artifact type, 
which does not have an associated provider.






[jira] [Commented] (YARN-6610) DominantResourceCalculator.getResourceAsValue() dominant param is no longer appropriate

2017-05-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013141#comment-16013141
 ] 

Hadoop QA commented on YARN-6610:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  7s{color} 
| {color:red} YARN-6610 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-6610 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12868398/YARN-6610.001.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/15942/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> DominantResourceCalculator.getResourceAsValue() dominant param is no longer 
> appropriate
> ---
>
> Key: YARN-6610
> URL: https://issues.apache.org/jira/browse/YARN-6610
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: YARN-3926
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: YARN-6610.001.patch
>
>
> The {{dominant}} param assumes there are only two resources, i.e. true means 
> to compare the dominant, and false means to compare the subordinate.  Now 
> that there are _n_ resources, this parameter no longer makes sense.






[jira] [Updated] (YARN-6610) DominantResourceCalculator.getResourceAsValue() dominant param is no longer appropriate

2017-05-16 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated YARN-6610:
---
Attachment: YARN-6610.001.patch

Here's a first pass at making the comparison make sense again. Thoughts? The 
patch still needs tests.
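One possible way to compare allocations across n resource types without a boolean "dominant" 
flag is to sort each vector's shares and compare from the most dominant share down. The sketch 
below is a simplified illustration under that assumption, not the attached patch.

{code:java}
import java.util.Arrays;

// Simplified illustration: compare two resource vectors across n resource
// types by sorting each vector's shares (fraction of the cluster) and
// comparing lexicographically, most dominant share first.
public class NResourceComparator {

  /** shares[i] = allocation[i] / clusterTotal[i], one entry per resource type. */
  public static int compareShares(double[] lhsShares, double[] rhsShares) {
    double[] lhs = lhsShares.clone();
    double[] rhs = rhsShares.clone();
    Arrays.sort(lhs);
    Arrays.sort(rhs);
    // Walk from the largest (dominant) share down to the smallest.
    for (int i = lhs.length - 1; i >= 0; i--) {
      int cmp = Double.compare(lhs[i], rhs[i]);
      if (cmp != 0) {
        return cmp;
      }
    }
    return 0;
  }
}
{code}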

> DominantResourceCalculator.getResourceAsValue() dominant param is no longer 
> appropriate
> ---
>
> Key: YARN-6610
> URL: https://issues.apache.org/jira/browse/YARN-6610
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: YARN-3926
>Reporter: Daniel Templeton
>Priority: Critical
> Attachments: YARN-6610.001.patch
>
>
> The {{dominant}} param assumes there are only two resources, i.e. true means 
> to compare the dominant, and false means to compare the subordinate.  Now 
> that there are _n_ resources, this parameter no longer makes sense.






[jira] [Assigned] (YARN-6610) DominantResourceCalculator.getResourceAsValue() dominant param is no longer appropriate

2017-05-16 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton reassigned YARN-6610:
--

Assignee: Daniel Templeton

> DominantResourceCalculator.getResourceAsValue() dominant param is no longer 
> appropriate
> ---
>
> Key: YARN-6610
> URL: https://issues.apache.org/jira/browse/YARN-6610
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: YARN-3926
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: YARN-6610.001.patch
>
>
> The {{dominant}} param assumes there are only two resources, i.e. true means 
> to compare the dominant, and false means to compare the subordinate.  Now 
> that there are _n_ resources, this parameter no longer makes sense.






[jira] [Commented] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-05-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013080#comment-16013080
 ] 

Hadoop QA commented on YARN-2113:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m  
8s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in 
trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
28s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 56s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 22 new + 198 unchanged - 3 fixed = 220 total (was 201) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
37s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
34s{color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common 
generated 2 new + 4574 unchanged - 0 fixed = 4576 total (was 4574) {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
34s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 39m 
28s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 96m 20s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-2113 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12868383/YARN-2113.0018.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux dadf369e37b0 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 
15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / b415c6f |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| findbugs | 
https://builds.apache.org/job/PreCommit-YARN-Build/15941/artifact/patchprocess/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common-warnings.html
 |
| checkstyle | 

[jira] [Commented] (YARN-6160) Create an agent-less docker-less provider in the native services framework

2017-05-16 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013070#comment-16013070
 ] 

Jian He commented on YARN-6160:
---

looks good overall, minor comments:
- remove the unused variable SliderAppMaster#providers ?
- the slider.xml, as discussed offline, may be removed. We'll add a new xml file 
for configuring the native service framework when required.


> Create an agent-less docker-less provider in the native services framework
> --
>
> Key: YARN-6160
> URL: https://issues.apache.org/jira/browse/YARN-6160
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
> Fix For: yarn-native-services
>
> Attachments: YARN-6160-yarn-native-services.001.patch, 
> YARN-6160-yarn-native-services.002.patch
>
>
> The goal of the agent-less docker-less provider is to be able to use the YARN 
> native services framework when Docker is not installed or other methods of 
> app resource installation are preferable.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6612) Update fair scheduler policies to be aware of resource types

2017-05-16 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-6612:
--

 Summary: Update fair scheduler policies to be aware of resource 
types
 Key: YARN-6612
 URL: https://issues.apache.org/jira/browse/YARN-6612
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: fairscheduler
Affects Versions: YARN-3926
Reporter: Daniel Templeton
Assignee: Daniel Templeton






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6611) ResourceTypes should be renamed

2017-05-16 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-6611:
--

 Summary: ResourceTypes should be renamed
 Key: YARN-6611
 URL: https://issues.apache.org/jira/browse/YARN-6611
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: YARN-3926
Reporter: Daniel Templeton


{{ResourceTypes}} is too close to the unrelated {{ResourceType}} class.  Maybe 
{{ResourceClass}} would be better?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6610) DominantResourceCalculator.getResourceAsValue() dominant param is no longer appropriate

2017-05-16 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-6610:
--

 Summary: DominantResourceCalculator.getResourceAsValue() dominant 
param is no longer appropriate
 Key: YARN-6610
 URL: https://issues.apache.org/jira/browse/YARN-6610
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: YARN-3926
Reporter: Daniel Templeton
Priority: Critical


The {{dominant}} param assumes there are only two resources, i.e. true means to 
compare the dominant, and false means to compare the subordinate.  Now that 
there are _n_ resources, this parameter no longer makes sense.
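
A toy illustration of the issue (not the actual {{DominantResourceCalculator}} code; the names below are made up for the example): with two resources a boolean can select the dominant or subordinate share, but with _n_ resources a rank/index is the natural replacement.

{code:title=Illustrative sketch only}
public class DominantShareSketch {

  // Two-resource world: true = dominant share, false = subordinate share.
  static double share2(double memShare, double cpuShare, boolean dominant) {
    return dominant ? Math.max(memShare, cpuShare) : Math.min(memShare, cpuShare);
  }

  // n-resource world: a rank (0 = most dominant) replaces the boolean.
  static double shareN(double[] shares, int rank) {
    double[] sorted = shares.clone();
    java.util.Arrays.sort(sorted);            // ascending
    return sorted[sorted.length - 1 - rank];  // rank 0 = largest share
  }

  public static void main(String[] args) {
    double[] shares = {0.4, 0.1, 0.7};            // e.g. memory, vcores, gpu shares
    System.out.println(share2(0.4, 0.1, true));   // 0.4
    System.out.println(shareN(shares, 0));        // 0.7 (dominant)
    System.out.println(shareN(shares, 2));        // 0.1 (least dominant)
  }
}
{code}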



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6609) AdminService should use RMFatalEvent instead of calling transitionToStandby() directly

2017-05-16 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-6609:
--

 Summary: AdminService should use RMFatalEvent instead of calling 
transitionToStandby() directly
 Key: YARN-6609
 URL: https://issues.apache.org/jira/browse/YARN-6609
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.9.0
Reporter: Daniel Templeton
Assignee: Daniel Templeton
Priority: Minor


In YARN-3742, we put the RM in charge of transitioning itself to standby.  
Instead of having lots of different components driving the transition from lots 
of different places, they now throw an {{RMFatalEvent}} at the RM, and the RM 
decides what the right thing to do is.

{{AdminService}} still calls {{transitionToStandby()}} directly, without first 
checking whether HA is enabled.  To be consistent and safer, {{AdminService}} 
should probably also use an {{RMFatalEvent}} to initiate the transition.

This change might have some cascading API impact, making some methods no longer 
useful.
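
A rough sketch of the intended direction (toy types only; the real {{RMFatalEvent}} and dispatcher APIs differ):

{code:title=Toy sketch of the event-driven pattern, not the real RM classes}
// The component only reports the condition; the RM owns the decision,
// e.g. whether HA is enabled and a standby transition makes sense.
interface RmEventSink { void handle(String eventType, String diagnostics); }

class AdminServiceSketch {
  private final RmEventSink rm;
  AdminServiceSketch(RmEventSink rm) { this.rm = rm; }

  void standbyRequested(String reason) {
    // Before: transitionToStandby(true);  // the component decides by itself
    // After: raise a fatal event and let the RM react appropriately.
    rm.handle("TRANSITION_TO_STANDBY", reason);
  }
}
{code}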



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-05-16 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-2113:
--
Attachment: YARN-2113.0018.patch

Thanks [~eepayne] for investigating this.
{{when(cs.getResourceCalculator()).thenReturn(rc);}} needs to be added. 
However, simply adding this still passes all my tests.

I think you have made some more changes in the tests, such as increasing the cluster 
resource to a higher number of vcores. This impacts the preemptionLimit value 
(usually 10% or 20% of the cluster resource) and gives vcores more dominance. I 
could see that preemption keeps happening until the vcores are satisfied (even though 
memory is 0). However, this is not needed.

As mentioned earlier, this issue is known from YARN-6538. I was trying to fix it as 
part of that by adding a new api in the {{Resources}} class. Usually we have a 
resource object and deduct resources from it in every loop iteration. At the start 
of the loop, {{Resources.lessThanOrEqual}} is checked to see whether the resource 
has gone below {{Resources.none}}.

For eg:

{code:title=AbstractPreemptableResourceCalculator.computeFixpointAllocation}
// assign all cluster resources until no more demand, or no resources are
// left
while (!orderedByNeed.isEmpty() && Resources.greaterThan(rc, totGuarant,
unassigned, Resources.none())) {
{code}
Here {{unassigned}} could be something like *0 memory and 10 vcores*. However, 
{{Resources.greaterThan}} works on dominance, so the loop will continue.

I have added a new api named {{Resources.isAnyResourceZero}} to ensure that the 
loop will not continue if any resource is zero under DRF. I'll extend the same in 
YARN-6538 as well.
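
A simplified sketch of the new check (not the exact signature in the patch):

{code:title=Simplified sketch of Resources.isAnyResourceZero}
// Stop the fix-point loop once either dimension is exhausted, instead of
// letting dominance keep it running on vcores alone.
public static boolean isAnyResourceZero(Resource r) {
  return r.getMemorySize() <= 0 || r.getVirtualCores() <= 0;
}
{code}

The while condition above would then additionally bail out once {{unassigned}} has, say, 0 memory left.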

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, 
> YARN-2113.0016.patch, YARN-2113.0017.patch, YARN-2113.0018.patch, 
> YARN-2113.apply.onto.0012.ericp.patch, YARN-2113 Intra-QueuePreemption 
> Behavior.pdf, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5531) UnmanagedAM pool manager for federating application across clusters

2017-05-16 Thread Botong Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012936#comment-16012936
 ] 

Botong Huang edited comment on YARN-5531 at 5/16/17 6:59 PM:
-

Thanks [~kasha] for the detailed comments! I have addressed most of them in v11 
patch, the rest explanations are here: 

* 1 & 3.3.3. The reason we put it here is that Federation Interceptor 
(YARN-3666 and YARN-6511) in NM will be using UAM. Putting it in Yarn Client 
will result in cyclic dependencies for NM project. 

* 2.1-2 This is generalized from the Federation use case, where for one 
application we enforce the same applicationId in all sub-clusters (RMs in 
different sub-clusters use different epochs, so that their app Id won't 
overlap). uamID (sub-cluster ID really) is used to identify the UAMs. In the v11 
patch, I made the input attemptId optional. If not supplied, the UAM 
will ask the RM for an appID first. In general, the attempt id can be used as the 
uamID. 

* 2.5.1 Parallel kill is necessary for performance reasons (a rough sketch follows 
this list). In federation, the service stop of the UAM pool is in the code path of 
Federation Interceptor shutdown, potentially blocking the application finish event 
in the NM where the AM is running. Furthermore, when we try to kill the UAMs, the RM 
in some sub-clusters might be failing over, which takes several minutes to come back. 
Sequential kill can be bad. 

* 2.5.5 Because of the above reason, I prefer not to retry here. One option is 
to throw the exception past this stop call, the user can handle the exception 
and retry if needed. In Federation Interceptor's case, we can simply catch it, 
log as warning and move on. What do you think?

* 2.8.2 & 3.1 & 3.6.2 As mentioned with [~subru] earlier, this UAM pool and UAM code 
is more of a library for the actual UAM. The interface the UAM pool exposes to the user 
is similar to {{ApplicationMasterProtocol}} (registerAM, allocate and 
finishAM); the user is supposed to act like an AM and heartbeat to us. So for 
{{finishApplicationMaster}}, we abide by the protocol: if the UAM is still 
registered after the finishAM call, the user should retry. 

* 3.3.1 & 3.3.4 The launch UAM code is indeed a bit messy, I've cleaned up the 
code in v11. I merged the two monitor methods, might look a bit complex, can 
revert if needed. 

* 3.5.1 AsyncCallback works nicely in here. I think dispatcher can work as 
well, but I'd prefer to do that in another JIRA if needed. 

* 3.7.2-3 This is a corner use case for Federation. In the federation interceptor, 
we handle the UAMs asynchronously. A UAM is created the first time the AM tries to 
ask for resources from a certain sub-cluster. The register, allocate and finish 
calls for a UAM are all triggered by heartbeats from the AM. This means that all 
three calls are triggered asynchronously. For instance, while the register call 
for a UAM is still pending (say because the UAM's RM is failing over and the 
register call is blocked for five minutes), we need to allow the allocate calls 
to come in without exception and buffer them. Once the register succeeds later, 
we should be able to move on from there. 
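
For reference on 2.5.1, a rough sketch of the parallel-kill idea (class and method names here are placeholders, not the actual UAM pool code):

{code:title=Sketch of parallel UAM shutdown (hypothetical names)}
import java.util.*;
import java.util.concurrent.*;

class ParallelKillSketch {
  interface Uam { void forceKill() throws Exception; }  // placeholder for a UAM handle

  static void stopAll(Collection<Uam> uams) throws InterruptedException {
    ExecutorService pool = Executors.newCachedThreadPool();
    List<Future<?>> futures = new ArrayList<>();
    for (Uam uam : uams) {
      // kill every UAM in parallel so one failing-over sub-cluster RM
      // does not serialize the whole serviceStop() path
      futures.add(pool.submit(() -> { uam.forceKill(); return null; }));
    }
    for (Future<?> f : futures) {
      try {
        f.get();
      } catch (ExecutionException e) {
        // log as a warning and move on rather than failing the stop path
      }
    }
    pool.shutdownNow();
  }
}
{code}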






was (Author: botong):
Thanks [~kasha] for the detailed comments! I have addressed most of them in v11 
patch, the rest explanations are here: 

* 1 & 3.3.3. The reason we put it here is that Federation Interceptor 
(YARN-3666 and YARN-6511) in NM will be using UAM. Putting it in Yarn Client 
will result in cyclic dependencies for NM project. 

* 2.1-2 This is generalized from the Federation use case, where for one 
application we enforce the same applicationId in all sub-clusters (RMs in 
different sub-clusters use different epochs, so that their app Id won't 
overlap). uamID (sub-cluster ID really) is used to identify the UAMs. In v11 
patch, I made the input attemptId becomes optional. If not supplied, the UAM 
will ask for an appID from RM first. In general, attempt id can be used as the 
uamID. 

* 2.5.1 Parallel kill is necessary for performance reason. In federation, the 
service stop of UAM pool is in the code path of Federation Interceptor 
shutdown, potentially blocking the application finish event in the NM where AM 
is running. Furthermore, when we try to kill the UAMs, RM in some sub-clusters 
might be failing over, which takes several minutes to come back. Sequential 
kill can be bad. 

* 2.5.5 Because of the above reason, I prefer not to retry here. One option is 
to throw the exception past this stop call, the user can handle the exception 
and retry if needed. In Federation Interceptor's case, we can simply catch it, 
log as warning and move on. What do you think?

* 2.8.2 & 3.1 & 3.6.2 As mentioned with [~subru] earlier, this UAM pool and UAM 
is more of a library for the actual UAM. The interface UAM pool expose to user 
is similar to {{ApplicationMasterProtocol}} (registerAM, allocate and 
finishAM), user is supposed to act like an AM and heartbeat 

[jira] [Commented] (YARN-5531) UnmanagedAM pool manager for federating application across clusters

2017-05-16 Thread Botong Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012936#comment-16012936
 ] 

Botong Huang commented on YARN-5531:


Thanks [~kasha] for the detailed comments! I have addressed most of them in v11 
patch, the rest explanations are here: 

* 1 & 3.3.3. The reason we put it here is that Federation Interceptor 
(YARN-3666 and YARN-6511) in NM will be using UAM. Putting it in Yarn Client 
will result in cyclic dependencies for NM project. 

* 2.1-2 This is generalized from the Federation use case, where for one 
application we enforce the same applicationId in all sub-clusters (RMs in 
different sub-clusters use different epochs, so that their app Id won't 
overlap). uamID (sub-cluster ID really) is used to identify the UAMs. In v11 
patch, I made the input attemptId becomes optional. If not supplied, the UAM 
will ask for an appID from RM first. In general, attempt id can be used as the 
uamID. 

* 2.5.1 Parallel kill is necessary for performance reason. In federation, the 
service stop of UAM pool is in the code path of Federation Interceptor 
shutdown, potentially blocking the application finish event in the NM where AM 
is running. Furthermore, when we try to kill the UAMs, RM in some sub-clusters 
might be failing over, which takes several minutes to come back. Sequential 
kill can be bad. 

* 2.5.5 Because of the above reason, I prefer not to retry here. One option is 
to throw the exception past this stop call, the user can handle the exception 
and retry if needed. In Federation Interceptor's case, we can simply catch it, 
log as warning and move on. What do you think?

* 2.8.2 & 3.1 & 3.6.2 As mentioned with [~subru] earlier, this UAM pool and UAM 
is more of a library for the actual UAM. The interface UAM pool expose to user 
is similar to {{ApplicationMasterProtocol}} (registerAM, allocate and 
finishAM), user is supposed to act like an AM and heartbeat to us. So for 
{{finishApplicationMaster}}, we abide by the protocol, if the UAM is still 
registered after the finishAM call, the user should retry. 

* 3.3.1 & 3.3.4 The launch UAM code is indeed a bit messy, I've cleaned up the 
code in v11. I merged the two monitor methods, might look a bit complex, can 
revert if needed. 

* 3.5.1 AsyncCallback works nicely in here. I think dispatcher can work as 
well, but I'd prefer to do that in another JIRA if needed. 

* 3.7.2-3 This is a corner use case for Federation. In federation interceptor, 
we handle the UAMs asynchronously. A UAM is created the first time the AM tries to ask 
for resources from a certain sub-cluster. The register, allocate and finish calls 
for a UAM are all triggered by heartbeats from the AM. This means that all three 
calls are triggered asynchronously. For instance, while the register call for 
a UAM is still pending (say because the UAM's RM is failing over and the register 
call is blocked for five minutes), we need to allow the allocate calls to come 
in without exception and buffer them. Once the register succeeds later, we 
should be able to move on from there. 





> UnmanagedAM pool manager for federating application across clusters
> ---
>
> Key: YARN-5531
> URL: https://issues.apache.org/jira/browse/YARN-5531
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Botong Huang
> Attachments: YARN-5531-YARN-2915.v10.patch, 
> YARN-5531-YARN-2915.v11.patch, YARN-5531-YARN-2915.v1.patch, 
> YARN-5531-YARN-2915.v2.patch, YARN-5531-YARN-2915.v3.patch, 
> YARN-5531-YARN-2915.v4.patch, YARN-5531-YARN-2915.v5.patch, 
> YARN-5531-YARN-2915.v6.patch, YARN-5531-YARN-2915.v7.patch, 
> YARN-5531-YARN-2915.v8.patch, YARN-5531-YARN-2915.v9.patch
>
>
> One of the main tenets the YARN Federation is to *transparently* scale 
> applications across multiple clusters. This is achieved by running UAMs on 
> behalf of the application on other clusters. This JIRA tracks the addition of 
> a UnmanagedAM pool manager for federating application across clusters which 
> will be used the FederationInterceptor (YARN-3666) which is part of the 
> AMRMProxy pipeline introduced in YARN-2884.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6306) NMClient API change for container upgrade

2017-05-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012837#comment-16012837
 ] 

Hudson commented on YARN-6306:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11738 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11738/])
YARN-6306. NMClient API change for container upgrade. Contributed by (jianhe: 
rev 8236130b2c61ab0ee9b8ed747ce8cf96af7f17aa)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/async/impl/NMClientAsyncImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/NMClientImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/async/impl/TestNMClientAsync.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/NMClient.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/async/NMClientAsync.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestNMClient.java


> NMClient API change for container upgrade
> -
>
> Key: YARN-6306
> URL: https://issues.apache.org/jira/browse/YARN-6306
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Arun Suresh
> Attachments: YARN-6306.001.patch, YARN-6306.002.patch, 
> YARN-6306.003.patch, YARN-6306.004.patch
>
>
> This JIRA is track the addition of Upgrade API (Re-Initialize, Restart, 
> Rollback and Commit) to the NMClient and NMClientAsync



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5531) UnmanagedAM pool manager for federating application across clusters

2017-05-16 Thread Botong Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated YARN-5531:
---
Attachment: YARN-5531-YARN-2915.v11.patch

> UnmanagedAM pool manager for federating application across clusters
> ---
>
> Key: YARN-5531
> URL: https://issues.apache.org/jira/browse/YARN-5531
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Botong Huang
> Attachments: YARN-5531-YARN-2915.v10.patch, 
> YARN-5531-YARN-2915.v11.patch, YARN-5531-YARN-2915.v1.patch, 
> YARN-5531-YARN-2915.v2.patch, YARN-5531-YARN-2915.v3.patch, 
> YARN-5531-YARN-2915.v4.patch, YARN-5531-YARN-2915.v5.patch, 
> YARN-5531-YARN-2915.v6.patch, YARN-5531-YARN-2915.v7.patch, 
> YARN-5531-YARN-2915.v8.patch, YARN-5531-YARN-2915.v9.patch
>
>
> One of the main tenets the YARN Federation is to *transparently* scale 
> applications across multiple clusters. This is achieved by running UAMs on 
> behalf of the application on other clusters. This JIRA tracks the addition of 
> a UnmanagedAM pool manager for federating application across clusters which 
> will be used the FederationInterceptor (YARN-3666) which is part of the 
> AMRMProxy pipeline introduced in YARN-2884.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6608) Backport all SLS improvements from trunk to branch-2

2017-05-16 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-6608:
---
Attachment: YARN-6608-branch-2.v0.patch

> Backport all SLS improvements from trunk to branch-2
> 
>
> Key: YARN-6608
> URL: https://issues.apache.org/jira/browse/YARN-6608
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.9.0
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-6608-branch-2.v0.patch
>
>
> The SLS has received lots of attention in trunk, but only some of it made it 
> back to branch-2. This patch is a "raw" fork-lift of the trunk development 
> from hadoop-tools/hadoop-sls.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6608) Backport all SLS improvements from trunk to branch-2

2017-05-16 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012767#comment-16012767
 ] 

Carlo Curino commented on YARN-6608:


SLS is providing a basis to construct useful integration tests (YARN-6363, 
YARN-6451, YARN-6547). This would be nice to have in branch-2 as well. 

Per off-line discussion with [~leftnoteasy] and [~chris.douglas], we are 
proposing to forklift back to branch-2 all the improvements that SLS received 
in trunk. 
The attached patch is basically a {{git checkout trunk -- 
hadoop-tools/hadoop-sls}} plus simple deletion of {{ResourceSchedulerWrapper}} 
(not needed anymore
and not compiling). SLS tests pass locally (except the issue tracked in 
YARN-6111), let's see what yetus has to say.

 



> Backport all SLS improvements from trunk to branch-2
> 
>
> Key: YARN-6608
> URL: https://issues.apache.org/jira/browse/YARN-6608
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.9.0
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>
> The SLS has received lots of attention in trunk, but only some of it made it 
> back to branch-2. This patch is a "raw" fork-lift of the trunk development 
> from hadoop-tools/hadoop-sls.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-6608) Backport all SLS improvements from trunk to branch-2

2017-05-16 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012767#comment-16012767
 ] 

Carlo Curino edited comment on YARN-6608 at 5/16/17 5:25 PM:
-

SLS is providing a basis to construct useful integration tests (YARN-6363, 
YARN-6451, YARN-6547). This would be nice to have in branch-2 as well. 

Per off-line discussion with [~leftnoteasy] and [~chris.douglas], we are 
proposing to forklift back to branch-2 all the improvements that SLS received 
in trunk. 
The attached patch is basically a {{git checkout trunk -- 
hadoop-tools/hadoop-sls}} plus simple deletion of {{ResourceSchedulerWrapper}} 
(not needed anymore
and not compiling). SLS tests pass locally (except the issue tracked in YARN-6111), 
let's see what yetus has to say.

 




was (Author: curino):
SLS is providing a basis to construct useful integration tests (YARN-6363, 
YARN-6451, YARN-6547). This would be nice to have in branch-2 as well. 

Per off-line discussion with [~leftnoteasy] and [~chris.douglas], we are 
proposing to forklift back to branch-2 all the improvements that SLS received 
in trunk. 
The attached patch is basically a {{git checkout trunk -- 
hadoop-tools/hadoop-sls}} plus simple deletion of {{ResourceSchedulerWrapper}} 
(not needed anymore
and not compiling). SLS tests pass locally (expect issue tracked in 
YARN-6), let's see what yetus has to say.

 



> Backport all SLS improvements from trunk to branch-2
> 
>
> Key: YARN-6608
> URL: https://issues.apache.org/jira/browse/YARN-6608
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.9.0
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>
> The SLS has received lots of attention in trunk, but only some of it made it 
> back to branch-2. This patch is a "raw" fork-lift of the trunk development 
> from hadoop-tools/hadoop-sls.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6608) Backport all SLS improvements from trunk to branch-2

2017-05-16 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012764#comment-16012764
 ] 

Yufei Gu commented on YARN-6608:


Great! Thanks for filing this, [~curino]!

> Backport all SLS improvements from trunk to branch-2
> 
>
> Key: YARN-6608
> URL: https://issues.apache.org/jira/browse/YARN-6608
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.9.0
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>
> The SLS has received lots of attention in trunk, but only some of it made it 
> back to branch-2. This patch is a "raw" fork-lift of the trunk development 
> from hadoop-tools/hadoop-sls.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6608) Backport all SLS improvements from trunk to branch-2

2017-05-16 Thread Carlo Curino (JIRA)
Carlo Curino created YARN-6608:
--

 Summary: Backport all SLS improvements from trunk to branch-2
 Key: YARN-6608
 URL: https://issues.apache.org/jira/browse/YARN-6608
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.9.0
Reporter: Carlo Curino
Assignee: Carlo Curino


The SLS has received lots of attention in trunk, but only some of it made it 
back to branch-2. This patch is a "raw" fork-lift of the trunk development from 
hadoop-tools/hadoop-sls.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6587) Refactor of ResourceManager#startWebApp in a Util class

2017-05-16 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012664#comment-16012664
 ] 

Carlo Curino commented on YARN-6587:


I manually kicked Jenkins to pick up the branch-2 patch (let's see if it works; 
we might have to re-open the JIRA).

> Refactor of ResourceManager#startWebApp in a Util class
> ---
>
> Key: YARN-6587
> URL: https://issues.apache.org/jira/browse/YARN-6587
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
> Fix For: 3.0.0-alpha3
>
> Attachments: YARN-6587-branch-2.v1.patch, YARN-6587.v1.patch, 
> YARN-6587.v2.patch
>
>
> This jira tracks the refactor of ResourceManager#startWebApp in a util class 
> since Router in YARN-5412 has to implement the same logic for Filtering and 
> Authentication.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5412) Create a proxy chain for ResourceManager REST API in the Router

2017-05-16 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012661#comment-16012661
 ] 

Carlo Curino commented on YARN-5412:


[~giovanni.fumarola] I rebased the branch YARN-2915 to pick up YARN-6587. 

> Create a proxy chain for ResourceManager REST API in the Router
> ---
>
> Key: YARN-5412
> URL: https://issues.apache.org/jira/browse/YARN-5412
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Giovanni Matteo Fumarola
>
> As detailed in the proposal in the umbrella JIRA, we are introducing a new 
> component that routes client request to appropriate ResourceManager(s). This 
> JIRA tracks the creation of a proxy for ResourceManager REST API in the 
> Router. This provides a placeholder for:
> 1) throttling mis-behaving clients (YARN-1546)
> 3) mask the access to multiple RMs (YARN-3659)
> We are planning to follow the interceptor pattern like we did in YARN-2884 to 
> generalize the approach and have only dynamically coupling for Federation.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6603) NPE in RMAppsBlock

2017-05-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012516#comment-16012516
 ] 

Hudson commented on YARN-6603:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11735 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11735/])
YARN-6603. NPE in RMAppsBlock. Contributed by Jason Lowe (jlowe: rev 
489f85933c508bc26de607b921e56e23b979fce8)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppsBlock.java


> NPE in RMAppsBlock
> --
>
> Key: YARN-6603
> URL: https://issues.apache.org/jira/browse/YARN-6603
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Fix For: 2.9.0, 2.8.1, 3.0.0-alpha3
>
> Attachments: YARN-6603.001.patch, YARN-6603.002.patch
>
>
> We are seeing an intermittent NPE when the RM is trying to render the 
> /cluster URI.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6570) No logs were found for running application, running container

2017-05-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012511#comment-16012511
 ] 

Hadoop QA commented on YARN-6570:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
 7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
40s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 in trunk has 5 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 13m 11s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 33m 29s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.nodemanager.TestEventFlow |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-6570 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12868320/YARN-6570.poc.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 066ace92d35a 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / c48f297 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| findbugs | 
https://builds.apache.org/job/PreCommit-YARN-Build/15938/artifact/patchprocess/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-warnings.html
 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/15938/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/15938/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output 

[jira] [Commented] (YARN-6603) NPE in RMAppsBlock

2017-05-16 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012435#comment-16012435
 ] 

Daryn Sharp commented on YARN-6603:
---

+1 I think no test is fine due to difficulty of forcing the race condition and 
the patch essentially amounts to a null check.  Failed tests appear unrelated.

> NPE in RMAppsBlock
> --
>
> Key: YARN-6603
> URL: https://issues.apache.org/jira/browse/YARN-6603
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-6603.001.patch, YARN-6603.002.patch
>
>
> We are seeing an intermittent NPE when the RM is trying to render the 
> /cluster URI.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6570) No logs were found for running application, running container

2017-05-16 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-6570:
-
Attachment: YARN-6570.poc.patch

> No logs were found for running application, running container
> -
>
> Key: YARN-6570
> URL: https://issues.apache.org/jira/browse/YARN-6570
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-6570.poc.patch
>
>
> 1. Obtain running containers from the following CLI for a running application:
>  yarn  container -list appattempt
> 2. Could not fetch logs 
> {code}
> Can not find any log file matching the pattern: ALL for the container
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6570) No logs were found for running application, running container

2017-05-16 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012408#comment-16012408
 ] 

Junping Du commented on YARN-6570:
--

Just checked the trunk branch code and found that YARN-4597 (committed to 2.9 and 
trunk) already included a fix by introducing a SCHEDULED state for containers before 
they go into the running state. However, there is still a small issue: the NEW state 
is captured as SCHEDULED, which doesn't seem to be right.
Put up a poc patch with a quick fix. A patch with a UT will come after.
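
To illustrate the concern (a purely toy illustration with placeholder enums, not the NM code or the exact poc):

{code:title=Toy illustration of the state mapping concern}
class StateMappingSketch {
  enum NmState { NEW, LOCALIZING, SCHEDULED, RUNNING, DONE }  // internal states (illustrative)
  enum ApiState { NEW, SCHEDULED, RUNNING, COMPLETE }         // reported states (illustrative)

  static ApiState report(NmState s) {
    switch (s) {
      case NEW:        return ApiState.NEW;       // i.e. stop folding NEW into SCHEDULED
      case LOCALIZING:
      case SCHEDULED:  return ApiState.SCHEDULED;
      case RUNNING:    return ApiState.RUNNING;
      default:         return ApiState.COMPLETE;
    }
  }
}
{code}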

> No logs were found for running application, running container
> -
>
> Key: YARN-6570
> URL: https://issues.apache.org/jira/browse/YARN-6570
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Junping Du
>Priority: Critical
>
> 1. Obtain running containers from the following CLI for a running application:
>  yarn  container -list appattempt
> 2. Could not fetch logs 
> {code}
> Can not find any log file matching the pattern: ALL for the container
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6471) Support to add min/max resource configuration for a queue

2017-05-16 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-6471:
--
Attachment: YARN-6471.005.patch

Thanks [~leftnoteasy].

#1/#2 look fine to me as well after a detailed check. 

For #3, we need to re-calculate configuredResource only when there is a change 
in the queue config. The rest of the changes to calculate the eff* resources will 
be in *updateClusterResources*.

Attaching a new patch.

> Support to add min/max resource configuration for a queue
> -
>
> Key: YARN-6471
> URL: https://issues.apache.org/jira/browse/YARN-6471
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: YARN-6471.001.patch, YARN-6471.002.patch, 
> YARN-6471.003.patch, YARN-6471.004.patch, YARN-6471.005.patch
>
>
> This jira will track the new configurations which are needed to configure min 
> resource and max resource of various resource types in a queue.
> For eg: 
> {noformat}
> yarn.scheduler.capacity.root.default.memory.min-resource
> yarn.scheduler.capacity.root.default.memory.max-resource
> yarn.scheduler.capacity.root.default.vcores.min-resource
> yarn.scheduler.capacity.root.default.vcores.max-resource
> {noformat}
> Uploading a patch soon



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-1503) Support making additional 'LocalResources' available to running containers

2017-05-16 Thread Bingxue Qiu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012024#comment-16012024
 ] 

Bingxue Qiu commented on YARN-1503:
---

hi [~jianhe], we have a use case where the full implementation of the localization 
status in ContainerStatusProto needs to be done, so we made it. Please feel 
free to give some advice, thanks.
[YARN-6606 |https://issues.apache.org/jira/browse/YARN-6606]

> Support making additional 'LocalResources' available to running containers
> --
>
> Key: YARN-1503
> URL: https://issues.apache.org/jira/browse/YARN-1503
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Jian He
> Attachments: Continuous-resource-localization.pdf
>
>
> We have a use case, where additional resources (jars, libraries etc) need to 
> be made available to an already running container. Ideally, we'd like this to 
> be done via YARN (instead of having potentially multiple containers per node 
> download resources on their own).
> Proposal:
>   NM to support an additional API where a list of resources can be specified. 
> Something like "localizeResource(ContainerId, Map)
>   NM would also require an additional API to get state for these resources - 
> "getLocalizationState(ContainerId)" - which returns the current state of all 
> local resources for the specified container(s).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6607) YARN Resource Manager quits with the exception java.util.concurrent.RejectedExecutionException:

2017-05-16 Thread Anandhaprabhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anandhaprabhu updated YARN-6607:

Description: 
ResourceManager goes down frequently with the below exception

2017-05-16 03:32:36,897 FATAL event.AsyncDispatcher 
(AsyncDispatcher.java:dispatch(189)) - Error in dispatcher thread
java.util.concurrent.RejectedExecutionException: Task 
java.util.concurrent.FutureTask@9efeac9 rejected from 
java.util.concurrent.ThreadPoolExecutor@42ab30[Shutting down, pool size = 16, 
active threads = 0, queued tasks = 0, completed tasks = 223337]
at 
java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
at 
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
at 
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
at 
java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
at 
org.apache.hadoop.registry.server.services.RegistryAdminService.submit(RegistryAdminService.java:176)
at 
org.apache.hadoop.registry.server.integration.RMRegistryOperationsService.purgeRecordsAsync(RMRegistryOperationsService.java:200)
at 
org.apache.hadoop.registry.server.integration.RMRegistryOperationsService.purgeRecordsAsync(RMRegistryOperationsService.java:170)
at 
org.apache.hadoop.registry.server.integration.RMRegistryOperationsService.onContainerFinished(RMRegistryOperationsService.java:146)
at 
org.apache.hadoop.yarn.server.resourcemanager.registry.RMRegistryService.handleAppAttemptEvent(RMRegistryService.java:151)
at 
org.apache.hadoop.yarn.server.resourcemanager.registry.RMRegistryService$AppEventHandler.handle(RMRegistryService.java:183)
at 
org.apache.hadoop.yarn.server.resourcemanager.registry.RMRegistryService$AppEventHandler.handle(RMRegistryService.java:177)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$MultiListenerHandler.handle(AsyncDispatcher.java:276)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
at java.lang.Thread.run(Thread.java:745)
2017-05-16 03:32:36,898 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(524)) - 
EventThread shut down
2017-05-16 03:32:36,898 INFO  zookeeper.ZooKeeper (ZooKeeper.java:close(684)) - 
Session: 0x15b8703e986b750 closed
2017-05-16 03:32:36,898 INFO  capacity.ParentQueue 
(ParentQueue.java:completedContainer(623)) - completedContainer queue=high 
usedCapacity=0.41496983 absoluteUsedCapacity=0.29047886 used= cluster=
2017-05-16 03:32:36,905 INFO  capacity.ParentQueue 
(ParentQueue.java:completedContainer(640)) - Re-sorting completed queue: 
root.high.lawful stats: lawful: capacity=0.3, absoluteCapacity=0.2101, 
usedResources=, usedCapacity=0.16657583, 
absoluteUsedCapacity=0.034980923, numApps=19, numContainers=102
2017-05-16 03:32:36,905 INFO  capacity.ParentQueue 
(ParentQueue.java:completedContainer(623)) - completedContainer queue=root 
usedCapacity=0.41565567 absoluteUsedCapacity=0.41565567 used= cluster=
2017-05-16 03:32:36,906 INFO  capacity.ParentQueue 
(ParentQueue.java:completedContainer(640)) - Re-sorting completed queue: 
root.high stats: high: numChildQueue= 4, capacity=0.7, absoluteCapacity=0.7, 
usedResources=usedCapacity=0.41496983, numApps=61, 
numContainers=847
2017-05-16 03:32:36,906 INFO  capacity.CapacityScheduler 
(CapacityScheduler.java:completedContainer(1562)) - Application attempt 
appattempt_1494886223429_7023_01 released container 
container_e43_1494886223429_7023_01_43 on node: host: 
r13d8.hadoop.log10.blackberry:45454 #containers=1 available= used= with event: FINISHED


  was:
Namenode goes down frequently with the below exception

2017-05-16 03:32:36,897 FATAL event.AsyncDispatcher 
(AsyncDispatcher.java:dispatch(189)) - Error in dispatcher thread
java.util.concurrent.RejectedExecutionException: Task 
java.util.concurrent.FutureTask@9efeac9 rejected from 
java.util.concurrent.ThreadPoolExecutor@42ab30[Shutting down, pool size = 16, 
active threads = 0, queued tasks = 0, completed tasks = 223337]
at 
java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
at 
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
at 
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
at 
java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
at 

[jira] [Updated] (YARN-6607) YARN Resource Manager quits with the exception java.util.concurrent.RejectedExecutionException:

2017-05-16 Thread Anandhaprabhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anandhaprabhu updated YARN-6607:

Description: 
Namenode goes down frequently with the below exception

2017-05-16 03:32:36,897 FATAL event.AsyncDispatcher 
(AsyncDispatcher.java:dispatch(189)) - Error in dispatcher thread
java.util.concurrent.RejectedExecutionException: Task 
java.util.concurrent.FutureTask@9efeac9 rejected from 
java.util.concurrent.ThreadPoolExecutor@42ab30[Shutting down, pool size = 16, 
active threads = 0, queued tasks = 0, completed tasks = 223337]
at 
java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
at 
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
at 
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
at 
java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
at 
org.apache.hadoop.registry.server.services.RegistryAdminService.submit(RegistryAdminService.java:176)
at 
org.apache.hadoop.registry.server.integration.RMRegistryOperationsService.purgeRecordsAsync(RMRegistryOperationsService.java:200)
at 
org.apache.hadoop.registry.server.integration.RMRegistryOperationsService.purgeRecordsAsync(RMRegistryOperationsService.java:170)
at 
org.apache.hadoop.registry.server.integration.RMRegistryOperationsService.onContainerFinished(RMRegistryOperationsService.java:146)
at 
org.apache.hadoop.yarn.server.resourcemanager.registry.RMRegistryService.handleAppAttemptEvent(RMRegistryService.java:151)
at 
org.apache.hadoop.yarn.server.resourcemanager.registry.RMRegistryService$AppEventHandler.handle(RMRegistryService.java:183)
at 
org.apache.hadoop.yarn.server.resourcemanager.registry.RMRegistryService$AppEventHandler.handle(RMRegistryService.java:177)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$MultiListenerHandler.handle(AsyncDispatcher.java:276)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
at java.lang.Thread.run(Thread.java:745)
2017-05-16 03:32:36,898 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(524)) - 
EventThread shut down
2017-05-16 03:32:36,898 INFO  zookeeper.ZooKeeper (ZooKeeper.java:close(684)) - 
Session: 0x15b8703e986b750 closed
2017-05-16 03:32:36,898 INFO  capacity.ParentQueue 
(ParentQueue.java:completedContainer(623)) - completedContainer queue=high 
usedCapacity=0.41496983 absoluteUsedCapacity=0.29047886 used= cluster=
2017-05-16 03:32:36,905 INFO  capacity.ParentQueue 
(ParentQueue.java:completedContainer(640)) - Re-sorting completed queue: 
root.high.lawful stats: lawful: capacity=0.3, absoluteCapacity=0.2101, 
usedResources=, usedCapacity=0.16657583, 
absoluteUsedCapacity=0.034980923, numApps=19, numContainers=102
2017-05-16 03:32:36,905 INFO  capacity.ParentQueue 
(ParentQueue.java:completedContainer(623)) - completedContainer queue=root 
usedCapacity=0.41565567 absoluteUsedCapacity=0.41565567 used= cluster=
2017-05-16 03:32:36,906 INFO  capacity.ParentQueue 
(ParentQueue.java:completedContainer(640)) - Re-sorting completed queue: 
root.high stats: high: numChildQueue= 4, capacity=0.7, absoluteCapacity=0.7, 
usedResources=usedCapacity=0.41496983, numApps=61, 
numContainers=847
2017-05-16 03:32:36,906 INFO  capacity.CapacityScheduler 
(CapacityScheduler.java:completedContainer(1562)) - Application attempt 
appattempt_1494886223429_7023_01 released container 
container_e43_1494886223429_7023_01_43 on node: host: 
r13d8.hadoop.log10.blackberry:45454 #containers=1 available= used= with event: FINISHED


  was:
2017-05-16 03:32:36,897 FATAL event.AsyncDispatcher 
(AsyncDispatcher.java:dispatch(189)) - Error in dispatcher thread
java.util.concurrent.RejectedExecutionException: Task 
java.util.concurrent.FutureTask@9efeac9 rejected from 
java.util.concurrent.ThreadPoolExecutor@42ab30[Shutting down, pool size = 16, 
active threads = 0, queued tasks = 0, completed tasks = 223337]
at 
java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
at 
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
at 
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
at 
java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
at 
org.apache.hadoop.registry.server.services.RegistryAdminService.submit(RegistryAdminService.java:176)
at 

[jira] [Created] (YARN-6607) YARN Resource Manager quits with the exception java.util.concurrent.RejectedExecutionException:

2017-05-16 Thread Anandhaprabhu (JIRA)
Anandhaprabhu created YARN-6607:
---

 Summary: YARN Resource Manager quits with the exception 
java.util.concurrent.RejectedExecutionException: 
 Key: YARN-6607
 URL: https://issues.apache.org/jira/browse/YARN-6607
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Anandhaprabhu


2017-05-16 03:32:36,897 FATAL event.AsyncDispatcher (AsyncDispatcher.java:dispatch(189)) - Error in dispatcher thread
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@9efeac9 rejected from java.util.concurrent.ThreadPoolExecutor@42ab30[Shutting down, pool size = 16, active threads = 0, queued tasks = 0, completed tasks = 223337]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
at org.apache.hadoop.registry.server.services.RegistryAdminService.submit(RegistryAdminService.java:176)
at org.apache.hadoop.registry.server.integration.RMRegistryOperationsService.purgeRecordsAsync(RMRegistryOperationsService.java:200)
at org.apache.hadoop.registry.server.integration.RMRegistryOperationsService.purgeRecordsAsync(RMRegistryOperationsService.java:170)
at org.apache.hadoop.registry.server.integration.RMRegistryOperationsService.onContainerFinished(RMRegistryOperationsService.java:146)
at org.apache.hadoop.yarn.server.resourcemanager.registry.RMRegistryService.handleAppAttemptEvent(RMRegistryService.java:151)
at org.apache.hadoop.yarn.server.resourcemanager.registry.RMRegistryService$AppEventHandler.handle(RMRegistryService.java:183)
at org.apache.hadoop.yarn.server.resourcemanager.registry.RMRegistryService$AppEventHandler.handle(RMRegistryService.java:177)
at org.apache.hadoop.yarn.event.AsyncDispatcher$MultiListenerHandler.handle(AsyncDispatcher.java:276)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
at java.lang.Thread.run(Thread.java:745)
2017-05-16 03:32:36,898 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(524)) - EventThread shut down
2017-05-16 03:32:36,898 INFO  zookeeper.ZooKeeper (ZooKeeper.java:close(684)) - Session: 0x15b8703e986b750 closed
2017-05-16 03:32:36,898 INFO  capacity.ParentQueue (ParentQueue.java:completedContainer(623)) - completedContainer queue=high usedCapacity=0.41496983 absoluteUsedCapacity=0.29047886 used= cluster=
2017-05-16 03:32:36,905 INFO  capacity.ParentQueue (ParentQueue.java:completedContainer(640)) - Re-sorting completed queue: root.high.lawful stats: lawful: capacity=0.3, absoluteCapacity=0.2101, usedResources=, usedCapacity=0.16657583, absoluteUsedCapacity=0.034980923, numApps=19, numContainers=102
2017-05-16 03:32:36,905 INFO  capacity.ParentQueue (ParentQueue.java:completedContainer(623)) - completedContainer queue=root usedCapacity=0.41565567 absoluteUsedCapacity=0.41565567 used= cluster=
2017-05-16 03:32:36,906 INFO  capacity.ParentQueue (ParentQueue.java:completedContainer(640)) - Re-sorting completed queue: root.high stats: high: numChildQueue= 4, capacity=0.7, absoluteCapacity=0.7, usedResources=usedCapacity=0.41496983, numApps=61, numContainers=847
2017-05-16 03:32:36,906 INFO  capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1562)) - Application attempt appattempt_1494886223429_7023_01 released container container_e43_1494886223429_7023_01_43 on node: host: r13d8.hadoop.log10.blackberry:45454 #containers=1 available= used= with event: FINISHED
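
For context, the failure above is the standard java.util.concurrent behavior: once a 
ThreadPoolExecutor has been shut down, any further submit() is refused by the default 
AbortPolicy, which throws RejectedExecutionException. A minimal, self-contained sketch 
(illustrative only, not RM code) that reproduces the same exception:

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

public class RejectedSubmitDemo {
    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.shutdown(); // pool enters the "Shutting down" state seen in the log above

        try {
            // Submitting after shutdown is refused by the default AbortPolicy and
            // throws RejectedExecutionException -- the same exception the registry
            // service hit while the RM was going down.
            pool.submit(() -> System.out.println("never runs"));
        } catch (RejectedExecutionException e) {
            System.err.println("Task rejected: " + e.getMessage());
        }
    }
}
{code}

A possible mitigation (only a suggestion, to be weighed on the JIRA) is for the registry 
service to skip the async purge when its executor is already shut down, or to catch and 
log the rejection instead of letting it propagate and kill the dispatcher thread.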




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6606) The implementation of LocalizationStatus in ContainerStatusProto

2017-05-16 Thread Bingxue Qiu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bingxue Qiu updated YARN-6606:
--
Attachment: YARN-6606.1.patch

Attached YARN-6606.1.patch.

> The implementation of LocalizationStatus in ContainerStatusProto
> 
>
> Key: YARN-6606
> URL: https://issues.apache.org/jira/browse/YARN-6606
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Bingxue Qiu
> Fix For: 2.9.0
>
> Attachments: YARN-6606.1.patch
>
>
> We have a use case where the full implementation of localization status in 
> ContainerStatusProto 
> [Continuous-resource-localization|https://issues.apache.org/jira/secure/attachment/12825041/Continuous-resource-localization.pdf]
> needs to be done, so we are implementing it.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6606) The implementation of LocalizationStatus in ContainerStatusProto

2017-05-16 Thread Bingxue Qiu (JIRA)
Bingxue Qiu created YARN-6606:
-

 Summary: The implementation of LocalizationStatus in 
ContainerStatusProto
 Key: YARN-6606
 URL: https://issues.apache.org/jira/browse/YARN-6606
 Project: Hadoop YARN
  Issue Type: Task
  Components: nodemanager
Affects Versions: 2.9.0
Reporter: Bingxue Qiu


We have a use case where the full implementation of localization status in 
ContainerStatusProto 
[Continuous-resource-localization|https://issues.apache.org/jira/secure/attachment/12825041/Continuous-resource-localization.pdf]
needs to be done, so we are implementing it.
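
To make the intent a bit more concrete, here is a rough sketch of the kind of API surface 
such a change could expose. The type and method names below (LocalizationStatus, 
getResourceKey, getState, getDiagnostics) are assumptions made purely for illustration; 
the actual definition is whatever YARN-6606.1.patch adds to ContainerStatusProto.

{code}
// Hypothetical shape only: the names here are illustrative assumptions,
// not the contents of YARN-6606.1.patch.
public interface LocalizationStatus {

    /** Coarse state of a single local resource tracked for a container. */
    enum State { PENDING, DOWNLOADING, COMPLETED, FAILED }

    /** Key of the LocalResource this status refers to. */
    String getResourceKey();

    /** Current localization state of the resource. */
    State getState();

    /** Diagnostics message, populated when the state is FAILED. */
    String getDiagnostics();
}
{code}

On the wire this would presumably surface as a repeated message inside ContainerStatusProto, 
so a container status report can carry one entry per local resource; the sketch is only 
meant to make the description's intent concrete.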



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Reopened] (YARN-6605) dafasfass

2017-05-16 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S reopened YARN-6605:
-

> dafasfass
> -
>
> Key: YARN-6605
> URL: https://issues.apache.org/jira/browse/YARN-6605
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: wuchang
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-6605) dafasfass

2017-05-16 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S resolved YARN-6605.
-
Resolution: Invalid

Closed the issue as invalid.

> dafasfass
> -
>
> Key: YARN-6605
> URL: https://issues.apache.org/jira/browse/YARN-6605
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: wuchang
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6601) Allow service to be started as System Services during serviceapi start up

2017-05-16 Thread Lokesh Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16011816#comment-16011816
 ] 

Lokesh Jain commented on YARN-6601:
---

As per an offline discussion with [~rohithsharma], we have done a POC for this. 
Please grant me contributor permission so that I can attach the POC patch.

> Allow service to be started as System Services during serviceapi start up
> -
>
> Key: YARN-6601
> URL: https://issues.apache.org/jira/browse/YARN-6601
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Rohith Sharma K S
> Attachments: SystemServices.pdf
>
>
> This is extended from YARN-1593, focusing only on system services. This 
> particular JIRA focuses on starting the system services during 
> native-service-api start up. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6605) dafasfass

2017-05-16 Thread wuchang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wuchang updated YARN-6605:
--
Description: (was: {code}
[appuser@hz-192-168-80-146 kafka]$ bin/kafka-topics.sh --describe --topic 
DruidAppColdStartMsg --zookeeper 10.120.241.50:2181
Topic:DruidAppColdStartMsg  PartitionCount:2ReplicationFactor:2 
Configs:
Topic: DruidAppColdStartMsg Partition: 0Leader: 82  
Replicas: 82,146Isr: 82
Topic: DruidAppColdStartMsg Partition: 1Leader: 110 
Replicas: 110,82Isr: 110,82
{code}


{code}
   

   10 mb, 30 vcores
   50 mb, 100 vcores
   0.35
   20
   25
   0.8


   25000 mb, 20 vcores
   225000 mb, 70 vcores
   0.14
   20
   25
   0.5
   -1.0f


   20 mb, 30 vcores
   60 mb, 100 vcores
   0.42
   20
   25
   0.8
   -1.0f


   5 mb, 20 vcores
   12 mb, 30 vcores
   0.09
   20
   25
   0.8
   -1.0f
 

{code})

> dafasfass
> -
>
> Key: YARN-6605
> URL: https://issues.apache.org/jira/browse/YARN-6605
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: wuchang
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-6605) dafasfass

2017-05-16 Thread wuchang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wuchang resolved YARN-6605.
---
Resolution: Fixed

> dafasfass
> -
>
> Key: YARN-6605
> URL: https://issues.apache.org/jira/browse/YARN-6605
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: wuchang
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6605) dafasfass

2017-05-16 Thread wuchang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wuchang updated YARN-6605:
--
Description: 
{code}
[appuser@hz-192-168-80-146 kafka]$ bin/kafka-topics.sh --describe --topic 
DruidAppColdStartMsg --zookeeper 10.120.241.50:2181
Topic:DruidAppColdStartMsg  PartitionCount:2ReplicationFactor:2 
Configs:
Topic: DruidAppColdStartMsg Partition: 0Leader: 82  
Replicas: 82,146Isr: 82
Topic: DruidAppColdStartMsg Partition: 1Leader: 110 
Replicas: 110,82Isr: 110,82
{code}


{code}
   

   10 mb, 30 vcores
   50 mb, 100 vcores
   0.35
   20
   25
   0.8


   25000 mb, 20 vcores
   225000 mb, 70 vcores
   0.14
   20
   25
   0.5
   -1.0f


   20 mb, 30 vcores
   60 mb, 100 vcores
   0.42
   20
   25
   0.8
   -1.0f


   5 mb, 20 vcores
   12 mb, 30 vcores
   0.09
   20
   25
   0.8
   -1.0f
 

{code}

  was:
{code}
[appuser@hz-192-168-80-146 kafka]$ bin/kafka-topics.sh --describe --topic 
DruidAppColdStartMsg --zookeeper 10.120.241.50:2181
Topic:DruidAppColdStartMsg  PartitionCount:2ReplicationFactor:2 
Configs:
Topic: DruidAppColdStartMsg Partition: 0Leader: 82  
Replicas: 82,146Isr: 82,146
Topic: DruidAppColdStartMsg Partition: 1Leader: 110 
Replicas: 110,82Isr: 110,82
{code}


{code}
   

   10 mb, 30 vcores
   50 mb, 100 vcores
   0.35
   20
   25
   0.8


   25000 mb, 20 vcores
   225000 mb, 70 vcores
   0.14
   20
   25
   0.5
   -1.0f


   20 mb, 30 vcores
   60 mb, 100 vcores
   0.42
   20
   25
   0.8
   -1.0f


   5 mb, 20 vcores
   12 mb, 30 vcores
   0.09
   20
   25
   0.8
   -1.0f
 

{code}


> dafasfass
> -
>
> Key: YARN-6605
> URL: https://issues.apache.org/jira/browse/YARN-6605
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: wuchang
>
> {code}
> [appuser@hz-192-168-80-146 kafka]$ bin/kafka-topics.sh --describe --topic 
> DruidAppColdStartMsg --zookeeper 10.120.241.50:2181
> Topic:DruidAppColdStartMsg  PartitionCount:2ReplicationFactor:2   
>   Configs:
> Topic: DruidAppColdStartMsg Partition: 0Leader: 82  
> Replicas: 82,146Isr: 82
> Topic: DruidAppColdStartMsg Partition: 1Leader: 110 
> Replicas: 110,82Isr: 110,82
> {code}
> {code}
>
> 
>10 mb, 30 vcores
>50 mb, 100 vcores
>0.35
>20
>25
>0.8
> 
> 
>25000 mb, 20 vcores
>225000 mb, 70 vcores
>0.14
>20
>25
>0.5
>-1.0f
> 
> 
>20 mb, 30 vcores
>60 mb, 100 vcores
>0.42
>20
>25
>0.8
>-1.0f
> 
> 
>5 mb, 20 vcores
>12 mb, 30 vcores
>0.09
>20
>25
>0.8
>-1.0f
>  
> 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6605) dafasfass

2017-05-16 Thread wuchang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wuchang updated YARN-6605:
--
Description: 
{code}
[appuser@hz-192-168-80-146 kafka]$ bin/kafka-topics.sh --describe --topic 
DruidAppColdStartMsg --zookeeper 10.120.241.50:2181
Topic:DruidAppColdStartMsg  PartitionCount:2ReplicationFactor:2 
Configs:
Topic: DruidAppColdStartMsg Partition: 0Leader: 82  
Replicas: 82,146Isr: 82,146
Topic: DruidAppColdStartMsg Partition: 1Leader: 110 
Replicas: 110,82Isr: 110,82
{code}


{code}
   

   10 mb, 30 vcores
   50 mb, 100 vcores
   0.35
   20
   25
   0.8


   25000 mb, 20 vcores
   225000 mb, 70 vcores
   0.14
   20
   25
   0.5
   -1.0f


   20 mb, 30 vcores
   60 mb, 100 vcores
   0.42
   20
   25
   0.8
   -1.0f


   5 mb, 20 vcores
   12 mb, 30 vcores
   0.09
   20
   25
   0.8
   -1.0f
 

{code}

  was:
{code}


   10 mb, 30 vcores
   25 mb, 100 vcores


   5 mb, 20 vcores
   10 mb, 50 vcores
   -1.0f


   10 mb, 30 vcores
   30 mb, 100 vcores
   -1.0f


   3 mb, 20 vcores
   6 mb, 50 vcores
   -1.0f
 
  300

{code}


{code}
   

   10 mb, 30 vcores
   50 mb, 100 vcores
   0.35
   20
   25
   0.8


   25000 mb, 20 vcores
   225000 mb, 70 vcores
   0.14
   20
   25
   0.5
   -1.0f


   20 mb, 30 vcores
   60 mb, 100 vcores
   0.42
   20
   25
   0.8
   -1.0f


   5 mb, 20 vcores
   12 mb, 30 vcores
   0.09
   20
   25
   0.8
   -1.0f
 

{code}


> dafasfass
> -
>
> Key: YARN-6605
> URL: https://issues.apache.org/jira/browse/YARN-6605
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: wuchang
>
> {code}
> [appuser@hz-192-168-80-146 kafka]$ bin/kafka-topics.sh --describe --topic 
> DruidAppColdStartMsg --zookeeper 10.120.241.50:2181
> Topic:DruidAppColdStartMsg  PartitionCount:2ReplicationFactor:2   
>   Configs:
> Topic: DruidAppColdStartMsg Partition: 0Leader: 82  
> Replicas: 82,146Isr: 82,146
> Topic: DruidAppColdStartMsg Partition: 1Leader: 110 
> Replicas: 110,82Isr: 110,82
> {code}
> {code}
>
> 
>10 mb, 30 vcores
>50 mb, 100 vcores
>0.35
>20
>25
>0.8
> 
> 
>25000 mb, 20 vcores
>225000 mb, 70 vcores
>0.14
>20
>25
>0.5
>-1.0f
> 
> 
>20 mb, 30 vcores
>60 mb, 100 vcores
>0.42
>20
>25
>0.8
>-1.0f
> 
> 
>5 mb, 20 vcores
>12 mb, 30 vcores
>0.09
>20
>25
>0.8
>-1.0f
>  
> 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org