[jira] [Commented] (YARN-2161) Fix build on macosx: YARN parts

2014-10-21 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178004#comment-14178004
 ] 

Allen Wittenauer commented on YARN-2161:


I've re-opened YARN-2701.

 Fix build on macosx: YARN parts
 ---

 Key: YARN-2161
 URL: https://issues.apache.org/jira/browse/YARN-2161
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
 Fix For: 2.6.0

 Attachments: YARN-2161.v1.patch, YARN-2161.v2.patch


 When compiling on macosx with -Pnative, there are several warnings and errors; 
 fixing these would help Hadoop developers who work in a macosx environment. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor

2014-10-21 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178007#comment-14178007
 ] 

Allen Wittenauer commented on YARN-2701:


FWIW, I've re-opened this JIRA because I'm -1 on the suggested code fix.  It 
was clearly done for the sake of patch expediency rather than as a proper fix.

 Potential race condition in startLocalizer when using LinuxContainerExecutor  
 --

 Key: YARN-2701
 URL: https://issues.apache.org/jira/browse/YARN-2701
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker
 Fix For: 2.6.0

 Attachments: YARN-2701.1.patch, YARN-2701.2.patch, YARN-2701.3.patch, 
 YARN-2701.4.patch, YARN-2701.5.patch, YARN-2701.6.patch


 When LinuxContainerExecutor runs startLocalizer, it relies on the native code 
 in container-executor.c. 
 {code}
  if (stat(npath, sb) != 0) {
    if (mkdir(npath, perm) != 0) {
 {code}
 The appDir under /usercache is created with a check-then-create pattern. If 
 two containers try to do this at the same time, a race condition can occur.
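
As an illustration of the race-free alternative, here is a Java analogue only (the actual fix lives in the native container-executor.c; the path and permissions below are placeholders): attempt the creation first and treat "already exists" as success instead of checking before creating.

{code}
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermissions;

public class CreateAppDirSketch {
  /**
   * Create the directory without a check-then-create race: attempt the
   * mkdir first and treat "already exists" as success.
   */
  static void createAppDir(String dir) throws IOException {
    Path path = Paths.get(dir);
    try {
      Files.createDirectory(path,
          PosixFilePermissions.asFileAttribute(
              PosixFilePermissions.fromString("rwxr-x---")));
    } catch (FileAlreadyExistsException e) {
      // Another localizer won the race and created it first; that is fine.
    }
  }

  public static void main(String[] args) throws IOException {
    createAppDir("/tmp/appdir-race-demo");
  }
}
{code}

In C the same effect comes from calling mkdir() unconditionally and treating an EEXIST errno as success rather than stat()-ing first.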



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2707) Potential null dereference in FSDownload

2014-10-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178026#comment-14178026
 ] 

Hadoop QA commented on YARN-2707:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12676034/YARN-2707.v01.patch
  against trunk revision 171f237.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5479//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5479//console

This message is automatically generated.

 Potential null dereference in FSDownload
 

 Key: YARN-2707
 URL: https://issues.apache.org/jira/browse/YARN-2707
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Gera Shegalov
Priority: Minor
 Attachments: YARN-2707.v01.patch


 Here is related code in call():
 {code}
   Pattern pattern = null;
   String p = resource.getPattern();
   if (p != null) {
 pattern = Pattern.compile(p);
   }
   unpack(new File(dTmp.toUri()), new File(dFinal.toUri()), pattern);
 {code}
 In unpack():
 {code}
 RunJar.unJar(localrsrc, dst, pattern);
 {code}
 unJar() would dereference the pattern without checking whether it is null.
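
A minimal, hedged sketch of one possible guard (not the attached patch; unJar here is a local stand-in for RunJar.unJar): compile the pattern when the resource defines one, otherwise substitute a match-all pattern so the callee never dereferences null.

{code}
import java.io.File;
import java.util.regex.Pattern;

public class UnpackGuardSketch {

  // Stand-in for RunJar.unJar(localrsrc, dst, pattern), which matches each
  // jar entry against the pattern and would NPE on a null pattern.
  static void unJar(File localrsrc, File dst, Pattern pattern) {
    // ... pattern.matcher(entryName).matches() ...
  }

  // Null-safe caller: compile the resource pattern when present, otherwise
  // fall back to a match-everything pattern so unJar never sees null.
  static void unpack(File localrsrc, File dst, String resourcePattern) {
    Pattern pattern = (resourcePattern != null)
        ? Pattern.compile(resourcePattern)
        : Pattern.compile(".*");
    unJar(localrsrc, dst, pattern);
  }
}
{code}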



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor

2014-10-21 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated YARN-2198:
---
Attachment: YARN-2198.15.patch

.15.patch rebased to current trunk, .gitignore conflict resolved

 Remove the need to run NodeManager as privileged account for Windows Secure 
 Container Executor
 --

 Key: YARN-2198
 URL: https://issues.apache.org/jira/browse/YARN-2198
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Remus Rusanu
Assignee: Remus Rusanu
  Labels: security, windows
 Attachments: .YARN-2198.delta.10.patch, YARN-2198.1.patch, 
 YARN-2198.11.patch, YARN-2198.12.patch, YARN-2198.13.patch, 
 YARN-2198.14.patch, YARN-2198.15.patch, YARN-2198.2.patch, YARN-2198.3.patch, 
 YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, 
 YARN-2198.delta.7.patch, YARN-2198.separation.patch, 
 YARN-2198.trunk.10.patch, YARN-2198.trunk.4.patch, YARN-2198.trunk.5.patch, 
 YARN-2198.trunk.6.patch, YARN-2198.trunk.8.patch, YARN-2198.trunk.9.patch


 YARN-1972 introduces a Secure Windows Container Executor. However, this 
 executor requires the process launching the container to be LocalSystem or a 
 member of the local Administrators group. Since the process in question is 
 the NodeManager, the requirement means the entire NM must run as a 
 privileged account, a very large surface area to review and protect.
 This proposal is to move the privileged operations into a dedicated NT 
 service. The NM can run as a low-privilege account and communicate with the 
 privileged NT service when it needs to launch a container. This would reduce 
 the surface area exposed at high privilege. 
 There has to exist a secure, authenticated and authorized channel of 
 communication between the NM and the privileged NT service. Possible 
 alternatives are a new TCP endpoint, Java RPC, etc. My proposal, though, is 
 to use Windows LPC (Local Procedure Calls), a Windows platform-specific 
 inter-process communication channel that satisfies all requirements and is 
 easy to deploy. The privileged NT service would register and listen on an 
 LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop with 
 libwinutils, which would host the LPC client code. The client would connect 
 to the LPC port (NtConnectPort) and send a message requesting a container 
 launch (NtRequestWaitReplyPort). LPC provides authentication, and the 
 privileged NT service can use the authorization API (AuthZ) to validate the 
 caller.
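
Purely as a hypothetical sketch of the NM-side boundary such a design implies (none of these class, method, or library names come from an actual patch), the low-privilege NM could call a narrow JNI facade whose native implementation in libwinutils would carry the LPC client logic described above:

{code}
/**
 * Hypothetical NM-side facade for the privileged helper service. The native
 * implementation (hosted in libwinutils per the proposal) would connect to
 * the service's LPC port and forward the request; everything here is
 * illustrative only.
 */
public final class PrivilegedServiceClientSketch {

  static {
    // Assumed native library name; the proposal places the LPC client there.
    System.loadLibrary("winutils");
  }

  /** Ask the privileged NT service to launch a container process. */
  public native int launchContainer(String user, String containerId,
      String workDir, String[] command);
}
{code}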



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor

2014-10-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178057#comment-14178057
 ] 

Hadoop QA commented on YARN-2198:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12676038/YARN-2198.15.patch
  against trunk revision 171f237.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5480//console

This message is automatically generated.

 Remove the need to run NodeManager as privileged account for Windows Secure 
 Container Executor
 --

 Key: YARN-2198
 URL: https://issues.apache.org/jira/browse/YARN-2198
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Remus Rusanu
Assignee: Remus Rusanu
  Labels: security, windows
 Attachments: .YARN-2198.delta.10.patch, YARN-2198.1.patch, 
 YARN-2198.11.patch, YARN-2198.12.patch, YARN-2198.13.patch, 
 YARN-2198.14.patch, YARN-2198.15.patch, YARN-2198.2.patch, YARN-2198.3.patch, 
 YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, 
 YARN-2198.delta.7.patch, YARN-2198.separation.patch, 
 YARN-2198.trunk.10.patch, YARN-2198.trunk.4.patch, YARN-2198.trunk.5.patch, 
 YARN-2198.trunk.6.patch, YARN-2198.trunk.8.patch, YARN-2198.trunk.9.patch


 YARN-1972 introduces a Secure Windows Container Executor. However, this 
 executor requires the process launching the container to be LocalSystem or a 
 member of the local Administrators group. Since the process in question is 
 the NodeManager, the requirement means the entire NM must run as a 
 privileged account, a very large surface area to review and protect.
 This proposal is to move the privileged operations into a dedicated NT 
 service. The NM can run as a low-privilege account and communicate with the 
 privileged NT service when it needs to launch a container. This would reduce 
 the surface area exposed at high privilege. 
 There has to exist a secure, authenticated and authorized channel of 
 communication between the NM and the privileged NT service. Possible 
 alternatives are a new TCP endpoint, Java RPC, etc. My proposal, though, is 
 to use Windows LPC (Local Procedure Calls), a Windows platform-specific 
 inter-process communication channel that satisfies all requirements and is 
 easy to deploy. The privileged NT service would register and listen on an 
 LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop with 
 libwinutils, which would host the LPC client code. The client would connect 
 to the LPC port (NtConnectPort) and send a message requesting a container 
 launch (NtRequestWaitReplyPort). LPC provides authentication, and the 
 privileged NT service can use the authorization API (AuthZ) to validate the 
 caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2279) Add UTs to cover timeline server authentication

2014-10-21 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen reassigned YARN-2279:
-

Assignee: Zhijie Shen

 Add UTs to cover timeline server authentication
 ---

 Key: YARN-2279
 URL: https://issues.apache.org/jira/browse/YARN-2279
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
  Labels: test
 Attachments: YARN-2279.1.patch


 Currently, timeline server authentication lacks unit tests, so we have to 
 verify each incremental patch manually. It would be good to add some unit tests here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2279) Add UTs to cover timeline server authentication

2014-10-21 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2279:
--
Attachment: YARN-2279.1.patch

After refactoring the authentication code, an end-to-end test has already been 
added. Use this JIRA to attach patches that enhance the test coverage:

1. Verify the put domain API.
2. Verify the encrypted channel.

 Add UTs to cover timeline server authentication
 ---

 Key: YARN-2279
 URL: https://issues.apache.org/jira/browse/YARN-2279
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
  Labels: test
 Attachments: YARN-2279.1.patch


 Currently, timeline server authentication lacks unit tests, so we have to 
 verify each incremental patch manually. It would be good to add some unit tests here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor

2014-10-21 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated YARN-2198:
---
Attachment: (was: YARN-2198.15.patch)

 Remove the need to run NodeManager as privileged account for Windows Secure 
 Container Executor
 --

 Key: YARN-2198
 URL: https://issues.apache.org/jira/browse/YARN-2198
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Remus Rusanu
Assignee: Remus Rusanu
  Labels: security, windows
 Attachments: .YARN-2198.delta.10.patch, YARN-2198.1.patch, 
 YARN-2198.11.patch, YARN-2198.12.patch, YARN-2198.13.patch, 
 YARN-2198.14.patch, YARN-2198.2.patch, YARN-2198.3.patch, 
 YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, 
 YARN-2198.delta.7.patch, YARN-2198.separation.patch, 
 YARN-2198.trunk.10.patch, YARN-2198.trunk.4.patch, YARN-2198.trunk.5.patch, 
 YARN-2198.trunk.6.patch, YARN-2198.trunk.8.patch, YARN-2198.trunk.9.patch


 YARN-1972 introduces a Secure Windows Container Executor. However, this 
 executor requires the process launching the container to be LocalSystem or a 
 member of the local Administrators group. Since the process in question is 
 the NodeManager, the requirement means the entire NM must run as a 
 privileged account, a very large surface area to review and protect.
 This proposal is to move the privileged operations into a dedicated NT 
 service. The NM can run as a low-privilege account and communicate with the 
 privileged NT service when it needs to launch a container. This would reduce 
 the surface area exposed at high privilege. 
 There has to exist a secure, authenticated and authorized channel of 
 communication between the NM and the privileged NT service. Possible 
 alternatives are a new TCP endpoint, Java RPC, etc. My proposal, though, is 
 to use Windows LPC (Local Procedure Calls), a Windows platform-specific 
 inter-process communication channel that satisfies all requirements and is 
 easy to deploy. The privileged NT service would register and listen on an 
 LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop with 
 libwinutils, which would host the LPC client code. The client would connect 
 to the LPC port (NtConnectPort) and send a message requesting a container 
 launch (NtRequestWaitReplyPort). LPC provides authentication, and the 
 privileged NT service can use the authorization API (AuthZ) to validate the 
 caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor

2014-10-21 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated YARN-2198:
---
Attachment: YARN-2198.15.patch

Reloaded .15.patch with the TestLCE fix

 Remove the need to run NodeManager as privileged account for Windows Secure 
 Container Executor
 --

 Key: YARN-2198
 URL: https://issues.apache.org/jira/browse/YARN-2198
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Remus Rusanu
Assignee: Remus Rusanu
  Labels: security, windows
 Attachments: .YARN-2198.delta.10.patch, YARN-2198.1.patch, 
 YARN-2198.11.patch, YARN-2198.12.patch, YARN-2198.13.patch, 
 YARN-2198.14.patch, YARN-2198.15.patch, YARN-2198.2.patch, YARN-2198.3.patch, 
 YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, 
 YARN-2198.delta.7.patch, YARN-2198.separation.patch, 
 YARN-2198.trunk.10.patch, YARN-2198.trunk.4.patch, YARN-2198.trunk.5.patch, 
 YARN-2198.trunk.6.patch, YARN-2198.trunk.8.patch, YARN-2198.trunk.9.patch


 YARN-1972 introduces a Secure Windows Container Executor. However, this 
 executor requires the process launching the container to be LocalSystem or a 
 member of the local Administrators group. Since the process in question is 
 the NodeManager, the requirement means the entire NM must run as a 
 privileged account, a very large surface area to review and protect.
 This proposal is to move the privileged operations into a dedicated NT 
 service. The NM can run as a low-privilege account and communicate with the 
 privileged NT service when it needs to launch a container. This would reduce 
 the surface area exposed at high privilege. 
 There has to exist a secure, authenticated and authorized channel of 
 communication between the NM and the privileged NT service. Possible 
 alternatives are a new TCP endpoint, Java RPC, etc. My proposal, though, is 
 to use Windows LPC (Local Procedure Calls), a Windows platform-specific 
 inter-process communication channel that satisfies all requirements and is 
 easy to deploy. The privileged NT service would register and listen on an 
 LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop with 
 libwinutils, which would host the LPC client code. The client would connect 
 to the LPC port (NtConnectPort) and send a message requesting a container 
 launch (NtRequestWaitReplyPort). LPC provides authentication, and the 
 privileged NT service can use the authorization API (AuthZ) to validate the 
 caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2279) Add UTs to cover timeline server authentication

2014-10-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178099#comment-14178099
 ] 

Hadoop QA commented on YARN-2279:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12676042/YARN-2279.1.patch
  against trunk revision 171f237.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5481//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5481//console

This message is automatically generated.

 Add UTs to cover timeline server authentication
 ---

 Key: YARN-2279
 URL: https://issues.apache.org/jira/browse/YARN-2279
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
  Labels: test
 Attachments: YARN-2279.1.patch


 Currently, timeline server authentication lacks unit tests, so we have to 
 verify each incremental patch manually. It would be good to add some unit tests here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor

2014-10-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178125#comment-14178125
 ] 

Hadoop QA commented on YARN-2198:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12676048/YARN-2198.15.patch
  against trunk revision 171f237.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5482//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5482//artifact/patchprocess/newPatchFindbugsWarningshadoop-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5482//console

This message is automatically generated.

 Remove the need to run NodeManager as privileged account for Windows Secure 
 Container Executor
 --

 Key: YARN-2198
 URL: https://issues.apache.org/jira/browse/YARN-2198
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Remus Rusanu
Assignee: Remus Rusanu
  Labels: security, windows
 Attachments: .YARN-2198.delta.10.patch, YARN-2198.1.patch, 
 YARN-2198.11.patch, YARN-2198.12.patch, YARN-2198.13.patch, 
 YARN-2198.14.patch, YARN-2198.15.patch, YARN-2198.2.patch, YARN-2198.3.patch, 
 YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, 
 YARN-2198.delta.7.patch, YARN-2198.separation.patch, 
 YARN-2198.trunk.10.patch, YARN-2198.trunk.4.patch, YARN-2198.trunk.5.patch, 
 YARN-2198.trunk.6.patch, YARN-2198.trunk.8.patch, YARN-2198.trunk.9.patch


 YARN-1972 introduces a Secure Windows Container Executor. However, this 
 executor requires the process launching the container to be LocalSystem or a 
 member of the local Administrators group. Since the process in question is 
 the NodeManager, the requirement means the entire NM must run as a 
 privileged account, a very large surface area to review and protect.
 This proposal is to move the privileged operations into a dedicated NT 
 service. The NM can run as a low-privilege account and communicate with the 
 privileged NT service when it needs to launch a container. This would reduce 
 the surface area exposed at high privilege. 
 There has to exist a secure, authenticated and authorized channel of 
 communication between the NM and the privileged NT service. Possible 
 alternatives are a new TCP endpoint, Java RPC, etc. My proposal, though, is 
 to use Windows LPC (Local Procedure Calls), a Windows platform-specific 
 inter-process communication channel that satisfies all requirements and is 
 easy to deploy. The privileged NT service would register and listen on an 
 LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop with 
 libwinutils, which would host the LPC client code. The client would connect 
 to the LPC port (NtConnectPort) and send a message requesting a container 
 launch (NtRequestWaitReplyPort). LPC provides authentication, and the 
 privileged NT service can use the authorization API (AuthZ) to validate the 
 caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor

2014-10-21 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178218#comment-14178218
 ] 

Remus Rusanu commented on YARN-2198:


The 2 new hadoop-common Findbugs warnings are unrelated to the patch:

 - Inconsistent synchronization of 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.delegationTokenSequenceNumber;
 locked 71% of time
 - Dereference of the result of readLine() without nullcheck in 
org.apache.hadoop.tracing.SpanReceiverHost.getUniqueLocalTraceFileName()

 Remove the need to run NodeManager as privileged account for Windows Secure 
 Container Executor
 --

 Key: YARN-2198
 URL: https://issues.apache.org/jira/browse/YARN-2198
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Remus Rusanu
Assignee: Remus Rusanu
  Labels: security, windows
 Attachments: .YARN-2198.delta.10.patch, YARN-2198.1.patch, 
 YARN-2198.11.patch, YARN-2198.12.patch, YARN-2198.13.patch, 
 YARN-2198.14.patch, YARN-2198.15.patch, YARN-2198.2.patch, YARN-2198.3.patch, 
 YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, 
 YARN-2198.delta.7.patch, YARN-2198.separation.patch, 
 YARN-2198.trunk.10.patch, YARN-2198.trunk.4.patch, YARN-2198.trunk.5.patch, 
 YARN-2198.trunk.6.patch, YARN-2198.trunk.8.patch, YARN-2198.trunk.9.patch


 YARN-1972 introduces a Secure Windows Container Executor. However, this 
 executor requires the process launching the container to be LocalSystem or a 
 member of the local Administrators group. Since the process in question is 
 the NodeManager, the requirement means the entire NM must run as a 
 privileged account, a very large surface area to review and protect.
 This proposal is to move the privileged operations into a dedicated NT 
 service. The NM can run as a low-privilege account and communicate with the 
 privileged NT service when it needs to launch a container. This would reduce 
 the surface area exposed at high privilege. 
 There has to exist a secure, authenticated and authorized channel of 
 communication between the NM and the privileged NT service. Possible 
 alternatives are a new TCP endpoint, Java RPC, etc. My proposal, though, is 
 to use Windows LPC (Local Procedure Calls), a Windows platform-specific 
 inter-process communication channel that satisfies all requirements and is 
 easy to deploy. The privileged NT service would register and listen on an 
 LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop with 
 libwinutils, which would host the LPC client code. The client would connect 
 to the LPC port (NtConnectPort) and send a message requesting a container 
 launch (NtRequestWaitReplyPort). LPC provides authentication, and the 
 privileged NT service can use the authorization API (AuthZ) to validate the 
 caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2681) Support bandwidth enforcement for containers while reading from HDFS

2014-10-21 Thread cntic (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cntic updated YARN-2681:

Affects Version/s: (was: 2.4.0)
   2.5.1

 Support bandwidth enforcement for containers while reading from HDFS
 

 Key: YARN-2681
 URL: https://issues.apache.org/jira/browse/YARN-2681
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: capacityscheduler, nodemanager, resourcemanager
Affects Versions: 2.5.1
 Environment: Linux
Reporter: cntic
 Attachments: Traffic Control Design.png


 To read/write data from HDFS, applications establish TCP/IP connections with 
 the datanode. The HDFS read can be controlled by configuring the Linux 
 Traffic Control (TC) subsystem on the datanode to put filters on the 
 appropriate connections.
 The current cgroups net_cls concept cannot be applied on the node where the 
 container is launched, nor on the datanode, since:
 -   TC handles outgoing bandwidth only, so it cannot be set on the container 
 node (an HDFS read is incoming data for the container)
 -   Since the HDFS datanode is handled by only one process, it is not 
 possible to use net_cls to separate connections from different containers to 
 the datanode.
 Tasks:
 1) Extend the Resource model to define a bandwidth enforcement rate
 2) Monitor the TCP/IP connections established by the container's process and 
 its child processes
 3) Set Linux Traffic Control rules on the datanode, based on address:port 
 pairs, in order to enforce the bandwidth of outgoing data
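
As a hedged illustration of task 3 only (device name, class id, and rate are made up, and the eventual patch may structure this very differently), the datanode-side rules could be built as plain tc commands keyed on the container connection's address:port:

{code}
import java.util.Arrays;
import java.util.List;

/**
 * Builds the kind of Linux Traffic Control (tc) commands task 3 describes:
 * shape outgoing datanode traffic toward one container connection. Device,
 * class id, and rate are illustrative.
 */
public class TrafficControlRuleSketch {

  static List<String> buildRules(String dev, String containerAddr,
      int containerPort, String rate) {
    String classId = "1:10";
    return Arrays.asList(
        // One-time setup: root HTB qdisc on the datanode's interface.
        String.format("tc qdisc add dev %s root handle 1: htb default 30", dev),
        // A class capped at the enforcement rate for this container.
        String.format("tc class add dev %s parent 1: classid %s htb rate %s",
            dev, classId, rate),
        // Steer packets going to the container's address:port into the class.
        String.format("tc filter add dev %s parent 1: protocol ip prio 1 u32 "
            + "match ip dst %s match ip dport %d 0xffff flowid %s",
            dev, containerAddr, containerPort, classId));
  }

  public static void main(String[] args) {
    buildRules("eth0", "10.0.0.12", 41234, "10mbit")
        .forEach(System.out::println);
  }
}
{code}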



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2681) Support bandwidth enforcement for containers while reading from HDFS

2014-10-21 Thread cntic (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cntic updated YARN-2681:

Attachment: HADOOP-2681.patch

 Support bandwidth enforcement for containers while reading from HDFS
 

 Key: YARN-2681
 URL: https://issues.apache.org/jira/browse/YARN-2681
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: capacityscheduler, nodemanager, resourcemanager
Affects Versions: 2.5.1
 Environment: Linux
Reporter: cntic
 Attachments: HADOOP-2681.patch, Traffic Control Design.png


 To read/write data from HDFS, applications establish TCP/IP connections with 
 the datanode. The HDFS read can be controlled by configuring the Linux 
 Traffic Control (TC) subsystem on the datanode to put filters on the 
 appropriate connections.
 The current cgroups net_cls concept cannot be applied on the node where the 
 container is launched, nor on the datanode, since:
 -   TC handles outgoing bandwidth only, so it cannot be set on the container 
 node (an HDFS read is incoming data for the container)
 -   Since the HDFS datanode is handled by only one process, it is not 
 possible to use net_cls to separate connections from different containers to 
 the datanode.
 Tasks:
 1) Extend the Resource model to define a bandwidth enforcement rate
 2) Monitor the TCP/IP connections established by the container's process and 
 its child processes
 3) Set Linux Traffic Control rules on the datanode, based on address:port 
 pairs, in order to enforce the bandwidth of outgoing data



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2673) Add retry for timeline client put APIs

2014-10-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178309#comment-14178309
 ] 

Hudson commented on YARN-2673:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #719 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/719/])
YARN-2673. Made timeline client put APIs retry if ConnectException happens. 
Contributed by Li Lu. (zjshen: rev 89427419a3c5eaab0f73bae98d675979b9efab5f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/CHANGES.txt


 Add retry for timeline client put APIs
 --

 Key: YARN-2673
 URL: https://issues.apache.org/jira/browse/YARN-2673
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
 Fix For: 2.6.0

 Attachments: YARN-2673-101414-1.patch, YARN-2673-101414-2.patch, 
 YARN-2673-101414.patch, YARN-2673-101714.patch, YARN-2673-102014.patch


 The timeline client currently does not handle the case gracefully when the 
 server is down. Jobs from the distributed shell may fail due to an ATS 
 restart. We may need to add some retry mechanism to the client. 
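
A minimal sketch of the kind of retry loop this implies, assuming retries are limited to ConnectException as in the commit message above; the method shape and parameter names are illustrative, not TimelineClientImpl's actual code:

{code}
import java.net.ConnectException;
import java.util.concurrent.Callable;

public class TimelinePutRetrySketch {

  /**
   * Retry an operation while the server is unreachable. A negative maxRetries
   * means "retry forever"; otherwise give up after maxRetries attempts.
   */
  static <T> T retryOnConnectException(Callable<T> op, int maxRetries,
      long retryIntervalMs) throws Exception {
    for (int attempt = 0; ; attempt++) {
      try {
        return op.call();
      } catch (ConnectException e) {
        if (maxRetries >= 0 && attempt >= maxRetries) {
          throw e;                     // retries exhausted, surface the failure
        }
        Thread.sleep(retryIntervalMs); // wait for the server to come back
      }
    }
  }
}
{code}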



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor

2014-10-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178305#comment-14178305
 ] 

Hudson commented on YARN-2701:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #719 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/719/])
YARN-2701. Potential race condition in startLocalizer when using 
LinuxContainerExecutor. Contributed by Xuan Gong (jianhe: rev 
2839365f230165222f63129979ea82ada79ec56e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
Missing file for YARN-2701 (jianhe: rev 
4fa1fb3193bf39fcb1bd7f8f8391a78f69c3c302)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockContainerLocalizer.java


 Potential race condition in startLocalizer when using LinuxContainerExecutor  
 --

 Key: YARN-2701
 URL: https://issues.apache.org/jira/browse/YARN-2701
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker
 Fix For: 2.6.0

 Attachments: YARN-2701.1.patch, YARN-2701.2.patch, YARN-2701.3.patch, 
 YARN-2701.4.patch, YARN-2701.5.patch, YARN-2701.6.patch


 When LinuxContainerExecutor runs startLocalizer, it relies on the native code 
 in container-executor.c. 
 {code}
  if (stat(npath, sb) != 0) {
    if (mkdir(npath, perm) != 0) {
 {code}
 The appDir under /usercache is created with a check-then-create pattern. If 
 two containers try to do this at the same time, a race condition can occur.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2717) containerLogNotFound log shows multiple time for the same container

2014-10-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178310#comment-14178310
 ] 

Hudson commented on YARN-2717:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #719 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/719/])
YARN-2717. Avoided duplicate logging when container logs are not found. 
Contributed by Xuan Gong. (zjshen: rev 171f2376d23d51b61b9c9b3804ee86dbd4de033a)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogCLIHelpers.java
* hadoop-yarn-project/CHANGES.txt


 containerLogNotFound log shows multiple time for the same container
 ---

 Key: YARN-2717
 URL: https://issues.apache.org/jira/browse/YARN-2717
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation
Reporter: Xuan Gong
Assignee: Xuan Gong
 Fix For: 2.6.0

 Attachments: YARN-2717.1.patch


 containerLogNotFound is called multiple times when the log for the same 
 container does not exist.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM fail over

2014-10-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178303#comment-14178303
 ] 

Hudson commented on YARN-1879:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #719 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/719/])
Missing file for YARN-1879 (jianhe: rev 
4a78a752286effbf1a0d8695325f9d7464a09fb4)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceProtocolOnHA.java


 Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM 
 fail over
 

 Key: YARN-1879
 URL: https://issues.apache.org/jira/browse/YARN-1879
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Tsuyoshi OZAWA
Priority: Critical
 Fix For: 2.6.0

 Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
 YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
 YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
 YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, 
 YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, 
 YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.23.patch, 
 YARN-1879.23.patch, YARN-1879.24.patch, YARN-1879.25.patch, 
 YARN-1879.26.patch, YARN-1879.27.patch, YARN-1879.28.patch, 
 YARN-1879.29.patch, YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, 
 YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2681) Support bandwidth enforcement for containers while reading from HDFS

2014-10-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178322#comment-14178322
 ] 

Hadoop QA commented on YARN-2681:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12676069/HADOOP-2681.patch
  against trunk revision 171f237.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 20 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5483//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5483//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5483//console

This message is automatically generated.

 Support bandwidth enforcement for containers while reading from HDFS
 

 Key: YARN-2681
 URL: https://issues.apache.org/jira/browse/YARN-2681
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: capacityscheduler, nodemanager, resourcemanager
Affects Versions: 2.5.1
 Environment: Linux
Reporter: cntic
 Attachments: HADOOP-2681.patch, Traffic Control Design.png


 To read/write data from HDFS, applications establish TCP/IP connections with 
 the datanode. The HDFS read can be controlled by configuring the Linux 
 Traffic Control (TC) subsystem on the datanode to put filters on the 
 appropriate connections.
 The current cgroups net_cls concept cannot be applied on the node where the 
 container is launched, nor on the datanode, since:
 -   TC handles outgoing bandwidth only, so it cannot be set on the container 
 node (an HDFS read is incoming data for the container)
 -   Since the HDFS datanode is handled by only one process, it is not 
 possible to use net_cls to separate connections from different containers to 
 the datanode.
 Tasks:
 1) Extend the Resource model to define a bandwidth enforcement rate
 2) Monitor the TCP/IP connections established by the container's process and 
 its child processes
 3) Set Linux Traffic Control rules on the datanode, based on address:port 
 pairs, in order to enforce the bandwidth of outgoing data



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM fail over

2014-10-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178422#comment-14178422
 ] 

Hudson commented on YARN-1879:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1908 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1908/])
Missing file for YARN-1879 (jianhe: rev 
4a78a752286effbf1a0d8695325f9d7464a09fb4)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceProtocolOnHA.java


 Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM 
 fail over
 

 Key: YARN-1879
 URL: https://issues.apache.org/jira/browse/YARN-1879
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Tsuyoshi OZAWA
Priority: Critical
 Fix For: 2.6.0

 Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
 YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
 YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
 YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, 
 YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, 
 YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.23.patch, 
 YARN-1879.23.patch, YARN-1879.24.patch, YARN-1879.25.patch, 
 YARN-1879.26.patch, YARN-1879.27.patch, YARN-1879.28.patch, 
 YARN-1879.29.patch, YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, 
 YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor

2014-10-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178424#comment-14178424
 ] 

Hudson commented on YARN-2701:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1908 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1908/])
YARN-2701. Potential race condition in startLocalizer when using 
LinuxContainerExecutor. Contributed by Xuan Gong (jianhe: rev 
2839365f230165222f63129979ea82ada79ec56e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
Missing file for YARN-2701 (jianhe: rev 
4fa1fb3193bf39fcb1bd7f8f8391a78f69c3c302)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockContainerLocalizer.java


 Potential race condition in startLocalizer when using LinuxContainerExecutor  
 --

 Key: YARN-2701
 URL: https://issues.apache.org/jira/browse/YARN-2701
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker
 Fix For: 2.6.0

 Attachments: YARN-2701.1.patch, YARN-2701.2.patch, YARN-2701.3.patch, 
 YARN-2701.4.patch, YARN-2701.5.patch, YARN-2701.6.patch


 When LinuxContainerExecutor runs startLocalizer, it relies on the native code 
 in container-executor.c. 
 {code}
  if (stat(npath, sb) != 0) {
    if (mkdir(npath, perm) != 0) {
 {code}
 The appDir under /usercache is created with a check-then-create pattern. If 
 two containers try to do this at the same time, a race condition can occur.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2717) containerLogNotFound log shows multiple time for the same container

2014-10-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178429#comment-14178429
 ] 

Hudson commented on YARN-2717:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1908 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1908/])
YARN-2717. Avoided duplicate logging when container logs are not found. 
Contributed by Xuan Gong. (zjshen: rev 171f2376d23d51b61b9c9b3804ee86dbd4de033a)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogCLIHelpers.java
* hadoop-yarn-project/CHANGES.txt


 containerLogNotFound log shows multiple time for the same container
 ---

 Key: YARN-2717
 URL: https://issues.apache.org/jira/browse/YARN-2717
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation
Reporter: Xuan Gong
Assignee: Xuan Gong
 Fix For: 2.6.0

 Attachments: YARN-2717.1.patch


 containerLogNotFound is called multiple times when the log for the same 
 container does not exist.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2582) Log related CLI and Web UI changes for Aggregated Logs in LRS

2014-10-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178427#comment-14178427
 ] 

Hudson commented on YARN-2582:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1908 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1908/])
YARN-2582. Fixed Log CLI and Web UI for showing aggregated logs of LRS. 
Contributed Xuan Gong. (zjshen: rev e90718fa5a0e7c18592af61534668acebb9db51b)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogCLIHelpers.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/LogsCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestLogsCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogsBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogAggregationUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/log/AggregatedLogsBlock.java


 Log related CLI and Web UI changes for Aggregated Logs in LRS
 -

 Key: YARN-2582
 URL: https://issues.apache.org/jira/browse/YARN-2582
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Fix For: 2.6.0

 Attachments: YARN-2582.1.patch, YARN-2582.2.patch, YARN-2582.3.patch


 After YARN-2468, we have changed the log layout to support log aggregation for 
 Long Running Services. The log CLI and the related Web UI should be modified 
 accordingly.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2673) Add retry for timeline client put APIs

2014-10-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178428#comment-14178428
 ] 

Hudson commented on YARN-2673:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1908 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1908/])
YARN-2673. Made timeline client put APIs retry if ConnectException happens. 
Contributed by Li Lu. (zjshen: rev 89427419a3c5eaab0f73bae98d675979b9efab5f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java


 Add retry for timeline client put APIs
 --

 Key: YARN-2673
 URL: https://issues.apache.org/jira/browse/YARN-2673
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
 Fix For: 2.6.0

 Attachments: YARN-2673-101414-1.patch, YARN-2673-101414-2.patch, 
 YARN-2673-101414.patch, YARN-2673-101714.patch, YARN-2673-102014.patch


 The timeline client currently does not handle the case gracefully when the 
 server is down. Jobs from the distributed shell may fail due to an ATS 
 restart. We may need to add some retry mechanism to the client. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor

2014-10-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178494#comment-14178494
 ] 

Hudson commented on YARN-2701:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1933 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1933/])
YARN-2701. Potential race condition in startLocalizer when using 
LinuxContainerExecutor. Contributed by Xuan Gong (jianhe: rev 
2839365f230165222f63129979ea82ada79ec56e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java
* hadoop-yarn-project/CHANGES.txt
Missing file for YARN-2701 (jianhe: rev 
4fa1fb3193bf39fcb1bd7f8f8391a78f69c3c302)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockContainerLocalizer.java


 Potential race condition in startLocalizer when using LinuxContainerExecutor  
 --

 Key: YARN-2701
 URL: https://issues.apache.org/jira/browse/YARN-2701
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker
 Fix For: 2.6.0

 Attachments: YARN-2701.1.patch, YARN-2701.2.patch, YARN-2701.3.patch, 
 YARN-2701.4.patch, YARN-2701.5.patch, YARN-2701.6.patch


 When LinuxContainerExecutor runs startLocalizer, it relies on the native code 
 in container-executor.c. 
 {code}
  if (stat(npath, sb) != 0) {
    if (mkdir(npath, perm) != 0) {
 {code}
 The appDir under /usercache is created with a check-then-create pattern. If 
 two containers try to do this at the same time, a race condition can occur.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2582) Log related CLI and Web UI changes for Aggregated Logs in LRS

2014-10-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178497#comment-14178497
 ] 

Hudson commented on YARN-2582:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1933 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1933/])
YARN-2582. Fixed Log CLI and Web UI for showing aggregated logs of LRS. 
Contributed Xuan Gong. (zjshen: rev e90718fa5a0e7c18592af61534668acebb9db51b)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogCLIHelpers.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/LogsCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestLogsCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogsBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/log/AggregatedLogsBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogAggregationUtils.java
* hadoop-yarn-project/CHANGES.txt


 Log related CLI and Web UI changes for Aggregated Logs in LRS
 -

 Key: YARN-2582
 URL: https://issues.apache.org/jira/browse/YARN-2582
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Fix For: 2.6.0

 Attachments: YARN-2582.1.patch, YARN-2582.2.patch, YARN-2582.3.patch


 After YARN-2468, we have changed the log layout to support log aggregation for 
 Long Running Services. The log CLI and the related Web UI should be modified 
 accordingly.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2717) containerLogNotFound log shows multiple time for the same container

2014-10-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178499#comment-14178499
 ] 

Hudson commented on YARN-2717:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1933 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1933/])
YARN-2717. Avoided duplicate logging when container logs are not found. 
Contributed by Xuan Gong. (zjshen: rev 171f2376d23d51b61b9c9b3804ee86dbd4de033a)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogCLIHelpers.java


 containerLogNotFound log shows multiple time for the same container
 ---

 Key: YARN-2717
 URL: https://issues.apache.org/jira/browse/YARN-2717
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation
Reporter: Xuan Gong
Assignee: Xuan Gong
 Fix For: 2.6.0

 Attachments: YARN-2717.1.patch


 containerLogNotFound is called multiple times when the log for the same 
 container does not exist.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM fail over

2014-10-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178492#comment-14178492
 ] 

Hudson commented on YARN-1879:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1933 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1933/])
Missing file for YARN-1879 (jianhe: rev 
4a78a752286effbf1a0d8695325f9d7464a09fb4)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceProtocolOnHA.java


 Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM 
 fail over
 

 Key: YARN-1879
 URL: https://issues.apache.org/jira/browse/YARN-1879
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Tsuyoshi OZAWA
Priority: Critical
 Fix For: 2.6.0

 Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
 YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
 YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
 YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, 
 YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, 
 YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.23.patch, 
 YARN-1879.23.patch, YARN-1879.24.patch, YARN-1879.25.patch, 
 YARN-1879.26.patch, YARN-1879.27.patch, YARN-1879.28.patch, 
 YARN-1879.29.patch, YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, 
 YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2719) Windows: Wildcard classpath variables not expanded against resources contained in archives

2014-10-21 Thread Craig Welch (JIRA)
Craig Welch created YARN-2719:
-

 Summary: Windows: Wildcard classpath variables not expanded 
against resources contained in archives
 Key: YARN-2719
 URL: https://issues.apache.org/jira/browse/YARN-2719
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Craig Welch
Assignee: Craig Welch


On windows there are limitations to the length of command lines and environment 
variables which prevent placing all classpath resources into these elements.  
Instead, a jar containing only a classpath manifest is created to provide the 
classpath.  During this process wildcard references are expanded by inspecting 
the filesystem.  Since archives are extracted to a different location and 
linked into the final location after the classpath jar is created, resources 
referred to via wildcards which exist in localized archives  (.zip, tar.gz) are 
not added to the classpath manifest jar.  Since these entries are removed from 
the final classpath for the container they are not on the container's classpath 
as they should be.
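
For illustration, a minimal sketch of the manifest-jar technique described above, using only the standard java.util.jar API; the file name, entries, and class are made up for this example and are not the actual FileUtil.createJarWithClassPath implementation.

{code}
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class ClasspathJarSketch {
  // Writes a jar whose only purpose is to carry a Class-Path manifest entry,
  // sidestepping the Windows command-line/environment length limits.
  public static void writeClasspathJar(String jarPath, String classPath)
      throws IOException {
    Manifest manifest = new Manifest();
    Attributes attrs = manifest.getMainAttributes();
    attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
    attrs.put(Attributes.Name.CLASS_PATH, classPath);
    // An empty jar with just the manifest is enough; the JVM resolves the
    // Class-Path entries relative to the jar's own location.
    try (JarOutputStream jos =
        new JarOutputStream(new FileOutputStream(jarPath), manifest)) {
      // no entries needed beyond the manifest
    }
  }

  public static void main(String[] args) throws IOException {
    writeClasspathJar("classpath.jar", "lib/a.jar lib/b.jar conf/");
  }
}
{code}

A wildcard such as lib/* has to be expanded before the Class-Path attribute is written, which is exactly why resources that only appear after archives are linked in end up missing from the manifest.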



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2720) Windows: Wildcard classpath variables not expanded against resources contained in archives

2014-10-21 Thread Craig Welch (JIRA)
Craig Welch created YARN-2720:
-

 Summary: Windows: Wildcard classpath variables not expanded 
against resources contained in archives
 Key: YARN-2720
 URL: https://issues.apache.org/jira/browse/YARN-2720
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Craig Welch
Assignee: Craig Welch


On windows there are limitations to the length of command lines and environment 
variables which prevent placing all classpath resources into these elements.  
Instead, a jar containing only a classpath manifest is created to provide the 
classpath.  During this process wildcard references are expanded by inspecting 
the filesystem.  Since archives are extracted to a different location and 
linked into the final location after the classpath jar is created, resources 
referred to via wildcards which exist in localized archives  (.zip, tar.gz) are 
not added to the classpath manifest jar.  Since these entries are removed from 
the final classpath for the container they are not on the container's classpath 
as they should be.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2194) Add Cgroup support for RedHat 7

2014-10-21 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14178586#comment-14178586
 ] 

Wei Yan commented on YARN-2194:
---

Thanks for your comments, [~beckham007].
bq. startSystemdSlice/stopSystemdSlice needs root privilege?
Yes, systemctl start/stop slice needs root privilege.
bq. Let container-executor to run sudo systemctl start ?
You mean adding start/stop slice functions to the container-executor, and letting 
SystemdLCEResourceHandler invoke these functions?
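
For illustration, a minimal sketch of what such a start/stop helper might look like; the class, method, and slice names are hypothetical, this is not the proposed SystemdLCEResourceHandler API, and in practice the call would have to go through a privileged path such as the setuid container-executor because systemctl needs root.

{code}
import java.io.IOException;

// Hypothetical helper, not the actual SystemdLCEResourceHandler API.
public class SystemdSliceSketch {
  // Runs "systemctl <action> <slice>"; assumes the caller already runs with
  // the required root privilege (e.g. via the setuid container-executor).
  static void runSystemctl(String action, String sliceName)
      throws IOException, InterruptedException {
    Process p = new ProcessBuilder("systemctl", action, sliceName)
        .inheritIO()
        .start();
    int rc = p.waitFor();
    if (rc != 0) {
      throw new IOException("systemctl " + action + " " + sliceName
          + " failed with exit code " + rc);
    }
  }

  public static void main(String[] args) throws Exception {
    runSystemctl("start", "hadoop-yarn.slice"); // illustrative slice name
  }
}
{code}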

 Add Cgroup support for RedHat 7
 ---

 Key: YARN-2194
 URL: https://issues.apache.org/jira/browse/YARN-2194
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: YARN-2194-1.patch


 In previous versions of RedHat, we could build custom cgroup hierarchies with 
 the cgconfig command from the libcgroup package. As of RedHat 7, the libcgroup 
 package is deprecated and its use is not recommended, since it can easily 
 conflict with the default cgroup hierarchy. systemd is provided and recommended 
 for cgroup management. We need to add support for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2720) Windows: Wildcard classpath variables not expanded against resources contained in archives

2014-10-21 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-2720:

 Component/s: nodemanager
Target Version/s: 2.6.0

 Windows: Wildcard classpath variables not expanded against resources 
 contained in archives
 --

 Key: YARN-2720
 URL: https://issues.apache.org/jira/browse/YARN-2720
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Craig Welch
Assignee: Craig Welch

 On windows there are limitations to the length of command lines and 
 environment variables which prevent placing all classpath resources into 
 these elements.  Instead, a jar containing only a classpath manifest is 
 created to provide the classpath.  During this process wildcard references 
 are expanded by inspecting the filesystem.  Since archives are extracted to a 
 different location and linked into the final location after the classpath jar 
 is created, resources referred to via wildcards which exist in localized 
 archives  (.zip, tar.gz) are not added to the classpath manifest jar.  Since 
 these entries are removed from the final classpath for the container they are 
 not on the container's classpath as they should be.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2495) Allow admin specify labels in each NM (Distributed configuration)

2014-10-21 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14178608#comment-14178608
 ] 

Naganarasimha G R commented on YARN-2495:
-

Hi [~aw] ,
bq. I don't think you understand the use case at all. In fact, it's clear you 
need to re-read the sample script. It does not get updated with every new JDK. 
It's smart enough to update the label regardless of the JDK that is installed...
I meant that the script needs to be modified for each new label set. For example, 
the admin has currently configured JDK labels; if he later wants to add a label 
related to some native lib version, *as the admin knows all the valid native lib 
versions in the system (or can automate getting that list), he will be able to 
configure the valid labels too while modifying the script*.

bq. which means the only friction to operations is point is going to be 
updating this 'valid label list' on the RM.
Maintenance-wise it might become difficult. For example, once the valid JDK 
labels are loaded, admins will forget about this feature; later on, some other 
admin/person might update the JDK without being aware that a script exists which 
updates the labels based on the JDK or native lib version. He might then miss 
updating the valid labels, so that node might not be useful, or wrong labels 
will be tagged to it because the new labels are not registered.

So I feel Allen's scenario needs to be addressed. As [~Wangda] suggested, I feel 
centralized label validation can be made configurable. Please provide your opinion 
on this.
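
For illustration, making the centralized validation switchable could be gated on a single flag, roughly as sketched below; the property name and method are purely hypothetical, not an existing YARN configuration.

{code}
import java.util.Set;
import org.apache.hadoop.conf.Configuration;

public class LabelValidationSketch {
  // Hypothetical switch; the property name is illustrative only.
  static final String VALIDATE_LABELS_KEY =
      "yarn.node-labels.centralized-validation.enabled";

  static void maybeValidate(Configuration conf, Set<String> reportedLabels,
      Set<String> validLabels) {
    if (!conf.getBoolean(VALIDATE_LABELS_KEY, true)) {
      return; // validation disabled: accept whatever the NM reports
    }
    for (String label : reportedLabels) {
      if (!validLabels.contains(label)) {
        throw new IllegalArgumentException("Unknown node label: " + label);
      }
    }
  }
}
{code}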

 Allow admin specify labels in each NM (Distributed configuration)
 -

 Key: YARN-2495
 URL: https://issues.apache.org/jira/browse/YARN-2495
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Naganarasimha G R

 Target of this JIRA is to allow admin specify labels in each NM, this covers
 - User can set labels in each NM (by setting yarn-site.xml or using script 
 suggested by [~aw])
 - NM will send labels to RM via ResourceTracker API
 - RM will set labels in NodeLabelManager when NM register/update labels



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2719) Windows: Wildcard classpath variables not expanded against resources contained in archives

2014-10-21 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch resolved YARN-2719.
---
Resolution: Duplicate

 Windows: Wildcard classpath variables not expanded against resources 
 contained in archives
 --

 Key: YARN-2719
 URL: https://issues.apache.org/jira/browse/YARN-2719
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Craig Welch
Assignee: Craig Welch

 On windows there are limitations to the length of command lines and 
 environment variables which prevent placing all classpath resources into 
 these elements.  Instead, a jar containing only a classpath manifest is 
 created to provide the classpath.  During this process wildcard references 
 are expanded by inspecting the filesystem.  Since archives are extracted to a 
 different location and linked into the final location after the classpath jar 
 is created, resources referred to via wildcards which exist in localized 
 archives  (.zip, tar.gz) are not added to the classpath manifest jar.  Since 
 these entries are removed from the final classpath for the container they are 
 not on the container's classpath as they should be.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2720) Windows: Wildcard classpath variables not expanded against resources contained in archives

2014-10-21 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-2720:
--
Attachment: YARN-2720.2.patch

Patch which tracks unexpanded wildcard classpath entries and adds them to the 
final classpath which is used when the container is launched

 Windows: Wildcard classpath variables not expanded against resources 
 contained in archives
 --

 Key: YARN-2720
 URL: https://issues.apache.org/jira/browse/YARN-2720
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-2720.2.patch


 On windows there are limitations to the length of command lines and 
 environment variables which prevent placing all classpath resources into 
 these elements.  Instead, a jar containing only a classpath manifest is 
 created to provide the classpath.  During this process wildcard references 
 are expanded by inspecting the filesystem.  Since archives are extracted to a 
 different location and linked into the final location after the classpath jar 
 is created, resources referred to via wildcards which exist in localized 
 archives  (.zip, tar.gz) are not added to the classpath manifest jar.  Since 
 these entries are removed from the final classpath for the container they are 
 not on the container's classpath as they should be.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-90) NodeManager should identify failed disks becoming good again

2014-10-21 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-90:
---
Summary: NodeManager should identify failed disks becoming good again  
(was: NodeManager should identify failed disks becoming good back again)

+1 latest patch lgtm as well.  Committing this.

 NodeManager should identify failed disks becoming good again
 

 Key: YARN-90
 URL: https://issues.apache.org/jira/browse/YARN-90
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Ravi Gummadi
Assignee: Varun Vasudev
 Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, 
 YARN-90.patch, YARN-90.patch, apache-yarn-90.0.patch, apache-yarn-90.1.patch, 
 apache-yarn-90.10.patch, apache-yarn-90.2.patch, apache-yarn-90.3.patch, 
 apache-yarn-90.4.patch, apache-yarn-90.5.patch, apache-yarn-90.6.patch, 
 apache-yarn-90.7.patch, apache-yarn-90.8.patch, apache-yarn-90.9.patch


 MAPREDUCE-3121 makes the NodeManager identify disk failures. But once a disk 
 goes down, it is marked as failed forever. To reuse that disk (after it becomes 
 good), the NodeManager needs a restart. This JIRA is to improve the NodeManager 
 to reuse good disks (which could have been bad some time back).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor

2014-10-21 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14178708#comment-14178708
 ] 

Allen Wittenauer commented on YARN-2701:



So let's summarize the current situation:

* previous code had a potential race condition. this is bad.
* reverting the code broke the portability for OSes that don't yet support the 
relatively new \*at routines.  this is equally bad, as it breaks a significant 
segment of the developer community.

There is a middle ground here that solves both of these problems: introduce the 
\*at routines as a compile-time dependency.  We should be able to detect whether 
the current libc has mkdirat, opendirat, etc. and, if not, compile our own in 
from sources such as Free/Net/OpenBSD's implementations.

Let's revert the revert, then build a new patch that does the above.

 Potential race condition in startLocalizer when using LinuxContainerExecutor  
 --

 Key: YARN-2701
 URL: https://issues.apache.org/jira/browse/YARN-2701
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker
 Fix For: 2.6.0

 Attachments: YARN-2701.1.patch, YARN-2701.2.patch, YARN-2701.3.patch, 
 YARN-2701.4.patch, YARN-2701.5.patch, YARN-2701.6.patch


 When using LinuxContainerExecutor do startLocalizer, we are using native code 
 container-executor.c. 
 {code}
  if (stat(npath, sb) != 0) {
if (mkdir(npath, perm) != 0) {
 {code}
 We are using check and create method to create the appDir under /usercache. 
 But if there are two containers trying to do this at the same time, race 
 condition may happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-90) NodeManager should identify failed disks becoming good again

2014-10-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14178714#comment-14178714
 ] 

Hudson commented on YARN-90:


FAILURE: Integrated in Hadoop-trunk-Commit #6301 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6301/])
YARN-90. NodeManager should identify failed disks becoming good again. 
Contributed by Varun Vasudev (jlowe: rev 
6f2028bd1514d90b831f889fd0ee7f2ba5c15000)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeHealthService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDirectoryCollection.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/loghandler/TestNonAggregatingLogHandler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/loghandler/NonAggregatingLogHandler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLocalDirsHandlerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthCheckerService.java


 NodeManager should identify failed disks becoming good again
 

 Key: YARN-90
 URL: https://issues.apache.org/jira/browse/YARN-90
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Ravi Gummadi
Assignee: Varun Vasudev
 Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, 
 YARN-90.patch, YARN-90.patch, apache-yarn-90.0.patch, apache-yarn-90.1.patch, 
 apache-yarn-90.10.patch, apache-yarn-90.2.patch, apache-yarn-90.3.patch, 
 apache-yarn-90.4.patch, apache-yarn-90.5.patch, apache-yarn-90.6.patch, 
 apache-yarn-90.7.patch, apache-yarn-90.8.patch, apache-yarn-90.9.patch


 MAPREDUCE-3121 makes the NodeManager identify disk failures. But once a disk 
 goes down, it is marked as failed forever. To reuse that disk (after it becomes 
 good), the NodeManager needs a restart. This JIRA is to improve the NodeManager 
 to reuse good disks (which could have been bad some time back).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2694) Ensure only single node labels specified in resource request, and node label expression only specified when resourceName=ANY

2014-10-21 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2694:
-
Attachment: YARN-2694-20141021-1.patch

Attached an updated patch that fixes the test failure in TestContainerAllocation.

 Ensure only single node labels specified in resource request, and node label 
 expression only specified when resourceName=ANY
 

 Key: YARN-2694
 URL: https://issues.apache.org/jira/browse/YARN-2694
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2694-20141020-1.patch, YARN-2694-20141021-1.patch


 Currently, node label expression support in the capacity scheduler is only 
 partially completed. A node label expression specified in a ResourceRequest is 
 only respected when it is specified at the ANY level, and a ResourceRequest with 
 multiple node labels makes the user limit computation tricky.
 We need to temporarily disable them; the changes include:
 - AMRMClient
 - ApplicationMasterService



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2720) Windows: Wildcard classpath variables not expanded against resources contained in archives

2014-10-21 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-2720:
--
Attachment: YARN-2720.3.patch

It's hacky and not always possible to pass back the additional classpath info 
using the environment, so this patch changes the createJarWithClassPath signature 
to return an array.

 Windows: Wildcard classpath variables not expanded against resources 
 contained in archives
 --

 Key: YARN-2720
 URL: https://issues.apache.org/jira/browse/YARN-2720
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-2720.2.patch, YARN-2720.3.patch


 On windows there are limitations to the length of command lines and 
 environment variables which prevent placing all classpath resources into 
 these elements.  Instead, a jar containing only a classpath manifest is 
 created to provide the classpath.  During this process wildcard references 
 are expanded by inspecting the filesystem.  Since archives are extracted to a 
 different location and linked into the final location after the classpath jar 
 is created, resources referred to via wildcards which exist in localized 
 archives  (.zip, tar.gz) are not added to the classpath manifest jar.  Since 
 these entries are removed from the final classpath for the container they are 
 not on the container's classpath as they should be.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2715) Proxy user is problem for RPC interface if yarn.resourcemanager.webapp.proxyuser is not set.

2014-10-21 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14178761#comment-14178761
 ] 

Vinod Kumar Vavilapalli commented on YARN-2715:
---

Quick comments on the patch
 - Similar to the configuration that you changed, we also need to get rid of 
RM_WEBAPP_DELEGATION_TOKEN_AUTH_FILTER (is this a compatible change?)?
 - ResourceManager.startWepApp() used to allow loading common auth-filter, we 
now require our custom filter for various reasons? /cc [~vvasudev]
 - Test-case:
-- There's a ref to timeline-service in the patch
-- No need to start the entire mini-cluster - starting RM is enough?

 Proxy user is problem for RPC interface if 
 yarn.resourcemanager.webapp.proxyuser is not set.
 

 Key: YARN-2715
 URL: https://issues.apache.org/jira/browse/YARN-2715
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Blocker
 Attachments: YARN-2715.1.patch, YARN-2715.2.patch


 After YARN-2656, if people set hadoop.proxyuser for the client--RM RPC 
 interface, it's not going to work, because ProxyUsers#sip is a singleton per 
 daemon. After YARN-2656, RM has both channels that want to set this 
 configuration: RPC and HTTP. RPC interface sets it first by reading 
 hadoop.proxyuser, but it is overwritten by HTTP interface, who sets it to 
 empty because yarn.resourcemanager.webapp.proxyuser doesn't exist.
 The fix could be similar to what we've done for YARN-2676: make the HTTP 
 interface source hadoop.proxyuser first in any case, then 
 yarn.resourcemanager.webapp.proxyuser.
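
As a rough sketch of that ordering (source hadoop.proxyuser first, then let the RM webapp-specific prefix override it); the prefix handling below is illustrative only, not the actual patch.

{code}
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;

public class ProxyUserPrefixSketch {
  // Collects proxy-user settings, letting the RM webapp-specific prefix
  // override the common hadoop.proxyuser settings.
  static Map<String, String> resolveProxyUserConf(Configuration conf) {
    final String common = "hadoop.proxyuser.";
    final String webapp = "yarn.resourcemanager.webapp.proxyuser.";
    Map<String, String> resolved = new HashMap<String, String>();
    for (Map.Entry<String, String> e : conf) {
      if (e.getKey().startsWith(common)) {
        resolved.put(e.getKey(), e.getValue());
      }
    }
    for (Map.Entry<String, String> e : conf) {
      if (e.getKey().startsWith(webapp)) {
        // Rewrite to the common prefix so both RPC and HTTP see one view.
        resolved.put(common + e.getKey().substring(webapp.length()),
            e.getValue());
      }
    }
    return resolved;
  }
}
{code}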



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2495) Allow admin specify labels in each NM (Distributed configuration)

2014-10-21 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14178762#comment-14178762
 ] 

Allen Wittenauer commented on YARN-2495:


bq. while modifying the script he will be able to configure the valid labels 
too.

The script can be updated *independently* of changing the running configuration 
files.  Changing the xml configs will also require a *coordinated* reconfigure 
of the RM.  That isn't realistic, especially for things such as rolling 
upgrades. HA RM, of course, makes the situation even worse. Additionally, I'm 
sure the label validation code will spam the RM logs every time it gets an 
invalid label, which is pretty much a "please fill the log directory" action.

The *only* scenario I can think of where label validation has a practical use 
is if AMs and/or containers are allowed to inject labels.  But that should be a 
different control structure altogether and have zero impact on administrator 
controlled labels.

bq. Seems like maintenance wise it might become difficult for example,

Label validation actually makes your example worse because now the labels 
disappear completely.  Is it a problem with the script or is it a problem with 
the label definition?

bq.  i feel centralized Label validation can be made configurable. Please 
provide opinion on this.

Just disable it completely.  I'm still waiting to hear what practical 
application this bug would have.

 Allow admin specify labels in each NM (Distributed configuration)
 -

 Key: YARN-2495
 URL: https://issues.apache.org/jira/browse/YARN-2495
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Naganarasimha G R

 Target of this JIRA is to allow admin specify labels in each NM, this covers
 - User can set labels in each NM (by setting yarn-site.xml or using script 
 suggested by [~aw])
 - NM will send labels to RM via ResourceTracker API
 - RM will set labels in NodeLabelManager when NM register/update labels



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2720) Windows: Wildcard classpath variables not expanded against resources contained in archives

2014-10-21 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-2720:
--
Attachment: YARN-2720.4.patch

Updated version with unit test modification and white space fixes

 Windows: Wildcard classpath variables not expanded against resources 
 contained in archives
 --

 Key: YARN-2720
 URL: https://issues.apache.org/jira/browse/YARN-2720
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-2720.2.patch, YARN-2720.3.patch, YARN-2720.4.patch


 On windows there are limitations to the length of command lines and 
 environment variables which prevent placing all classpath resources into 
 these elements.  Instead, a jar containing only a classpath manifest is 
 created to provide the classpath.  During this process wildcard references 
 are expanded by inspecting the filesystem.  Since archives are extracted to a 
 different location and linked into the final location after the classpath jar 
 is created, resources referred to via wildcards which exist in localized 
 archives  (.zip, tar.gz) are not added to the classpath manifest jar.  Since 
 these entries are removed from the final classpath for the container they are 
 not on the container's classpath as they should be.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2468) Log handling for LRS

2014-10-21 Thread Kannan Rajah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14178794#comment-14178794
 ] 

Kannan Rajah commented on YARN-2468:


Xuan, I looked through the code changes and have a question about uploading 
logs for unfinished containers. Let's say we have already uploaded syslog for a 
container at time T1. At time T2, the container is still running and when the 
log aggregation is triggered again, will it re-upload the same syslog file? 
That seems to be the case.

 Log handling for LRS
 

 Key: YARN-2468
 URL: https://issues.apache.org/jira/browse/YARN-2468
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation, nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Fix For: 2.6.0

 Attachments: YARN-2468.1.patch, YARN-2468.10.patch, 
 YARN-2468.11.patch, YARN-2468.2.patch, YARN-2468.3.patch, 
 YARN-2468.3.rebase.2.patch, YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, 
 YARN-2468.4.patch, YARN-2468.5.1.patch, YARN-2468.5.1.patch, 
 YARN-2468.5.2.patch, YARN-2468.5.3.patch, YARN-2468.5.4.patch, 
 YARN-2468.5.patch, YARN-2468.6.1.patch, YARN-2468.6.patch, 
 YARN-2468.7.1.patch, YARN-2468.7.patch, YARN-2468.8.patch, 
 YARN-2468.9.1.patch, YARN-2468.9.patch


 Currently, when application is finished, NM will start to do the log 
 aggregation. But for Long running service applications, this is not ideal. 
 The problems we have are:
 1) LRS applications are expected to run for a long time (weeks, months).
 2) Currently, all the container logs (from one NM) will be written into a 
 single file. The files could become larger and larger.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2720) Windows: Wildcard classpath variables not expanded against resources contained in archives

2014-10-21 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-2720:

Hadoop Flags: Reviewed

+1 for the patch, pending Jenkins run.  I've verified that this works in my 
environment with a few test runs.  Thank you for fixing this, Craig.

 Windows: Wildcard classpath variables not expanded against resources 
 contained in archives
 --

 Key: YARN-2720
 URL: https://issues.apache.org/jira/browse/YARN-2720
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-2720.2.patch, YARN-2720.3.patch, YARN-2720.4.patch


 On windows there are limitations to the length of command lines and 
 environment variables which prevent placing all classpath resources into 
 these elements.  Instead, a jar containing only a classpath manifest is 
 created to provide the classpath.  During this process wildcard references 
 are expanded by inspecting the filesystem.  Since archives are extracted to a 
 different location and linked into the final location after the classpath jar 
 is created, resources referred to via wildcards which exist in localized 
 archives  (.zip, tar.gz) are not added to the classpath manifest jar.  Since 
 these entries are removed from the final classpath for the container they are 
 not on the container's classpath as they should be.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2468) Log handling for LRS

2014-10-21 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14178809#comment-14178809
 ] 

Xuan Gong commented on YARN-2468:
-

[~rkannan82] 
bq. Xuan, I looked through the code changes and have a question about uploading 
logs for unfinished containers. Let's say we have already uploaded syslog for a 
container at time T1. At time T2, the container is still running and when the 
log aggregation is triggered again, will it re-upload the same syslog file? 
That seems to be the case.

It will not. Every time after we do the log aggregation, we save the information 
for each aggregated log file as (containerId.toString() + "_" + file.getName() + 
"_" + file.lastModified()). So, in the next run, before we start to upload logs, 
we check whether the log file already exists in the saved aggregated-log-file 
cache (uploadedFileMeta in AppLogAggregatorImpl); if it exists, we skip it. 
Otherwise, we upload it.
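
A simplified sketch of that check, with illustrative names rather than the real AppLogAggregatorImpl fields:

{code}
import java.io.File;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UploadDedupSketch {
  private final Set<String> uploadedFileMeta = new HashSet<String>();

  // Returns only the files that were not already uploaded with the same
  // modification time; everything returned is remembered for the next run.
  List<File> filterAlreadyUploaded(String containerId, List<File> candidates) {
    List<File> toUpload = new ArrayList<File>();
    for (File file : candidates) {
      String key = containerId + "_" + file.getName()
          + "_" + file.lastModified();
      if (uploadedFileMeta.contains(key)) {
        continue; // uploaded in an earlier run and unmodified since
      }
      toUpload.add(file);
      uploadedFileMeta.add(key);
    }
    return toUpload;
  }
}
{code}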

 Log handling for LRS
 

 Key: YARN-2468
 URL: https://issues.apache.org/jira/browse/YARN-2468
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation, nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Fix For: 2.6.0

 Attachments: YARN-2468.1.patch, YARN-2468.10.patch, 
 YARN-2468.11.patch, YARN-2468.2.patch, YARN-2468.3.patch, 
 YARN-2468.3.rebase.2.patch, YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, 
 YARN-2468.4.patch, YARN-2468.5.1.patch, YARN-2468.5.1.patch, 
 YARN-2468.5.2.patch, YARN-2468.5.3.patch, YARN-2468.5.4.patch, 
 YARN-2468.5.patch, YARN-2468.6.1.patch, YARN-2468.6.patch, 
 YARN-2468.7.1.patch, YARN-2468.7.patch, YARN-2468.8.patch, 
 YARN-2468.9.1.patch, YARN-2468.9.patch


 Currently, when application is finished, NM will start to do the log 
 aggregation. But for Long running service applications, this is not ideal. 
 The problems we have are:
 1) LRS applications are expected to run for a long time (weeks, months).
 2) Currently, all the container logs (from one NM) will be written into a 
 single file. The files could become larger and larger.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2709) Add retry for timeline client getDelegationToken method

2014-10-21 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-2709:

Attachment: YARN-2709-102114.patch

Hi [~zjshen], I updated my patch according to your comments. Specifically:

1. TimelineClientConnectionRetry is only used in tests, so I added a 
VisibleForTesting annotation to it. I set the other two to private.

2, 4, 7. fixed

3, 6. I changed the unit test code to use Kerberos, and now the mock is not 
necessary, so I merged getDelegationTokenInternal into run().

5. Fixed, but I thought you meant TestTimelineClient? (I didn't add any new 
imports in TimelineClientImpl. )

 Add retry for timeline client getDelegationToken method
 ---

 Key: YARN-2709
 URL: https://issues.apache.org/jira/browse/YARN-2709
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-2709-102014-1.patch, YARN-2709-102014.patch, 
 YARN-2709-102114.patch


 As mentioned in YARN-2673, we need to add retry mechanism to timeline client 
 for secured clusters. This means if the timeline server is not available, a 
 timeline client needs to retry to get a delegation token. 
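
As a rough illustration of such a retry loop (the parameters, exception handling, and names are placeholders, not the actual TimelineClientConnectionRetry implementation):

{code}
import java.io.IOException;
import java.util.concurrent.Callable;

public class RetrySketch {
  // Retries an operation a fixed number of times with a fixed sleep,
  // rethrowing the last failure once the retries are exhausted.
  static <T> T retry(Callable<T> op, int maxRetries, long retryIntervalMs)
      throws Exception {
    Exception last = null;
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return op.call();
      } catch (IOException e) {
        last = e; // e.g. the timeline server is not reachable yet
        if (attempt < maxRetries) {
          Thread.sleep(retryIntervalMs);
        }
      }
    }
    throw last;
  }
}
{code}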



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2468) Log handling for LRS

2014-10-21 Thread Kannan Rajah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14178864#comment-14178864
 ] 

Kannan Rajah commented on YARN-2468:


Thanks. But what about the case where the file was modified. Let's say 10 more 
lines were added to the syslog file. Doesn't it upload the full file again?

 Log handling for LRS
 

 Key: YARN-2468
 URL: https://issues.apache.org/jira/browse/YARN-2468
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation, nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Fix For: 2.6.0

 Attachments: YARN-2468.1.patch, YARN-2468.10.patch, 
 YARN-2468.11.patch, YARN-2468.2.patch, YARN-2468.3.patch, 
 YARN-2468.3.rebase.2.patch, YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, 
 YARN-2468.4.patch, YARN-2468.5.1.patch, YARN-2468.5.1.patch, 
 YARN-2468.5.2.patch, YARN-2468.5.3.patch, YARN-2468.5.4.patch, 
 YARN-2468.5.patch, YARN-2468.6.1.patch, YARN-2468.6.patch, 
 YARN-2468.7.1.patch, YARN-2468.7.patch, YARN-2468.8.patch, 
 YARN-2468.9.1.patch, YARN-2468.9.patch


 Currently, when application is finished, NM will start to do the log 
 aggregation. But for Long running service applications, this is not ideal. 
 The problems we have are:
 1) LRS applications are expected to run for a long time (weeks, months).
 2) Currently, all the container logs (from one NM) will be written into a 
 single file. The files could become larger and larger.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2468) Log handling for LRS

2014-10-21 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14178869#comment-14178869
 ] 

Xuan Gong commented on YARN-2468:
-

bq. Thanks. But what about the case where the file was modified. Let's say 10 
more lines were added to the syslog file. Doesn't it upload the full file again?

This is the pre-requirement: we rely on the user's logging framework (such as 
log4j) to do the rollover for the logs, and users need to set it up properly. On 
our side, we upload every log in our log dirs.

 Log handling for LRS
 

 Key: YARN-2468
 URL: https://issues.apache.org/jira/browse/YARN-2468
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation, nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Fix For: 2.6.0

 Attachments: YARN-2468.1.patch, YARN-2468.10.patch, 
 YARN-2468.11.patch, YARN-2468.2.patch, YARN-2468.3.patch, 
 YARN-2468.3.rebase.2.patch, YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, 
 YARN-2468.4.patch, YARN-2468.5.1.patch, YARN-2468.5.1.patch, 
 YARN-2468.5.2.patch, YARN-2468.5.3.patch, YARN-2468.5.4.patch, 
 YARN-2468.5.patch, YARN-2468.6.1.patch, YARN-2468.6.patch, 
 YARN-2468.7.1.patch, YARN-2468.7.patch, YARN-2468.8.patch, 
 YARN-2468.9.1.patch, YARN-2468.9.patch


 Currently, when application is finished, NM will start to do the log 
 aggregation. But for Long running service applications, this is not ideal. 
 The problems we have are:
 1) LRS applications are expected to run for a long time (weeks, months).
 2) Currently, all the container logs (from one NM) will be written into a 
 single file. The files could become larger and larger.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2715) Proxy user is problem for RPC interface if yarn.resourcemanager.webapp.proxyuser is not set.

2014-10-21 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14178867#comment-14178867
 ] 

Vinod Kumar Vavilapalli commented on YARN-2715:
---

More comments
 - AdminService should also start recognizing the new proxyuser prefix. It will 
be useful to refactor the proxy-user handling into a common method that is used 
in both RM and AdminService.
 - Add comments in RMAuthenticationFilterInitializer about why we are having 
special handling of proxy-users.

Please ignore my other comments about the filter. We need to fix them 
separately; I'm outlining them below:
 - Similar to the configuration that you changed, we also need to get rid of 
RM_WEBAPP_DELEGATION_TOKEN_AUTH_FILTER (is this a compatible change?)?
 - ResourceManager.startWepApp() used to allow loading common auth-filter, we 
now require our custom filter for various reasons? /cc Varun Vasudev
 - RMAuthenticationFilterInitializer.configPrefix doesn't need to be a class 
variable.
 - RMAuthenticationFilterInitializer and 
TimelineAuthenticationFilterInitializer share a lot of code, they can be 
refactored.

I'll file tickets for the above.

 Proxy user is problem for RPC interface if 
 yarn.resourcemanager.webapp.proxyuser is not set.
 

 Key: YARN-2715
 URL: https://issues.apache.org/jira/browse/YARN-2715
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Blocker
 Attachments: YARN-2715.1.patch, YARN-2715.2.patch


 After YARN-2656, if people set hadoop.proxyuser for the client--RM RPC 
 interface, it's not going to work, because ProxyUsers#sip is a singleton per 
 daemon. After YARN-2656, RM has both channels that want to set this 
 configuration: RPC and HTTP. RPC interface sets it first by reading 
 hadoop.proxyuser, but it is overwritten by HTTP interface, who sets it to 
 empty because yarn.resourcemanager.webapp.proxyuser doesn't exist.
 The fix could be similar to what we've done for YARN-2676: make the HTTP 
 interface source hadoop.proxyuser first in any case, then 
 yarn.resourcemanager.webapp.proxyuser.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2694) Ensure only single node labels specified in resource request, and node label expression only specified when resourceName=ANY

2014-10-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14178885#comment-14178885
 ] 

Hadoop QA commented on YARN-2694:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12676130/YARN-2694-20141021-1.patch
  against trunk revision 6f2028b.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5484//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5484//console

This message is automatically generated.

 Ensure only single node labels specified in resource request, and node label 
 expression only specified when resourceName=ANY
 

 Key: YARN-2694
 URL: https://issues.apache.org/jira/browse/YARN-2694
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2694-20141020-1.patch, YARN-2694-20141021-1.patch


 Currently, node label expression support in the capacity scheduler is only 
 partially completed. A node label expression specified in a ResourceRequest is 
 only respected when it is specified at the ANY level, and a ResourceRequest with 
 multiple node labels makes the user limit computation tricky.
 We need to temporarily disable them; the changes include:
 - AMRMClient
 - ApplicationMasterService



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2720) Windows: Wildcard classpath variables not expanded against resources contained in archives

2014-10-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14178894#comment-14178894
 ] 

Hadoop QA commented on YARN-2720:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12676140/YARN-2720.4.patch
  against trunk revision 6f2028b.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5485//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5485//artifact/patchprocess/newPatchFindbugsWarningshadoop-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5485//console

This message is automatically generated.

 Windows: Wildcard classpath variables not expanded against resources 
 contained in archives
 --

 Key: YARN-2720
 URL: https://issues.apache.org/jira/browse/YARN-2720
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-2720.2.patch, YARN-2720.3.patch, YARN-2720.4.patch


 On windows there are limitations to the length of command lines and 
 environment variables which prevent placing all classpath resources into 
 these elements.  Instead, a jar containing only a classpath manifest is 
 created to provide the classpath.  During this process wildcard references 
 are expanded by inspecting the filesystem.  Since archives are extracted to a 
 different location and linked into the final location after the classpath jar 
 is created, resources referred to via wildcards which exist in localized 
 archives  (.zip, tar.gz) are not added to the classpath manifest jar.  Since 
 these entries are removed from the final classpath for the container they are 
 not on the container's classpath as they should be.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2720) Windows: Wildcard classpath variables not expanded against resources contained in archives

2014-10-21 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14178897#comment-14178897
 ] 

Chris Nauroth commented on YARN-2720:
-

The Findbugs warnings are unrelated.  I'll commit this.

 Windows: Wildcard classpath variables not expanded against resources 
 contained in archives
 --

 Key: YARN-2720
 URL: https://issues.apache.org/jira/browse/YARN-2720
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-2720.2.patch, YARN-2720.3.patch, YARN-2720.4.patch


 On windows there are limitations to the length of command lines and 
 environment variables which prevent placing all classpath resources into 
 these elements.  Instead, a jar containing only a classpath manifest is 
 created to provide the classpath.  During this process wildcard references 
 are expanded by inspecting the filesystem.  Since archives are extracted to a 
 different location and linked into the final location after the classpath jar 
 is created, resources referred to via wildcards which exist in localized 
 archives  (.zip, tar.gz) are not added to the classpath manifest jar.  Since 
 these entries are removed from the final classpath for the container they are 
 not on the container's classpath as they should be.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2709) Add retry for timeline client getDelegationToken method

2014-10-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14178921#comment-14178921
 ] 

Hadoop QA commented on YARN-2709:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12676151/YARN-2709-102114.patch
  against trunk revision 4e134a0.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5486//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5486//console

This message is automatically generated.

 Add retry for timeline client getDelegationToken method
 ---

 Key: YARN-2709
 URL: https://issues.apache.org/jira/browse/YARN-2709
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-2709-102014-1.patch, YARN-2709-102014.patch, 
 YARN-2709-102114.patch


 As mentioned in YARN-2673, we need to add retry mechanism to timeline client 
 for secured clusters. This means if the timeline server is not available, a 
 timeline client needs to retry to get a delegation token. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2720) Windows: Wildcard classpath variables not expanded against resources contained in archives

2014-10-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14178926#comment-14178926
 ] 

Hudson commented on YARN-2720:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #6303 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6303/])
YARN-2720. Windows: Wildcard classpath variables not expanded against resources 
contained in archives. Contributed by Craig Welch. (cnauroth: rev 
6637e3cf95b3a9be8d6b9cd66bc849a0607e8ed5)
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Classpath.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileUtil.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsSecureContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestFileUtil.java


 Windows: Wildcard classpath variables not expanded against resources 
 contained in archives
 --

 Key: YARN-2720
 URL: https://issues.apache.org/jira/browse/YARN-2720
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Craig Welch
Assignee: Craig Welch
 Fix For: 2.6.0

 Attachments: YARN-2720.2.patch, YARN-2720.3.patch, YARN-2720.4.patch


 On windows there are limitations to the length of command lines and 
 environment variables which prevent placing all classpath resources into 
 these elements.  Instead, a jar containing only a classpath manifest is 
 created to provide the classpath.  During this process wildcard references 
 are expanded by inspecting the filesystem.  Since archives are extracted to a 
 different location and linked into the final location after the classpath jar 
 is created, resources referred to via wildcards which exist in localized 
 archives  (.zip, tar.gz) are not added to the classpath manifest jar.  Since 
 these entries are removed from the final classpath for the container they are 
 not on the container's classpath as they should be.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2715) Proxy user is problem for RPC interface if yarn.resourcemanager.webapp.proxyuser is not set.

2014-10-21 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14178992#comment-14178992
 ] 

Zhijie Shen commented on YARN-2715:
---

Vinod, thanks for the comments. I agree with them; let's do the filter 
refactoring in a separate JIRA. Here are the responses to the comments related 
to this one.

bq. Test-case:

Fixed the issue with the test cases and moved them into the RM submodule.

bq. AdminService should also start recognizing the new proxyuser prefix.

Refactored the code that processes the RM proxy-user configs, and made both RM 
and AdminService refer to it. In AdminService, the refresh request now sources 
yarn-site.xml for the RM-specific configs too.

bq. Add comments in RMAuthenticationFilterInitializer about why we are having 
special handling of proxy-users.

Added a comment there.

I uploaded a new patch accordingly.

 Proxy user is problem for RPC interface if 
 yarn.resourcemanager.webapp.proxyuser is not set.
 

 Key: YARN-2715
 URL: https://issues.apache.org/jira/browse/YARN-2715
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Blocker
 Attachments: YARN-2715.1.patch, YARN-2715.2.patch, YARN-2715.3.patch


 After YARN-2656, if people set hadoop.proxyuser for the client--RM RPC 
 interface, it's not going to work, because ProxyUsers#sip is a singleton per 
 daemon. After YARN-2656, RM has both channels that want to set this 
 configuration: RPC and HTTP. RPC interface sets it first by reading 
 hadoop.proxyuser, but it is overwritten by HTTP interface, who sets it to 
 empty because yarn.resourcemanager.webapp.proxyuser doesn't exist.
 The fix could be similar to what we've done for YARN-2676: make the HTTP 
 interface source hadoop.proxyuser first in any case, then 
 yarn.resourcemanager.webapp.proxyuser.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2715) Proxy user is problem for RPC interface if yarn.resourcemanager.webapp.proxyuser is not set.

2014-10-21 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2715:
--
Attachment: YARN-2715.3.patch

 Proxy user is problem for RPC interface if 
 yarn.resourcemanager.webapp.proxyuser is not set.
 

 Key: YARN-2715
 URL: https://issues.apache.org/jira/browse/YARN-2715
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Blocker
 Attachments: YARN-2715.1.patch, YARN-2715.2.patch, YARN-2715.3.patch


 After YARN-2656, if people set hadoop.proxyuser for the client--RM RPC 
 interface, it's not going to work, because ProxyUsers#sip is a singleton per 
 daemon. After YARN-2656, RM has both channels that want to set this 
 configuration: RPC and HTTP. RPC interface sets it first by reading 
 hadoop.proxyuser, but it is overwritten by HTTP interface, who sets it to 
 empty because yarn.resourcemanager.webapp.proxyuser doesn't exist.
 The fix could be similar to what we've done for YARN-2676: make the HTTP 
 interface source hadoop.proxyuser first in any case, then 
 yarn.resourcemanager.webapp.proxyuser.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2468) Log handling for LRS

2014-10-21 Thread Kannan Rajah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179001#comment-14179001
 ] 

Kannan Rajah commented on YARN-2468:


Makes sense. Sorry, but I have just one last question, not completely relevant 
to this JIRA though. Is there any ongoing effort to write the logs directly to 
HDFS instead of this two-phase approach? If not, can you point out the reasons? 
The work being done to take care of the lifecycle of these logs seems fairly 
complex and also potentially adds performance overhead to the cluster, so I am 
interested to understand the rationale.

 Log handling for LRS
 

 Key: YARN-2468
 URL: https://issues.apache.org/jira/browse/YARN-2468
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation, nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Fix For: 2.6.0

 Attachments: YARN-2468.1.patch, YARN-2468.10.patch, 
 YARN-2468.11.patch, YARN-2468.2.patch, YARN-2468.3.patch, 
 YARN-2468.3.rebase.2.patch, YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, 
 YARN-2468.4.patch, YARN-2468.5.1.patch, YARN-2468.5.1.patch, 
 YARN-2468.5.2.patch, YARN-2468.5.3.patch, YARN-2468.5.4.patch, 
 YARN-2468.5.patch, YARN-2468.6.1.patch, YARN-2468.6.patch, 
 YARN-2468.7.1.patch, YARN-2468.7.patch, YARN-2468.8.patch, 
 YARN-2468.9.1.patch, YARN-2468.9.patch


 Currently, when application is finished, NM will start to do the log 
 aggregation. But for Long running service applications, this is not ideal. 
 The problems we have are:
 1) LRS applications are expected to run for a long time (weeks, months).
 2) Currently, all the container logs (from one NM) will be written into a 
 single file. The files could become larger and larger.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2721) Race condition: ZKRMStateStore retry logic may throw NodeExist exception

2014-10-21 Thread Jian He (JIRA)
Jian He created YARN-2721:
-

 Summary: Race condition: ZKRMStateStore retry logic may throw 
NodeExist exception 
 Key: YARN-2721
 URL: https://issues.apache.org/jira/browse/YARN-2721
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Fix For: 2.6.0


Blindly retrying operations in zookeeper will not work for non-idempotent 
operations (like create znode). The reason is that the client can do a create 
znode, but the response may not be returned because the server can die or 
timeout. In case of retrying the create znode, it will throw a NODE_EXISTS 
exception from the earlier create from the same session.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2721) Race condition: ZKRMStateStore retry logic may throw NodeExist exception

2014-10-21 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179067#comment-14179067
 ] 

Jian He commented on YARN-2721:
---

Curator should handle the retry properly which is addressed in YARN-2716.
As a temporary fix, we can simply ignore the potential NodeExist exception for 
now. Creating a patch. 
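
For illustration, a minimal sketch of the temporary fix described above: treat
NodeExists on a retried create as success. The wrapper below is hypothetical; the
proper Curator-based retry handling is tracked in YARN-2716.

{code}
// Sketch only: ignore NodeExists when a retried create races with an earlier
// attempt from the same session whose response was lost.
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class CreateIfAbsent {
  static void createIfAbsent(ZooKeeper zk, String path, byte[] data)
      throws KeeperException, InterruptedException {
    try {
      zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    } catch (KeeperException.NodeExistsException e) {
      // An earlier create may have succeeded even though its response was
      // lost; the node being present is the outcome we wanted, so swallow it.
    }
  }
}
{code}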

 Race condition: ZKRMStateStore retry logic may throw NodeExist exception 
 -

 Key: YARN-2721
 URL: https://issues.apache.org/jira/browse/YARN-2721
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Fix For: 2.6.0


 Blindly retrying operations in zookeeper will not work for non-idempotent 
 operations (like create znode). The reason is that the client can do a create 
 znode, but the response may not be returned because the server can die or 
 timeout. In case of retrying the create znode, it will throw a NODE_EXISTS 
 exception from the earlier create from the same session.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2709) Add retry for timeline client getDelegationToken method

2014-10-21 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179107#comment-14179107
 ] 

Zhijie Shen commented on YARN-2709:
---

Almost good to me. Some nits:

1. Can you add a comment to say the following config is to bypass the issue in 
HADOOP-11215.
{code}
conf.set(CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION,
    "kerberos");
{code}

2. For both retry test cases, set newMaxRetries to 5 and newIntervalMs to 500? 
Make sure it's able to retry multiple times?
{code}
int newMaxRetries = 1;
long newIntervalMs = 1500;
{code}

3. token is an unused var
{code}
  Token<TimelineDelegationTokenIdentifier> token =
      client.getDelegationToken(
          UserGroupInformation.getCurrentUser().getShortUserName());
{code}

4. You can directly change connectionRetry to default visibility (no private 
modifier) because the test class is in the same package, and mark it 
@VisibleForTesting.
{code}
  @Private
  @VisibleForTesting
  public TimelineClientConnectionRetry getConnectionRetry() {
return connectionRetry;
  }
{code}

5. Retried is not thread safe, but it should be fine if it is not used for unit 
test. Would you please add a comment?
{code}
// Indicates if retries happened last time
@Private
@VisibleForTesting
public boolean retried = false;
{code}
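
For reference, a generic bounded-retry loop of the kind the review items above
exercise; this is a sketch with made-up names, not the actual
TimelineClientConnectionRetry code.

{code}
// Sketch only: retry an operation on ConnectException up to maxRetries times,
// sleeping intervalMs between attempts.
import java.io.IOException;
import java.net.ConnectException;
import java.util.concurrent.Callable;

public class BoundedRetry {
  static <T> T retryOnConnect(Callable<T> op, int maxRetries, long intervalMs)
      throws Exception {
    int attempts = 0;
    while (true) {
      try {
        return op.call();
      } catch (ConnectException e) {
        attempts++;
        if (attempts > maxRetries) {
          throw new IOException("Giving up after " + attempts + " attempts", e);
        }
        Thread.sleep(intervalMs);  // back off before the next attempt
      }
    }
  }
}
{code}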

 Add retry for timeline client getDelegationToken method
 ---

 Key: YARN-2709
 URL: https://issues.apache.org/jira/browse/YARN-2709
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-2709-102014-1.patch, YARN-2709-102014.patch, 
 YARN-2709-102114.patch


 As mentioned in YARN-2673, we need to add retry mechanism to timeline client 
 for secured clusters. This means if the timeline server is not available, a 
 timeline client needs to retry to get a delegation token. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-2709) Add retry for timeline client getDelegationToken method

2014-10-21 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179107#comment-14179107
 ] 

Zhijie Shen edited comment on YARN-2709 at 10/21/14 9:10 PM:
-

Almost good to me. Some nits:

1. Can you add a comment to say the following config is to bypass the issue in 
HADOOP-11215.
{code}
conf.set(CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION,
    "kerberos");
{code}

2. For both retry test cases, set newMaxRetries to 5 and newIntervalMs to 500? 
Make sure it's able to retry multiple times?
{code}
int newMaxRetries = 1;
long newIntervalMs = 1500;
{code}

3. token is an unused var
{code}
  Token<TimelineDelegationTokenIdentifier> token =
      client.getDelegationToken(
          UserGroupInformation.getCurrentUser().getShortUserName());
{code}

4. You can directly change connectionRetry to default visibility (no private 
modifier) because the test class is in the same package, and mark it 
\@VisibleForTesting.
{code}
  @Private
  @VisibleForTesting
  public TimelineClientConnectionRetry getConnectionRetry() {
return connectionRetry;
  }
{code}

5. Retried is not thread safe, but it should be fine if it is not used for unit 
test. Would you please add a comment?
{code}
// Indicates if retries happened last time
@Private
@VisibleForTesting
public boolean retried = false;
{code}


was (Author: zjshen):
Almost good to me. Some nits:

1. Can you add a comment to say the following config is to bypass the issue in 
HADOOP-11215.
{code}
conf.set(CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION,
    "kerberos");
{code}

2. For both retry test cases, set newMaxRetries to 5 and newIntervalMs to 500? 
Make sure it's able to retry multiple times?
{code}
int newMaxRetries = 1;
long newIntervalMs = 1500;
{coe}

3. token is an unused var
{code}
  Token<TimelineDelegationTokenIdentifier> token =
      client.getDelegationToken(
          UserGroupInformation.getCurrentUser().getShortUserName());
{code}

4. You can directly change connectionRetry to default visibility (no private 
modifier) because the test class is in the same package, and mark it 
@VisibleForTesting.
{code}
  @Private
  @VisibleForTesting
  public TimelineClientConnectionRetry getConnectionRetry() {
return connectionRetry;
  }
{code}

5. Retried is not thread safe, but it should be fine if it is not used for unit 
test. Would you please add a comment?
{code}
// Indicates if retries happened last time
@Private
@VisibleForTesting
public boolean retried = false;
{code}

 Add retry for timeline client getDelegationToken method
 ---

 Key: YARN-2709
 URL: https://issues.apache.org/jira/browse/YARN-2709
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-2709-102014-1.patch, YARN-2709-102014.patch, 
 YARN-2709-102114.patch


 As mentioned in YARN-2673, we need to add retry mechanism to timeline client 
 for secured clusters. This means if the timeline server is not available, a 
 timeline client needs to retry to get a delegation token. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2709) Add retry for timeline client getDelegationToken method

2014-10-21 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-2709:

Attachment: YARN-2709-102114-1.patch

Hi [~zjshen], I've addressed your comments in this patch. If you have time 
please feel free to have a look at it. Thanks! 

 Add retry for timeline client getDelegationToken method
 ---

 Key: YARN-2709
 URL: https://issues.apache.org/jira/browse/YARN-2709
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-2709-102014-1.patch, YARN-2709-102014.patch, 
 YARN-2709-102114-1.patch, YARN-2709-102114.patch


 As mentioned in YARN-2673, we need to add retry mechanism to timeline client 
 for secured clusters. This means if the timeline server is not available, a 
 timeline client needs to retry to get a delegation token. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2715) Proxy user is problem for RPC interface if yarn.resourcemanager.webapp.proxyuser is not set.

2014-10-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179162#comment-14179162
 ] 

Hadoop QA commented on YARN-2715:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12676165/YARN-2715.3.patch
  against trunk revision ac56b06.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5487//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5487//console

This message is automatically generated.

 Proxy user is problem for RPC interface if 
 yarn.resourcemanager.webapp.proxyuser is not set.
 

 Key: YARN-2715
 URL: https://issues.apache.org/jira/browse/YARN-2715
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Blocker
 Attachments: YARN-2715.1.patch, YARN-2715.2.patch, YARN-2715.3.patch


 After YARN-2656, if people set hadoop.proxyuser for the client--RM RPC 
 interface, it's not going to work, because ProxyUsers#sip is a singleton per 
 daemon. After YARN-2656, RM has both channels that want to set this 
 configuration: RPC and HTTP. RPC interface sets it first by reading 
 hadoop.proxyuser, but it is overwritten by HTTP interface, who sets it to 
 empty because yarn.resourcemanager.webapp.proxyuser doesn't exist.
 The fix for it could be similar to what we've done for YARN-2676: make the 
 HTTP interface anyway source hadoop.proxyuser first, then 
 yarn.resourcemanager.webapp.proxyuser.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2505) Support get/add/remove/change labels in RM REST API

2014-10-21 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179165#comment-14179165
 ] 

Craig Welch commented on YARN-2505:
---

[~ksumit] I do like the idea of being able to add node label(s) to multiple 
nodes; it seems natural and like it would be useful. I would think that should 
be a POST with a (list of) label(s) and a list of node ids. I'm not sure if it 
will go in the first iteration, but it makes sense to me to have it. I don't 
think there's a compelling purpose at the moment for a node label type; it's 
a string/textual label, and I think it is sensible to just model it as such.

 Support get/add/remove/change labels in RM REST API
 ---

 Key: YARN-2505
 URL: https://issues.apache.org/jira/browse/YARN-2505
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Craig Welch
 Attachments: YARN-2505.1.patch, YARN-2505.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2709) Add retry for timeline client getDelegationToken method

2014-10-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179213#comment-14179213
 ] 

Hadoop QA commented on YARN-2709:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12676183/YARN-2709-102114-1.patch
  against trunk revision 4baca31.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5489//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5489//console

This message is automatically generated.

 Add retry for timeline client getDelegationToken method
 ---

 Key: YARN-2709
 URL: https://issues.apache.org/jira/browse/YARN-2709
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-2709-102014-1.patch, YARN-2709-102014.patch, 
 YARN-2709-102114-1.patch, YARN-2709-102114.patch


 As mentioned in YARN-2673, we need to add retry mechanism to timeline client 
 for secured clusters. This means if the timeline server is not available, a 
 timeline client needs to retry to get a delegation token. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2721) Race condition: ZKRMStateStore retry logic may throw NodeExist exception

2014-10-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179218#comment-14179218
 ] 

Hadoop QA commented on YARN-2721:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12676172/YARN-2721.1.patch
  against trunk revision b85919f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5488//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5488//console

This message is automatically generated.

 Race condition: ZKRMStateStore retry logic may throw NodeExist exception 
 -

 Key: YARN-2721
 URL: https://issues.apache.org/jira/browse/YARN-2721
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Fix For: 2.6.0

 Attachments: YARN-2721.1.patch


 Blindly retrying operations in zookeeper will not work for non-idempotent 
 operations (like create znode). The reason is that the client can do a create 
 znode, but the response may not be returned because the server can die or 
 timeout. In case of retrying the create znode, it will throw a NODE_EXISTS 
 exception from the earlier create from the same session.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2709) Add retry for timeline client getDelegationToken method

2014-10-21 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-2709:

Attachment: YARN-2709-102114-2.patch

 Add retry for timeline client getDelegationToken method
 ---

 Key: YARN-2709
 URL: https://issues.apache.org/jira/browse/YARN-2709
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-2709-102014-1.patch, YARN-2709-102014.patch, 
 YARN-2709-102114-2.patch, YARN-2709-102114.patch


 As mentioned in YARN-2673, we need to add retry mechanism to timeline client 
 for secured clusters. This means if the timeline server is not available, a 
 timeline client needs to retry to get a delegation token. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2709) Add retry for timeline client getDelegationToken method

2014-10-21 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-2709:

Attachment: (was: YARN-2709-102114-1.patch)

 Add retry for timeline client getDelegationToken method
 ---

 Key: YARN-2709
 URL: https://issues.apache.org/jira/browse/YARN-2709
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-2709-102014-1.patch, YARN-2709-102014.patch, 
 YARN-2709-102114-2.patch, YARN-2709-102114.patch


 As mentioned in YARN-2673, we need to add retry mechanism to timeline client 
 for secured clusters. This means if the timeline server is not available, a 
 timeline client needs to retry to get a delegation token. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2448) RM should expose the resource types considered during scheduling when AMs register

2014-10-21 Thread Maxim Ivanov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179235#comment-14179235
 ] 

Maxim Ivanov commented on YARN-2448:


[~vvasudev], just curious what this patch is aiming to achieve? Surely the 
requirements of a specific application shouldn't change depending on what the 
scheduler takes into consideration when doing its job of allocating resources. 
As [~kkambatl] suggested, the AM can submit everything it knows about the 
resources it needs, and the scheduler can then safely ignore the requests it 
doesn't know about.

 RM should expose the resource types considered during scheduling when AMs 
 register
 --

 Key: YARN-2448
 URL: https://issues.apache.org/jira/browse/YARN-2448
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.6.0

 Attachments: apache-yarn-2448.0.patch, apache-yarn-2448.1.patch, 
 apache-yarn-2448.2.patch


 The RM should expose the name of the ResourceCalculator being used when AMs 
 register, as part of the RegisterApplicationMasterResponse.
 This will allow applications to make better decisions when scheduling. 
 MapReduce for example, only looks at memory when deciding it's scheduling, 
 even though the RM could potentially be using the DominantResourceCalculator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2505) Support get/add/remove/change labels in RM REST API

2014-10-21 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-2505:
--
Attachment: YARN-2505.3.patch

Attached is a preview patch - it is incomplete, lacking tests, and not all 
planned interfaces are present yet, but a basic set is. A couple of minor 
changes wrt the plan a couple of posts above - to keep consistency with the 
rest of the service interface I'm sticking with plural names everywhere 
(/nodes/, /node-labels/) and deferring (perhaps, permanently...) a couple of 
items which seem duplicative/purely for completeness but not really needed. A 
request sketch follows the list below.


DONE

POST .../cluster/node-labels
(serialized data) adds multiple labels in an operation

GET .../cluster/node-labels
returns multiple labels as serialized data (all labels)

POST .../cluster/nodes/id/labels
(serialized data) adds multiple labels to a node in an operation

GET .../cluster/nodes/id/labels
returns serialized set of all labels for node

TODO NOW

DELETE .../cluster/node-labels/a
deletes an existing node label, a

DELETE .../cluster/nodes/id/labels/a
deletes label a from node id

PUT .../cluster/node-labels/a
creates a new node label, a

PUT .../cluster/nodes/id/labels/a
adds label a to node id

JUST DEFERRING FOR THE MOMENT - seems like a good idea, though

POST label to multiple nodes

DEFERRING - DUPLICATIVE

GET .../cluster/node-labels/a
return value indicates presence or absence of a

GET .../cluster/nodes/id/labels/a
indicates existence of label on node by return value
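
As a usage illustration for the list above, a sketch of a client call against the
proposed PUT endpoint; the RM host, node id, and label are assumptions, and the
endpoint path is still subject to the discussion in this JIRA.

{code}
// Sketch only: add label "gpu" to node "node1:45454" via the proposed
// PUT .../cluster/nodes/{id}/labels/{label} endpoint. Illustrative values.
import java.net.HttpURLConnection;
import java.net.URL;

public class AddNodeLabelExample {
  public static void main(String[] args) throws Exception {
    URL url = new URL(
        "http://rm-host:8088/ws/v1/cluster/nodes/node1:45454/labels/gpu");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("PUT");
    conn.setRequestProperty("Accept", "application/json");
    System.out.println("HTTP " + conn.getResponseCode());
    conn.disconnect();
  }
}
{code}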

 Support get/add/remove/change labels in RM REST API
 ---

 Key: YARN-2505
 URL: https://issues.apache.org/jira/browse/YARN-2505
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Craig Welch
 Attachments: YARN-2505.1.patch, YARN-2505.3.patch, YARN-2505.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2709) Add retry for timeline client getDelegationToken method

2014-10-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179280#comment-14179280
 ] 

Hadoop QA commented on YARN-2709:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12676202/YARN-2709-102114-2.patch
  against trunk revision 4baca31.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5490//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5490//console

This message is automatically generated.

 Add retry for timeline client getDelegationToken method
 ---

 Key: YARN-2709
 URL: https://issues.apache.org/jira/browse/YARN-2709
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-2709-102014-1.patch, YARN-2709-102014.patch, 
 YARN-2709-102114-2.patch, YARN-2709-102114.patch


 As mentioned in YARN-2673, we need to add retry mechanism to timeline client 
 for secured clusters. This means if the timeline server is not available, a 
 timeline client needs to retry to get a delegation token. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2722) Disable SSLv3 (POODLEbleed vulnerability) in YARN shuffle

2014-10-21 Thread Wei Yan (JIRA)
Wei Yan created YARN-2722:
-

 Summary: Disable SSLv3 (POODLEbleed vulnerability) in YARN shuffle
 Key: YARN-2722
 URL: https://issues.apache.org/jira/browse/YARN-2722
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2709) Add retry for timeline client getDelegationToken method

2014-10-21 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179281#comment-14179281
 ] 

Zhijie Shen commented on YARN-2709:
---

+1 for the last patch. Will commit it.

 Add retry for timeline client getDelegationToken method
 ---

 Key: YARN-2709
 URL: https://issues.apache.org/jira/browse/YARN-2709
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-2709-102014-1.patch, YARN-2709-102014.patch, 
 YARN-2709-102114-2.patch, YARN-2709-102114.patch


 As mentioned in YARN-2673, we need to add retry mechanism to timeline client 
 for secured clusters. This means if the timeline server is not available, a 
 timeline client needs to retry to get a delegation token. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2709) Add retry for timeline client getDelegationToken method

2014-10-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179288#comment-14179288
 ] 

Hudson commented on YARN-2709:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6307 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6307/])
YARN-2709. Made timeline client getDelegationToken API retry if 
ConnectException happens. Contributed by Li Lu. (zjshen: rev 
b2942762d7f76d510ece5621c71116346a6b12f6)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java
* hadoop-yarn-project/CHANGES.txt


 Add retry for timeline client getDelegationToken method
 ---

 Key: YARN-2709
 URL: https://issues.apache.org/jira/browse/YARN-2709
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
 Fix For: 2.6.0

 Attachments: YARN-2709-102014-1.patch, YARN-2709-102014.patch, 
 YARN-2709-102114-2.patch, YARN-2709-102114.patch


 As mentioned in YARN-2673, we need to add retry mechanism to timeline client 
 for secured clusters. This means if the timeline server is not available, a 
 timeline client needs to retry to get a delegation token. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2647) Add yarn queue CLI to get queue info including labels of such queue

2014-10-21 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179377#comment-14179377
 ] 

Mayank Bansal commented on YARN-2647:
-

Hi [~sunilg],

Are you still working on this? Can I take it over if you are not looking at it?

Thanks,
Mayank

 Add yarn queue CLI to get queue info including labels of such queue
 ---

 Key: YARN-2647
 URL: https://issues.apache.org/jira/browse/YARN-2647
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Reporter: Wangda Tan
Assignee: Sunil G





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI

2014-10-21 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal reassigned YARN-2698:
---

Assignee: Mayank Bansal  (was: Wangda Tan)

 Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of 
 RMAdminCLI
 ---

 Key: YARN-2698
 URL: https://issues.apache.org/jira/browse/YARN-2698
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Mayank Bansal

 YARN RMAdminCLI and AdminService should have write API only, for other read 
 APIs, they should be located at YARNCLI and RMClientService.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI

2014-10-21 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179379#comment-14179379
 ] 

Mayank Bansal commented on YARN-2698:
-

taking it over

Thanks,
Mayank

 Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of 
 RMAdminCLI
 ---

 Key: YARN-2698
 URL: https://issues.apache.org/jira/browse/YARN-2698
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Mayank Bansal

 YARN RMAdminCLI and AdminService should have write API only, for other read 
 APIs, they should be located at YARNCLI and RMClientService.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2723) rmadmin -replaceLabelsOnNode does not correctly parse port

2014-10-21 Thread Phil D'Amore (JIRA)
Phil D'Amore created YARN-2723:
--

 Summary: rmadmin -replaceLabelsOnNode does not correctly parse port
 Key: YARN-2723
 URL: https://issues.apache.org/jira/browse/YARN-2723
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Reporter: Phil D'Amore


There is an off-by-one issue in RMAdminCLI.java (line 457):

port = Integer.valueOf(nodeIdStr.substring(nodeIdStr.indexOf(":")));

should probably be:

port = Integer.valueOf(nodeIdStr.substring(nodeIdStr.indexOf(":")+1));

Currently attempting to add a label to a node with a port specified looks like 
this:

[yarn@ip-10-0-0-66 ~]$ yarn rmadmin -replaceLabelsOnNode 
node.example.com:45454,test-label
replaceLabelsOnNode: For input string: :45454
Usage: yarn rmadmin [-replaceLabelsOnNode [node1:port,label1,label2 
node2:port,label1,label2]]

It appears to be trying to parse the ':' as part of the integer because the 
substring index is off.
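
For illustration, a minimal standalone reproduction of the off-by-one: indexOf(":")
points at the colon itself, so the substring must start one character later.

{code}
// Sketch only: parse the port out of a "host:port" node id string.
public class PortParse {
  public static void main(String[] args) {
    String nodeIdStr = "node.example.com:45454";
    int idx = nodeIdStr.indexOf(":");
    // substring(idx) yields ":45454" and Integer.valueOf throws
    // NumberFormatException; skipping the colon parses correctly.
    int port = Integer.valueOf(nodeIdStr.substring(idx + 1));
    System.out.println(port);  // prints 45454
  }
}
{code}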



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2715) Proxy user is problem for RPC interface if yarn.resourcemanager.webapp.proxyuser is not set.

2014-10-21 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179398#comment-14179398
 ] 

Vinod Kumar Vavilapalli commented on YARN-2715:
---

Looks better now. A couple of comments: 
 - TestRMAdminService: can you change it to also use the YARN property names?
 - Can we move processRMProxyUsersConf to somewhere else? Say RMServerUtils?

 Proxy user is problem for RPC interface if 
 yarn.resourcemanager.webapp.proxyuser is not set.
 

 Key: YARN-2715
 URL: https://issues.apache.org/jira/browse/YARN-2715
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Blocker
 Attachments: YARN-2715.1.patch, YARN-2715.2.patch, YARN-2715.3.patch


 After YARN-2656, if people set hadoop.proxyuser for the client--RM RPC 
 interface, it's not going to work, because ProxyUsers#sip is a singleton per 
 daemon. After YARN-2656, RM has both channels that want to set this 
 configuration: RPC and HTTP. RPC interface sets it first by reading 
 hadoop.proxyuser, but it is overwritten by HTTP interface, who sets it to 
 empty because yarn.resourcemanager.webapp.proxyuser doesn't exist.
 The fix for it could be similar to what we've done for YARN-2676: make the 
 HTTP interface anyway source hadoop.proxyuser first, then 
 yarn.resourcemanager.webapp.proxyuser.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2505) Support get/add/remove/change labels in RM REST API

2014-10-21 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-2505:
--
Attachment: YARN-2505.4.patch

WIP update - some items were moved to [YARN-2503] and so are removed from here, 
some tests are now done

 Support get/add/remove/change labels in RM REST API
 ---

 Key: YARN-2505
 URL: https://issues.apache.org/jira/browse/YARN-2505
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Craig Welch
 Attachments: YARN-2505.1.patch, YARN-2505.3.patch, YARN-2505.4.patch, 
 YARN-2505.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2724) If an unreadable file is encountered during log aggregation then aggregated file in HDFS badly formed

2014-10-21 Thread Sumit Mohanty (JIRA)
Sumit Mohanty created YARN-2724:
---

 Summary: If an unreadable file is encountered during log 
aggregation then aggregated file in HDFS badly formed
 Key: YARN-2724
 URL: https://issues.apache.org/jira/browse/YARN-2724
 Project: Hadoop YARN
  Issue Type: Bug
  Components: log-aggregation
Affects Versions: 2.5.1
Reporter: Sumit Mohanty


Look into the log output snippet. It looks like there is an issue during 
aggregation when an unreadable file is encountered. Likely, this results in bad 
encoding.

{noformat}
LogType: command-13.json
LogLength: 13934
Log Contents:
Error aggregating log file. Log file : 
/grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json/grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json
 (Permission denied)command-3.json13983Error aggregating log file. Log file : 
/grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-3.json/grid/0/yarn/log/application_1413865041660_0002/contaierrors-13.txt0660_0002_01_04/command-3.json
 (Permission denied)
  
errors-3.txt0gc.log-20141021044514484052014-10-21T04:45:12.046+: 5.134: 
[GC2014-10-21T04:45:12.046+: 5.134: [ParNew: 163840K-15575K(184320K), 
0.0488700 secs] 163840K-15575K(1028096K), 0.0492510 secs] [Times: user=0.06 
sys=0.01, real=0.05 secs]
2014-10-21T04:45:14.939+: 8.027: [GC2014-10-21T04:45:14.939+: 8.027: 
[ParNew: 179415K-11865K(184320K), 0.0941310 secs] 179415K-17228K(1028096K), 
0.0943140 secs] [Times: user=0.13 sys=0.04, real=0.09 secs]
2014-10-21T04:46:42.099+: 95.187: [GC2014-10-21T04:46:42.099+: 95.187: 
[ParNew: 175705K-12802K(184320K), 0.0466420 secs] 181068K-18164K(1028096K), 
0.0468490 secs] [Times: user=0.06 sys=0.00, real=0.04 secs]
{noformat}

Specifically, look at the text after the exception text. There should be two 
more entries for log files but none exist. This is likely due to the fact that 
command-13.json is expected to be of length 13934 but it is not, as the file 
was never read.

I think, it should have been

{noformat}
LogType: command-13.json
LogLength: Length of the exception text
Log Contents:
Error aggregating log file. Log file : 
/grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json/grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json
 (Permission denied)command-3.json13983Error aggregating log file. Log file : 
/grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-3.json/grid/0/yarn/log/application_1413865041660_0002/contaierrors-13.txt0660_0002_01_04/command-3.json
 (Permission denied)
{noformat}

{noformat}
LogType: errors-3.txt
LogLength:0
Log Contents:
{noformat}

{noformat}
LogType:gc.log
LogLength:???
Log Contents:
..-20141021044514484052014-10-21T04:45:12.046+: 5.134: 
[GC2014-10-21T04:45:12.046+: 5.134: [ParNew: 163840K- ...
{noformat}
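
For illustration, a simplified text-format sketch of the behavior suggested above:
when a log file cannot be read, the error text itself becomes the entry body and
LogLength reflects that text, so the entries that follow stay aligned. This is a
generic writer, not the actual AggregatedLogFormat API.

{code}
// Sketch only: write one aggregated log entry, substituting the error text
// for unreadable files so the declared length always matches what is written.
import java.io.File;
import java.io.IOException;
import java.io.Writer;

public class LogEntrySketch {
  static void writeEntry(Writer out, File logFile, String contents,
      IOException readError) throws IOException {
    String body = (readError == null)
        ? contents
        : "Error aggregating log file. Log file : " + logFile.getAbsolutePath()
            + " (" + readError.getMessage() + ")";
    out.write("LogType: " + logFile.getName() + "\n");
    out.write("LogLength: " + body.length() + "\n");
    out.write("Log Contents:\n");
    out.write(body + "\n");
  }
}
{code}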



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor

2014-10-21 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179438#comment-14179438
 ] 

Jian He commented on YARN-2198:
---

Hi [~rusanu], +1 for the latest patch. Looks like it's conflicting with trunk 
again. Could you update?
I'd like to commit this after that. Sorry for the repeated updating.

 Remove the need to run NodeManager as privileged account for Windows Secure 
 Container Executor
 --

 Key: YARN-2198
 URL: https://issues.apache.org/jira/browse/YARN-2198
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Remus Rusanu
Assignee: Remus Rusanu
  Labels: security, windows
 Attachments: .YARN-2198.delta.10.patch, YARN-2198.1.patch, 
 YARN-2198.11.patch, YARN-2198.12.patch, YARN-2198.13.patch, 
 YARN-2198.14.patch, YARN-2198.15.patch, YARN-2198.2.patch, YARN-2198.3.patch, 
 YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, 
 YARN-2198.delta.7.patch, YARN-2198.separation.patch, 
 YARN-2198.trunk.10.patch, YARN-2198.trunk.4.patch, YARN-2198.trunk.5.patch, 
 YARN-2198.trunk.6.patch, YARN-2198.trunk.8.patch, YARN-2198.trunk.9.patch


 YARN-1972 introduces a Secure Windows Container Executor. However this 
 executor requires the process launching the container to be LocalSystem or a 
 member of the a local Administrators group. Since the process in question is 
 the NodeManager, the requirement translates to the entire NM to run as a 
 privileged account, a very large surface area to review and protect.
 This proposal is to move the privileged operations into a dedicated NT 
 service. The NM can run as a low privilege account and communicate with the 
 privileged NT service when it needs to launch a container. This would reduce 
 the surface exposed to the high privileges. 
 There has to exist a secure, authenticated and authorized channel of 
 communication between the NM and the privileged NT service. Possible 
 alternatives are a new TCP endpoint, Java RPC etc. My proposal though would 
 be to use Windows LPC (Local Procedure Calls), which is a Windows platform 
 specific inter-process communication channel that satisfies all requirements 
 and is easy to deploy. The privileged NT service would register and listen on 
 an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop 
 with libwinutils which would host the LPC client code. The client would 
 connect to the LPC port (NtConnectPort) and send a message requesting a 
 container launch (NtRequestWaitReplyPort). LPC provides authentication and 
 the privileged NT service can use authorization API (AuthZ) to validate the 
 caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2715) Proxy user is problem for RPC interface if yarn.resourcemanager.webapp.proxyuser is not set.

2014-10-21 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2715:
--
Attachment: YARN-2715.4.patch

Thanks for the comments. I uploaded a new patch which addresses the two comments.

 Proxy user is problem for RPC interface if 
 yarn.resourcemanager.webapp.proxyuser is not set.
 

 Key: YARN-2715
 URL: https://issues.apache.org/jira/browse/YARN-2715
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Blocker
 Attachments: YARN-2715.1.patch, YARN-2715.2.patch, YARN-2715.3.patch, 
 YARN-2715.4.patch


 After YARN-2656, if people set hadoop.proxyuser for the client--RM RPC 
 interface, it's not going to work, because ProxyUsers#sip is a singleton per 
 daemon. After YARN-2656, RM has both channels that want to set this 
 configuration: RPC and HTTP. RPC interface sets it first by reading 
 hadoop.proxyuser, but it is overwritten by HTTP interface, who sets it to 
 empty because yarn.resourcemanager.webapp.proxyuser doesn't exist.
 The fix for it could be similar to what we've done for YARN-2676: make the 
 HTTP interface anyway source hadoop.proxyuser first, then 
 yarn.resourcemanager.webapp.proxyuser.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2495) Allow admin specify labels in each NM (Distributed configuration)

2014-10-21 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-2495:

Attachment: YARN-2495_20141022.1.patch

Hi all,
Uploading a WIP patch, just to share the approach...
Completed:
* User can set labels in each NM (by setting yarn-site.xml or using the script 
suggested by Allen Wittenauer)
* NM will send labels to RM via the ResourceTracker API
* RM will set labels in NodeLabelManager when the NM registers/updates labels

Pending:
* No test cases written yet, and the test cases modified to resolve compilation 
issues can probably be handled in a better way.
* As per the design doc there was a requirement to specifically support either 
distributed or centralized configuration, but I was not sure how to get that 
done, as the current design does not seem to be specific to central or 
distributed and a class was configured to identify the NodeLabelsManager.
* Currently I have not completely ensured that node labels are sent only when 
the labels obtained from the script differ from the labels last successfully 
updated to the RM. The responses of node heartbeat and node register can be 
used for this. Yet to finish.
* Configuration has been added to validate for centralized labels, but the 
approach for this needs further discussion.

 Allow admin specify labels in each NM (Distributed configuration)
 -

 Key: YARN-2495
 URL: https://issues.apache.org/jira/browse/YARN-2495
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Naganarasimha G R
 Attachments: YARN-2495_20141022.1.patch


 Target of this JIRA is to allow admin specify labels in each NM, this covers
 - User can set labels in each NM (by setting yarn-site.xml or using script 
 suggested by [~aw])
 - NM will send labels to RM via ResourceTracker API
 - RM will set labels in NodeLabelManager when NM register/update labels



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1915) ClientToAMTokenMasterKey should be provided to AM at launch time

2014-10-21 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179479#comment-14179479
 ] 

Jian He commented on YARN-1915:
---

bq. However there's some confusion as to how the client token master key should 
be sent to the RM (e.g.: via container credentials, via the current method, 
etc.)
Thanks Jason, I read through the discussion. I prefer setting it via container 
credentials, as that's the common way to pass the credentials/tokens to both AM 
and non-AM containers. I'm also OK with getting the current patch in first.

 ClientToAMTokenMasterKey should be provided to AM at launch time
 

 Key: YARN-1915
 URL: https://issues.apache.org/jira/browse/YARN-1915
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Hitesh Shah
Assignee: Jason Lowe
Priority: Blocker
 Attachments: YARN-1915.patch, YARN-1915v2.patch, YARN-1915v3.patch


 Currently, the AM receives the key as part of registration. This introduces a 
 race where a client can connect to the AM when the AM has not received the 
 key. 
 Current Flow:
 1) AM needs to start the client listening service in order to get host:port 
 and send it to the RM as part of registration
 2) RM gets the port info in register() and transitions the app to RUNNING. 
 Responds back with client secret to AM.
 3) User asks RM for client token. Gets it and pings the AM. AM hasn't 
 received client secret from RM and so RPC itself rejects the request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2715) Proxy user is problem for RPC interface if yarn.resourcemanager.webapp.proxyuser is not set.

2014-10-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179491#comment-14179491
 ] 

Hadoop QA commented on YARN-2715:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12676234/YARN-2715.4.patch
  against trunk revision b294276.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5491//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5491//console

This message is automatically generated.

 Proxy user is problem for RPC interface if 
 yarn.resourcemanager.webapp.proxyuser is not set.
 

 Key: YARN-2715
 URL: https://issues.apache.org/jira/browse/YARN-2715
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Blocker
 Attachments: YARN-2715.1.patch, YARN-2715.2.patch, YARN-2715.3.patch, 
 YARN-2715.4.patch


 After YARN-2656, if people set hadoop.proxyuser for the client--RM RPC 
 interface, it's not going to work, because ProxyUsers#sip is a singleton per 
 daemon. After YARN-2656, RM has both channels that want to set this 
 configuration: RPC and HTTP. RPC interface sets it first by reading 
 hadoop.proxyuser, but it is overwritten by HTTP interface, who sets it to 
 empty because yarn.resourcemanager.webapp.proxyuser doesn't exist.
 The fix for it could be similar to what we've done for YARN-2676: make the 
 HTTP interface anyway source hadoop.proxyuser first, then 
 yarn.resourcemanager.webapp.proxyuser.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2723) rmadmin -replaceLabelsOnNode does not correctly parse port

2014-10-21 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R reassigned YARN-2723:
---

Assignee: Naganarasimha G R

 rmadmin -replaceLabelsOnNode does not correctly parse port
 --

 Key: YARN-2723
 URL: https://issues.apache.org/jira/browse/YARN-2723
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Reporter: Phil D'Amore
Assignee: Naganarasimha G R

 There is an off-by-one issue in RMAdminCLI.java (line 457):
 port = Integer.valueOf(nodeIdStr.substring(nodeIdStr.indexOf(":")));
 should probably be:
 port = Integer.valueOf(nodeIdStr.substring(nodeIdStr.indexOf(":")+1));
 Currently attempting to add a label to a node with a port specified looks 
 like this:
 [yarn@ip-10-0-0-66 ~]$ yarn rmadmin -replaceLabelsOnNode 
 node.example.com:45454,test-label
 replaceLabelsOnNode: For input string: :45454
 Usage: yarn rmadmin [-replaceLabelsOnNode [node1:port,label1,label2 
 node2:port,label1,label2]]
 It appears to be trying to parse the ':' as part of the integer because the 
 substring index is off.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2495) Allow admin specify labels in each NM (Distributed configuration)

2014-10-21 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179507#comment-14179507
 ] 

Vinod Kumar Vavilapalli commented on YARN-2495:
---

This is very useful to get in for 2.6. [~leftnoteasy]/[~Naganarasimha], how 
feasible is it?

 Allow admin specify labels in each NM (Distributed configuration)
 -

 Key: YARN-2495
 URL: https://issues.apache.org/jira/browse/YARN-2495
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Naganarasimha G R
 Attachments: YARN-2495_20141022.1.patch


 Target of this JIRA is to allow admin specify labels in each NM, this covers
 - User can set labels in each NM (by setting yarn-site.xml or using script 
 suggested by [~aw])
 - NM will send labels to RM via ResourceTracker API
 - RM will set labels in NodeLabelManager when NM register/update labels



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2715) Proxy user is problem for RPC interface if yarn.resourcemanager.webapp.proxyuser is not set.

2014-10-21 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179509#comment-14179509
 ] 

Vinod Kumar Vavilapalli commented on YARN-2715:
---

Looks good, +1. Checking this in.

 Proxy user is problem for RPC interface if 
 yarn.resourcemanager.webapp.proxyuser is not set.
 

 Key: YARN-2715
 URL: https://issues.apache.org/jira/browse/YARN-2715
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Blocker
 Attachments: YARN-2715.1.patch, YARN-2715.2.patch, YARN-2715.3.patch, 
 YARN-2715.4.patch


 After YARN-2656, if people set hadoop.proxyuser for the client--RM RPC 
 interface, it's not going to work, because ProxyUsers#sip is a singleton per 
 daemon. After YARN-2656, RM has both channels that want to set this 
 configuration: RPC and HTTP. RPC interface sets it first by reading 
 hadoop.proxyuser, but it is overwritten by HTTP interface, who sets it to 
 empty because yarn.resourcemanager.webapp.proxyuser doesn't exist.
 The fix for it could be similar to what we've done for YARN-2676: make the 
 HTTP interface anyway source hadoop.proxyuser first, then 
 yarn.resourcemanager.webapp.proxyuser.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2715) Proxy user is problem for RPC interface if yarn.resourcemanager.webapp.proxyuser is not set.

2014-10-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179514#comment-14179514
 ] 

Hudson commented on YARN-2715:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6308 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6308/])
YARN-2715. Fixed ResourceManager to respect common configurations for proxy 
users/groups beyond just the YARN level config. Contributed by Zhijie Shen. 
(vinodkv: rev c0e034336c85296be6f549d88d137fb2b2b79a15)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMProxyUsersConf.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/security/http/RMAuthenticationFilterInitializer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServerUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesDelegationTokenAuthentication.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
* hadoop-yarn-project/CHANGES.txt


 Proxy user is problem for RPC interface if 
 yarn.resourcemanager.webapp.proxyuser is not set.
 

 Key: YARN-2715
 URL: https://issues.apache.org/jira/browse/YARN-2715
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Blocker
 Fix For: 2.6.0

 Attachments: YARN-2715.1.patch, YARN-2715.2.patch, YARN-2715.3.patch, 
 YARN-2715.4.patch


 After YARN-2656, if people set hadoop.proxyuser for the client--RM RPC 
 interface, it's not going to work, because ProxyUsers#sip is a singleton per 
 daemon. After YARN-2656, RM has both channels that want to set this 
 configuration: RPC and HTTP. RPC interface sets it first by reading 
 hadoop.proxyuser, but it is overwritten by HTTP interface, who sets it to 
 empty because yarn.resourcemanager.webapp.proxyuser doesn't exist.
 The fix for it could be similar to what we've done for YARN-2676: make the 
 HTTP interface anyway source hadoop.proxyuser first, then 
 yarn.resourcemanager.webapp.proxyuser.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2724) If an unreadable file is encountered during log aggregation then aggregated file in HDFS badly formed

2014-10-21 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong reassigned YARN-2724:
---

Assignee: Xuan Gong

 If an unreadable file is encountered during log aggregation then aggregated 
 file in HDFS badly formed
 -

 Key: YARN-2724
 URL: https://issues.apache.org/jira/browse/YARN-2724
 Project: Hadoop YARN
  Issue Type: Bug
  Components: log-aggregation
Affects Versions: 2.5.1
Reporter: Sumit Mohanty
Assignee: Xuan Gong

 Look into the log output snippet. It looks like there is an issue during 
 aggregation when an unreadable file is encountered. Likely, this results in 
 bad encoding.
 {noformat}
 LogType: command-13.json
 LogLength: 13934
 Log Contents:
 Error aggregating log file. Log file : 
 /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json/grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json
  (Permission denied)command-3.json13983Error aggregating log file. Log file : 
 /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-3.json/grid/0/yarn/log/application_1413865041660_0002/contaierrors-13.txt0660_0002_01_04/command-3.json
  (Permission denied)
   
 errors-3.txt0gc.log-20141021044514484052014-10-21T04:45:12.046+: 5.134: 
 [GC2014-10-21T04:45:12.046+: 5.134: [ParNew: 163840K-15575K(184320K), 
 0.0488700 secs] 163840K-15575K(1028096K), 0.0492510 secs] [Times: user=0.06 
 sys=0.01, real=0.05 secs]
 2014-10-21T04:45:14.939+: 8.027: [GC2014-10-21T04:45:14.939+: 8.027: 
 [ParNew: 179415K-11865K(184320K), 0.0941310 secs] 179415K-17228K(1028096K), 
 0.0943140 secs] [Times: user=0.13 sys=0.04, real=0.09 secs]
 2014-10-21T04:46:42.099+: 95.187: [GC2014-10-21T04:46:42.099+: 
 95.187: [ParNew: 175705K-12802K(184320K), 0.0466420 secs] 
 181068K-18164K(1028096K), 0.0468490 secs] [Times: user=0.06 sys=0.00, 
 real=0.04 secs]
 {noformat}
 Specifically, look at the text after the exception text. There should be two 
 more entries for log files but none exist. This is likely because 
 command-13.json is expected to be of length 13934 but it is not, as the file 
 was never read.
 I think it should have been:
 {noformat}
 LogType: command-13.json
 LogLength: Length of the exception text
 Log Contents:
 Error aggregating log file. Log file : 
 /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json/grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json
  (Permission denied)command-3.json13983Error aggregating log file. Log file : 
 /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-3.json/grid/0/yarn/log/application_1413865041660_0002/contaierrors-13.txt0660_0002_01_04/command-3.json
  (Permission denied)
 {noformat}
 {noformat}
 LogType: errors-3.txt
 LogLength:0
 Log Contents:
 {noformat}
 {noformat}
 LogType:gc.log
 LogLength:???
 Log Contents:
 ..-20141021044514484052014-10-21T04:45:12.046+: 5.134: 
 [GC2014-10-21T04:45:12.046+: 5.134: [ParNew: 163840K- ...
 {noformat}
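 A minimal sketch of the intended behavior (not the real AggregatedLogFormat 
 writer; the class and method names are made up): when a log file cannot be 
 read, the error text itself becomes the entry's contents, so the declared 
 LogLength matches what is actually written:
{code}
// Sketch only: the invariant is that LogLength equals the bytes written for
// the entry, even when the file is unreadable (e.g. Permission denied).
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class LogEntryWriterSketch {
  public static void writeEntry(DataOutputStream out, File logFile)
      throws IOException {
    byte[] contents;
    try (FileInputStream in = new FileInputStream(logFile)) {
      contents = new byte[(int) logFile.length()];
      int off = 0;
      while (off < contents.length) {
        int n = in.read(contents, off, contents.length - off);
        if (n < 0) {
          break;
        }
        off += n;
      }
    } catch (IOException e) {
      // Could not read the file: record the failure as the entry's contents.
      contents = ("Error aggregating log file " + logFile.getAbsolutePath()
          + ": " + e.getMessage()).getBytes(StandardCharsets.UTF_8);
    }
    out.writeBytes("LogType:" + logFile.getName() + "\n");
    out.writeBytes("LogLength:" + contents.length + "\n");  // length of actual bytes
    out.writeBytes("Log Contents:\n");
    out.write(contents);
    out.writeBytes("\n");
  }
}
{code}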



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1915) ClientToAMTokenMasterKey should be provided to AM at launch time

2014-10-21 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179528#comment-14179528
 ] 

Vinod Kumar Vavilapalli commented on YARN-1915:
---

bq. Yes, I thought the ugi mangling was gone, but the AMRMToken is indeed 
manually removed.
I had a JIRA for fixing this so that the NMs themselves remove it for non-AM 
containers; I will find it.

bq. I'm assuming there was a valid reason why the secret is passed in the 
registration response, perhaps for future functionality.
The secret used to be in env. We moved it to registration because of security 
issues in Windows.

bq. However there's some confusion as to how the client token master key should 
be sent to the RM (e.g.: via container credentials, via the current method, 
etc.).
We can deprecate returning the key in the response and instead put it inside the 
container credentials. The credentials field is unfortunately named 'tokens' - so 
far it has only ever carried tokens. We could deprecate 'tokens' too and instead 
move to 'credentials', a la CredentialsInfo for the web-services.

The wait in the current patch is worrisome *only* if we have a large number of 
clients pinging in and blocking RPC handlers. That doesn't happen in practice 
though, so I'm okay with getting it in for 2.6.
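
A rough sketch of the container-credentials alternative discussed above; the 
alias name and helper class are made up for illustration, and the mechanism in 
the current code remains the key in the registration response:
{code}
// Hedged sketch: ship the client-to-AM token master key inside the AM
// container's credentials so it is available at launch, before any client
// can connect. The alias "ClientToAMTokenMasterKey" is invented here.
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.UserGroupInformation;

public class ClientToAMKeySketch {
  private static final Text KEY_ALIAS = new Text("ClientToAMTokenMasterKey");

  // RM side: add the key to the credentials shipped with the AM container.
  public static void addKey(Credentials containerCredentials, byte[] masterKey) {
    containerCredentials.addSecretKey(KEY_ALIAS, masterKey);
  }

  // AM side: read the key from the launch credentials instead of waiting for
  // the register() response.
  public static byte[] readKey() throws java.io.IOException {
    Credentials creds = UserGroupInformation.getCurrentUser().getCredentials();
    return creds.getSecretKey(KEY_ALIAS);
  }
}
{code}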

 ClientToAMTokenMasterKey should be provided to AM at launch time
 

 Key: YARN-1915
 URL: https://issues.apache.org/jira/browse/YARN-1915
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Hitesh Shah
Assignee: Jason Lowe
Priority: Blocker
 Attachments: YARN-1915.patch, YARN-1915v2.patch, YARN-1915v3.patch


 Currently, the AM receives the key as part of registration. This introduces a 
 race where a client can connect to the AM when the AM has not received the 
 key. 
 Current Flow:
 1) AM needs to start the client listening service in order to get host:port 
 and send it to the RM as part of registration
 2) RM gets the port info in register() and transitions the app to RUNNING. 
 Responds back with the client secret to the AM.
 3) The user asks the RM for a client token, gets it, and pings the AM. The AM 
 hasn't received the client secret from the RM yet, so the RPC layer itself 
 rejects the request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor

2014-10-21 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2701:

Attachment: YARN-2701.addendum.1.patch

 Potential race condition in startLocalizer when using LinuxContainerExecutor  
 --

 Key: YARN-2701
 URL: https://issues.apache.org/jira/browse/YARN-2701
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker
 Fix For: 2.6.0

 Attachments: YARN-2701.1.patch, YARN-2701.2.patch, YARN-2701.3.patch, 
 YARN-2701.4.patch, YARN-2701.5.patch, YARN-2701.6.patch, 
 YARN-2701.addendum.1.patch


 When using LinuxContainerExecutor do startLocalizer, we are using native code 
 container-executor.c. 
 {code}
  if (stat(npath, sb) != 0) {
if (mkdir(npath, perm) != 0) {
 {code}
 We are using check and create method to create the appDir under /usercache. 
 But if there are two containers trying to do this at the same time, race 
 condition may happen.
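 The actual fix belongs in the native container-executor.c, but the race-free 
 pattern is the same in any language: attempt the create first and treat 
 "already exists" as success, instead of stat() followed by mkdir(). A Java 
 analogue, as a sketch only:
{code}
// Java analogue of the race-free pattern (the real fix is in native code):
// create first, and tolerate a concurrent creation by another container.
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CreateAppDirSketch {
  public static void ensureDir(Path appDir) throws IOException {
    try {
      Files.createDirectory(appDir);        // atomic: only one caller succeeds
    } catch (FileAlreadyExistsException e) {
      // Another container created it concurrently - acceptable, as long as it
      // really is a directory.
      if (!Files.isDirectory(appDir)) {
        throw e;
      }
    }
  }
}
{code}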



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor

2014-10-21 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179560#comment-14179560
 ] 

Xuan Gong commented on YARN-2701:
-

[~aw] Thanks for the summary. Let's not revert the current code. I uploaded an 
addendum patch. In this patch, I revert the current mkdirs code to the code that 
was committed in YARN-2161, and I also made the necessary changes to solve the 
race condition. It would be very helpful if you could review it.

 

 Potential race condition in startLocalizer when using LinuxContainerExecutor  
 --

 Key: YARN-2701
 URL: https://issues.apache.org/jira/browse/YARN-2701
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker
 Fix For: 2.6.0

 Attachments: YARN-2701.1.patch, YARN-2701.2.patch, YARN-2701.3.patch, 
 YARN-2701.4.patch, YARN-2701.5.patch, YARN-2701.6.patch, 
 YARN-2701.addendum.1.patch


 When using LinuxContainerExecutor do startLocalizer, we are using native code 
 container-executor.c. 
 {code}
  if (stat(npath, sb) != 0) {
if (mkdir(npath, perm) != 0) {
 {code}
 We are using check and create method to create the appDir under /usercache. 
 But if there are two containers trying to do this at the same time, race 
 condition may happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2721) Race condition: ZKRMStateStore retry logic may throw NodeExist exception

2014-10-21 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179563#comment-14179563
 ] 

Zhijie Shen commented on YARN-2721:
---

+1, straightforward change. Let's do the complete solution in YARN-2716. Will 
commit the patch.

 Race condition: ZKRMStateStore retry logic may throw NodeExist exception 
 -

 Key: YARN-2721
 URL: https://issues.apache.org/jira/browse/YARN-2721
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Fix For: 2.6.0

 Attachments: YARN-2721.1.patch


 Blindly retrying operations in ZooKeeper will not work for non-idempotent 
 operations (like creating a znode). The client can issue a create, but the 
 response may never come back because the server dies or the call times out. 
 When the create is then retried, it throws a NODE_EXISTS exception because the 
 earlier create from the same session already succeeded.
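 A standalone illustration of the suppression approach using the plain ZooKeeper 
 client API (the actual change is inside ZKRMStateStore; the retry policy details 
 here are made up):
{code}
// Sketch: if a create is being retried and the node already exists, assume the
// earlier, un-acknowledged attempt from the same session succeeded.
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class RetryingCreateSketch {
  public static void createWithRetry(ZooKeeper zk, String path, byte[] data,
      int maxRetries) throws Exception {
    for (int attempt = 0; ; attempt++) {
      try {
        zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        return;
      } catch (KeeperException.NodeExistsException e) {
        // Non-idempotent create: treat "already exists" on retry as success.
        return;
      } catch (KeeperException.ConnectionLossException e) {
        if (attempt >= maxRetries) {
          throw e;
        }
        Thread.sleep(100);                  // simple backoff before retrying
      }
    }
  }
}
{code}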



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2721) Race condition: ZKRMStateStore retry logic may throw NodeExist exception

2014-10-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179573#comment-14179573
 ] 

Hudson commented on YARN-2721:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6309 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6309/])
YARN-2721. Suppress NodeExist exception thrown by ZKRMStateStore when it 
retries creating znode. Contributed by Jian He. (zjshen: rev 
7e3b5e6f5cb4945b4fab27e8a83d04280df50e17)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* hadoop-yarn-project/CHANGES.txt


 Race condition: ZKRMStateStore retry logic may throw NodeExist exception 
 -

 Key: YARN-2721
 URL: https://issues.apache.org/jira/browse/YARN-2721
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Fix For: 2.6.0

 Attachments: YARN-2721.1.patch


 Blindly retrying operations in ZooKeeper will not work for non-idempotent 
 operations (like creating a znode). The client can issue a create, but the 
 response may never come back because the server dies or the call times out. 
 When the create is then retried, it throws a NODE_EXISTS exception because the 
 earlier create from the same session already succeeded.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2710) RM HA tests failed intermittently on trunk

2014-10-21 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2710:
-
Attachment: TestResourceTrackerOnHA-output.2.txt

I could reproduce the same issue with TestResourceTrackerOnHA - it's an 
intermittent failure, and it happens rarely. Attaching the log from my local run.

 RM HA tests failed intermittently on trunk
 --

 Key: YARN-2710
 URL: https://issues.apache.org/jira/browse/YARN-2710
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Reporter: Wangda Tan
 Attachments: TestResourceTrackerOnHA-output.2.txt, 
 org.apache.hadoop.yarn.client.TestResourceTrackerOnHA-output.txt


 Failures like the following can happen in TestApplicationClientProtocolOnHA, 
 TestResourceTrackerOnHA, etc.
 {code}
 org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA
 testGetApplicationAttemptsOnHA(org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA)
   Time elapsed: 9.491 sec   ERROR!
 java.net.ConnectException: Call From asf905.gq1.ygridcore.net/67.195.81.149 
 to asf905.gq1.ygridcore.net:28032 failed on connection exception: 
 java.net.ConnectException: Connection refused; For more details see:  
 http://wiki.apache.org/hadoop/ConnectionRefused
   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
   at 
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
   at 
 org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
   at 
 org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
   at 
 org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
   at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
   at org.apache.hadoop.ipc.Client.call(Client.java:1438)
   at org.apache.hadoop.ipc.Client.call(Client.java:1399)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
   at com.sun.proxy.$Proxy17.getApplicationAttempts(Unknown Source)
   at 
 org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationAttempts(ApplicationClientProtocolPBClientImpl.java:372)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
   at com.sun.proxy.$Proxy18.getApplicationAttempts(Unknown Source)
   at 
 org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationAttempts(YarnClientImpl.java:583)
   at 
 org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA.testGetApplicationAttemptsOnHA(TestApplicationClientProtocolOnHA.java:137)
 {code}
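 Assuming the intermittent "Connection refused" is a startup/failover timing 
 race, one way a test could reduce the flakiness is to wait until the RM address 
 actually accepts connections before issuing the call. A hypothetical helper 
 (plain JDK sockets; the name is made up):
{code}
// Hypothetical test helper: poll until a host:port accepts TCP connections or
// the timeout expires, then let the test proceed with its RPC calls.
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class WaitForRpcPort {
  public static boolean waitForPort(String host, int port, long timeoutMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      try (Socket s = new Socket()) {
        s.connect(new InetSocketAddress(host, port), 1000);
        return true;                        // server is up and accepting
      } catch (IOException e) {
        Thread.sleep(200);                  // not up yet, retry
      }
    }
    return false;
  }
}
{code}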



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2723) rmadmin -replaceLabelsOnNode does not correctly parse port

2014-10-21 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2723:
-
Issue Type: Sub-task  (was: Bug)
Parent: YARN-2492

 rmadmin -replaceLabelsOnNode does not correctly parse port
 --

 Key: YARN-2723
 URL: https://issues.apache.org/jira/browse/YARN-2723
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Reporter: Phil D'Amore
Assignee: Naganarasimha G R

 There is an off-by-one issue in RMAdminCLI.java (line 457):
 port = Integer.valueOf(nodeIdStr.substring(nodeIdStr.indexOf(":")));
 should probably be:
 port = Integer.valueOf(nodeIdStr.substring(nodeIdStr.indexOf(":")+1));
 Currently attempting to add a label to a node with a port specified looks 
 like this:
 [yarn@ip-10-0-0-66 ~]$ yarn rmadmin -replaceLabelsOnNode 
 node.example.com:45454,test-label
 replaceLabelsOnNode: For input string: :45454
 Usage: yarn rmadmin [-replaceLabelsOnNode [node1:port,label1,label2 
 node2:port,label1,label2]]
 It appears to be trying to parse the ':' as part of the integer because the 
 substring index is off.
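 A small illustration of the off-by-one: the port substring must start one 
 character past the ':', otherwise Integer.valueOf() sees ":45454" and fails:
{code}
// Sketch of the parsing described above (class and method names invented).
public class NodeIdParseSketch {
  public static int parsePort(String nodeIdStr) {
    int idx = nodeIdStr.indexOf(":");
    // Buggy form: nodeIdStr.substring(idx) -> ":45454" -> NumberFormatException
    return Integer.valueOf(nodeIdStr.substring(idx + 1));  // fixed: skip the ':'
  }

  public static void main(String[] args) {
    System.out.println(parsePort("node.example.com:45454"));  // prints 45454
  }
}
{code}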



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)