[jira] [Commented] (YARN-2468) Log handling for LRS

2014-10-02 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156127#comment-14156127
 ] 

Zhijie Shen commented on YARN-2468:
---

bq. I would like to check how many log files we can upload this time. If the 
number is 0, we can skip this round. This check should also happen before 
LogKey.write(); otherwise, we would write the key but no value.

I think Vinod meant that pendingUploadFiles is needed, but it doesn't need to 
be a member variable. getPendingLogFilesToUploadForThisContainer can return 
this collection, which can then be passed into LogValue.write as an additional 
parameter.
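For illustration, a rough sketch of that shape (the signatures below are my 
assumption, not the exact code in the patch):
{code}
// Return the pending files instead of tracking them in a member variable.
private Set<File> getPendingLogFilesToUploadForThisContainer() {
  Set<File> pendingUploadFiles = new HashSet<File>();
  // ... collect the log files of this container that still need uploading ...
  return pendingUploadFiles;
}

// Caller: skip this container entirely when nothing is pending, so that
// LogKey.write() is never called without a corresponding value.
Set<File> pendingUploadFiles = getPendingLogFilesToUploadForThisContainer();
if (pendingUploadFiles.isEmpty()) {
  continue;
}
logKey.write(out);
logValue.write(out, pendingUploadFiles);
{code}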

2. IMHO, the following code can be improved. If we use an iterator, we can 
remove the unnecessary elements on the fly.
{code}
for (File file : candidates) {
  Matcher fileMatcher = filterPattern.matcher(file.getName());
  if (fileMatcher.find()) {
    filteredFiles.add(file);
  }
}
if (!exclusion) {
  return filteredFiles;
} else {
  candidates.removeAll(filteredFiles);
  return candidates;
}
{code}
This block could be:
{code}
...
while (candidatesItr.hasNext()) {
  candidate = candidatesItr.next();
  ...
  if ((!match && inclusive) || (match && exclusive)) {
    candidatesItr.remove();
  }
}
{code}
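For completeness, the iterator-based version could look roughly like this (a 
sketch only, assuming candidates is a mutable collection such as an ArrayList):
{code}
Iterator<File> candidatesItr = candidates.iterator();
while (candidatesItr.hasNext()) {
  File candidate = candidatesItr.next();
  boolean match = filterPattern.matcher(candidate.getName()).find();
  // Drop non-matches in the inclusive case and matches in the exclusive case.
  if ((!match && !exclusion) || (match && exclusion)) {
    candidatesItr.remove();
  }
}
return candidates;
{code}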

3. [~jianhe] mentioned to me before that the following condition is not always 
a reliable way to determine the AM container. Any idea? Also, it seems that we 
don't need shouldUploadLogsForRunningContainer; we can re-use shouldUploadLogs 
and set wasContainerSuccessful to true. Personally, if it's not trivial to 
identify the AM container, I'd prefer to write a TODO comment and leave it 
until we implement the log retention API.
{code}
if (containerId.getId() == 1) {
  return true;
}
{code}
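If we go that route, a minimal sketch of the reuse (method names as they appear 
in the patch; the delegation itself is just my suggestion):
{code}
private boolean shouldUploadLogsForRunningContainer(ContainerId containerId) {
  // TODO: containerId.getId() == 1 is not a reliable way to identify the AM
  // container; revisit once the log retention API is implemented.
  return shouldUploadLogs(containerId, true /* wasContainerSuccessful */);
}
{code}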

bq. It seems to be, let's validate this via a test-case.

Is it addressed by
{code}
this.conf.setLong(YarnConfiguration.DEBUG_NM_DELETE_DELAY_SEC, 3600);
{code}
Would it be better to add a comment explaining the rationale behind this config?

5. Can the following code
{code}
Set<ContainerId> finishedContainers = new HashSet<ContainerId>();
for (ContainerId id : pendingContainerInThisCycle) {
  finishedContainers.add(id);
}
{code}
be simplified as
{code}
Set<ContainerId> finishedContainers =
    new HashSet<ContainerId>(pendingContainerInThisCycle);
{code}

 Log handling for LRS
 

 Key: YARN-2468
 URL: https://issues.apache.org/jira/browse/YARN-2468
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation, nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2468.1.patch, YARN-2468.2.patch, YARN-2468.3.patch, 
 YARN-2468.3.rebase.2.patch, YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, 
 YARN-2468.4.patch, YARN-2468.5.1.patch, YARN-2468.5.1.patch, 
 YARN-2468.5.2.patch, YARN-2468.5.3.patch, YARN-2468.5.4.patch, 
 YARN-2468.5.patch, YARN-2468.6.1.patch, YARN-2468.6.patch, 
 YARN-2468.7.1.patch, YARN-2468.7.patch, YARN-2468.8.patch, 
 YARN-2468.9.1.patch, YARN-2468.9.patch


 Currently, when an application finishes, the NM starts log aggregation. 
 But for long-running service (LRS) applications, this is not ideal. 
 The problems we have are:
 1) LRS applications are expected to run for a long time (weeks, months).
 2) Currently, all the container logs (from one NM) are written into a 
 single file, which could grow larger and larger.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2640) TestDirectoryCollection.testCreateDirectories failed

2014-10-02 Thread Jun Gong (JIRA)
Jun Gong created YARN-2640:
--

 Summary: TestDirectoryCollection.testCreateDirectories failed
 Key: YARN-2640
 URL: https://issues.apache.org/jira/browse/YARN-2640
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Jun Gong
Assignee: Jun Gong


When running the test with mvn test -Dtest=TestDirectoryCollection, it failed:
{code}
Running org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection
Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.538 sec <<< 
FAILURE! - in org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection
testCreateDirectories(org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection)
  Time elapsed: 0.969 sec  <<< FAILURE!
java.lang.AssertionError: local dir parent not created with proper permissions 
expected:<rwxr-xr-x> but was:<rwxrwxr-x>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at 
org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection.testCreateDirectories(TestDirectoryCollection.java:104)
{code}

I found it was because testDiskSpaceUtilizationLimit ran before 
testCreateDirectories; directory dirA had already been created in 
testDiskSpaceUtilizationLimit. When testCreateDirectories tried to create dirA 
with the specified permissions, it found dirA already existed and did nothing.
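A minimal sketch of one possible fix is to remove the leftover directory before 
the assertion (testDir below is a placeholder for whatever base directory the 
test uses; whether the patch takes this route or isolates the test directories 
instead is a separate question):
{code}
// Make testCreateDirectories independent of test ordering by removing any
// directory left behind by earlier tests before re-creating it.
File dirA = new File(testDir, "dirA");
if (dirA.exists()) {
  org.apache.hadoop.fs.FileUtil.fullyDelete(dirA);
}
{code}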



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1972) Implement secure Windows Container Executor

2014-10-02 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156192#comment-14156192
 ] 

Junping Du commented on YARN-1972:
--

Hi [~vinodkv], I think we should commit this patch to branch-2.6 given this 
JIRA is marked as fixed in 2.6. 

 Implement secure Windows Container Executor
 ---

 Key: YARN-1972
 URL: https://issues.apache.org/jira/browse/YARN-1972
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Remus Rusanu
Assignee: Remus Rusanu
  Labels: security, windows
 Fix For: 2.6.0

 Attachments: YARN-1972.1.patch, YARN-1972.2.patch, YARN-1972.3.patch, 
 YARN-1972.delta.4.patch, YARN-1972.delta.5-branch-2.patch, 
 YARN-1972.delta.5.patch, YARN-1972.trunk.4.patch, YARN-1972.trunk.5.patch


 h1. Windows Secure Container Executor (WCE)
 YARN-1063 adds the necessary infrastructure to launch a process as a domain 
 user as a solution for the problem of having a security boundary between 
 processes executed in YARN containers and the Hadoop services. The WCE is a 
 container executor that leverages the winutils capabilities introduced in 
 YARN-1063 and launches containers as an OS process running as the job 
 submitter user. A description of the S4U infrastructure used by YARN-1063 and 
 the alternatives considered can be read on that JIRA.
 The WCE is based on the DefaultContainerExecutor. It relies on the DCE to 
 drive the flow of execution, but it overrides some methods to the effect of:
 * changes the DCE-created user cache directories to be owned by the job user 
 and by the nodemanager group.
 * changes the actual container run command to use the 'createAsUser' command 
 of winutils task instead of 'create'
 * runs the localization as a standalone process instead of an in-process Java 
 method call. This in turn relies on the winutils createAsUser feature to run 
 the localization as the job user.
  
 When compared to LinuxContainerExecutor (LCE), the WCE has some minor 
 differences:
 * it does not delegate the creation of the user cache directories to the 
 native implementation.
 * it does not require special handling to be able to delete user files.
 The approach for the WCE came from practical trial and error. I had 
 to iron out some issues around the Windows script shell limitations (command 
 line length) to get it to work, the biggest issue being the huge CLASSPATH 
 that is commonplace in Hadoop container executions. The job 
 container itself already deals with this via a so-called 'classpath 
 jar'; see HADOOP-8899 and YARN-316 for details. For the WCE localizer, which 
 is launched as a separate process, the same issue had to be resolved, and I 
 used the same 'classpath jar' approach.
 h2. Deployment Requirements
 To use the WCE one needs to set 
 `yarn.nodemanager.container-executor.class` to 
 `org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor` 
 and set `yarn.nodemanager.windows-secure-container-executor.group` to a 
 Windows security group name that the nodemanager service principal is a 
 member of (the equivalent of the LCE 
 `yarn.nodemanager.linux-container-executor.group`). Unlike the LCE, the WCE 
 does not require any configuration outside of Hadoop's own yarn-site.xml.
 For the WCE to work the nodemanager must run as a service principal that is a 
 member of the local Administrators group or LocalSystem. This is derived from 
 the need to invoke the LoadUserProfile API, which mentions these requirements 
 in its specification. This is in addition to the SE_TCB privilege mentioned in 
 YARN-1063, but this requirement automatically implies that the SE_TCB 
 privilege is held by the nodemanager. For the Linux speakers in the audience, 
 the requirement is basically to run the NM as root.
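 As a quick illustration, the equivalent programmatic configuration would be 
 something like the following (the group name here is only a placeholder):
 {code}
 Configuration conf = new YarnConfiguration();
 conf.set(YarnConfiguration.NM_CONTAINER_EXECUTOR,
     "org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor");
 // Placeholder group; use the Windows security group that the NM service
 // principal is a member of.
 conf.set("yarn.nodemanager.windows-secure-container-executor.group",
     "HadoopNodeManagers");
 {code}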
 h2. Dedicated high privilege Service
 Due to the high privilege required by the WCE we had discussed the need to 
 isolate the high privilege operations into a separate process, an 'executor' 
 service that is solely responsible for starting the containers (including the 
 localizer). The NM would have to authenticate, authorize and communicate with 
 this service via an IPC mechanism and use this service to launch the 
 containers. I still believe we'll end up deploying such a service, but the 
 effort to onboard such a new platform-specific service onto the project is 
 not trivial.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2617) NM does not need to send finished container whose APP is not running to RM

2014-10-02 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156198#comment-14156198
 ] 

Jun Gong commented on YARN-2617:


I investigated why TestDirectoryCollection failed; it might be because of 
YARN-2640. Could you please help check and review it? Thank you.

 NM does not need to send finished container whose APP is not running to RM
 --

 Key: YARN-2617
 URL: https://issues.apache.org/jira/browse/YARN-2617
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Jun Gong
Assignee: Jun Gong
 Fix For: 2.6.0

 Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, 
 YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.6.patch, 
 YARN-2617.patch


 We ([~chenchun]) were testing RM work-preserving restart and found the 
 following logs when we ran a simple MapReduce PI job. The NM continuously 
 reported completed containers whose application had already finished, even 
 after the AM had finished. 
 {code}
 2014-09-26 17:00:42,228 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 2014-09-26 17:00:42,228 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 2014-09-26 17:00:43,230 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 2014-09-26 17:00:43,230 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 2014-09-26 17:00:44,233 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 2014-09-26 17:00:44,233 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 {code}
 In the patch for YARN-1372, ApplicationImpl on the NM should guarantee that 
 already completed applications are cleaned up. But it only removes the appId 
 from 'app.context.getApplications()' when ApplicationImpl receives the event 
 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED'; however, the NM 
 might not receive this event for a long time, or might never receive it. 
 * For NonAggregatingLogHandler, it waits for 
 YarnConfiguration.NM_LOG_RETAIN_SECONDS, which is 3 * 60 * 60 sec by default, 
 before it is scheduled to delete the application logs and send the event.
 * For LogAggregationService, it might fail (e.g., if the user does not have 
 HDFS write permission), in which case it will not send the event.
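 Given that, a rough sketch of the NM-side guard this JIRA proposes (the 
 getCompletedContainers helper and surrounding code are assumptions, not the 
 actual patch):
 {code}
 // Only report completed containers whose application is still known to the NM.
 List<ContainerStatus> toReport = new ArrayList<ContainerStatus>();
 for (ContainerStatus status : getCompletedContainers()) {
   ApplicationId appId =
       status.getContainerId().getApplicationAttemptId().getApplicationId();
   if (context.getApplications().containsKey(appId)) {
     toReport.add(status);
   }
 }
 {code}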



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2562) ContainerId@toString() is unreadable for epoch > 0 after YARN-2182

2014-10-02 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2562:
-
Attachment: (was: YARN-2562.5.patch)

 ContainerId@toString() is unreadable for epoch > 0 after YARN-2182
 -

 Key: YARN-2562
 URL: https://issues.apache.org/jira/browse/YARN-2562
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Tsuyoshi OZAWA
Priority: Critical
 Attachments: YARN-2562.1.patch, YARN-2562.2.patch, YARN-2562.3.patch, 
 YARN-2562.4.patch, YARN-2562.5.patch


 ContainerID string format is unreadable for RMs that restarted at least once 
 (epoch > 0) after YARN-2182. For example, 
 container_1410901177871_0001_01_05_17.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2562) ContainerId@toString() is unreadable for epoch > 0 after YARN-2182

2014-10-02 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2562:
-
Attachment: YARN-2562.5.patch

 ContainerId@toString() is unreadable for epoch > 0 after YARN-2182
 -

 Key: YARN-2562
 URL: https://issues.apache.org/jira/browse/YARN-2562
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Tsuyoshi OZAWA
Priority: Critical
 Attachments: YARN-2562.1.patch, YARN-2562.2.patch, YARN-2562.3.patch, 
 YARN-2562.4.patch, YARN-2562.5.patch


 ContainerID string format is unreadable for RMs that restarted at least once 
 (epoch > 0) after YARN-2182. For example, 
 container_1410901177871_0001_01_05_17.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2640) TestDirectoryCollection.testCreateDirectories failed

2014-10-02 Thread Jun Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Gong updated YARN-2640:
---
Attachment: YARN-2640.patch

Patch submitted.

 TestDirectoryCollection.testCreateDirectories failed
 

 Key: YARN-2640
 URL: https://issues.apache.org/jira/browse/YARN-2640
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Jun Gong
Assignee: Jun Gong
 Attachments: YARN-2640.patch


 When running the test with mvn test -Dtest=TestDirectoryCollection, it failed:
 {code}
 Running org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection
 Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.538 sec <<< 
 FAILURE! - in 
 org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection
 testCreateDirectories(org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection)
   Time elapsed: 0.969 sec  <<< FAILURE!
 java.lang.AssertionError: local dir parent not created with proper 
 permissions expected:<rwxr-xr-x> but was:<rwxrwxr-x>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at 
 org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection.testCreateDirectories(TestDirectoryCollection.java:104)
 {code}
 I found it was because testDiskSpaceUtilizationLimit ran before 
 testCreateDirectories; directory dirA had already been created in 
 testDiskSpaceUtilizationLimit. When testCreateDirectories tried to create dirA 
 with the specified permissions, it found dirA already existed and did nothing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2562) ContainerId@toString() is unreadable for epoch > 0 after YARN-2182

2014-10-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156228#comment-14156228
 ] 

Hadoop QA commented on YARN-2562:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672512/YARN-2562.5.patch
  against trunk revision 9e40de6.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5233//console

This message is automatically generated.

 ContainerId@toString() is unreadable for epoch > 0 after YARN-2182
 -

 Key: YARN-2562
 URL: https://issues.apache.org/jira/browse/YARN-2562
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Tsuyoshi OZAWA
Priority: Critical
 Attachments: YARN-2562.1.patch, YARN-2562.2.patch, YARN-2562.3.patch, 
 YARN-2562.4.patch, YARN-2562.5.patch


 ContainerID string format is unreadable for RMs that restarted at least once 
 (epoch > 0) after YARN-2182. For example, 
 container_1410901177871_0001_01_05_17.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2640) TestDirectoryCollection.testCreateDirectories failed

2014-10-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156230#comment-14156230
 ] 

Hadoop QA commented on YARN-2640:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672513/YARN-2640.patch
  against trunk revision 9e40de6.

{color:red}-1 patch{color}.  Trunk compilation may be broken.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5234//console

This message is automatically generated.

 TestDirectoryCollection.testCreateDirectories failed
 

 Key: YARN-2640
 URL: https://issues.apache.org/jira/browse/YARN-2640
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Jun Gong
Assignee: Jun Gong
 Attachments: YARN-2640.patch


 When running the test with mvn test -Dtest=TestDirectoryCollection, it failed:
 {code}
 Running org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection
 Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.538 sec <<< 
 FAILURE! - in 
 org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection
 testCreateDirectories(org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection)
   Time elapsed: 0.969 sec  <<< FAILURE!
 java.lang.AssertionError: local dir parent not created with proper 
 permissions expected:<rwxr-xr-x> but was:<rwxrwxr-x>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at 
 org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection.testCreateDirectories(TestDirectoryCollection.java:104)
 {code}
 I found it was because testDiskSpaceUtilizationLimit ran before 
 testCreateDirectories; directory dirA had already been created in 
 testDiskSpaceUtilizationLimit. When testCreateDirectories tried to create dirA 
 with the specified permissions, it found dirA already existed and did nothing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2640) TestDirectoryCollection.testCreateDirectories failed

2014-10-02 Thread Jun Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Gong updated YARN-2640:
---
Attachment: YARN-2640.2.patch

 TestDirectoryCollection.testCreateDirectories failed
 

 Key: YARN-2640
 URL: https://issues.apache.org/jira/browse/YARN-2640
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Jun Gong
Assignee: Jun Gong
 Attachments: YARN-2640.2.patch, YARN-2640.patch


 When running the test with mvn test -Dtest=TestDirectoryCollection, it failed:
 {code}
 Running org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection
 Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.538 sec <<< 
 FAILURE! - in 
 org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection
 testCreateDirectories(org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection)
   Time elapsed: 0.969 sec  <<< FAILURE!
 java.lang.AssertionError: local dir parent not created with proper 
 permissions expected:<rwxr-xr-x> but was:<rwxrwxr-x>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at 
 org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection.testCreateDirectories(TestDirectoryCollection.java:104)
 {code}
 I found it was because testDiskSpaceUtilizationLimit ran before 
 testCreateDirectories; directory dirA had already been created in 
 testDiskSpaceUtilizationLimit. When testCreateDirectories tried to create dirA 
 with the specified permissions, it found dirA already existed and did nothing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2613) NMClient doesn't have retries for supporting rolling-upgrades

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156330#comment-14156330
 ] 

Hudson commented on YARN-2613:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #698 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/698/])
YARN-2613. Support retry in NMClient for rolling-upgrades. (Contributed by Jian 
He) (junping_du: rev 0708827a935d190d439854e08bb5a655d7daa606)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestContainerManagerSecurity.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestNMProxy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/impl/pb/RpcClientFactoryPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/ContainerManagementProtocolProxy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/NMProxy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java
* hadoop-yarn-project/CHANGES.txt


 NMClient doesn't have retries for supporting rolling-upgrades
 -

 Key: YARN-2613
 URL: https://issues.apache.org/jira/browse/YARN-2613
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2613.1.patch, YARN-2613.2.patch, YARN-2613.3.patch


 While NM is rolling upgrade, client should retry NM until it comes up. This 
 jira is to add a NMProxy (similar to RMProxy) with retry implementation to 
 support rolling upgrade.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1063) Winutils needs ability to create task as domain user

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156357#comment-14156357
 ] 

Hudson commented on YARN-1063:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #698 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/698/])
YARN-1063. Augmented Hadoop common winutils to have the ability to create 
containers as domain users. Contributed by Remus Rusanu. (vinodkv: rev 
5ca97f1e60b8a7848f6eadd15f6c08ed390a8cda)
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestWinUtils.java
* hadoop-common-project/hadoop-common/src/main/winutils/symlink.c
* hadoop-common-project/hadoop-common/src/main/winutils/chown.c
* hadoop-common-project/hadoop-common/src/main/winutils/task.c
* hadoop-yarn-project/CHANGES.txt
* hadoop-common-project/hadoop-common/src/main/winutils/include/winutils.h
* hadoop-common-project/hadoop-common/src/main/winutils/libwinutils.c


 Winutils needs ability to create task as domain user
 

 Key: YARN-1063
 URL: https://issues.apache.org/jira/browse/YARN-1063
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
 Environment: Windows
Reporter: Kyle Leckie
Assignee: Remus Rusanu
  Labels: security, windows
 Fix For: 2.6.0

 Attachments: YARN-1063.2.patch, YARN-1063.3.patch, YARN-1063.4.patch, 
 YARN-1063.5.patch, YARN-1063.6.patch, YARN-1063.patch


 h1. Summary:
 Securing a Hadoop cluster requires constructing some form of security 
 boundary around the processes executed in YARN containers. Isolation based on 
 Windows user isolation seems most feasible. This approach is similar to the 
 approach taken by the existing LinuxContainerExecutor. The current patch to 
 winutils.exe adds the ability to create a process as a domain user. 
 h1. Alternative Methods considered:
 h2. Process rights limited by security token restriction:
 On Windows access decisions are made by examining the security token of a 
 process. It is possible to spawn a process with a restricted security token. 
 Any of the rights granted by SIDs of the default token may be restricted. It 
 is possible to see this in action by examining the security token of a 
 sandboxed process launched by a web browser. Typically the launched process 
 will have a fully restricted token and need to access machine resources 
 through a dedicated broker process that enforces a custom security policy. 
 This broker process mechanism would break compatibility with the typical 
 Hadoop container process. The Container process must be able to utilize 
 standard function calls for disk and network IO. I performed some work 
 looking at ways to ACL the local files to the specific launched process 
 without granting rights to other processes launched on the same machine, but 
 found this to be an overly complex solution. 
 h2. Relying on APP containers:
 Recent versions of windows have the ability to launch processes within an 
 isolated container. Application containers are supported for execution of 
 WinRT based executables. This method was ruled out due to the lack of 
 official support for standard windows APIs. At some point in the future 
 windows may support functionality similar to BSD jails or Linux containers, 
 at that point support for containers should be added.
 h1. Create As User Feature Description:
 h2. Usage:
 A new sub command was added to the set of task commands. Here is the syntax:
 winutils task createAsUser [TASKNAME] [USERNAME] [COMMAND_LINE]
 Some notes:
 * The username specified is in the format of user@domain
 * The machine executing this command must be joined to the domain of the user 
 specified
 * The domain controller must allow the account executing the command access 
 to the user information. For this join the account to the predefined group 
 labeled Pre-Windows 2000 Compatible Access
 * The account running the command must have several rights on the local 
 machine. These can be managed manually using secpol.msc: 
 ** Act as part of the operating system - SE_TCB_NAME
 ** Replace a process-level token - SE_ASSIGNPRIMARYTOKEN_NAME
 ** Adjust memory quotas for a process - SE_INCREASE_QUOTA_NAME
 * The launched process will not have rights to the desktop so will not be 
 able to display any information or create UI.
 * The launched process will have no network credentials. Any access of 
 network resources that requires domain authentication will fail.
 h2. Implementation:
 Winutils performs the following steps:
 # Enable the required privileges for the current process.
 # Register as a trusted process with the Local Security Authority (LSA).
 # Create a new logon for the user passed on the command line.
 # Load/Create a profile on the local machine for the new logon.
 # Create a new environment 

[jira] [Commented] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156349#comment-14156349
 ] 

Hudson commented on YARN-2630:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #698 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/698/])
YARN-2630. Prevented previous AM container status from being acquired by the 
current restarted AM. Contributed by Jian He. (zjshen: rev 
52bbe0f11bc8e97df78a1ab9b63f4eff65fd7a76)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java
* hadoop-yarn-project/CHANGES.txt


 TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
 -

 Key: YARN-2630
 URL: https://issues.apache.org/jira/browse/YARN-2630
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Fix For: 2.6.0

 Attachments: YARN-2630.1.patch, YARN-2630.2.patch, YARN-2630.3.patch, 
 YARN-2630.4.patch


 The problem is that after YARN-1372, in work-preserving AM restart, the 
 re-launched AM will also receive previously failed AM container. But 
 DistributedShell logic is not expecting this extra completed container.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1972) Implement secure Windows Container Executor

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156343#comment-14156343
 ] 

Hudson commented on YARN-1972:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #698 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/698/])
YARN-1972. Added a secure container-executor for Windows. Contributed by Remus 
Rusanu. (vinodkv: rev ba7f31c2ee8d23ecb183f88920ef06053c0b9769)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/SecureContainer.apt.vm
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ContainerLocalizer.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDefaultContainerExecutor.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/index.apt.vm
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsSecureContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java


 Implement secure Windows Container Executor
 ---

 Key: YARN-1972
 URL: https://issues.apache.org/jira/browse/YARN-1972
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Remus Rusanu
Assignee: Remus Rusanu
  Labels: security, windows
 Fix For: 2.6.0

 Attachments: YARN-1972.1.patch, YARN-1972.2.patch, YARN-1972.3.patch, 
 YARN-1972.delta.4.patch, YARN-1972.delta.5-branch-2.patch, 
 YARN-1972.delta.5.patch, YARN-1972.trunk.4.patch, YARN-1972.trunk.5.patch


 h1. Windows Secure Container Executor (WCE)
 YARN-1063 adds the necessary infrastructure to launch a process as a domain 
 user as a solution for the problem of having a security boundary between 
 processes executed in YARN containers and the Hadoop services. The WCE is a 
 container executor that leverages the winutils capabilities introduced in 
 YARN-1063 and launches containers as an OS process running as the job 
 submitter user. A description of the S4U infrastructure used by YARN-1063 and 
 the alternatives considered can be read on that JIRA.
 The WCE is based on the DefaultContainerExecutor. It relies on the DCE to 
 drive the flow of execution, but it overrides some methods to the effect of:
 * changes the DCE-created user cache directories to be owned by the job user 
 and by the nodemanager group.
 * changes the actual container run command to use the 'createAsUser' command 
 of winutils task instead of 'create'
 * runs the localization as a standalone process instead of an in-process Java 
 method call. This in turn relies on the winutils createAsUser feature to run 
 the localization as the job user.
  
 When compared to LinuxContainerExecutor (LCE), the WCE has some minor 
 differences:
 * it does not delegate the creation of the user cache directories to the 
 native implementation.
 * it does not require special handling to be able to delete user files.
 The approach for the WCE came from practical trial and error. I had 
 to iron out some issues around the Windows script shell limitations (command 
 line length) to get it to work, the biggest issue being the huge CLASSPATH 
 that is commonplace in Hadoop container executions. The job 
 container itself already deals with this via a so-called 'classpath 
 jar'; see HADOOP-8899 and YARN-316 for details. For the WCE localizer, which 
 is launched as a separate process, the same issue had to be resolved, and I 
 used the same 'classpath jar' approach.
 h2. Deployment Requirements
 To use the WCE one needs to set the 
 `yarn.nodemanager.container-executor.class` to 
 

[jira] [Commented] (YARN-2446) Using TimelineNamespace to shield the entities of a user

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156334#comment-14156334
 ] 

Hudson commented on YARN-2446:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #698 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/698/])
YARN-2446. Augmented Timeline service APIs to start taking in domains as a 
parameter while posting entities and events. Contributed by Zhijie Shen. 
(vinodkv: rev 9e40de6af7959ac7bb5f4e4d2833ca14ea457614)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/records/timeline/TestTimelineRecords.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TimelineStoreTestUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServicesWithSSL.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/security/TestTimelineACLsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/MemoryTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineACLsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/TimelineDataManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelinePutResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryServer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/LeveldbTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelineEntity.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TestLeveldbTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServices.java
* hadoop-yarn-project/CHANGES.txt


 Using TimelineNamespace to shield the entities of a user
 

 Key: YARN-2446
 URL: https://issues.apache.org/jira/browse/YARN-2446
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: 2.6.0

 Attachments: YARN-2446.1.patch, YARN-2446.2.patch, YARN-2446.3.patch


 Given YARN-2102 adds TimelineNamespace, we can make use of it to shield the 
 entities, preventing them from being accessed or affected by other users' 
 operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2640) TestDirectoryCollection.testCreateDirectories failed

2014-10-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156364#comment-14156364
 ] 

Hadoop QA commented on YARN-2640:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672530/YARN-2640.2.patch
  against trunk revision 9e40de6.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5235//console

This message is automatically generated.

 TestDirectoryCollection.testCreateDirectories failed
 

 Key: YARN-2640
 URL: https://issues.apache.org/jira/browse/YARN-2640
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Jun Gong
Assignee: Jun Gong
 Attachments: YARN-2640.2.patch, YARN-2640.patch


 When running the test with mvn test -Dtest=TestDirectoryCollection, it failed:
 {code}
 Running org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection
 Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.538 sec <<< 
 FAILURE! - in 
 org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection
 testCreateDirectories(org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection)
   Time elapsed: 0.969 sec  <<< FAILURE!
 java.lang.AssertionError: local dir parent not created with proper 
 permissions expected:<rwxr-xr-x> but was:<rwxrwxr-x>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at 
 org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection.testCreateDirectories(TestDirectoryCollection.java:104)
 {code}
 I found it was because testDiskSpaceUtilizationLimit ran before 
 testCreateDirectories; directory dirA had already been created in 
 testDiskSpaceUtilizationLimit. When testCreateDirectories tried to create dirA 
 with the specified permissions, it found dirA already existed and did nothing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1063) Winutils needs ability to create task as domain user

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156435#comment-14156435
 ] 

Hudson commented on YARN-1063:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1889 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1889/])
YARN-1063. Augmented Hadoop common winutils to have the ability to create 
containers as domain users. Contributed by Remus Rusanu. (vinodkv: rev 
5ca97f1e60b8a7848f6eadd15f6c08ed390a8cda)
* hadoop-common-project/hadoop-common/src/main/winutils/symlink.c
* hadoop-common-project/hadoop-common/src/main/winutils/libwinutils.c
* hadoop-common-project/hadoop-common/src/main/winutils/chown.c
* hadoop-common-project/hadoop-common/src/main/winutils/include/winutils.h
* hadoop-common-project/hadoop-common/src/main/winutils/task.c
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestWinUtils.java
* hadoop-yarn-project/CHANGES.txt


 Winutils needs ability to create task as domain user
 

 Key: YARN-1063
 URL: https://issues.apache.org/jira/browse/YARN-1063
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
 Environment: Windows
Reporter: Kyle Leckie
Assignee: Remus Rusanu
  Labels: security, windows
 Fix For: 2.6.0

 Attachments: YARN-1063.2.patch, YARN-1063.3.patch, YARN-1063.4.patch, 
 YARN-1063.5.patch, YARN-1063.6.patch, YARN-1063.patch


 h1. Summary:
 Securing a Hadoop cluster requires constructing some form of security 
 boundary around the processes executed in YARN containers. Isolation based on 
 Windows user isolation seems most feasible. This approach is similar to the 
 approach taken by the existing LinuxContainerExecutor. The current patch to 
 winutils.exe adds the ability to create a process as a domain user. 
 h1. Alternative Methods considered:
 h2. Process rights limited by security token restriction:
 On Windows access decisions are made by examining the security token of a 
 process. It is possible to spawn a process with a restricted security token. 
 Any of the rights granted by SIDs of the default token may be restricted. It 
 is possible to see this in action by examining the security token of a 
 sandboxed process launched by a web browser. Typically the launched process 
 will have a fully restricted token and need to access machine resources 
 through a dedicated broker process that enforces a custom security policy. 
 This broker process mechanism would break compatibility with the typical 
 Hadoop container process. The Container process must be able to utilize 
 standard function calls for disk and network IO. I performed some work 
 looking at ways to ACL the local files to the specific launched process 
 without granting rights to other processes launched on the same machine, but 
 found this to be an overly complex solution. 
 h2. Relying on APP containers:
 Recent versions of windows have the ability to launch processes within an 
 isolated container. Application containers are supported for execution of 
 WinRT based executables. This method was ruled out due to the lack of 
 official support for standard windows APIs. At some point in the future 
 windows may support functionality similar to BSD jails or Linux containers, 
 at that point support for containers should be added.
 h1. Create As User Feature Description:
 h2. Usage:
 A new sub command was added to the set of task commands. Here is the syntax:
 winutils task createAsUser [TASKNAME] [USERNAME] [COMMAND_LINE]
 Some notes:
 * The username specified is in the format of user@domain
 * The machine executing this command must be joined to the domain of the user 
 specified
 * The domain controller must allow the account executing the command access 
 to the user information. For this join the account to the predefined group 
 labeled Pre-Windows 2000 Compatible Access
 * The account running the command must have several rights on the local 
 machine. These can be managed manually using secpol.msc: 
 ** Act as part of the operating system - SE_TCB_NAME
 ** Replace a process-level token - SE_ASSIGNPRIMARYTOKEN_NAME
 ** Adjust memory quotas for a process - SE_INCREASE_QUOTA_NAME
 * The launched process will not have rights to the desktop so will not be 
 able to display any information or create UI.
 * The launched process will have no network credentials. Any access of 
 network resources that requires domain authentication will fail.
 h2. Implementation:
 Winutils performs the following steps:
 # Enable the required privileges for the current process.
 # Register as a trusted process with the Local Security Authority (LSA).
 # Create a new logon for the user passed on the command line.
 # Load/Create a profile on the local machine for the new logon.
 # Create a new environment 

[jira] [Commented] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156427#comment-14156427
 ] 

Hudson commented on YARN-2630:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1889 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1889/])
YARN-2630. Prevented previous AM container status from being acquired by the 
current restarted AM. Contributed by Jian He. (zjshen: rev 
52bbe0f11bc8e97df78a1ab9b63f4eff65fd7a76)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java


 TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
 -

 Key: YARN-2630
 URL: https://issues.apache.org/jira/browse/YARN-2630
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Fix For: 2.6.0

 Attachments: YARN-2630.1.patch, YARN-2630.2.patch, YARN-2630.3.patch, 
 YARN-2630.4.patch


 The problem is that after YARN-1372, in work-preserving AM restart, the 
 re-launched AM will also receive previously failed AM container. But 
 DistributedShell logic is not expecting this extra completed container.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1972) Implement secure Windows Container Executor

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156421#comment-14156421
 ] 

Hudson commented on YARN-1972:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1889 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1889/])
YARN-1972. Added a secure container-executor for Windows. Contributed by Remus 
Rusanu. (vinodkv: rev ba7f31c2ee8d23ecb183f88920ef06053c0b9769)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDefaultContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ContainerLocalizer.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/index.apt.vm
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/SecureContainer.apt.vm
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsSecureContainerExecutor.java


 Implement secure Windows Container Executor
 ---

 Key: YARN-1972
 URL: https://issues.apache.org/jira/browse/YARN-1972
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Remus Rusanu
Assignee: Remus Rusanu
  Labels: security, windows
 Fix For: 2.6.0

 Attachments: YARN-1972.1.patch, YARN-1972.2.patch, YARN-1972.3.patch, 
 YARN-1972.delta.4.patch, YARN-1972.delta.5-branch-2.patch, 
 YARN-1972.delta.5.patch, YARN-1972.trunk.4.patch, YARN-1972.trunk.5.patch


 h1. Windows Secure Container Executor (WCE)
 YARN-1063 adds the necessary infrastructure to launch a process as a domain 
 user as a solution for the problem of having a security boundary between 
 processes executed in YARN containers and the Hadoop services. The WCE is a 
 container executor that leverages the winutils capabilities introduced in 
 YARN-1063 and launches containers as an OS process running as the job 
 submitter user. A description of the S4U infrastructure used by YARN-1063 and 
 the alternatives considered can be read on that JIRA.
 The WCE is based on the DefaultContainerExecutor. It relies on the DCE to 
 drive the flow of execution, but it overrides some methods to the effect of:
 * changes the DCE-created user cache directories to be owned by the job user 
 and by the nodemanager group.
 * changes the actual container run command to use the 'createAsUser' command 
 of winutils task instead of 'create'
 * runs the localization as a standalone process instead of an in-process Java 
 method call. This in turn relies on the winutils createAsUser feature to run 
 the localization as the job user.
  
 When compared to LinuxContainerExecutor (LCE), the WCE has some minor 
 differences:
 * it does not delegate the creation of the user cache directories to the 
 native implementation.
 * it does not require special handling to be able to delete user files.
 The approach for the WCE came from practical trial and error. I had 
 to iron out some issues around the Windows script shell limitations (command 
 line length) to get it to work, the biggest issue being the huge CLASSPATH 
 that is commonplace in Hadoop container executions. The job 
 container itself already deals with this via a so-called 'classpath 
 jar'; see HADOOP-8899 and YARN-316 for details. For the WCE localizer, which 
 is launched as a separate process, the same issue had to be resolved, and I 
 used the same 'classpath jar' approach.
 h2. Deployment Requirements
 To use the WCE one needs to set the 
 `yarn.nodemanager.container-executor.class` to 
 

[jira] [Commented] (YARN-2613) NMClient doesn't have retries for supporting rolling-upgrades

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156408#comment-14156408
 ] 

Hudson commented on YARN-2613:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1889 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1889/])
YARN-2613. Support retry in NMClient for rolling-upgrades. (Contributed by Jian 
He) (junping_du: rev 0708827a935d190d439854e08bb5a655d7daa606)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestNMProxy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/ContainerManagementProtocolProxy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/NMProxy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestContainerManagerSecurity.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/impl/pb/RpcClientFactoryPBImpl.java
* hadoop-yarn-project/CHANGES.txt


 NMClient doesn't have retries for supporting rolling-upgrades
 -

 Key: YARN-2613
 URL: https://issues.apache.org/jira/browse/YARN-2613
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2613.1.patch, YARN-2613.2.patch, YARN-2613.3.patch


 While the NM is going through a rolling upgrade, the client should retry the NM 
 until it comes back up. This jira is to add an NMProxy (similar to RMProxy) with 
 a retry implementation to support rolling upgrade.
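 For illustration only, here is a minimal sketch (not the committed NMProxy code) 
 of how such a retrying proxy can be built with Hadoop's generic retry machinery; 
 the pre-built rawProxy instance and the retry count/sleep values are assumptions:
 {code}
 import java.util.concurrent.TimeUnit;

 import org.apache.hadoop.io.retry.RetryPolicies;
 import org.apache.hadoop.io.retry.RetryPolicy;
 import org.apache.hadoop.io.retry.RetryProxy;
 import org.apache.hadoop.yarn.api.ContainerManagementProtocol;

 public class RetryingNMProxySketch {
   // Wrap an existing protocol proxy so failed calls are retried while the NM
   // is restarting. The retry count and sleep interval are illustrative.
   public static ContainerManagementProtocol wrap(ContainerManagementProtocol rawProxy) {
     RetryPolicy policy =
         RetryPolicies.retryUpToMaximumCountWithFixedSleep(10, 1, TimeUnit.SECONDS);
     return (ContainerManagementProtocol)
         RetryProxy.create(ContainerManagementProtocol.class, rawProxy, policy);
   }
 }
 {code}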



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2446) Using TimelineNamespace to shield the entities of a user

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156412#comment-14156412
 ] 

Hudson commented on YARN-2446:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1889 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1889/])
YARN-2446. Augmented Timeline service APIs to start taking in domains as a 
parameter while posting entities and events. Contributed by Zhijie Shen. 
(vinodkv: rev 9e40de6af7959ac7bb5f4e4d2833ca14ea457614)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServicesWithSSL.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/records/timeline/TestTimelineRecords.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelinePutResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TestLeveldbTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TimelineStoreTestUtils.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/TimelineDataManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryServer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineACLsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/security/TestTimelineACLsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/MemoryTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelineEntity.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/LeveldbTimelineStore.java


 Using TimelineNamespace to shield the entities of a user
 

 Key: YARN-2446
 URL: https://issues.apache.org/jira/browse/YARN-2446
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: 2.6.0

 Attachments: YARN-2446.1.patch, YARN-2446.2.patch, YARN-2446.3.patch


 Given YARN-2102 adds TimelineNamespace, we can make use of it to shield the 
 entities, preventing them from being accessed or affected by other users' 
 operations.
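 As a hedged illustration of what the augmented API enables (assuming the 
 domain-id setter this change introduces; the entity type, id, and domain values 
 below are placeholders), an entity can be posted under a specific domain so that 
 other users' operations cannot touch it:
 {code}
 import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;

 public class TimelineDomainSketch {
   public static TimelineEntity tagWithDomain() {
     TimelineEntity entity = new TimelineEntity();
     entity.setEntityType("EXAMPLE_TYPE");     // placeholder entity type
     entity.setEntityId("example_entity_1");   // placeholder entity id
     entity.setDomainId("exampleDomain");      // entities in other domains stay shielded
     return entity;
   }
 }
 {code}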



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2562) ContainerId@toString() is unreadable for epoch 0 after YARN-2182

2014-10-02 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2562:
-
Attachment: YARN-2562.5-2.patch

 ContainerId@toString() is unreadable for epoch 0 after YARN-2182
 -

 Key: YARN-2562
 URL: https://issues.apache.org/jira/browse/YARN-2562
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Tsuyoshi OZAWA
Priority: Critical
 Attachments: YARN-2562.1.patch, YARN-2562.2.patch, YARN-2562.3.patch, 
 YARN-2562.4.patch, YARN-2562.5-2.patch, YARN-2562.5.patch


 ContainerID string format is unreadable for RMs that restarted at least once 
 (epoch > 0) after YARN-2182. E.g., 
 container_1410901177871_0001_01_05_17.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2640) TestDirectoryCollection.testCreateDirectories failed

2014-10-02 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156461#comment-14156461
 ] 

Tsuyoshi OZAWA commented on YARN-2640:
--

[~hex108], thanks for your contribution. Can we close this jira as a duplicate 
of YARN-1979?

 TestDirectoryCollection.testCreateDirectories failed
 

 Key: YARN-2640
 URL: https://issues.apache.org/jira/browse/YARN-2640
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Jun Gong
Assignee: Jun Gong
 Attachments: YARN-2640.2.patch, YARN-2640.patch


 When running the test with mvn test -Dtest=TestDirectoryCollection, it failed:
 {code}
 Running org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection
 Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.538 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection
 testCreateDirectories(org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection)
   Time elapsed: 0.969 sec   FAILURE!
 java.lang.AssertionError: local dir parent not created with proper 
 permissions expected:<rwxr-xr-x> but was:<rwxrwxr-x>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at 
 org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection.testCreateDirectories(TestDirectoryCollection.java:104)
 {code}
 I found it was because testDiskSpaceUtilizationLimit ran before 
 testCreateDirectories, so directory dirA had already been created by 
 testDiskSpaceUtilizationLimit. When testCreateDirectories then tried to create 
 dirA with the specified permission, it found dirA was already there and did 
 nothing.
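 To make the failure mode concrete, here is a minimal standalone sketch (not 
 part of the test itself) of the kind of check the assertion above performs; the 
 target/dirA path and the 755 permission are illustrative:
 {code}
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.fs.permission.FsPermission;

 public class DirPermissionSketch {
   public static void main(String[] args) throws Exception {
     FileSystem localFs = FileSystem.getLocal(new Configuration());
     Path dirA = new Path("target/dirA");
     // If dirA already exists (e.g. created by an earlier test), its permission
     // may be left as-is, which is the clash described above.
     localFs.mkdirs(dirA, new FsPermission((short) 0755));
     FsPermission actual = localFs.getFileStatus(dirA).getPermission();
     System.out.println("expected rwxr-xr-x, actual " + actual);
   }
 }
 {code}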



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1979) TestDirectoryCollection fails when the umask is unusual

2014-10-02 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156462#comment-14156462
 ] 

Tsuyoshi OZAWA commented on YARN-1979:
--

[~djp], do you mind taking a look at the latest patch? Some users are reporting 
the same issue, e.g. YARN-2640.

 TestDirectoryCollection fails when the umask is unusual
 ---

 Key: YARN-1979
 URL: https://issues.apache.org/jira/browse/YARN-1979
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Attachments: YARN-1979.2.patch, YARN-1979.txt


 I've seen this fail in Windows where the default permissions are matching up 
 to 700.
 {code}
 ---
 Test set: org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection
 ---
 Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.015 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection
 testCreateDirectories(org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection)
   Time elapsed: 0.422 sec   FAILURE!
 java.lang.AssertionError: local dir parent 
 Y:\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-nodemanager\target\org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection\dirA
  not created with proper permissions expected:<rwxr-xr-x> but was:<rwx------>
 at org.junit.Assert.fail(Assert.java:93)
 at org.junit.Assert.failNotEquals(Assert.java:647)
 at org.junit.Assert.assertEquals(Assert.java:128)
 at 
 org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection.testCreateDirectories(TestDirectoryCollection.java:106)
 {code}
 The clash is between testDiskSpaceUtilizationLimit() and 
 testCreateDirectories().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2635) TestRMRestart fails with FairScheduler

2014-10-02 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156490#comment-14156490
 ] 

Wei Yan commented on YARN-2635:
---

All tests passed locally. The TestDirectoryCollection failure looks related to 
YARN-1979, YARN-2640.

 TestRMRestart fails with FairScheduler
 --

 Key: YARN-2635
 URL: https://issues.apache.org/jira/browse/YARN-2635
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: YARN-2635-1.patch


 If we change the scheduler from Capacity Scheduler to Fair Scheduler, 
 TestRMRestart fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2446) Using TimelineNamespace to shield the entities of a user

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156528#comment-14156528
 ] 

Hudson commented on YARN-2446:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1914 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1914/])
YARN-2446. Augmented Timeline service APIs to start taking in domains as a 
parameter while posting entities and events. Contributed by Zhijie Shen. 
(vinodkv: rev 9e40de6af7959ac7bb5f4e4d2833ca14ea457614)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/security/TestTimelineACLsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TimelineStoreTestUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/LeveldbTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/MemoryTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TestLeveldbTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/records/timeline/TestTimelineRecords.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelineEntity.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelinePutResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineACLsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryServer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServicesWithSSL.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/TimelineDataManager.java


 Using TimelineNamespace to shield the entities of a user
 

 Key: YARN-2446
 URL: https://issues.apache.org/jira/browse/YARN-2446
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: 2.6.0

 Attachments: YARN-2446.1.patch, YARN-2446.2.patch, YARN-2446.3.patch


 Given YARN-2102 adds TimelineNamespace, we can make use of it to shield the 
 entities, preventing them from being accessed or affected by other users' 
 operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1972) Implement secure Windows Container Executor

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156537#comment-14156537
 ] 

Hudson commented on YARN-1972:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1914 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1914/])
YARN-1972. Added a secure container-executor for Windows. Contributed by Remus 
Rusanu. (vinodkv: rev ba7f31c2ee8d23ecb183f88920ef06053c0b9769)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/index.apt.vm
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/SecureContainer.apt.vm
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsSecureContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ContainerLocalizer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDefaultContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestContainerExecutor.java


 Implement secure Windows Container Executor
 ---

 Key: YARN-1972
 URL: https://issues.apache.org/jira/browse/YARN-1972
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Remus Rusanu
Assignee: Remus Rusanu
  Labels: security, windows
 Fix For: 2.6.0

 Attachments: YARN-1972.1.patch, YARN-1972.2.patch, YARN-1972.3.patch, 
 YARN-1972.delta.4.patch, YARN-1972.delta.5-branch-2.patch, 
 YARN-1972.delta.5.patch, YARN-1972.trunk.4.patch, YARN-1972.trunk.5.patch


 h1. Windows Secure Container Executor (WCE)
 YARN-1063 adds the necessary infrastructure to launch a process as a domain 
 user as a solution for the problem of having a security boundary between 
 processes executed in YARN containers and the Hadoop services. The WCE is a 
 container executor that leverages the winutils capabilities introduced in 
 YARN-1063 and launches containers as an OS process running as the job 
 submitter user. A description of the S4U infrastructure used by YARN-1063 and 
 the alternatives considered can be read on that JIRA.
 The WCE is based on the DefaultContainerExecutor. It relies on the DCE to 
 drive the flow of execution, but it overrides some methods to the effect of:
 * changes the DCE-created user cache directories to be owned by the job user 
 and by the nodemanager group.
 * changes the actual container run command to use the 'createAsUser' command 
 of the winutils task instead of 'create'.
 * runs the localization as a standalone process instead of an in-process Java 
 method call. This in turn relies on the winutils createAsUser feature to run 
 the localization as the job user.
  
 When compared to the LinuxContainerExecutor (LCE), the WCE has some minor 
 differences:
 * it does not delegate the creation of the user cache directories to the 
 native implementation.
 * it does not require special handling to be able to delete user files.
 The WCE approach came out of practical trial and error. I had 
 to iron out some issues around the Windows script shell limitations (command 
 line length) to get it to work, the biggest issue being the huge CLASSPATH 
 that is commonplace in Hadoop container executions. The job 
 container itself already deals with this via a so-called 'classpath 
 jar'; see HADOOP-8899 and YARN-316 for details. For the WCE localizer, launched 
 as a separate container, the same issue had to be resolved and I used the same 
 'classpath jar' approach.
 h2. Deployment Requirements
 To use the WCE one needs to set the 
 `yarn.nodemanager.container-executor.class` to 
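 The configured value is cut off above; as a hedged illustration (assuming the 
 executor class matches the WindowsSecureContainerExecutor source file in the 
 committed file list), the setting can be expressed programmatically as below, 
 though a real deployment would normally put it in yarn-site.xml:
 {code}
 import org.apache.hadoop.yarn.conf.YarnConfiguration;

 public class WceConfigSketch {
   public static YarnConfiguration configure() {
     YarnConfiguration conf = new YarnConfiguration();
     // NM_CONTAINER_EXECUTOR is the constant for
     // "yarn.nodemanager.container-executor.class".
     conf.set(YarnConfiguration.NM_CONTAINER_EXECUTOR,
         "org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor");
     return conf;
   }
 }
 {code}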
 

[jira] [Commented] (YARN-2613) NMClient doesn't have retries for supporting rolling-upgrades

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156524#comment-14156524
 ] 

Hudson commented on YARN-2613:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1914 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1914/])
YARN-2613. Support retry in NMClient for rolling-upgrades. (Contributed by Jian 
He) (junping_du: rev 0708827a935d190d439854e08bb5a655d7daa606)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/impl/pb/RpcClientFactoryPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestContainerManagerSecurity.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/ContainerManagementProtocolProxy.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestNMProxy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/NMProxy.java


 NMClient doesn't have retries for supporting rolling-upgrades
 -

 Key: YARN-2613
 URL: https://issues.apache.org/jira/browse/YARN-2613
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2613.1.patch, YARN-2613.2.patch, YARN-2613.3.patch


 While the NM is going through a rolling upgrade, the client should retry the NM 
 until it comes back up. This jira is to add an NMProxy (similar to RMProxy) with 
 a retry implementation to support rolling upgrade.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1063) Winutils needs ability to create task as domain user

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156552#comment-14156552
 ] 

Hudson commented on YARN-1063:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1914 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1914/])
YARN-1063. Augmented Hadoop common winutils to have the ability to create 
containers as domain users. Contributed by Remus Rusanu. (vinodkv: rev 
5ca97f1e60b8a7848f6eadd15f6c08ed390a8cda)
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestWinUtils.java
* hadoop-common-project/hadoop-common/src/main/winutils/chown.c
* hadoop-common-project/hadoop-common/src/main/winutils/symlink.c
* hadoop-common-project/hadoop-common/src/main/winutils/libwinutils.c
* hadoop-common-project/hadoop-common/src/main/winutils/include/winutils.h
* hadoop-common-project/hadoop-common/src/main/winutils/task.c
* hadoop-yarn-project/CHANGES.txt


 Winutils needs ability to create task as domain user
 

 Key: YARN-1063
 URL: https://issues.apache.org/jira/browse/YARN-1063
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
 Environment: Windows
Reporter: Kyle Leckie
Assignee: Remus Rusanu
  Labels: security, windows
 Fix For: 2.6.0

 Attachments: YARN-1063.2.patch, YARN-1063.3.patch, YARN-1063.4.patch, 
 YARN-1063.5.patch, YARN-1063.6.patch, YARN-1063.patch


 h1. Summary:
 Securing a Hadoop cluster requires constructing some form of security 
 boundary around the processes executed in YARN containers. Isolation based on 
 Windows user isolation seems most feasible. This approach is similar to the 
 approach taken by the existing LinuxContainerExecutor. The current patch to 
 winutils.exe adds the ability to create a process as a domain user. 
 h1. Alternative Methods considered:
 h2. Process rights limited by security token restriction:
 On Windows, access decisions are made by examining the security token of a 
 process. It is possible to spawn a process with a restricted security token. 
 Any of the rights granted by SIDs of the default token may be restricted. It 
 is possible to see this in action by examining the security token of a 
 sandboxed process launched by a web browser. Typically the launched process 
 will have a fully restricted token and needs to access machine resources 
 through a dedicated broker process that enforces a custom security policy. 
 This broker process mechanism would break compatibility with the typical 
 Hadoop container process. The container process must be able to utilize 
 standard function calls for disk and network IO. I performed some work 
 looking at ways to ACL the local files to the specific launched process without 
 granting rights to other processes launched on the same machine, but found 
 this to be an overly complex solution. 
 h2. Relying on APP containers:
 Recent versions of Windows have the ability to launch processes within an 
 isolated container. Application containers are supported for execution of 
 WinRT based executables. This method was ruled out due to the lack of 
 official support for standard Windows APIs. At some point in the future 
 Windows may support functionality similar to BSD jails or Linux containers; 
 at that point support for containers should be added.
 h1. Create As User Feature Description:
 h2. Usage:
 A new sub command was added to the set of task commands. Here is the syntax:
 winutils task createAsUser [TASKNAME] [USERNAME] [COMMAND_LINE]
 Some notes:
 * The username specified is in the format of user@domain
 * The machine executing this command must be joined to the domain of the user 
 specified
 * The domain controller must allow the account executing the command access 
 to the user information. For this, join the account to the predefined group 
 labeled Pre-Windows 2000 Compatible Access
 * The account running the command must have several rights on the local 
 machine. These can be managed manually using secpol.msc: 
 ** Act as part of the operating system - SE_TCB_NAME
 ** Replace a process-level token - SE_ASSIGNPRIMARYTOKEN_NAME
 ** Adjust memory quotas for a process - SE_INCREASE_QUOTA_NAME
 * The launched process will not have rights to the desktop so will not be 
 able to display any information or create UI.
 * The launched process will have no network credentials. Any access of 
 network resources that requires domain authentication will fail.
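 For illustration, a hypothetical invocation of the syntax above (the task name, 
 user, and command line are placeholders, not values taken from this JIRA):
 {code}
 winutils task createAsUser task_example_0001 containeruser@EXAMPLE.COM "cmd /c set"
 {code}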
 h2. Implementation:
 Winutils performs the following steps:
 # Enable the required privileges for the current process.
 # Register as a trusted process with the Local Security Authority (LSA).
 # Create a new logon for the user passed on the command line.
 # Load/Create a profile on the local machine for the new logon.
 # Create a new 

[jira] [Commented] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156543#comment-14156543
 ] 

Hudson commented on YARN-2630:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1914 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1914/])
YARN-2630. Prevented previous AM container status from being acquired by the 
current restarted AM. Contributed by Jian He. (zjshen: rev 
52bbe0f11bc8e97df78a1ab9b63f4eff65fd7a76)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto


 TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
 -

 Key: YARN-2630
 URL: https://issues.apache.org/jira/browse/YARN-2630
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Fix For: 2.6.0

 Attachments: YARN-2630.1.patch, YARN-2630.2.patch, YARN-2630.3.patch, 
 YARN-2630.4.patch


 The problem is that after YARN-1372, in a work-preserving AM restart, the 
 re-launched AM will also receive the previously failed AM container. But the 
 DistributedShell logic does not expect this extra completed container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1979) TestDirectoryCollection fails when the umask is unusual

2014-10-02 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156618#comment-14156618
 ] 

Junping Du commented on YARN-1979:
--

Thanks [~ozawa] for reminding me of this. Yes, I did forget this JIRA.
+1. Committing it now. 

 TestDirectoryCollection fails when the umask is unusual
 ---

 Key: YARN-1979
 URL: https://issues.apache.org/jira/browse/YARN-1979
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Attachments: YARN-1979.2.patch, YARN-1979.txt


 I've seen this fail in Windows where the default permissions are matching up 
 to 700.
 {code}
 ---
 Test set: org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection
 ---
 Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.015 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection
 testCreateDirectories(org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection)
   Time elapsed: 0.422 sec   FAILURE!
 java.lang.AssertionError: local dir parent 
 Y:\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-nodemanager\target\org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection\dirA
  not created with proper permissions expected:<rwxr-xr-x> but was:<rwx------>
 at org.junit.Assert.fail(Assert.java:93)
 at org.junit.Assert.failNotEquals(Assert.java:647)
 at org.junit.Assert.assertEquals(Assert.java:128)
 at 
 org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection.testCreateDirectories(TestDirectoryCollection.java:106)
 {code}
 The clash is between testDiskSpaceUtilizationLimit() and 
 testCreateDirectories().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2615) ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended fields

2014-10-02 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2615:
-
Attachment: YARN-2615-v2.patch

In the v2 patch:
- Fixed test failures and an audit warning.
- Added more tests for RMDelegationToken and TimelineDelegationToken.

 ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended 
 fields
 

 Key: YARN-2615
 URL: https://issues.apache.org/jira/browse/YARN-2615
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Junping Du
Assignee: Junping Du
Priority: Blocker
 Attachments: YARN-2615-v2.patch, YARN-2615.patch


 As three TokenIdentifiers got updated in YARN-668, ClientToAMTokenIdentifier 
 and DelegationTokenIdentifier should also be updated in the same way to allow 
 fields to be extended in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1979) TestDirectoryCollection fails when the umask is unusual

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156647#comment-14156647
 ] 

Hudson commented on YARN-1979:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #6174 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6174/])
YARN-1979. TestDirectoryCollection fails when the umask is unusual. 
(Contributed by Vinod Kumar Vavilapalli and Tsuyoshi OZAWA) (junping_du: rev 
c7cee9b4551918d5d35bf4e9dc73982a050c73ba)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDirectoryCollection.java


 TestDirectoryCollection fails when the umask is unusual
 ---

 Key: YARN-1979
 URL: https://issues.apache.org/jira/browse/YARN-1979
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Fix For: 2.7.0

 Attachments: YARN-1979.2.patch, YARN-1979.txt


 I've seen this fail in Windows where the default permissions are matching up 
 to 700.
 {code}
 ---
 Test set: org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection
 ---
 Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.015 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection
 testCreateDirectories(org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection)
   Time elapsed: 0.422 sec   FAILURE!
 java.lang.AssertionError: local dir parent 
 Y:\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-nodemanager\target\org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection\dirA
  not created with proper permissions expected:<rwxr-xr-x> but was:<rwx------>
 at org.junit.Assert.fail(Assert.java:93)
 at org.junit.Assert.failNotEquals(Assert.java:647)
 at org.junit.Assert.assertEquals(Assert.java:128)
 at 
 org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection.testCreateDirectories(TestDirectoryCollection.java:106)
 {code}
 The clash is between testDiskSpaceUtilizationLimit() and 
 testCreateDirectories().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1979) TestDirectoryCollection fails when the umask is unusual

2014-10-02 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156653#comment-14156653
 ] 

Tsuyoshi OZAWA commented on YARN-1979:
--

Thanks Vinod for the contribution and Junping for the review!

 TestDirectoryCollection fails when the umask is unusual
 ---

 Key: YARN-1979
 URL: https://issues.apache.org/jira/browse/YARN-1979
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Fix For: 2.7.0

 Attachments: YARN-1979.2.patch, YARN-1979.txt


 I've seen this fail in Windows where the default permissions are matching up 
 to 700.
 {code}
 ---
 Test set: org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection
 ---
 Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.015 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection
 testCreateDirectories(org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection)
   Time elapsed: 0.422 sec   FAILURE!
 java.lang.AssertionError: local dir parent 
 Y:\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-nodemanager\target\org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection\dirA
  not created with proper permissions expected:<rwxr-xr-x> but was:<rwx------>
 at org.junit.Assert.fail(Assert.java:93)
 at org.junit.Assert.failNotEquals(Assert.java:647)
 at org.junit.Assert.assertEquals(Assert.java:128)
 at 
 org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection.testCreateDirectories(TestDirectoryCollection.java:106)
 {code}
 The clash is between testDiskSpaceUtilizationLimit() and 
 testCreateDirectories().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2615) ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended fields

2014-10-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156651#comment-14156651
 ] 

Hadoop QA commented on YARN-2615:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672553/YARN-2615-v2.patch
  against trunk revision c7cee9b.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5237//console

This message is automatically generated.

 ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended 
 fields
 

 Key: YARN-2615
 URL: https://issues.apache.org/jira/browse/YARN-2615
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Junping Du
Assignee: Junping Du
Priority: Blocker
 Attachments: YARN-2615-v2.patch, YARN-2615.patch


 As three TokenIdentifiers got updated in YARN-668, ClientToAMTokenIdentifier 
 and DelegationTokenIdentifier should also be updated in the same way to allow 
 fields to be extended in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2615) ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended fields

2014-10-02 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156655#comment-14156655
 ] 

Tsuyoshi OZAWA commented on YARN-2615:
--

[~djp], it seems the YARN build is currently broken on Jenkins CI. I faced the 
same issue on YARN-2562.

 ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended 
 fields
 

 Key: YARN-2615
 URL: https://issues.apache.org/jira/browse/YARN-2615
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Junping Du
Assignee: Junping Du
Priority: Blocker
 Attachments: YARN-2615-v2.patch, YARN-2615.patch


 As three TokenIdentifiers got updated in YARN-668, ClientToAMTokenIdentifier 
 and DelegationTokenIdentifier should also be updated in the same way to allow 
 fields to be extended in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2617) NM does not need to send finished container whose APP is not running to RM

2014-10-02 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156761#comment-14156761
 ] 

Jian He commented on YARN-2617:
---

YARN-2640 seems to have been resolved by YARN-1979 already.

 NM does not need to send finished container whose APP is not running to RM
 --

 Key: YARN-2617
 URL: https://issues.apache.org/jira/browse/YARN-2617
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Jun Gong
Assignee: Jun Gong
 Fix For: 2.6.0

 Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, 
 YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.6.patch, 
 YARN-2617.patch


 We ([~chenchun]) were testing RM work-preserving restart and found the 
 following logs when we ran a simple MapReduce PI job. The NM continuously 
 reported completed containers whose application had already finished, even 
 after the AM had finished. 
 {code}
 2014-09-26 17:00:42,228 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 2014-09-26 17:00:42,228 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 2014-09-26 17:00:43,230 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 2014-09-26 17:00:43,230 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 2014-09-26 17:00:44,233 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 2014-09-26 17:00:44,233 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 {code}
 In the patch for YARN-1372, ApplicationImpl on the NM is supposed to guarantee 
 cleanup of already completed applications. But it only removes the appId from 
 'app.context.getApplications()' when ApplicationImpl receives the event 
 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED'; however, the NM might 
 not receive this event for a long time, or might never receive it: 
 * For NonAggregatingLogHandler, it waits for 
 YarnConfiguration.NM_LOG_RETAIN_SECONDS, which is 3 * 60 * 60 sec by default, 
 before it schedules deletion of the application logs and sends the event (see 
 the sketch below).
 * For LogAggregationService, it might fail (e.g. if the user does not have HDFS 
 write permission), and then it will not send the event.
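 As referenced above, here is a minimal sketch of the delayed-deletion 
 scheduling in the NonAggregatingLogHandler case; the executor and the stand-in 
 cleanup action are assumptions for illustration, not the actual NM code:
 {code}
 import java.util.concurrent.Executors;
 import java.util.concurrent.ScheduledExecutorService;
 import java.util.concurrent.TimeUnit;

 import org.apache.hadoop.yarn.conf.YarnConfiguration;

 public class LogRetainSketch {
   public static void scheduleCleanup(YarnConfiguration conf) {
     // 3 * 60 * 60 seconds by default, as noted above.
     long retainSec = conf.getLong(YarnConfiguration.NM_LOG_RETAIN_SECONDS,
         YarnConfiguration.DEFAULT_NM_LOG_RETAIN_SECONDS);
     ScheduledExecutorService sched = Executors.newSingleThreadScheduledExecutor();
     // Stand-in for deleting the application logs and dispatching
     // APPLICATION_LOG_HANDLING_FINISHED once the retention period expires.
     sched.schedule(
         () -> System.out.println("delete logs and notify ApplicationImpl"),
         retainSec, TimeUnit.SECONDS);
   }
 }
 {code}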



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2617) NM does not need to send finished container whose APP is not running to RM

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156772#comment-14156772
 ] 

Hudson commented on YARN-2617:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6176 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6176/])
YARN-2617. Fixed NM to not send duplicate container status whose app is not 
running. Contributed by Jun Gong (jianhe: rev 
3ef1cf187faeb530e74606dd7113fd1ba08140d7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java


 NM does not need to send finished container whose APP is not running to RM
 --

 Key: YARN-2617
 URL: https://issues.apache.org/jira/browse/YARN-2617
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Jun Gong
Assignee: Jun Gong
 Fix For: 2.6.0

 Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, 
 YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.6.patch, 
 YARN-2617.patch


 We ([~chenchun]) were testing RM work-preserving restart and found the 
 following logs when we ran a simple MapReduce PI job. The NM continuously 
 reported completed containers whose application had already finished, even 
 after the AM had finished. 
 {code}
 2014-09-26 17:00:42,228 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 2014-09-26 17:00:42,228 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 2014-09-26 17:00:43,230 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 2014-09-26 17:00:43,230 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 2014-09-26 17:00:44,233 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 2014-09-26 17:00:44,233 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 {code}
 In the patch for YARN-1372, ApplicationImpl on the NM is supposed to guarantee 
 cleanup of already completed applications. But it only removes the appId from 
 'app.context.getApplications()' when ApplicationImpl receives the event 
 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED'; however, the NM might 
 not receive this event for a long time, or might never receive it: 
 * For NonAggregatingLogHandler, it waits for 
 YarnConfiguration.NM_LOG_RETAIN_SECONDS, which is 3 * 60 * 60 sec by default, 
 before it schedules deletion of the application logs and sends the event.
 * For LogAggregationService, it might fail (e.g. if the user does not have HDFS 
 write permission), and then it will not send the event.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2527) NPE in ApplicationACLsManager

2014-10-02 Thread Benoy Antony (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoy Antony updated YARN-2527:
---
Attachment: YARN-2527.patch

Thanks for the code, [~zjshen]. 
I have updated the patch based on the comment.

 NPE in ApplicationACLsManager
 -

 Key: YARN-2527
 URL: https://issues.apache.org/jira/browse/YARN-2527
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: Benoy Antony
Assignee: Benoy Antony
 Attachments: YARN-2527.patch, YARN-2527.patch, YARN-2527.patch


 NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error.
 The relevant stacktrace snippet from the ResourceManager logs is as below
 {code}
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101)
 at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
 at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
 at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
 {code}
 This issue was reported by [~miguenther].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2527) NPE in ApplicationACLsManager

2014-10-02 Thread Benoy Antony (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoy Antony updated YARN-2527:
---
Attachment: (was: YARN-2527.patch)

 NPE in ApplicationACLsManager
 -

 Key: YARN-2527
 URL: https://issues.apache.org/jira/browse/YARN-2527
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: Benoy Antony
Assignee: Benoy Antony
 Attachments: YARN-2527.patch, YARN-2527.patch


 NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error.
 The relevant stacktrace snippet from the ResourceManager logs is as below
 {code}
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101)
 at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
 at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
 at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
 {code}
 This issue was reported by [~miguenther].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories

2014-10-02 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156805#comment-14156805
 ] 

Karthik Kambatla commented on YARN-2624:


The patch looks good to me. I would like input from someone more familiar with 
the NM restart code. [~jlowe], [~djp] - can either of you take a look? We would 
like to get this committed soon. 

 Resource Localization fails on a cluster due to existing cache directories
 --

 Key: YARN-2624
 URL: https://issues.apache.org/jira/browse/YARN-2624
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.1
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Priority: Blocker
 Attachments: YARN-2624.001.patch, YARN-2624.001.patch


 We have found that resource localization fails on a cluster with the following 
 error in certain cases.
 {noformat}
 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Failed to download rsrc { { 
 hdfs://blahhostname:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml,
  1412027745352, FILE, null 
 },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING}
 java.io.IOException: Rename cannot overwrite non empty destination directory 
 /data/yarn/nm/filecache/27
   at 
 org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716)
   at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228)
   at 
 org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659)
   at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59)
 {noformat}
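 A minimal sketch reproducing the failure mode in the stack trace above with the 
 same FileContext API; the /tmp paths are placeholders, and the destination is 
 assumed to already exist and be non-empty:
 {code}
 import org.apache.hadoop.fs.FileContext;
 import org.apache.hadoop.fs.Options;
 import org.apache.hadoop.fs.Path;

 public class RenameSketch {
   public static void main(String[] args) throws Exception {
     FileContext lfc = FileContext.getLocalFSFileContext();
     // Even with OVERWRITE, renaming onto a non-empty destination directory is
     // refused with "Rename cannot overwrite non empty destination directory".
     lfc.rename(new Path("/tmp/src"), new Path("/tmp/dst"), Options.Rename.OVERWRITE);
   }
 }
 {code}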



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2254) TestRMWebServicesAppsModification should run against both CS and FS

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156803#comment-14156803
 ] 

Hudson commented on YARN-2254:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #6177 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6177/])
YARN-2254. TestRMWebServicesAppsModification should run against both CS and FS. 
(Zhihai Xu via kasha) (kasha: rev 5e0b49da9caa53814581508e589f3704592cf335)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java


 TestRMWebServicesAppsModification should run against both CS and FS
 ---

 Key: YARN-2254
 URL: https://issues.apache.org/jira/browse/YARN-2254
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Minor
  Labels: test
 Fix For: 2.7.0

 Attachments: YARN-2254.000.patch, YARN-2254.001.patch, 
 YARN-2254.002.patch, YARN-2254.003.patch, YARN-2254.004.patch


 TestRMWebServicesAppsModification skips the tests if the scheduler is not 
 CapacityScheduler.
 Change TestRMWebServicesAppsModification to support both CapacityScheduler 
 and FairScheduler.
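 A minimal sketch of the parameterization pattern such a change typically uses 
 (the class and method names here are illustrative, not the actual test code):
 {code}
 import java.util.Arrays;
 import java.util.Collection;

 import org.junit.Test;
 import org.junit.runner.RunWith;
 import org.junit.runners.Parameterized;
 import org.junit.runners.Parameterized.Parameters;

 @RunWith(Parameterized.class)
 public class SchedulerParamSketch {
   @Parameters
   public static Collection<Object[]> schedulers() {
     return Arrays.asList(new Object[][] {{"CapacityScheduler"}, {"FairScheduler"}});
   }

   private final String schedulerName;

   public SchedulerParamSketch(String schedulerName) {
     this.schedulerName = schedulerName;
   }

   @Test
   public void runsAgainstEachScheduler() {
     // The real test would configure the RM with this scheduler and exercise the
     // web services; this only demonstrates running against both schedulers.
     System.out.println("running against " + schedulerName);
   }
 }
 {code}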



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2527) NPE in ApplicationACLsManager

2014-10-02 Thread Benoy Antony (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoy Antony updated YARN-2527:
---
Attachment: YARN-2527.patch

 NPE in ApplicationACLsManager
 -

 Key: YARN-2527
 URL: https://issues.apache.org/jira/browse/YARN-2527
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: Benoy Antony
Assignee: Benoy Antony
 Attachments: YARN-2527.patch, YARN-2527.patch, YARN-2527.patch


 NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error.
 The relevant stacktrace snippet from the ResourceManager logs is as below
 {code}
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101)
 at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
 at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
 at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
 {code}
 This issue was reported by [~miguenther].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1414) with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs

2014-10-02 Thread Siqi Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156814#comment-14156814
 ] 

Siqi Li commented on YARN-1414:
---

Sure, I will submit a rebased patch shortly.

 with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs
 -

 Key: YARN-1414
 URL: https://issues.apache.org/jira/browse/YARN-1414
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, scheduler
Affects Versions: 2.0.5-alpha
Reporter: Siqi Li
Assignee: Siqi Li
 Fix For: 2.2.0

 Attachments: YARN-1221-subtask.v1.patch.txt, YARN-1221-v2.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories

2014-10-02 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156824#comment-14156824
 ] 

Jason Lowe commented on YARN-2624:
--

Thanks for catching and fixing this, Anubhav!  My apologies for missing this 
scenario in the original JIRA.

+1 lgtm.  Committing this.


 Resource Localization fails on a cluster due to existing cache directories
 --

 Key: YARN-2624
 URL: https://issues.apache.org/jira/browse/YARN-2624
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.1
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Priority: Blocker
 Attachments: YARN-2624.001.patch, YARN-2624.001.patch


 We have found that resource localization fails on a cluster with the following 
 error in certain cases.
 {noformat}
 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Failed to download rsrc { { 
 hdfs://blahhostname:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml,
  1412027745352, FILE, null 
 },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING}
 java.io.IOException: Rename cannot overwrite non empty destination directory 
 /data/yarn/nm/filecache/27
   at 
 org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716)
   at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228)
   at 
 org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659)
   at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories

2014-10-02 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156835#comment-14156835
 ] 

Karthik Kambatla commented on YARN-2624:


Thanks for super-quick turnaround, Jason. 

 Resource Localization fails on a cluster due to existing cache directories
 --

 Key: YARN-2624
 URL: https://issues.apache.org/jira/browse/YARN-2624
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.1
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Priority: Blocker
 Attachments: YARN-2624.001.patch, YARN-2624.001.patch


 We have found that resource localization fails on a cluster with the following 
 error in certain cases.
 {noformat}
 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Failed to download rsrc { { 
 hdfs://blahhostname:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml,
  1412027745352, FILE, null 
 },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING}
 java.io.IOException: Rename cannot overwrite non empty destination directory 
 /data/yarn/nm/filecache/27
   at 
 org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716)
   at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228)
   at 
 org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659)
   at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2635) TestRMRestart fails with FairScheduler

2014-10-02 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156836#comment-14156836
 ] 

Ray Chiang commented on YARN-2635:
--

Looks good to me.  Ran cleanly in my tree.  +1

 TestRMRestart fails with FairScheduler
 --

 Key: YARN-2635
 URL: https://issues.apache.org/jira/browse/YARN-2635
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: YARN-2635-1.patch


 If we change the scheduler from Capacity Scheduler to Fair Scheduler, the 
 TestRMRestart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156841#comment-14156841
 ] 

Hudson commented on YARN-2624:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #6178 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6178/])
YARN-2624. Resource Localization fails on a cluster due to existing cache 
directories. Contributed by Anubhav Dhoot (jlowe: rev 
29f520052e2b02f44979980e446acc0dccd96d54)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java


 Resource Localization fails on a cluster due to existing cache directories
 --

 Key: YARN-2624
 URL: https://issues.apache.org/jira/browse/YARN-2624
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.1
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Priority: Blocker
 Attachments: YARN-2624.001.patch, YARN-2624.001.patch


 We have found that resource localization fails on a cluster with the following 
 error in certain cases.
 {noformat}
 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Failed to download rsrc { { 
 hdfs://blahhostname:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml,
  1412027745352, FILE, null 
 },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING}
 java.io.IOException: Rename cannot overwrite non empty destination directory 
 /data/yarn/nm/filecache/27
   at 
 org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716)
   at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228)
   at 
 org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659)
   at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2638) Let TestRM run with all types of schedulers (FIFO, Capacity, Fair)

2014-10-02 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156867#comment-14156867
 ] 

Ray Chiang commented on YARN-2638:
--

This patch fixes the test for me.  +1

 Let TestRM run with all types of schedulers (FIFO, Capacity, Fair)
 --

 Key: YARN-2638
 URL: https://issues.apache.org/jira/browse/YARN-2638
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: YARN-2638-1.patch


 TestRM fails when using FairScheduler or FifoScheduler. The failures are not 
 shown in trunk because trunk uses the default Capacity Scheduler. We need to 
 let TestRM run with all types of schedulers, to make sure any new change 
 won't break any scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories

2014-10-02 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156869#comment-14156869
 ] 

Anubhav Dhoot commented on YARN-2624:
-

Thanks [~jlowe]!

 Resource Localization fails on a cluster due to existing cache directories
 --

 Key: YARN-2624
 URL: https://issues.apache.org/jira/browse/YARN-2624
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.1
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Priority: Blocker
 Fix For: 2.6.0

 Attachments: YARN-2624.001.patch, YARN-2624.001.patch


 We have found that resource localization fails on a cluster with the following 
 error in certain cases.
 {noformat}
 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Failed to download rsrc { { 
 hdfs://blahhostname:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml,
  1412027745352, FILE, null 
 },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING}
 java.io.IOException: Rename cannot overwrite non empty destination directory 
 /data/yarn/nm/filecache/27
   at 
 org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716)
   at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228)
   at 
 org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659)
   at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2254) TestRMWebServicesAppsModification should run against both CS and FS

2014-10-02 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156888#comment-14156888
 ] 

zhihai xu commented on YARN-2254:
-

Thanks [~kasha] for reviewing and committing the patch.

 TestRMWebServicesAppsModification should run against both CS and FS
 ---

 Key: YARN-2254
 URL: https://issues.apache.org/jira/browse/YARN-2254
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Minor
  Labels: test
 Fix For: 2.7.0

 Attachments: YARN-2254.000.patch, YARN-2254.001.patch, 
 YARN-2254.002.patch, YARN-2254.003.patch, YARN-2254.004.patch


 TestRMWebServicesAppsModification skips the tests if the scheduler is not 
 CapacityScheduler.
 Change TestRMWebServicesAppsModification to support both CapacityScheduler 
 and FairScheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2527) NPE in ApplicationACLsManager

2014-10-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156890#comment-14156890
 ] 

Hadoop QA commented on YARN-2527:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672583/YARN-2527.patch
  against trunk revision 5e0b49d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5238//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5238//console

This message is automatically generated.

 NPE in ApplicationACLsManager
 -

 Key: YARN-2527
 URL: https://issues.apache.org/jira/browse/YARN-2527
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: Benoy Antony
Assignee: Benoy Antony
 Attachments: YARN-2527.patch, YARN-2527.patch, YARN-2527.patch


 NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error.
 The relevant stacktrace snippet from the ResourceManager logs is as below
 {code}
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101)
 at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
 at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
 at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
 {code}
 This issue was reported by [~miguenther].
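
 A generic illustration of the kind of defensive check that keeps a missing 
 ACL context from turning into a 500 (names are illustrative; this is not the 
 actual YARN-2527 patch):
 {code}
 import org.apache.hadoop.security.UserGroupInformation;
 import org.apache.hadoop.yarn.api.records.ApplicationAccessType;
 import org.apache.hadoop.yarn.api.records.ApplicationId;
 import org.apache.hadoop.yarn.server.security.ApplicationACLsManager;

 // Illustrative only; not the committed fix.
 public class AclCheckGuard {
   /** Deny access instead of throwing when the inputs for the check are missing. */
   public static boolean checkAccessSafely(ApplicationACLsManager aclsManager,
       UserGroupInformation callerUGI, ApplicationAccessType accessType,
       String owner, ApplicationId appId) {
     if (aclsManager == null || callerUGI == null || owner == null || appId == null) {
       return false;
     }
     return aclsManager.checkAccess(callerUGI, accessType, owner, appId);
   }
 }
 {code}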



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1414) with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs

2014-10-02 Thread Siqi Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156892#comment-14156892
 ] 

Siqi Li commented on YARN-1414:
---

I just found out that this problem has already been fixed in trunk. I am going 
to close this JIRA.

 with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs
 -

 Key: YARN-1414
 URL: https://issues.apache.org/jira/browse/YARN-1414
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, scheduler
Affects Versions: 2.0.5-alpha
Reporter: Siqi Li
Assignee: Siqi Li
 Fix For: 2.2.0

 Attachments: YARN-1221-subtask.v1.patch.txt, YARN-1221-v2.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2639) TestClientToAMTokens should run with all types of schedulers

2014-10-02 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2639:
--
Attachment: YARN-2639-2.patch

Re-triggering Jenkins.

 TestClientToAMTokens should run with all types of schedulers
 

 Key: YARN-2639
 URL: https://issues.apache.org/jira/browse/YARN-2639
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: YARN-2639-1.patch, YARN-2639-2.patch


 TestClientToAMTokens fails with FairScheduler now. We should let 
 TestClientToAMTokens run with all kinds of schedulers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2180) In-memory backing store for cache manager

2014-10-02 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156905#comment-14156905
 ] 

Karthik Kambatla commented on YARN-2180:


Looks mostly good, but for these minor comments:
# App-checker and the store implementations aren't related:
## the app-checker config should be appended to SHARED_CACHE_PREFIX and 
IN_MEMORY_STORE
## the variable names should be updated accordingly.
## InMemorySCMStore#createAppCheckerService should move to a util class - how 
about changing SharedCacheStructureUtil to SharedCacheUtil and adding this 
method there? 
# Can we create a follow-up blocker sub-task to revisit all the config names 
before we include sharedcache work in a release? 


 In-memory backing store for cache manager
 -

 Key: YARN-2180
 URL: https://issues.apache.org/jira/browse/YARN-2180
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Attachments: YARN-2180-trunk-v1.patch, YARN-2180-trunk-v2.patch, 
 YARN-2180-trunk-v3.patch, YARN-2180-trunk-v4.patch, YARN-2180-trunk-v5.patch, 
 YARN-2180-trunk-v6.patch


 Implement an in-memory backing store for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2635) TestRMRestart fails with FairScheduler

2014-10-02 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156933#comment-14156933
 ] 

Karthik Kambatla commented on YARN-2635:


+1. Committing this. 

 TestRMRestart fails with FairScheduler
 --

 Key: YARN-2635
 URL: https://issues.apache.org/jira/browse/YARN-2635
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: YARN-2635-1.patch


 If we change the scheduler from Capacity Scheduler to Fair Scheduler, the 
 TestRMRestart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2635) TestRMRestart should run with all schedulers

2014-10-02 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2635:
---
Summary: TestRMRestart should run with all schedulers  (was: TestRMRestart 
fails with FairScheduler)

 TestRMRestart should run with all schedulers
 

 Key: YARN-2635
 URL: https://issues.apache.org/jira/browse/YARN-2635
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: YARN-2635-1.patch


 If we change the scheduler from Capacity Scheduler to Fair Scheduler, the 
 TestRMRestart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2638) TestRM should run with all schedulers

2014-10-02 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2638:
---
Summary: TestRM should run with all schedulers  (was: Let TestRM run with 
all types of schedulers (FIFO, Capacity, Fair))

 TestRM should run with all schedulers
 -

 Key: YARN-2638
 URL: https://issues.apache.org/jira/browse/YARN-2638
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: YARN-2638-1.patch


 TestRM fails when using FairScheduler or FifoScheduler. The failures are not 
 shown in trunk because trunk uses the default Capacity Scheduler. We need to 
 let TestRM run with all types of schedulers, to make sure any new change 
 won't break any scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2634) Test failure for TestClientRMTokens

2014-10-02 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He reassigned YARN-2634:
-

Assignee: Jian He

 Test failure for TestClientRMTokens
 ---

 Key: YARN-2634
 URL: https://issues.apache.org/jira/browse/YARN-2634
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Junping Du
Assignee: Jian He
Priority: Blocker

 The test get failed as below:
 {noformat}
 ---
 Test set: org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens
 ---
 Tests run: 6, Failures: 3, Errors: 2, Skipped: 0, Time elapsed: 60.184 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens
 testShortCircuitRenewCancelDifferentHostSamePort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens)
   Time elapsed: 22.693 sec   FAILURE!
 java.lang.AssertionError: expected:<getProxy> but was:<null>
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:743)
 at org.junit.Assert.assertEquals(Assert.java:118)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:319)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelDifferentHostSamePort(TestClientRMTokens.java:272)
 testShortCircuitRenewCancelDifferentHostDifferentPort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens)
   Time elapsed: 20.087 sec   FAILURE!
 java.lang.AssertionError: expected:<getProxy> but was:<null>
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:743)
 at org.junit.Assert.assertEquals(Assert.java:118)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:319)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelDifferentHostDifferentPort(TestClientRMTokens.java:283)
 testShortCircuitRenewCancel(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens)
   Time elapsed: 0.031 sec   ERROR!
 java.lang.NullPointerException: null
 at 
 org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.getRmClient(RMDelegationTokenIdentifier.java:148)
 at 
 org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.renew(RMDelegationTokenIdentifier.java:101)
 at org.apache.hadoop.security.token.Token.renew(Token.java:377)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:309)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancel(TestClientRMTokens.java:241)
 testShortCircuitRenewCancelSameHostDifferentPort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens)
   Time elapsed: 0.061 sec   FAILURE!
 java.lang.AssertionError: expected:<getProxy> but was:<null>
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:743)
 at org.junit.Assert.assertEquals(Assert.java:118)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:319)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelSameHostDifferentPort(TestClientRMTokens.java:261)
 testShortCircuitRenewCancelWildcardAddress(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens)
   Time elapsed: 0.07 sec   ERROR!
 java.lang.NullPointerException: null
 at org.apache.hadoop.net.NetUtils.isLocalAddress(NetUtils.java:684)
 at 
 org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.getRmClient(RMDelegationTokenIdentifier.java:149)
   
   
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2615) ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended fields

2014-10-02 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156951#comment-14156951
 ] 

Jian He commented on YARN-2615:
---

Looks good, only a few minor things:
- In {{ClientToAMTokenIdentifierForTest}}, the code that duplicates overrides 
from {{ClientToAMTokenIdentifier}} may be removed? Similarly for 
{{RMDelegationTokenIdentifierForTest}}.
- This code can be removed:
{code}
byte[] tokenIdentifierContent = token.getIdentifier();
ClientToAMTokenIdentifier tokenIdentifier = new ClientToAMTokenIdentifier();
DataInputBuffer dib = new DataInputBuffer();
dib.reset(tokenIdentifierContent, tokenIdentifierContent.length);
tokenIdentifier.readFields(dib);
{code}
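
For context, the block above is the manual way of decoding a token identifier. 
A sketch of the shorter equivalent, assuming the identifier class is registered 
for its token kind so the standard Token API can resolve it (otherwise the 
manual readFields() decode is still required):
{code}
import java.io.IOException;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.yarn.security.client.ClientToAMTokenIdentifier;

// Sketch only: Token#decodeIdentifier() performs the same readFields() decode
// internally, looking the identifier class up by the token's kind.
public class TokenDecodeSketch {
  static ClientToAMTokenIdentifier decode(Token<ClientToAMTokenIdentifier> token)
      throws IOException {
    return token.decodeIdentifier();
  }
}
{code}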



 ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended 
 fields
 

 Key: YARN-2615
 URL: https://issues.apache.org/jira/browse/YARN-2615
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Junping Du
Assignee: Junping Du
Priority: Blocker
 Attachments: YARN-2615-v2.patch, YARN-2615.patch


 As three TokenIdentifiers got updated in YARN-668, ClientToAMTokenIdentifier 
 and DelegationTokenIdentifier should also be updated in the same way to allow 
 fields to be extended in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2635) TestRMRestart should run with all schedulers

2014-10-02 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156963#comment-14156963
 ] 

Karthik Kambatla commented on YARN-2635:


Just saw YARN-2638 as well. On second thought, it might be better to combine 
the two JIRAs and implement a base class for RM tests that run against all 
schedulers.

Also, schedulerType in these tests should probably be an enum so subclasses 
don't have to know the order.
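
A minimal sketch of the kind of base class being suggested, assuming JUnit 4's 
Parameterized runner (the class name and enum are hypothetical, not an existing 
test utility):
{code}
import java.util.Arrays;
import java.util.Collection;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

// Each RM test subclass runs once per scheduler type; the enum keeps
// subclasses from depending on the order of the parameter list.
@RunWith(Parameterized.class)
public abstract class ParameterizedSchedulerTestBase {

  public enum SchedulerType { CAPACITY, FAIR, FIFO }

  private final SchedulerType schedulerType;

  protected ParameterizedSchedulerTestBase(SchedulerType type) {
    this.schedulerType = type;
  }

  @Parameters
  public static Collection<Object[]> schedulers() {
    return Arrays.asList(new Object[][] {
        { SchedulerType.CAPACITY }, { SchedulerType.FAIR }, { SchedulerType.FIFO } });
  }

  /** Configuration pointing the RM at the scheduler under test. */
  protected Configuration createConfiguration() {
    Configuration conf = new YarnConfiguration();
    switch (schedulerType) {
      case CAPACITY:
        conf.set(YarnConfiguration.RM_SCHEDULER, CapacityScheduler.class.getName());
        break;
      case FAIR:
        conf.set(YarnConfiguration.RM_SCHEDULER, FairScheduler.class.getName());
        break;
      case FIFO:
        conf.set(YarnConfiguration.RM_SCHEDULER, FifoScheduler.class.getName());
        break;
    }
    return conf;
  }
}
{code}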

 TestRMRestart should run with all schedulers
 

 Key: YARN-2635
 URL: https://issues.apache.org/jira/browse/YARN-2635
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: YARN-2635-1.patch


 If we change the scheduler from Capacity Scheduler to Fair Scheduler, the 
 TestRMRestart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2639) TestClientToAMTokens should run with all types of schedulers

2014-10-02 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla resolved YARN-2639.

Resolution: Duplicate

Can we fix this as part of YARN-2635 as well?

 TestClientToAMTokens should run with all types of schedulers
 

 Key: YARN-2639
 URL: https://issues.apache.org/jira/browse/YARN-2639
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: YARN-2639-1.patch, YARN-2639-2.patch


 TestClientToAMTokens fails with FairScheduler now. We should let 
 TestClientToAMTokens run with all kinds of schedulers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2639) TestClientToAMTokens should run with all types of schedulers

2014-10-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157012#comment-14157012
 ] 

Hadoop QA commented on YARN-2639:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672593/YARN-2639-2.patch
  against trunk revision 29f5200.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5239//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5239//console

This message is automatically generated.

 TestClientToAMTokens should run with all types of schedulers
 

 Key: YARN-2639
 URL: https://issues.apache.org/jira/browse/YARN-2639
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: YARN-2639-1.patch, YARN-2639-2.patch


 TestClientToAMTokens fails with FairScheduler now. We should let 
 TestClientToAMTokens run with all kinds of schedulers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected

2014-10-02 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-1198:
--
Attachment: YARN-1198.11.patch

Attaching patch .11. This is based on .10 (nee .7), the preferred approach, 
with a refactoring change to decrease the impact - the HeadroomProvider is 
now limited to just the CapacityScheduler area / FiCaSchedulerApp.  It's 
actually possible to remove the HeadroomProvider altogether in favor of adding 
more members to the scheduler app, but I think it looks better 
factored this way (the functional result would be the same).

 Capacity Scheduler headroom calculation does not work as expected
 -

 Key: YARN-1198
 URL: https://issues.apache.org/jira/browse/YARN-1198
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Craig Welch
 Attachments: YARN-1198.1.patch, YARN-1198.10.patch, 
 YARN-1198.11.patch, YARN-1198.2.patch, YARN-1198.3.patch, YARN-1198.4.patch, 
 YARN-1198.5.patch, YARN-1198.6.patch, YARN-1198.7.patch, YARN-1198.8.patch, 
 YARN-1198.9.patch


 Today, headroom calculation (for the app) takes place only when
 * a new node is added to or removed from the cluster, or
 * a new container is assigned to the application.
 However, there are potentially many situations that are not considered in 
 this calculation:
 * If a container finishes, the headroom for that application will change and 
 the AM should be notified accordingly.
 * If a single user has submitted multiple applications (app1 and app2) to the 
 same queue, then
 ** if app1's container finishes, not only app1's but also app2's AM 
 should be notified about the change in headroom;
 ** similarly, if a container is assigned to either app1 or app2, 
 both AMs should be notified about their headroom;
 ** to simplify the whole communication process, it is ideal to keep headroom 
 per user per LeafQueue so that everyone gets the same picture (apps belonging 
 to the same user and submitted to the same queue).
 * If a new user submits an application to the queue, then all applications 
 submitted by all users in that queue should be notified of the headroom 
 change.
 * Also, today headroom is an absolute number (I think it should be normalized, 
 but that would not be backward compatible).
 * Also, when an admin refreshes the queue, the headroom has to be updated.
 These are all potential bugs in the headroom calculation.
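
 To make the list above concrete, here is a deliberately simplified numeric 
 illustration of the headroom arithmetic being discussed (not the 
 CapacityScheduler's exact formula):
 {code}
 // Simplified illustration only -- not the scheduler's real computation.
 // "Headroom" is roughly how much more this user may still get in this queue,
 // so it must be recomputed when containers finish or other users' usage
 // changes, not only on node add/remove or on a new assignment.
 public class HeadroomIllustration {
   static long headroomMb(long userLimitMb, long queueMaxMb,
       long queueUsedMb, long userUsedMb) {
     long queueRoom = queueMaxMb - queueUsedMb;          // room left in the queue
     long userRoom = userLimitMb - userUsedMb;           // room left under the user limit
     return Math.max(0, Math.min(queueRoom, userRoom));  // never report negative headroom
   }

   public static void main(String[] args) {
     // Example: user limit 8 GB, queue max 20 GB, queue currently using 14 GB,
     // this user using 6 GB  ->  headroom = min(20480-14336, 8192-6144) = 2048 MB.
     System.out.println(headroomMb(8192, 20480, 14336, 6144));  // prints 2048
   }
 }
 {code}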



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected

2014-10-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157067#comment-14157067
 ] 

Hadoop QA commented on YARN-1198:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672614/YARN-1198.11.patch
  against trunk revision a56f3ec.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5240//console

This message is automatically generated.

 Capacity Scheduler headroom calculation does not work as expected
 -

 Key: YARN-1198
 URL: https://issues.apache.org/jira/browse/YARN-1198
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Craig Welch
 Attachments: YARN-1198.1.patch, YARN-1198.10.patch, 
 YARN-1198.11.patch, YARN-1198.2.patch, YARN-1198.3.patch, YARN-1198.4.patch, 
 YARN-1198.5.patch, YARN-1198.6.patch, YARN-1198.7.patch, YARN-1198.8.patch, 
 YARN-1198.9.patch


 Today, headroom calculation (for the app) takes place only when
 * a new node is added to or removed from the cluster, or
 * a new container is assigned to the application.
 However, there are potentially many situations that are not considered in 
 this calculation:
 * If a container finishes, the headroom for that application will change and 
 the AM should be notified accordingly.
 * If a single user has submitted multiple applications (app1 and app2) to the 
 same queue, then
 ** if app1's container finishes, not only app1's but also app2's AM 
 should be notified about the change in headroom;
 ** similarly, if a container is assigned to either app1 or app2, 
 both AMs should be notified about their headroom;
 ** to simplify the whole communication process, it is ideal to keep headroom 
 per user per LeafQueue so that everyone gets the same picture (apps belonging 
 to the same user and submitted to the same queue).
 * If a new user submits an application to the queue, then all applications 
 submitted by all users in that queue should be notified of the headroom 
 change.
 * Also, today headroom is an absolute number (I think it should be normalized, 
 but that would not be backward compatible).
 * Also, when an admin refreshes the queue, the headroom has to be updated.
 These are all potential bugs in the headroom calculation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2634) Test failure for TestClientRMTokens

2014-10-02 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157079#comment-14157079
 ] 

Jian He commented on YARN-2634:
---

[~djp], I took the latest trunk and ran it locally; it actually passes. Would 
you mind checking again? Thanks.

 Test failure for TestClientRMTokens
 ---

 Key: YARN-2634
 URL: https://issues.apache.org/jira/browse/YARN-2634
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Junping Du
Assignee: Jian He
Priority: Blocker

 The test get failed as below:
 {noformat}
 ---
 Test set: org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens
 ---
 Tests run: 6, Failures: 3, Errors: 2, Skipped: 0, Time elapsed: 60.184 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens
 testShortCircuitRenewCancelDifferentHostSamePort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens)
   Time elapsed: 22.693 sec   FAILURE!
 java.lang.AssertionError: expected:<getProxy> but was:<null>
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:743)
 at org.junit.Assert.assertEquals(Assert.java:118)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:319)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelDifferentHostSamePort(TestClientRMTokens.java:272)
 testShortCircuitRenewCancelDifferentHostDifferentPort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens)
   Time elapsed: 20.087 sec   FAILURE!
 java.lang.AssertionError: expected:<getProxy> but was:<null>
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:743)
 at org.junit.Assert.assertEquals(Assert.java:118)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:319)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelDifferentHostDifferentPort(TestClientRMTokens.java:283)
 testShortCircuitRenewCancel(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens)
   Time elapsed: 0.031 sec   ERROR!
 java.lang.NullPointerException: null
 at 
 org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.getRmClient(RMDelegationTokenIdentifier.java:148)
 at 
 org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.renew(RMDelegationTokenIdentifier.java:101)
 at org.apache.hadoop.security.token.Token.renew(Token.java:377)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:309)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancel(TestClientRMTokens.java:241)
 testShortCircuitRenewCancelSameHostDifferentPort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens)
   Time elapsed: 0.061 sec   FAILURE!
 java.lang.AssertionError: expected:<getProxy> but was:<null>
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:743)
 at org.junit.Assert.assertEquals(Assert.java:118)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:319)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelSameHostDifferentPort(TestClientRMTokens.java:261)
 testShortCircuitRenewCancelWildcardAddress(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens)
   Time elapsed: 0.07 sec   ERROR!
 java.lang.NullPointerException: null
 at org.apache.hadoop.net.NetUtils.isLocalAddress(NetUtils.java:684)
 at 
 org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.getRmClient(RMDelegationTokenIdentifier.java:149)
   
   
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2408) Resource Request REST API for YARN

2014-10-02 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle updated YARN-2408:
-
Attachment: YARN-2408-5.patch

 Resource Request REST API for YARN
 --

 Key: YARN-2408
 URL: https://issues.apache.org/jira/browse/YARN-2408
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: webapp
Reporter: Renan DelValle
  Labels: features
 Attachments: YARN-2408-5.patch, YARN-2408.4.patch


 I’m proposing a new REST API for YARN which exposes a snapshot of the 
 Resource Requests that exist inside of the Scheduler. My motivation behind 
 this new feature is to allow external software to monitor the amount of 
 resources being requested to gain more insightful information into cluster 
 usage than is already provided. The API can also be used by external software 
 to detect a starved application and alert the appropriate users and/or sys 
 admin so that the problem may be remedied.
 Here is the proposed API (a JSON counterpart is also available):
 {code:xml}
 <resourceRequests>
   <MB>7680</MB>
   <VCores>7</VCores>
   <appMaster>
     <applicationId>application_1412191664217_0001</applicationId>
     <applicationAttemptId>appattempt_1412191664217_0001_01</applicationAttemptId>
     <queueName>default</queueName>
     <totalMB>6144</totalMB>
     <totalVCores>6</totalVCores>
     <numResourceRequests>3</numResourceRequests>
     <requests>
       <request>
         <MB>1024</MB>
         <VCores>1</VCores>
         <numContainers>6</numContainers>
         <relaxLocality>true</relaxLocality>
         <priority>20</priority>
         <resourceNames>
           <resourceName>localMachine</resourceName>
           <resourceName>/default-rack</resourceName>
           <resourceName>*</resourceName>
         </resourceNames>
       </request>
     </requests>
   </appMaster>
   <appMaster>
   ...
   </appMaster>
 </resourceRequests>
 {code}
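
 The description does not pin down the endpoint path, so the URL in the sketch 
 below is a placeholder; it only illustrates how an external monitor could 
 consume such an XML snapshot:
 {code}
 import java.io.InputStream;
 import java.net.URL;
 import javax.xml.parsers.DocumentBuilder;
 import javax.xml.parsers.DocumentBuilderFactory;
 import org.w3c.dom.Document;
 import org.w3c.dom.NodeList;

 public class ResourceRequestMonitor {
   public static void main(String[] args) throws Exception {
     // Hypothetical endpoint; the JIRA does not specify the final URL.
     URL endpoint = new URL("http://rm-host:8088/ws/v1/cluster/resourceRequests");
     try (InputStream in = endpoint.openStream()) {
       DocumentBuilder builder =
           DocumentBuilderFactory.newInstance().newDocumentBuilder();
       Document doc = builder.parse(in);
       // First <MB> element in document order is the cluster-wide total.
       String totalMb = doc.getElementsByTagName("MB").item(0).getTextContent();
       NodeList appMasters = doc.getElementsByTagName("appMaster");
       System.out.println("Requested MB: " + totalMb
           + " across " + appMasters.getLength() + " AMs");
     }
   }
 }
 {code}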



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2408) Resource Request REST API for YARN

2014-10-02 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle updated YARN-2408:
-
Attachment: (was: YARN-2408-5.patch)

 Resource Request REST API for YARN
 --

 Key: YARN-2408
 URL: https://issues.apache.org/jira/browse/YARN-2408
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: webapp
Reporter: Renan DelValle
  Labels: features

 I’m proposing a new REST API for YARN which exposes a snapshot of the 
 Resource Requests that exist inside of the Scheduler. My motivation behind 
 this new feature is to allow external software to monitor the amount of 
 resources being requested to gain more insightful information into cluster 
 usage than is already provided. The API can also be used by external software 
 to detect a starved application and alert the appropriate users and/or sys 
 admin so that the problem may be remedied.
 Here is the proposed API (a JSON counterpart is also available):
 {code:xml}
 <resourceRequests>
   <MB>7680</MB>
   <VCores>7</VCores>
   <appMaster>
     <applicationId>application_1412191664217_0001</applicationId>
     <applicationAttemptId>appattempt_1412191664217_0001_01</applicationAttemptId>
     <queueName>default</queueName>
     <totalMB>6144</totalMB>
     <totalVCores>6</totalVCores>
     <numResourceRequests>3</numResourceRequests>
     <requests>
       <request>
         <MB>1024</MB>
         <VCores>1</VCores>
         <numContainers>6</numContainers>
         <relaxLocality>true</relaxLocality>
         <priority>20</priority>
         <resourceNames>
           <resourceName>localMachine</resourceName>
           <resourceName>/default-rack</resourceName>
           <resourceName>*</resourceName>
         </resourceNames>
       </request>
     </requests>
   </appMaster>
   <appMaster>
   ...
   </appMaster>
 </resourceRequests>
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2408) Resource Request REST API for YARN

2014-10-02 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle updated YARN-2408:
-
Attachment: (was: YARN-2408.4.patch)

 Resource Request REST API for YARN
 --

 Key: YARN-2408
 URL: https://issues.apache.org/jira/browse/YARN-2408
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: webapp
Reporter: Renan DelValle
  Labels: features

 I’m proposing a new REST API for YARN which exposes a snapshot of the 
 Resource Requests that exist inside of the Scheduler. My motivation behind 
 this new feature is to allow external software to monitor the amount of 
 resources being requested to gain more insightful information into cluster 
 usage than is already provided. The API can also be used by external software 
 to detect a starved application and alert the appropriate users and/or sys 
 admin so that the problem may be remedied.
 Here is the proposed API (a JSON counterpart is also available):
 {code:xml}
 <resourceRequests>
   <MB>7680</MB>
   <VCores>7</VCores>
   <appMaster>
     <applicationId>application_1412191664217_0001</applicationId>
     <applicationAttemptId>appattempt_1412191664217_0001_01</applicationAttemptId>
     <queueName>default</queueName>
     <totalMB>6144</totalMB>
     <totalVCores>6</totalVCores>
     <numResourceRequests>3</numResourceRequests>
     <requests>
       <request>
         <MB>1024</MB>
         <VCores>1</VCores>
         <numContainers>6</numContainers>
         <relaxLocality>true</relaxLocality>
         <priority>20</priority>
         <resourceNames>
           <resourceName>localMachine</resourceName>
           <resourceName>/default-rack</resourceName>
           <resourceName>*</resourceName>
         </resourceNames>
       </request>
     </requests>
   </appMaster>
   <appMaster>
   ...
   </appMaster>
 </resourceRequests>
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2468) Log handling for LRS

2014-10-02 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2468:

Attachment: YARN-2468.10.patch

 Log handling for LRS
 

 Key: YARN-2468
 URL: https://issues.apache.org/jira/browse/YARN-2468
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation, nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2468.1.patch, YARN-2468.10.patch, 
 YARN-2468.2.patch, YARN-2468.3.patch, YARN-2468.3.rebase.2.patch, 
 YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, YARN-2468.4.patch, 
 YARN-2468.5.1.patch, YARN-2468.5.1.patch, YARN-2468.5.2.patch, 
 YARN-2468.5.3.patch, YARN-2468.5.4.patch, YARN-2468.5.patch, 
 YARN-2468.6.1.patch, YARN-2468.6.patch, YARN-2468.7.1.patch, 
 YARN-2468.7.patch, YARN-2468.8.patch, YARN-2468.9.1.patch, YARN-2468.9.patch


 Currently, when application is finished, NM will start to do the log 
 aggregation. But for Long running service applications, this is not ideal. 
 The problems we have are:
 1) LRS applications are expected to run for a long time (weeks, months).
 2) Currently, all the container logs (from one NM) will be written into a 
 single file. The files could become larger and larger.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2468) Log handling for LRS

2014-10-02 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157133#comment-14157133
 ] 

Xuan Gong commented on YARN-2468:
-

The new patch addresses all the comments.

 Log handling for LRS
 

 Key: YARN-2468
 URL: https://issues.apache.org/jira/browse/YARN-2468
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation, nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2468.1.patch, YARN-2468.10.patch, 
 YARN-2468.2.patch, YARN-2468.3.patch, YARN-2468.3.rebase.2.patch, 
 YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, YARN-2468.4.patch, 
 YARN-2468.5.1.patch, YARN-2468.5.1.patch, YARN-2468.5.2.patch, 
 YARN-2468.5.3.patch, YARN-2468.5.4.patch, YARN-2468.5.patch, 
 YARN-2468.6.1.patch, YARN-2468.6.patch, YARN-2468.7.1.patch, 
 YARN-2468.7.patch, YARN-2468.8.patch, YARN-2468.9.1.patch, YARN-2468.9.patch


 Currently, when application is finished, NM will start to do the log 
 aggregation. But for Long running service applications, this is not ideal. 
 The problems we have are:
 1) LRS applications are expected to run for a long time (weeks, months).
 2) Currently, all the container logs (from one NM) will be written into a 
 single file. The files could become larger and larger.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2468) Log handling for LRS

2014-10-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157199#comment-14157199
 ] 

Hadoop QA commented on YARN-2468:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672626/YARN-2468.10.patch
  against trunk revision a56f3ec.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5241//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5241//console

This message is automatically generated.

 Log handling for LRS
 

 Key: YARN-2468
 URL: https://issues.apache.org/jira/browse/YARN-2468
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation, nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2468.1.patch, YARN-2468.10.patch, 
 YARN-2468.2.patch, YARN-2468.3.patch, YARN-2468.3.rebase.2.patch, 
 YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, YARN-2468.4.patch, 
 YARN-2468.5.1.patch, YARN-2468.5.1.patch, YARN-2468.5.2.patch, 
 YARN-2468.5.3.patch, YARN-2468.5.4.patch, YARN-2468.5.patch, 
 YARN-2468.6.1.patch, YARN-2468.6.patch, YARN-2468.7.1.patch, 
 YARN-2468.7.patch, YARN-2468.8.patch, YARN-2468.9.1.patch, YARN-2468.9.patch


 Currently, when application is finished, NM will start to do the log 
 aggregation. But for Long running service applications, this is not ideal. 
 The problems we have are:
 1) LRS applications are expected to run for a long time (weeks, months).
 2) Currently, all the container logs (from one NM) will be written into a 
 single file. The files could become larger and larger.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected

2014-10-02 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-1198:
--
Attachment: YARN-1198.11-with-1857.patch

Patch combining the last .11 with the latest 1857 patch, to make it easy to 
check them out together.  Tests changed/added for both issues are present and 
pass (unchanged).

 Capacity Scheduler headroom calculation does not work as expected
 -

 Key: YARN-1198
 URL: https://issues.apache.org/jira/browse/YARN-1198
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Craig Welch
 Attachments: YARN-1198.1.patch, YARN-1198.10.patch, 
 YARN-1198.11-with-1857.patch, YARN-1198.11.patch, YARN-1198.2.patch, 
 YARN-1198.3.patch, YARN-1198.4.patch, YARN-1198.5.patch, YARN-1198.6.patch, 
 YARN-1198.7.patch, YARN-1198.8.patch, YARN-1198.9.patch


 Today, headroom calculation (for the app) takes place only when
 * a new node is added to or removed from the cluster, or
 * a new container is assigned to the application.
 However, there are potentially many situations that are not considered in 
 this calculation:
 * If a container finishes, the headroom for that application will change and 
 the AM should be notified accordingly.
 * If a single user has submitted multiple applications (app1 and app2) to the 
 same queue, then
 ** if app1's container finishes, not only app1's but also app2's AM 
 should be notified about the change in headroom;
 ** similarly, if a container is assigned to either app1 or app2, 
 both AMs should be notified about their headroom;
 ** to simplify the whole communication process, it is ideal to keep headroom 
 per user per LeafQueue so that everyone gets the same picture (apps belonging 
 to the same user and submitted to the same queue).
 * If a new user submits an application to the queue, then all applications 
 submitted by all users in that queue should be notified of the headroom 
 change.
 * Also, today headroom is an absolute number (I think it should be normalized, 
 but that would not be backward compatible).
 * Also, when an admin refreshes the queue, the headroom has to be updated.
 These are all potential bugs in the headroom calculation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected

2014-10-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157217#comment-14157217
 ] 

Hadoop QA commented on YARN-1198:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12672649/YARN-1198.11-with-1857.patch
  against trunk revision f679ca3.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5243//console

This message is automatically generated.

 Capacity Scheduler headroom calculation does not work as expected
 -

 Key: YARN-1198
 URL: https://issues.apache.org/jira/browse/YARN-1198
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Craig Welch
 Attachments: YARN-1198.1.patch, YARN-1198.10.patch, 
 YARN-1198.11-with-1857.patch, YARN-1198.11.patch, YARN-1198.2.patch, 
 YARN-1198.3.patch, YARN-1198.4.patch, YARN-1198.5.patch, YARN-1198.6.patch, 
 YARN-1198.7.patch, YARN-1198.8.patch, YARN-1198.9.patch


 Today, headroom calculation (for the app) takes place only when
 * a new node is added to or removed from the cluster, or
 * a new container is assigned to the application.
 However, there are potentially many situations that are not considered in 
 this calculation:
 * If a container finishes, the headroom for that application will change and 
 the AM should be notified accordingly.
 * If a single user has submitted multiple applications (app1 and app2) to the 
 same queue, then
 ** if app1's container finishes, not only app1's but also app2's AM 
 should be notified about the change in headroom;
 ** similarly, if a container is assigned to either app1 or app2, 
 both AMs should be notified about their headroom;
 ** to simplify the whole communication process, it is ideal to keep headroom 
 per user per LeafQueue so that everyone gets the same picture (apps belonging 
 to the same user and submitted to the same queue).
 * If a new user submits an application to the queue, then all applications 
 submitted by all users in that queue should be notified of the headroom 
 change.
 * Also, today headroom is an absolute number (I think it should be normalized, 
 but that would not be backward compatible).
 * Also, when an admin refreshes the queue, the headroom has to be updated.
 These are all potential bugs in the headroom calculation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2414) RM web UI: app page will crash if app is failed before any attempt has been created

2014-10-02 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157234#comment-14157234
 ] 

Jason Lowe commented on YARN-2414:
--

Ran into this as well.  Any update, [~leftnoteasy]?

 RM web UI: app page will crash if app is failed before any attempt has been 
 created
 ---

 Key: YARN-2414
 URL: https://issues.apache.org/jira/browse/YARN-2414
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Reporter: Zhijie Shen
Assignee: Wangda Tan

 {code}
 2014-08-12 16:45:13,573 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
 handling URI: /cluster/app/application_1407887030038_0001
 java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
 com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
   at 
 com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
   at 
 com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:84)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
   at 
 com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
   at 
 com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:460)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1191)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
   at 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
   at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
   at 
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
   at 
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
   at org.mortbay.jetty.Server.handle(Server.java:326)
   at 
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
   at 
 org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
   at 
 org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
   at 
 org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
 Caused by: java.lang.NullPointerException
   at 
 

[jira] [Commented] (YARN-2527) NPE in ApplicationACLsManager

2014-10-02 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157245#comment-14157245
 ] 

Zhijie Shen commented on YARN-2527:
---

+1, will commit the patch

 NPE in ApplicationACLsManager
 -

 Key: YARN-2527
 URL: https://issues.apache.org/jira/browse/YARN-2527
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: Benoy Antony
Assignee: Benoy Antony
 Attachments: YARN-2527.patch, YARN-2527.patch, YARN-2527.patch


 An NPE in _ApplicationACLsManager_ can result in a 500 Internal Server Error.
 The relevant stacktrace snippet from the ResourceManager logs is below:
 {code}
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101)
 at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
 at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
 at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
 {code}
 This issue was reported by [~miguenther].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2635) TestRMRestart should run with all schedulers

2014-10-02 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157246#comment-14157246
 ] 

Ray Chiang commented on YARN-2635:
--

Tested TestRM/TestRMRestart/TestClientToAMTokens.  All three tests now pass 
cleanly using FairScheduler.  +1

 TestRMRestart should run with all schedulers
 

 Key: YARN-2635
 URL: https://issues.apache.org/jira/browse/YARN-2635
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: YARN-2635-1.patch, YARN-2635-2.patch


 If we change the scheduler from Capacity Scheduler to Fair Scheduler, the 
 TestRMRestart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.

2014-10-02 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157248#comment-14157248
 ] 

Craig Welch commented on YARN-1680:
---

[~airbots] thanks for your updated WIP patch - I've not looked at it 
extensively yet, but at first glance it looks good to me.  On the original 
patch I noticed that there seems to be a facility for blacklisting racks as 
well as nodes, and I was concerned that that needed to be addressed as well.  
It may be in this patch, but it did not look like it to me.  I do think it can 
be without too much difficulty - I think putting the additions (and removals) 
into sets and then checking to see if the node's rack is in the set during the 
node iteration would do the trick (I may be off here, but that looks like it 
would work to me.)
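
A minimal sketch of the set-based rack check described above, assuming hypothetical names (blacklistedNodes, blacklistedRacks, blacklistedResource) rather than the identifiers in the actual patch:
{code}
import java.util.Collection;
import java.util.Set;

import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode;
import org.apache.hadoop.yarn.util.resource.Resources;

public class BlacklistDeductionSketch {
  // Sums the available resource of nodes that are blacklisted directly or
  // whose rack is blacklisted, so the caller can subtract it from headroom.
  static Resource blacklistedResource(Collection<SchedulerNode> nodes,
      Set<String> blacklistedNodes, Set<String> blacklistedRacks) {
    Resource deduction = Resource.newInstance(0, 0);
    for (SchedulerNode node : nodes) {
      if (blacklistedNodes.contains(node.getNodeName())
          || blacklistedRacks.contains(node.getRackName())) {
        Resources.addTo(deduction, node.getAvailableResource());
      }
    }
    return deduction;
  }
}
{code}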

 availableResources sent to applicationMaster in heartbeat should exclude 
 blacklistedNodes free memory.
 --

 Key: YARN-1680
 URL: https://issues.apache.org/jira/browse/YARN-1680
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0, 2.3.0
 Environment: SuSE 11 SP2 + Hadoop-2.3 
Reporter: Rohith
Assignee: Chen He
 Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, 
 YARN-1680-v2.patch, YARN-1680.patch


 There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster
 slow start is set to 1.
 A job is running whose reducer tasks occupy 29GB of the cluster. One
 NodeManager (NM-4) became unstable (3 map tasks got killed), so MRAppMaster
 blacklisted the unstable NodeManager (NM-4). All reducer tasks are now running
 in the cluster.
 MRAppMaster does not preempt the reducers because, for the reducer preemption
 calculation, headroom still counts the blacklisted node's memory. This makes
 the job hang forever (the ResourceManager does not assign any new containers
 on blacklisted nodes but returns an availableResources value that counts
 cluster free memory).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-10-02 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157265#comment-14157265
 ] 

Karthik Kambatla commented on YARN-1879:


Thanks for working on this, Tsuyoshi. Review comments on the latest patch:
# Are there cases when we don't want RetryCache enabled? IMO, we should always 
use the RetryCache (no harm). If we decide on having a config, the default 
should be true.
# I would set DEFAULT_RM_RETRY_CACHE_EXPIRY_MS to {{10 * 60 * 1000}} instead of 
60, and the corresponding comment (10 mins) can be removed or moved to the 
same line.
# TestApplicationMasterServiceRetryCache has a few lines longer than 80 chars. 


 Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
 ---

 Key: YARN-1879
 URL: https://issues.apache.org/jira/browse/YARN-1879
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Tsuyoshi OZAWA
Priority: Critical
 Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
 YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
 YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
 YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.2-wip.patch, 
 YARN-1879.2.patch, YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, 
 YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2527) NPE in ApplicationACLsManager

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157269#comment-14157269
 ] 

Hudson commented on YARN-2527:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6182 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6182/])
YARN-2527. Fixed the potential NPE in ApplicationACLsManager and added test 
cases for it. Contributed by Benoy Antony. (zjshen: rev 
1c93025a1b370db46e345161dbc15e03f829823f)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/security/ApplicationACLsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/server/security/TestApplicationACLsManager.java


 NPE in ApplicationACLsManager
 -

 Key: YARN-2527
 URL: https://issues.apache.org/jira/browse/YARN-2527
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: Benoy Antony
Assignee: Benoy Antony
 Fix For: 2.6.0

 Attachments: YARN-2527.patch, YARN-2527.patch, YARN-2527.patch


 An NPE in _ApplicationACLsManager_ can result in a 500 Internal Server Error.
 The relevant stacktrace snippet from the ResourceManager logs is below:
 {code}
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101)
 at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
 at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
 at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
 {code}
 This issue was reported by [~miguenther].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.

2014-10-02 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157271#comment-14157271
 ] 

Craig Welch commented on YARN-1680:
---

[~john.jian.fang] I should probably not have referred to the cluster level 
adjustments as blacklisting. What I see is a mechanism (state machine, events, 
including adding and removing nodes and the unhealthy state/the health monitor) 
that, I think, ultimately results in the CapacityScheduler.addNode() and 
removeNode() calls, which modify the clusterResource value. In any case, the 
blacklisting functionality we are addressing here is definitely application 
specific and needs to be addressed at that level. The issue isn't, so far as I 
know, related to any blacklisting/node health issues outside the one in play 
here, as those should work properly for headroom since they adjust the cluster 
resource. The problem is that the application blacklist activity does not 
adjust the cluster resource and was previously not involved in the headroom 
calculation. If cluster level adjustments are not being made for nodes, then 
this blacklisting will result in duplication among applications as they 
independently discover problems with nodes and blacklist them, but that is not 
a new characteristic of the way the system works.

 availableResources sent to applicationMaster in heartbeat should exclude 
 blacklistedNodes free memory.
 --

 Key: YARN-1680
 URL: https://issues.apache.org/jira/browse/YARN-1680
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0, 2.3.0
 Environment: SuSE 11 SP2 + Hadoop-2.3 
Reporter: Rohith
Assignee: Chen He
 Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, 
 YARN-1680-v2.patch, YARN-1680.patch


 There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster
 slow start is set to 1.
 A job is running whose reducer tasks occupy 29GB of the cluster. One
 NodeManager (NM-4) became unstable (3 map tasks got killed), so MRAppMaster
 blacklisted the unstable NodeManager (NM-4). All reducer tasks are now running
 in the cluster.
 MRAppMaster does not preempt the reducers because, for the reducer preemption
 calculation, headroom still counts the blacklisted node's memory. This makes
 the job hang forever (the ResourceManager does not assign any new containers
 on blacklisted nodes but returns an availableResources value that counts
 cluster free memory).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2635) TestRMRestart should run with all schedulers

2014-10-02 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157274#comment-14157274
 ] 

Ray Chiang commented on YARN-2635:
--

Oops, pending Jenkins of course.

 TestRMRestart should run with all schedulers
 

 Key: YARN-2635
 URL: https://issues.apache.org/jira/browse/YARN-2635
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: YARN-2635-1.patch, YARN-2635-2.patch


 If we change the scheduler from Capacity Scheduler to Fair Scheduler, the 
 TestRMRestart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.

2014-10-02 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157286#comment-14157286
 ] 

Craig Welch commented on YARN-1680:
---

This does bring up what I think could be an issue; I'm not sure if it was what 
you were getting at before or not, [~john.jian.fang], but we could well be 
introducing a new bug here unless we are careful. I don't see any connection 
between the scheduler level resource adjustments and the application level 
adjustments, so if an application had problems with a node and blacklisted it, 
and then the cluster did, the resource value of the node would effectively be 
removed from the headroom twice (once when the application adds it to its new 
blacklist reduction, and a second time when the cluster removes its value from 
the clusterResource). I think this could be a problem, and I think it could be 
addressed, but it's something to think about and I don't think the current 
approach addresses this. [~airbots], [~jlowe], thoughts?
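
One way to picture a guard against that double deduction; this is a sketch only, and liveNodes plus the method name are assumptions, not what the patch does:
{code}
import java.util.Map;
import java.util.Set;

import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode;
import org.apache.hadoop.yarn.util.resource.Resources;

public class HeadroomGuardSketch {
  // Only deduct a blacklisted node's resource while the node is still
  // registered with the scheduler; once removeNode() has run, its value is
  // already gone from clusterResource and must not be subtracted again.
  static Resource applyBlacklistDeduction(Resource headroom,
      Set<String> applicationBlacklist, Map<String, SchedulerNode> liveNodes) {
    for (String host : applicationBlacklist) {
      SchedulerNode node = liveNodes.get(host);
      if (node == null) {
        continue; // already removed at the cluster level
      }
      Resources.subtractFrom(headroom, node.getAvailableResource());
    }
    return headroom;
  }
}
{code}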

 availableResources sent to applicationMaster in heartbeat should exclude 
 blacklistedNodes free memory.
 --

 Key: YARN-1680
 URL: https://issues.apache.org/jira/browse/YARN-1680
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0, 2.3.0
 Environment: SuSE 11 SP2 + Hadoop-2.3 
Reporter: Rohith
Assignee: Chen He
 Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, 
 YARN-1680-v2.patch, YARN-1680.patch


 There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster
 slow start is set to 1.
 A job is running whose reducer tasks occupy 29GB of the cluster. One
 NodeManager (NM-4) became unstable (3 map tasks got killed), so MRAppMaster
 blacklisted the unstable NodeManager (NM-4). All reducer tasks are now running
 in the cluster.
 MRAppMaster does not preempt the reducers because, for the reducer preemption
 calculation, headroom still counts the blacklisted node's memory. This makes
 the job hang forever (the ResourceManager does not assign any new containers
 on blacklisted nodes but returns an availableResources value that counts
 cluster free memory).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2468) Log handling for LRS

2014-10-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157291#comment-14157291
 ] 

Hadoop QA commented on YARN-2468:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672626/YARN-2468.10.patch
  against trunk revision f679ca3.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5244//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5244//console

This message is automatically generated.

 Log handling for LRS
 

 Key: YARN-2468
 URL: https://issues.apache.org/jira/browse/YARN-2468
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation, nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2468.1.patch, YARN-2468.10.patch, 
 YARN-2468.2.patch, YARN-2468.3.patch, YARN-2468.3.rebase.2.patch, 
 YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, YARN-2468.4.patch, 
 YARN-2468.5.1.patch, YARN-2468.5.1.patch, YARN-2468.5.2.patch, 
 YARN-2468.5.3.patch, YARN-2468.5.4.patch, YARN-2468.5.patch, 
 YARN-2468.6.1.patch, YARN-2468.6.patch, YARN-2468.7.1.patch, 
 YARN-2468.7.patch, YARN-2468.8.patch, YARN-2468.9.1.patch, YARN-2468.9.patch


 Currently, when application is finished, NM will start to do the log 
 aggregation. But for Long running service applications, this is not ideal. 
 The problems we have are:
 1) LRS applications are expected to run for a long time (weeks, months).
 2) Currently, all the container logs (from one NM) will be written into a 
 single file. The files could become larger and larger.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2635) TestRMRestart should run with all schedulers

2014-10-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157296#comment-14157296
 ] 

Hadoop QA commented on YARN-2635:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672637/YARN-2635-2.patch
  against trunk revision 6ac1051.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5242//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5242//console

This message is automatically generated.

 TestRMRestart should run with all schedulers
 

 Key: YARN-2635
 URL: https://issues.apache.org/jira/browse/YARN-2635
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: YARN-2635-1.patch, YARN-2635-2.patch


 If we change the scheduler from Capacity Scheduler to Fair Scheduler, the 
 TestRMRestart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2598) GHS should show N/A instead of null for the inaccessible information

2014-10-02 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2598:
--
Attachment: YARN-2598.2.patch

Rebase the patch against the latest trunk

 GHS should show N/A instead of null for the inaccessible information
 

 Key: YARN-2598
 URL: https://issues.apache.org/jira/browse/YARN-2598
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2598.1.patch, YARN-2598.2.patch


 When the user doesn't have access to an application, the app attempt
 information is not visible to that user. ClientRMService will output N/A, but
 GHS shows null, which is not user-friendly.
 {code}
 14/09/24 22:07:20 INFO impl.TimelineClientImpl: Timeline service address: 
 http://nn.example.com:8188/ws/v1/timeline/
 14/09/24 22:07:20 INFO client.RMProxy: Connecting to ResourceManager at 
 nn.example.com/240.0.0.11:8050
 14/09/24 22:07:21 INFO client.AHSProxy: Connecting to Application History 
 server at nn.example.com/240.0.0.11:10200
 Application Report : 
   Application-Id : application_1411586934799_0001
   Application-Name : Sleep job
   Application-Type : MAPREDUCE
   User : hrt_qa
   Queue : default
   Start-Time : 1411586956012
   Finish-Time : 1411586989169
   Progress : 100%
   State : FINISHED
   Final-State : SUCCEEDED
   Tracking-URL : null
   RPC Port : -1
   AM Host : null
   Aggregate Resource Allocation : N/A
   Diagnostics : null
 {code}
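
A minimal sketch of the null-to-N/A normalization the summary asks for; the helper and constant name are assumptions, and the actual patch may hook this in differently:
{code}
public final class ReportValues {
  private static final String UNAVAILABLE = "N/A"; // assumed constant name

  static String orUnavailable(String value) {
    return value == null ? UNAVAILABLE : value;
  }

  // e.g. when GHS builds the report for a user without access:
  //   host         -> orUnavailable(host)
  //   trackingUrl  -> orUnavailable(trackingUrl)
  //   diagnostics  -> orUnavailable(diagnostics)
}
{code}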



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-10-02 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157302#comment-14157302
 ] 

Steve Loughran commented on YARN-913:
-

Failing test is still the (believed unrelated) 
Running 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
Tests run: 11, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 379.565 sec
<<< FAILURE! - in 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
testDSRestartWithPreviousRunningContainers(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
  Time elapsed: 38.715 sec  <<< FAILURE!
java.lang.AssertionError: client failed
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSRestartWithPreviousRunningContainers(TestDistributedShell.java:319)

 Add a way to register long-lived services in a YARN cluster
 ---

 Key: YARN-913
 URL: https://issues.apache.org/jira/browse/YARN-913
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: api, resourcemanager
Affects Versions: 2.5.0, 2.4.1
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
 YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
 YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, 
 YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, 
 YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, 
 YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, 
 YARN-913-016.patch, yarnregistry.pdf, yarnregistry.tla


 In a YARN cluster you can't predict where services will come up -or on what 
 ports. The services need to work those things out as they come up and then 
 publish them somewhere.
 Applications need to be able to find the service instance they are to bond to 
 -and not any others in the cluster.
 Some kind of service registry -in the RM, in ZK, could do this. If the RM 
 held the write access to the ZK nodes, it would be more secure than having 
 apps register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2628) Capacity scheduler with DominantResourceCalculator carries out reservation even though slots are free

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157311#comment-14157311
 ] 

Hudson commented on YARN-2628:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #6183 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6183/])
YARN-2628. Capacity scheduler with DominantResourceCalculator carries out 
reservation even though slots are free. Contributed by Varun Vasudev (jianhe: 
rev 054f28552687e9b9859c0126e16a2066e20ead3f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* hadoop-yarn-project/CHANGES.txt


 Capacity scheduler with DominantResourceCalculator carries out reservation 
 even though slots are free
 -

 Key: YARN-2628
 URL: https://issues.apache.org/jira/browse/YARN-2628
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.5.1
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.6.0

 Attachments: apache-yarn-2628.0.patch, apache-yarn-2628.1.patch


 We've noticed that if you run the CapacityScheduler with the 
 DominantResourceCalculator, sometimes apps will end up with containers in a 
 reserved state even though free slots are available.
 The root cause seems to be this piece of code from CapacityScheduler.java -
 {noformat}
 // Try to schedule more if there are no reservations to fulfill
 if (node.getReservedContainer() == null) {
   if (Resources.greaterThanOrEqual(calculator, getClusterResource(),
       node.getAvailableResource(), minimumAllocation)) {
     if (LOG.isDebugEnabled()) {
       LOG.debug("Trying to schedule on node: " + node.getNodeName() +
           ", available: " + node.getAvailableResource());
     }
     root.assignContainers(clusterResource, node, false);
   }
 } else {
   LOG.info("Skipping scheduling since node " + node.getNodeID() +
       " is reserved by application " +
       node.getReservedContainer().getContainerId().getApplicationAttemptId());
 }
 {noformat}
 The code is meant to check if a node has any slots available for containers.
 Since it uses the greaterThanOrEqual function, we end up in a situation where
 greaterThanOrEqual returns true even though we may not have enough CPU or
 memory to actually run the container.
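
To illustrate the distinction (not necessarily the committed fix): with the DominantResourceCalculator, greaterThanOrEqual compares dominant shares, so a node with plenty of memory but zero vcores can still compare as "greater or equal" to the minimum allocation, while a per-dimension check such as Resources.fitsIn does not:
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class ReservationCheckDemo {
  public static void main(String[] args) {
    Resource available = Resource.newInstance(8192, 0); // memory left, no vcores
    Resource minimumAllocation = Resource.newInstance(1024, 1);

    // fitsIn checks memory and vcores independently, so this prints false,
    // whereas the dominant-share comparison can still report "enough".
    System.out.println(Resources.fitsIn(minimumAllocation, available));
  }
}
{code}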



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1414) with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs

2014-10-02 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157336#comment-14157336
 ] 

Sandy Ryza commented on YARN-1414:
--

Awesome

 with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs
 -

 Key: YARN-1414
 URL: https://issues.apache.org/jira/browse/YARN-1414
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, scheduler
Affects Versions: 2.0.5-alpha
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: YARN-1221-subtask.v1.patch.txt, YARN-1221-v2.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2598) GHS should show N/A instead of null for the inaccessible information

2014-10-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157349#comment-14157349
 ] 

Hadoop QA commented on YARN-2598:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672667/YARN-2598.2.patch
  against trunk revision 054f285.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5245//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5245//console

This message is automatically generated.

 GHS should show N/A instead of null for the inaccessible information
 

 Key: YARN-2598
 URL: https://issues.apache.org/jira/browse/YARN-2598
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2598.1.patch, YARN-2598.2.patch


 When the user doesn't have access to an application, the app attempt
 information is not visible to that user. ClientRMService will output N/A, but
 GHS shows null, which is not user-friendly.
 {code}
 14/09/24 22:07:20 INFO impl.TimelineClientImpl: Timeline service address: 
 http://nn.example.com:8188/ws/v1/timeline/
 14/09/24 22:07:20 INFO client.RMProxy: Connecting to ResourceManager at 
 nn.example.com/240.0.0.11:8050
 14/09/24 22:07:21 INFO client.AHSProxy: Connecting to Application History 
 server at nn.example.com/240.0.0.11:10200
 Application Report : 
   Application-Id : application_1411586934799_0001
   Application-Name : Sleep job
   Application-Type : MAPREDUCE
   User : hrt_qa
   Queue : default
   Start-Time : 1411586956012
   Finish-Time : 1411586989169
   Progress : 100%
   State : FINISHED
   Final-State : SUCCEEDED
   Tracking-URL : null
   RPC Port : -1
   AM Host : null
   Aggregate Resource Allocation : N/A
   Diagnostics : null
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2527) NPE in ApplicationACLsManager

2014-10-02 Thread Benoy Antony (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157431#comment-14157431
 ] 

Benoy Antony commented on YARN-2527:


Thanks a lot, [~zjshen].

 NPE in ApplicationACLsManager
 -

 Key: YARN-2527
 URL: https://issues.apache.org/jira/browse/YARN-2527
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: Benoy Antony
Assignee: Benoy Antony
 Fix For: 2.6.0

 Attachments: YARN-2527.patch, YARN-2527.patch, YARN-2527.patch


 An NPE in _ApplicationACLsManager_ can result in a 500 Internal Server Error.
 The relevant stacktrace snippet from the ResourceManager logs is below:
 {code}
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101)
 at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
 at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
 at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
 {code}
 This issue was reported by [~miguenther].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2562) ContainerId@toString() is unreadable for epoch 0 after YARN-2182

2014-10-02 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2562:
-
Attachment: YARN-2562.5-4.patch

 ContainerId@toString() is unreadable for epoch 0 after YARN-2182
 -

 Key: YARN-2562
 URL: https://issues.apache.org/jira/browse/YARN-2562
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Tsuyoshi OZAWA
Priority: Critical
 Attachments: YARN-2562.1.patch, YARN-2562.2.patch, YARN-2562.3.patch, 
 YARN-2562.4.patch, YARN-2562.5-2.patch, YARN-2562.5-4.patch, YARN-2562.5.patch


 ContainerID string format is unreadable for RMs that restarted at least once
 (epoch > 0) after YARN-2182, e.g.
 container_1410901177871_0001_01_05_17.
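
One readable format, shown only as an illustration (the exact format chosen in the patch may differ): keep the pre-YARN-2182 string when the epoch is 0 and add an explicit epoch marker otherwise, instead of padding extra digits into the id field:
{code}
// Hypothetical formatting helper; the field widths mirror the existing
// ContainerId string, but the epoch handling shown here is an assumption.
static String toString(long clusterTimestamp, int appId, int attemptId,
    long containerId, int epoch) {
  StringBuilder sb = new StringBuilder("container_");
  if (epoch > 0) {
    sb.append("e").append(epoch).append("_");
  }
  sb.append(clusterTimestamp).append("_");
  sb.append(String.format("%04d", appId)).append("_");
  sb.append(String.format("%02d", attemptId)).append("_");
  sb.append(String.format("%06d", containerId));
  return sb.toString();
}
{code}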



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-556) RM Restart phase 2 - Work preserving restart

2014-10-02 Thread Santosh Marella (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157461#comment-14157461
 ] 

Santosh Marella commented on YARN-556:
--

Referencing YARN-2476 here to ensure the specific scenario mentioned there is 
fixed as part of this JIRA.

 RM Restart phase 2 - Work preserving restart
 

 Key: YARN-556
 URL: https://issues.apache.org/jira/browse/YARN-556
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Bikas Saha
 Attachments: Work Preserving RM Restart.pdf, 
 WorkPreservingRestartPrototype.001.patch, YARN-1372.prelim.patch


 YARN-128 covered storing the state needed for the RM to recover critical 
 information. This umbrella jira will track changes needed to recover the 
 running state of the cluster so that work can be preserved across RM restarts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2562) ContainerId@toString() is unreadable for epoch 0 after YARN-2182

2014-10-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157490#comment-14157490
 ] 

Hadoop QA commented on YARN-2562:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672691/YARN-2562.5-4.patch
  against trunk revision 054f285.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5246//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5246//console

This message is automatically generated.

 ContainerId@toString() is unreadable for epoch 0 after YARN-2182
 -

 Key: YARN-2562
 URL: https://issues.apache.org/jira/browse/YARN-2562
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Tsuyoshi OZAWA
Priority: Critical
 Attachments: YARN-2562.1.patch, YARN-2562.2.patch, YARN-2562.3.patch, 
 YARN-2562.4.patch, YARN-2562.5-2.patch, YARN-2562.5-4.patch, YARN-2562.5.patch


 ContainerID string format is unreadable for RMs that restarted at least once
 (epoch > 0) after YARN-2182, e.g.
 container_1410901177871_0001_01_05_17.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2635) TestRMRestart should run with all schedulers

2014-10-02 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157515#comment-14157515
 ] 

Karthik Kambatla commented on YARN-2635:


By the way, these tests take a long time to run. Do we want to run against all 
three schedulers? Or, would it be enough to run against CS and FS?

 TestRMRestart should run with all schedulers
 

 Key: YARN-2635
 URL: https://issues.apache.org/jira/browse/YARN-2635
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: YARN-2635-1.patch, YARN-2635-2.patch, yarn-2635-3.patch


 If we change the scheduler from Capacity Scheduler to Fair Scheduler, the 
 TestRMRestart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected

2014-10-02 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157524#comment-14157524
 ] 

Craig Welch commented on YARN-1198:
---

FYI, it's not possible to call getAndCalculateHeadroom because nothing can 
synchronize on the queue during the allocation call without deadlocking - this 
is why it's necessary to break out the headroom the way it is here and store 
some items (such as the LeafQueue.User, which comes from the user manager and 
syncs on the queue) to avoid any synchronization on the queue itself during the 
final headroom calculation in the allocate/getHeadroom step. It's not a bad 
thing to do anyway, since it reduces the number of operations (somewhat) in that 
final headroom calculation - but it is also why we can't just call 
getAndCalculateHeadroom as such (unchanged) in allocate().
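
A rough sketch of that "capture under the queue lock, compute later" split; the names are assumptions, and the real patch stores the LeafQueue.User and related values rather than this simplified snapshot:
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

class HeadroomSnapshot {
  private final Resource queueCurrentLimit; // captured while holding the queue lock
  private final Resource userConsumed;      // captured from the user object

  HeadroomSnapshot(Resource queueCurrentLimit, Resource userConsumed) {
    this.queueCurrentLimit = queueCurrentLimit;
    this.userConsumed = userConsumed;
  }

  // Called from allocate()/getHeadroom() without touching the queue.
  Resource getHeadroom() {
    Resource headroom = Resources.subtract(queueCurrentLimit, userConsumed);
    if (headroom.getMemory() < 0) {
      headroom.setMemory(0); // don't report negative headroom
    }
    if (headroom.getVirtualCores() < 0) {
      headroom.setVirtualCores(0);
    }
    return headroom;
  }
}
{code}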

 Capacity Scheduler headroom calculation does not work as expected
 -

 Key: YARN-1198
 URL: https://issues.apache.org/jira/browse/YARN-1198
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Craig Welch
 Attachments: YARN-1198.1.patch, YARN-1198.10.patch, 
 YARN-1198.11-with-1857.patch, YARN-1198.11.patch, YARN-1198.2.patch, 
 YARN-1198.3.patch, YARN-1198.4.patch, YARN-1198.5.patch, YARN-1198.6.patch, 
 YARN-1198.7.patch, YARN-1198.8.patch, YARN-1198.9.patch


 Today the headroom calculation (for the app) takes place only when
 * a new node is added to / removed from the cluster
 * a new container is assigned to the application.
 However, there are potentially a lot of situations that are not considered in
 this calculation:
 * If a container finishes, then the headroom for that application changes and
 the AM should be notified accordingly.
 * If a single user has submitted multiple applications (app1 and app2) to the
 same queue, then
 ** if app1's container finishes, not only app1's but also app2's AM should be
 notified about the change in headroom;
 ** similarly, if a container is assigned to either application app1/app2, both
 AMs should be notified about their headroom;
 ** to simplify the whole communication process, it is ideal to keep headroom
 per user per LeafQueue so that everyone gets the same picture (apps belonging
 to the same user and submitted to the same queue).
 * If a new user submits an application to the queue, then all applications
 submitted by all users in that queue should be notified of the headroom
 change.
 * Also, today headroom is an absolute number (I think it should be normalized,
 but then this is not going to be backward compatible..).
 * Also, when an admin user refreshes a queue, the headroom has to be updated.
 These are all potential bugs in the headroom calculation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2641) improve node decommission latency in RM.

2014-10-02 Thread zhihai xu (JIRA)
zhihai xu created YARN-2641:
---

 Summary: improve node decommission latency in RM.
 Key: YARN-2641
 URL: https://issues.apache.org/jira/browse/YARN-2641
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu


Improve node decommission latency in the RM.
Currently, node decommission only happens after the RM receives a nodeHeartbeat
from the NodeManager. The node heartbeat interval is configurable; the default
value is 1 second.
It would be better to do the decommission during the RM refresh
(NodesListManager) instead of in nodeHeartbeat (ResourceTrackerService).
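
A hedged sketch of that idea; isExcluded stands in for the exclude-list check NodesListManager already performs, and while RMNodeEvent and RMNodeEventType.DECOMMISSION are existing YARN classes, the wiring shown here is an assumption:
{code}
// On refreshNodes(), decommission matching active nodes right away rather
// than waiting for their next heartbeat.
for (RMNode node : rmContext.getRMNodes().values()) {
  if (isExcluded(node.getHostName())) {
    rmContext.getDispatcher().getEventHandler().handle(
        new RMNodeEvent(node.getNodeID(), RMNodeEventType.DECOMMISSION));
  }
}
{code}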



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2640) TestDirectoryCollection.testCreateDirectories failed

2014-10-02 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157570#comment-14157570
 ] 

Jun Gong commented on YARN-2640:


[~ozawa], thank you for telling me. Closing it now.

 TestDirectoryCollection.testCreateDirectories failed
 

 Key: YARN-2640
 URL: https://issues.apache.org/jira/browse/YARN-2640
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Jun Gong
Assignee: Jun Gong
 Attachments: YARN-2640.2.patch, YARN-2640.patch


 When running the test with mvn test -Dtest=TestDirectoryCollection, it failed:
 {code}
 Running org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection
 Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.538 sec
 <<< FAILURE! - in
 org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection
 testCreateDirectories(org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection)
   Time elapsed: 0.969 sec  <<< FAILURE!
 java.lang.AssertionError: local dir parent not created with proper
 permissions expected:<rwxr-xr-x> but was:<rwxrwxr-x>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at 
 org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection.testCreateDirectories(TestDirectoryCollection.java:104)
 {code}
 I found it was because testDiskSpaceUtilizationLimit ran before
 testCreateDirectories, so the directory dirA had already been created by
 testDiskSpaceUtilizationLimit. When testCreateDirectories tried to create dirA
 with the specified permission, it found that dirA was already there and did
 nothing.
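
One way to remove that ordering dependency, sketched under the assumption that cleaning the shared test root between tests is acceptable (the actual patch may instead use distinct directories per test):
{code}
import java.io.File;

import org.apache.hadoop.fs.FileUtil;
import org.junit.After;
import org.junit.Before;

public class TestDirectoryCollectionIsolationSketch {
  private final File testDir = new File("target", "TestDirectoryCollection");

  @Before
  public void setUp() {
    FileUtil.fullyDelete(testDir); // start every test from an empty directory
    testDir.mkdirs();
  }

  @After
  public void tearDown() {
    FileUtil.fullyDelete(testDir);
  }
}
{code}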



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2612) Some completed containers are not reported to NM

2014-10-02 Thread Jun Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Gong resolved YARN-2612.

Resolution: Duplicate

 Some completed containers are not reported to NM
 

 Key: YARN-2612
 URL: https://issues.apache.org/jira/browse/YARN-2612
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Jun Gong
 Fix For: 2.6.0


 We are testing RM work-preserving restart and found the following logs when
 we ran a simple MapReduce task (pi). Some completed containers that had already
 been pulled by the AM were never reported back to the NM, so the NM kept
 reporting the completed containers after the AM had finished.
 {code}
 2014-09-26 17:00:42,228 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 2014-09-26 17:00:42,228 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 2014-09-26 17:00:43,230 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 2014-09-26 17:00:43,230 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 2014-09-26 17:00:44,233 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 2014-09-26 17:00:44,233 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 {code}
 In YARN-1372, the NM reports completed containers to the RM until it gets an
 ACK from the RM. If the AM does not call allocate, which means the AM does not
 ack the RM, the RM will not ack the NM. We ([~chenchun]) have observed these
 two cases when running the MapReduce task 'pi':
 1) The RM sends completed containers to the AM. After receiving them, the AM
 thinks it has done its work and does not need more resources, so it does not
 call allocate.
 2) When the AM finishes, it cannot ack the RM because the AM itself has not
 finished yet.
 We think that when RMAppAttempt calls BaseFinalTransition, the AppAttempt has
 finished, and the RM could then send this AppAttempt's completed containers to
 the NM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2635) TestRMRestart should run with all schedulers

2014-10-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157579#comment-14157579
 ] 

Hadoop QA commented on YARN-2635:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672709/yarn-2635-3.patch
  against trunk revision 054f285.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5247//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5247//console

This message is automatically generated.

 TestRMRestart should run with all schedulers
 

 Key: YARN-2635
 URL: https://issues.apache.org/jira/browse/YARN-2635
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: YARN-2635-1.patch, YARN-2635-2.patch, yarn-2635-3.patch


 If we change the scheduler from Capacity Scheduler to Fair Scheduler, the 
 TestRMRestart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

