[jira] [Commented] (YARN-2241) ZKRMStateStore: On startup, show nicer messages when znodes already exist

2014-07-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049665#comment-14049665
 ] 

Hadoop QA commented on YARN-2241:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653540/YARN-2241.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4174//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4174//console

This message is automatically generated.

 ZKRMStateStore: On startup, show nicer messages when znodes already exist
 -

 Key: YARN-2241
 URL: https://issues.apache.org/jira/browse/YARN-2241
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: Robert Kanter
Assignee: Robert Kanter
Priority: Minor
 Attachments: YARN-2241.patch, YARN-2241.patch


 When using the ZKRMStateStore, if you restart the RM, you get a bunch of 
 stack traces with messages like 
 {{org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
 NodeExists for /rmstore}}.  This is expected as these nodes already exist 
 from before.  We should catch these and print nicer messages.
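 A minimal sketch of the kind of handling the summary suggests (illustrative only, 
 not the attached patch; it assumes a ZooKeeper handle {{zk}} and a logger {{LOG}} 
 are in scope, and leaves other KeeperExceptions to propagate as before):
 {code}
 try {
   zk.create("/rmstore", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE,
       CreateMode.PERSISTENT);
 } catch (KeeperException.NodeExistsException e) {
   // Expected on RM restart: the znode survives from the previous run, so log
   // a short informational message instead of dumping the full stack trace.
   LOG.info("znode /rmstore already exists, reusing existing state root");
 }
 {code}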



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2139) Add support for disk IO isolation/scheduling for containers

2014-07-02 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049758#comment-14049758
 ] 

Steve Loughran commented on YARN-2139:
--

Looks like a good first draft.
# Unless this does address HDFS, call out that this is local disk IO.
# This is really disk IO bandwidth, so it should use an option, like 
vlocaldiskIObandwidth. This will avoid confusion (make clear it's not HDFS), and 
add scope for the addition of future options: IOPS and actual allocation of 
entire disks to containers.
# What's the testability of this feature?

 Add support for disk IO isolation/scheduling for containers
 ---

 Key: YARN-2139
 URL: https://issues.apache.org/jira/browse/YARN-2139
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: Disk_IO_Scheduling_Design_1.pdf






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2246) Job History Link in RM UI is redirecting to the URL which contains Job Id twice

2014-07-02 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2246:
--

Component/s: webapp

 Job History Link in RM UI is redirecting to the URL which contains Job Id 
 twice
 ---

 Key: YARN-2246
 URL: https://issues.apache.org/jira/browse/YARN-2246
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 3.0.0, 0.23.11, 2.5.0
Reporter: Devaraj K
Assignee: Devaraj K
 Attachments: MAPREDUCE-4064-1.patch, MAPREDUCE-4064.patch


 {code:xml}
 http://xx.x.x.x:19888/jobhistory/job/job_1332435449546_0001/jobhistory/job/job_1332435449546_0001
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Moved] (YARN-2246) Job History Link in RM UI is redirecting to the URL which contains Job Id twice

2014-07-02 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen moved MAPREDUCE-4064 to YARN-2246:
--

  Component/s: (was: mrv2)
Affects Version/s: (was: 0.23.1)
   2.5.0
   0.23.11
   3.0.0
  Key: YARN-2246  (was: MAPREDUCE-4064)
  Project: Hadoop YARN  (was: Hadoop Map/Reduce)

 Job History Link in RM UI is redirecting to the URL which contains Job Id 
 twice
 ---

 Key: YARN-2246
 URL: https://issues.apache.org/jira/browse/YARN-2246
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 0.23.11, 2.5.0
Reporter: Devaraj K
Assignee: Devaraj K
 Attachments: MAPREDUCE-4064-1.patch, MAPREDUCE-4064.patch


 {code:xml}
 http://xx.x.x.x:19888/jobhistory/job/job_1332435449546_0001/jobhistory/job/job_1332435449546_0001
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2246) Job History Link in RM UI is redirecting to the URL which contains Job Id twice

2014-07-02 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049813#comment-14049813
 ] 

Zhijie Shen commented on YARN-2246:
---

Moved the ticket to YARN, as the root cause sounds like a YARN issue.

 Job History Link in RM UI is redirecting to the URL which contains Job Id 
 twice
 ---

 Key: YARN-2246
 URL: https://issues.apache.org/jira/browse/YARN-2246
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 3.0.0, 0.23.11, 2.5.0
Reporter: Devaraj K
Assignee: Devaraj K
 Attachments: MAPREDUCE-4064-1.patch, MAPREDUCE-4064.patch


 {code:xml}
 http://xx.x.x.x:19888/jobhistory/job/job_1332435449546_0001/jobhistory/job/job_1332435449546_0001
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2233) Implement web services to create, renew and cancel delegation tokens

2014-07-02 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2233:


Attachment: apache-yarn-2233.1.patch

{quote}
1.

bq. It should be noted that when cancelling a token, the token to be cancelled 
is specified by setting a header.

Any reason for specifying the token in the header? If there's something 
non-intuitive, maybe we should have some in-code comments for other developers?
{quote}

I've added comments to the code explaining why this is. Jetty doesn't allow 
request bodies for DELETE methods.
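
For illustration, with the Jersey test client used elsewhere in the patch, a cancel 
request might look like the sketch below. The header name and the surrounding test 
setup are my assumptions here, not taken from the patch:
{code}
// DELETE carries no request body (Jetty rejects it), so the token string
// travels in a request header instead.
ClientResponse response = resource().path("ws").path("v1").path("cluster")
    .path("delegation-token")
    .header("Hadoop-YARN-RM-Delegation-Token", tokenString) // assumed header name
    .accept(MediaType.APPLICATION_JSON)
    .delete(ClientResponse.class);
assertEquals(Status.OK, response.getClientResponseStatus());
{code}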

{quote}
2. The RPC get delegation token API doesn't have these fields, but they seem 
nice to have. We may want to file a Jira.
{noformat}
+long currentExpiration = ident.getIssueDate() + tokenRenewInterval;
+long maxValidity = ident.getMaxDate();
{noformat}
{quote}

Fixed this. I've left the fields out for now to match the RPC response. I'll 
file tickets to add the information to both interfaces.

{quote}
3. Is it possible to reuse KerberosTestUtils in hadoop-auth?
{quote}

I missed this. hadoop-auth doesn't export test jars for us to use. I've changed 
the pom.xml to start generating test-jars for hadoop-auth and used 
KerberosTestUtils from there.

{quote}
4. Is this supposed to test invalid request body? It doesn't look like the 
invalid body construction in the later tests.
{noformat}
+response =
+resource().path(ws).path(v1).path(cluster)
+  .path("delegation-token").accept(contentType)
+  .entity(dtoken, mediaType).post(ClientResponse.class);
+assertEquals(Status.BAD_REQUEST, response.getClientResponseStatus());
{noformat}
{quote}

This is actually a test with the renewer missing from the request body, hence 
the BAD_REQUEST.

{quote}
1. No need of == true.

{noformat}
+if (usePrincipal == true) {
{noformat}

Similarly,
{noformat}
+if (KerberosAuthenticationHandler.TYPE.equals(authType) == false) {
{noformat}
{quote}

Fixed.
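
For reference, the cleaned-up conditions presumably now read like this (same logic, 
just without the redundant boolean comparisons):
{code}
if (usePrincipal) {
  // ...
}

if (!KerberosAuthenticationHandler.TYPE.equals(authType)) {
  // ...
}
{code}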

{quote}
2. If I remember it correctly, callerUGI.doAs will throw 
UndeclaredThrowableException, which wraps the real raised exception. However, 
UndeclaredThrowableException is an RE, so this code cannot capture it.
{noformat}
+try {
+  resp =
+  callerUGI
+.doAs(new PrivilegedExceptionAction<GetDelegationTokenResponse>() {
+  @Override
+  public GetDelegationTokenResponse run() throws IOException,
+  YarnException {
+GetDelegationTokenRequest createReq =
+GetDelegationTokenRequest.newInstance(renewer);
+return rm.getClientRMService().getDelegationToken(createReq);
+  }
+});
+} catch (Exception e) {
+  LOG.info("Create delegation token request failed", e);
+  throw e;
+}
{noformat}
{quote}

I'm unsure about this. RE is a sub-class of Exception. Why won't this code work?
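
For reference, the disagreement seems to be about recovering the original cause 
rather than about whether the catch fires: UndeclaredThrowableException is a 
RuntimeException, so {{catch (Exception e)}} does capture it, but the wrapped 
YarnException is only reachable via {{getCause()}}. A sketch of unwrapping it 
(illustrative only, not the patch; {{action}} stands for the anonymous 
PrivilegedExceptionAction above):
{code}
try {
  resp = callerUGI.doAs(action);
} catch (UndeclaredThrowableException ute) {
  // java.lang.reflect.UndeclaredThrowableException wraps the checked exception
  // thrown inside run(); surface that cause instead of the wrapper.
  Throwable cause = ute.getCause();
  LOG.info("Create delegation token request failed", cause);
  throw new IOException(cause);
}
{code}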

{quote}
3. Can't we simply return respToken? The framework should generate the OK status 
automatically, right?
{noformat}
+return Response.status(Status.OK).entity(respToken).build();
{noformat}
{quote}

There are a few cases where we need to send a FORBIDDEN response back and the 
GenericExceptionHandler doesn't return FORBIDDEN responses.

{quote}
4. You can call tk.decodeIdentifier directly.
{noformat}
+RMDelegationTokenIdentifier ident = new RMDelegationTokenIdentifier();
+ByteArrayInputStream buf = new ByteArrayInputStream(tk.getIdentifier());
+DataInputStream in = new DataInputStream(buf);
+ident.readFields(in);
{noformat}
{quote}

Fixed. Thanks for this, cleaned up a bunch of boilerplate code.
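
With Token#decodeIdentifier the boilerplate above collapses to a single call 
(sketch, assuming {{tk}} is a {{Token<RMDelegationTokenIdentifier>}}):
{code}
// decodeIdentifier() reads the identifier bytes into the right identifier type.
RMDelegationTokenIdentifier ident = tk.decodeIdentifier();
{code}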

 Implement web services to create, renew and cancel delegation tokens
 

 Key: YARN-2233
 URL: https://issues.apache.org/jira/browse/YARN-2233
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Priority: Blocker
 Attachments: apache-yarn-2233.0.patch, apache-yarn-2233.1.patch


 Implement functionality to create, renew and cancel delegation tokens.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-2243) Order of arguments for Preconditions.checkNotNull() is wrong in SchedulerApplicationAttempt ctor

2014-07-02 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K reassigned YARN-2243:
---

Assignee: Devaraj K

Good catch [~yuzhih...@gmail.com].

 Order of arguments for Preconditions.checkNotNull() is wrong in 
 SchedulerApplicationAttempt ctor
 

 Key: YARN-2243
 URL: https://issues.apache.org/jira/browse/YARN-2243
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Devaraj K
Priority: Minor

 {code}
   public SchedulerApplicationAttempt(ApplicationAttemptId 
 applicationAttemptId, 
   String user, Queue queue, ActiveUsersManager activeUsersManager,
   RMContext rmContext) {
 Preconditions.checkNotNull("RMContext should not be null", rmContext);
 {code}
 Order of arguments is wrong for Preconditions.checkNotNull().



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2243) Order of arguments for Preconditions.checkNotNull() is wrong in SchedulerApplicationAttempt ctor

2014-07-02 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-2243:


Attachment: YARN-2243.patch

Attaching a trivial patch to fix this issue.
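
The fix presumably just swaps the two arguments so the reference being checked comes 
first and the message second, matching Guava's 
Preconditions.checkNotNull(T reference, Object errorMessage):
{code}
Preconditions.checkNotNull(rmContext, "RMContext should not be null");
{code}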

 Order of arguments for Preconditions.checkNotNull() is wrong in 
 SchedulerApplicationAttempt ctor
 

 Key: YARN-2243
 URL: https://issues.apache.org/jira/browse/YARN-2243
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Devaraj K
Priority: Minor
 Attachments: YARN-2243.patch


 {code}
   public SchedulerApplicationAttempt(ApplicationAttemptId 
 applicationAttemptId, 
   String user, Queue queue, ActiveUsersManager activeUsersManager,
   RMContext rmContext) {
 Preconditions.checkNotNull("RMContext should not be null", rmContext);
 {code}
 Order of arguments is wrong for Preconditions.checkNotNull().



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2204) TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler

2014-07-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049836#comment-14049836
 ] 

Hudson commented on YARN-2204:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #601 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/601/])
YARN-2204. Explicitly enable vmem check in 
TestContainersMonitor#testContainerKillOnMemoryOverflow. (Anubhav Dhoot via 
kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1607231)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainersMonitor.java


 TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler
 ---

 Key: YARN-2204
 URL: https://issues.apache.org/jira/browse/YARN-2204
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: Robert Kanter
Assignee: Robert Kanter
Priority: Trivial
 Fix For: 2.5.0

 Attachments: YARN-2204.patch, YARN-2204_addendum.patch, 
 YARN-2204_addendum.patch


 TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy

2014-07-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049833#comment-14049833
 ] 

Hudson commented on YARN-2022:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #601 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/601/])
YARN-2022 Preempting an Application Master container can be kept as least 
priority when multiple applications are marked for preemption by 
ProportionalCapacityPreemptionPolicy (Sunil G via mayank) (mayank: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1607227)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java


 Preempting an Application Master container can be kept as least priority when 
 multiple applications are marked for preemption by 
 ProportionalCapacityPreemptionPolicy
 -

 Key: YARN-2022
 URL: https://issues.apache.org/jira/browse/YARN-2022
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sunil G
Assignee: Sunil G
 Fix For: 2.5.0

 Attachments: YARN-2022-DesignDraft.docx, YARN-2022.10.patch, 
 YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, 
 YARN-2022.6.patch, YARN-2022.7.patch, YARN-2022.8.patch, YARN-2022.9.patch, 
 Yarn-2022.1.patch


 Cluster Size = 16GB [2NM's]
 Queue A Capacity = 50%
 Queue B Capacity = 50%
 Consider there are 3 applications running in Queue A which has taken the full 
 cluster capacity. 
 J1 = 2GB AM + 1GB * 4 Maps
 J2 = 2GB AM + 1GB * 4 Maps
 J3 = 2GB AM + 1GB * 2 Maps
 Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ].
 Currently in this scenario, Job J3 will get killed, including its AM.
 It is better if the AM can be given the least priority among multiple applications. 
 In this same scenario, map tasks from J3 and J2 can be preempted.
 Later when cluster is free, maps can be allocated to these Jobs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1713) Implement getnewapplication and submitapp as part of RM web service

2014-07-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049834#comment-14049834
 ] 

Hudson commented on YARN-1713:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #601 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/601/])
YARN-1713. Added get-new-app and submit-app functionality to RM web services. 
Contributed by Varun Vasudev. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1607216)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/GenericExceptionHandler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ApplicationSubmissionContextInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ContainerLaunchContextInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CredentialsInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/LocalResourceInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/NewApplication.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ResourceInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm


 Implement getnewapplication and submitapp as part of RM web service
 ---

 Key: YARN-1713
 URL: https://issues.apache.org/jira/browse/YARN-1713
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Priority: Blocker
 Fix For: 2.5.0

 Attachments: apache-yarn-1713.10.patch, apache-yarn-1713.3.patch, 
 apache-yarn-1713.4.patch, apache-yarn-1713.5.patch, apache-yarn-1713.6.patch, 
 apache-yarn-1713.7.patch, apache-yarn-1713.8.patch, apache-yarn-1713.9.patch, 
 apache-yarn-1713.cumulative.2.patch, apache-yarn-1713.cumulative.3.patch, 
 apache-yarn-1713.cumulative.4.patch, apache-yarn-1713.cumulative.patch, 
 apache-yarn-1713.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2242) Improve exception information on AM launch crashes

2014-07-02 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049839#comment-14049839
 ] 

Tsuyoshi OZAWA commented on YARN-2242:
--

[~zjshen], thank you for the notification. I agree with [~gtCarrera] - these 
patches can be helpful for users. I also checked the patches and confirmed that 
they don't conflict with each other, so we can work on them separately. [~djp], 
could you also review YARN-2013? 

 Improve exception information on AM launch crashes
 --

 Key: YARN-2242
 URL: https://issues.apache.org/jira/browse/YARN-2242
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-2242-070114-1.patch, YARN-2242-070114.patch


 Currently, each time an AM container crashes during launch, both the console and 
 the webpage UI only report a ShellExitCodeException. This is not only unhelpful, 
 but sometimes confusing. With the help of the log aggregator, container logs are 
 actually aggregated, and can be very helpful for debugging. One possible way 
 to improve the whole process is to send a pointer to the aggregated logs to 
 the programmer when reporting exception information. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2243) Order of arguments for Preconditions.checkNotNull() is wrong in SchedulerApplicationAttempt ctor

2014-07-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049838#comment-14049838
 ] 

Hadoop QA commented on YARN-2243:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653564/YARN-2243.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4175//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4175//console

This message is automatically generated.

 Order of arguments for Preconditions.checkNotNull() is wrong in 
 SchedulerApplicationAttempt ctor
 

 Key: YARN-2243
 URL: https://issues.apache.org/jira/browse/YARN-2243
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Devaraj K
Priority: Minor
 Attachments: YARN-2243.patch


 {code}
   public SchedulerApplicationAttempt(ApplicationAttemptId 
 applicationAttemptId, 
   String user, Queue queue, ActiveUsersManager activeUsersManager,
   RMContext rmContext) {
 Preconditions.checkNotNull("RMContext should not be null", rmContext);
 {code}
 Order of arguments is wrong for Preconditions.checkNotNull().



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2013) The diagnostics is always the ExitCodeException stack when the container crashes

2014-07-02 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2013:
-

Attachment: YARN-2013.3-2.patch

Attached the same patch to confirm that it doesn't have any conflicts. 

 The diagnostics is always the ExitCodeException stack when the container 
 crashes
 

 Key: YARN-2013
 URL: https://issues.apache.org/jira/browse/YARN-2013
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Zhijie Shen
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2013.1.patch, YARN-2013.2.patch, 
 YARN-2013.3-2.patch, YARN-2013.3.patch


 When a container crashes, ExitCodeException will be thrown from Shell. 
 Default/LinuxContainerExecutor captures the exception and puts the exception 
 stack into the diagnostics. Therefore, the exception stack is always the same. 
 {code}
 String diagnostics = "Exception from container-launch: \n"
 + StringUtils.stringifyException(e) + "\n" + shExec.getOutput();
 container.handle(new ContainerDiagnosticsUpdateEvent(containerId,
 diagnostics));
 {code}
 In addition, it seems that the exception always has an empty message, as 
 there's no message from stderr. Hence the diagnostics is not of much use for 
 users to analyze the reason for the container crash.
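 One possible shape of a more useful diagnostic, sketched here only to illustrate 
 the point (this is not the attached patch), would surface the exit code and the 
 captured shell output instead of the stringified stack trace:
 {code}
 // Sketch: ExitCodeException carries the exit code, and shExec.getOutput() holds
 // whatever the launch script wrote, which is usually more informative.
 StringBuilder diagnostics =
     new StringBuilder("Exception from container-launch.\n");
 diagnostics.append("Exit code: ").append(e.getExitCode()).append("\n");
 if (!shExec.getOutput().isEmpty()) {
   diagnostics.append("Shell output: ").append(shExec.getOutput()).append("\n");
 }
 container.handle(new ContainerDiagnosticsUpdateEvent(containerId,
     diagnostics.toString()));
 {code}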



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2013) The diagnostics is always the ExitCodeException stack when the container crashes

2014-07-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049861#comment-14049861
 ] 

Hadoop QA commented on YARN-2013:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653571/YARN-2013.3-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4176//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4176//console

This message is automatically generated.

 The diagnostics is always the ExitCodeException stack when the container 
 crashes
 

 Key: YARN-2013
 URL: https://issues.apache.org/jira/browse/YARN-2013
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Zhijie Shen
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2013.1.patch, YARN-2013.2.patch, 
 YARN-2013.3-2.patch, YARN-2013.3.patch


 When a container crashes, ExitCodeException will be thrown from Shell. 
 Default/LinuxContainerExecutor captures the exception and puts the exception 
 stack into the diagnostics. Therefore, the exception stack is always the same. 
 {code}
 String diagnostics = "Exception from container-launch: \n"
 + StringUtils.stringifyException(e) + "\n" + shExec.getOutput();
 container.handle(new ContainerDiagnosticsUpdateEvent(containerId,
 diagnostics));
 {code}
 In addition, it seems that the exception always has an empty message, as 
 there's no message from stderr. Hence the diagnostics is not of much use for 
 users to analyze the reason for the container crash.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy

2014-07-02 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049864#comment-14049864
 ] 

Sunil G commented on YARN-2022:
---

Thank you Mayank, Vinod and Wangda Tan for the reviews.

 Preempting an Application Master container can be kept as least priority when 
 multiple applications are marked for preemption by 
 ProportionalCapacityPreemptionPolicy
 -

 Key: YARN-2022
 URL: https://issues.apache.org/jira/browse/YARN-2022
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sunil G
Assignee: Sunil G
 Fix For: 2.5.0

 Attachments: YARN-2022-DesignDraft.docx, YARN-2022.10.patch, 
 YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, 
 YARN-2022.6.patch, YARN-2022.7.patch, YARN-2022.8.patch, YARN-2022.9.patch, 
 Yarn-2022.1.patch


 Cluster Size = 16GB [2NM's]
 Queue A Capacity = 50%
 Queue B Capacity = 50%
 Consider there are 3 applications running in Queue A which has taken the full 
 cluster capacity. 
 J1 = 2GB AM + 1GB * 4 Maps
 J2 = 2GB AM + 1GB * 4 Maps
 J3 = 2GB AM + 1GB * 2 Maps
 Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ].
 Currently in this scenario, Job J3 will get killed, including its AM.
 It is better if the AM can be given the least priority among multiple applications. 
 In this same scenario, map tasks from J3 and J2 can be preempted.
 Later when cluster is free, maps can be allocated to these Jobs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy

2014-07-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049906#comment-14049906
 ] 

Hudson commented on YARN-2022:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1819 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1819/])
YARN-2022 Preempting an Application Master container can be kept as least 
priority when multiple applications are marked for preemption by 
ProportionalCapacityPreemptionPolicy (Sunil G via mayank) (mayank: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1607227)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java


 Preempting an Application Master container can be kept as least priority when 
 multiple applications are marked for preemption by 
 ProportionalCapacityPreemptionPolicy
 -

 Key: YARN-2022
 URL: https://issues.apache.org/jira/browse/YARN-2022
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sunil G
Assignee: Sunil G
 Fix For: 2.5.0

 Attachments: YARN-2022-DesignDraft.docx, YARN-2022.10.patch, 
 YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, 
 YARN-2022.6.patch, YARN-2022.7.patch, YARN-2022.8.patch, YARN-2022.9.patch, 
 Yarn-2022.1.patch


 Cluster Size = 16GB [2NM's]
 Queue A Capacity = 50%
 Queue B Capacity = 50%
 Consider there are 3 applications running in Queue A which has taken the full 
 cluster capacity. 
 J1 = 2GB AM + 1GB * 4 Maps
 J2 = 2GB AM + 1GB * 4 Maps
 J3 = 2GB AM + 1GB * 2 Maps
 Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ].
 Currently in this scenario, Job J3 will get killed, including its AM.
 It is better if the AM can be given the least priority among multiple applications. 
 In this same scenario, map tasks from J3 and J2 can be preempted.
 Later when cluster is free, maps can be allocated to these Jobs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1713) Implement getnewapplication and submitapp as part of RM web service

2014-07-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049907#comment-14049907
 ] 

Hudson commented on YARN-1713:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1819 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1819/])
YARN-1713. Added get-new-app and submit-app functionality to RM web services. 
Contributed by Varun Vasudev. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1607216)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/GenericExceptionHandler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ApplicationSubmissionContextInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ContainerLaunchContextInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CredentialsInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/LocalResourceInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/NewApplication.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ResourceInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm


 Implement getnewapplication and submitapp as part of RM web service
 ---

 Key: YARN-1713
 URL: https://issues.apache.org/jira/browse/YARN-1713
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Priority: Blocker
 Fix For: 2.5.0

 Attachments: apache-yarn-1713.10.patch, apache-yarn-1713.3.patch, 
 apache-yarn-1713.4.patch, apache-yarn-1713.5.patch, apache-yarn-1713.6.patch, 
 apache-yarn-1713.7.patch, apache-yarn-1713.8.patch, apache-yarn-1713.9.patch, 
 apache-yarn-1713.cumulative.2.patch, apache-yarn-1713.cumulative.3.patch, 
 apache-yarn-1713.cumulative.4.patch, apache-yarn-1713.cumulative.patch, 
 apache-yarn-1713.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2204) TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler

2014-07-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049909#comment-14049909
 ] 

Hudson commented on YARN-2204:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1819 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1819/])
YARN-2204. Explicitly enable vmem check in 
TestContainersMonitor#testContainerKillOnMemoryOverflow. (Anubhav Dhoot via 
kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1607231)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainersMonitor.java


 TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler
 ---

 Key: YARN-2204
 URL: https://issues.apache.org/jira/browse/YARN-2204
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: Robert Kanter
Assignee: Robert Kanter
Priority: Trivial
 Fix For: 2.5.0

 Attachments: YARN-2204.patch, YARN-2204_addendum.patch, 
 YARN-2204_addendum.patch


 TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2243) Order of arguments for Preconditions.checkNotNull() is wrong in SchedulerApplicationAttempt ctor

2014-07-02 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049959#comment-14049959
 ] 

Ted Yu commented on YARN-2243:
--

Thanks for taking care of this.

 Order of arguments for Preconditions.checkNotNull() is wrong in 
 SchedulerApplicationAttempt ctor
 

 Key: YARN-2243
 URL: https://issues.apache.org/jira/browse/YARN-2243
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Devaraj K
Priority: Minor
 Attachments: YARN-2243.patch


 {code}
   public SchedulerApplicationAttempt(ApplicationAttemptId 
 applicationAttemptId, 
   String user, Queue queue, ActiveUsersManager activeUsersManager,
   RMContext rmContext) {
 Preconditions.checkNotNull("RMContext should not be null", rmContext);
 {code}
 Order of arguments is wrong for Preconditions.checkNotNull().



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy

2014-07-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049974#comment-14049974
 ] 

Hudson commented on YARN-2022:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1792 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1792/])
YARN-2022 Preempting an Application Master container can be kept as least 
priority when multiple applications are marked for preemption by 
ProportionalCapacityPreemptionPolicy (Sunil G via mayank) (mayank: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1607227)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java


 Preempting an Application Master container can be kept as least priority when 
 multiple applications are marked for preemption by 
 ProportionalCapacityPreemptionPolicy
 -

 Key: YARN-2022
 URL: https://issues.apache.org/jira/browse/YARN-2022
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sunil G
Assignee: Sunil G
 Fix For: 2.5.0

 Attachments: YARN-2022-DesignDraft.docx, YARN-2022.10.patch, 
 YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, 
 YARN-2022.6.patch, YARN-2022.7.patch, YARN-2022.8.patch, YARN-2022.9.patch, 
 Yarn-2022.1.patch


 Cluster Size = 16GB [2NM's]
 Queue A Capacity = 50%
 Queue B Capacity = 50%
 Consider there are 3 applications running in Queue A which has taken the full 
 cluster capacity. 
 J1 = 2GB AM + 1GB * 4 Maps
 J2 = 2GB AM + 1GB * 4 Maps
 J3 = 2GB AM + 1GB * 2 Maps
 Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ].
 Currently in this scenario, Job J3 will get killed, including its AM.
 It is better if the AM can be given the least priority among multiple applications. 
 In this same scenario, map tasks from J3 and J2 can be preempted.
 Later when cluster is free, maps can be allocated to these Jobs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2204) TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler

2014-07-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049977#comment-14049977
 ] 

Hudson commented on YARN-2204:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1792 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1792/])
YARN-2204. Explicitly enable vmem check in 
TestContainersMonitor#testContainerKillOnMemoryOverflow. (Anubhav Dhoot via 
kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1607231)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainersMonitor.java


 TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler
 ---

 Key: YARN-2204
 URL: https://issues.apache.org/jira/browse/YARN-2204
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: Robert Kanter
Assignee: Robert Kanter
Priority: Trivial
 Fix For: 2.5.0

 Attachments: YARN-2204.patch, YARN-2204_addendum.patch, 
 YARN-2204_addendum.patch


 TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1713) Implement getnewapplication and submitapp as part of RM web service

2014-07-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049975#comment-14049975
 ] 

Hudson commented on YARN-1713:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1792 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1792/])
YARN-1713. Added get-new-app and submit-app functionality to RM web services. 
Contributed by Varun Vasudev. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1607216)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/GenericExceptionHandler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ApplicationSubmissionContextInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ContainerLaunchContextInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CredentialsInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/LocalResourceInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/NewApplication.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ResourceInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm


 Implement getnewapplication and submitapp as part of RM web service
 ---

 Key: YARN-1713
 URL: https://issues.apache.org/jira/browse/YARN-1713
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Priority: Blocker
 Fix For: 2.5.0

 Attachments: apache-yarn-1713.10.patch, apache-yarn-1713.3.patch, 
 apache-yarn-1713.4.patch, apache-yarn-1713.5.patch, apache-yarn-1713.6.patch, 
 apache-yarn-1713.7.patch, apache-yarn-1713.8.patch, apache-yarn-1713.9.patch, 
 apache-yarn-1713.cumulative.2.patch, apache-yarn-1713.cumulative.3.patch, 
 apache-yarn-1713.cumulative.4.patch, apache-yarn-1713.cumulative.patch, 
 apache-yarn-1713.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2233) Implement web services to create, renew and cancel delegation tokens

2014-07-02 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050042#comment-14050042
 ] 

Zhijie Shen commented on YARN-2233:
---

Looks almost good to me. Just some nits:

1. This won't happen inside renewDelegationToken, as it is already validated 
before.
{code}
+if (tokenData.getToken().isEmpty()) {
+  throw new BadRequestException("Empty token in request");
+}
{code}

2. It seems that some of the fields in DelegationToken are no longer necessary.

3. assertValidToken seems not to be necessary.

 Implement web services to create, renew and cancel delegation tokens
 

 Key: YARN-2233
 URL: https://issues.apache.org/jira/browse/YARN-2233
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Priority: Blocker
 Attachments: apache-yarn-2233.0.patch, apache-yarn-2233.1.patch


 Implement functionality to create, renew and cancel delegation tokens.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2065) AM cannot create new containers after restart-NM token from previous attempt used

2014-07-02 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-2065:
-

Attachment: YARN-2065-003.patch

Correction: this is the version that compiles against trunk.

I've tested this on the Slider minicluster test that kills a container while 
the AM is down and verifies that HBase comes up to the right # of nodes... this 
patch fixes it.



 AM cannot create new containers after restart-NM token from previous attempt 
 used
 -

 Key: YARN-2065
 URL: https://issues.apache.org/jira/browse/YARN-2065
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Steve Loughran
Assignee: Jian He
 Attachments: YARN-2065-002.patch, YARN-2065-003.patch, 
 YARN-2065.1.patch


 Slider AM Restart failing (SLIDER-34). The AM comes back up, but it cannot 
 create new containers.
 The Slider minicluster test {{TestKilledAM}} can replicate this reliably -it 
 kills the AM, then kills a container while the AM is down, which triggers a 
 reallocation of a container, leading to this failure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-675) In YarnClient, pull AM logs on AM container failure

2014-07-02 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050260#comment-14050260
 ] 

Zhijie Shen commented on YARN-675:
--

Sure, reassigning it to you.

 In YarnClient, pull AM logs on AM container failure
 ---

 Key: YARN-675
 URL: https://issues.apache.org/jira/browse/YARN-675
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Zhijie Shen

 Similar to MAPREDUCE-4362, when an AM container fails, it would be helpful to 
 pull its logs from the NM to the client so that they can be displayed 
 immediately to the user.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-675) In YarnClient, pull AM logs on AM container failure

2014-07-02 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-675:
-

Assignee: Li Lu  (was: Zhijie Shen)

 In YarnClient, pull AM logs on AM container failure
 ---

 Key: YARN-675
 URL: https://issues.apache.org/jira/browse/YARN-675
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Li Lu

 Similar to MAPREDUCE-4362, when an AM container fails, it would be helpful to 
 pull its logs from the NM to the client so that they can be displayed 
 immediately to the user.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2233) Implement web services to create, renew and cancel delegation tokens

2014-07-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050270#comment-14050270
 ] 

Hadoop QA commented on YARN-2233:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12653563/apache-yarn-2233.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-auth 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4177//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/4177//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4177//console

This message is automatically generated.

 Implement web services to create, renew and cancel delegation tokens
 

 Key: YARN-2233
 URL: https://issues.apache.org/jira/browse/YARN-2233
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Priority: Blocker
 Attachments: apache-yarn-2233.0.patch, apache-yarn-2233.1.patch


 Implement functionality to create, renew and cancel delegation tokens.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2065) AM cannot create new containers after restart-NM token from previous attempt used

2014-07-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050272#comment-14050272
 ] 

Hadoop QA commented on YARN-2065:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653606/YARN-2065-003.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4178//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4178//console

This message is automatically generated.

 AM cannot create new containers after restart-NM token from previous attempt 
 used
 -

 Key: YARN-2065
 URL: https://issues.apache.org/jira/browse/YARN-2065
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Steve Loughran
Assignee: Jian He
 Attachments: YARN-2065-002.patch, YARN-2065-003.patch, 
 YARN-2065.1.patch


 Slider AM Restart failing (SLIDER-34). The AM comes back up, but it cannot 
 create new containers.
 The Slider minicluster test {{TestKilledAM}} can replicate this reliably -it 
 kills the AM, then kills a container while the AM is down, which triggers a 
 reallocation of a container, leading to this failure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2065) AM cannot create new containers after restart-NM token from previous attempt used

2014-07-02 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050286#comment-14050286
 ] 

Jian He commented on YARN-2065:
---

thanks for the testing, Steve!

 AM cannot create new containers after restart-NM token from previous attempt 
 used
 -

 Key: YARN-2065
 URL: https://issues.apache.org/jira/browse/YARN-2065
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Steve Loughran
Assignee: Jian He
 Attachments: YARN-2065-002.patch, YARN-2065-003.patch, 
 YARN-2065.1.patch


 Slider AM Restart failing (SLIDER-34). The AM comes back up, but it cannot 
 create new containers.
 The Slider minicluster test {{TestKilledAM}} can replicate this reliably -it 
 kills the AM, then kills a container while the AM is down, which triggers a 
 reallocation of a container, leading to this failure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2211) RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens

2014-07-02 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2211:


Attachment: YARN-2211.1.patch

 RMStateStore needs to save AMRMToken master key for recovery when RM 
 restart/failover happens 
 --

 Key: YARN-2211
 URL: https://issues.apache.org/jira/browse/YARN-2211
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2211.1.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2208) AMRMTokenManager need to have a way to roll over AMRMToken

2014-07-02 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2208:


Attachment: YARN-2208.2.patch

 AMRMTokenManager need to have a way to roll over AMRMToken
 --

 Key: YARN-2208
 URL: https://issues.apache.org/jira/browse/YARN-2208
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2208.1.patch, YARN-2208.2.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2212) ApplicationMaster needs to find a way to update the AMRMToken periodically

2014-07-02 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2212:


Attachment: YARN-2212.1.patch

 ApplicationMaster needs to find a way to update the AMRMToken periodically
 --

 Key: YARN-2212
 URL: https://issues.apache.org/jira/browse/YARN-2212
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2212.1.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2237) MRAppMaster changes for AMRMToken roll-up

2014-07-02 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2237:


Attachment: YARN-2237.1.patch

 MRAppMaster changes for AMRMToken roll-up
 -

 Key: YARN-2237
 URL: https://issues.apache.org/jira/browse/YARN-2237
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2237.1.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2139) Add support for disk IO isolation/scheduling for containers

2014-07-02 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050301#comment-14050301
 ] 

Wei Yan commented on YARN-2139:
---

Thanks for the comments, [~ste...@apache.org].
Regarding the HDFS read/write problem you mentioned, we leave that to the network 
part, since we also need to handle the HDFS replication traffic there. I agree that 
we should avoid confusion with HDFS ''fs''.

The idea of vdisks follows vcores, where each physical CPU core is measured as 
some number of vcores. One concern with using real numbers is that users cannot 
easily specify their task requirements. One way to address that is to provide 
several levels (low, moderate, high, etc.) instead of real numbers. This is also 
similar to the discussion in YARN-1024 on how to measure CPU capacity. We can 
define how many IOPS or how much bandwidth maps to 1 vdisk.
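As a rough illustration of that level idea (the enum values and vdisk counts below 
are made-up calibration numbers, not anything from the design doc), the mapping 
could be as simple as:

{noformat}
/** Illustrative only: map a coarse IO level to a vdisk count. */
public final class VDiskLevels {
  public enum IoLevel { LOW, MODERATE, HIGH }

  // Hypothetical calibration; real numbers would come from benchmarking a node.
  public static int toVDisks(IoLevel level) {
    switch (level) {
      case LOW:      return 1;
      case MODERATE: return 2;
      case HIGH:     return 4;
      default:       throw new IllegalArgumentException("unknown level: " + level);
    }
  }
}
{noformat}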

For testability, I currently have: (1) for fair sharing, start several tasks 
that perform the same operations, place them on a single node, and check whether 
their I/O performance follows the fair shares; (2) for I/O performance isolation 
of a given task, replay that task several times in a fully loaded cluster and 
verify that its I/O performance stays stable. Here the task does many local disk 
reads and direct writes, so most of its time is spent on I/O.
Any other testing ideas?



 Add support for disk IO isolation/scheduling for containers
 ---

 Key: YARN-2139
 URL: https://issues.apache.org/jira/browse/YARN-2139
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: Disk_IO_Scheduling_Design_1.pdf






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2208) AMRMTokenManager need to have a way to roll over AMRMToken

2014-07-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050380#comment-14050380
 ] 

Hadoop QA commented on YARN-2208:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653615/YARN-2208.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore
  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
  
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.TestRMAppAttemptTransitions
  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4179//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/4179//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4179//console

This message is automatically generated.

 AMRMTokenManager need to have a way to roll over AMRMToken
 --

 Key: YARN-2208
 URL: https://issues.apache.org/jira/browse/YARN-2208
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2208.1.patch, YARN-2208.2.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2208) AMRMTokenManager need to have a way to roll over AMRMToken

2014-07-02 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2208:


Attachment: YARN-2208.3.patch

Fix findbugs and testcase failures

 AMRMTokenManager need to have a way to roll over AMRMToken
 --

 Key: YARN-2208
 URL: https://issues.apache.org/jira/browse/YARN-2208
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2208.1.patch, YARN-2208.2.patch, YARN-2208.3.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2211) RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens

2014-07-02 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2211:


Attachment: YARN-2211.2.patch

 RMStateStore needs to save AMRMToken master key for recovery when RM 
 restart/failover happens 
 --

 Key: YARN-2211
 URL: https://issues.apache.org/jira/browse/YARN-2211
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2211.1.patch, YARN-2211.2.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2065) AM cannot create new containers after restart-NM token from previous attempt used

2014-07-02 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050521#comment-14050521
 ] 

Steve Loughran commented on YARN-2065:
--

With Jenkins happy, I'm +1 on this patch; it fixes what it says it does

 AM cannot create new containers after restart-NM token from previous attempt 
 used
 -

 Key: YARN-2065
 URL: https://issues.apache.org/jira/browse/YARN-2065
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Steve Loughran
Assignee: Jian He
 Attachments: YARN-2065-002.patch, YARN-2065-003.patch, 
 YARN-2065.1.patch


 Slider AM Restart failing (SLIDER-34). The AM comes back up, but it cannot 
 create new containers.
 The Slider minicluster test {{TestKilledAM}} can replicate this reliably -it 
 kills the AM, then kills a container while the AM is down, which triggers a 
 reallocation of a container, leading to this failure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2139) Add support for disk IO isolation/scheduling for containers

2014-07-02 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050530#comment-14050530
 ] 

Karthik Kambatla commented on YARN-2139:


bq. this is really disk io bandwidth, so it should use an option, like 
vlocaldiskIObandwidth. This will avoid confusion (make clear its not HDFS), and 
add scope for the addition of future options: IOPs and actual allocation of 
entire disks to containers

Good point. The document should probably discuss this in more detail. I think 
we should separate out the resource model used for requests and scheduling from 
the way we enforce it. 

For the former, I believe vdisks is a good candidate. Users find it hard to 
specify disk IO requirements in terms of IOPS and bandwidth; e.g. my MR task 
*needs* 200 MBps. vdisks, on the other hand, represent a share of the node and 
the IO parallelism (in a somewhat vague sense) the task can make use of. 
Furthermore, it is hard to guarantee a particular bandwidth or performance as 
they depend on the amount of parallelism and degree of randomness the disk 
accesses have. 

That said, I see value in making the enforcement pluggable. This JIRA could add 
the cgroups-based disk-share enforcement. In the future, we could explore other 
options. 
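As a sketch of what such a cgroups-based enforcer might do (the cgroup path, weight 
bounds, and class name here are assumptions for illustration, not the proposed 
implementation), a container's vdisk share could be translated into a blkio weight 
roughly like this:

{noformat}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative only: turn a container's vdisk share into a cgroup blkio weight. */
public final class BlkioShareWriter {
  private static final int MIN_WEIGHT = 10;    // assumed lower bound for blkio.weight
  private static final int MAX_WEIGHT = 1000;  // assumed upper bound for blkio.weight

  /** Scale the container's vdisks against the node's total vdisks. */
  public static int weightFor(int containerVDisks, int nodeVDisks) {
    int w = (int) Math.round((double) containerVDisks / nodeVDisks * MAX_WEIGHT);
    return Math.max(MIN_WEIGHT, Math.min(MAX_WEIGHT, w));
  }

  /** Write the weight into a per-container cgroup directory (hypothetical hierarchy). */
  public static void apply(String containerId, int containerVDisks, int nodeVDisks)
      throws IOException {
    Path weightFile =
        Paths.get("/sys/fs/cgroup/blkio/yarn", containerId, "blkio.weight");
    Files.write(weightFile,
        Integer.toString(weightFor(containerVDisks, nodeVDisks))
            .getBytes(StandardCharsets.UTF_8));
  }
}
{noformat}

Whatever the final numbers, keeping this translation behind an interface would let 
other enforcers (IOPS throttling, dedicated disks) plug in later without touching 
the scheduler-side resource model.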

 Add support for disk IO isolation/scheduling for containers
 ---

 Key: YARN-2139
 URL: https://issues.apache.org/jira/browse/YARN-2139
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: Disk_IO_Scheduling_Design_1.pdf






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2175) Container localization has no timeouts and tasks can be stuck there for a long time

2014-07-02 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050533#comment-14050533
 ] 

Anubhav Dhoot commented on YARN-2175:
-

We have seen this happen when the source file system had issues. Some jobs would 
intermittently take a long time to fail and would succeed on rerun, because the 
jars were placed in a new distributed cache location for the rerun. Without this 
timeout we have no lever to mitigate underlying HDFS/hardware issues in 
production until the root cause is identified and fixed. 
Also, in comparison with mapreduce.task.timeout, this is very focused on a 
specific operation: localization. I would expect this timeout to default to a 
large value in production (say 30 min) and to be used only as a mitigation when 
an issue occurs in production.
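To make the shape of that lever concrete, here is a minimal sketch of a 
localization watchdog (the config key, default, and class name below are 
hypothetical, not an existing NodeManager property or the proposed patch):

{noformat}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Illustrative only: fail a container whose localization runs past a configured limit. */
public class LocalizationWatchdog {
  // Hypothetical config key and default (30 minutes).
  public static final String TIMEOUT_KEY = "yarn.nodemanager.localizer.timeout-ms";
  public static final long DEFAULT_TIMEOUT_MS = 30L * 60 * 1000;

  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  /** Arm when localization starts; cancel the returned future when it completes. */
  public ScheduledFuture<?> arm(String containerId, long timeoutMs, Runnable killContainer) {
    return scheduler.schedule(() -> {
      // Only reached if localization did not finish in time.
      System.err.println("Localization of " + containerId
          + " exceeded " + timeoutMs + " ms; failing the container");
      killContainer.run();
    }, timeoutMs, TimeUnit.MILLISECONDS);
  }
}
{noformat}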

 Container localization has no timeouts and tasks can be stuck there for a 
 long time
 ---

 Key: YARN-2175
 URL: https://issues.apache.org/jira/browse/YARN-2175
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot

 There are no timeouts that can be used to limit the time taken by various 
 container startup operations. Localization, for example, could take a long time, 
 and there is no automated way to kill a task if it is stuck in these states. 
 These may have nothing to do with the task itself and could be an issue 
 within the platform.
 Ideally there should be configurable limits for the various states within the 
 NodeManager. The RM does not care about most of these; they only concern the AM 
 and the NM. We can start by making these global configurable defaults, and in 
 the future we can make it fancier by letting the AM override them in the start 
 container request. 
 This JIRA will be used to limit localization time, and we can open others if 
 we feel we need to limit other operations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2175) Container localization has no timeouts and tasks can be stuck there for a long time

2014-07-02 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050539#comment-14050539
 ] 

Karthik Kambatla commented on YARN-2175:


I'll let Anubhav provide details on the particular instance where we ran into this. 

bq. We should try to address the right individual problem with its solution 
before we put a band-aid that may still be useful for issues that we cannot 
just address directly if any.
Having worked on several MR1 production issues, I see your point. I agree we 
should look into and address individual problems. That said, I also believe in 
failsafes to avoid bringing down a production cluster or failing a critical job 
altogether in the face of hardware issues. That gives us time to fix the 
individual issues correctly when we encounter them, instead of hurrying for a 
hot fix. 

 Container localization has no timeouts and tasks can be stuck there for a 
 long time
 ---

 Key: YARN-2175
 URL: https://issues.apache.org/jira/browse/YARN-2175
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot

 There are no timeouts that can be used to limit the time taken by various 
 container startup operations. Localization, for example, could take a long time, 
 and there is no automated way to kill a task if it is stuck in these states. 
 These may have nothing to do with the task itself and could be an issue 
 within the platform.
 Ideally there should be configurable limits for the various states within the 
 NodeManager. The RM does not care about most of these; they only concern the AM 
 and the NM. We can start by making these global configurable defaults, and in 
 the future we can make it fancier by letting the AM override them in the start 
 container request. 
 This JIRA will be used to limit localization time, and we can open others if 
 we feel we need to limit other operations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2175) Container localization has no timeouts and tasks can be stuck there for a long time

2014-07-02 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050543#comment-14050543
 ] 

Karthik Kambatla commented on YARN-2175:


In MR1, mapred.task.timeout covers localization as well, and that has worked 
very well for our customers. Should we do the same for MR2? 

 Container localization has no timeouts and tasks can be stuck there for a 
 long time
 ---

 Key: YARN-2175
 URL: https://issues.apache.org/jira/browse/YARN-2175
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot

 There are no timeouts that can be used to limit the time taken by various 
 container startup operations. Localization, for example, could take a long time, 
 and there is no automated way to kill a task if it is stuck in these states. 
 These may have nothing to do with the task itself and could be an issue 
 within the platform.
 Ideally there should be configurable limits for the various states within the 
 NodeManager. The RM does not care about most of these; they only concern the AM 
 and the NM. We can start by making these global configurable defaults, and in 
 the future we can make it fancier by letting the AM override them in the start 
 container request. 
 This JIRA will be used to limit localization time, and we can open others if 
 we feel we need to limit other operations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2233) Implement web services to create, renew and cancel delegation tokens

2014-07-02 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050556#comment-14050556
 ] 

Varun Vasudev commented on YARN-2233:
-

The test case failure is due to YARN-2232. It fixes a bug that one of the test 
cases relies on.

 Implement web services to create, renew and cancel delegation tokens
 

 Key: YARN-2233
 URL: https://issues.apache.org/jira/browse/YARN-2233
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Priority: Blocker
 Attachments: apache-yarn-2233.0.patch, apache-yarn-2233.1.patch


 Implement functionality to create, renew and cancel delegation tokens.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2065) AM cannot create new containers after restart-NM token from previous attempt used

2014-07-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050562#comment-14050562
 ] 

Hudson commented on YARN-2065:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5808 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5808/])
YARN-2065 AM cannot create new containers after restart (stevel: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1607441)
* /hadoop/common/trunk
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestContainerManagerSecurity.java


 AM cannot create new containers after restart-NM token from previous attempt 
 used
 -

 Key: YARN-2065
 URL: https://issues.apache.org/jira/browse/YARN-2065
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Steve Loughran
Assignee: Jian He
 Fix For: 2.5.0

 Attachments: YARN-2065-002.patch, YARN-2065-003.patch, 
 YARN-2065.1.patch


 Slider AM Restart failing (SLIDER-34). The AM comes back up, but it cannot 
 create new containers.
 The Slider minicluster test {{TestKilledAM}} can replicate this reliably -it 
 kills the AM, then kills a container while the AM is down, which triggers a 
 reallocation of a container, leading to this failure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2233) Implement web services to create, renew and cancel delegation tokens

2014-07-02 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2233:


Attachment: apache-yarn-2233.2.patch

{quote}
1. This won't happen inside renewDelegationToken, as it is already validated 
before.
{noformat}
+if (tokenData.getToken().isEmpty()) {
+  throw new BadRequestException("Empty token in request");
+}
{noformat}

2. It seems that some of the fields in DelegationToken are no longer necessary.

3. assertValidToken seems not to be necessary.
{quote}

Fixed all 3. I also fixed the Findbugs warnings that the patch had caused.

 Implement web services to create, renew and cancel delegation tokens
 

 Key: YARN-2233
 URL: https://issues.apache.org/jira/browse/YARN-2233
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Priority: Blocker
 Attachments: apache-yarn-2233.0.patch, apache-yarn-2233.1.patch, 
 apache-yarn-2233.2.patch


 Implement functionality to create, renew and cancel delegation tokens.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2208) AMRMTokenManager need to have a way to roll over AMRMToken

2014-07-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050572#comment-14050572
 ] 

Hadoop QA commented on YARN-2208:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653638/YARN-2208.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4180//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4180//console

This message is automatically generated.

 AMRMTokenManager need to have a way to roll over AMRMToken
 --

 Key: YARN-2208
 URL: https://issues.apache.org/jira/browse/YARN-2208
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2208.1.patch, YARN-2208.2.patch, YARN-2208.3.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2240) yarn logs can get corrupted if the aggregator does not have permissions to the log file it tries to read

2014-07-02 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050580#comment-14050580
 ] 

Mit Desai commented on YARN-2240:
-

Aggregated Logs Comment

[~vinodkv], here is the error it fails with.

{noformat}
2014-06-10 22:06:34,940 [LogAggregationService #1922] ERROR
logaggregation.AggregatedLogFormat: Error aggregating log file. Log file
:
/grid/0/tmp/yarn-logs/application_1401475649625_135179/container_1401475649625_135179_01_01/history.txt.appattempt_1401475649625_135179_01/grid/0/tmp/yarn-logs/application_1401475649625_135179/container_1401475649625_135179_01_01/history.txt.appattempt_1401475649625_135179_01
(Permission denied)
java.io.FileNotFoundException:
/grid/0/tmp/yarn-logs/application_1401475649625_135179/container_1401475649625_135179_01_01/history.txt.appattempt_1401475649625_135179_01
(Permission denied)
 at java.io.FileInputStream.open(Native Method)
 at java.io.FileInputStream.<init>(FileInputStream.java:138)
 at
org.apache.hadoop.io.SecureIOUtils.forceSecureOpenForRead(SecureIOUtils.java:215)
 at
org.apache.hadoop.io.SecureIOUtils.openForRead(SecureIOUtils.java:204)
 at
org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogValue.write(AggregatedLogFormat.java:196)
 at
org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.append(AggregatedLogFormat.java:311)
 at
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainer(AppLogAggregatorImpl.java:130)
 at
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:166)
 at
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:140)
 at
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$2.run(LogAggregationService.java:354)
 at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:722)
{noformat}


I managed to get into the logs and found that the length being reported for the 
log was about 111K, while the corrupted aggregated output reads something like the 
following. The portion of the aggregated logs where the problem shows up is here:
{noformat}
[...]
LogType: history.txt.appattempt_1401475649625_135179_01
LogLength: 111686
Log Contents:
Error aggregating log file. Log file :
/grid/0/tmp/yarn-logs/application_1401475649625_135179/container_1401475649625_135179_01_01/history.txt.appattempt_1401475649625_135179_01/grid/0/tmp/yarn-logs/application_1401475649625_135179/container_1401475649625_135179_01_01/history.txt.appattempt_1401475649625_135179_01
(Permission
denied)stderr0!stderr_dag_1401475649625_135179_10stderr_dag_1401475649625_135179_1_post0stdout0!stdout_dag_1401475649625_135179_10stdout_dag_1401475649625_135179_1_post0syslog102042014-06-10
22:05:58,519 INFO [main] org.apache.tez.dag.app.DAGAppMaster: Created
DAGAppMaster for application appattempt_1401475649625_135179_01
[...]
{noformat}
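The corruption pattern above suggests the length is committed before the writer 
knows whether the log file can actually be opened. A toy illustration of the 
ordering (this is not the real AggregatedLogFormat code; class and method names 
are made up):

{noformat}
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Toy illustration of the length-vs-content ordering, not the real aggregation code. */
public final class LengthThenContent {

  /** Buggy shape: the length is committed before we know the file can be read. */
  static void writeUnsafe(File log, DataOutputStream out) throws IOException {
    out.writeLong(log.length());              // length written first
    try (FileInputStream in = new FileInputStream(log)) {
      in.transferTo(out);                     // never runs if the open fails...
    } catch (IOException e) {
      // ...so this shorter error text no longer matches the length written above.
      out.writeBytes("Error aggregating log file: " + e.getMessage());
    }
  }

  /** Safer shape: read (or fail) first, then write a length that matches the payload. */
  static void writeSafe(File log, DataOutputStream out) throws IOException {
    byte[] payload;
    try (FileInputStream in = new FileInputStream(log)) {
      payload = in.readAllBytes();
    } catch (IOException e) {
      payload = ("Error aggregating log file: " + e.getMessage())
          .getBytes(StandardCharsets.UTF_8);
    }
    out.writeLong(payload.length);            // length always matches what follows
    out.write(payload);
  }
}
{noformat}

Buffering the whole file only works for small logs, of course; the real fix would 
more likely check readability up front or record the error entry with its own 
length, but the invariant is the same: the recorded length must describe the bytes 
that actually follow it.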

 yarn logs can get corrupted if the aggregator does not have permissions to 
 the log file it tries to read
 

 Key: YARN-2240
 URL: https://issues.apache.org/jira/browse/YARN-2240
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Mit Desai

 When the log aggregator is aggregating the logs, it writes the file length 
 first. It then tries to open the log file, and if it does not have permission to 
 do so, it ends up just writing an error message to the aggregated logs.
 The mismatch between the recorded file length and the actual length of what was 
 written makes the aggregated logs corrupted.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2241) ZKRMStateStore: On startup, show nicer messages when znodes already exist

2014-07-02 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050621#comment-14050621
 ] 

Karthik Kambatla commented on YARN-2241:


+1

 ZKRMStateStore: On startup, show nicer messages when znodes already exist
 -

 Key: YARN-2241
 URL: https://issues.apache.org/jira/browse/YARN-2241
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: Robert Kanter
Assignee: Robert Kanter
Priority: Minor
 Attachments: YARN-2241.patch, YARN-2241.patch


 When using the RMZKStateStore, if you restart the RM, you get a bunch of 
 stack traces with messages like 
 {{org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
 NodeExists for /rmstore}}.  This is expected as these nodes already exist 
 from before.  We should catch these and print nicer messages.
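 A minimal sketch of that kind of handling (not the attached patch; the helper is 
 made up, only the ZooKeeper calls are real):
 {noformat}
 import org.apache.zookeeper.CreateMode;
 import org.apache.zookeeper.KeeperException;
 import org.apache.zookeeper.ZooDefs;
 import org.apache.zookeeper.ZooKeeper;

 final class ZnodeSetup {
   static void createIfAbsent(ZooKeeper zk, String path) throws Exception {
     try {
       zk.create(path, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
     } catch (KeeperException.NodeExistsException e) {
       // Expected on RM restart: the znode was created by a previous run.
       System.out.println(path + " already exists, reusing it");
     }
   }
 }
 {noformat}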



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2247) Allow RM web services users to authenticate using delegation tokens

2014-07-02 Thread Varun Vasudev (JIRA)
Varun Vasudev created YARN-2247:
---

 Summary: Allow RM web services users to authenticate using 
delegation tokens
 Key: YARN-2247
 URL: https://issues.apache.org/jira/browse/YARN-2247
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Varun Vasudev
Assignee: Varun Vasudev


The RM webapp should allow users to authenticate using delegation tokens to 
maintain parity with RPC.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2247) Allow RM web services users to authenticate using delegation tokens

2014-07-02 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2247:


Attachment: apache-yarn-2247.0.patch

 Allow RM web services users to authenticate using delegation tokens
 ---

 Key: YARN-2247
 URL: https://issues.apache.org/jira/browse/YARN-2247
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2247.0.patch


 The RM webapp should allow users to authenticate using delegation tokens to 
 maintain parity with RPC.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2241) ZKRMStateStore: On startup, show nicer messages if znodes already exist

2014-07-02 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2241:
---

Summary: ZKRMStateStore: On startup, show nicer messages if znodes already 
exist  (was: ZKRMStateStore: On startup, show nicer messages when znodes 
already exist)

 ZKRMStateStore: On startup, show nicer messages if znodes already exist
 ---

 Key: YARN-2241
 URL: https://issues.apache.org/jira/browse/YARN-2241
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: Robert Kanter
Assignee: Robert Kanter
Priority: Minor
 Attachments: YARN-2241.patch, YARN-2241.patch


 When using the RMZKStateStore, if you restart the RM, you get a bunch of 
 stack traces with messages like 
 {{org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
 NodeExists for /rmstore}}.  This is expected as these nodes already exist 
 from before.  We should catch these and print nicer messages.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2241) ZKRMStateStore: On startup, show nicer messages if znodes already exist

2014-07-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050695#comment-14050695
 ] 

Hudson commented on YARN-2241:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5811 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5811/])
YARN-2241. ZKRMStateStore: On startup, show nicer messages if znodes already 
exist. (Robert Kanter via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1607473)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java


 ZKRMStateStore: On startup, show nicer messages if znodes already exist
 ---

 Key: YARN-2241
 URL: https://issues.apache.org/jira/browse/YARN-2241
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: Robert Kanter
Assignee: Robert Kanter
Priority: Minor
 Fix For: 2.5.0

 Attachments: YARN-2241.patch, YARN-2241.patch


 When using the RMZKStateStore, if you restart the RM, you get a bunch of 
 stack traces with messages like 
 {{org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
 NodeExists for /rmstore}}.  This is expected as these nodes already exist 
 from before.  We should catch these and print nicer messages.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2248) Capacity Scheduler changes for moving apps between queues

2014-07-02 Thread Janos Matyas (JIRA)
Janos Matyas created YARN-2248:
--

 Summary: Capacity Scheduler changes for moving apps between queues
 Key: YARN-2248
 URL: https://issues.apache.org/jira/browse/YARN-2248
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Janos Matyas
Priority: Minor


We would like to have the capability (same as the Fair Scheduler has) to move 
applications between queues. 

We have made a baseline implementation and tests to start with - and we would 
like the community to review, come up with suggestions and finally have this 
contributed. 

The current implementation is available for 2.4.1 - so the first thing is that 
we'd need to identify the target version as there are differences between 2.4.* 
and 3.* interfaces.

The story behind is available at 
http://blog.sequenceiq.com/blog/2014/07/02/move-applications-between-queues/ 
and the baseline implementation and test at:

https://github.com/sequenceiq/hadoop-common/blob/branch-2.4.1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/a/ExtendedCapacityScheduler.java#L924

https://github.com/sequenceiq/hadoop-common/blob/branch-2.4.1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/a/TestExtendedCapacitySchedulerAppMove.java



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2248) Capacity Scheduler changes for moving apps between queues

2014-07-02 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050729#comment-14050729
 ] 

Vinod Kumar Vavilapalli commented on YARN-2248:
---

Do you mind attaching a patch against latest YARN trunk? Thanks..

 Capacity Scheduler changes for moving apps between queues
 -

 Key: YARN-2248
 URL: https://issues.apache.org/jira/browse/YARN-2248
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Janos Matyas
Assignee: Janos Matyas
Priority: Minor

 We would like to have the capability (same as the Fair Scheduler has) to move 
 applications between queues. 
 We have made a baseline implementation and tests to start with - and we would 
 like the community to review, come up with suggestions and finally have this 
 contributed. 
 The current implementation is available for 2.4.1 - so the first thing is 
 that we'd need to identify the target version as there are differences 
 between 2.4.* and 3.* interfaces.
 The story behind is available at 
 http://blog.sequenceiq.com/blog/2014/07/02/move-applications-between-queues/ 
 and the baseline implementation and test at:
 https://github.com/sequenceiq/hadoop-common/blob/branch-2.4.1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/a/ExtendedCapacityScheduler.java#L924
 https://github.com/sequenceiq/hadoop-common/blob/branch-2.4.1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/a/TestExtendedCapacitySchedulerAppMove.java



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-2248) Capacity Scheduler changes for moving apps between queues

2014-07-02 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reassigned YARN-2248:
-

Assignee: Janos Matyas

[~matyix], tx for opening this. Assigning it to you..

 Capacity Scheduler changes for moving apps between queues
 -

 Key: YARN-2248
 URL: https://issues.apache.org/jira/browse/YARN-2248
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Janos Matyas
Assignee: Janos Matyas
Priority: Minor

 We would like to have the capability (same as the Fair Scheduler has) to move 
 applications between queues. 
 We have made a baseline implementation and tests to start with - and we would 
 like the community to review, come up with suggestions and finally have this 
 contributed. 
 The current implementation is available for 2.4.1 - so the first thing is 
 that we'd need to identify the target version as there are differences 
 between 2.4.* and 3.* interfaces.
 The story behind is available at 
 http://blog.sequenceiq.com/blog/2014/07/02/move-applications-between-queues/ 
 and the baseline implementation and test at:
 https://github.com/sequenceiq/hadoop-common/blob/branch-2.4.1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/a/ExtendedCapacityScheduler.java#L924
 https://github.com/sequenceiq/hadoop-common/blob/branch-2.4.1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/a/TestExtendedCapacitySchedulerAppMove.java



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2232) ClientRMService doesn't allow delegation token owner to cancel their own token in secure mode

2014-07-02 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050738#comment-14050738
 ] 

Vinod Kumar Vavilapalli commented on YARN-2232:
---

Looks good, +1. Checking this in..

 ClientRMService doesn't allow delegation token owner to cancel their own 
 token in secure mode
 -

 Key: YARN-2232
 URL: https://issues.apache.org/jira/browse/YARN-2232
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2232.0.patch, apache-yarn-2232.1.patch, 
 apache-yarn-2232.2.patch


 The ClientRMSerivce doesn't allow delegation token owners to cancel their own 
 tokens. The root cause is this piece of code from the cancelDelegationToken 
 function -
 {noformat}
 String user = getRenewerForToken(token);
 ...
 private String getRenewerForToken(Token<RMDelegationTokenIdentifier> token) 
 throws IOException {
   UserGroupInformation user = UserGroupInformation.getCurrentUser();
   UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
   // we can always renew our own tokens
   return loginUser.getUserName().equals(user.getUserName())
   ? token.decodeIdentifier().getRenewer().toString()
   : user.getShortUserName();
 }
 {noformat}
 It ends up passing the user short name to the cancelToken function whereas 
 AbstractDelegationTokenSecretManager::cancelToken expects the full user name. 
 This bug occurs in secure mode and is not an issue with simple auth.
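 A toy illustration of the mismatch (the names below are hypothetical): in secure 
 mode the caller's full name is the Kerberos principal, while the short name drops 
 the realm, so the comparison inside cancelToken never matches.
 {noformat}
 // Toy illustration with hypothetical values, runnable as-is.
 public class NameMismatch {
   public static void main(String[] args) {
     String fullName  = "alice@EXAMPLE.COM";  // what cancelToken expects to see
     String shortName = "alice";              // what getRenewerForToken passes along
     System.out.println("matches? " + fullName.equals(shortName));  // prints false
   }
 }
 {noformat}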



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2232) ClientRMService doesn't allow delegation token owner to cancel their own token in secure mode

2014-07-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050753#comment-14050753
 ] 

Hudson commented on YARN-2232:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5812 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5812/])
YARN-2232. Fixed ResourceManager to allow DelegationToken owners to be able to 
cancel their own tokens in secure mode. Contributed by Varun Vasudev. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1607484)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java


 ClientRMService doesn't allow delegation token owner to cancel their own 
 token in secure mode
 -

 Key: YARN-2232
 URL: https://issues.apache.org/jira/browse/YARN-2232
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.5.0

 Attachments: apache-yarn-2232.0.patch, apache-yarn-2232.1.patch, 
 apache-yarn-2232.2.patch


 The ClientRMSerivce doesn't allow delegation token owners to cancel their own 
 tokens. The root cause is this piece of code from the cancelDelegationToken 
 function -
 {noformat}
 String user = getRenewerForToken(token);
 ...
 private String getRenewerForToken(Token<RMDelegationTokenIdentifier> token) 
 throws IOException {
   UserGroupInformation user = UserGroupInformation.getCurrentUser();
   UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
   // we can always renew our own tokens
   return loginUser.getUserName().equals(user.getUserName())
   ? token.decodeIdentifier().getRenewer().toString()
   : user.getShortUserName();
 }
 {noformat}
 It ends up passing the user short name to the cancelToken function whereas 
 AbstractDelegationTokenSecretManager::cancelToken expects the full user name. 
 This bug occurs in secure mode and is not an issue with simple auth.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2242) Improve exception information on AM launch crashes

2014-07-02 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-2242:


Attachment: YARN-2242-070115.patch

Hi [~djp], in this new patch I modified an existing unit test for allocation-time 
AM crashes to verify that the correct information has been added to the diagnostic 
information. I put the checks in a new private method that may be reused in the 
future to verify more diagnostic information. For now, I'm testing that the 
diagnostics contain the proxy URL and mention the logs to users. 
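Roughly, the helper boils down to something like this (a sketch, not the patch 
itself; the class, method, and argument names are made up):

{noformat}
import static org.junit.Assert.assertTrue;

/** Sketch of the kind of private check described above; names are hypothetical. */
class DiagnosticsChecks {
  static void verifyAMCrashDiagnostics(String diagnostics, String expectedProxyUrl) {
    assertTrue("diagnostics should point at the tracking/proxy URL",
        diagnostics.contains(expectedProxyUrl));
    assertTrue("diagnostics should tell the user to check the container logs",
        diagnostics.toLowerCase().contains("log"));
  }
}
{noformat}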

 Improve exception information on AM launch crashes
 --

 Key: YARN-2242
 URL: https://issues.apache.org/jira/browse/YARN-2242
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-2242-070114-1.patch, YARN-2242-070114.patch, 
 YARN-2242-070115.patch


 Currently, each time the AM container crashes during launch, both the console and 
 the web UI only report a ShellExitCodeException. This is not only unhelpful, but 
 sometimes confusing. With the help of the log aggregator, container logs are 
 actually aggregated and can be very helpful for debugging. One possible way to 
 improve the whole process is to send a pointer to the aggregated logs to the 
 programmer when reporting exception information. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2242) Improve exception information on AM launch crashes

2014-07-02 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-2242:


Attachment: YARN-2242-070115-1.patch

Improved the interface of the private verification function for diagnostic 
information in the unit test. Now this function only takes the diagnostic info as 
its input. 

 Improve exception information on AM launch crashes
 --

 Key: YARN-2242
 URL: https://issues.apache.org/jira/browse/YARN-2242
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-2242-070114-1.patch, YARN-2242-070114.patch, 
 YARN-2242-070115-1.patch, YARN-2242-070115.patch


 Currently, each time the AM container crashes during launch, both the console and 
 the web UI only report a ShellExitCodeException. This is not only unhelpful, but 
 sometimes confusing. With the help of the log aggregator, container logs are 
 actually aggregated and can be very helpful for debugging. One possible way to 
 improve the whole process is to send a pointer to the aggregated logs to the 
 programmer when reporting exception information. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy

2014-07-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050784#comment-14050784
 ] 

Hudson commented on YARN-2022:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5813 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5813/])
YARN-2022. Fixing CHANGES.txt to be correctly placed. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1607486)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 Preempting an Application Master container can be kept as least priority when 
 multiple applications are marked for preemption by 
 ProportionalCapacityPreemptionPolicy
 -

 Key: YARN-2022
 URL: https://issues.apache.org/jira/browse/YARN-2022
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sunil G
Assignee: Sunil G
 Fix For: 2.5.0

 Attachments: YARN-2022-DesignDraft.docx, YARN-2022.10.patch, 
 YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, 
 YARN-2022.6.patch, YARN-2022.7.patch, YARN-2022.8.patch, YARN-2022.9.patch, 
 Yarn-2022.1.patch


 Cluster Size = 16GB [2NM's]
 Queue A Capacity = 50%
 Queue B Capacity = 50%
 Consider there are 3 applications running in Queue A which has taken the full 
 cluster capacity. 
 J1 = 2GB AM + 1GB * 4 Maps
 J2 = 2GB AM + 1GB * 4 Maps
 J3 = 2GB AM + 1GB * 2 Maps
 Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ].
 Currently in this scenario, job J3 will get killed, including its AM.
 It is better if the AM can be given the least priority among multiple applications. 
 In this same scenario, map tasks from J3 and J2 can be preempted instead.
 Later, when the cluster is free, maps can be allocated to these jobs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2242) Improve exception information on AM launch crashes

2014-07-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050826#comment-14050826
 ] 

Hadoop QA commented on YARN-2242:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12653699/YARN-2242-070115.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4181//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4181//console

This message is automatically generated.

 Improve exception information on AM launch crashes
 --

 Key: YARN-2242
 URL: https://issues.apache.org/jira/browse/YARN-2242
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-2242-070114-1.patch, YARN-2242-070114.patch, 
 YARN-2242-070115-1.patch, YARN-2242-070115.patch


 Currently, each time the AM container crashes during launch, both the console and 
 the web UI only report a ShellExitCodeException. This is not only unhelpful, but 
 sometimes confusing. With the help of the log aggregator, container logs are 
 actually aggregated and can be very helpful for debugging. One possible way to 
 improve the whole process is to send a pointer to the aggregated logs to the 
 programmer when reporting exception information. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2242) Improve exception information on AM launch crashes

2014-07-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050838#comment-14050838
 ] 

Hadoop QA commented on YARN-2242:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12653703/YARN-2242-070115-1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4182//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4182//console

This message is automatically generated.

 Improve exception information on AM launch crashes
 --

 Key: YARN-2242
 URL: https://issues.apache.org/jira/browse/YARN-2242
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-2242-070114-1.patch, YARN-2242-070114.patch, 
 YARN-2242-070115-1.patch, YARN-2242-070115.patch


 Currently, each time the AM container crashes during launch, both the console and 
 the web UI only report a ShellExitCodeException. This is not only unhelpful, but 
 sometimes confusing. With the help of the log aggregator, container logs are 
 actually aggregated and can be very helpful for debugging. One possible way to 
 improve the whole process is to send a pointer to the aggregated logs to the 
 programmer when reporting exception information. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2229) Making ContainerId long type

2014-07-02 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2229:
-

Attachment: (was: YARN-2229.3.patch)

 Making ContainerId long type
 

 Key: YARN-2229
 URL: https://issues.apache.org/jira/browse/YARN-2229
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2229.1.patch, YARN-2229.2.patch, YARN-2229.2.patch


 On YARN-2052, we changed containerId format: upper 10 bits are for epoch, 
 lower 22 bits are for sequence number of Ids. This is for preserving 
 semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, 
 {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and 
 {{ConverterUtils#toContainerId}}. One concern is that the epoch can overflow after the RM 
 restarts 1024 times.
 To avoid the problem, it's better to make containerId a long. We need to define 
 the new containerId format while preserving backward compatibility on this 
 JIRA.
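 For readers following the format discussion, here is a minimal sketch of the 10/22-bit packing described above (the class and method names are illustrative, not the real ContainerId code):
{code}
// Illustrative only: packs an epoch and a sequence number into the
// 10/22-bit layout described above; not the actual ContainerId code.
public final class ContainerIdBits {
  private static final int SEQUENCE_BITS = 22;
  private static final int SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1; // lower 22 bits

  static int pack(int epoch, int sequence) {
    return (epoch << SEQUENCE_BITS) | (sequence & SEQUENCE_MASK);
  }

  static int epochOf(int id) {
    return id >>> SEQUENCE_BITS;  // upper 10 bits
  }

  static int sequenceOf(int id) {
    return id & SEQUENCE_MASK;    // lower 22 bits
  }
}
{code}
 With only 10 bits, the epoch wraps after 2^10 = 1024 RM restarts, which is exactly the overflow concern that motivates widening the id to a long.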



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2249) AM resync may send container release request before container is actually recovered

2014-07-02 Thread Jian He (JIRA)
Jian He created YARN-2249:
-

 Summary: AM resync may send container release request before 
container is actually recovered
 Key: YARN-2249
 URL: https://issues.apache.org/jira/browse/YARN-2249
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He


AM resync on RM restart will send outstanding resource requests, container 
release list etc. back to the new RM. It is possible that container release 
request is processed before the container is actually recovered.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2249) AM resync may send container release request before container is actually recovered

2014-07-02 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2249:
--

Description: AM resync on RM restart will send outstanding resource 
requests, container release list etc. back to the new RM. It is possible that 
RM receives the container release request  before the container is actually 
recovered. (was: AM resync on RM restart will send outstanding resource 
requests, container release list etc. back to the new RM. It is possible that 
container release request is processed before the container is actually 
recovered.  )

 AM resync may send container release request before container is actually 
 recovered
 ---

 Key: YARN-2249
 URL: https://issues.apache.org/jira/browse/YARN-2249
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He

 AM resync on RM restart will send outstanding resource requests, container 
 release list etc. back to the new RM. It is possible that RM receives the 
 container release request  before the container is actually recovered.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2249) RM may receive container release request on AM resync before container is actually recovered

2014-07-02 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2249:
--

Summary: RM may receive container release request on AM resync before 
container is actually recovered  (was: AM resync may send container release 
request before container is actually recovered)

 RM may receive container release request on AM resync before container is 
 actually recovered
 

 Key: YARN-2249
 URL: https://issues.apache.org/jira/browse/YARN-2249
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He

 AM resync on RM restart will send outstanding resource requests, container 
 release list etc. back to the new RM. It is possible that RM receives the 
 container release request  before the container is actually recovered.  
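 One way such a race is commonly handled is to defer the release until the container has actually been recovered; the following is a purely hypothetical sketch (field and helper names are illustrative, not the eventual fix):
{code}
// Hypothetical sketch: park release requests that arrive before the
// container has been recovered, and apply them once recovery completes.
private final Set<ContainerId> pendingReleases = new HashSet<ContainerId>();

void handleReleaseRequest(ContainerId containerId) {
  if (!isRecovered(containerId)) {
    pendingReleases.add(containerId);   // defer until recovery finishes
  } else {
    releaseContainer(containerId);
  }
}

void onContainerRecovered(ContainerId containerId) {
  if (pendingReleases.remove(containerId)) {
    releaseContainer(containerId);      // apply the deferred release
  }
}
{code}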



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2069) Add cross-user preemption within CapacityScheduler's leaf-queue

2014-07-02 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-2069:


Attachment: YARN-2069-trunk-3.patch

Rebasing and updating the patch.

Thanks,
Mayank

 Add cross-user preemption within CapacityScheduler's leaf-queue
 ---

 Key: YARN-2069
 URL: https://issues.apache.org/jira/browse/YARN-2069
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Vinod Kumar Vavilapalli
Assignee: Mayank Bansal
 Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, 
 YARN-2069-trunk-3.patch


 Preemption today only works across queues and moves around resources across 
 queues per demand and usage. We should also have user-level preemption within 
 a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-07-02 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050913#comment-14050913
 ] 

Jian He commented on YARN-1366:
---

looks good, +1

 AM should implement Resync with the ApplicationMasterService instead of 
 shutting down
 -

 Key: YARN-1366
 URL: https://issues.apache.org/jira/browse/YARN-1366
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Rohith
 Attachments: YARN-1366.1.patch, YARN-1366.10.patch, 
 YARN-1366.11.patch, YARN-1366.2.patch, YARN-1366.3.patch, YARN-1366.4.patch, 
 YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.7.patch, YARN-1366.8.patch, 
 YARN-1366.9.patch, YARN-1366.patch, YARN-1366.prototype.patch, 
 YARN-1366.prototype.patch


 The ApplicationMasterService currently sends a resync response to which the 
 AM responds by shutting down. The AM behavior is expected to change to 
 resyncing with the RM. Resync means resetting the allocate RPC 
 sequence number to 0 and the AM should send its entire outstanding request to 
 the RM. Note that if the AM is making its first allocate call to the RM then 
 things should proceed like normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2229) Making ContainerId long type

2014-07-02 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2229:
-

Attachment: YARN-2229.3.patch

 Making ContainerId long type
 

 Key: YARN-2229
 URL: https://issues.apache.org/jira/browse/YARN-2229
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2229.1.patch, YARN-2229.2.patch, YARN-2229.2.patch, 
 YARN-2229.3.patch


 On YARN-2052, we changed containerId format: upper 10 bits are for epoch, 
 lower 22 bits are for sequence number of Ids. This is for preserving 
 semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, 
 {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and 
 {{ConverterUtils#toContainerId}}. One concern is that the epoch can overflow after the RM 
 restarts 1024 times.
 To avoid the problem, it's better to make containerId a long. We need to define 
 the new containerId format while preserving backward compatibility on this 
 JIRA.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2229) Making ContainerId long type

2014-07-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050933#comment-14050933
 ] 

Hadoop QA commented on YARN-2229:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653729/YARN-2229.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4184//console

This message is automatically generated.

 Making ContainerId long type
 

 Key: YARN-2229
 URL: https://issues.apache.org/jira/browse/YARN-2229
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2229.1.patch, YARN-2229.2.patch, YARN-2229.2.patch, 
 YARN-2229.3.patch


 On YARN-2052, we changed containerId format: upper 10 bits are for epoch, 
 lower 22 bits are for sequence number of Ids. This is for preserving 
 semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, 
 {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and 
 {{ConverterUtils#toContainerId}}. One concern is that the epoch can overflow after the RM 
 restarts 1024 times.
 To avoid the problem, it's better to make containerId a long. We need to define 
 the new containerId format while preserving backward compatibility on this 
 JIRA.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2069) Add cross-user preemption within CapacityScheduler's leaf-queue

2014-07-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050947#comment-14050947
 ] 

Hadoop QA commented on YARN-2069:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12653724/YARN-2069-trunk-3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4183//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4183//console

This message is automatically generated.

 Add cross-user preemption within CapacityScheduler's leaf-queue
 ---

 Key: YARN-2069
 URL: https://issues.apache.org/jira/browse/YARN-2069
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Vinod Kumar Vavilapalli
Assignee: Mayank Bansal
 Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, 
 YARN-2069-trunk-3.patch


 Preemption today only works across queues and moves around resources across 
 queues per demand and usage. We should also have user-level preemption within 
 a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2229) Making ContainerId long type

2014-07-02 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050961#comment-14050961
 ] 

Tsuyoshi OZAWA commented on YARN-2229:
--

[~jianhe], can you check whether the trunk code compiles with the v3 patch? 
It works on my local machine. It might be a Jenkins CI problem.

 Making ContainerId long type
 

 Key: YARN-2229
 URL: https://issues.apache.org/jira/browse/YARN-2229
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2229.1.patch, YARN-2229.2.patch, YARN-2229.2.patch, 
 YARN-2229.3.patch


 On YARN-2052, we changed containerId format: upper 10 bits are for epoch, 
 lower 22 bits are for sequence number of Ids. This is for preserving 
 semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, 
 {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and 
 {{ConverterUtils#toContainerId}}. One concern is that the epoch can overflow after the RM 
 restarts 1024 times.
 To avoid the problem, it's better to make containerId a long. We need to define 
 the new containerId format while preserving backward compatibility on this 
 JIRA.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-07-02 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050965#comment-14050965
 ] 

Bikas Saha commented on YARN-1366:
--

Why are we returning the old allocateResponse to the user? What is the user 
expected to do with this allocateResponse that has a RESYNC command in it? 
Should we make a second call to allocate (after re-registering) and then send 
that response back up to the user?
{code}+// re register with RM
+registerApplicationMaster();
+return allocateResponse;
+  }{code}

There needs to be some clear documentation that if the user has not removed 
container requests that have already been satisfied, then the re-register may 
end up sending the entire ask list to the RM (including matched requests). 
This would mean the RM could end up giving it a lot of newly allocated containers.
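
For illustration, here is a minimal sketch of the second-allocate variant suggested above, assuming the patch's internal registerApplicationMaster() helper and the AMCommand-based resync signal (a sketch only, not the committed code):
{code}
// Sketch: on resync, re-register and issue a fresh allocate so the caller
// never sees the stale RESYNC response.
if (allocateResponse.getAMCommand() == AMCommand.AM_RESYNC) {
  registerApplicationMaster();          // re-register with the restarted RM
  return allocate(progressIndicator);   // this response goes back to the user
}
return allocateResponse;
{code}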

 AM should implement Resync with the ApplicationMasterService instead of 
 shutting down
 -

 Key: YARN-1366
 URL: https://issues.apache.org/jira/browse/YARN-1366
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Rohith
 Attachments: YARN-1366.1.patch, YARN-1366.10.patch, 
 YARN-1366.11.patch, YARN-1366.2.patch, YARN-1366.3.patch, YARN-1366.4.patch, 
 YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.7.patch, YARN-1366.8.patch, 
 YARN-1366.9.patch, YARN-1366.patch, YARN-1366.prototype.patch, 
 YARN-1366.prototype.patch


 The ApplicationMasterService currently sends a resync response to which the 
 AM responds by shutting down. The AM behavior is expected to change to 
 resyncing with the RM. Resync means resetting the allocate RPC 
 sequence number to 0 and the AM should send its entire outstanding request to 
 the RM. Note that if the AM is making its first allocate call to the RM then 
 things should proceed like normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-07-02 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050975#comment-14050975
 ] 

Jian He commented on YARN-1366:
---

bq. Should we make a second call to allocate (after re-registering) and then 
send that response back up to the user?
Follow-up JIRA YARN-2209 will replace the resync command with a proper 
exception, in which case the returned allocate response should be null.
bq. There needs to be some clear documentation
Makes sense. Rohith, can you add some documentation about this? Thanks!


 AM should implement Resync with the ApplicationMasterService instead of 
 shutting down
 -

 Key: YARN-1366
 URL: https://issues.apache.org/jira/browse/YARN-1366
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Rohith
 Attachments: YARN-1366.1.patch, YARN-1366.10.patch, 
 YARN-1366.11.patch, YARN-1366.2.patch, YARN-1366.3.patch, YARN-1366.4.patch, 
 YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.7.patch, YARN-1366.8.patch, 
 YARN-1366.9.patch, YARN-1366.patch, YARN-1366.prototype.patch, 
 YARN-1366.prototype.patch


 The ApplicationMasterService currently sends a resync response to which the 
 AM responds by shutting down. The AM behavior is expected to change to 
 resyncing with the RM. Resync means resetting the allocate RPC 
 sequence number to 0 and the AM should send its entire outstanding request to 
 the RM. Note that if the AM is making its first allocate call to the RM then 
 things should proceed like normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-07-02 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050977#comment-14050977
 ] 

Jian He commented on YARN-1366:
---

bq. in which case the returned allocate response should be null
I meant an empty response.

 AM should implement Resync with the ApplicationMasterService instead of 
 shutting down
 -

 Key: YARN-1366
 URL: https://issues.apache.org/jira/browse/YARN-1366
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Rohith
 Attachments: YARN-1366.1.patch, YARN-1366.10.patch, 
 YARN-1366.11.patch, YARN-1366.2.patch, YARN-1366.3.patch, YARN-1366.4.patch, 
 YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.7.patch, YARN-1366.8.patch, 
 YARN-1366.9.patch, YARN-1366.patch, YARN-1366.prototype.patch, 
 YARN-1366.prototype.patch


 The ApplicationMasterService currently sends a resync response to which the 
 AM responds by shutting down. The AM behavior is expected to change to 
 resyncing with the RM. Resync means resetting the allocate RPC 
 sequence number to 0 and the AM should send its entire outstanding request to 
 the RM. Note that if the AM is making its first allocate call to the RM then 
 things should proceed like normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-07-02 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050978#comment-14050978
 ] 

Bikas Saha commented on YARN-1366:
--

Does a null response make sense for the user?

 AM should implement Resync with the ApplicationMasterService instead of 
 shutting down
 -

 Key: YARN-1366
 URL: https://issues.apache.org/jira/browse/YARN-1366
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Rohith
 Attachments: YARN-1366.1.patch, YARN-1366.10.patch, 
 YARN-1366.11.patch, YARN-1366.2.patch, YARN-1366.3.patch, YARN-1366.4.patch, 
 YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.7.patch, YARN-1366.8.patch, 
 YARN-1366.9.patch, YARN-1366.patch, YARN-1366.prototype.patch, 
 YARN-1366.prototype.patch


 The ApplicationMasterService currently sends a resync response to which the 
 AM responds by shutting down. The AM behavior is expected to change to 
 resyncing with the RM. Resync means resetting the allocate RPC 
 sequence number to 0 and the AM should send its entire outstanding request to 
 the RM. Note that if the AM is making its first allocate call to the RM then 
 things should proceed like normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2229) Making ContainerId long type

2014-07-02 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050990#comment-14050990
 ] 

Jian He commented on YARN-2229:
---

It failed with a "cannot find symbol" error for this method: 
Long.compare(this.getEpoch(), other.getEpoch()); 

 Making ContainerId long type
 

 Key: YARN-2229
 URL: https://issues.apache.org/jira/browse/YARN-2229
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2229.1.patch, YARN-2229.2.patch, YARN-2229.2.patch, 
 YARN-2229.3.patch


 On YARN-2052, we changed containerId format: upper 10 bits are for epoch, 
 lower 22 bits are for sequence number of Ids. This is for preserving 
 semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, 
 {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and 
 {{ConverterUtils#toContainerId}}. One concern is that the epoch can overflow after the RM 
 restarts 1024 times.
 To avoid the problem, it's better to make containerId a long. We need to define 
 the new containerId format while preserving backward compatibility on this 
 JIRA.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-07-02 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14051013#comment-14051013
 ] 

Rohith commented on YARN-1366:
--

bq. can you add some documentation about this 
Shall I add it in the javadoc for AMRMClient#allocate()? If it should go in an 
information guide, could you please guide me on how to add the documentation?

 AM should implement Resync with the ApplicationMasterService instead of 
 shutting down
 -

 Key: YARN-1366
 URL: https://issues.apache.org/jira/browse/YARN-1366
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Rohith
 Attachments: YARN-1366.1.patch, YARN-1366.10.patch, 
 YARN-1366.11.patch, YARN-1366.2.patch, YARN-1366.3.patch, YARN-1366.4.patch, 
 YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.7.patch, YARN-1366.8.patch, 
 YARN-1366.9.patch, YARN-1366.patch, YARN-1366.prototype.patch, 
 YARN-1366.prototype.patch


 The ApplicationMasterService currently sends a resync response to which the 
 AM responds by shutting down. The AM behavior is expected to change to 
 resyncing with the RM. Resync means resetting the allocate RPC 
 sequence number to 0 and the AM should send its entire outstanding request to 
 the RM. Note that if the AM is making its first allocate call to the RM then 
 things should proceed like normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2242) Improve exception information on AM launch crashes

2014-07-02 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14051019#comment-14051019
 ] 

Junping Du commented on YARN-2242:
--

Patch looks good to me overall. One minor fix: a period is missing at the end of 
the diagnostics message. 
+1. Will commit it with this minor fix.

 Improve exception information on AM launch crashes
 --

 Key: YARN-2242
 URL: https://issues.apache.org/jira/browse/YARN-2242
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-2242-070114-1.patch, YARN-2242-070114.patch, 
 YARN-2242-070115-1.patch, YARN-2242-070115.patch


 Currently, each time the AM container crashes during launch, both the console and the 
 webpage UI report only a ShellExitCodeException. This is not only unhelpful, 
 but sometimes confusing. With the help of the log aggregator, container logs are 
 actually aggregated, and can be very helpful for debugging. One possible way 
 to improve the whole process is to send a pointer to the aggregated logs to 
 the programmer when reporting exception information. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2229) Making ContainerId long type

2014-07-02 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14051045#comment-14051045
 ] 

Tsuyoshi OZAWA commented on YARN-2229:
--

Thank you, Jian. I found that Long.compare(long x, long y) was introduced in 
JDK 1.7. I'll update the patch to avoid that method.
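
For reference, a JDK 6-safe equivalent is a one-line manual comparison; here is a sketch of what the updated compareTo could use instead (illustrative, not the actual patch):
{code}
// JDK 6-safe equivalent of Long.compare(long, long), which only exists
// since JDK 7.
private static int compareLongs(long x, long y) {
  return (x < y) ? -1 : ((x == y) ? 0 : 1);
}
{code}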

 Making ContainerId long type
 

 Key: YARN-2229
 URL: https://issues.apache.org/jira/browse/YARN-2229
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2229.1.patch, YARN-2229.2.patch, YARN-2229.2.patch, 
 YARN-2229.3.patch


 On YARN-2052, we changed containerId format: upper 10 bits are for epoch, 
 lower 22 bits are for sequence number of Ids. This is for preserving 
 semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, 
 {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and 
 {{ConverterUtils#toContainerId}}. One concern is that the epoch can overflow after the RM 
 restarts 1024 times.
 To avoid the problem, it's better to make containerId a long. We need to define 
 the new containerId format while preserving backward compatibility on this 
 JIRA.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-07-02 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14051049#comment-14051049
 ] 

Jian He commented on YARN-1366:
---

Adding javadoc for AMRMClient#allocate() should be enough.
bq. Does a null response make sense for the user?
Doing one more allocate makes the response more consistent.
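
A possible shape for that javadoc addition (the wording here is only a suggestion, not the committed text):
{code}
/**
 * ...
 * <p>
 * Note: after an RM restart the client re-registers and re-sends its entire
 * outstanding ask list. If container requests that have already been
 * satisfied were not removed via removeContainerRequest, they are sent again
 * and the RM may hand out additional containers for them.
 */
public abstract AllocateResponse allocate(float progressIndicator)
    throws YarnException, IOException;
{code}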

 AM should implement Resync with the ApplicationMasterService instead of 
 shutting down
 -

 Key: YARN-1366
 URL: https://issues.apache.org/jira/browse/YARN-1366
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Rohith
 Attachments: YARN-1366.1.patch, YARN-1366.10.patch, 
 YARN-1366.11.patch, YARN-1366.2.patch, YARN-1366.3.patch, YARN-1366.4.patch, 
 YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.7.patch, YARN-1366.8.patch, 
 YARN-1366.9.patch, YARN-1366.patch, YARN-1366.prototype.patch, 
 YARN-1366.prototype.patch


 The ApplicationMasterService currently sends a resync response to which the 
 AM responds by shutting down. The AM behavior is expected to change to 
 resyncing with the RM. Resync means resetting the allocate RPC 
 sequence number to 0 and the AM should send its entire outstanding request to 
 the RM. Note that if the AM is making its first allocate call to the RM then 
 things should proceed like normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2229) Making ContainerId long type

2014-07-02 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2229:
-

Attachment: YARN-2229.4.patch

Updated the patch to avoid Long.compare(long, long).

 Making ContainerId long type
 

 Key: YARN-2229
 URL: https://issues.apache.org/jira/browse/YARN-2229
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2229.1.patch, YARN-2229.2.patch, YARN-2229.2.patch, 
 YARN-2229.3.patch, YARN-2229.4.patch


 On YARN-2052, we changed containerId format: upper 10 bits are for epoch, 
 lower 22 bits are for sequence number of Ids. This is for preserving 
 semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, 
 {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and 
 {{ConverterUtils#toContainerId}}. One concern is that the epoch can overflow after the RM 
 restarts 1024 times.
 To avoid the problem, it's better to make containerId a long. We need to define 
 the new containerId format while preserving backward compatibility on this 
 JIRA.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2233) Implement web services to create, renew and cancel delegation tokens

2014-07-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14051058#comment-14051058
 ] 

Hadoop QA commented on YARN-2233:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12653661/apache-yarn-2233.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-auth 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4185//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4185//console

This message is automatically generated.

 Implement web services to create, renew and cancel delegation tokens
 

 Key: YARN-2233
 URL: https://issues.apache.org/jira/browse/YARN-2233
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Priority: Blocker
 Attachments: apache-yarn-2233.0.patch, apache-yarn-2233.1.patch, 
 apache-yarn-2233.2.patch


 Implement functionality to create, renew and cancel delegation tokens.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2242) Improve exception information on AM launch crashes

2014-07-02 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14051068#comment-14051068
 ] 

Li Lu commented on YARN-2242:
-

Thanks [~djp]! 

 Improve exception information on AM launch crashes
 --

 Key: YARN-2242
 URL: https://issues.apache.org/jira/browse/YARN-2242
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-2242-070114-1.patch, YARN-2242-070114.patch, 
 YARN-2242-070115-1.patch, YARN-2242-070115.patch


 Currently, each time the AM container crashes during launch, both the console and the 
 webpage UI report only a ShellExitCodeException. This is not only unhelpful, 
 but sometimes confusing. With the help of the log aggregator, container logs are 
 actually aggregated, and can be very helpful for debugging. One possible way 
 to improve the whole process is to send a pointer to the aggregated logs to 
 the programmer when reporting exception information. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2242) Improve exception information on AM launch crashes

2014-07-02 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14051066#comment-14051066
 ] 

Junping Du commented on YARN-2242:
--

bq. Zhijie Shen, thank you for the notification. I agree with Li Lu - these 
patches can be helpful for users. And, I checked the patches and confirmed that 
they are independent of each other. Therefore, we can work separately. Junping 
Du, could you also review YARN-2013?
Sure, [~ozawa], I will look at YARN-2013 later.

 Improve exception information on AM launch crashes
 --

 Key: YARN-2242
 URL: https://issues.apache.org/jira/browse/YARN-2242
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-2242-070114-1.patch, YARN-2242-070114.patch, 
 YARN-2242-070115-1.patch, YARN-2242-070115.patch


 Currently, each time the AM container crashes during launch, both the console and the 
 webpage UI report only a ShellExitCodeException. This is not only unhelpful, 
 but sometimes confusing. With the help of the log aggregator, container logs are 
 actually aggregated, and can be very helpful for debugging. One possible way 
 to improve the whole process is to send a pointer to the aggregated logs to 
 the programmer when reporting exception information. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2242) Improve exception information on AM launch crashes

2014-07-02 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-2242:


Attachment: YARN-2242-070115-2.patch

Add the missing period at the end of the diagnostic information. 

 Improve exception information on AM launch crashes
 --

 Key: YARN-2242
 URL: https://issues.apache.org/jira/browse/YARN-2242
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-2242-070114-1.patch, YARN-2242-070114.patch, 
 YARN-2242-070115-1.patch, YARN-2242-070115-2.patch, YARN-2242-070115.patch


 Currently, each time the AM container crashes during launch, both the console and the 
 webpage UI report only a ShellExitCodeException. This is not only unhelpful, 
 but sometimes confusing. With the help of the log aggregator, container logs are 
 actually aggregated, and can be very helpful for debugging. One possible way 
 to improve the whole process is to send a pointer to the aggregated logs to 
 the programmer when reporting exception information. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2062) Too many InvalidStateTransitionExceptions from NodeState.NEW on RM failover

2014-07-02 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2062:
---

Target Version/s: 2.6.0  (was: 2.5.0)

 Too many InvalidStateTransitionExceptions from NodeState.NEW on RM failover
 ---

 Key: YARN-2062
 URL: https://issues.apache.org/jira/browse/YARN-2062
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla

 On busy clusters, we see several 
 {{org.apache.hadoop.yarn.state.InvalidStateTransitonException}} for events 
 invoked against NEW nodes. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1354) Recover applications upon nodemanager restart

2014-07-02 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1354:
---

Target Version/s: 2.6.0  (was: 2.5.0)

 Recover applications upon nodemanager restart
 -

 Key: YARN-1354
 URL: https://issues.apache.org/jira/browse/YARN-1354
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-1354-v1.patch, 
 YARN-1354-v2-and-YARN-1987-and-YARN-1362.patch, YARN-1354-v3.patch


 The set of active applications in the nodemanager context needs to be 
 recovered for a work-preserving nodemanager restart.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1342) Recover container tokens upon nodemanager restart

2014-07-02 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1342:
---

Target Version/s: 2.6.0  (was: 2.5.0)

 Recover container tokens upon nodemanager restart
 -

 Key: YARN-1342
 URL: https://issues.apache.org/jira/browse/YARN-1342
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-1342.patch, YARN-1342v2.patch, 
 YARN-1342v3-and-YARN-1987.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2110) TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler

2014-07-02 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2110:
---

Target Version/s: 2.6.0  (was: 2.5.0)

 TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler
 ---

 Key: YARN-2110
 URL: https://issues.apache.org/jira/browse/YARN-2110
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Chen He
  Labels: test
 Attachments: YARN-2110-v2.patch, YARN-2110.patch


 The TestAMRestart#testAMRestartWithExistingContainers does a cast to 
 CapacityScheduler in a couple of places
 {code}
 ((CapacityScheduler) rm1.getResourceScheduler())
 {code}
 If run with FairScheduler as the default scheduler, the test throws a 
 {code} java.lang.ClassCastException {code}.
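 One way tests like this are usually made scheduler-proof is to pin the scheduler in the test configuration instead of relying on the cluster default; a sketch, assuming the standard YarnConfiguration constant and the MockRM test helper:
{code}
// Force the CapacityScheduler for this test so the cast is always valid,
// regardless of which scheduler the default configuration selects.
YarnConfiguration conf = new YarnConfiguration();
conf.setClass(YarnConfiguration.RM_SCHEDULER,
    CapacityScheduler.class, ResourceScheduler.class);
MockRM rm1 = new MockRM(conf);
{code}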



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1106) The RM should point the tracking url to the RM app page if its empty

2014-07-02 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1106:
---

Target Version/s: 2.6.0  (was: 3.0.0, 2.5.0)

 The RM should point the tracking url to the RM app page if its empty
 

 Key: YARN-1106
 URL: https://issues.apache.org/jira/browse/YARN-1106
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 3.0.0, 2.1.0-beta, 0.23.9
Reporter: Thomas Graves
Assignee: Thomas Graves
 Attachments: YARN-1106.patch, YARN-1106.patch


 It would be nice if the ResourceManager set the tracking URL to the RM app 
 page if the application master doesn't pass one or passes the empty string.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins

2014-07-02 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1408:
---

Target Version/s: 2.6.0  (was: 2.5.0)

 Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task 
 timeout for 30mins
 --

 Key: YARN-1408
 URL: https://issues.apache.org/jira/browse/YARN-1408
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, 
 Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.6.patch, Yarn-1408.patch


 Capacity preemption is enabled as follows.
  *  yarn.resourcemanager.scheduler.monitor.enable= true ,
  *  
 yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
 Queue = a,b
 Capacity of Queue A = 80%
 Capacity of Queue B = 20%
 Step 1: Assign a big jobA on queue a which uses the full cluster capacity.
 Step 2: Submit a jobB to queue b which would use less than 20% of cluster 
 capacity.
 A jobA task which uses queue b's capacity is then preempted and killed.
 This caused the problem below:
 1. A new container got allocated for jobA in queue a as per a node update 
 from an NM.
 2. This container was immediately preempted by the preemption policy.
 Here the ACQUIRED at KILLED invalid state exception appeared when the next AM 
 heartbeat reached the RM.
 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 ACQUIRED at KILLED
 This also caused the task to hit a 30-minute timeout, as this container 
 was already killed by preemption.
 attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1856) cgroups based memory monitoring for containers

2014-07-02 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1856:
---

Target Version/s: 2.6.0  (was: 2.5.0)

 cgroups based memory monitoring for containers
 --

 Key: YARN-1856
 URL: https://issues.apache.org/jira/browse/YARN-1856
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1809) Synchronize RM and Generic History Service Web-UIs

2014-07-02 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1809:
---

Target Version/s: 2.6.0  (was: 2.5.0)

 Synchronize RM and Generic History Service Web-UIs
 --

 Key: YARN-1809
 URL: https://issues.apache.org/jira/browse/YARN-1809
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1809.1.patch, YARN-1809.2.patch, YARN-1809.3.patch, 
 YARN-1809.4.patch, YARN-1809.5.patch, YARN-1809.5.patch, YARN-1809.6.patch, 
 YARN-1809.7.patch, YARN-1809.8.patch, YARN-1809.9.patch


 After YARN-953, the web-UI of the generic history service provides more 
 information than that of the RM, namely the details about app attempts and 
 containers. It's good to provide similar web-UIs that retrieve the data from 
 separate sources, i.e., the RM cache and the history store respectively.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1844) yarn.log.server.url should have a default value

2014-07-02 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1844:
---

Target Version/s: 2.6.0  (was: 2.5.0)

 yarn.log.server.url should have a default value
 ---

 Key: YARN-1844
 URL: https://issues.apache.org/jira/browse/YARN-1844
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe

 Currently yarn.log.server.url must be configured properly by a user when log 
 aggregation is enabled so that logs continue to be served from their original 
 URL after they've been aggregated.  It would be nice if a default value for 
 this property could be provided that would work out of the box for at least 
 simple cluster setups (i.e.: already point to JHS or AHS accordingly).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1681) When banned.users is not set in LCE's container-executor.cfg, submit job with user in DEFAULT_BANNED_USERS will receive unclear error message

2014-07-02 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1681:
---

Target Version/s: 2.6.0  (was: 2.5.0)

 When banned.users is not set in LCE's container-executor.cfg, submit job 
 with user in DEFAULT_BANNED_USERS will receive unclear error message
 ---

 Key: YARN-1681
 URL: https://issues.apache.org/jira/browse/YARN-1681
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.2.0
Reporter: Zhichun Wu
Assignee: Zhichun Wu
Priority: Minor
  Labels: container, usability
 Attachments: YARN-1681.patch


 When using LCE in a secure setup, if banned.users is not set in 
 container-executor.cfg, submitting a job with a user in DEFAULT_BANNED_USERS 
 (mapred, hdfs, bin, 0) will produce an unclear error message.
 For example, if we use hdfs to submit an MR job, we may see the following on the 
 YARN app overview page:
 {code}
 appattempt_1391353981633_0003_02 exited with exitCode: -1000 due to: 
 Application application_1391353981633_0003 initialization failed 
 (exitCode=139) with output: 
 {code}
 while the preferred error message would look like:
 {code}
 appattempt_1391353981633_0003_02 exited with exitCode: -1000 due to: 
 Application application_1391353981633_0003 initialization failed 
 (exitCode=139) with output: Requested user hdfs is banned 
 {code}
 Just a minor bug, and I would like to start contributing to hadoop-common with 
 it. :)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2157) Document YARN metrics

2014-07-02 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2157:
---

Target Version/s: 2.6.0  (was: 2.5.0)

 Document YARN metrics
 -

 Key: YARN-2157
 URL: https://issues.apache.org/jira/browse/YARN-2157
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
 Attachments: YARN-2157.2.patch, YARN-2157.patch


 YARN-side of HADOOP-6350. Add YARN metrics to Metrics document.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2162) Fair Scheduler: ability to optionally configure minResources and maxResources in terms of percentage

2014-07-02 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2162:
---

Target Version/s: 2.6.0  (was: 2.5.0)

 Fair Scheduler: ability to optionally configure minResources and maxResources 
 in terms of percentage
 

 Key: YARN-2162
 URL: https://issues.apache.org/jira/browse/YARN-2162
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Reporter: Ashwin Shankar
  Labels: scheduler

 minResources and maxResources in fair scheduler configs are expressed in 
 terms of absolute numbers X mb, Y vcores. 
 As a result, when we expand or shrink our hadoop cluster, we need to 
 recalculate and change minResources/maxResources accordingly, which is pretty 
 inconvenient.
 We can circumvent this problem if we can optionally configure these 
 properties in terms of percentage of cluster capacity. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2209) Replace allocate#resync command with ApplicationMasterNotRegisteredException to indicate AM to re-register on RM restart

2014-07-02 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2209:
---

Target Version/s: 2.6.0  (was: 2.5.0)

 Replace allocate#resync command with ApplicationMasterNotRegisteredException 
 to indicate AM to re-register on RM restart
 

 Key: YARN-2209
 URL: https://issues.apache.org/jira/browse/YARN-2209
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2209.1.patch


 YARN-1365 introduced an ApplicationMasterNotRegisteredException to indicate 
 that the application should re-register on RM restart. We should do the same for 
 the AMS#allocate call as well.
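 For illustration, a rough sketch of the exception-driven flow on the client side once allocate also throws it (names such as rmClient and the retry structure are illustrative, not the actual change):
{code}
// Hypothetical sketch: treat the not-registered signal like a resync by
// re-registering with the restarted RM, then retrying the allocate call.
AllocateResponse response;
try {
  response = rmClient.allocate(allocateRequest);
} catch (ApplicationMasterNotRegisteredException e) {
  registerApplicationMaster();                 // re-register after RM restart
  response = rmClient.allocate(allocateRequest);
}
{code}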



--
This message was sent by Atlassian JIRA
(v6.2#6252)

