[jira] [Updated] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-06-28 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-1366:
-

Attachment: YARN-1366.7.patch

> AM should implement Resync with the ApplicationMasterService instead of 
> shutting down
> -
>
> Key: YARN-1366
> URL: https://issues.apache.org/jira/browse/YARN-1366
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Rohith
> Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, 
> YARN-1366.4.patch, YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.7.patch, 
> YARN-1366.patch, YARN-1366.prototype.patch, YARN-1366.prototype.patch
>
>
> The ApplicationMasterService currently sends a resync response to which the 
> AM responds by shutting down. The AM behavior is expected to change to 
> resyncing with the RM. Resync means resetting the allocate RPC sequence 
> number to 0 and resending all of the AM's outstanding requests to 
> the RM. Note that if the AM is making its first allocate call to the RM then 
> things should proceed like normal without needing a resync. The RM will 
> return all containers that have completed since the RM last synced with the 
> AM. Some container completions may be reported more than once.
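
For illustration, here is a minimal AM-side sketch of the resync behaviour described above, assuming the Hadoop 2.x AMRMClient API; the helper names are hypothetical and this is not one of the attached patches.

{code}
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.AMCommand;
import org.apache.hadoop.yarn.client.api.AMRMClient;

// Illustrative sketch only: reRegister() and resendOutstandingRequests() are
// hypothetical helpers, not names taken from the attached patches.
public class ResyncSketch {
  private final AMRMClient<AMRMClient.ContainerRequest> rmClient;

  ResyncSketch(AMRMClient<AMRMClient.ContainerRequest> rmClient) {
    this.rmClient = rmClient;
  }

  void heartbeat(float progress) throws Exception {
    AllocateResponse response = rmClient.allocate(progress);
    if (response.getAMCommand() == AMCommand.AM_RESYNC) {
      // Resync instead of shutting down: the allocate sequence number restarts
      // at 0, the AM re-registers and resends every outstanding request, and
      // completed-container reports may arrive more than once, so the AM must
      // handle duplicates idempotently.
      reRegister();                 // hypothetical: call registerApplicationMaster() again
      resendOutstandingRequests();  // hypothetical: re-add all pending ContainerRequests
    }
  }

  private void reRegister() { /* omitted in this sketch */ }
  private void resendOutstandingRequests() { /* omitted in this sketch */ }
}
{code}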



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-28 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047031#comment-14047031
 ] 

Tsuyoshi OZAWA commented on YARN-2052:
--

Thank you for the review, Jian. The test failure is not related.

> ContainerId creation after work preserving restart is broken
> 
>
> Key: YARN-2052
> URL: https://issues.apache.org/jira/browse/YARN-2052
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2052.1.patch, YARN-2052.10.patch, 
> YARN-2052.11.patch, YARN-2052.12.patch, YARN-2052.2.patch, YARN-2052.3.patch, 
> YARN-2052.4.patch, YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, 
> YARN-2052.8.patch, YARN-2052.9.patch, YARN-2052.9.patch
>
>
> Container ids are made unique by using the app identifier and appending a 
> monotonically increasing sequence number to it. Since container creation is a 
> high churn activity the RM does not store the sequence number per app. So 
> after restart it does not know what the new sequence number should be for new 
> allocations.
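
For illustration, a minimal sketch of one way to make post-restart ids collision-free: persist a per-restart epoch in the RM state store and fold it into the container id sequence. The bit layout and names below are assumptions, not the attached patches.

{code}
// Illustrative sketch only; the bit layout and field names are assumed.
public class ContainerIdSketch {
  // The epoch is bumped and persisted in the RM state store on every restart.
  private final long epoch;
  private long nextSequenceNumber = 0;

  public ContainerIdSketch(long epoch) {
    this.epoch = epoch;
  }

  // Pack the epoch into the high bits of the per-app container counter, so a
  // restarted RM that no longer remembers the old counter still produces ids
  // that cannot collide with the ones handed out before the restart.
  public synchronized long nextContainerId() {
    return (epoch << 40) | nextSequenceNumber++;
  }
}
{code}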



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047026#comment-14047026
 ] 

Hadoop QA commented on YARN-2052:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653034/YARN-2052.12.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4131//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4131//console

This message is automatically generated.

> ContainerId creation after work preserving restart is broken
> 
>
> Key: YARN-2052
> URL: https://issues.apache.org/jira/browse/YARN-2052
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2052.1.patch, YARN-2052.10.patch, 
> YARN-2052.11.patch, YARN-2052.12.patch, YARN-2052.2.patch, YARN-2052.3.patch, 
> YARN-2052.4.patch, YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, 
> YARN-2052.8.patch, YARN-2052.9.patch, YARN-2052.9.patch
>
>
> Container ids are made unique by using the app identifier and appending a 
> monotonically increasing sequence number to it. Since container creation is a 
> high churn activity the RM does not store the sequence number per app. So 
> after restart it does not know what the new sequence number should be for new 
> allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-28 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047020#comment-14047020
 ] 

Jian He commented on YARN-2052:
---

Looks good, pending Jenkins.

> ContainerId creation after work preserving restart is broken
> 
>
> Key: YARN-2052
> URL: https://issues.apache.org/jira/browse/YARN-2052
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2052.1.patch, YARN-2052.10.patch, 
> YARN-2052.11.patch, YARN-2052.12.patch, YARN-2052.2.patch, YARN-2052.3.patch, 
> YARN-2052.4.patch, YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, 
> YARN-2052.8.patch, YARN-2052.9.patch, YARN-2052.9.patch
>
>
> Container ids are made unique by using the app identifier and appending a 
> monotonically increasing sequence number to it. Since container creation is a 
> high churn activity the RM does not store the sequence number per app. So 
> after restart it does not know what the new sequence number should be for new 
> allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-28 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2052:
-

Attachment: YARN-2052.12.patch

Sounds good. Updated the patch:

* Made EpochProto#epoch int64.
* Updated EpochPBImpl#getEpoch to return the lower 32 bits.
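
A minimal sketch of what returning the lower 32 bits of an int64 epoch could look like; the class and signature below are assumptions for illustration, not the contents of the attached patch.

{code}
// Illustrative only. Assumes a protobuf-backed record whose "epoch" field is
// now int64 (a Java long), mirroring the change described above.
public class EpochSketch {
  private final long protoEpoch;   // stand-in for EpochProto.getEpoch()

  public EpochSketch(long protoEpoch) {
    this.protoEpoch = protoEpoch;
  }

  // Expose only the lower 32 bits of the stored 64-bit epoch, so callers that
  // expect an int-sized value stay within range even as the stored epoch grows.
  public long getEpoch() {
    return protoEpoch & 0xffffffffL;
  }
}
{code}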

> ContainerId creation after work preserving restart is broken
> 
>
> Key: YARN-2052
> URL: https://issues.apache.org/jira/browse/YARN-2052
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2052.1.patch, YARN-2052.10.patch, 
> YARN-2052.11.patch, YARN-2052.12.patch, YARN-2052.2.patch, YARN-2052.3.patch, 
> YARN-2052.4.patch, YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, 
> YARN-2052.8.patch, YARN-2052.9.patch, YARN-2052.9.patch
>
>
> Container ids are made unique by using the app identifier and appending a 
> monotonically increasing sequence number to it. Since container creation is a 
> high churn activity the RM does not store the sequence number per app. So 
> after restart it does not know what the new sequence number should be for new 
> allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2223) NPE on ResourceManager recover

2014-06-28 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047017#comment-14047017
 ] 

Jian He commented on YARN-2223:
---

Looks like some attempt data is missing. Can you find out which attempt files 
are under the state-store directory for application_1398453545406_0001?

> NPE on ResourceManager recover
> --
>
> Key: YARN-2223
> URL: https://issues.apache.org/jira/browse/YARN-2223
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.4.1
> Environment: JDK 8u5
>Reporter: Jon Bringhurst
>
> I upgraded two clusters from tag 2.2.0 to branch-2.4.1 (latest commit is 
> https://github.com/apache/hadoop-common/commit/c96c8e45a60651b677a1de338b7856a444dc0461).
> Both clusters have the same config (other than hostnames). Both are running 
> on JDK8u5 (I'm not sure if this is a factor here).
> One cluster started up without any errors. The other started up with the 
> following error on the RM:
> {noformat}
> 18:33:45,463  WARN RMAppImpl:331 - The specific max attempts: 0 for 
> application: 1 is invalid, because it is out of the range [1, 50]. Use the 
> global max attempts instead.
> 18:33:45,465  INFO RMAppImpl:651 - Recovering app: 
> application_1398450350082_0001 with 8 attempts and final state = KILLED
> 18:33:45,468  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0001_01 with final state: KILLED
> 18:33:45,478  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0001_02 with final state: FAILED
> 18:33:45,478  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0001_03 with final state: FAILED
> 18:33:45,479  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0001_04 with final state: FAILED
> 18:33:45,479  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0001_05 with final state: FAILED
> 18:33:45,480  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0001_06 with final state: FAILED
> 18:33:45,480  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0001_07 with final state: FAILED
> 18:33:45,481  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0001_08 with final state: FAILED
> 18:33:45,482  INFO RMAppAttemptImpl:659 - 
> appattempt_1398450350082_0001_01 State change from NEW to KILLED
> 18:33:45,482  INFO RMAppAttemptImpl:659 - 
> appattempt_1398450350082_0001_02 State change from NEW to FAILED
> 18:33:45,482  INFO RMAppAttemptImpl:659 - 
> appattempt_1398450350082_0001_03 State change from NEW to FAILED
> 18:33:45,482  INFO RMAppAttemptImpl:659 - 
> appattempt_1398450350082_0001_04 State change from NEW to FAILED
> 18:33:45,483  INFO RMAppAttemptImpl:659 - 
> appattempt_1398450350082_0001_05 State change from NEW to FAILED
> 18:33:45,483  INFO RMAppAttemptImpl:659 - 
> appattempt_1398450350082_0001_06 State change from NEW to FAILED
> 18:33:45,483  INFO RMAppAttemptImpl:659 - 
> appattempt_1398450350082_0001_07 State change from NEW to FAILED
> 18:33:45,483  INFO RMAppAttemptImpl:659 - 
> appattempt_1398450350082_0001_08 State change from NEW to FAILED
> 18:33:45,485  INFO RMAppImpl:639 - application_1398450350082_0001 State 
> change from NEW to KILLED
> 18:33:45,485  WARN RMAppImpl:331 - The specific max attempts: 0 for 
> application: 2 is invalid, because it is out of the range [1, 50]. Use the 
> global max attempts instead.
> 18:33:45,485  INFO RMAppImpl:651 - Recovering app: 
> application_1398450350082_0002 with 8 attempts and final state = KILLED
> 18:33:45,486  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0002_01 with final state: KILLED
> 18:33:45,486  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0002_02 with final state: FAILED
> 18:33:45,487  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0002_03 with final state: FAILED
> 18:33:45,487  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0002_04 with final state: FAILED
> 18:33:45,488  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0002_05 with final state: FAILED
> 18:33:45,488  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0002_06 with final state: FAILED
> 18:33:45,489  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0002_07 with final state: FAILED
> 18:33:45,489  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0002_08 with final state: FAILED
> 18:33:45,490  INFO RMAppAttemptImpl:659 - 
> appattempt_1398450350082_0002_01 State change from NEW to KILLED
> 18:33:45,490 

[jira] [Commented] (YARN-614) Separate AM failures from hardware failure or YARN error and do not count them to AM retry count

2014-06-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047006#comment-14047006
 ] 

Hudson commented on YARN-614:
-

SUCCESS: Integrated in Hadoop-trunk-Commit #5797 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5797/])
YARN-614. Changed ResourceManager to not count disk failure, node loss and RM 
restart towards app failures. Contributed by Xuan Gong (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1606407)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttempt.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java


> Separate AM failures from hardware failure or YARN error and do not count 
> them to AM retry count
> 
>
> Key: YARN-614
> URL: https://issues.apache.org/jira/browse/YARN-614
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bikas Saha
>Assignee: Xuan Gong
> Fix For: 2.5.0
>
> Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, 
> YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch, 
> YARN-614.10.patch, YARN-614.11.patch, YARN-614.12.patch, YARN-614.13.patch, 
> YARN-614.7.patch, YARN-614.8.patch, YARN-614.9.patch
>
>
> Attempts can fail due to a large number of user errors and they should not be 
> retried unnecessarily. The only reason YARN should retry an attempt is when 
> the hardware fails or YARN has an error. NM failing, lost NM and NM disk 
> errors are the hardware errors that come to mind.
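
For illustration, a minimal sketch of the kind of distinction this issue asks for, using constants from ContainerExitStatus; which exit statuses the committed patch actually excludes may differ.

{code}
import org.apache.hadoop.yarn.api.records.ContainerExitStatus;

// Illustrative sketch only (the committed logic lives in the RM): exits caused
// by the platform (preemption, node loss, disk failure) do not count towards
// the AM retry limit, while genuine AM failures still do.
public class AmRetrySketch {
  static boolean countsTowardsMaxAttempts(int amContainerExitStatus) {
    switch (amContainerExitStatus) {
      case ContainerExitStatus.PREEMPTED:
      case ContainerExitStatus.ABORTED:       // e.g. node lost or RM-initiated kill
      case ContainerExitStatus.DISKS_FAILED:
        return false;
      default:
        return true;
    }
  }
}
{code}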



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-614) Separate AM failures from hardware failure or YARN error and do not count them to AM retry count

2014-06-28 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047002#comment-14047002
 ] 

Jian He commented on YARN-614:
--

Committed to trunk and branch-2, thanks Xuan!

> Separate AM failures from hardware failure or YARN error and do not count 
> them to AM retry count
> 
>
> Key: YARN-614
> URL: https://issues.apache.org/jira/browse/YARN-614
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bikas Saha
>Assignee: Xuan Gong
> Fix For: 2.5.0
>
> Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, 
> YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch, 
> YARN-614.10.patch, YARN-614.11.patch, YARN-614.12.patch, YARN-614.13.patch, 
> YARN-614.7.patch, YARN-614.8.patch, YARN-614.9.patch
>
>
> Attempts can fail due to a large number of user errors and they should not be 
> retried unnecessarily. The only reason YARN should retry an attempt is when 
> the hardware fails or YARN has an error. NM failing, lost NM and NM disk 
> errors are the hardware errors that come to mind.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-28 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047000#comment-14047000
 ] 

Jian He commented on YARN-2052:
---

Found that maybe we can change epochProto to use int64 as well. For now, 32 bits 
should be enough, but we never know when we will need 64 in the future, just like 
with container IDs.

> ContainerId creation after work preserving restart is broken
> 
>
> Key: YARN-2052
> URL: https://issues.apache.org/jira/browse/YARN-2052
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2052.1.patch, YARN-2052.10.patch, 
> YARN-2052.11.patch, YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch, 
> YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, YARN-2052.8.patch, 
> YARN-2052.9.patch, YARN-2052.9.patch
>
>
> Container ids are made unique by using the app identifier and appending a 
> monotonically increasing sequence number to it. Since container creation is a 
> high churn activity the RM does not store the sequence number per app. So 
> after restart it does not know what the new sequence number should be for new 
> allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code)

2014-06-28 Thread Adam Kawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Kawa updated YARN-2230:


Component/s: scheduler
 client

> Fix description of yarn.scheduler.maximum-allocation-vcores in 
> yarn-default.xml (or code)
> -
>
> Key: YARN-2230
> URL: https://issues.apache.org/jira/browse/YARN-2230
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client, scheduler
>Affects Versions: 2.4.0
>Reporter: Adam Kawa
>Priority: Minor
>
> When a user requests more vcores than the allocation limit (e.g. 
> mapreduce.map.cpu.vcores  is larger than 
> yarn.scheduler.maximum-allocation-vcores), then 
> InvalidResourceRequestException is thrown - 
> https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
> {code}
> if (resReq.getCapability().getVirtualCores() < 0 ||
> resReq.getCapability().getVirtualCores() >
> maximumResource.getVirtualCores()) {
>   throw new InvalidResourceRequestException("Invalid resource request"
>   + ", requested virtual cores < 0"
>   + ", or requested virtual cores > max configured"
>   + ", requestedVirtualCores="
>   + resReq.getCapability().getVirtualCores()
>   + ", maxVirtualCores=" + maximumResource.getVirtualCores());
> }
> {code}
> According to documentation - yarn-default.xml 
> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml,
>  the request should be capped to the allocation limit.
> {code}
>   
> The maximum allocation for every container request at the RM,
> in terms of virtual CPU cores. Requests higher than this won't take 
> effect,
> and will get capped to this value.
> yarn.scheduler.maximum-allocation-vcores
> 32
>   
> {code}
> This means that:
> * Either documentation or code should be corrected (unless this exception is 
> handled appropriately elsewhere, but it looks like it is not).
> This behavior is confusing, because when such a job (with 
> mapreduce.map.cpu.vcores is larger than 
> yarn.scheduler.maximum-allocation-vcores) is submitted, it does not make any 
> progress. The warnings/exceptions are thrown at the scheduler (RM) side e.g.
> {code}
> 2014-06-29 00:34:51,469 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
> Invalid resource ask by application appattempt_1403993411503_0002_01
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested virtual cores < 0, or requested virtual cores > 
> max configured, requestedVirtualCores=32, maxVirtualCores=3
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420)
> .
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:416)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
> {code}
> * IMHO, such an exception should be forwarded to the client. Otherwise, it is 
> not obvious why a job does not make any progress.
> The same looks to be related to memory.
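
For illustration only, a hedged sketch of the "cap instead of reject" behaviour that the quoted yarn-default.xml description promises; it mirrors the types used in the quoted snippet and is not a proposed patch.

{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;
import org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException;

// Hypothetical variant of the quoted validation: negative asks are still
// rejected, but asks above the configured maximum are silently capped, which
// is what the current documentation text describes.
public class VcoreCapSketch {
  static void capVirtualCores(ResourceRequest resReq, Resource maximumResource)
      throws InvalidResourceRequestException {
    int requested = resReq.getCapability().getVirtualCores();
    if (requested < 0) {
      throw new InvalidResourceRequestException("Invalid resource request"
          + ", requested virtual cores < 0"
          + ", requestedVirtualCores=" + requested);
    }
    int max = maximumResource.getVirtualCores();
    if (requested > max) {
      // Cap to the configured maximum instead of failing the allocate call.
      resReq.getCapability().setVirtualCores(max);
    }
  }
}
{code}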



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code)

2014-06-28 Thread Adam Kawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Kawa updated YARN-2230:


Affects Version/s: 2.4.0

> Fix description of yarn.scheduler.maximum-allocation-vcores in 
> yarn-default.xml (or code)
> -
>
> Key: YARN-2230
> URL: https://issues.apache.org/jira/browse/YARN-2230
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Adam Kawa
>Priority: Minor
>
> When a user requests more vcores than the allocation limit (e.g. 
> mapreduce.map.cpu.vcores  is larger than 
> yarn.scheduler.maximum-allocation-vcores), then 
> InvalidResourceRequestException is thrown - 
> https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
> {code}
> if (resReq.getCapability().getVirtualCores() < 0 ||
> resReq.getCapability().getVirtualCores() >
> maximumResource.getVirtualCores()) {
>   throw new InvalidResourceRequestException("Invalid resource request"
>   + ", requested virtual cores < 0"
>   + ", or requested virtual cores > max configured"
>   + ", requestedVirtualCores="
>   + resReq.getCapability().getVirtualCores()
>   + ", maxVirtualCores=" + maximumResource.getVirtualCores());
> }
> {code}
> According to documentation - yarn-default.xml 
> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml,
>  the request should be capped to the allocation limit.
> {code}
>   
> The maximum allocation for every container request at the RM,
> in terms of virtual CPU cores. Requests higher than this won't take 
> effect,
> and will get capped to this value.
> yarn.scheduler.maximum-allocation-vcores
> 32
>   
> {code}
> This means that:
> * Either documentation or code should be corrected (unless this exception is 
> handled appropriately elsewhere, but it looks like it is not).
> This behavior is confusing, because when such a job (with 
> mapreduce.map.cpu.vcores is larger than 
> yarn.scheduler.maximum-allocation-vcores) is submitted, it does not make any 
> progress. The warnings/exceptions are thrown at the scheduler (RM) side e.g.
> {code}
> 2014-06-29 00:34:51,469 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
> Invalid resource ask by application appattempt_1403993411503_0002_01
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested virtual cores < 0, or requested virtual cores > 
> max configured, requestedVirtualCores=32, maxVirtualCores=3
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420)
> .
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:416)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
> {code}
> * IMHO, such an exception should be forwarded to the client. Otherwise, it is 
> not obvious why a job does not make any progress.
> The same looks to be related to memory.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code)

2014-06-28 Thread Adam Kawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Kawa updated YARN-2230:


Description: 
When a user requests more vcores than the allocation limit (e.g. 
mapreduce.map.cpu.vcores  is larger than 
yarn.scheduler.maximum-allocation-vcores), then InvalidResourceRequestException 
is thrown - 
https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
{code}
if (resReq.getCapability().getVirtualCores() < 0 ||
resReq.getCapability().getVirtualCores() >
maximumResource.getVirtualCores()) {
  throw new InvalidResourceRequestException("Invalid resource request"
  + ", requested virtual cores < 0"
  + ", or requested virtual cores > max configured"
  + ", requestedVirtualCores="
  + resReq.getCapability().getVirtualCores()
  + ", maxVirtualCores=" + maximumResource.getVirtualCores());
}
{code}

According to documentation - yarn-default.xml 
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml,
 the request should be capped to the allocation limit.
{code}
  
The maximum allocation for every container request at the RM,
in terms of virtual CPU cores. Requests higher than this won't take effect,
and will get capped to this value.
yarn.scheduler.maximum-allocation-vcores
32
  
{code}

This means that:
* Either documentation or code should be corrected (unless this exception is 
handled appropriately elsewhere, but it looks like it is not).

This behavior is confusing, because when such a job (with 
mapreduce.map.cpu.vcores is larger than 
yarn.scheduler.maximum-allocation-vcores) is submitted, it does not make any 
progress. The warnings/exceptions are thrown at the scheduler (RM) side e.g.
{code}
2014-06-29 00:34:51,469 WARN 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Invalid 
resource ask by application appattempt_1403993411503_0002_01
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
resource request, requested virtual cores < 0, or requested virtual cores > max 
configured, requestedVirtualCores=32, maxVirtualCores=3
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80)
at 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420)
.
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
{code}

* IMHO, such an exception should be forwarded to the client. Otherwise, it is 
not obvious why a job does not make any progress.

The same looks to be related to memory.

  was:
When a user requests more vcores than the allocation limit (e.g. 
mapreduce.map.cpu.vcores  is larger than 
yarn.scheduler.maximum-allocation-vcores), then InvalidResourceRequestException 
is thrown - 
https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
{code}
if (resReq.getCapability().getVirtualCores() < 0 ||
resReq.getCapability().getVirtualCores() >
maximumResource.getVirtualCores()) {
  throw new InvalidResourceRequestException("Invalid resource request"
  + ", requested virtual cores < 0"
  + ", or requested virtual cores > max configured"
  + ", requestedVirtualCores="
  + resReq.getCapability().getVirtualCores()
  + ", maxVirtualCores=" + maximumResource.getVirtualCores());
}
{code}

According to documentation - yarn-default.xml 
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml,
 the request will be capped to the allocation limit:
{code}
  
The maximum allocation for every container request at the RM,
in terms of virtual CPU cores. Requests higher than this won't take effect,
and will get capped to this value.
yarn.scheduler.maximum-allocation-vcores
32
  
{code}

[jira] [Updated] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code)

2014-06-28 Thread Adam Kawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Kawa updated YARN-2230:


Summary: Fix description of yarn.scheduler.maximum-allocation-vcores in 
yarn-default.xml (or code)  (was: Fix description of 
yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code to show))

> Fix description of yarn.scheduler.maximum-allocation-vcores in 
> yarn-default.xml (or code)
> -
>
> Key: YARN-2230
> URL: https://issues.apache.org/jira/browse/YARN-2230
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Adam Kawa
>Priority: Minor
>
> When a user requests more vcores than the allocation limit (e.g. 
> mapreduce.map.cpu.vcores  is larger than 
> yarn.scheduler.maximum-allocation-vcores), then 
> InvalidResourceRequestException is thrown - 
> https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
> {code}
> if (resReq.getCapability().getVirtualCores() < 0 ||
> resReq.getCapability().getVirtualCores() >
> maximumResource.getVirtualCores()) {
>   throw new InvalidResourceRequestException("Invalid resource request"
>   + ", requested virtual cores < 0"
>   + ", or requested virtual cores > max configured"
>   + ", requestedVirtualCores="
>   + resReq.getCapability().getVirtualCores()
>   + ", maxVirtualCores=" + maximumResource.getVirtualCores());
> }
> {code}
> According to documentation - yarn-default.xml 
> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml,
>  the request will be capped to the allocation limit:
> {code}
>   
> The maximum allocation for every container request at the RM,
> in terms of virtual CPU cores. Requests higher than this won't take 
> effect,
> and will get capped to this value.
> yarn.scheduler.maximum-allocation-vcores
> 32
>   
> {code}
> * Either documentation or code should be corrected (unless this exception is 
> handled appropriately elsewhere, but it looks like it is not).
> This behavior is confusing, because when such a job (with 
> mapreduce.map.cpu.vcores is larger than 
> yarn.scheduler.maximum-allocation-vcores) is submitted, it does not make any 
> progress. The warnings/exceptions are thrown at the scheduler (RM) side e.g.
> {code}
> 2014-06-29 00:34:51,469 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
> Invalid resource ask by application appattempt_1403993411503_0002_01
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested virtual cores < 0, or requested virtual cores > 
> max configured, requestedVirtualCores=32, maxVirtualCores=3
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:416)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
> {code}
> * IMHO, such an exception should be forwarded to the client. Otherwise, it is 
> not obvious why a job does not make any progress.
> The same looks to be related to memory.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code to show)

2014-06-28 Thread Adam Kawa (JIRA)
Adam Kawa created YARN-2230:
---

 Summary: Fix description of 
yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code to show)
 Key: YARN-2230
 URL: https://issues.apache.org/jira/browse/YARN-2230
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Adam Kawa
Priority: Minor


When a user requests more vcores than the allocation limit (e.g. 
mapreduce.map.cpu.vcores  is larger than 
yarn.scheduler.maximum-allocation-vcores), then InvalidResourceRequestException 
is thrown - 
https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
{code}
if (resReq.getCapability().getVirtualCores() < 0 ||
resReq.getCapability().getVirtualCores() >
maximumResource.getVirtualCores()) {
  throw new InvalidResourceRequestException("Invalid resource request"
  + ", requested virtual cores < 0"
  + ", or requested virtual cores > max configured"
  + ", requestedVirtualCores="
  + resReq.getCapability().getVirtualCores()
  + ", maxVirtualCores=" + maximumResource.getVirtualCores());
}
{code}

According to documentation - yarn-default.xml 
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml,
 the request will be capped to the allocation limit:
{code}
  
The maximum allocation for every container request at the RM,
in terms of virtual CPU cores. Requests higher than this won't take effect,
and will get capped to this value.
yarn.scheduler.maximum-allocation-vcores
32
  
{code}

* Either documentation or code should be corrected (unless this exception is 
handled appropriately elsewhere, but it looks like it is not).

This behavior is confusing, because when such a job (with 
mapreduce.map.cpu.vcores is larger than 
yarn.scheduler.maximum-allocation-vcores) is submitted, it does not make any 
progress. The warnings/exceptions are thrown at the scheduler (RM) side e.g.
{code}
2014-06-29 00:34:51,469 WARN 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Invalid 
resource ask by application appattempt_1403993411503_0002_01
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
resource request, requested virtual cores < 0, or requested virtual cores > max 
configured, requestedVirtualCores=32, maxVirtualCores=3
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80)
at 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
at 
org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
{code}
* IMHO, such an exception should be forwarded to the client. Otherwise, it is 
not obvious why a job does not make any progress.

The same looks to be related to memory.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-06-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046900#comment-14046900
 ] 

Hadoop QA commented on YARN-1366:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653008/YARN-1366.6.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4130//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/4130//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4130//console

This message is automatically generated.

> AM should implement Resync with the ApplicationMasterService instead of 
> shutting down
> -
>
> Key: YARN-1366
> URL: https://issues.apache.org/jira/browse/YARN-1366
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Rohith
> Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, 
> YARN-1366.4.patch, YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.patch, 
> YARN-1366.prototype.patch, YARN-1366.prototype.patch
>
>
> The ApplicationMasterService currently sends a resync response to which the 
> AM responds by shutting down. The AM behavior is expected to change to 
> resyncing with the RM. Resync means resetting the allocate RPC sequence 
> number to 0 and resending all of the AM's outstanding requests to 
> the RM. Note that if the AM is making its first allocate call to the RM then 
> things should proceed like normal without needing a resync. The RM will 
> return all containers that have completed since the RM last synced with the 
> AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2223) NPE on ResourceManager recover

2014-06-28 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-2223:
-

Environment: JDK 8u5

> NPE on ResourceManager recover
> --
>
> Key: YARN-2223
> URL: https://issues.apache.org/jira/browse/YARN-2223
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.4.1
> Environment: JDK 8u5
>Reporter: Jon Bringhurst
>
> I upgraded two clusters from tag 2.2.0 to branch-2.4.1 (latest commit is 
> https://github.com/apache/hadoop-common/commit/c96c8e45a60651b677a1de338b7856a444dc0461).
> Both clusters have the same config (other than hostnames). Both are running 
> on JDK8u5 (I'm not sure if this is a factor here).
> One cluster started up without any errors. The other started up with the 
> following error on the RM:
> {noformat}
> 18:33:45,463  WARN RMAppImpl:331 - The specific max attempts: 0 for 
> application: 1 is invalid, because it is out of the range [1, 50]. Use the 
> global max attempts instead.
> 18:33:45,465  INFO RMAppImpl:651 - Recovering app: 
> application_1398450350082_0001 with 8 attempts and final state = KILLED
> 18:33:45,468  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0001_01 with final state: KILLED
> 18:33:45,478  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0001_02 with final state: FAILED
> 18:33:45,478  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0001_03 with final state: FAILED
> 18:33:45,479  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0001_04 with final state: FAILED
> 18:33:45,479  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0001_05 with final state: FAILED
> 18:33:45,480  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0001_06 with final state: FAILED
> 18:33:45,480  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0001_07 with final state: FAILED
> 18:33:45,481  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0001_08 with final state: FAILED
> 18:33:45,482  INFO RMAppAttemptImpl:659 - 
> appattempt_1398450350082_0001_01 State change from NEW to KILLED
> 18:33:45,482  INFO RMAppAttemptImpl:659 - 
> appattempt_1398450350082_0001_02 State change from NEW to FAILED
> 18:33:45,482  INFO RMAppAttemptImpl:659 - 
> appattempt_1398450350082_0001_03 State change from NEW to FAILED
> 18:33:45,482  INFO RMAppAttemptImpl:659 - 
> appattempt_1398450350082_0001_04 State change from NEW to FAILED
> 18:33:45,483  INFO RMAppAttemptImpl:659 - 
> appattempt_1398450350082_0001_05 State change from NEW to FAILED
> 18:33:45,483  INFO RMAppAttemptImpl:659 - 
> appattempt_1398450350082_0001_06 State change from NEW to FAILED
> 18:33:45,483  INFO RMAppAttemptImpl:659 - 
> appattempt_1398450350082_0001_07 State change from NEW to FAILED
> 18:33:45,483  INFO RMAppAttemptImpl:659 - 
> appattempt_1398450350082_0001_08 State change from NEW to FAILED
> 18:33:45,485  INFO RMAppImpl:639 - application_1398450350082_0001 State 
> change from NEW to KILLED
> 18:33:45,485  WARN RMAppImpl:331 - The specific max attempts: 0 for 
> application: 2 is invalid, because it is out of the range [1, 50]. Use the 
> global max attempts instead.
> 18:33:45,485  INFO RMAppImpl:651 - Recovering app: 
> application_1398450350082_0002 with 8 attempts and final state = KILLED
> 18:33:45,486  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0002_01 with final state: KILLED
> 18:33:45,486  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0002_02 with final state: FAILED
> 18:33:45,487  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0002_03 with final state: FAILED
> 18:33:45,487  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0002_04 with final state: FAILED
> 18:33:45,488  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0002_05 with final state: FAILED
> 18:33:45,488  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0002_06 with final state: FAILED
> 18:33:45,489  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0002_07 with final state: FAILED
> 18:33:45,489  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0002_08 with final state: FAILED
> 18:33:45,490  INFO RMAppAttemptImpl:659 - 
> appattempt_1398450350082_0002_01 State change from NEW to KILLED
> 18:33:45,490  INFO RMAppAttemptImpl:659 - 
> appattempt_1398450350082_0002_02 State change from NEW to FAILED
> 18:33:45,490  INFO RMAppAttemptImpl:659 - 
> appattempt_1398450350082_0002_0

[jira] [Updated] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-06-28 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-1366:
-

Attachment: YARN-1366.6.patch

Attached the updated patch. Please review.

> AM should implement Resync with the ApplicationMasterService instead of 
> shutting down
> -
>
> Key: YARN-1366
> URL: https://issues.apache.org/jira/browse/YARN-1366
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Rohith
> Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, 
> YARN-1366.4.patch, YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.patch, 
> YARN-1366.prototype.patch, YARN-1366.prototype.patch
>
>
> The ApplicationMasterService currently sends a resync response to which the 
> AM responds by shutting down. The AM behavior is expected to change to 
> resyncing with the RM. Resync means resetting the allocate RPC sequence 
> number to 0 and resending all of the AM's outstanding requests to 
> the RM. Note that if the AM is making its first allocate call to the RM then 
> things should proceed like normal without needing a resync. The RM will 
> return all containers that have completed since the RM last synced with the 
> AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2104) Scheduler queue filter failed to work because index of queue column changed

2014-06-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046880#comment-14046880
 ] 

Hudson commented on YARN-2104:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1815 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1815/])
YARN-2104. Scheduler queue filter failed to work because index of queue column 
changed. Contributed by Wangda Tan (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1606265)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/DefaultSchedulerPage.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/FairSchedulerPage.java


> Scheduler queue filter failed to work because index of queue column changed
> ---
>
> Key: YARN-2104
> URL: https://issues.apache.org/jira/browse/YARN-2104
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Affects Versions: 2.4.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Fix For: 3.0.0, 2.5.0
>
> Attachments: YARN-2104.patch
>
>
> YARN-563 added,
> {code}
> + th(".type", "Application Type”).
> {code}
> to the application table, which changes the queue column's index from 3 to 4. In 
> the scheduler page, the queue column index is hard-coded to 3 when filtering 
> applications by queue name,
> {code}
>   "if (q == 'root') q = '';",
>   "else q = '^' + q.substr(q.lastIndexOf('.') + 1) + '$';",
>   "$('#apps').dataTable().fnFilter(q, 3, true);",
> {code}
> So the queue filter will not work on the applications page.
> Reproduce steps (thanks to Bo Yang for pointing this out):
> {code}
> 1) In default setup, there’s a default queue under root queue
> 2) Run an arbitrary application, you can find it in “Applications” page
> 3) Click “Default” queue in scheduler page
> 4) Click “Applications”, no application will show here
> 5) Click “Root” queue in scheduler page
> 6) Click “Applications”, application will show again
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2201) TestRMWebServicesAppsModification dependent on yarn-default.xml

2014-06-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046882#comment-14046882
 ] 

Hudson commented on YARN-2201:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1815 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1815/])
YARN-2201. Made TestRMWebServicesAppsModification be independent of the changes 
on yarn-default.xml. Contributed by Varun Vasudev. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1606285)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java


> TestRMWebServicesAppsModification dependent on yarn-default.xml
> ---
>
> Key: YARN-2201
> URL: https://issues.apache.org/jira/browse/YARN-2201
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ray Chiang
>Assignee: Varun Vasudev
>  Labels: test
> Fix For: 2.5.0
>
> Attachments: apache-yarn-2201.0.patch, apache-yarn-2201.1.patch, 
> apache-yarn-2201.2.patch, apache-yarn-2201.3.patch
>
>
> TestRMWebServicesAppsModification.java has some errors that are 
> yarn-default.xml dependent.  By changing yarn-default.xml properties, I'm 
> seeing the following errors:
> 1) Changing yarn.resourcemanager.scheduler.class from 
> capacity.CapacityScheduler to fair.FairScheduler gives the error:
> Running 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
> Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 79.047 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
> testSingleAppKillUnauthorized[1](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
>   Time elapsed: 3.22 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458)
> 2) Changing yarn.acl.enable from false to true results in the following 
> errors:
> Running 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
> Tests run: 10, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 49.044 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
> testSingleAppKill[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
>   Time elapsed: 2.986 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKill(TestRMWebServicesAppsModification.java:287)
> testSingleAppKillInvalidState[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
>   Time elapsed: 2.258 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillInvalidState(TestRMWebServicesAppsModification.java:369)
> testSingleAppKillUnauthorized[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
>   Time elapsed: 2.263 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458)
> testSingleAppKillInvalidId[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
>   Time elapsed: 0.214 sec  <<< FAILURE!
> java.

[jira] [Commented] (YARN-2204) TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler

2014-06-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046883#comment-14046883
 ] 

Hudson commented on YARN-2204:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1815 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1815/])
YARN-2204. Addendum patch. TestAMRestart#testAMRestartWithExistingContainers 
assumes CapacityScheduler. (Robert Kanter via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1606168)
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java


> TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler
> ---
>
> Key: YARN-2204
> URL: https://issues.apache.org/jira/browse/YARN-2204
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
>Priority: Trivial
> Fix For: 2.5.0
>
> Attachments: YARN-2204.patch, YARN-2204_addendum.patch, 
> YARN-2204_addendum.patch
>
>
> TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2201) TestRMWebServicesAppsModification dependent on yarn-default.xml

2014-06-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046869#comment-14046869
 ] 

Hudson commented on YARN-2201:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1788 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1788/])
YARN-2201. Made TestRMWebServicesAppsModification be independent of the changes 
on yarn-default.xml. Contributed by Varun Vasudev. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1606285)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java


> TestRMWebServicesAppsModification dependent on yarn-default.xml
> ---
>
> Key: YARN-2201
> URL: https://issues.apache.org/jira/browse/YARN-2201
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ray Chiang
>Assignee: Varun Vasudev
>  Labels: test
> Fix For: 2.5.0
>
> Attachments: apache-yarn-2201.0.patch, apache-yarn-2201.1.patch, 
> apache-yarn-2201.2.patch, apache-yarn-2201.3.patch
>
>
> TestRMWebServicesAppsModification.java has some errors that are 
> yarn-default.xml dependent.  By changing yarn-default.xml properties, I'm 
> seeing the following errors:
> 1) Changing yarn.resourcemanager.scheduler.class from 
> capacity.CapacityScheduler to fair.FairScheduler gives the error:
> Running 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
> Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 79.047 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
> testSingleAppKillUnauthorized[1](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
>   Time elapsed: 3.22 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458)
> 2) Changing yarn.acl.enable from false to true results in the following 
> errors:
> Running 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
> Tests run: 10, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 49.044 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
> testSingleAppKill[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
>   Time elapsed: 2.986 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKill(TestRMWebServicesAppsModification.java:287)
> testSingleAppKillInvalidState[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
>   Time elapsed: 2.258 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillInvalidState(TestRMWebServicesAppsModification.java:369)
> testSingleAppKillUnauthorized[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
>   Time elapsed: 2.263 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458)
> testSingleAppKillInvalidId[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
>   Time elapsed: 0.214 sec  <<< FAILURE!
> java.lang.Asser

[jira] [Commented] (YARN-2104) Scheduler queue filter failed to work because index of queue column changed

2014-06-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046867#comment-14046867
 ] 

Hudson commented on YARN-2104:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1788 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1788/])
YARN-2104. Scheduler queue filter failed to work because index of queue column 
changed. Contributed by Wangda Tan (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1606265)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/DefaultSchedulerPage.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/FairSchedulerPage.java


> Scheduler queue filter failed to work because index of queue column changed
> ---
>
> Key: YARN-2104
> URL: https://issues.apache.org/jira/browse/YARN-2104
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Affects Versions: 2.4.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Fix For: 3.0.0, 2.5.0
>
> Attachments: YARN-2104.patch
>
>
> YARN-563 added,
> {code}
> + th(".type", "Application Type").
> {code}
> to the application table, which moves the queue column's index from 3 to 4. But on 
> the scheduler page, the queue column index is hard-coded to 3 when filtering 
> applications by queue name,
> {code}
>   "if (q == 'root') q = '';",
>   "else q = '^' + q.substr(q.lastIndexOf('.') + 1) + '$';",
>   "$('#apps').dataTable().fnFilter(q, 3, true);",
> {code}
> So the queue filter does not work on the applications page.
> Reproduction steps (thanks to Bo Yang for pointing this out):
> {code}
> 1) In default setup, there’s a default queue under root queue
> 2) Run an arbitrary application, you can find it in “Applications” page
> 3) Click “Default” queue in scheduler page
> 4) Click “Applications”, no application will show here
> 5) Click “Root” queue in scheduler page
> 6) Click “Applications”, application will show again
> {code}
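For context, a hedged sketch of the kind of fix this implies: after YARN-563 the queue column sits at index 4, so the filter script generated by the scheduler pages (CapacitySchedulerPage, FairSchedulerPage, DefaultSchedulerPage, all touched by the commit above) has to target that index. This is an illustrative excerpt in the same style as the snippet quoted above, not the actual committed diff:

{code}
  // Hypothetical excerpt: the queue filter must use the queue column's new
  // index (4) now that the "Application Type" column precedes it.
  "if (q == 'root') q = '';",
  "else q = '^' + q.substr(q.lastIndexOf('.') + 1) + '$';",
  "$('#apps').dataTable().fnFilter(q, 4, true);",
{code}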



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2204) TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler

2014-06-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046871#comment-14046871
 ] 

Hudson commented on YARN-2204:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1788 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1788/])
YARN-2204. Addendum patch. TestAMRestart#testAMRestartWithExistingContainers 
assumes CapacityScheduler. (Robert Kanter via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1606168)
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java


> TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler
> ---
>
> Key: YARN-2204
> URL: https://issues.apache.org/jira/browse/YARN-2204
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
>Priority: Trivial
> Fix For: 2.5.0
>
> Attachments: YARN-2204.patch, YARN-2204_addendum.patch, 
> YARN-2204_addendum.patch
>
>
> TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor

2014-06-28 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046853#comment-14046853
 ] 

Remus Rusanu commented on YARN-2198:


I got this working using LPC, but there are some complications around 
stdout/stderr. With a helper service the NodeManager no longer gets free access 
to the task's stdout/stderr. Two solutions exist (a sketch of the second follows 
below):
 - read stdout/stderr from the helper and pump them over the LPC interface back 
to the NM
 - explicitly set a .out and a .err file for the task and use them as 
stdout/stderr for the container launch.
Note that the problem applies to the localizer launch too, which does not have a 
stdout/stderr redirect in its launch script.

Another complication is the Windows job model of NM/winutils. winutils creates a 
job for the container and joins the job itself, ensuring a controlled lifespan 
for all processes launched by the task. The helper service cannot join the job, 
as it has its own, independent lifespan. I solved this by having the helper 
service launch "winutils task createAsUser ..." as an ordinary CreateProcess 
call in the LPC server routine, rather than attempting the S4U impersonation in 
the helper service process itself. This works fine and also greatly reduces the 
risk of leaking handles, since the heavy work (i.e. the leak risk) happens in a 
sub-process, not in the service.

I will have to investigate whether there is any known issue with a very long 
LPC call (winutils waits for the spawned processes to finish). If there is, the 
solution would be for the helper service to hand the spawned task over to the NM 
(duplicating the process handle into the NM, yuck) and have the NM JNI (the LPC 
client) do the actual process handle wait, i.e. the blocking wait for the task 
to finish. This would keep the LPC call short (spawn process, duplicate handle, 
return handle to the NM) at the cost of some added complications. It would also 
make the whole stdout/stderr transfer even more cumbersome if we opt for pipes 
rather than .out/.err files (opened by the helper process, duplicated into the 
NM, with the NM reading the handles...).
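For the second option above (explicit .out/.err files), a minimal sketch of the redirection idea in plain Java. This is only an illustration under the assumption that the helper launches the task as an ordinary child process; the real launch path goes through winutils and JNI, so none of these names correspond to actual executor code:

{code}
import java.io.File;
import java.io.IOException;

// Hypothetical illustration: launch the task with stdout/stderr redirected to
// explicit files in the work directory, so the NM can read them afterwards
// without holding the live pipe handles.
static int launchWithExplicitLogs(File workDir, String... cmd)
    throws IOException, InterruptedException {
  ProcessBuilder pb = new ProcessBuilder(cmd);
  pb.directory(workDir);
  pb.redirectOutput(new File(workDir, "stdout.out"));
  pb.redirectError(new File(workDir, "stderr.err"));
  return pb.start().waitFor();   // blocking wait for the task to finish
}
{code}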

> Remove the need to run NodeManager as privileged account for Windows Secure 
> Container Executor
> --
>
> Key: YARN-2198
> URL: https://issues.apache.org/jira/browse/YARN-2198
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>  Labels: security, windows
>
> YARN-1972 introduces a Secure Windows Container Executor. However, this 
> executor requires the process launching the container to be LocalSystem or 
> a member of the local Administrators group. Since the process in question 
> is the NodeManager, the requirement means the entire NM must run as a 
> privileged account, a very large surface area to review and protect.
> The proposal is to move the privileged operations into a dedicated NT 
> service. The NM can run as a low-privilege account and communicate with the 
> privileged NT service when it needs to launch a container. This would reduce 
> the surface exposed to the high privileges. 
> There has to exist a secure, authenticated and authorized channel of 
> communication between the NM and the privileged NT service. Possible 
> alternatives are a new TCP endpoint, Java RPC etc. My proposal though would 
> be to use Windows LPC (Local Procedure Calls), which is a Windows platform 
> specific inter-process communication channel that satisfies all requirements 
> and is easy to deploy. The privileged NT service would register and listen on 
> an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop 
> with libwinutils which would host the LPC client code. The client would 
> connect to the LPC port (NtConnectPort) and send a message requesting a 
> container launch (NtRequestWaitReplyPort). LPC provides authentication and 
> the privileged NT service can use authorization API (AuthZ) to validate the 
> caller.
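To make the shape of the proposal concrete, here is a purely hypothetical sketch of the NM-side JNI surface it implies; none of these names exist in libwinutils today, and the LPC details would live entirely on the native side:

{code}
// Purely hypothetical sketch of the Java/JNI boundary the proposal describes.
public final class WinutilsLpcClient {
  static {
    // Assumption: the LPC client code is hosted in libwinutils, as proposed above.
    System.loadLibrary("winutils");
  }

  // The native side would connect to the privileged service's LPC port
  // (NtConnectPort) and send a container-launch request (NtRequestWaitReplyPort).
  public static native int launchContainerAsUser(String user,
                                                  String launchCommand,
                                                  String workDir);
}
{code}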



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-06-28 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046826#comment-14046826
 ] 

Rohith commented on YARN-1366:
--

Looking into fixing the findbugs warning and the test case. Will update the patch once 
that is done.

> AM should implement Resync with the ApplicationMasterService instead of 
> shutting down
> -
>
> Key: YARN-1366
> URL: https://issues.apache.org/jira/browse/YARN-1366
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Rohith
> Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, 
> YARN-1366.4.patch, YARN-1366.5.patch, YARN-1366.patch, 
> YARN-1366.prototype.patch, YARN-1366.prototype.patch
>
>
> The ApplicationMasterService currently sends a resync response to which the 
> AM responds by shutting down. The AM behavior is expected to change to 
> calling resyncing with the RM. Resync means resetting the allocate RPC 
> sequence number to 0 and the AM should send its entire outstanding request to 
> the RM. Note that if the AM is making its first allocate call to the RM then 
> things should proceed like normal without needing a resync. The RM will 
> return all containers that have completed since the RM last synced with the 
> AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2201) TestRMWebServicesAppsModification dependent on yarn-default.xml

2014-06-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046817#comment-14046817
 ] 

Hudson commented on YARN-2201:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #597 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/597/])
YARN-2201. Made TestRMWebServicesAppsModification be independent of the changes 
on yarn-default.xml. Contributed by Varun Vasudev. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1606285)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java


> TestRMWebServicesAppsModification dependent on yarn-default.xml
> ---
>
> Key: YARN-2201
> URL: https://issues.apache.org/jira/browse/YARN-2201
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ray Chiang
>Assignee: Varun Vasudev
>  Labels: test
> Fix For: 2.5.0
>
> Attachments: apache-yarn-2201.0.patch, apache-yarn-2201.1.patch, 
> apache-yarn-2201.2.patch, apache-yarn-2201.3.patch
>
>
> TestRMWebServicesAppsModification.java has some errors that are 
> yarn-default.xml dependent.  By changing yarn-default.xml properties, I'm 
> seeing the following errors:
> 1) Changing yarn.resourcemanager.scheduler.class from 
> capacity.CapacityScheduler to fair.FairScheduler gives the error:
> Running 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
> Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 79.047 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
> testSingleAppKillUnauthorized[1](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
>   Time elapsed: 3.22 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458)
> 2) Changing yarn.acl.enable from false to true results in the following 
> errors:
> Running 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
> Tests run: 10, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 49.044 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
> testSingleAppKill[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
>   Time elapsed: 2.986 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKill(TestRMWebServicesAppsModification.java:287)
> testSingleAppKillInvalidState[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
>   Time elapsed: 2.258 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillInvalidState(TestRMWebServicesAppsModification.java:369)
> testSingleAppKillUnauthorized[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
>   Time elapsed: 2.263 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458)
> testSingleAppKillInvalidId[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
>   Time elapsed: 0.214 sec  <<< FAILURE!
> java.lang.Asserti

[jira] [Commented] (YARN-2204) TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler

2014-06-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046819#comment-14046819
 ] 

Hudson commented on YARN-2204:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #597 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/597/])
YARN-2204. Addendum patch. TestAMRestart#testAMRestartWithExistingContainers 
assumes CapacityScheduler. (Robert Kanter via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1606168)
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java


> TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler
> ---
>
> Key: YARN-2204
> URL: https://issues.apache.org/jira/browse/YARN-2204
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
>Priority: Trivial
> Fix For: 2.5.0
>
> Attachments: YARN-2204.patch, YARN-2204_addendum.patch, 
> YARN-2204_addendum.patch
>
>
> TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2104) Scheduler queue filter failed to work because index of queue column changed

2014-06-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046815#comment-14046815
 ] 

Hudson commented on YARN-2104:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #597 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/597/])
YARN-2104. Scheduler queue filter failed to work because index of queue column 
changed. Contributed by Wangda Tan (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1606265)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/DefaultSchedulerPage.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/FairSchedulerPage.java


> Scheduler queue filter failed to work because index of queue column changed
> ---
>
> Key: YARN-2104
> URL: https://issues.apache.org/jira/browse/YARN-2104
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Affects Versions: 2.4.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Fix For: 3.0.0, 2.5.0
>
> Attachments: YARN-2104.patch
>
>
> YARN-563 added,
> {code}
> + th(".type", "Application Type").
> {code}
> to the application table, which moves the queue column's index from 3 to 4. But on 
> the scheduler page, the queue column index is hard-coded to 3 when filtering 
> applications by queue name,
> {code}
>   "if (q == 'root') q = '';",
>   "else q = '^' + q.substr(q.lastIndexOf('.') + 1) + '$';",
>   "$('#apps').dataTable().fnFilter(q, 3, true);",
> {code}
> So the queue filter does not work on the applications page.
> Reproduction steps (thanks to Bo Yang for pointing this out):
> {code}
> 1) In default setup, there’s a default queue under root queue
> 2) Run an arbitrary application, you can find it in “Applications” page
> 3) Click “Default” queue in scheduler page
> 4) Click “Applications”, no application will show here
> 5) Click “Root” queue in scheduler page
> 6) Click “Applications”, application will show again
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)