[jira] [Updated] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down
     [ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohith updated YARN-1366:
-------------------------
    Attachment: YARN-1366.7.patch

> AM should implement Resync with the ApplicationMasterService instead of
> shutting down
> -----------------------------------------------------------------------
>
>                 Key: YARN-1366
>                 URL: https://issues.apache.org/jira/browse/YARN-1366
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Rohith
>         Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch,
> YARN-1366.4.patch, YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.7.patch,
> YARN-1366.patch, YARN-1366.prototype.patch, YARN-1366.prototype.patch
>
> The ApplicationMasterService currently sends a resync response, to which the
> AM responds by shutting down. The AM behavior is expected to change to
> resyncing with the RM instead. Resync means resetting the allocate RPC
> sequence number to 0, after which the AM should send its entire outstanding
> request set to the RM. Note that if the AM is making its first allocate call
> to the RM, things should proceed as normal without needing a resync. The RM
> will return all containers that have completed since it last synced with the
> AM, so some container completions may be reported more than once.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
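The resync contract described in the issue (reset the allocate sequence number to 0, re-send every outstanding request, tolerate duplicate completion reports) can be sketched roughly as follows. This is a hypothetical, simplified model, not the actual AMRMClient or ApplicationMasterService code; the class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the AM-side resync behavior described above.
// Not the real YARN AMRMClient implementation.
public class ResyncSketch {
    private int responseId = 0;                                  // allocate RPC sequence number
    private final List<String> outstanding = new ArrayList<>();  // pending resource requests
    private final Set<String> completedSeen = new HashSet<>();   // dedup completion reports

    public void addRequest(String ask) { outstanding.add(ask); }

    public void onAllocateResponse() { responseId++; }           // normal allocate heartbeat

    public int getResponseId() { return responseId; }

    /** On resync: reset the sequence number to 0 and re-send every outstanding request. */
    public List<String> onResync() {
        responseId = 0;
        return new ArrayList<>(outstanding);
    }

    /**
     * The RM may re-report completions after a resync; returns false for a
     * completion the AM has already processed, so duplicates are ignored.
     */
    public boolean recordCompletion(String containerId) {
        return completedSeen.add(containerId);
    }
}
```

Deduplicating completion reports on the AM side is what makes the "some container completions may be reported more than once" part of the contract safe.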
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
     [ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047031#comment-14047031 ]

Tsuyoshi OZAWA commented on YARN-2052:
--------------------------------------

Thank you for the review, Jian. The test failure is not related.

> ContainerId creation after work preserving restart is broken
> -------------------------------------------------------------
>
>                 Key: YARN-2052
>                 URL: https://issues.apache.org/jira/browse/YARN-2052
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Tsuyoshi OZAWA
>            Assignee: Tsuyoshi OZAWA
>         Attachments: YARN-2052.1.patch, YARN-2052.10.patch,
> YARN-2052.11.patch, YARN-2052.12.patch, YARN-2052.2.patch, YARN-2052.3.patch,
> YARN-2052.4.patch, YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch,
> YARN-2052.8.patch, YARN-2052.9.patch, YARN-2052.9.patch
>
> Container ids are made unique by using the app identifier and appending a
> monotonically increasing sequence number to it. Since container creation is a
> high-churn activity, the RM does not store the sequence number per app, so
> after restart it does not know what the new sequence number should be for new
> allocations.
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
     [ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047026#comment-14047026 ]

Hadoop QA commented on YARN-2052:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12653034/YARN-2052.12.patch
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

                  org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter

    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4131//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4131//console

This message is automatically generated.
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
     [ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047020#comment-14047020 ]

Jian He commented on YARN-2052:
-------------------------------

Looks good, pending Jenkins.
[jira] [Updated] (YARN-2052) ContainerId creation after work preserving restart is broken
     [ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi OZAWA updated YARN-2052:
---------------------------------
    Attachment: YARN-2052.12.patch

Sounds good. Updated the patch:
* Made EpochProto#epoch int64.
* Updated EpochPBImpl#getEpoch to return the lower 32 bits.
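The change described in the two bullets can be illustrated with a minimal sketch. This is not the actual EpochPBImpl code; it only shows the lower-32-bit truncation the update describes, under the assumption that the epoch is stored as a signed 64-bit value.

```java
// Hypothetical sketch, not the real EpochPBImpl: the proto field is int64,
// while getEpoch() exposes only the lower 32 bits of the stored value.
public class EpochSketch {
    private final long epoch;   // corresponds to the int64 EpochProto#epoch field

    public EpochSketch(long epoch) { this.epoch = epoch; }

    /** Return the lower 32 bits of the stored 64-bit epoch. */
    public int getEpoch() {
        return (int) (epoch & 0xFFFFFFFFL);
    }
}
```

Storing int64 in the proto keeps room to grow, while callers that only need a 32-bit epoch keep working unchanged.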
[jira] [Commented] (YARN-2223) NPE on ResourceManager recover
     [ https://issues.apache.org/jira/browse/YARN-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047017#comment-14047017 ]

Jian He commented on YARN-2223:
-------------------------------

Looks like some attempt data is missing. Can you find out which attempt files are under the state-store directory for application_1398453545406_0001?

> NPE on ResourceManager recover
> ------------------------------
>
>                 Key: YARN-2223
>                 URL: https://issues.apache.org/jira/browse/YARN-2223
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.4.1
>         Environment: JDK 8u5
>            Reporter: Jon Bringhurst
>
> I upgraded two clusters from tag 2.2.0 to branch-2.4.1 (latest commit is
> https://github.com/apache/hadoop-common/commit/c96c8e45a60651b677a1de338b7856a444dc0461).
> Both clusters have the same config (other than hostnames). Both are running
> on JDK8u5 (I'm not sure if this is a factor here).
> One cluster started up without any errors. The other started up with the
> following error on the RM:
> {noformat}
> 18:33:45,463 WARN  RMAppImpl:331 - The specific max attempts: 0 for application: 1 is invalid, because it is out of the range [1, 50]. Use the global max attempts instead.
> 18:33:45,465 INFO  RMAppImpl:651 - Recovering app: application_1398450350082_0001 with 8 attempts and final state = KILLED
> 18:33:45,468 INFO  RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0001_01 with final state: KILLED
> 18:33:45,478 INFO  RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0001_02 with final state: FAILED
> 18:33:45,478 INFO  RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0001_03 with final state: FAILED
> 18:33:45,479 INFO  RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0001_04 with final state: FAILED
> 18:33:45,479 INFO  RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0001_05 with final state: FAILED
> 18:33:45,480 INFO  RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0001_06 with final state: FAILED
> 18:33:45,480 INFO  RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0001_07 with final state: FAILED
> 18:33:45,481 INFO  RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0001_08 with final state: FAILED
> 18:33:45,482 INFO  RMAppAttemptImpl:659 - appattempt_1398450350082_0001_01 State change from NEW to KILLED
> 18:33:45,482 INFO  RMAppAttemptImpl:659 - appattempt_1398450350082_0001_02 State change from NEW to FAILED
> 18:33:45,482 INFO  RMAppAttemptImpl:659 - appattempt_1398450350082_0001_03 State change from NEW to FAILED
> 18:33:45,482 INFO  RMAppAttemptImpl:659 - appattempt_1398450350082_0001_04 State change from NEW to FAILED
> 18:33:45,483 INFO  RMAppAttemptImpl:659 - appattempt_1398450350082_0001_05 State change from NEW to FAILED
> 18:33:45,483 INFO  RMAppAttemptImpl:659 - appattempt_1398450350082_0001_06 State change from NEW to FAILED
> 18:33:45,483 INFO  RMAppAttemptImpl:659 - appattempt_1398450350082_0001_07 State change from NEW to FAILED
> 18:33:45,483 INFO  RMAppAttemptImpl:659 - appattempt_1398450350082_0001_08 State change from NEW to FAILED
> 18:33:45,485 INFO  RMAppImpl:639 - application_1398450350082_0001 State change from NEW to KILLED
> 18:33:45,485 WARN  RMAppImpl:331 - The specific max attempts: 0 for application: 2 is invalid, because it is out of the range [1, 50]. Use the global max attempts instead.
> 18:33:45,485 INFO  RMAppImpl:651 - Recovering app: application_1398450350082_0002 with 8 attempts and final state = KILLED
> 18:33:45,486 INFO  RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0002_01 with final state: KILLED
> 18:33:45,486 INFO  RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0002_02 with final state: FAILED
> 18:33:45,487 INFO  RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0002_03 with final state: FAILED
> 18:33:45,487 INFO  RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0002_04 with final state: FAILED
> 18:33:45,488 INFO  RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0002_05 with final state: FAILED
> 18:33:45,488 INFO  RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0002_06 with final state: FAILED
> 18:33:45,489 INFO  RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0002_07 with final state: FAILED
> 18:33:45,489 INFO  RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0002_08 with final state: FAILED
> 18:33:45,490 INFO  RMAppAttemptImpl:659 - appattempt_1398450350082_0002_01 State change from NEW to KILLED
> 18:33:45,490
[jira] [Commented] (YARN-614) Separate AM failures from hardware failure or YARN error and do not count them to AM retry count
     [ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047006#comment-14047006 ]

Hudson commented on YARN-614:
-----------------------------

SUCCESS: Integrated in Hadoop-trunk-Commit #5797 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5797/])
YARN-614. Changed ResourceManager to not count disk failure, node loss and RM restart towards app failures. Contributed by Xuan Gong (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1606407)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttempt.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java

> Separate AM failures from hardware failure or YARN error and do not count
> them to AM retry count
> --------------------------------------------------------------------------
>
>                 Key: YARN-614
>                 URL: https://issues.apache.org/jira/browse/YARN-614
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Bikas Saha
>            Assignee: Xuan Gong
>             Fix For: 2.5.0
>
>         Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch,
> YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch,
> YARN-614.10.patch, YARN-614.11.patch, YARN-614.12.patch, YARN-614.13.patch,
> YARN-614.7.patch, YARN-614.8.patch, YARN-614.9.patch
>
> Attempts can fail due to a large number of user errors and they should not be
> retried unnecessarily. The only reason YARN should retry an attempt is when
> the hardware fails or YARN has an error. NM failing, lost NM and NM disk
> errors are the hardware errors that come to mind.
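The policy the commit message describes, not counting infrastructure faults toward AM retries, can be sketched as a simple predicate. The enum values and method name here are invented for illustration; the real logic lives in RMAppImpl/RMAppAttemptImpl and is considerably more involved.

```java
// Hypothetical sketch of the retry-counting policy described above.
// Not the actual RMAppAttemptImpl code.
public class RetryPolicySketch {
    public enum ExitCause { USER_ERROR, NM_DISK_FAILED, NM_LOST, RM_RESTARTED }

    /** Only application/user failures consume an AM attempt. */
    public static boolean countsTowardMaxAttempts(ExitCause cause) {
        switch (cause) {
            case NM_DISK_FAILED:
            case NM_LOST:
            case RM_RESTARTED:
                return false;   // infrastructure fault: retry without consuming an attempt
            default:
                return true;    // user error: consume an attempt
        }
    }
}
```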
[jira] [Commented] (YARN-614) Separate AM failures from hardware failure or YARN error and do not count them to AM retry count
     [ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047002#comment-14047002 ]

Jian He commented on YARN-614:
------------------------------

Committed to trunk and branch-2, thanks Xuan!
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
     [ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047000#comment-14047000 ]

Jian He commented on YARN-2052:
-------------------------------

Found that maybe we can change EpochProto to use int64 as well. For now 32 bits should be enough, but we never know when we will need 64 in the future, just like container ids.
[jira] [Updated] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code)
     [ https://issues.apache.org/jira/browse/YARN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Kawa updated YARN-2230:
----------------------------
    Component/s: scheduler
                 client

> Fix description of yarn.scheduler.maximum-allocation-vcores in
> yarn-default.xml (or code)
> ---------------------------------------------------------------
>
>                 Key: YARN-2230
>                 URL: https://issues.apache.org/jira/browse/YARN-2230
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: client, scheduler
>    Affects Versions: 2.4.0
>            Reporter: Adam Kawa
>            Priority: Minor
>
> When a user requests more vcores than the allocation limit (e.g.
> mapreduce.map.cpu.vcores is larger than
> yarn.scheduler.maximum-allocation-vcores), then
> InvalidResourceRequestException is thrown -
> https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
> {code}
> if (resReq.getCapability().getVirtualCores() < 0 ||
>     resReq.getCapability().getVirtualCores() >
>     maximumResource.getVirtualCores()) {
>   throw new InvalidResourceRequestException("Invalid resource request"
>       + ", requested virtual cores < 0"
>       + ", or requested virtual cores > max configured"
>       + ", requestedVirtualCores="
>       + resReq.getCapability().getVirtualCores()
>       + ", maxVirtualCores=" + maximumResource.getVirtualCores());
> }
> {code}
> According to the documentation - yarn-default.xml
> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml,
> the request should be capped to the allocation limit:
> {code}
> <property>
>   <description>The maximum allocation for every container request at the RM,
>   in terms of virtual CPU cores. Requests higher than this won't take effect,
>   and will get capped to this value.</description>
>   <name>yarn.scheduler.maximum-allocation-vcores</name>
>   <value>32</value>
> </property>
> {code}
> This means that:
> * Either the documentation or the code should be corrected (unless this
> exception is handled elsewhere accordingly, but it looks like it is not).
> This behavior is confusing, because when such a job (with
> mapreduce.map.cpu.vcores larger than
> yarn.scheduler.maximum-allocation-vcores) is submitted, it does not make any
> progress. The warnings/exceptions are thrown at the scheduler (RM) side, e.g.:
> {code}
> 2014-06-29 00:34:51,469 WARN org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Invalid resource ask by application appattempt_1403993411503_0002_01
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested virtual cores < 0, or requested virtual cores > max configured, requestedVirtualCores=32, maxVirtualCores=3
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80)
>         at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420)
>         .
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:416)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
> {code}
> * IMHO, such an exception should be forwarded to the client. Otherwise, it is
> non-obvious to discover why a job does not make any progress.
> The same looks to apply to memory.
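The mismatch discussed in this issue, rejection in the code versus capping in the documentation, can be contrasted in a small sketch. Neither method is the actual SchedulerUtils code; both are simplified illustrations, with IllegalArgumentException standing in for InvalidResourceRequestException.

```java
// Hypothetical sketch contrasting the two behaviors discussed above.
public class VcoresSketch {

    /** Simplified form of what the validation code does: reject out-of-range asks. */
    public static int validateOrThrow(int requestedVCores, int maxVCores) {
        if (requestedVCores < 0 || requestedVCores > maxVCores) {
            // stands in for InvalidResourceRequestException
            throw new IllegalArgumentException("requestedVirtualCores="
                + requestedVCores + ", maxVirtualCores=" + maxVCores);
        }
        return requestedVCores;
    }

    /** What yarn-default.xml describes instead: cap the request at the limit. */
    public static int cap(int requestedVCores, int maxVCores) {
        return Math.min(Math.max(requestedVCores, 0), maxVCores);
    }
}
```

With capping, a job asking for 32 vcores against a 3-vcore limit would simply run with 3; with rejection, the ask loops forever unless the exception reaches the client.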
[jira] [Updated] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code)
     [ https://issues.apache.org/jira/browse/YARN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Kawa updated YARN-2230:
----------------------------
    Affects Version/s: 2.4.0
[jira] [Updated] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code)
     [ https://issues.apache.org/jira/browse/YARN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Kawa updated YARN-2230:
----------------------------
    Description:
When a user requests more vcores than the allocation limit (e.g.
mapreduce.map.cpu.vcores is larger than
yarn.scheduler.maximum-allocation-vcores), then
InvalidResourceRequestException is thrown -
https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
{code}
if (resReq.getCapability().getVirtualCores() < 0 ||
    resReq.getCapability().getVirtualCores() >
    maximumResource.getVirtualCores()) {
  throw new InvalidResourceRequestException("Invalid resource request"
      + ", requested virtual cores < 0"
      + ", or requested virtual cores > max configured"
      + ", requestedVirtualCores="
      + resReq.getCapability().getVirtualCores()
      + ", maxVirtualCores=" + maximumResource.getVirtualCores());
}
{code}
According to the documentation - yarn-default.xml
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml,
the request should be capped to the allocation limit:
{code}
<property>
  <description>The maximum allocation for every container request at the RM,
  in terms of virtual CPU cores. Requests higher than this won't take effect,
  and will get capped to this value.</description>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>32</value>
</property>
{code}
This means that:
* Either the documentation or the code should be corrected (unless this
exception is handled elsewhere accordingly, but it looks like it is not).
This behavior is confusing, because when such a job (with
mapreduce.map.cpu.vcores larger than
yarn.scheduler.maximum-allocation-vcores) is submitted, it does not make any
progress. The warnings/exceptions are thrown at the scheduler (RM) side, e.g.:
{code}
2014-06-29 00:34:51,469 WARN org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Invalid resource ask by application appattempt_1403993411503_0002_01
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested virtual cores < 0, or requested virtual cores > max configured, requestedVirtualCores=32, maxVirtualCores=3
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237)
        at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80)
        at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420)
        .
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:416)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
{code}
* IMHO, such an exception should be forwarded to the client. Otherwise, it is
non-obvious to discover why a job does not make any progress.
The same looks to apply to memory.
[jira] [Updated] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code)
[ https://issues.apache.org/jira/browse/YARN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Kawa updated YARN-2230: Summary: Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code) (was: Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code to show)) > Fix description of yarn.scheduler.maximum-allocation-vcores in > yarn-default.xml (or code) > - > > Key: YARN-2230 > URL: https://issues.apache.org/jira/browse/YARN-2230 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Adam Kawa >Priority: Minor > > When a user requests more vcores than the allocation limit (e.g. > mapreduce.map.cpu.vcores is larger than > yarn.scheduler.maximum-allocation-vcores), an > InvalidResourceRequestException is thrown - > https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java > {code} > if (resReq.getCapability().getVirtualCores() < 0 || > resReq.getCapability().getVirtualCores() > > maximumResource.getVirtualCores()) { > throw new InvalidResourceRequestException("Invalid resource request" > + ", requested virtual cores < 0" > + ", or requested virtual cores > max configured" > + ", requestedVirtualCores=" > + resReq.getCapability().getVirtualCores() > + ", maxVirtualCores=" + maximumResource.getVirtualCores()); > } > {code} > According to the documentation - yarn-default.xml > http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml, > the request will be capped to the allocation limit: > {code} > <property> > <description>The maximum allocation for every container request at the RM, > in terms of virtual CPU cores. Requests higher than this won't take > effect, > and will get capped to this value.</description>
> <name>yarn.scheduler.maximum-allocation-vcores</name> > <value>32</value> > </property> > {code} > * Either the documentation or the code should be corrected (unless this exception is > handled elsewhere accordingly, but it looks like it is not). > This behavior is confusing, because when such a job (with > mapreduce.map.cpu.vcores larger than > yarn.scheduler.maximum-allocation-vcores) is submitted, it does not make any > progress. The warnings/exceptions are thrown at the scheduler (RM) side, e.g. > {code} > 2014-06-29 00:34:51,469 WARN > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: > Invalid resource ask by application appattempt_1403993411503_0002_01 > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request, requested virtual cores < 0, or requested virtual cores > > max configured, requestedVirtualCores=32, maxVirtualCores=3 > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237) > at > org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60) > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:416) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980) > {code} > * IMHO, such an exception should be forwarded to the client. Otherwise, it is not > obvious why a job does not make any progress. > The same appears to apply to memory. -- This message was sent by Atlassian JIRA (v6.2#6252)
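The gap the issue above describes can be made concrete with a small sketch: the code path in SchedulerUtils rejects an over-limit vcore ask, while the yarn-default.xml description promises capping. This is a hypothetical, self-contained illustration of the two behaviors; the class and method names are illustrative, not the actual YARN API.

```java
// Hypothetical sketch contrasting the two behaviors from YARN-2230:
// reject (what SchedulerUtils does) vs. cap (what yarn-default.xml says).
public class VcoreCapSketch {
    // Stand-in for yarn.scheduler.maximum-allocation-vcores.
    static final int MAX_VCORES = 32;

    // Simplified version of the current behavior: throw on out-of-range asks.
    static int validateOrThrow(int requestedVcores) {
        if (requestedVcores < 0 || requestedVcores > MAX_VCORES) {
            throw new IllegalArgumentException(
                "Invalid resource request, requestedVirtualCores="
                + requestedVcores + ", maxVirtualCores=" + MAX_VCORES);
        }
        return requestedVcores;
    }

    // Documented behavior: clamp the request into [0, MAX_VCORES].
    static int capToMax(int requestedVcores) {
        return Math.min(Math.max(requestedVcores, 0), MAX_VCORES);
    }

    public static void main(String[] args) {
        System.out.println(capToMax(64));       // prints 32: capped, not rejected
        try {
            validateOrThrow(64);                 // current code path: rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Either making the scheduler clamp like `capToMax`, or rewording the description to match `validateOrThrow`, would resolve the inconsistency the reporter points out.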
[jira] [Created] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code to show)
Adam Kawa created YARN-2230: --- Summary: Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code to show) Key: YARN-2230 URL: https://issues.apache.org/jira/browse/YARN-2230 Project: Hadoop YARN Issue Type: Bug Reporter: Adam Kawa Priority: Minor When a user requests more vcores than the allocation limit (e.g. mapreduce.map.cpu.vcores is larger than yarn.scheduler.maximum-allocation-vcores), an InvalidResourceRequestException is thrown - https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java {code} if (resReq.getCapability().getVirtualCores() < 0 || resReq.getCapability().getVirtualCores() > maximumResource.getVirtualCores()) { throw new InvalidResourceRequestException("Invalid resource request" + ", requested virtual cores < 0" + ", or requested virtual cores > max configured" + ", requestedVirtualCores=" + resReq.getCapability().getVirtualCores() + ", maxVirtualCores=" + maximumResource.getVirtualCores()); } {code} According to the documentation - yarn-default.xml http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml, the request will be capped to the allocation limit: {code} <property> <description>The maximum allocation for every container request at the RM, in terms of virtual CPU cores. Requests higher than this won't take effect, and will get capped to this value.</description> <name>yarn.scheduler.maximum-allocation-vcores</name> <value>32</value> </property> {code} * Either the documentation or the code should be corrected (unless this exception is handled elsewhere accordingly, but it looks like it is not). This behavior is confusing, because when such a job (with mapreduce.map.cpu.vcores larger than yarn.scheduler.maximum-allocation-vcores) is submitted, it does not make any progress. The warnings/exceptions are thrown at the scheduler (RM) side, e.g. 
{code} 2014-06-29 00:34:51,469 WARN org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Invalid resource ask by application appattempt_1403993411503_0002_01 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested virtual cores < 0, or requested virtual cores > max configured, requestedVirtualCores=32, maxVirtualCores=3 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237) at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80) at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60) at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:416) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980) {code} * IMHO, such an exception should be forwarded to the client. Otherwise, it is not obvious why a job does not make any progress. The same appears to apply to memory. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046900#comment-14046900 ] Hadoop QA commented on YARN-1366: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12653008/YARN-1366.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4130//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4130//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4130//console This message is automatically generated. 
> AM should implement Resync with the ApplicationMasterService instead of > shutting down > - > > Key: YARN-1366 > URL: https://issues.apache.org/jira/browse/YARN-1366 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Bikas Saha >Assignee: Rohith > Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, > YARN-1366.4.patch, YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.patch, > YARN-1366.prototype.patch, YARN-1366.prototype.patch > > > The ApplicationMasterService currently sends a resync response to which the > AM responds by shutting down. The AM behavior is expected to change to > resyncing with the RM. Resync means resetting the allocate RPC > sequence number to 0, and the AM should send its entire outstanding request to > the RM. Note that if the AM is making its first allocate call to the RM, then > things should proceed as normal without needing a resync. The RM will > return all containers that have completed since the RM last synced with the > AM. Some container completions may be reported more than once. -- This message was sent by Atlassian JIRA (v6.2#6252)
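The resync protocol described in the issue can be sketched in a few lines: reset the allocate sequence number, re-send the full outstanding ask, and de-duplicate completed containers since the RM may report some completions more than once. This is a hypothetical illustration of the AM-side bookkeeping; the class and field names are illustrative, not the real AMRMClient internals.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the AM-side resync behavior from YARN-1366.
public class ResyncSketch {
    int responseId = 0;                                 // allocate RPC sequence number
    final List<String> outstandingAsks = new ArrayList<>();
    final Set<String> seenCompleted = new HashSet<>();  // completions already handled

    // On a resync signal from the RM: restart the sequence; the next
    // allocate() call must carry the entire outstanding request again.
    void onResync() {
        responseId = 0;
    }

    // The full outstanding ask to re-send after a resync.
    List<String> buildAllocateRequest() {
        return new ArrayList<>(outstandingAsks);
    }

    // The RM may report a container completion more than once after resync;
    // return only completions not yet delivered to the application.
    List<String> filterCompleted(List<String> reported) {
        List<String> fresh = new ArrayList<>();
        for (String containerId : reported) {
            if (seenCompleted.add(containerId)) {
                fresh.add(containerId);
            }
        }
        return fresh;
    }
}
```

The first allocate call naturally starts at sequence number 0, which is why a fresh AM proceeds as normal without a resync.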
[jira] [Updated] (YARN-2223) NPE on ResourceManager recover
[ https://issues.apache.org/jira/browse/YARN-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-2223: - Environment: JDK 8u5 > NPE on ResourceManager recover > -- > > Key: YARN-2223 > URL: https://issues.apache.org/jira/browse/YARN-2223 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.4.1 > Environment: JDK 8u5 >Reporter: Jon Bringhurst > > I upgraded two clusters from tag 2.2.0 to branch-2.4.1 (latest commit is > https://github.com/apache/hadoop-common/commit/c96c8e45a60651b677a1de338b7856a444dc0461). > Both clusters have the same config (other than hostnames). Both are running > on JDK8u5 (I'm not sure if this is a factor here). > One cluster started up without any errors. The other started up with the > following error on the RM: > {noformat} > 18:33:45,463 WARN RMAppImpl:331 - The specific max attempts: 0 for > application: 1 is invalid, because it is out of the range [1, 50]. Use the > global max attempts instead. > 18:33:45,465 INFO RMAppImpl:651 - Recovering app: > application_1398450350082_0001 with 8 attempts and final state = KILLED > 18:33:45,468 INFO RMAppAttemptImpl:691 - Recovering attempt: > appattempt_1398450350082_0001_01 with final state: KILLED > 18:33:45,478 INFO RMAppAttemptImpl:691 - Recovering attempt: > appattempt_1398450350082_0001_02 with final state: FAILED > 18:33:45,478 INFO RMAppAttemptImpl:691 - Recovering attempt: > appattempt_1398450350082_0001_03 with final state: FAILED > 18:33:45,479 INFO RMAppAttemptImpl:691 - Recovering attempt: > appattempt_1398450350082_0001_04 with final state: FAILED > 18:33:45,479 INFO RMAppAttemptImpl:691 - Recovering attempt: > appattempt_1398450350082_0001_05 with final state: FAILED > 18:33:45,480 INFO RMAppAttemptImpl:691 - Recovering attempt: > appattempt_1398450350082_0001_06 with final state: FAILED > 18:33:45,480 INFO RMAppAttemptImpl:691 - Recovering attempt: > appattempt_1398450350082_0001_07 with final state: FAILED > 18:33:45,481 
INFO RMAppAttemptImpl:691 - Recovering attempt: > appattempt_1398450350082_0001_08 with final state: FAILED > 18:33:45,482 INFO RMAppAttemptImpl:659 - > appattempt_1398450350082_0001_01 State change from NEW to KILLED > 18:33:45,482 INFO RMAppAttemptImpl:659 - > appattempt_1398450350082_0001_02 State change from NEW to FAILED > 18:33:45,482 INFO RMAppAttemptImpl:659 - > appattempt_1398450350082_0001_03 State change from NEW to FAILED > 18:33:45,482 INFO RMAppAttemptImpl:659 - > appattempt_1398450350082_0001_04 State change from NEW to FAILED > 18:33:45,483 INFO RMAppAttemptImpl:659 - > appattempt_1398450350082_0001_05 State change from NEW to FAILED > 18:33:45,483 INFO RMAppAttemptImpl:659 - > appattempt_1398450350082_0001_06 State change from NEW to FAILED > 18:33:45,483 INFO RMAppAttemptImpl:659 - > appattempt_1398450350082_0001_07 State change from NEW to FAILED > 18:33:45,483 INFO RMAppAttemptImpl:659 - > appattempt_1398450350082_0001_08 State change from NEW to FAILED > 18:33:45,485 INFO RMAppImpl:639 - application_1398450350082_0001 State > change from NEW to KILLED > 18:33:45,485 WARN RMAppImpl:331 - The specific max attempts: 0 for > application: 2 is invalid, because it is out of the range [1, 50]. Use the > global max attempts instead. 
> 18:33:45,485 INFO RMAppImpl:651 - Recovering app: > application_1398450350082_0002 with 8 attempts and final state = KILLED > 18:33:45,486 INFO RMAppAttemptImpl:691 - Recovering attempt: > appattempt_1398450350082_0002_01 with final state: KILLED > 18:33:45,486 INFO RMAppAttemptImpl:691 - Recovering attempt: > appattempt_1398450350082_0002_02 with final state: FAILED > 18:33:45,487 INFO RMAppAttemptImpl:691 - Recovering attempt: > appattempt_1398450350082_0002_03 with final state: FAILED > 18:33:45,487 INFO RMAppAttemptImpl:691 - Recovering attempt: > appattempt_1398450350082_0002_04 with final state: FAILED > 18:33:45,488 INFO RMAppAttemptImpl:691 - Recovering attempt: > appattempt_1398450350082_0002_05 with final state: FAILED > 18:33:45,488 INFO RMAppAttemptImpl:691 - Recovering attempt: > appattempt_1398450350082_0002_06 with final state: FAILED > 18:33:45,489 INFO RMAppAttemptImpl:691 - Recovering attempt: > appattempt_1398450350082_0002_07 with final state: FAILED > 18:33:45,489 INFO RMAppAttemptImpl:691 - Recovering attempt: > appattempt_1398450350082_0002_08 with final state: FAILED > 18:33:45,490 INFO RMAppAttemptImpl:659 - > appattempt_1398450350082_0002_01 State change from NEW to KILLED > 18:33:45,490 INFO RMAppAttemptImpl:659 - > appattempt_1398450350082_0002_02 State change from NEW to FAILED > 18:33:45,490 INFO RMAppAttemptImpl:659 - > appattempt_1398450350082_0002_0
[jira] [Updated] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1366: - Attachment: YARN-1366.6.patch Attached the updated patch. Please review it. > AM should implement Resync with the ApplicationMasterService instead of > shutting down > - > > Key: YARN-1366 > URL: https://issues.apache.org/jira/browse/YARN-1366 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Bikas Saha >Assignee: Rohith > Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, > YARN-1366.4.patch, YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.patch, > YARN-1366.prototype.patch, YARN-1366.prototype.patch > > > The ApplicationMasterService currently sends a resync response to which the > AM responds by shutting down. The AM behavior is expected to change to > resyncing with the RM. Resync means resetting the allocate RPC > sequence number to 0, and the AM should send its entire outstanding request to > the RM. Note that if the AM is making its first allocate call to the RM, then > things should proceed as normal without needing a resync. The RM will > return all containers that have completed since the RM last synced with the > AM. Some container completions may be reported more than once. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2104) Scheduler queue filter failed to work because index of queue column changed
[ https://issues.apache.org/jira/browse/YARN-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046880#comment-14046880 ] Hudson commented on YARN-2104: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1815 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1815/]) YARN-2104. Scheduler queue filter failed to work because index of queue column changed. Contributed by Wangda Tan (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1606265) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/DefaultSchedulerPage.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/FairSchedulerPage.java > Scheduler queue filter failed to work because index of queue column changed > --- > > Key: YARN-2104 > URL: https://issues.apache.org/jira/browse/YARN-2104 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, webapp >Affects Versions: 2.4.0 >Reporter: Wangda Tan >Assignee: Wangda Tan > Fix For: 3.0.0, 2.5.0 > > Attachments: YARN-2104.patch > > > YARN-563 added, > {code} > + th(".type", "Application Type"). > {code} > to the application table, which moves the queue column index from 3 to 4. 
And in the > scheduler page, the queue column index is hard-coded to 3 when filtering > applications by queue name, > {code} > "if (q == 'root') q = '';", > "else q = '^' + q.substr(q.lastIndexOf('.') + 1) + '$';", > "$('#apps').dataTable().fnFilter(q, 3, true);", > {code} > So the queue filter will not work on the application page. > Steps to reproduce (thanks to Bo Yang for pointing this out): > {code} > 1) In the default setup, there's a default queue under the root queue > 2) Run an arbitrary application; you can find it in the "Applications" page > 3) Click the "Default" queue in the scheduler page > 4) Click "Applications"; no application will show here > 5) Click the "Root" queue in the scheduler page > 6) Click "Applications"; the application will show again > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
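The root cause above is a magic number: the generated dataTables filter hard-codes column 3 while the queue column moved to 4. A hypothetical sketch of the underlying fix pattern, keeping the column index in one named constant so that adding a column (as YARN-563 did) requires a single update; the class and method names here are illustrative, not the actual webapp page classes.

```java
// Hypothetical sketch: generate the queue-filter JavaScript with the
// column index held in one constant instead of a hard-coded "3".
public class QueueFilterSketch {
    // After YARN-563 added the "Application Type" column, the queue
    // column index in the applications table became 4.
    static final int QUEUE_COLUMN = 4;

    // Builds the JS snippet embedded in the scheduler page.
    static String queueFilterJs() {
        return "if (q == 'root') q = '';"
            + "else q = '^' + q.substr(q.lastIndexOf('.') + 1) + '$';"
            + "$('#apps').dataTable().fnFilter(q, " + QUEUE_COLUMN + ", true);";
    }
}
```

With the index centralized, the scheduler pages and the application table cannot silently drift apart again.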
[jira] [Commented] (YARN-2201) TestRMWebServicesAppsModification dependent on yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046882#comment-14046882 ] Hudson commented on YARN-2201: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1815 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1815/]) YARN-2201. Made TestRMWebServicesAppsModification be independent of the changes on yarn-default.xml. Contributed by Varun Vasudev. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1606285) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java > TestRMWebServicesAppsModification dependent on yarn-default.xml > --- > > Key: YARN-2201 > URL: https://issues.apache.org/jira/browse/YARN-2201 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Ray Chiang >Assignee: Varun Vasudev > Labels: test > Fix For: 2.5.0 > > Attachments: apache-yarn-2201.0.patch, apache-yarn-2201.1.patch, > apache-yarn-2201.2.patch, apache-yarn-2201.3.patch > > > TestRMWebServicesAppsModification.java has some errors that are > yarn-default.xml dependent. By changing yarn-default.xml properties, I'm > seeing the following errors: > 1) Changing yarn.resourcemanager.scheduler.class from > capacity.CapacityScheduler to fair.FairScheduler gives the error: > Running > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification > Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 79.047 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification > testSingleAppKillUnauthorized[1](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) > Time elapsed: 3.22 sec <<< FAILURE! 
> java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458) > 2) Changing yarn.acl.enable from false to true results in the following > errors: > Running > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification > Tests run: 10, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 49.044 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification > testSingleAppKill[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) > Time elapsed: 2.986 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKill(TestRMWebServicesAppsModification.java:287) > testSingleAppKillInvalidState[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) > Time elapsed: 2.258 sec <<< FAILURE! 
> java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillInvalidState(TestRMWebServicesAppsModification.java:369) > testSingleAppKillUnauthorized[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) > Time elapsed: 2.263 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458) > testSingleAppKillInvalidId[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) > Time elapsed: 0.214 sec <<< FAILURE! > java.
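The failures quoted above all stem from the test inheriting whatever yarn-default.xml happens to ship. The general remedy, which the patch for this issue pursues, is for the test to pin every setting its expectations depend on. A hypothetical, dependency-free sketch of that pattern; the plain map stands in for the real YarnConfiguration, and the key values are the ones named in the report.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a test pins the configuration it depends on
// instead of relying on shipped defaults in yarn-default.xml.
public class PinnedConfigSketch {
    static Map<String, String> testConfig() {
        Map<String, String> conf = new HashMap<>();
        // Pin both knobs the reported failures depend on:
        conf.put("yarn.resourcemanager.scheduler.class",
            "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler");
        conf.put("yarn.acl.enable", "false");
        return conf;
    }
}
```

A test set up this way keeps passing even if a later release changes the defaults, which is exactly the independence the fix aims for.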
[jira] [Commented] (YARN-2204) TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046883#comment-14046883 ] Hudson commented on YARN-2204: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1815 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1815/]) YARN-2204. Addendum patch. TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler. (Robert Kanter via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1606168) * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java > TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler > --- > > Key: YARN-2204 > URL: https://issues.apache.org/jira/browse/YARN-2204 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: Robert Kanter >Assignee: Robert Kanter >Priority: Trivial > Fix For: 2.5.0 > > Attachments: YARN-2204.patch, YARN-2204_addendum.patch, > YARN-2204_addendum.patch > > > TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2201) TestRMWebServicesAppsModification dependent on yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046869#comment-14046869 ] Hudson commented on YARN-2201: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1788 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1788/]) YARN-2201. Made TestRMWebServicesAppsModification be independent of the changes on yarn-default.xml. Contributed by Varun Vasudev. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1606285) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java > TestRMWebServicesAppsModification dependent on yarn-default.xml > --- > > Key: YARN-2201 > URL: https://issues.apache.org/jira/browse/YARN-2201 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Ray Chiang >Assignee: Varun Vasudev > Labels: test > Fix For: 2.5.0 > > Attachments: apache-yarn-2201.0.patch, apache-yarn-2201.1.patch, > apache-yarn-2201.2.patch, apache-yarn-2201.3.patch > > > TestRMWebServicesAppsModification.java has some errors that are > yarn-default.xml dependent. By changing yarn-default.xml properties, I'm > seeing the following errors: > 1) Changing yarn.resourcemanager.scheduler.class from > capacity.CapacityScheduler to fair.FairScheduler gives the error: > Running > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification > Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 79.047 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification > testSingleAppKillUnauthorized[1](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) > Time elapsed: 3.22 sec <<< FAILURE! 
> java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458) > 2) Changing yarn.acl.enable from false to true results in the following > errors: > Running > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification > Tests run: 10, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 49.044 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification > testSingleAppKill[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) > Time elapsed: 2.986 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKill(TestRMWebServicesAppsModification.java:287) > testSingleAppKillInvalidState[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) > Time elapsed: 2.258 sec <<< FAILURE! 
> java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillInvalidState(TestRMWebServicesAppsModification.java:369) > testSingleAppKillUnauthorized[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) > Time elapsed: 2.263 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458) > testSingleAppKillInvalidId[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) > Time elapsed: 0.214 sec <<< FAILURE! > java.lang.Asser
[jira] [Commented] (YARN-2104) Scheduler queue filter failed to work because index of queue column changed
[ https://issues.apache.org/jira/browse/YARN-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046867#comment-14046867 ] Hudson commented on YARN-2104: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1788 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1788/]) YARN-2104. Scheduler queue filter failed to work because index of queue column changed. Contributed by Wangda Tan (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1606265) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/DefaultSchedulerPage.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/FairSchedulerPage.java > Scheduler queue filter failed to work because index of queue column changed > --- > > Key: YARN-2104 > URL: https://issues.apache.org/jira/browse/YARN-2104 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, webapp >Affects Versions: 2.4.0 >Reporter: Wangda Tan >Assignee: Wangda Tan > Fix For: 3.0.0, 2.5.0 > > Attachments: YARN-2104.patch > > > YARN-563 added, > {code} > + th(".type", "Application Type"). > {code} > to the application table, which moves the queue column index from 3 to 4. And in the > scheduler page, the queue column index is hard-coded to 3 when filtering > applications by queue name, > {code} > "if (q == 'root') q = '';", > "else q = '^' + q.substr(q.lastIndexOf('.') + 1) + '$';", > "$('#apps').dataTable().fnFilter(q, 3, true);", > {code} > So the queue filter will not work on the application page. 
> Steps to reproduce (thanks to Bo Yang for pointing this out): > {code} > 1) In the default setup, there's a default queue under the root queue > 2) Run an arbitrary application; you can find it in the "Applications" page > 3) Click the "Default" queue in the scheduler page > 4) Click "Applications"; no application will show here > 5) Click the "Root" queue in the scheduler page > 6) Click "Applications"; the application will show again > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2204) TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046871#comment-14046871 ] Hudson commented on YARN-2204: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1788 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1788/]) YARN-2204. Addendum patch. TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler. (Robert Kanter via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1606168) * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java > TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler > --- > > Key: YARN-2204 > URL: https://issues.apache.org/jira/browse/YARN-2204 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: Robert Kanter >Assignee: Robert Kanter >Priority: Trivial > Fix For: 2.5.0 > > Attachments: YARN-2204.patch, YARN-2204_addendum.patch, > YARN-2204_addendum.patch > > > TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor
[ https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046853#comment-14046853 ] Remus Rusanu commented on YARN-2198: I got this working using LPC, but there are some complications vis-a-vis stdout/stderr. With a helper service the nodemanager no longer gets free access to the task stdout/stderr. Solutions exist: - read stdout/stderr from the helper and pump them over the LPC interface back to the NM - explicitly set a .out and .err file for the task and use them as stdout/stderr for the container launch. Note that the problem applies to the localizer launch too, which does not have a stdout/stderr redirect in the launch script. Another complication is the Windows job model of NM/winutils. winutils creates a job for the container and joins the job itself, ensuring a controlled lifespan for all the processes the task launches. The helper service cannot join the job as it has its own, independent lifespan. I solved this problem by having the helper service launch "winutils task createAsUser ..." as an ordinary CreateProcess in the LPC server routine, rather than attempting to do the S4U impersonation in the helper service process itself. This works fine, and also greatly reduces the risks associated with leaking handles, as the heavy work (= leak risk) occurs in a sub-process, not in the service. I will have to investigate whether there is any known issue vis-a-vis a very long LPC call (winutils waits for the spawned processes to finish). If there is, the solution would be for the helper service to hand over the spawned task to the NM (duplicate the process handle in the NM, yuck) and have the NM JNI (the LPC client) do the actual process handle wait (i.e. a blocking wait for the task to finish). This would make the LPC call short (spawn process, duplicate handle, return handle to NM) at the risk of some induced complications. 
Also, this would make the whole stdout/stderr transfer even more cumbersome if we opt for pipes vs. .out/.err files (opened by the helper process, duplicated into the NM, with the NM reading the handles...) > Remove the need to run NodeManager as privileged account for Windows Secure > Container Executor > -- > > Key: YARN-2198 > URL: https://issues.apache.org/jira/browse/YARN-2198 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Remus Rusanu >Assignee: Remus Rusanu > Labels: security, windows > > YARN-1972 introduces a Secure Windows Container Executor. However this > executor requires the process launching the container to be LocalSystem or > a member of the local Administrators group. Since the process in question > is the NodeManager, the requirement translates to the entire NM running as a > privileged account, a very large surface area to review and protect. > This proposal is to move the privileged operations into a dedicated NT > service. The NM can run as a low-privilege account and communicate with the > privileged NT service when it needs to launch a container. This would reduce > the surface exposed to the high privileges. > There has to exist a secure, authenticated and authorized channel of > communication between the NM and the privileged NT service. Possible > alternatives are a new TCP endpoint, Java RPC etc. My proposal though would > be to use Windows LPC (Local Procedure Calls), which is a Windows platform-specific > inter-process communication channel that satisfies all requirements > and is easy to deploy. The privileged NT service would register and listen on > an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop > with libwinutils which would host the LPC client code. The client would > connect to the LPC port (NtConnectPort) and send a message requesting a > container launch (NtRequestWaitReplyPort). LPC provides authentication and > the privileged NT service can use the authorization API (AuthZ) to validate the > caller. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
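The second option in the comment above (explicit .out/.err files instead of inherited pipes) can be illustrated with a small, platform-neutral sketch. The class and file names are hypothetical, and the real launch goes through winutils and the container launch script rather than ProcessBuilder; the point is only the redirection pattern:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Illustrative sketch: launch a command with stdout/stderr explicitly
// redirected to per-container files, wait for it, and read back the
// captured stdout. The NM would read these files instead of live pipes.
class LaunchSketch {
    static String launchAndCapture(String... cmd) {
        try {
            ProcessBuilder pb = new ProcessBuilder(cmd);
            pb.redirectOutput(new File("container.out")); // explicit .out file
            pb.redirectError(new File("container.err"));  // explicit .err file
            pb.start().waitFor();
            return new String(Files.readAllBytes(Paths.get("container.out"))).trim();
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
}
```

This sidesteps the handle-duplication gymnastics described above: the helper never has to pass live pipe handles across the LPC boundary, only well-known file paths.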
[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046826#comment-14046826 ] Rohith commented on YARN-1366: -- Looking into fixing the findbugs warning and the test case. Will update the patch once done. > AM should implement Resync with the ApplicationMasterService instead of > shutting down > - > > Key: YARN-1366 > URL: https://issues.apache.org/jira/browse/YARN-1366 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Bikas Saha >Assignee: Rohith > Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, > YARN-1366.4.patch, YARN-1366.5.patch, YARN-1366.patch, > YARN-1366.prototype.patch, YARN-1366.prototype.patch > > > The ApplicationMasterService currently sends a resync response to which the > AM responds by shutting down. The AM behavior is expected to change to > resyncing with the RM. Resync means resetting the allocate RPC > sequence number to 0, and the AM should send its entire outstanding request to > the RM. Note that if the AM is making its first allocate call to the RM then > things should proceed as normal without needing a resync. The RM will > return all containers that have completed since the RM last synced with the > AM. Some container completions may be reported more than once. -- This message was sent by Atlassian JIRA (v6.2#6252)
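The resync contract in the issue description can be sketched as a toy model (the class and field names below are hypothetical, not the actual AMRMClient API): on a resync command, reset the allocate sequence number to 0 and send the full outstanding ask, rather than the usual delta, on the next allocate.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the AM-side resync behavior described above.
class ResyncSketch {
    int responseId = 0;                  // allocate RPC sequence number
    final List<String> outstanding = new ArrayList<>();
    boolean resyncPending = false;

    void onResyncCommand() {
        responseId = 0;                  // restart the sequence, as on a first allocate
        resyncPending = true;            // next allocate must carry the full ask
    }

    List<String> nextAllocate(List<String> newAsks) {
        outstanding.addAll(newAsks);     // track everything not yet satisfied
        // After a resync the entire outstanding request is sent, not just the delta.
        List<String> toSend = new ArrayList<>(resyncPending ? outstanding : newAsks);
        resyncPending = false;
        responseId++;
        return toSend;
    }
}
```

The description also warns that completed containers may be reported more than once after a resync, so callers of such a model must deduplicate completion events.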
[jira] [Commented] (YARN-2201) TestRMWebServicesAppsModification dependent on yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046817#comment-14046817 ] Hudson commented on YARN-2201: -- FAILURE: Integrated in Hadoop-Yarn-trunk #597 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/597/]) YARN-2201. Made TestRMWebServicesAppsModification be independent of the changes on yarn-default.xml. Contributed by Varun Vasudev. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1606285) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java > TestRMWebServicesAppsModification dependent on yarn-default.xml > --- > > Key: YARN-2201 > URL: https://issues.apache.org/jira/browse/YARN-2201 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Ray Chiang >Assignee: Varun Vasudev > Labels: test > Fix For: 2.5.0 > > Attachments: apache-yarn-2201.0.patch, apache-yarn-2201.1.patch, > apache-yarn-2201.2.patch, apache-yarn-2201.3.patch > > > TestRMWebServicesAppsModification.java has some errors that are > yarn-default.xml dependent. By changing yarn-default.xml properties, I'm > seeing the following errors: > 1) Changing yarn.resourcemanager.scheduler.class from > capacity.CapacityScheduler to fair.FairScheduler gives the error: > Running > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification > Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 79.047 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification > testSingleAppKillUnauthorized[1](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) > Time elapsed: 3.22 sec <<< FAILURE! 
> java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458) > 2) Changing yarn.acl.enable from false to true results in the following > errors: > Running > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification > Tests run: 10, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 49.044 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification > testSingleAppKill[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) > Time elapsed: 2.986 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKill(TestRMWebServicesAppsModification.java:287) > testSingleAppKillInvalidState[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) > Time elapsed: 2.258 sec <<< FAILURE! 
> java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillInvalidState(TestRMWebServicesAppsModification.java:369) > testSingleAppKillUnauthorized[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) > Time elapsed: 2.263 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458) > testSingleAppKillInvalidId[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) > Time elapsed: 0.214 sec <<< FAILURE! > java.lang.Asserti
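One way to make the test independent of yarn-default.xml, sketched below with the real YarnConfiguration keys (the class itself and the surrounding test wiring are illustrative): pin the scheduler and the ACL flag explicitly in the test's own configuration, so changing the cluster defaults can no longer flip the expected HTTP responses.

```java
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler;

// Illustrative helper: build a configuration whose properties the test
// assertions actually depend on, instead of inheriting yarn-default.xml.
class PinnedTestConf {
    static YarnConfiguration build() {
        YarnConfiguration conf = new YarnConfiguration();
        // Pin the scheduler the assertions assume, regardless of cluster defaults.
        conf.set(YarnConfiguration.RM_SCHEDULER, CapacityScheduler.class.getName());
        // Pin ACLs off so the kill-authorization tests see a known state.
        conf.setBoolean(YarnConfiguration.YARN_ACL_ENABLE, false);
        return conf;
    }
}
```

(The committed patch parameterizes the test instead of hard-coding one scheduler; this fragment only shows the pinning idea. It is a config fragment requiring the Hadoop classpath, so no standalone run is shown.)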
[jira] [Commented] (YARN-2204) TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046819#comment-14046819 ] Hudson commented on YARN-2204: -- FAILURE: Integrated in Hadoop-Yarn-trunk #597 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/597/]) YARN-2204. Addendum patch. TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler. (Robert Kanter via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1606168) * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java > TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler > --- > > Key: YARN-2204 > URL: https://issues.apache.org/jira/browse/YARN-2204 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: Robert Kanter >Assignee: Robert Kanter >Priority: Trivial > Fix For: 2.5.0 > > Attachments: YARN-2204.patch, YARN-2204_addendum.patch, > YARN-2204_addendum.patch > > > TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2104) Scheduler queue filter failed to work because index of queue column changed
[ https://issues.apache.org/jira/browse/YARN-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046815#comment-14046815 ] Hudson commented on YARN-2104: -- FAILURE: Integrated in Hadoop-Yarn-trunk #597 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/597/]) YARN-2104. Scheduler queue filter failed to work because index of queue column changed. Contributed by Wangda Tan (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1606265) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/DefaultSchedulerPage.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/FairSchedulerPage.java > Scheduler queue filter failed to work because index of queue column changed > --- > > Key: YARN-2104 > URL: https://issues.apache.org/jira/browse/YARN-2104 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, webapp >Affects Versions: 2.4.0 >Reporter: Wangda Tan >Assignee: Wangda Tan > Fix For: 3.0.0, 2.5.0 > > Attachments: YARN-2104.patch > > > YARN-563 added > {code} > + th(".type", "Application Type"). > {code} > to the application table, which moves the queue column's index from 3 to 4. But in the > scheduler page, the queue column's index is hard-coded to 3 when filtering > applications by queue name: > {code} > "if (q == 'root') q = '';", > "else q = '^' + q.substr(q.lastIndexOf('.') + 1) + '$';", > "$('#apps').dataTable().fnFilter(q, 3, true);", > {code} > So the queue filter will not work on the applications page. 
> Steps to reproduce (thanks to Bo Yang for pointing this out): > {code} > 1) In the default setup, there's a default queue under the root queue > 2) Run an arbitrary application; you can find it in the "Applications" page > 3) Click the "Default" queue in the scheduler page > 4) Click "Applications"; no application will show > 5) Click the "Root" queue in the scheduler page > 6) Click "Applications"; the application will show again > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)