[jira] [Updated] (YARN-2142) Add one service to check the nodes' TRUST status
[ https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anders updated YARN-2142: - Attachment: trust003.patch Add one service to check the nodes' TRUST status - Key: YARN-2142 URL: https://issues.apache.org/jira/browse/YARN-2142 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager, scheduler, webapp Affects Versions: 2.2.0 Environment: OS:Ubuntu 13.04; JAVA:OpenJDK 7u51-2.4.4-0 Reporter: anders Priority: Minor Labels: patch Fix For: 2.2.0 Attachments: test.patch, trust.patch, trust.patch, trust.patch, trust001.patch, trust002.patch, trust003.patch Original Estimate: 1m Remaining Estimate: 1m Because of the critical computing environment, we must check every node's TRUST status in the cluster (the TRUST status can be obtained through the OAT server's API), so I added this feature to Hadoop's scheduler. Through the TRUST check service, a node obtains its own TRUST status and sends it to the ResourceManager via the heartbeat for scheduling. In the scheduling step, if a node's TRUST status is 'false', the node is skipped until its TRUST status turns to 'true'. ***The logic of this feature is similar to the node health check service. -- This message was sent by Atlassian JIRA (v6.2#6252)
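For readers unfamiliar with the proposal, a minimal sketch of the node-side check is below. This is an illustration only, not the attached patch; the OAT endpoint URL, class name and response format are assumptions.
{code}
// Illustrative sketch only -- not the code from trust003.patch.
// Assumes a hypothetical OAT server endpoint that returns the plain text
// "trusted" or "untrusted" for this node.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class TrustStatusChecker {
  private final String oatUrl; // e.g. "http://oat-server:8443/status/node1" (hypothetical)

  public TrustStatusChecker(String oatUrl) {
    this.oatUrl = oatUrl;
  }

  /** Queries the OAT server; the heartbeat would carry this boolean to the RM. */
  public boolean isTrusted() {
    try {
      HttpURLConnection conn = (HttpURLConnection) new URL(oatUrl).openConnection();
      conn.setConnectTimeout(5000);
      conn.setReadTimeout(5000);
      try (BufferedReader in =
          new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
        String body = in.readLine();
        return body != null && "trusted".equalsIgnoreCase(body.trim());
      }
    } catch (Exception e) {
      // Treat an unreachable OAT server as untrusted (assumption).
      return false;
    }
  }
}
{code}
On the ResourceManager side, the scheduler would then skip nodes whose last reported status is false, analogous to how unhealthy nodes are excluded by the node health check service.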
[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047412#comment-14047412 ] Rohith commented on YARN-1366: -- Thank you for reviewing the patch. I will update the patch soon. One update on the comment: bq. testAMRMClientResendsRequestsOnRMRestart seems not testing re-sending pendingReleases across RM restart, because the pending releases seems already decremented to zero before restart happens The test does verify the pending release too. Before the restart, the 1st container is released; the code below releases only one container. One approach to make the test stronger is to assert the number of containers before/after this code. {noformat} int pendingRelease = 0; Iterator<Container> it = allocatedContainers.iterator(); while (it.hasNext()) { amClient.releaseAssignedContainer(it.next().getId()); pendingRelease++; it.remove(); break; // remove one container } {noformat} AM should implement Resync with the ApplicationMasterService instead of shutting down - Key: YARN-1366 URL: https://issues.apache.org/jira/browse/YARN-1366 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Rohith Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, YARN-1366.4.patch, YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.7.patch, YARN-1366.patch, YARN-1366.prototype.patch, YARN-1366.prototype.patch The ApplicationMasterService currently sends a resync response to which the AM responds by shutting down. The AM behavior is expected to change to resyncing with the RM. Resync means resetting the allocate RPC sequence number to 0, and the AM should re-send all of its outstanding requests to the RM. Note that if the AM is making its first allocate call to the RM then things should proceed as normal without needing a resync. The RM will return all containers that have completed since the RM last synced with the AM. Some container completions may be reported more than once. -- This message was sent by Atlassian JIRA (v6.2#6252)
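To make the suggested assertion concrete, a sketch of how the test could check the container count before and after the loop is shown here; it reuses the names from the snippet above (amClient, allocatedContainers) and is otherwise an assumption, not the actual test code.
{code}
// Sketch of the stronger check suggested above, inside the existing test method.
// amClient and allocatedContainers are the variables from the snippet in the comment;
// Iterator, Container and Assert are assumed to be imported as in the test class.
int allocatedBefore = allocatedContainers.size();

int pendingRelease = 0;
Iterator<Container> it = allocatedContainers.iterator();
while (it.hasNext()) {
  amClient.releaseAssignedContainer(it.next().getId());
  pendingRelease++;
  it.remove();
  break; // release exactly one container
}

// Assert that exactly one container was released and removed locally.
Assert.assertEquals(1, pendingRelease);
Assert.assertEquals(allocatedBefore - 1, allocatedContainers.size());
{code}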
[jira] [Commented] (YARN-2142) Add one service to check the nodes' TRUST status
[ https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047413#comment-14047413 ] Hadoop QA commented on YARN-2142: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12653105/trust003.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4139//console This message is automatically generated. Add one service to check the nodes' TRUST status - Key: YARN-2142 URL: https://issues.apache.org/jira/browse/YARN-2142 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager, scheduler, webapp Affects Versions: 2.2.0 Environment: OS:Ubuntu 13.04; JAVA:OpenJDK 7u51-2.4.4-0 Reporter: anders Priority: Minor Labels: patch Fix For: 2.2.0 Attachments: test.patch, trust.patch, trust.patch, trust.patch, trust001.patch, trust002.patch, trust003.patch Original Estimate: 1m Remaining Estimate: 1m Because of critical computing environment ,we must test every node's TRUST status in the cluster (We can get the TRUST status by the API of OAT sever),So I add this feature into hadoop's schedule . By the TRUST check service ,node can get the TRUST status of itself, then through the heartbeat ,send the TRUST status to resource manager for scheduling. In the scheduling step,if the node's TRUST status is 'false', it will be abandoned until it's TRUST status turn to 'true'. ***The logic of this feature is similar to node's health checkservice. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2142) Add one service to check the nodes' TRUST status
[ https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047425#comment-14047425 ] Hadoop QA commented on YARN-2142: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12653106/trust2.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4140//console This message is automatically generated. Add one service to check the nodes' TRUST status - Key: YARN-2142 URL: https://issues.apache.org/jira/browse/YARN-2142 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager, scheduler, webapp Affects Versions: 2.2.0 Environment: OS:Ubuntu 13.04; JAVA:OpenJDK 7u51-2.4.4-0 Reporter: anders Priority: Minor Labels: patch Fix For: 2.2.0 Attachments: test.patch, trust.patch, trust.patch, trust.patch, trust001.patch, trust002.patch, trust003.patch, trust2.patch Original Estimate: 1m Remaining Estimate: 1m Because of critical computing environment ,we must test every node's TRUST status in the cluster (We can get the TRUST status by the API of OAT sever),So I add this feature into hadoop's schedule . By the TRUST check service ,node can get the TRUST status of itself, then through the heartbeat ,send the TRUST status to resource manager for scheduling. In the scheduling step,if the node's TRUST status is 'false', it will be abandoned until it's TRUST status turn to 'true'. ***The logic of this feature is similar to node's health checkservice. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047432#comment-14047432 ] Jian He commented on YARN-1366: --- bq. The test does verify the pending release too I see. - We can move release.addAll(this.pendingRelease); to the first isResyncCommand check? Similarly, blacklistAdditions.addAll(this.blacklistedNodes); - Please add some comments to the pendingRelease and release variables AM should implement Resync with the ApplicationMasterService instead of shutting down - Key: YARN-1366 URL: https://issues.apache.org/jira/browse/YARN-1366 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Rohith Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, YARN-1366.4.patch, YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.7.patch, YARN-1366.patch, YARN-1366.prototype.patch, YARN-1366.prototype.patch The ApplicationMasterService currently sends a resync response to which the AM responds by shutting down. The AM behavior is expected to change to resyncing with the RM. Resync means resetting the allocate RPC sequence number to 0, and the AM should re-send all of its outstanding requests to the RM. Note that if the AM is making its first allocate call to the RM then things should proceed as normal without needing a resync. The RM will return all containers that have completed since the RM last synced with the AM. Some container completions may be reported more than once. -- This message was sent by Atlassian JIRA (v6.2#6252)
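For context, the first suggestion amounts to repopulating both the pending releases and the blacklisted nodes as soon as the resync command is detected. A rough sketch of that shape is below; pendingRelease, release, blacklistAdditions, blacklistedNodes and isResyncCommand are the names used in the review comments, everything else is assumed, and this is not the actual AMRMClientImpl code.
{code}
// Rough shape of the suggested restructuring on the AM side (assumption, not the patch).
if (isResyncCommand(allocateResponse)) {
  // On resync, re-send everything outstanding in one place:
  release.addAll(this.pendingRelease);               // container releases not yet acknowledged
  blacklistAdditions.addAll(this.blacklistedNodes);  // blacklist entries to re-register
  // ... reset the responseId and re-send the outstanding ResourceRequests ...
}
{code}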
[jira] [Updated] (YARN-2142) Add one service to check the nodes' TRUST status
[ https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anders updated YARN-2142: - Environment: OS:Ubuntu 13.04; JAVA:OpenJDK 7u51-2.4.4-0 Only in branch-2.2.0. was: OS:Ubuntu 13.04; JAVA:OpenJDK 7u51-2.4.4-0 Add one service to check the nodes' TRUST status - Key: YARN-2142 URL: https://issues.apache.org/jira/browse/YARN-2142 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager, scheduler, webapp Affects Versions: 2.2.0 Environment: OS:Ubuntu 13.04; JAVA:OpenJDK 7u51-2.4.4-0 Only in branch-2.2.0. Reporter: anders Priority: Minor Labels: patch Fix For: 2.2.0 Attachments: test.patch, trust.patch, trust.patch, trust.patch, trust001.patch, trust002.patch, trust003.patch, trust2.patch Original Estimate: 1m Remaining Estimate: 1m Because of critical computing environment ,we must test every node's TRUST status in the cluster (We can get the TRUST status by the API of OAT sever),So I add this feature into hadoop's schedule . By the TRUST check service ,node can get the TRUST status of itself, then through the heartbeat ,send the TRUST status to resource manager for scheduling. In the scheduling step,if the node's TRUST status is 'false', it will be abandoned until it's TRUST status turn to 'true'. ***The logic of this feature is similar to node's health checkservice. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2142) Add one service to check the nodes' TRUST status
[ https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anders updated YARN-2142: - Description: Because of critical computing environment ,we must test every node's TRUST status in the cluster (We can get the TRUST status by the API of OAT sever),So I add this feature into hadoop's schedule . By the TRUST check service ,node can get the TRUST status of itself, then through the heartbeat ,send the TRUST status to resource manager for scheduling. In the scheduling step,if the node's TRUST status is 'false', it will be abandoned until it's TRUST status turn to 'true'. ***The logic of this feature is similar to node's health checkservice. was: Because of critical computing environment ,we must test every node's TRUST status in the cluster (We can get the TRUST status by the API of OAT sever),So I add this feature into hadoop's schedule . By the TRUST check service ,node can get the TRUST status of itself, then through the heartbeat ,send the TRUST status to resource manager for scheduling. In the scheduling step,if the node's TRUST status is 'false', it will be abandoned until it's TRUST status turn to 'true'. ***The logic of this feature is similar to node's health checkservice. Add one service to check the nodes' TRUST status - Key: YARN-2142 URL: https://issues.apache.org/jira/browse/YARN-2142 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager, scheduler, webapp Affects Versions: 2.2.0 Environment: OS:Ubuntu 13.04; JAVA:OpenJDK 7u51-2.4.4-0 Only in branch-2.2.0. Reporter: anders Priority: Minor Labels: patch Fix For: 2.2.0 Attachments: test.patch, trust.patch, trust.patch, trust.patch, trust001.patch, trust002.patch, trust003.patch, trust2.patch Original Estimate: 1m Remaining Estimate: 1m Because of critical computing environment ,we must test every node's TRUST status in the cluster (We can get the TRUST status by the API of OAT sever),So I add this feature into hadoop's schedule . By the TRUST check service ,node can get the TRUST status of itself, then through the heartbeat ,send the TRUST status to resource manager for scheduling. In the scheduling step,if the node's TRUST status is 'false', it will be abandoned until it's TRUST status turn to 'true'. ***The logic of this feature is similar to node's health checkservice. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2142) Add one service to check the nodes' TRUST status
[ https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anders updated YARN-2142: - Description: Because of critical computing environment ,we must test every node's TRUST status in the cluster (We can get the TRUST status by the API of OAT sever),So I add this feature into hadoop's schedule . By the TRUST check service ,node can get the TRUST status of itself, then through the heartbeat ,send the TRUST status to resource manager for scheduling. In the scheduling step,if the node's TRUST status is 'false', it will be abandoned until it's TRUST status turn to 'true'. ***The logic of this feature is similar to node's health checkservice. ***Only in branch-2.2.0 , not in trunk*** was: Because of critical computing environment ,we must test every node's TRUST status in the cluster (We can get the TRUST status by the API of OAT sever),So I add this feature into hadoop's schedule . By the TRUST check service ,node can get the TRUST status of itself, then through the heartbeat ,send the TRUST status to resource manager for scheduling. In the scheduling step,if the node's TRUST status is 'false', it will be abandoned until it's TRUST status turn to 'true'. ***The logic of this feature is similar to node's health checkservice. --Only in branch-2.2.0 , not in trunk-- Add one service to check the nodes' TRUST status - Key: YARN-2142 URL: https://issues.apache.org/jira/browse/YARN-2142 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager, scheduler, webapp Affects Versions: 2.2.0 Environment: OS:Ubuntu 13.04; JAVA:OpenJDK 7u51-2.4.4-0 Only in branch-2.2.0. Reporter: anders Priority: Minor Labels: patch Fix For: 2.2.0 Attachments: test.patch, trust.patch, trust.patch, trust.patch, trust001.patch, trust002.patch, trust003.patch, trust2.patch Original Estimate: 1m Remaining Estimate: 1m Because of critical computing environment ,we must test every node's TRUST status in the cluster (We can get the TRUST status by the API of OAT sever),So I add this feature into hadoop's schedule . By the TRUST check service ,node can get the TRUST status of itself, then through the heartbeat ,send the TRUST status to resource manager for scheduling. In the scheduling step,if the node's TRUST status is 'false', it will be abandoned until it's TRUST status turn to 'true'. ***The logic of this feature is similar to node's health checkservice. ***Only in branch-2.2.0 , not in trunk*** -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2142) Add one service to check the nodes' TRUST status
[ https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anders updated YARN-2142: - Description: Because of critical computing environment ,we must test every node's TRUST status in the cluster (We can get the TRUST status by the API of OAT sever),So I add this feature into hadoop's schedule . By the TRUST check service ,node can get the TRUST status of itself, then through the heartbeat ,send the TRUST status to resource manager for scheduling. In the scheduling step,if the node's TRUST status is 'false', it will be abandoned until it's TRUST status turn to 'true'. ***The logic of this feature is similar to node's health checkservice. Only in branch-2.2.0 , not in trunk was: Because of critical computing environment ,we must test every node's TRUST status in the cluster (We can get the TRUST status by the API of OAT sever),So I add this feature into hadoop's schedule . By the TRUST check service ,node can get the TRUST status of itself, then through the heartbeat ,send the TRUST status to resource manager for scheduling. In the scheduling step,if the node's TRUST status is 'false', it will be abandoned until it's TRUST status turn to 'true'. ***The logic of this feature is similar to node's health checkservice. Add one service to check the nodes' TRUST status - Key: YARN-2142 URL: https://issues.apache.org/jira/browse/YARN-2142 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager, scheduler, webapp Affects Versions: 2.2.0 Environment: OS:Ubuntu 13.04; JAVA:OpenJDK 7u51-2.4.4-0 Only in branch-2.2.0. Reporter: anders Priority: Minor Labels: patch Fix For: 2.2.0 Attachments: test.patch, trust.patch, trust.patch, trust.patch, trust001.patch, trust002.patch, trust003.patch, trust2.patch Original Estimate: 1m Remaining Estimate: 1m Because of critical computing environment ,we must test every node's TRUST status in the cluster (We can get the TRUST status by the API of OAT sever),So I add this feature into hadoop's schedule . By the TRUST check service ,node can get the TRUST status of itself, then through the heartbeat ,send the TRUST status to resource manager for scheduling. In the scheduling step,if the node's TRUST status is 'false', it will be abandoned until it's TRUST status turn to 'true'. ***The logic of this feature is similar to node's health checkservice. Only in branch-2.2.0 , not in trunk -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2142) Add one service to check the nodes' TRUST status
[ https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anders updated YARN-2142: - Description: Because of critical computing environment ,we must test every node's TRUST status in the cluster (We can get the TRUST status by the API of OAT sever),So I add this feature into hadoop's schedule . By the TRUST check service ,node can get the TRUST status of itself, then through the heartbeat ,send the TRUST status to resource manager for scheduling. In the scheduling step,if the node's TRUST status is 'false', it will be abandoned until it's TRUST status turn to 'true'. ***The logic of this feature is similar to node's health checkservice. --Only in branch-2.2.0 , not in trunk-- was: Because of critical computing environment ,we must test every node's TRUST status in the cluster (We can get the TRUST status by the API of OAT sever),So I add this feature into hadoop's schedule . By the TRUST check service ,node can get the TRUST status of itself, then through the heartbeat ,send the TRUST status to resource manager for scheduling. In the scheduling step,if the node's TRUST status is 'false', it will be abandoned until it's TRUST status turn to 'true'. ***The logic of this feature is similar to node's health checkservice. Only in branch-2.2.0 , not in trunk Add one service to check the nodes' TRUST status - Key: YARN-2142 URL: https://issues.apache.org/jira/browse/YARN-2142 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager, scheduler, webapp Affects Versions: 2.2.0 Environment: OS:Ubuntu 13.04; JAVA:OpenJDK 7u51-2.4.4-0 Only in branch-2.2.0. Reporter: anders Priority: Minor Labels: patch Fix For: 2.2.0 Attachments: test.patch, trust.patch, trust.patch, trust.patch, trust001.patch, trust002.patch, trust003.patch, trust2.patch Original Estimate: 1m Remaining Estimate: 1m Because of critical computing environment ,we must test every node's TRUST status in the cluster (We can get the TRUST status by the API of OAT sever),So I add this feature into hadoop's schedule . By the TRUST check service ,node can get the TRUST status of itself, then through the heartbeat ,send the TRUST status to resource manager for scheduling. In the scheduling step,if the node's TRUST status is 'false', it will be abandoned until it's TRUST status turn to 'true'. ***The logic of this feature is similar to node's health checkservice. --Only in branch-2.2.0 , not in trunk-- -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2142) Add one service to check the nodes' TRUST status
[ https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anders updated YARN-2142: - Target Version/s: (was: 2.2.0) Fix Version/s: (was: 2.2.0) Add one service to check the nodes' TRUST status - Key: YARN-2142 URL: https://issues.apache.org/jira/browse/YARN-2142 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager, scheduler, webapp Affects Versions: 2.2.0 Environment: OS:Ubuntu 13.04; JAVA:OpenJDK 7u51-2.4.4-0 Only in branch-2.2.0. Reporter: anders Priority: Minor Labels: patch Attachments: test.patch, trust.patch, trust.patch, trust.patch, trust001.patch, trust002.patch, trust003.patch, trust2.patch Original Estimate: 1m Remaining Estimate: 1m Because of critical computing environment ,we must test every node's TRUST status in the cluster (We can get the TRUST status by the API of OAT sever),So I add this feature into hadoop's schedule . By the TRUST check service ,node can get the TRUST status of itself, then through the heartbeat ,send the TRUST status to resource manager for scheduling. In the scheduling step,if the node's TRUST status is 'false', it will be abandoned until it's TRUST status turn to 'true'. ***The logic of this feature is similar to node's health checkservice. ***Only in branch-2.2.0 , not in trunk*** -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-570) Time strings are formatted in different timezones
[ https://issues.apache.org/jira/browse/YARN-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047459#comment-14047459 ] Akira AJISAKA commented on YARN-570: Thanks [~ozawa] for the comment. Attaching a patch to change {{renderHadoopDate}} to return almost the same format. JavaScript doesn't support the z pattern (e.g. PDT, JST, ...), so the patch outputs the Z pattern (e.g. -0800, +0900, ...) instead. In my environment, the date is rendered as follows: {code} Mon Jun 30 16:48:18 +0900 2014 // EEE MMM dd HH:mm:ss Z {code} I think it's better to render in Java instead of JavaScript to keep the format the same. Time strings are formatted in different timezones --- Key: YARN-570 URL: https://issues.apache.org/jira/browse/YARN-570 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.2.0 Reporter: Peng Zhang Assignee: Akira AJISAKA Attachments: MAPREDUCE-5141.patch, YARN-570.2.patch Time strings on different pages are displayed in different timezones. If a time is rendered by renderHadoopDate() in yarn.dt.plugins.js, it appears as Wed, 10 Apr 2013 08:29:56 GMT; if it is formatted by format() in yarn.util.Times, it appears as 10-Apr-2013 16:29:56. Same value, but different timezones. -- This message was sent by Atlassian JIRA (v6.2#6252)
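For reference, producing the numeric-offset format from Java is straightforward with SimpleDateFormat. The snippet below only illustrates the pattern discussed above and is not taken from the patch (note HH rather than hh, since the example time is 16:48).
{code}
import java.text.SimpleDateFormat;
import java.util.Date;

public class TimeFormatExample {
  public static void main(String[] args) {
    // "Z" renders the numeric offset (e.g. +0900) instead of a zone name (e.g. JST),
    // which is the closest format the JavaScript side can also produce.
    SimpleDateFormat fmt = new SimpleDateFormat("EEE MMM dd HH:mm:ss Z yyyy");
    System.out.println(fmt.format(new Date()));
    // Example output: Mon Jun 30 16:48:18 +0900 2014
  }
}
{code}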
[jira] [Updated] (YARN-570) Time strings are formatted in different timezones
[ https://issues.apache.org/jira/browse/YARN-570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-570: --- Attachment: YARN-570.3.patch Time strings are formatted in different timezones --- Key: YARN-570 URL: https://issues.apache.org/jira/browse/YARN-570 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.2.0 Reporter: Peng Zhang Assignee: Akira AJISAKA Attachments: MAPREDUCE-5141.patch, YARN-570.2.patch, YARN-570.3.patch Time strings on different pages are displayed in different timezones. If a time is rendered by renderHadoopDate() in yarn.dt.plugins.js, it appears as Wed, 10 Apr 2013 08:29:56 GMT; if it is formatted by format() in yarn.util.Times, it appears as 10-Apr-2013 16:29:56. Same value, but different timezones. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-570) Time strings are formatted in different timezones
[ https://issues.apache.org/jira/browse/YARN-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047473#comment-14047473 ] Hadoop QA commented on YARN-570: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12653110/YARN-570.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4141//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4141//console This message is automatically generated. Time strings are formatted in different timezones --- Key: YARN-570 URL: https://issues.apache.org/jira/browse/YARN-570 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.2.0 Reporter: Peng Zhang Assignee: Akira AJISAKA Attachments: MAPREDUCE-5141.patch, YARN-570.2.patch, YARN-570.3.patch Time strings on different pages are displayed in different timezones. If a time is rendered by renderHadoopDate() in yarn.dt.plugins.js, it appears as Wed, 10 Apr 2013 08:29:56 GMT; if it is formatted by format() in yarn.util.Times, it appears as 10-Apr-2013 16:29:56. Same value, but different timezones. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2181) Add preemption info to RM Web UI
[ https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2181: - Attachment: YARN-2181.patch Add preemption info to RM Web UI Key: YARN-2181 URL: https://issues.apache.org/jira/browse/YARN-2181 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, application page.png, queue page.png We need to add preemption info to the RM web pages so that administrators/users can better understand the preemption that happened on an app, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2181) Add preemption info to RM Web UI
[ https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047506#comment-14047506 ] Hadoop QA commented on YARN-2181: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12653118/YARN-2181.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4142//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4142//console This message is automatically generated. Add preemption info to RM Web UI Key: YARN-2181 URL: https://issues.apache.org/jira/browse/YARN-2181 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, application page.png, queue page.png We need add preemption info to RM web page to make administrator/user get more understanding about preemption happened on app, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047563#comment-14047563 ] Hudson commented on YARN-2052: -- FAILURE: Integrated in Hadoop-Yarn-trunk #599 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/599/]) YARN-2052. Embedded an epoch number in container id to ensure the uniqueness of container id after RM restarts. Contributed by Tsuyoshi OZAWA (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1606557) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/NullRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/Epoch.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/EpochPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerApplicationAttempt.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSSchedulerApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestMaxRunningAppsEnforcer.java ContainerId creation after work preserving restart is broken Key: YARN-2052 URL: https://issues.apache.org/jira/browse/YARN-2052 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA
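The change above embeds an epoch number, bumped on every ResourceManager restart and persisted in the RMStateStore, into newly issued container ids so they cannot collide with ids handed out before the restart. A toy illustration of the idea follows; it is not the actual classes or bit layout used by the patch.
{code}
// Toy illustration of the epoch idea behind YARN-2052 -- not the patch itself.
public class EpochContainerIdExample {
  // Assume the epoch is read from the state store and incremented on each RM start.
  static long containerIdWithEpoch(long epoch, long sequenceNumber) {
    // Keep the epoch in the high bits so ids issued by different RM generations
    // can never collide, even if the per-attempt sequence number restarts at 0.
    return (epoch << 40) | sequenceNumber;
  }

  public static void main(String[] args) {
    long beforeRestart = containerIdWithEpoch(0, 1);
    long afterRestart = containerIdWithEpoch(1, 1);
    System.out.println(beforeRestart != afterRestart); // true
  }
}
{code}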
[jira] [Created] (YARN-2232) ClientRMService doesn't allow delegation token owner to cancel their own token
Varun Vasudev created YARN-2232: --- Summary: ClientRMService doesn't allow delegation token owner to cancel their own token Key: YARN-2232 URL: https://issues.apache.org/jira/browse/YARN-2232 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2232.0.patch The ClientRMService doesn't allow delegation token owners to cancel their own tokens. The root cause is this piece of code from the cancelDelegationToken function - {noformat} String user = getRenewerForToken(token); ... private String getRenewerForToken(Token<RMDelegationTokenIdentifier> token) throws IOException { UserGroupInformation user = UserGroupInformation.getCurrentUser(); UserGroupInformation loginUser = UserGroupInformation.getLoginUser(); // we can always renew our own tokens return loginUser.getUserName().equals(user.getUserName()) ? token.decodeIdentifier().getRenewer().toString() : user.getShortUserName(); } {noformat} It ends up passing the user's short name to the cancelToken function, whereas AbstractDelegationTokenSecretManager::cancelToken expects the full user name. -- This message was sent by Atlassian JIRA (v6.2#6252)
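The mismatch is between the full owner name stored in the token identifier (the Kerberos principal in secure mode) and the short name returned by getShortUserName(). A self-contained illustration of why the comparison in cancelToken then fails (the principal is made up):
{code}
// Conceptual illustration of the mismatch described above (no Hadoop classes needed).
public class CancelTokenMismatch {
  public static void main(String[] args) {
    String tokenOwnerFullName = "alice@EXAMPLE.COM"; // stored in the token identifier in secure mode
    String cancellerShortName = "alice";             // what getRenewerForToken() currently passes

    // cancelToken compares the canceller against the token's owner/renewer,
    // so the short name never matches and the cancel request is rejected.
    System.out.println(tokenOwnerFullName.equals(cancellerShortName)); // false
  }
}
{code}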
[jira] [Updated] (YARN-2232) ClientRMService doesn't allow delegation token owner to cancel their own token
[ https://issues.apache.org/jira/browse/YARN-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2232: Attachment: apache-yarn-2232.0.patch Uploaded patch with fix. ClientRMService doesn't allow delegation token owner to cancel their own token -- Key: YARN-2232 URL: https://issues.apache.org/jira/browse/YARN-2232 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2232.0.patch The ClientRMSerivce doesn't allow delegation token owners to cancel their own tokens. The root cause is this piece of code from the cancelDelegationToken function - {noformat} String user = getRenewerForToken(token); ... private String getRenewerForToken(TokenRMDelegationTokenIdentifier token) throws IOException { UserGroupInformation user = UserGroupInformation.getCurrentUser(); UserGroupInformation loginUser = UserGroupInformation.getLoginUser(); // we can always renew our own tokens return loginUser.getUserName().equals(user.getUserName()) ? token.decodeIdentifier().getRenewer().toString() : user.getShortUserName(); } {noformat} It ends up passing the user short name to the cancelToken function whereas AbstractDelegationTokenSecretManager::cancelToken expects the full user name. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2232) ClientRMService doesn't allow delegation token owner to cancel their own token in secure mode
[ https://issues.apache.org/jira/browse/YARN-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2232: Summary: ClientRMService doesn't allow delegation token owner to cancel their own token in secure mode (was: ClientRMService doesn't allow delegation token owner to cancel their own token) ClientRMService doesn't allow delegation token owner to cancel their own token in secure mode - Key: YARN-2232 URL: https://issues.apache.org/jira/browse/YARN-2232 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2232.0.patch The ClientRMSerivce doesn't allow delegation token owners to cancel their own tokens. The root cause is this piece of code from the cancelDelegationToken function - {noformat} String user = getRenewerForToken(token); ... private String getRenewerForToken(TokenRMDelegationTokenIdentifier token) throws IOException { UserGroupInformation user = UserGroupInformation.getCurrentUser(); UserGroupInformation loginUser = UserGroupInformation.getLoginUser(); // we can always renew our own tokens return loginUser.getUserName().equals(user.getUserName()) ? token.decodeIdentifier().getRenewer().toString() : user.getShortUserName(); } {noformat} It ends up passing the user short name to the cancelToken function whereas AbstractDelegationTokenSecretManager::cancelToken expects the full user name. This bug occurs in secure mode and is not an issue with simple auth. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2232) ClientRMService doesn't allow delegation token owner to cancel their own token
[ https://issues.apache.org/jira/browse/YARN-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2232: Description: The ClientRMSerivce doesn't allow delegation token owners to cancel their own tokens. The root cause is this piece of code from the cancelDelegationToken function - {noformat} String user = getRenewerForToken(token); ... private String getRenewerForToken(TokenRMDelegationTokenIdentifier token) throws IOException { UserGroupInformation user = UserGroupInformation.getCurrentUser(); UserGroupInformation loginUser = UserGroupInformation.getLoginUser(); // we can always renew our own tokens return loginUser.getUserName().equals(user.getUserName()) ? token.decodeIdentifier().getRenewer().toString() : user.getShortUserName(); } {noformat} It ends up passing the user short name to the cancelToken function whereas AbstractDelegationTokenSecretManager::cancelToken expects the full user name. This bug occurs in secure mode and is not an issue with simple auth. was: The ClientRMSerivce doesn't allow delegation token owners to cancel their own tokens. The root cause is this piece of code from the cancelDelegationToken function - {noformat} String user = getRenewerForToken(token); ... private String getRenewerForToken(TokenRMDelegationTokenIdentifier token) throws IOException { UserGroupInformation user = UserGroupInformation.getCurrentUser(); UserGroupInformation loginUser = UserGroupInformation.getLoginUser(); // we can always renew our own tokens return loginUser.getUserName().equals(user.getUserName()) ? token.decodeIdentifier().getRenewer().toString() : user.getShortUserName(); } {noformat} It ends up passing the user short name to the cancelToken function whereas AbstractDelegationTokenSecretManager::cancelToken expects the full user name. ClientRMService doesn't allow delegation token owner to cancel their own token -- Key: YARN-2232 URL: https://issues.apache.org/jira/browse/YARN-2232 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2232.0.patch The ClientRMSerivce doesn't allow delegation token owners to cancel their own tokens. The root cause is this piece of code from the cancelDelegationToken function - {noformat} String user = getRenewerForToken(token); ... private String getRenewerForToken(TokenRMDelegationTokenIdentifier token) throws IOException { UserGroupInformation user = UserGroupInformation.getCurrentUser(); UserGroupInformation loginUser = UserGroupInformation.getLoginUser(); // we can always renew our own tokens return loginUser.getUserName().equals(user.getUserName()) ? token.decodeIdentifier().getRenewer().toString() : user.getShortUserName(); } {noformat} It ends up passing the user short name to the cancelToken function whereas AbstractDelegationTokenSecretManager::cancelToken expects the full user name. This bug occurs in secure mode and is not an issue with simple auth. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2233) Implement web services to create, renew and cancel delegation tokens
Varun Vasudev created YARN-2233: --- Summary: Implement web services to create, renew and cancel delegation tokens Key: YARN-2233 URL: https://issues.apache.org/jira/browse/YARN-2233 Project: Hadoop YARN Issue Type: Sub-task Reporter: Varun Vasudev Assignee: Varun Vasudev Implement functionality to create, renew and cancel delegation tokens. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2233) Implement web services to create, renew and cancel delegation tokens
[ https://issues.apache.org/jira/browse/YARN-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2233: Attachment: apache-yarn-2233.0.patch Uploaded patch. Implement web services to create, renew and cancel delegation tokens Key: YARN-2233 URL: https://issues.apache.org/jira/browse/YARN-2233 Project: Hadoop YARN Issue Type: Sub-task Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2233.0.patch Implement functionality to create, renew and cancel delegation tokens. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2233) Implement web services to create, renew and cancel delegation tokens
[ https://issues.apache.org/jira/browse/YARN-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047596#comment-14047596 ] Varun Vasudev commented on YARN-2233: - Adding blocker because one test assumes that owners can cancel their own delegation tokens. I'll update the patch if YARN-2232 is marked invalid. Implement web services to create, renew and cancel delegation tokens Key: YARN-2233 URL: https://issues.apache.org/jira/browse/YARN-2233 Project: Hadoop YARN Issue Type: Sub-task Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2233.0.patch Implement functionality to create, renew and cancel delegation tokens. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2232) ClientRMService doesn't allow delegation token owner to cancel their own token in secure mode
[ https://issues.apache.org/jira/browse/YARN-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047599#comment-14047599 ] Hadoop QA commented on YARN-2232: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12653139/apache-yarn-2232.0.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4143//console This message is automatically generated. ClientRMService doesn't allow delegation token owner to cancel their own token in secure mode - Key: YARN-2232 URL: https://issues.apache.org/jira/browse/YARN-2232 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2232.0.patch The ClientRMSerivce doesn't allow delegation token owners to cancel their own tokens. The root cause is this piece of code from the cancelDelegationToken function - {noformat} String user = getRenewerForToken(token); ... private String getRenewerForToken(TokenRMDelegationTokenIdentifier token) throws IOException { UserGroupInformation user = UserGroupInformation.getCurrentUser(); UserGroupInformation loginUser = UserGroupInformation.getLoginUser(); // we can always renew our own tokens return loginUser.getUserName().equals(user.getUserName()) ? token.decodeIdentifier().getRenewer().toString() : user.getShortUserName(); } {noformat} It ends up passing the user short name to the cancelToken function whereas AbstractDelegationTokenSecretManager::cancelToken expects the full user name. This bug occurs in secure mode and is not an issue with simple auth. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1342) Recover container tokens upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047598#comment-14047598 ] Devaraj K commented on YARN-1342: - Unfortunately this patch has also gone stale; could you rebase it? Thanks. Recover container tokens upon nodemanager restart - Key: YARN-1342 URL: https://issues.apache.org/jira/browse/YARN-1342 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1342.patch, YARN-1342v2.patch, YARN-1342v3-and-YARN-1987.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2232) ClientRMService doesn't allow delegation token owner to cancel their own token in secure mode
[ https://issues.apache.org/jira/browse/YARN-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2232: Attachment: apache-yarn-2232.1.patch Uploaded new patch removing unnecessary includes which caused compilation error. ClientRMService doesn't allow delegation token owner to cancel their own token in secure mode - Key: YARN-2232 URL: https://issues.apache.org/jira/browse/YARN-2232 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2232.0.patch, apache-yarn-2232.1.patch The ClientRMSerivce doesn't allow delegation token owners to cancel their own tokens. The root cause is this piece of code from the cancelDelegationToken function - {noformat} String user = getRenewerForToken(token); ... private String getRenewerForToken(TokenRMDelegationTokenIdentifier token) throws IOException { UserGroupInformation user = UserGroupInformation.getCurrentUser(); UserGroupInformation loginUser = UserGroupInformation.getLoginUser(); // we can always renew our own tokens return loginUser.getUserName().equals(user.getUserName()) ? token.decodeIdentifier().getRenewer().toString() : user.getShortUserName(); } {noformat} It ends up passing the user short name to the cancelToken function whereas AbstractDelegationTokenSecretManager::cancelToken expects the full user name. This bug occurs in secure mode and is not an issue with simple auth. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
[ https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-1408: -- Attachment: Yarn-1408.6.patch Thank you very much [~leftnoteasy] for the comments. bq. ResourceRequest stored in RMContainerImpl should include rack/any RR +1. Yes, RackLocal was missed, and while recreating it there will be problems as you mentioned (relaxLocality). bq. You can edit appSchedulingInfo.allocate to return a list of RRs I think we can have a new API in appSchedulingInfo to return the list of ResourceRequests (node-local, rack-local and any). I updated the patch as per the comments; kindly check and share your opinion. Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins -- Key: YARN-1408 URL: https://issues.apache.org/jira/browse/YARN-1408 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.0 Reporter: Sunil G Assignee: Sunil G Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.6.patch, Yarn-1408.patch Capacity preemption is enabled as follows: * yarn.resourcemanager.scheduler.monitor.enable=true * yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy Queues = a, b Capacity of Queue A = 80% Capacity of Queue B = 20% Step 1: Assign a big jobA to queue a which uses the full cluster capacity Step 2: Submit a jobB to queue b which would use less than 20% of the cluster capacity A jobA task which uses queue b's capacity is preempted and killed. This caused the problem below: 1. A new container was allocated for jobA in Queue A as per a node update from an NM. 2. This container was immediately preempted. The ACQUIRED at KILLED invalid state exception came when the next AM heartbeat reached the RM: ERROR org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ACQUIRED at KILLED This also caused the task to hit a 30-minute timeout, as the container was already killed by preemption: attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs -- This message was sent by Atlassian JIRA (v6.2#6252)
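A minimal sketch of the kind of helper being discussed (returning the node-local, rack-local and ANY ResourceRequests so a preempted container's requests can be recreated with the right relaxLocality settings) is below; the method name and the surrounding AppSchedulingInfo context are assumptions, not the attached patch.
{code}
// Hypothetical helper of the shape discussed above, inside an AppSchedulingInfo-like class.
// List/ArrayList and the org.apache.hadoop.yarn.api.records types are assumed to be imported.
public List<ResourceRequest> getResourceRequestsForRecovery(
    Priority priority, String host, String rack) {
  List<ResourceRequest> requests = new ArrayList<ResourceRequest>();
  for (String resourceName : new String[] { host, rack, ResourceRequest.ANY }) {
    ResourceRequest rr = getResourceRequest(priority, resourceName); // lookup by resource name (assumed)
    if (rr != null) {
      requests.add(rr);
    }
  }
  return requests;
}
{code}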
[jira] [Commented] (YARN-2232) ClientRMService doesn't allow delegation token owner to cancel their own token in secure mode
[ https://issues.apache.org/jira/browse/YARN-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047655#comment-14047655 ] Hadoop QA commented on YARN-2232: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12653148/apache-yarn-2232.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4144//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4144//console This message is automatically generated. ClientRMService doesn't allow delegation token owner to cancel their own token in secure mode - Key: YARN-2232 URL: https://issues.apache.org/jira/browse/YARN-2232 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2232.0.patch, apache-yarn-2232.1.patch The ClientRMSerivce doesn't allow delegation token owners to cancel their own tokens. The root cause is this piece of code from the cancelDelegationToken function - {noformat} String user = getRenewerForToken(token); ... private String getRenewerForToken(TokenRMDelegationTokenIdentifier token) throws IOException { UserGroupInformation user = UserGroupInformation.getCurrentUser(); UserGroupInformation loginUser = UserGroupInformation.getLoginUser(); // we can always renew our own tokens return loginUser.getUserName().equals(user.getUserName()) ? token.decodeIdentifier().getRenewer().toString() : user.getShortUserName(); } {noformat} It ends up passing the user short name to the cancelToken function whereas AbstractDelegationTokenSecretManager::cancelToken expects the full user name. This bug occurs in secure mode and is not an issue with simple auth. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2232) ClientRMService doesn't allow delegation token owner to cancel their own token in secure mode
[ https://issues.apache.org/jira/browse/YARN-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047656#comment-14047656 ] Varun Vasudev commented on YARN-2232: - Test failure is unrelated. ClientRMService doesn't allow delegation token owner to cancel their own token in secure mode - Key: YARN-2232 URL: https://issues.apache.org/jira/browse/YARN-2232 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2232.0.patch, apache-yarn-2232.1.patch The ClientRMSerivce doesn't allow delegation token owners to cancel their own tokens. The root cause is this piece of code from the cancelDelegationToken function - {noformat} String user = getRenewerForToken(token); ... private String getRenewerForToken(TokenRMDelegationTokenIdentifier token) throws IOException { UserGroupInformation user = UserGroupInformation.getCurrentUser(); UserGroupInformation loginUser = UserGroupInformation.getLoginUser(); // we can always renew our own tokens return loginUser.getUserName().equals(user.getUserName()) ? token.decodeIdentifier().getRenewer().toString() : user.getShortUserName(); } {noformat} It ends up passing the user short name to the cancelToken function whereas AbstractDelegationTokenSecretManager::cancelToken expects the full user name. This bug occurs in secure mode and is not an issue with simple auth. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047665#comment-14047665 ] Hudson commented on YARN-2052: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1817 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1817/]) YARN-2052. Embedded an epoch number in container id to ensure the uniqueness of container id after RM restarts. Contributed by Tsuyoshi OZAWA (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1606557) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/NullRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/Epoch.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/EpochPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerApplicationAttempt.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSSchedulerApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestMaxRunningAppsEnforcer.java ContainerId creation after work preserving restart is broken Key: YARN-2052 URL: https://issues.apache.org/jira/browse/YARN-2052 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi
[jira] [Updated] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1366: - Attachment: YARN-1366.8.patch I updated the patch addressing comments. bq. isApplicationMasterRegistered is actually not an argument, may be throw ApplicationMasterNotRegsiteredException in this case ? DONE bq. pom.xml format: use spaces instead of tabs DONE bq. Not related to this jira. Current ApplicationMasterService does not allow multiple registers. Application may want to update its tracking url etc. Should we make AMS accept multiple registers ? DONE bq. please add some comments to pendingRelease and release variables DONE AM should implement Resync with the ApplicationMasterService instead of shutting down - Key: YARN-1366 URL: https://issues.apache.org/jira/browse/YARN-1366 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Rohith Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, YARN-1366.4.patch, YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.7.patch, YARN-1366.8.patch, YARN-1366.patch, YARN-1366.prototype.patch, YARN-1366.prototype.patch The ApplicationMasterService currently sends a resync response to which the AM responds by shutting down. The AM behavior is expected to change to calling resyncing with the RM. Resync means resetting the allocate RPC sequence number to 0 and the AM should send its entire outstanding request to the RM. Note that if the AM is making its first allocate call to the RM then things should proceed like normal without needing a resync. The RM will return all containers that have completed since the RM last synced with the AM. Some container completions may be reported more than once. -- This message was sent by Atlassian JIRA (v6.2#6252)
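For context on the first review item above, here is a minimal sketch of what rejecting an allocate call from an unregistered AM could look like on the ApplicationMasterService side. The helper methods and the message text are assumptions for illustration, not the contents of YARN-1366.8.patch.
{code}
// Sketch only: getCallerAttemptId() and isRegistered() are hypothetical helpers.
public AllocateResponse allocate(AllocateRequest request)
    throws YarnException, IOException {
  ApplicationAttemptId appAttemptId = getCallerAttemptId();   // hypothetical
  if (!isRegistered(appAttemptId)) {                          // hypothetical
    // Instead of exposing an isApplicationMasterRegistered flag in the response,
    // fail fast so the AM knows it must (re)register before allocating.
    throw new ApplicationMasterNotRegisteredException(
        "Application Master is trying to allocate before registering for "
            + appAttemptId);
  }
  // ... existing allocate handling continues here ...
}
{code}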
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047686#comment-14047686 ] Hudson commented on YARN-2052: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1790 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1790/]) YARN-2052. Embedded an epoch number in container id to ensure the uniqueness of container id after RM restarts. Contributed by Tsuyoshi OZAWA (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1606557) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/NullRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/Epoch.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/EpochPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerApplicationAttempt.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSSchedulerApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestMaxRunningAppsEnforcer.java ContainerId creation after work preserving restart is broken Key: YARN-2052 URL: https://issues.apache.org/jira/browse/YARN-2052 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA
[jira] [Created] (YARN-2234) Incorrect description in RM audit logs while refreshing Admin ACL
Varun Saxena created YARN-2234: -- Summary: Incorrect description in RM audit logs while refreshing Admin ACL Key: YARN-2234 URL: https://issues.apache.org/jira/browse/YARN-2234 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Varun Saxena In AdminService#refreshAdminAcls (AdminService.java:446), the failure RM audit log generated when the RM is not active has the following description: 'ResourceManager is not active. Can not refresh user-groups.' This should instead be 'ResourceManager is not active. Can not refresh admin ACLs.' -- This message was sent by Atlassian JIRA (v6.2#6252)
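A minimal sketch of the one-line change being asked for; the surrounding variables and the argument order of RMAuditLogger.logFailure are assumptions for illustration and may differ from the actual AdminService code.
{code}
// Sketch only: 'user' and the logFailure argument order are assumptions; the point
// is the corrected description string passed to the failure audit log.
RMAuditLogger.logFailure(user.getShortUserName(), "refreshAdminAcls",
    "", "AdminService",
    "ResourceManager is not active. Can not refresh admin ACLs.");
{code}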
[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
[ https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047694#comment-14047694 ] Hadoop QA commented on YARN-1408: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12653155/Yarn-1408.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4145//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4145//console This message is automatically generated. Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins -- Key: YARN-1408 URL: https://issues.apache.org/jira/browse/YARN-1408 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.0 Reporter: Sunil G Assignee: Sunil G Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.6.patch, Yarn-1408.patch Capacity preemption is enabled as follows. * yarn.resourcemanager.scheduler.monitor.enable= true , * yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy Queue = a,b Capacity of Queue A = 80% Capacity of Queue B = 20% Step 1: Assign a big jobA on queue a which uses full cluster capacity Step 2: Submitted a jobB to queue b which would use less than 20% of cluster capacity JobA task which uses queue b capcity is been preempted and killed. This caused below problem: 1. New Container has got allocated for jobA in Queue A as per node update from an NM. 2. This container has been preempted immediately as per preemption. Here ACQUIRED at KILLED Invalid State exception came when the next AM heartbeat reached RM. ERROR org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ACQUIRED at KILLED This also caused the Task to go for a timeout for 30minutes as this Container was already killed by preemption. attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2235) No success Audit log is present when Refresh Service ACL operation is performed using rmadmin
Varun Saxena created YARN-2235: -- Summary: No success Audit log is present when Refresh Service ACL operation is performed using rmadmin Key: YARN-2235 URL: https://issues.apache.org/jira/browse/YARN-2235 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.0 Reporter: Varun Saxena No success audit log is written when the Refresh Service ACL operation is performed using rmadmin. In the AdminService#refreshServiceAcls method, only the failure audit log is present. -- This message was sent by Atlassian JIRA (v6.2#6252)
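A minimal sketch of the kind of addition being requested; the operation name, the argument order of RMAuditLogger.logSuccess, and the internal refresh call shown here are assumptions for illustration.
{code}
// Sketch only: write a success audit entry once the service ACLs have been reloaded.
refreshServiceAcls(conf, policyProvider);          // existing refresh logic (assumed)
RMAuditLogger.logSuccess(user.getShortUserName(),
    "refreshServiceAcls", "AdminService");
{code}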
[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047718#comment-14047718 ] Hadoop QA commented on YARN-1366: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12653162/YARN-1366.8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.TestResourceTrackerOnHA {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4146//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4146//console This message is automatically generated. AM should implement Resync with the ApplicationMasterService instead of shutting down - Key: YARN-1366 URL: https://issues.apache.org/jira/browse/YARN-1366 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Rohith Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, YARN-1366.4.patch, YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.7.patch, YARN-1366.8.patch, YARN-1366.patch, YARN-1366.prototype.patch, YARN-1366.prototype.patch The ApplicationMasterService currently sends a resync response to which the AM responds by shutting down. The AM behavior is expected to change to calling resyncing with the RM. Resync means resetting the allocate RPC sequence number to 0 and the AM should send its entire outstanding request to the RM. Note that if the AM is making its first allocate call to the RM then things should proceed like normal without needing a resync. The RM will return all containers that have completed since the RM last synced with the AM. Some container completions may be reported more than once. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
[ https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047755#comment-14047755 ] Wangda Tan commented on YARN-1408: -- Hi [~sunilg], Thanks for updating the patch, overall approach LGTM, some comments, 1) bq. I think we can have a new api in appSchedulingInfo to return list of ResourceRequests (node local, rack local and any). I would suggest to modify existing appSchedulingInfo.allocate to return list of RRs. There existed outstanding resource decrement logic in allocate(), we can simply add decremented RR to a list and return them. It looks more like by-product of ASI.allocate to me. 2) {code} if (type.equals(NodeType.NODE_LOCAL)) { list.add(nodeRequests.get(hostName)); } {code} It's better to clone RR instead of add ref to list. It works, but it's better to set a #container correctly and prevent RR changed in ASI in the future. 3) TestCapacityScheduler: It's good to have a test for FairScheduler here too. I think we can put the test to org.apache.hadoop.yarn.server.resourcemanager.scheduler, and make it parameterized for Fair/Capacity/FIFO. Two minor comment for TestCapacityScheduler. 3.1 {code} for (ResourceRequest request : requests) { // Skip the OffRack and RackLocal resource requests. if (request.getResourceName().equals(node.getRackName()) || request.getResourceName().equals(ResourceRequest.ANY)) { Assert.assertEquals(request.getNumContainers(), 1); continue; } // Resource request must have added back in RM after preempt event handling. Assert.assertNotNull(app.getResourceRequest(request.getPriority(), request.getResourceName())); } {code} We can make it simpler to, {code} for (ResourceRequest request : requests) { // Resource request must have added back in RM after preempt event handling. Assert.assertEquals(1, app.getResourceRequest(request.getPriority(), request.getResourceName()).getNumContainers()); } {code} Because we added them back, there's no difference between node/rack/any. 3.2 {code} // allocate container ListContainer containers = am1.allocate(new ArrayListResourceRequest(), new ArrayListContainerId()).getAllocatedContainers(); {code} Should we wait for containers allocated in a while loop? This works now because previous we called rm1.waitForState(nm1, ...). But it's better to wait container allocated explictly Thanks, Wangda Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins -- Key: YARN-1408 URL: https://issues.apache.org/jira/browse/YARN-1408 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.0 Reporter: Sunil G Assignee: Sunil G Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.6.patch, Yarn-1408.patch Capacity preemption is enabled as follows. * yarn.resourcemanager.scheduler.monitor.enable= true , * yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy Queue = a,b Capacity of Queue A = 80% Capacity of Queue B = 20% Step 1: Assign a big jobA on queue a which uses full cluster capacity Step 2: Submitted a jobB to queue b which would use less than 20% of cluster capacity JobA task which uses queue b capcity is been preempted and killed. This caused below problem: 1. New Container has got allocated for jobA in Queue A as per node update from an NM. 2. This container has been preempted immediately as per preemption. 
Here ACQUIRED at KILLED Invalid State exception came when the next AM heartbeat reached RM. ERROR org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ACQUIRED at KILLED This also caused the Task to go for a timeout for 30minutes as this Container was already killed by preemption. attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs -- This message was sent by Atlassian JIRA (v6.2#6252)
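Wangda's third point above suggests parameterizing the preemption test across schedulers. A rough JUnit skeleton of that shape is below; the class name, package placement, and assertions are assumptions sketching the idea, not the actual test in any attached patch.
{code}
import java.util.Arrays;
import java.util.Collection;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceScheduler;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler;

@RunWith(Parameterized.class)
public class TestPreemptedResourceRequests {   // hypothetical test class name

  @Parameters
  public static Collection<Object[]> schedulers() {
    return Arrays.asList(new Object[][] {
        { CapacityScheduler.class }, { FairScheduler.class }, { FifoScheduler.class } });
  }

  private final Class<? extends ResourceScheduler> schedulerClass;

  public TestPreemptedResourceRequests(Class<? extends ResourceScheduler> schedulerClass) {
    this.schedulerClass = schedulerClass;
  }

  @Test
  public void testRequestsAddedBackAfterPreemption() throws Exception {
    YarnConfiguration conf = new YarnConfiguration();
    conf.setClass(YarnConfiguration.RM_SCHEDULER, schedulerClass, ResourceScheduler.class);
    // ... start a MockRM with conf, run an app, preempt one of its containers, then
    // assert the preempted container's requests were added back, e.g.
    // Assert.assertEquals(1, app.getResourceRequest(priority, name).getNumContainers());
  }
}
{code}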
[jira] [Commented] (YARN-2216) TestRMApplicationHistoryWriter sometimes fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047839#comment-14047839 ] Xuan Gong commented on YARN-2216: - +1 LGTM TestRMApplicationHistoryWriter sometimes fails in trunk --- Key: YARN-2216 URL: https://issues.apache.org/jira/browse/YARN-2216 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Zhijie Shen Priority: Minor Attachments: TestRMApplicationHistoryWriter.stack, YARN-2216.1.patch From https://builds.apache.org/job/Hadoop-Yarn-trunk/595/ : {code} testRMWritingMassiveHistory(org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter) Time elapsed: 33.469 sec FAILURE! java.lang.AssertionError: expected:1 but was:7156 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:430) at org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:391) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2216) TestRMApplicationHistoryWriter sometimes fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047840#comment-14047840 ] Xuan Gong commented on YARN-2216: - Committed to trunk and branch-2. Thanks Zhijie ! TestRMApplicationHistoryWriter sometimes fails in trunk --- Key: YARN-2216 URL: https://issues.apache.org/jira/browse/YARN-2216 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Zhijie Shen Priority: Minor Attachments: TestRMApplicationHistoryWriter.stack, YARN-2216.1.patch From https://builds.apache.org/job/Hadoop-Yarn-trunk/595/ : {code} testRMWritingMassiveHistory(org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter) Time elapsed: 33.469 sec FAILURE! java.lang.AssertionError: expected:1 but was:7156 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:430) at org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:391) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047910#comment-14047910 ] Devaraj K commented on YARN-1341: - Sorry for coming late here. +1 for limiting the implementation/discussion to the JIRA title and handling the other cases in the respective JIRAs. In addition to option 1), I'd consider bringing the NM down if it fails to store RM keys a certain (configurable) number of times consecutively. We could also make that behavior itself (i.e. whether or not to tear down the NM) configurable, letting users choose whether the NM goes down on RM key state-store failures. Similarly, for Container/Application state-store failures, the NM can mark that Container/Application as failed and report it to the RM. These can be discussed in more detail in the corresponding JIRAs YARN-1337 and YARN-1354. However, for all these NM state-store operations, we could consider retrying before throwing the IOException. Thoughts? Recover NMTokens upon nodemanager restart - Key: YARN-1341 URL: https://issues.apache.org/jira/browse/YARN-1341 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1341.patch, YARN-1341v2.patch, YARN-1341v3.patch, YARN-1341v4-and-YARN-1987.patch, YARN-1341v5.patch, YARN-1341v6.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
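On the last point (retrying NM state-store operations before surfacing the IOException), a minimal sketch of what such a retry helper could look like; the interface, retry count, and sleep interval are assumptions, not part of any attached patch.
{code}
import java.io.IOException;

public final class StateStoreRetry {   // hypothetical helper, for illustration only

  /** A single state-store write that may fail with an IOException. */
  public interface Op {
    void run() throws IOException;
  }

  /** Retry the operation a fixed number of times before giving up. */
  public static void runWithRetries(Op op, int maxAttempts, long sleepMs)
      throws IOException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        op.run();
        return;
      } catch (IOException e) {
        last = e;
      }
      if (attempt < maxAttempts) {
        try {
          Thread.sleep(sleepMs);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          break;
        }
      }
    }
    throw last;   // only surfaced after all retries are exhausted
  }
}
{code}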
[jira] [Created] (YARN-2236) Shared Cache uploader service on the Node Manager
Chris Trezzo created YARN-2236: -- Summary: Shared Cache uploader service on the Node Manager Key: YARN-2236 URL: https://issues.apache.org/jira/browse/YARN-2236 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Implement the shared cache uploader service on the node manager. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2236) Shared Cache uploader service on the Node Manager
[ https://issues.apache.org/jira/browse/YARN-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-2236: --- Attachment: YARN-2236-trunk-v1.patch Attached is a v1 patch on trunk+YARN-2179,YARN-2180,YARN-2183,YARN-2186,YARN-2188,YARN-2189,YARN-2203. Shared Cache uploader service on the Node Manager - Key: YARN-2236 URL: https://issues.apache.org/jira/browse/YARN-2236 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2236-trunk-v1.patch Implement the shared cache uploader service on the node manager. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2217) Shared cache client side changes
[ https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-2217: --- Attachment: YARN-2217-trunk-v2.patch Attached is a re-based v2 patch on trunk+YARN-2179,YARN-2180,YARN-2183,YARN-2186,YARN-2188,YARN-2189,YARN-2203,YARN-2236. Shared cache client side changes Key: YARN-2217 URL: https://issues.apache.org/jira/browse/YARN-2217 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch Implement the client side changes for the shared cache. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2022: -- Attachment: YARN-2022.9.patch Thank you [~vinodkv] for the comments. bq.The other option that we may do is to skip the config completely and hard-code the skipping. Yes. I also feel we can hardcode for now. bq.RMContainer usually is a read only interface. setMasterContainer() doesn't belong here, please move it to RMContainerImpl. I moved setMasterContainer to RMContainerImpl. In this case, if I need to invoke setMasterContainer then I may need to raise an Event to RMContainerImpl (downcast may not be good). Please suggest your opinion. I also fixed other comments. Kindly review the updated patch. Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, YARN-2022.7.patch, YARN-2022.8.patch, YARN-2022.9.patch, Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which has taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Jobs J3 will get killed including its AM. It is better if AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when cluster is free, maps can be allocated to these Jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2147) client lacks delegation token exception details when application submit fails
[ https://issues.apache.org/jira/browse/YARN-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047946#comment-14047946 ] Daryn Sharp commented on YARN-2147: --- Code looks fine. Currently the test verifies the stringified token is in the exception's message. However since the mock is throwing an exception explicitly with the stringified token, we don't know if the code change is actually catching and adding the token. The mock should throw a generic string of say, boom. Then check the caught exception against something like Failed to renew token: token: boom. client lacks delegation token exception details when application submit fails - Key: YARN-2147 URL: https://issues.apache.org/jira/browse/YARN-2147 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Jason Lowe Assignee: Chen He Priority: Minor Attachments: YARN-2147-v2.patch, YARN-2147-v3.patch, YARN-2147-v4.patch, YARN-2147.patch When an client submits an application and the delegation token process fails the client can lack critical details needed to understand the nature of the error. Only the message of the error exception is conveyed to the client, which sometimes isn't enough to debug. -- This message was sent by Atlassian JIRA (v6.2#6252)
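A small sketch of the test shape Daryn describes; the helper that wires the failing renewal into the mock is hypothetical (imports and the enclosing test class are omitted), and the expected message text is taken from the comment above.
{code}
// Sketch only: submitAppWithFailingTokenRenewal(...) is a hypothetical helper that
// makes the mocked delegation-token renewer throw the given exception during submit.
@Test
public void testSubmitFailureExposesTokenDetails() throws Exception {
  try {
    submitAppWithFailingTokenRenewal(new IOException("boom"));
    Assert.fail("Application submission should have failed");
  } catch (Exception e) {
    // The client-visible message should now contain the token details added by the fix.
    Assert.assertTrue(e.getMessage().contains("Failed to renew token: token: boom"));
  }
}
{code}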
[jira] [Updated] (YARN-2190) Provide a Windows container executor that can limit memory and CPU
[ https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chuan Liu updated YARN-2190: Attachment: YARN-2190.1.patch Here is a new patch that addressed all the comments above. Provide a Windows container executor that can limit memory and CPU -- Key: YARN-2190 URL: https://issues.apache.org/jira/browse/YARN-2190 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Reporter: Chuan Liu Assignee: Chuan Liu Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch Yarn default container executor on Windows does not set the resource limit on the containers currently. The memory limit is enforced by a separate monitoring thread. The container implementation on Windows uses Job Object right now. The latest Windows (8 or later) API allows CPU and memory limits on the job objects. We want to create a Windows container executor that sets the limits on job objects thus provides resource enforcement at OS level. http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047962#comment-14047962 ] Jian He commented on YARN-1366: --- Thanks for updating, some more comments: - “blacklistRemovals.addAll(blacklistToRemove);”, we don't need to add this in isResyncCommand check? as RM after restart will just forget all previously blacklisted nodes. - below code needs synchronize ? {code} for (MapString, TreeMapResource, ResourceRequestInfo rr : remoteRequestsTable .values()) { for (MapResource, ResourceRequestInfo capabalities : rr.values()) { for (ResourceRequestInfo request : capabalities.values()) { addResourceRequestToAsk(request.remoteRequest); } } } {code} - “isApplicationMasterRegistered = false;” not needed in allocate and unregisterApplicationMaster. - Instead of adding a new core-site.xml file, we can just set the config in the test code conf object. AM should implement Resync with the ApplicationMasterService instead of shutting down - Key: YARN-1366 URL: https://issues.apache.org/jira/browse/YARN-1366 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Rohith Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, YARN-1366.4.patch, YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.7.patch, YARN-1366.8.patch, YARN-1366.patch, YARN-1366.prototype.patch, YARN-1366.prototype.patch The ApplicationMasterService currently sends a resync response to which the AM responds by shutting down. The AM behavior is expected to change to calling resyncing with the RM. Resync means resetting the allocate RPC sequence number to 0 and the AM should send its entire outstanding request to the RM. Note that if the AM is making its first allocate call to the RM then things should proceed like normal without needing a resync. The RM will return all containers that have completed since the RM last synced with the AM. Some container completions may be reported more than once. -- This message was sent by Atlassian JIRA (v6.2#6252)
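On the synchronization question above, one minimal way to guard the quoted iteration is shown below; it assumes the client's own monitor protects remoteRequestsTable elsewhere in AMRMClientImpl, which is an assumption here, not a statement about the patch.
{code}
// Sketch only: uses the client's own monitor; the actual patch may pick a different lock.
synchronized (this) {
  for (Map<String, TreeMap<Resource, ResourceRequestInfo>> rr :
      remoteRequestsTable.values()) {
    for (Map<Resource, ResourceRequestInfo> capabilities : rr.values()) {
      for (ResourceRequestInfo request : capabilities.values()) {
        addResourceRequestToAsk(request.remoteRequest);
      }
    }
  }
}
{code}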
[jira] [Updated] (YARN-2190) Provide a Windows container executor that can limit memory and CPU
[ https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chuan Liu updated YARN-2190: Attachment: YARN-2190.2.patch Slightly update with some indentation fixes. Provide a Windows container executor that can limit memory and CPU -- Key: YARN-2190 URL: https://issues.apache.org/jira/browse/YARN-2190 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Reporter: Chuan Liu Assignee: Chuan Liu Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, YARN-2190.2.patch Yarn default container executor on Windows does not set the resource limit on the containers currently. The memory limit is enforced by a separate monitoring thread. The container implementation on Windows uses Job Object right now. The latest Windows (8 or later) API allows CPU and memory limits on the job objects. We want to create a Windows container executor that sets the limits on job objects thus provides resource enforcement at OS level. http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048010#comment-14048010 ] Vinod Kumar Vavilapalli commented on YARN-2022: --- bq. I moved setMasterContainer to RMContainerImpl. In this case, if I need to invoke setMasterContainer then I may need to raise an Event to RMContainerImpl (downcast may not be good). Please suggest your opinion. It's okay to do a cast. The main focus of the reader interfaces is to hide readers from the implementations and not accidentally invoke methods that mutate state (as opposed to enabling alternative implementations). Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, YARN-2022.7.patch, YARN-2022.8.patch, YARN-2022.9.patch, Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which has taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Jobs J3 will get killed including its AM. It is better if AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when cluster is free, maps can be allocated to these Jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
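A minimal sketch of the downcast pattern being endorsed above; where this call lives and the exact setMasterContainer signature are assumptions for illustration.
{code}
// Sketch only: RMContainer stays a read-only interface; the scheduler-side code that
// needs to mutate state casts to the implementation. The boolean argument is assumed.
RMContainer rmContainer = scheduler.getRMContainer(containerId);   // hypothetical lookup
if (rmContainer instanceof RMContainerImpl) {
  ((RMContainerImpl) rmContainer).setMasterContainer(true);
}
{code}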
[jira] [Updated] (YARN-2216) TestRMApplicationHistoryWriter sometimes fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2216: -- Fix Version/s: 2.5.0 TestRMApplicationHistoryWriter sometimes fails in trunk --- Key: YARN-2216 URL: https://issues.apache.org/jira/browse/YARN-2216 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Zhijie Shen Priority: Minor Fix For: 2.5.0 Attachments: TestRMApplicationHistoryWriter.stack, YARN-2216.1.patch From https://builds.apache.org/job/Hadoop-Yarn-trunk/595/ : {code} testRMWritingMassiveHistory(org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter) Time elapsed: 33.469 sec FAILURE! java.lang.AssertionError: expected:1 but was:7156 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:430) at org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:391) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2190) Provide a Windows container executor that can limit memory and CPU
[ https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048016#comment-14048016 ] Hadoop QA commented on YARN-2190: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12653201/YARN-2190.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4147//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4147//console This message is automatically generated. Provide a Windows container executor that can limit memory and CPU -- Key: YARN-2190 URL: https://issues.apache.org/jira/browse/YARN-2190 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Reporter: Chuan Liu Assignee: Chuan Liu Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, YARN-2190.2.patch Yarn default container executor on Windows does not set the resource limit on the containers currently. The memory limit is enforced by a separate monitoring thread. The container implementation on Windows uses Job Object right now. The latest Windows (8 or later) API allows CPU and memory limits on the job objects. We want to create a Windows container executor that sets the limits on job objects thus provides resource enforcement at OS level. http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2190) Provide a Windows container executor that can limit memory and CPU
[ https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048051#comment-14048051 ] Hadoop QA commented on YARN-2190: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12653206/YARN-2190.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.ha.TestZKFailoverControllerStress {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4148//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4148//console This message is automatically generated. Provide a Windows container executor that can limit memory and CPU -- Key: YARN-2190 URL: https://issues.apache.org/jira/browse/YARN-2190 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Reporter: Chuan Liu Assignee: Chuan Liu Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, YARN-2190.2.patch Yarn default container executor on Windows does not set the resource limit on the containers currently. The memory limit is enforced by a separate monitoring thread. The container implementation on Windows uses Job Object right now. The latest Windows (8 or later) API allows CPU and memory limits on the job objects. We want to create a Windows container executor that sets the limits on job objects thus provides resource enforcement at OS level. http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2022: -- Attachment: (was: YARN-2022.9.patch) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, YARN-2022.7.patch, YARN-2022.8.patch, Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which has taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Jobs J3 will get killed including its AM. It is better if AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when cluster is free, maps can be allocated to these Jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048079#comment-14048079 ] Jian He commented on YARN-1366: --- - we may check pendingRelease isEmpty as well to avoid unnecessary loops. {code} if (!allocateResponse.getCompletedContainersStatuses().isEmpty()) { removePendingReleaseRequests(allocateResponse .getCompletedContainersStatuses()); } {code} AM should implement Resync with the ApplicationMasterService instead of shutting down - Key: YARN-1366 URL: https://issues.apache.org/jira/browse/YARN-1366 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Rohith Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, YARN-1366.4.patch, YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.7.patch, YARN-1366.8.patch, YARN-1366.patch, YARN-1366.prototype.patch, YARN-1366.prototype.patch The ApplicationMasterService currently sends a resync response to which the AM responds by shutting down. The AM behavior is expected to change to calling resyncing with the RM. Resync means resetting the allocate RPC sequence number to 0 and the AM should send its entire outstanding request to the RM. Note that if the AM is making its first allocate call to the RM then things should proceed like normal without needing a resync. The RM will return all containers that have completed since the RM last synced with the AM. Some container completions may be reported more than once. -- This message was sent by Atlassian JIRA (v6.2#6252)
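The extra guard being suggested would look roughly like this, assuming pendingRelease is the collection of released-but-unacknowledged container ids kept by the client.
{code}
// Sketch only: skip the cleanup loop entirely when there is nothing pending.
if (!pendingRelease.isEmpty()
    && !allocateResponse.getCompletedContainersStatuses().isEmpty()) {
  removePendingReleaseRequests(allocateResponse.getCompletedContainersStatuses());
}
{code}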
[jira] [Updated] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2022: -- Attachment: YARN-2022.9.patch Thank you [~vinodkv]. I updated the patch as per the comment. Please review. Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, YARN-2022.7.patch, YARN-2022.8.patch, YARN-2022.9.patch, Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which has taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Jobs J3 will get killed including its AM. It is better if AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when cluster is free, maps can be allocated to these Jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2147) client lacks delegation token exception details when application submit fails
[ https://issues.apache.org/jira/browse/YARN-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated YARN-2147: -- Attachment: YARN-2147-v5.patch Thank you [~daryn]. I updated patch as you suggested. client lacks delegation token exception details when application submit fails - Key: YARN-2147 URL: https://issues.apache.org/jira/browse/YARN-2147 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Jason Lowe Assignee: Chen He Priority: Minor Attachments: YARN-2147-v2.patch, YARN-2147-v3.patch, YARN-2147-v4.patch, YARN-2147-v5.patch, YARN-2147.patch When an client submits an application and the delegation token process fails the client can lack critical details needed to understand the nature of the error. Only the message of the error exception is conveyed to the client, which sometimes isn't enough to debug. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2147) client lacks delegation token exception details when application submit fails
[ https://issues.apache.org/jira/browse/YARN-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048118#comment-14048118 ] Varun Vasudev commented on YARN-2147: - The patch looks mostly good. Agree with [~daryn] on testing the message. Question about the timeout - it doesn't look like the test needs a timeout, any particular reason why you added it? client lacks delegation token exception details when application submit fails - Key: YARN-2147 URL: https://issues.apache.org/jira/browse/YARN-2147 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Jason Lowe Assignee: Chen He Priority: Minor Attachments: YARN-2147-v2.patch, YARN-2147-v3.patch, YARN-2147-v4.patch, YARN-2147-v5.patch, YARN-2147.patch When an client submits an application and the delegation token process fails the client can lack critical details needed to understand the nature of the error. Only the message of the error exception is conveyed to the client, which sometimes isn't enough to debug. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2229) Making ContainerId long type
[ https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048129#comment-14048129 ] Tsuyoshi OZAWA commented on YARN-2229: -- I took some time to think about the new format. IMHO, if we need to assure backward compatibility, we have to keep the current format and handle overflow. The current idea is as follows: * ContainerId will carry the epoch as a 64-bit field based on the value stored in ZKRMStateStore. * {{ContainerId#compareTo}} and {{ContainerId#equals}} use the epoch field to handle the overflow. * {{ContainerId#toString}} will show the 64-bit epoch as a suffix of the current container id. {{ConverterUtils#toContainerId}} will be updated to parse the epoch. The 64 bits are used like this: {code} |54bits|10bits| |0|0| // initial value |0|1023| // before overflow |1|1024==0| // overflowed. toString shows only the lower 10 bits for backward compatibility. |1|1| |2|1024==0| {code} The old {{ConverterUtil#toContainerId}} can still parse the new {{ContainerId#toString}} format. One problem is that the result of the old {{ConverterUtil#toContainerId}} cannot account for the epoch overflow. [~jianhe], [~bikassaha], what do you think? Making ContainerId long type Key: YARN-2229 URL: https://issues.apache.org/jira/browse/YARN-2229 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA On YARN-2052, we changed the containerId format: the upper 10 bits are for the epoch and the lower 22 bits are for the sequence number of ids. This preserves the semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and {{ConverterUtils#toContainerId}}. One concern is that the epoch can overflow after the RM restarts 1024 times. To avoid this problem, it is better to make containerId long. We need to define the new container id format while preserving backward compatibility on this JIRA. -- This message was sent by Atlassian JIRA (v6.2#6252)
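To make the bit layout above concrete, here is a tiny illustrative helper; the field widths come from the table in the comment, while the class and method names are assumptions, not the eventual patch.
{code}
// Sketch only: the full 64-bit epoch is used for ordering/equality, while only the
// lower 10 bits are exposed by toString for backward compatibility.
final class EpochSketch {
  private static final int DISPLAYED_EPOCH_BITS = 10;
  private static final long DISPLAYED_EPOCH_MASK = (1L << DISPLAYED_EPOCH_BITS) - 1;

  /** The epoch value an old-format toString would display. */
  static long displayedEpoch(long fullEpoch) {
    return fullEpoch & DISPLAYED_EPOCH_MASK;
  }

  /** Comparison on the full 64-bit value keeps ids ordered across overflow,
   *  e.g. fullEpoch 1024 (displayed as 0 in the table above) still sorts after 1023. */
  static int compareEpochs(long a, long b) {
    return Long.compare(a, b);
  }
}
{code}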
[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048170#comment-14048170 ] Hadoop QA commented on YARN-2022: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12653224/YARN-2022.9.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4149//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4149//console This message is automatically generated. Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, YARN-2022.7.patch, YARN-2022.8.patch, YARN-2022.9.patch, Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which has taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Jobs J3 will get killed including its AM. It is better if AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when cluster is free, maps can be allocated to these Jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2147) client lacks delegation token exception details when application submit fails
[ https://issues.apache.org/jira/browse/YARN-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048180#comment-14048180 ] Hadoop QA commented on YARN-2147: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12653227/YARN-2147-v5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4150//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4150//console This message is automatically generated. client lacks delegation token exception details when application submit fails - Key: YARN-2147 URL: https://issues.apache.org/jira/browse/YARN-2147 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Jason Lowe Assignee: Chen He Priority: Minor Attachments: YARN-2147-v2.patch, YARN-2147-v3.patch, YARN-2147-v4.patch, YARN-2147-v5.patch, YARN-2147.patch When an client submits an application and the delegation token process fails the client can lack critical details needed to understand the nature of the error. Only the message of the error exception is conveyed to the client, which sometimes isn't enough to debug. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1713) Implement getnewapplication and submitapp as part of RM web service
[ https://issues.apache.org/jira/browse/YARN-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048185#comment-14048185 ] Vinod Kumar Vavilapalli commented on YARN-1713: --- bq. AppSubmissionContextInfo - AppSubmissionSubmissionContextInfo Sorry, typo. I meant the full name ApplicationSubmissionContextInfo. Implement getnewapplication and submitapp as part of RM web service --- Key: YARN-1713 URL: https://issues.apache.org/jira/browse/YARN-1713 Project: Hadoop YARN Issue Type: Sub-task Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Blocker Attachments: apache-yarn-1713.3.patch, apache-yarn-1713.4.patch, apache-yarn-1713.5.patch, apache-yarn-1713.6.patch, apache-yarn-1713.7.patch, apache-yarn-1713.8.patch, apache-yarn-1713.cumulative.2.patch, apache-yarn-1713.cumulative.3.patch, apache-yarn-1713.cumulative.4.patch, apache-yarn-1713.cumulative.patch, apache-yarn-1713.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1367) After restart NM should resync with the RM without killing containers
[ https://issues.apache.org/jira/browse/YARN-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048208#comment-14048208 ] Anubhav Dhoot commented on YARN-1367: - I had it that way, but after discussion it seemed like depending on a config might make it cumbersome. I am worried about what happens when we have a mismatch between the RM and the NM. For example, if the NM does not kill containers (setting on) and the RM is not expecting containers to be preserved (setting off), then the containers could be running without the RM accounting for them. After restart NM should resync with the RM without killing containers - Key: YARN-1367 URL: https://issues.apache.org/jira/browse/YARN-1367 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1367.001.patch, YARN-1367.002.patch, YARN-1367.prototype.patch After RM restart, the RM sends a resync response to NMs that heartbeat to it. Upon receiving the resync response, the NM kills all containers and re-registers with the RM. The NM should be changed to not kill the container and instead inform the RM about all currently running containers including their allocations etc. After the re-register, the NM should send all pending container completions to the RM as usual. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048216#comment-14048216 ] Tsuyoshi OZAWA commented on YARN-2052: -- Thank you for the review and comments, Jian, Vinod, and Bikas! ContainerId creation after work preserving restart is broken Key: YARN-2052 URL: https://issues.apache.org/jira/browse/YARN-2052 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Fix For: 2.5.0 Attachments: YARN-2052.1.patch, YARN-2052.10.patch, YARN-2052.11.patch, YARN-2052.12.patch, YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch, YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, YARN-2052.8.patch, YARN-2052.9.patch, YARN-2052.9.patch Container ids are made unique by using the app identifier and appending a monotonically increasing sequence number to it. Since container creation is a high churn activity the RM does not store the sequence number per app. So after restart it does not know what the new sequence number should be for new allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048225#comment-14048225 ] Tsuyoshi OZAWA commented on YARN-2052: -- I'm planning to define the epoch format in YARN-2229 first and then change the toString behavior in YARN-2182. ContainerId creation after work preserving restart is broken Key: YARN-2052 URL: https://issues.apache.org/jira/browse/YARN-2052 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Fix For: 2.5.0 Attachments: YARN-2052.1.patch, YARN-2052.10.patch, YARN-2052.11.patch, YARN-2052.12.patch, YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch, YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, YARN-2052.8.patch, YARN-2052.9.patch, YARN-2052.9.patch Container ids are made unique by using the app identifier and appending a monotonically increasing sequence number to it. Since container creation is a high churn activity the RM does not store the sequence number per app. So after restart it does not know what the new sequence number should be for new allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
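For readers following the format discussion, here is a minimal sketch of the 10-bit-epoch / 22-bit-sequence packing described in this issue. The class and method names are invented for illustration, and the overflow check shows why the epoch is limited to 1024 RM restarts.
{code}
/** Illustrative only: pack an RM restart epoch and a per-app sequence number
 *  into one 32-bit container id, as described in YARN-2052. */
public final class ContainerIdPacking {
  private static final int EPOCH_BITS = 10;                 // upper 10 bits
  private static final int SEQ_BITS = 22;                   // lower 22 bits
  private static final int SEQ_MASK = (1 << SEQ_BITS) - 1;

  static int pack(int epoch, int sequence) {
    if (epoch >= (1 << EPOCH_BITS)) {
      // This is the overflow concern: only 1024 epochs fit in 10 bits.
      throw new IllegalArgumentException("epoch does not fit in 10 bits: " + epoch);
    }
    return (epoch << SEQ_BITS) | (sequence & SEQ_MASK);
  }

  static int epochOf(int id) { return id >>> SEQ_BITS; }
  static int sequenceOf(int id) { return id & SEQ_MASK; }

  public static void main(String[] args) {
    int id = pack(3, 42);                 // 3rd restart epoch, 42nd container
    System.out.println(epochOf(id));      // 3
    System.out.println(sequenceOf(id));   // 42
  }
}
{code}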
[jira] [Created] (YARN-2237) MRAppMaster changes for AMRMToken roll-up
Xuan Gong created YARN-2237: --- Summary: MRAppMaster changes for AMRMToken roll-up Key: YARN-2237 URL: https://issues.apache.org/jira/browse/YARN-2237 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1367) After restart NM should resync with the RM without killing containers
[ https://issues.apache.org/jira/browse/YARN-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048291#comment-14048291 ] Jian He commented on YARN-1367: --- In that case, the RM should still be able to shoot unknown containers. I think the point is that in the future we will only support work-preserving restart, and the newly added command will be useless at that point. This config is only a temporary solution for testing and stabilization. After restart NM should resync with the RM without killing containers - Key: YARN-1367 URL: https://issues.apache.org/jira/browse/YARN-1367 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1367.001.patch, YARN-1367.002.patch, YARN-1367.prototype.patch After RM restart, the RM sends a resync response to NMs that heartbeat to it. Upon receiving the resync response, the NM kills all containers and re-registers with the RM. The NM should be changed to not kill the container and instead inform the RM about all currently running containers including their allocations etc. After the re-register, the NM should send all pending container completions to the RM as usual. -- This message was sent by Atlassian JIRA (v6.2#6252)
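To make the "shoot unknown containers" point concrete, here is a small sketch of the reconciliation an RM could do when an NM re-registers with preserved containers. The helper below is hypothetical; it is not the actual ResourceTrackerService code.
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Illustrative only: on NM re-registration, keep the containers the RM still
 *  knows about and ask the NM to kill the ones it does not recognize. */
final class ResyncContainerReconciler {
  /**
   * @param reported container ids the NM reports as still running
   * @param known    container ids the restarted RM has recovered
   * @return the ids the NM should be told to kill
   */
  static List<String> containersToKill(List<String> reported, Set<String> known) {
    List<String> toKill = new ArrayList<>();
    for (String id : reported) {
      if (!known.contains(id)) {
        toKill.add(id); // RM has no record of this container: do not preserve it
      }
    }
    return toKill;
  }
}
{code}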
[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048308#comment-14048308 ] Mayank Bansal commented on YARN-2022: - Thanks [~sunilg] for the patch.
{code}
public void setAMContainer(boolean isAMContainer) {
  this.isAMContainer = isAMContainer;
}
{code}
There should be a write lock around it as well. Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, YARN-2022.7.patch, YARN-2022.8.patch, YARN-2022.9.patch, Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which has taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Jobs J3 will get killed including its AM. It is better if AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when cluster is free, maps can be allocated to these Jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
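A minimal sketch of what the suggested change could look like, assuming the surrounding class guards its state with a ReentrantReadWriteLock as other RM container classes do; the class and field names here are assumptions, not the YARN-2022 patch.
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Illustrative only: the setter guarded by a write lock (and the getter by a
 *  read lock), following the usual read/write-lock pattern in the scheduler. */
class AMContainerFlag {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private boolean isAMContainer;

  public void setAMContainer(boolean isAMContainer) {
    lock.writeLock().lock();
    try {
      this.isAMContainer = isAMContainer;
    } finally {
      lock.writeLock().unlock();
    }
  }

  public boolean isAMContainer() {
    lock.readLock().lock();
    try {
      return isAMContainer;
    } finally {
      lock.readLock().unlock();
    }
  }
}
{code}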
[jira] [Commented] (YARN-2001) Threshold for RM to accept requests from AM after failover
[ https://issues.apache.org/jira/browse/YARN-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048333#comment-14048333 ] Vinod Kumar Vavilapalli commented on YARN-2001: --- +1 for the general idea. I suppose you will implement the node-threshold separately? There are a lot of reasons why it makes sense for the scheduler to pause for a while. Mind adding some of them here and to the config documentation? Insufficient state, etc. Are there more issues? It'd be great to add some tests too. Threshold for RM to accept requests from AM after failover -- Key: YARN-2001 URL: https://issues.apache.org/jira/browse/YARN-2001 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2001.1.patch After failover, the RM may require a certain threshold to determine whether it's safe to make scheduling decisions and start accepting new container requests from AMs. The threshold could be a certain number of nodes, i.e. the RM waits until a certain number of nodes have joined before accepting new container requests. Or it could simply be a timeout; only after the timeout does the RM accept new requests. NMs that join after the threshold can be treated as new NMs and instructed to kill all their containers. -- This message was sent by Atlassian JIRA (v6.2#6252)
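As an illustration of the kind of threshold being discussed, the sketch below gates scheduling on either a fraction of the expected nodes having re-registered or a timeout elapsing. All names and the exact semantics are assumptions for the sketch, not the YARN-2001 patch.
{code}
/** Illustrative only: a post-failover "safe to schedule" gate -- enough NMs
 *  have re-registered, or the configured wait has expired. */
final class SchedulingAdmissionGate {
  private final int expectedNodes;
  private final double nodeFraction;   // e.g. 0.8 = wait for 80% of nodes
  private final long waitDeadlineMs;   // absolute time after which we stop waiting

  SchedulingAdmissionGate(int expectedNodes, double nodeFraction, long waitMs) {
    this.expectedNodes = expectedNodes;
    this.nodeFraction = nodeFraction;
    this.waitDeadlineMs = System.currentTimeMillis() + waitMs;
  }

  boolean canAcceptAllocateRequests(int registeredNodes) {
    boolean enoughNodes = registeredNodes >= expectedNodes * nodeFraction;
    boolean timedOut = System.currentTimeMillis() >= waitDeadlineMs;
    return enoughNodes || timedOut;
  }
}
{code}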
[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048344#comment-14048344 ] Vinod Kumar Vavilapalli commented on YARN-2022: --- The latest patch looks much better to me, save for Mayank's comment above. +1 otherwise. Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, YARN-2022.7.patch, YARN-2022.8.patch, YARN-2022.9.patch, Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which has taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Jobs J3 will get killed including its AM. It is better if AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when cluster is free, maps can be allocated to these Jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048345#comment-14048345 ] Carlo Curino commented on YARN-1039: Hi Guys, I am just tuning in now... (apologies if I am misinterpreting the conversation), but it seems that some of the proposed changes resemble what we were proposing for the reservation YARN-1051 work. In the sub-task YARN-1708 we propose an extension of ResourceRequest that expresses the duration (or leaseDuration if you prefer) for which resources will be reserved... The same concept could be used here as a hint from the user on how long they expect to hold onto the resources. What I am suggesting is that having a time associated with a ResourceRequest could serve both purposes, and be a generally useful hint to the RM. Thoughts? Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter, long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node. -- This message was sent by Atlassian JIRA (v6.2#6252)
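To illustrate Carlo's suggestion, here is a hypothetical request shape carrying a duration hint that a scheduler could use both for reservations and for long-lived placement. This is not the YARN-1708 or YARN-1039 API; every name below is invented for the sketch.
{code}
/** Illustrative only: a request wrapper with the kind of duration hint
 *  described above. */
final class TimedResourceRequest {
  private final int memoryMb;
  private final int vCores;
  private final long expectedDurationMs; // Long.MAX_VALUE ~ "long lived"

  TimedResourceRequest(int memoryMb, int vCores, long expectedDurationMs) {
    this.memoryMb = memoryMb;
    this.vCores = vCores;
    this.expectedDurationMs = expectedDurationMs;
  }

  int memoryMb() { return memoryMb; }
  int vCores()   { return vCores; }

  /** A scheduler could treat anything above some threshold as long lived,
   *  e.g. avoid placing it on transient (spot-priced) nodes. */
  boolean isLongLived(long thresholdMs) {
    return expectedDurationMs >= thresholdMs;
  }
}
{code}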
[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048352#comment-14048352 ] Wangda Tan commented on YARN-2022: -- Just went through the main logic and tests; besides Mayank's comment, LGTM, +1. Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, YARN-2022.7.patch, YARN-2022.8.patch, YARN-2022.9.patch, Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which has taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Jobs J3 will get killed including its AM. It is better if AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when cluster is free, maps can be allocated to these Jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2238) filtering on UI sticks even if I move away from the page
[ https://issues.apache.org/jira/browse/YARN-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2238: -- Attachment: filtered.png Screenshot of such a filtered page. Note that it clearly says it's showing 2 entries filtered from 601 entries, but all the search fields are blank. filtering on UI sticks even if I move away from the page Key: YARN-2238 URL: https://issues.apache.org/jira/browse/YARN-2238 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.4.0 Reporter: Sangjin Lee Attachments: filtered.png The main data table in many web pages (RM, AM, etc.) seems to show an unexpected filtering behavior. If I filter the table by typing something in the key or value field (or I suspect any search field), the data table gets filtered. The example I used is the job configuration page for a MR job. That is expected. However, when I move away from that page and visit any other web page of the same type (e.g. a job configuration page), the page is rendered with the filtering! That is unexpected. What's even stranger is that it does not render the filtering term. As a result, I have a page that's mysteriously filtered but doesn't tell me what it's filtering on. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2238) filtering on UI sticks even if I move away from the page
Sangjin Lee created YARN-2238: - Summary: filtering on UI sticks even if I move away from the page Key: YARN-2238 URL: https://issues.apache.org/jira/browse/YARN-2238 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.4.0 Reporter: Sangjin Lee Attachments: filtered.png The main data table in many web pages (RM, AM, etc.) seems to show an unexpected filtering behavior. If I filter the table by typing something in the key or value field (or I suspect any search field), the data table gets filtered. The example I used is the job configuration page for a MR job. That is expected. However, when I move away from that page and visit any other web page of the same type (e.g. a job configuration page), the page is rendered with the filtering! That is unexpected. What's even stranger is that it does not render the filtering term. As a result, I have a page that's mysteriously filtered but doesn't tell me what it's filtering on. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2238) filtering on UI sticks even if I move away from the page
[ https://issues.apache.org/jira/browse/YARN-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048365#comment-14048365 ] Sangjin Lee commented on YARN-2238: --- Could this be related to YARN-237? filtering on UI sticks even if I move away from the page Key: YARN-2238 URL: https://issues.apache.org/jira/browse/YARN-2238 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.4.0 Reporter: Sangjin Lee Attachments: filtered.png The main data table in many web pages (RM, AM, etc.) seems to show an unexpected filtering behavior. If I filter the table by typing something in the key or value field (or I suspect any search field), the data table gets filtered. The example I used is the job configuration page for a MR job. That is expected. However, when I move away from that page and visit any other web page of the same type (e.g. a job configuration page), the page is rendered with the filtering! That is unexpected. What's even stranger is that it does not render the filtering term. As a result, I have a page that's mysteriously filtered but doesn't tell me what it's filtering on. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2238) filtering on UI sticks even if I move away from the page
[ https://issues.apache.org/jira/browse/YARN-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048379#comment-14048379 ] Sangjin Lee commented on YARN-2238: --- A minor clarification: if the filtering was done via the top right search field, the search term is shown when you go to another page. So at least you know the filtering is being done by that search term. On the other hand, if the filtering was done via the bottom search fields (key, value, or source chain), the search term is NOT shown when you go to another page (but the filtering is still done). filtering on UI sticks even if I move away from the page Key: YARN-2238 URL: https://issues.apache.org/jira/browse/YARN-2238 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.4.0 Reporter: Sangjin Lee Attachments: filtered.png The main data table in many web pages (RM, AM, etc.) seems to show an unexpected filtering behavior. If I filter the table by typing something in the key or value field (or I suspect any search field), the data table gets filtered. The example I used is the job configuration page for a MR job. That is expected. However, when I move away from that page and visit any other web page of the same type (e.g. a job configuration page), the page is rendered with the filtering! That is unexpected. What's even stranger is that it does not render the filtering term. As a result, I have a page that's mysteriously filtered but doesn't tell me what it's filtering on. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-941) RM Should have a way to update the tokens it has for a running application
[ https://issues.apache.org/jira/browse/YARN-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048384#comment-14048384 ] Vinod Kumar Vavilapalli commented on YARN-941: -- [~bcwalrus], Your point about us remembering the AMRMTokens is right, I stand corrected. Responses inline: bq. The attacker gains access to the persistence store, where the RM stores its id, password map. In this case, all bets are off. Neither solution is more secure than the other. That is correct. This problem is handled by securing the state-store, which we require today. bq. The attacker snoops an insecure RPC channel and reads valid tokens from the network. The proper solution is to turn on RPC privacy. The token replacement patch does not offer any real protection. On the contrary, it may give people a false sense of security, which would be worse. RPC privacy is a very expensive solution for AM-RM communication. First, it needs setup so that the AM and RM have access to key infrastructure; putting this burden on all applications is not reasonable. This is compounded by the fact that we use AMRMTokens in non-secure mode too. Second, AM-RM communication is a very chatty protocol, so the overhead is likely huge. You are right that token renewal+replacement is a mitigative solution, but the plus is that it mitigates the problem at much lower cost. bq. The attacker mounts a cryptographic attack, or somehow manages to guess a valid id, password pair. Token replacement is better because it limits the exposure. But this attack is very unlikely. And we can counter that by using a stronger hash function. Unfortunately, with long-running services (the focus of this JIRA), this attack and its success are not as unlikely. This is the very reason why we roll master-keys every so often in the first place. For long-running services, AMRMTokens play a role very similar to that of master-keys between Hadoop daemons. Overall, I think token replacement is not as complex as you may think it is. The evidence for that is the redistribution of NMTokens that we *already* do. And as I see, Xuan already has a patch. Our client, for all our efforts, is already fat. The way we handled the burden of NMTokens etc. is by having a smarter client that takes care of the replacement for users. We can do the same for AMRMTokens too. RM Should have a way to update the tokens it has for a running application -- Key: YARN-941 URL: https://issues.apache.org/jira/browse/YARN-941 Project: Hadoop YARN Issue Type: Sub-task Reporter: Robert Joseph Evans Assignee: Xuan Gong Attachments: YARN-941.preview.2.patch, YARN-941.preview.3.patch, YARN-941.preview.4.patch, YARN-941.preview.patch When an application is submitted to the RM it includes with it a set of tokens that the RM will renew on behalf of the application, that will be passed to the AM when the application is launched, and will be used when launching the application to access HDFS to download files on behalf of the application. For long lived applications/services these tokens can expire, and then the tokens that the AM has will be invalid, and the tokens that the RM had will also not work to launch a new AM. We need to provide an API that will allow the RM to replace the current tokens for this application with a new set.
To avoid any real race issues, I think this API should be something that the AM calls, so that the client can connect to the AM with a new set of tokens it got using kerberos, then the AM can inform the RM of the new set of tokens and quickly update its tokens internally to use these new ones. -- This message was sent by Atlassian JIRA (v6.2#6252)
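To make the "smarter client" idea concrete, the sketch below shows a client library transparently swapping in a replacement AMRMToken returned on an allocate call, analogous to how NMTokens are redistributed today. All types here are hypothetical stand-ins, not the real YARN client classes.
{code}
/** Illustrative only: the application never handles the rolled-over token;
 *  the allocator swaps it in as soon as the RM hands one back. */
final class TokenRefreshingAllocator {
  interface Token {}
  interface AllocateResponse { Token replacementAmRmToken(); } // null if unchanged
  interface RmConnection { void useToken(Token t); AllocateResponse allocate(); }

  private final RmConnection rm;

  TokenRefreshingAllocator(RmConnection rm) { this.rm = rm; }

  AllocateResponse allocateOnce() {
    AllocateResponse response = rm.allocate();
    Token replacement = response.replacementAmRmToken();
    if (replacement != null) {
      // Swap in the rolled-over token transparently, as the client already
      // does for redistributed NMTokens.
      rm.useToken(replacement);
    }
    return response;
  }
}
{code}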
[jira] [Updated] (YARN-2181) Add preemption info to RM Web UI
[ https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2181: - Attachment: YARN-2181.patch Merged RMAppAttemptPreemptionMetrics into RMAppPreemptionMetrics so that the attempt's preemption metrics are always consistent with the app's preemption metrics. Add preemption info to RM Web UI Key: YARN-2181 URL: https://issues.apache.org/jira/browse/YARN-2181 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, application page.png, queue page.png We need add preemption info to RM web page to make administrator/user get more understanding about preemption happened on app, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
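As a rough illustration of keeping attempt- and app-level preemption metrics consistent, the sketch below has attempts update a single app-owned metrics object instead of keeping their own copies. The class name echoes the comment above, but the shape is an assumption, not the actual patch.
{code}
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative only: one shared, thread-safe metrics object per app, updated
 *  by every attempt, so both views always report the same numbers. */
final class AppPreemptionMetrics {
  private final AtomicInteger preemptedAMContainers = new AtomicInteger();
  private final AtomicInteger preemptedNonAMContainers = new AtomicInteger();
  private final AtomicLong preemptedMemoryMbSeconds = new AtomicLong();

  void containerPreempted(boolean isAMContainer, long memoryMbSeconds) {
    if (isAMContainer) {
      preemptedAMContainers.incrementAndGet();
    } else {
      preemptedNonAMContainers.incrementAndGet();
    }
    preemptedMemoryMbSeconds.addAndGet(memoryMbSeconds);
  }

  int getPreemptedAMContainers()     { return preemptedAMContainers.get(); }
  int getPreemptedNonAMContainers()  { return preemptedNonAMContainers.get(); }
  long getPreemptedMemoryMbSeconds() { return preemptedMemoryMbSeconds.get(); }
}
{code}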
[jira] [Updated] (YARN-2236) Shared Cache uploader service on the Node Manager
[ https://issues.apache.org/jira/browse/YARN-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-2236: --- Attachment: YARN-2236-trunk-v2.patch Attached is a v2 patch (to fix a small issue introduced during rebase). Shared Cache uploader service on the Node Manager - Key: YARN-2236 URL: https://issues.apache.org/jira/browse/YARN-2236 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2236-trunk-v1.patch, YARN-2236-trunk-v2.patch Implement the shared cache uploader service on the node manager. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2181) Add preemption info to RM Web UI
[ https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048438#comment-14048438 ] Hadoop QA commented on YARN-2181: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12653284/YARN-2181.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4151//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4151//console This message is automatically generated. Add preemption info to RM Web UI Key: YARN-2181 URL: https://issues.apache.org/jira/browse/YARN-2181 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, application page.png, queue page.png We need add preemption info to RM web page to make administrator/user get more understanding about preemption happened on app, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)
[ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048444#comment-14048444 ] Chris Trezzo commented on YARN-1492: All patches for the shared cache have now been posted. Please let me know if I can do anything else to facilitate code reviews and make the review process as easy as possible. Thanks in advance! truly shared cache for jars (jobjar/libjar) --- Key: YARN-1492 URL: https://issues.apache.org/jira/browse/YARN-1492 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.4-alpha Reporter: Sangjin Lee Assignee: Chris Trezzo Attachments: YARN-1492-all-trunk-v1.patch, shared_cache_design.pdf, shared_cache_design_v2.pdf, shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, shared_cache_design_v5.pdf Currently there is the distributed cache that enables you to cache jars and files so that attempts from the same job can reuse them. However, sharing is limited with the distributed cache because it is normally on a per-job basis. On a large cluster, sometimes copying of jobjars and libjars becomes so prevalent that it consumes a large portion of the network bandwidth, not to speak of defeating the purpose of bringing compute to where data is. This is wasteful because in most cases code doesn't change much across many jobs. I'd like to propose and discuss feasibility of introducing a truly shared cache so that multiple jobs from multiple users can share and cache jars. This JIRA is to open the discussion. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2229) Making ContainerId long type
[ https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2229: - Attachment: YARN-2229-wip.01.patch Making ContainerId long type Key: YARN-2229 URL: https://issues.apache.org/jira/browse/YARN-2229 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2229-wip.01.patch On YARN-2052, we changed containerId format: upper 10 bits are for epoch, lower 22 bits are for sequence number of Ids. This is for preserving semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and {{ConverterUtils#toContainerId}}. One concern is epoch can overflow after RM restarts 1024 times. To avoid the problem, its better to make containerId long. We need to define the new format of container Id with preserving backward compatibility on this JIRA. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2229) Making ContainerId long type
[ https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048495#comment-14048495 ] Hadoop QA commented on YARN-2229: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12653313/YARN-2229-wip.01.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4152//console This message is automatically generated. Making ContainerId long type Key: YARN-2229 URL: https://issues.apache.org/jira/browse/YARN-2229 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2229-wip.01.patch On YARN-2052, we changed containerId format: upper 10 bits are for epoch, lower 22 bits are for sequence number of Ids. This is for preserving semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and {{ConverterUtils#toContainerId}}. One concern is epoch can overflow after RM restarts 1024 times. To avoid the problem, its better to make containerId long. We need to define the new format of container Id with preserving backward compatibility on this JIRA. -- This message was sent by Atlassian JIRA (v6.2#6252)
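For illustration, widening the container id to a long lets the epoch live above the existing sequence bits, removing the 1024-restart limit discussed above. The split below (40 bits for the per-app id) is an assumption made for this sketch, not the final YARN-2229 format.
{code}
/** Illustrative only: a 64-bit container id with the epoch in the high bits. */
final class LongContainerIdSketch {
  private static final int SEQ_BITS = 40;                    // assumed split
  private static final long SEQ_MASK = (1L << SEQ_BITS) - 1;

  static long pack(long epoch, long containerIdWithinApp) {
    return (epoch << SEQ_BITS) | (containerIdWithinApp & SEQ_MASK);
  }

  static long epochOf(long id)    { return id >>> SEQ_BITS; }
  static long sequenceOf(long id) { return id & SEQ_MASK; }
}
{code}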
[jira] [Updated] (YARN-2181) Add preemption info to RM Web UI
[ https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2181: - Attachment: YARN-2181.patch Add preemption info to RM Web UI Key: YARN-2181 URL: https://issues.apache.org/jira/browse/YARN-2181 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, application page.png, queue page.png We need add preemption info to RM web page to make administrator/user get more understanding about preemption happened on app, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2022: -- Attachment: YARN-2022.10.patch Thank you [~mayank_bansal] , [~vinodkv] and [~leftnoteasy] for the comments. I have updated the patch. Please review. Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-2022-DesignDraft.docx, YARN-2022.10.patch, YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, YARN-2022.7.patch, YARN-2022.8.patch, YARN-2022.9.patch, Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which has taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Jobs J3 will get killed including its AM. It is better if AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when cluster is free, maps can be allocated to these Jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2022: -- Attachment: (was: YARN-2022.10.patch) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-2022-DesignDraft.docx, YARN-2022.10.patch, YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, YARN-2022.7.patch, YARN-2022.8.patch, YARN-2022.9.patch, Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which has taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Jobs J3 will get killed including its AM. It is better if AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when cluster is free, maps can be allocated to these Jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2022: -- Attachment: YARN-2022.10.patch Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-2022-DesignDraft.docx, YARN-2022.10.patch, YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, YARN-2022.7.patch, YARN-2022.8.patch, YARN-2022.9.patch, Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which has taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Jobs J3 will get killed including its AM. It is better if AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when cluster is free, maps can be allocated to these Jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1366: - Attachment: YARN-1366.9.patch Thanks for reviewing the patch. I updated the patch as per the comments; please review the updated patch. bq. “blacklistRemovals.addAll(blacklistToRemove);”, we don't need to add this in isResyncCommand check? as RM after restart will just forget all previously blacklisted nodes. DONE. bq. below code needs synchronize ? Yes. bq. “isApplicationMasterRegistered = false;” not needed in allocate and unregisterApplicationMaster. DONE, removed this variable itself since it is not used. bq. we may check pendingRelease isEmpty as well to avoid unnecessary loops DONE. bq. Instead of adding a new core-site.xml file, we can just set the config in the test code conf object. core-site.xml has to be there since SecurityUtil.java loads configurations during class loading. It cannot be passed through the conf object. AM should implement Resync with the ApplicationMasterService instead of shutting down - Key: YARN-1366 URL: https://issues.apache.org/jira/browse/YARN-1366 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Rohith Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, YARN-1366.4.patch, YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.7.patch, YARN-1366.8.patch, YARN-1366.9.patch, YARN-1366.patch, YARN-1366.prototype.patch, YARN-1366.prototype.patch The ApplicationMasterService currently sends a resync response to which the AM responds by shutting down. The AM behavior is expected to change to calling resyncing with the RM. Resync means resetting the allocate RPC sequence number to 0 and the AM should send its entire outstanding request to the RM. Note that if the AM is making its first allocate call to the RM then things should proceed like normal without needing a resync. The RM will return all containers that have completed since the RM last synced with the AM. Some container completions may be reported more than once. -- This message was sent by Atlassian JIRA (v6.2#6252)
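For context on the resync bookkeeping these review comments discuss, here is a small sketch of a client re-sending everything still outstanding (asks, pending releases, blacklist additions) after the RM signals a resync, including the pendingRelease isEmpty short-circuit mentioned above. This is a simplified stand-in, not the AMRMClientImpl code.
{code}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: on resync, everything still outstanding goes back on the
 *  wire, since the restarted RM has no memory of earlier asks, releases, or
 *  blacklisted nodes. */
final class ResyncState {
  final List<String> outstandingAsks = new ArrayList<>();
  final Set<String> pendingRelease = new HashSet<>();
  final Set<String> blacklistAdditions = new HashSet<>();

  AllocateRequestSketch buildResyncRequest() {
    AllocateRequestSketch req = new AllocateRequestSketch();
    req.asks.addAll(outstandingAsks);
    if (!pendingRelease.isEmpty()) {        // skip the copy entirely when empty
      req.releases.addAll(pendingRelease);  // releases the restarted RM never saw
    }
    req.blacklistAdditions.addAll(blacklistAdditions);
    return req;
  }

  static final class AllocateRequestSketch {
    final List<String> asks = new ArrayList<>();
    final List<String> releases = new ArrayList<>();
    final List<String> blacklistAdditions = new ArrayList<>();
  }
}
{code}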
[jira] [Commented] (YARN-2181) Add preemption info to RM Web UI
[ https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048547#comment-14048547 ] Hadoop QA commented on YARN-2181: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12653319/YARN-2181.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4153//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4153//console This message is automatically generated. Add preemption info to RM Web UI Key: YARN-2181 URL: https://issues.apache.org/jira/browse/YARN-2181 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, application page.png, queue page.png We need add preemption info to RM web page to make administrator/user get more understanding about preemption happened on app, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1366: - Attachment: (was: YARN-1366.9.patch) AM should implement Resync with the ApplicationMasterService instead of shutting down - Key: YARN-1366 URL: https://issues.apache.org/jira/browse/YARN-1366 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Rohith Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, YARN-1366.4.patch, YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.7.patch, YARN-1366.8.patch, YARN-1366.patch, YARN-1366.prototype.patch, YARN-1366.prototype.patch The ApplicationMasterService currently sends a resync response to which the AM responds by shutting down. The AM behavior is expected to change to calling resyncing with the RM. Resync means resetting the allocate RPC sequence number to 0 and the AM should send its entire outstanding request to the RM. Note that if the AM is making its first allocate call to the RM then things should proceed like normal without needing a resync. The RM will return all containers that have completed since the RM last synced with the AM. Some container completions may be reported more than once. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1366: - Attachment: YARN-1366.9.patch Update patch for test cases correction AM should implement Resync with the ApplicationMasterService instead of shutting down - Key: YARN-1366 URL: https://issues.apache.org/jira/browse/YARN-1366 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Rohith Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, YARN-1366.4.patch, YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.7.patch, YARN-1366.8.patch, YARN-1366.9.patch, YARN-1366.patch, YARN-1366.prototype.patch, YARN-1366.prototype.patch The ApplicationMasterService currently sends a resync response to which the AM responds by shutting down. The AM behavior is expected to change to calling resyncing with the RM. Resync means resetting the allocate RPC sequence number to 0 and the AM should send its entire outstanding request to the RM. Note that if the AM is making its first allocate call to the RM then things should proceed like normal without needing a resync. The RM will return all containers that have completed since the RM last synced with the AM. Some container completions may be reported more than once. -- This message was sent by Atlassian JIRA (v6.2#6252)