[jira] [Commented] (YARN-600) Hook up cgroups CPU settings to the number of virtual cores allocated
[ https://issues.apache.org/jira/browse/YARN-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646436#comment-13646436 ] Sandy Ryza commented on YARN-600: - (by which I mean YARN doesn't give hints to the OS on how and when to assign cores to processes) Hook up cgroups CPU settings to the number of virtual cores allocated - Key: YARN-600 URL: https://issues.apache.org/jira/browse/YARN-600 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager, scheduler Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza YARN-3 introduced CPU isolation and monitoring through cgroups. YARN-2 introduced CPU scheduling in the capacity scheduler, and YARN-326 will introduce it in the fair scheduler. The number of virtual cores allocated to a container should be used to weight the number of cgroups CPU shares given to it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
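As a rough sketch of the weighting the issue describes, a cgroups resources handler could scale cpu.shares linearly with the container's virtual cores. The class name, the per-vcore weight, and the cgroup file layout below are assumptions for illustration, not the actual patch:

{code:java}
// Hypothetical sketch: weight a container's cgroups CPU shares by its
// allocated virtual cores. Names and the default weight are assumptions.
import java.io.FileWriter;
import java.io.IOException;

public class CpuSharesSketch {
  // cgroups' default for cpu.shares is 1024; use it as the per-vcore weight.
  private static final int CPU_SHARES_PER_VCORE = 1024;

  /** Write cpu.shares for a container's cgroup, scaled by its vcores. */
  public static void setCpuShares(String containerCgroupPath, int virtualCores)
      throws IOException {
    int shares = CPU_SHARES_PER_VCORE * Math.max(1, virtualCores);
    try (FileWriter w = new FileWriter(containerCgroupPath + "/cpu.shares")) {
      w.write(Integer.toString(shares));
    }
  }
}
{code}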
[jira] [Commented] (YARN-600) Hook up cgroups CPU settings to the number of virtual cores allocated
[ https://issues.apache.org/jira/browse/YARN-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646435#comment-13646435 ] Sandy Ryza commented on YARN-600: - I don't believe anything special happens when it's not enabled. Hook up cgroups CPU settings to the number of virtual cores allocated - Key: YARN-600 URL: https://issues.apache.org/jira/browse/YARN-600 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager, scheduler Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza YARN-3 introduced CPU isolation and monitoring through cgroups. YARN-2 introduced CPU scheduling in the capacity scheduler, and YARN-326 will introduce it in the fair scheduler. The number of virtual cores allocated to a container should be used to weight the number of cgroups CPU shares given to it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-422) Add AM-NM client library
[ https://issues.apache.org/jira/browse/YARN-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-422: - Attachment: YARN-422.1.patch Here's the first patch, which is ready for review. Based on the previous definition file, the patch has the following updates: 1. Refactor some code (add more logs, revise the javadoc, etc.) 2. Rename AMNMClient to NMClient, since not only the AM will use this client. 3. In NMClientAsync, join the threadpool's threads, which are set to non-daemon. 4. Enhance the test cases. As the patch is already big, I suggest deferring the code changes that use the client in the AM and RM to follow-up patches. In addition, I found that Maven seems not to distinguish scope when checking for cyclic dependencies. Therefore, making the resourcemanager project depend on the client (to use NMClient in AMLauncher) will fail the build. Add AM-NM client library Key: YARN-422 URL: https://issues.apache.org/jira/browse/YARN-422 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Zhijie Shen Attachments: AMNMClient_Defination.txt, AMNMClient_Definition_Updated_With_Tests.txt, proposal_v1.pdf, YARN-422.1.patch Create a simple wrapper over the AM-NM container protocol to hide the details of the protocol implementation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
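Point 3 above (joining NMClientAsync's non-daemon threads on stop) can be pictured with a small sketch; the executor setup and stop sequence here are illustrative assumptions, not the patch's exact code:

{code:java}
// Illustrative sketch of a client whose worker threads are non-daemon and
// are drained (shutdown + awaitTermination) when the client stops.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AsyncClientSketch {
  private final ExecutorService pool = Executors.newFixedThreadPool(4, r -> {
    Thread t = new Thread(r, "nm-client-async-worker");
    t.setDaemon(false); // non-daemon: the JVM waits for outstanding callbacks
    return t;
  });

  public void submit(Runnable event) {
    pool.execute(event);
  }

  /** Stop the client, waiting for in-flight work to finish. */
  public void stop() throws InterruptedException {
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.SECONDS);
  }
}
{code}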
[jira] [Commented] (YARN-422) Add AM-NM client library
[ https://issues.apache.org/jira/browse/YARN-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646448#comment-13646448 ] Hadoop QA commented on YARN-422: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581346/YARN-422.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: org.apache.hadoop.yarn.client.TestNMClientAsync {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/852//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/852//console This message is automatically generated. Add AM-NM client library Key: YARN-422 URL: https://issues.apache.org/jira/browse/YARN-422 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Zhijie Shen Attachments: AMNMClient_Defination.txt, AMNMClient_Definition_Updated_With_Tests.txt, proposal_v1.pdf, YARN-422.1.patch Create a simple wrapper over the AM-NM container protocol to hide the details of the protocol implementation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-422) Add AM-NM client library
[ https://issues.apache.org/jira/browse/YARN-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-422: - Attachment: YARN-422.2.patch An attempt to fix the test failure. The problem seems to be that the mockNMClient is changed before all the test cases are completed. So, the success and the failure tests are split. Add AM-NM client library Key: YARN-422 URL: https://issues.apache.org/jira/browse/YARN-422 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Zhijie Shen Attachments: AMNMClient_Defination.txt, AMNMClient_Definition_Updated_With_Tests.txt, proposal_v1.pdf, YARN-422.1.patch, YARN-422.2.patch Create a simple wrapper over the AM-NM container protocol to hide the details of the protocol implementation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-422) Add AM-NM client library
[ https://issues.apache.org/jira/browse/YARN-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646473#comment-13646473 ] Hadoop QA commented on YARN-422: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581348/YARN-422.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/853//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/853//console This message is automatically generated. Add AM-NM client library Key: YARN-422 URL: https://issues.apache.org/jira/browse/YARN-422 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Zhijie Shen Attachments: AMNMClient_Defination.txt, AMNMClient_Definition_Updated_With_Tests.txt, proposal_v1.pdf, YARN-422.1.patch, YARN-422.2.patch Create a simple wrapper over the AM-NM container protocol to hide the details of the protocol implementation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-613) Create NM proxy per NM instead of per container
[ https://issues.apache.org/jira/browse/YARN-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646558#comment-13646558 ] Daryn Sharp commented on YARN-613: -- I just have general concerns with assuming the entire Hadoop environment is trusted and thus introducing weaknesses at a global level. For example, a weakness is introduced every time one entity shares a secret to validate a token created by another entity. Compromising one of hundreds or thousands of nodes shouldn't put the entire cluster at risk. If I can gain access to one NM host and its keytab, I believe I can secretly launch a malicious NM? NMs currently share a global key for container token secrets, but there is a JIRA to move to per-NM secrets, so sharing a global AM secret would be another step backwards. To explore alternate avenues that avoid global trust: is it not feasible to pass, with the launch request, the AM token that is allowed to get status and stop the container? Create NM proxy per NM instead of per container --- Key: YARN-613 URL: https://issues.apache.org/jira/browse/YARN-613 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Vinod Kumar Vavilapalli Currently a new NM proxy has to be created per container since secure authentication uses a container token from the container. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-617) In unsecure mode, AM can fake resource requirements
[ https://issues.apache.org/jira/browse/YARN-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646572#comment-13646572 ] Daryn Sharp commented on YARN-617: -- bq. we are trying to change the auth to use AMTokens and authorization will continue to be via ContainerTokens I may have misinterpreted the other jira... I thought the goal was to continue to auth container launches with a container token, but to change status and stop to authenticate with the AM token? Are you saying the goal is to auth container launches with the AM token too? {quote}bq. A RPC server also enables SASL DIGEST-MD5 if a secret manager is active.{quote} bq. Off topic, but this is what I guessed is the reason underlying YARN-626, do you know when this got merged into branch-2? The SASL changes HADOOP-8783/HADOOP-8784 went in Oct 3-4, 2012. The change allowed servers to accept tokens regardless of the security setting if a secret manager is present, and clients to always use a token if present, regardless of the security setting. This didn't change behavior for a secure cluster, so YARN-626 can't be related, because security is enabled and the AM is lacking a token for the RM in its UGI. In unsecure mode, AM can fake resource requirements - Key: YARN-617 URL: https://issues.apache.org/jira/browse/YARN-617 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Minor Without security, it is impossible to completely avoid AMs faking resources. We can at least make it as difficult as possible by using the same container tokens and the RM-NM shared key mechanism over an unauthenticated RM-NM channel. At a minimum, this will avoid accidental bugs in AMs in unsecure mode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-18) Make locality in YARN's container assignment and task scheduling pluggable for other deployment topologies
[ https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-18: --- Attachment: Pluggable topologies with NodeGroup for YARN.pdf Implementation doc for NodeGroup layer support in YARN. Make locality in YARN's container assignment and task scheduling pluggable for other deployment topologies - Key: YARN-18 URL: https://issues.apache.org/jira/browse/YARN-18 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.3-alpha Reporter: Junping Du Assignee: Junping Du Labels: features Attachments: HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, MAPREDUCE-4309.patch, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, MAPREDUCE-4309-v7.patch, Pluggable topologies with NodeGroup for YARN.pdf, YARN-18.patch, YARN-18-v2.patch, YARN-18-v3.1.patch, YARN-18-v3.2.patch, YARN-18-v3.patch, YARN-18-v4.1.patch, YARN-18-v4.2.patch, YARN-18-v4.3.patch, YARN-18-v4.patch, YARN-18-v5.1.patch, YARN-18-v5.patch Several classes in YARN’s container assignment and task scheduling algorithms relate to data locality and were updated to give preference to running a container at other locality levels besides node-local and rack-local (like nodegroup-local). This proposes to make these data structures/algorithms pluggable, e.g. SchedulerNode, RMNodeImpl, etc. The inner class ScheduledRequests was made a package-level class so it would be easier to create a subclass, ScheduledRequestsWithNodeGroup. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
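To make the subclassing idea in the description concrete, here is a minimal sketch of a nodegroup-aware extension point; the base class shape and method names are assumptions for illustration, not YARN's actual SchedulerNode API:

{code:java}
// Minimal sketch of a pluggable locality hierarchy: a nodegroup-aware
// subclass inserts one locality level between node-local and rack-local.
abstract class SchedulerNodeBase {
  protected final String hostName;
  protected final String rackName;

  SchedulerNodeBase(String hostName, String rackName) {
    this.hostName = hostName;
    this.rackName = rackName;
  }

  /** Locality names this node can satisfy, most specific first. */
  abstract String[] localityLevels();
}

class SchedulerNodeWithNodeGroup extends SchedulerNodeBase {
  private final String nodeGroupName;

  SchedulerNodeWithNodeGroup(String host, String nodeGroup, String rack) {
    super(host, rack);
    this.nodeGroupName = nodeGroup;
  }

  @Override
  String[] localityLevels() {
    // Prefer node-local, then nodegroup-local, then rack-local.
    return new String[] { hostName, nodeGroupName, rackName };
  }
}
{code}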
[jira] [Updated] (YARN-18) Make locality in YARN's container assignment and task scheduling pluggable for other deployment topologies
[ https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-18: --- Attachment: YARN-18-v6.patch Synced the patch to the latest trunk, keeping it consistent with the doc just attached. Make locality in YARN's container assignment and task scheduling pluggable for other deployment topologies - Key: YARN-18 URL: https://issues.apache.org/jira/browse/YARN-18 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.3-alpha Reporter: Junping Du Assignee: Junping Du Labels: features Attachments: HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, MAPREDUCE-4309.patch, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, MAPREDUCE-4309-v7.patch, Pluggable topologies with NodeGroup for YARN.pdf, YARN-18.patch, YARN-18-v2.patch, YARN-18-v3.1.patch, YARN-18-v3.2.patch, YARN-18-v3.patch, YARN-18-v4.1.patch, YARN-18-v4.2.patch, YARN-18-v4.3.patch, YARN-18-v4.patch, YARN-18-v5.1.patch, YARN-18-v5.patch, YARN-18-v6.patch Several classes in YARN’s container assignment and task scheduling algorithms relate to data locality and were updated to give preference to running a container at other locality levels besides node-local and rack-local (like nodegroup-local). This proposes to make these data structures/algorithms pluggable, e.g. SchedulerNode, RMNodeImpl, etc. The inner class ScheduledRequests was made a package-level class so it would be easier to create a subclass, ScheduledRequestsWithNodeGroup. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-18) Make locality in YARN's container assignment and task scheduling pluggable for other deployment topologies
[ https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646652#comment-13646652 ] Junping Du commented on YARN-18: Thanks, Luke, for the explanation. Hey Arun ([~acmurthy]), I just attached a doc with implementation details for this patch and YARN-19. Hopefully that will be helpful for your review. Your thought to abstract the notion of topology logic in the scheduler makes great sense to me. We already did this for each scheduler, but you mean we need to abstract the common part across all schedulers, which is a little complicated but still doable. Can we do this code refactoring work in a separate JIRA? I am glad to work on it. For a new NodeGroup scheduler, I think it may not be necessary, as it addresses different topologies rather than a different algorithm to isolate/prioritize jobs. So even under a topology with a NodeGroup layer, different users still need different schedulers like Fair, Capacity, etc. Thoughts? Make locality in YARN's container assignment and task scheduling pluggable for other deployment topologies - Key: YARN-18 URL: https://issues.apache.org/jira/browse/YARN-18 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.3-alpha Reporter: Junping Du Assignee: Junping Du Labels: features Attachments: HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, MAPREDUCE-4309.patch, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, MAPREDUCE-4309-v7.patch, Pluggable topologies with NodeGroup for YARN.pdf, YARN-18.patch, YARN-18-v2.patch, YARN-18-v3.1.patch, YARN-18-v3.2.patch, YARN-18-v3.patch, YARN-18-v4.1.patch, YARN-18-v4.2.patch, YARN-18-v4.3.patch, YARN-18-v4.patch, YARN-18-v5.1.patch, YARN-18-v5.patch, YARN-18-v6.patch Several classes in YARN’s container assignment and task scheduling algorithms relate to data locality and were updated to give preference to running a container at other locality levels besides node-local and rack-local (like nodegroup-local). This proposes to make these data structures/algorithms pluggable, e.g. SchedulerNode, RMNodeImpl, etc. The inner class ScheduledRequests was made a package-level class so it would be easier to create a subclass, ScheduledRequestsWithNodeGroup. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-18) Make locality in YARN's container assignment and task scheduling pluggable for other deployment topologies
[ https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646659#comment-13646659 ] Hadoop QA commented on YARN-18: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581362/YARN-18-v6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 3 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/854//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/854//console This message is automatically generated. Make locality in YARN's container assignment and task scheduling pluggable for other deployment topologies - Key: YARN-18 URL: https://issues.apache.org/jira/browse/YARN-18 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.3-alpha Reporter: Junping Du Assignee: Junping Du Labels: features Attachments: HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, MAPREDUCE-4309.patch, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, MAPREDUCE-4309-v7.patch, Pluggable topologies with NodeGroup for YARN.pdf, YARN-18.patch, YARN-18-v2.patch, YARN-18-v3.1.patch, YARN-18-v3.2.patch, YARN-18-v3.patch, YARN-18-v4.1.patch, YARN-18-v4.2.patch, YARN-18-v4.3.patch, YARN-18-v4.patch, YARN-18-v5.1.patch, YARN-18-v5.patch, YARN-18-v6.patch Several classes in YARN’s container assignment and task scheduling algorithms relate to data locality and were updated to give preference to running a container at other locality levels besides node-local and rack-local (like nodegroup-local). This proposes to make these data structures/algorithms pluggable, e.g. SchedulerNode, RMNodeImpl, etc. The inner class ScheduledRequests was made a package-level class so it would be easier to create a subclass, ScheduledRequestsWithNodeGroup. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-582) Restore appToken for app attempt after RM restart
[ https://issues.apache.org/jira/browse/YARN-582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-582: - Attachment: YARN-582.3.patch Restore appToken for app attempt after RM restart - Key: YARN-582 URL: https://issues.apache.org/jira/browse/YARN-582 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-582.1.patch, YARN-582.2.patch, YARN-582.3.patch These need to be saved and restored on a per app attempt basis. This is required only when work preserving restart is implemented for secure clusters. In non-preserving restart app attempts are killed and so this does not matter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-582) Restore appToken for app attempt after RM restart
[ https://issues.apache.org/jira/browse/YARN-582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646806#comment-13646806 ] Jian He commented on YARN-582: -- The new patch addresses the last comments. Restore appToken for app attempt after RM restart - Key: YARN-582 URL: https://issues.apache.org/jira/browse/YARN-582 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-582.1.patch, YARN-582.2.patch, YARN-582.3.patch These need to be saved and restored on a per app attempt basis. This is required only when work preserving restart is implemented for secure clusters. In non-preserving restart app attempts are killed and so this does not matter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-582) Restore appToken for app attempt after RM restart
[ https://issues.apache.org/jira/browse/YARN-582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646835#comment-13646835 ] Hadoop QA commented on YARN-582: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581393/YARN-582.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/855//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/855//console This message is automatically generated. Restore appToken for app attempt after RM restart - Key: YARN-582 URL: https://issues.apache.org/jira/browse/YARN-582 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-582.1.patch, YARN-582.2.patch, YARN-582.3.patch These need to be saved and restored on a per app attempt basis. This is required only when work preserving restart is implemented for secure clusters. In non-preserving restart app attempts are killed and so this does not matter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-513) Create common proxy client for communicating with RM
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646914#comment-13646914 ] Xuan Gong commented on YARN-513: Actually, the RMClient.invoke() can be removed. I followed the pattern of how NMProxies create the proxy. So, in the current patch, we just need to expose the proxy object, and the client calls proxy.method(). It should be good to go. I will do further testing. Create common proxy client for communicating with RM Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646993#comment-13646993 ] Chris Riccomini commented on YARN-614: -- bq. One solution could be to move the check from finishAttempt() to createAttempt(). finishAttempt() always enqueues a new attempt. the new attempt creation checks if one can still be created based on failed count etc. This wouldn't fix the problem with RMAppManager.recover(), would it? Whether we enqueue attempts in finishAttempt or createAttempt, if the attempt count ever goes above maxAppAttempts, it seems like RMAppManager would not recover the app, right? Are you proposing we always call appImpl.recover() in RMAppManager, always retry in RMAppImpl.AttemptFailedTransition, and call RMAppImpl.countFailureToAttemptLimit() inside RMAppImpl.createNewAttempt? Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1 -- Key: YARN-614 URL: https://issues.apache.org/jira/browse/YARN-614 Project: Hadoop YARN Issue Type: Improvement Reporter: Bikas Saha Attachments: YARN-614-0.patch Attempts can fail due to a large number of user errors and they should not be retried unnecessarily. The only reason YARN should retry an attempt is when the hardware fails or YARN has an error. NM failing, lost NM and NM disk errors are the hardware errors that come to mind. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
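A tiny sketch of the guard being discussed, i.e. enforcing the attempt limit at attempt creation rather than when an attempt finishes, so recovery can reuse the same path; the class and field names below are placeholders, not the actual RMAppImpl code:

{code:java}
// Placeholder sketch: check maxAppAttempts when an attempt is created,
// so RMAppManager.recover() and the normal retry path share one check.
class AppAttemptsSketch {
  private final int maxAppAttempts;
  private int attemptsSoFar = 0;

  AppAttemptsSketch(int maxAppAttempts) {
    this.maxAppAttempts = maxAppAttempts;
  }

  /** Returns true if a new attempt may be created, and counts it. */
  synchronized boolean tryCreateAttempt() {
    if (attemptsSoFar >= maxAppAttempts) {
      return false; // limit reached: fail the app instead of retrying
    }
    attemptsSoFar++;
    return true;
  }
}
{code}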
[jira] [Commented] (YARN-618) Modify RM_INVALID_IDENTIFIER to a -ve number
[ https://issues.apache.org/jira/browse/YARN-618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647062#comment-13647062 ] Vinod Kumar Vavilapalli commented on YARN-618: -- +1, checking it in. Modify RM_INVALID_IDENTIFIER to a -ve number - Key: YARN-618 URL: https://issues.apache.org/jira/browse/YARN-618 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-618.1.patch, YARN-618.2.patch, YARN-618.3.patch, YARN-618.patch RM_INVALID_IDENTIFIER set to 0 doesn't sound right as many tests set it to 0. Probably a -ve number is what we want. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-618) Modify RM_INVALID_IDENTIFIER to a -ve number
[ https://issues.apache.org/jira/browse/YARN-618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647075#comment-13647075 ] Hudson commented on YARN-618: - Integrated in Hadoop-trunk-Commit #3708 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3708/]) YARN-618. Modified RM_INVALID_IDENTIFIER to be -1 instead of zero. Contributed by Jian He. (Revision 1478230) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1478230 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/ResourceManagerConstants.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestEventFlow.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainersMonitor.java Modify RM_INVALID_IDENTIFIER to a -ve number - Key: YARN-618 URL: https://issues.apache.org/jira/browse/YARN-618 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-618.1.patch, YARN-618.2.patch, YARN-618.3.patch, YARN-618.patch RM_INVALID_IDENTIFIER set to 0 doesn't sound right as many tests set it to 0. Probably a -ve number is what we want. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
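The commit message above confirms the sentinel moved from 0 to -1: a negative value cannot collide with a default-initialized (zero) or real, non-negative RM identifier. A minimal sketch of the idea (the validity helper is an illustrative assumption):

{code:java}
// Sketch of the sentinel change: -1 can never be confused with a
// default-initialized (0) or legitimate non-negative RM identifier.
final class ResourceManagerConstantsSketch {
  static final long RM_INVALID_IDENTIFIER = -1L;

  static boolean isValidRMIdentifier(long id) {
    return id >= 0; // anything negative, including the sentinel, is invalid
  }
}
{code}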
[jira] [Commented] (YARN-513) Create common proxy client for communicating with RM
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647085#comment-13647085 ] Xuan Gong commented on YARN-513: The new patch includes: 1. Removed RMClient and created RMProxy, following the pattern of NameNodeProxy; it includes several static createXXX() methods. This will be much simpler. 2. In NodeManagerUpdaterImpl, there is no RMProxy object anymore; RMProxy.createRMProxy() is used to create the ResourceTracker object directly. 3. Removed the invoke method; proxy.method() is called directly. 4. Changed the test cases, including renaming localRMClient to LocalRMProxy. Create common proxy client for communicating with RM Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
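A hedged sketch of such a static factory, assuming Hadoop's RetryProxy/RetryPolicies utilities and YARN's YarnRPC with their usual shapes; the method signature and retry policy choice are assumptions, not the patch's exact API:

{code:java}
// Sketch of an RMProxy-style factory: create one RPC proxy for an
// RM-facing protocol and wrap it in a retry proxy so callers wait out
// an RM restart instead of failing immediately.
import java.net.InetSocketAddress;
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;
import org.apache.hadoop.io.retry.RetryProxy;
import org.apache.hadoop.yarn.ipc.YarnRPC;

public class RMProxySketch {
  @SuppressWarnings("unchecked")
  public static <T> T createRMProxy(Configuration conf, Class<T> protocol,
      InetSocketAddress rmAddress) {
    // Plain RPC proxy for the requested protocol.
    T proxy = (T) YarnRPC.create(conf).getProxy(protocol, rmAddress, conf);
    // Retry with a fixed sleep so a restarting RM has time to come back.
    RetryPolicy retryPolicy = RetryPolicies
        .retryUpToMaximumCountWithFixedSleep(30, 2, TimeUnit.SECONDS);
    return (T) RetryProxy.create(protocol, proxy, retryPolicy);
  }
}
{code}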
[jira] [Updated] (YARN-513) Create common proxy client for communicating with RM
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-513: --- Attachment: YARN.513.5.patch Create common proxy client for communicating with RM Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch, YARN.513.5.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-618) Modify RM_INVALID_IDENTIFIER to a -ve number
[ https://issues.apache.org/jira/browse/YARN-618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-618: - Attachment: YARN-618.3-branch-2.patch The patch didn't apply cleanly against branch-2. Here's one that I generated myself; it compiles successfully and passes TestContainerManager, which had the merge conflict. Modify RM_INVALID_IDENTIFIER to a -ve number - Key: YARN-618 URL: https://issues.apache.org/jira/browse/YARN-618 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-618.1.patch, YARN-618.2.patch, YARN-618.3-branch-2.patch, YARN-618.3.patch, YARN-618.patch RM_INVALID_IDENTIFIER set to 0 doesn't sound right as many tests set it to 0. Probably a -ve number is what we want. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-513) Create common proxy client for communicating with RM
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647089#comment-13647089 ] Xuan Gong commented on YARN-513: The new patch is YARN.513.5.patch Create common proxy client for communicating with RM Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch, YARN.513.5.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-422) Add NM client library
[ https://issues.apache.org/jira/browse/YARN-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-422: - Description: Create a simple wrapper over the ContainerManager protocol to hide the details of the protocol implementation. (was: Create a simple wrapper over the AM-NM container protocol to hide the details of the protocol implementation.) Add NM client library - Key: YARN-422 URL: https://issues.apache.org/jira/browse/YARN-422 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Zhijie Shen Attachments: AMNMClient_Defination.txt, AMNMClient_Definition_Updated_With_Tests.txt, proposal_v1.pdf, YARN-422.1.patch, YARN-422.2.patch Create a simple wrapper over the ContainerManager protocol to hide the details of the protocol implementation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-422) Add NM client library
[ https://issues.apache.org/jira/browse/YARN-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-422: - Summary: Add NM client library (was: Add AM-NM client library) Add NM client library - Key: YARN-422 URL: https://issues.apache.org/jira/browse/YARN-422 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Zhijie Shen Attachments: AMNMClient_Defination.txt, AMNMClient_Definition_Updated_With_Tests.txt, proposal_v1.pdf, YARN-422.1.patch, YARN-422.2.patch Create a simple wrapper over the AM-NM container protocol to hide the details of the protocol implementation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-639) Make AM of Distributed Shell Use NMClient
Zhijie Shen created YARN-639: Summary: Make AM of Distributed Shell Use NMClient Key: YARN-639 URL: https://issues.apache.org/jira/browse/YARN-639 Project: Hadoop YARN Issue Type: Bug Environment: YARN-422 adds Reporter: Zhijie Shen -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-639) Make AM of Distributed Shell Use NMClient
[ https://issues.apache.org/jira/browse/YARN-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-639: - Description: YARN-422 adds Make AM of Distributed Shell Use NMClient - Key: YARN-639 URL: https://issues.apache.org/jira/browse/YARN-639 Project: Hadoop YARN Issue Type: Bug Environment: YARN-422 adds Reporter: Zhijie Shen YARN-422 adds -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-513) Create common proxy client for communicating with RM
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647109#comment-13647109 ] Hadoop QA commented on YARN-513: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581439/YARN.513.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/856//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/856//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/856//console This message is automatically generated. Create common proxy client for communicating with RM Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch, YARN.513.5.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-639) Make AM of Distributed Shell Use NMClient
[ https://issues.apache.org/jira/browse/YARN-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen reassigned YARN-639: Assignee: Zhijie Shen Make AM of Distributed Shell Use NMClient - Key: YARN-639 URL: https://issues.apache.org/jira/browse/YARN-639 Project: Hadoop YARN Issue Type: Bug Environment: YARN-422 adds Reporter: Zhijie Shen Assignee: Zhijie Shen YARN-422 adds NMClient. AM of Distributed Shell should use it instead of using ContainerManager directly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-639) Make AM of Distributed Shell Use NMClient
[ https://issues.apache.org/jira/browse/YARN-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-639: - Environment: (was: YARN-422 adds ) Make AM of Distributed Shell Use NMClient - Key: YARN-639 URL: https://issues.apache.org/jira/browse/YARN-639 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen YARN-422 adds NMClient. AM of Distributed Shell should use it instead of using ContainerManager directly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-640) Make AM of M/R Use NMClient
Zhijie Shen created YARN-640: Summary: Make AM of M/R Use NMClient Key: YARN-640 URL: https://issues.apache.org/jira/browse/YARN-640 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen YARN-422 adds NMClient. AM of mapreduce should use it instead of using the raw ContainerManager proxy directly. ContainerLauncherImpl needs to be changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-638) Add RMDelegationTokens back to DelegationTokenSecretManager after RM Restart
[ https://issues.apache.org/jira/browse/YARN-638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-638: - Summary: Add RMDelegationTokens back to DelegationTokenSecretManager after RM Restart (was: Add DelegationTokens back to DelegationTokenSecretManager after RM Restart) Add RMDelegationTokens back to DelegationTokenSecretManager after RM Restart Key: YARN-638 URL: https://issues.apache.org/jira/browse/YARN-638 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-638.1.patch This was missed in YARN-581. After RM restart, delegation tokens need to be added both in DelegationTokenRenewer (addressed in YARN-581) and in delegationTokenSecretManager -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-606) negative queue metrics apps Failed
[ https://issues.apache.org/jira/browse/YARN-606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nemon lou updated YARN-606: --- Assignee: nemon lou negative queue metrics apps Failed - Key: YARN-606 URL: https://issues.apache.org/jira/browse/YARN-606 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.3-alpha Reporter: nemon lou Assignee: nemon lou Priority: Minor The queue metric apps Failed can be negative in some cases (more than one attempt for an application can cause this). It's confusing if we use this metric directly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-638) Add RMDelegationTokens back to DelegationTokenSecretManager after RM Restart
[ https://issues.apache.org/jira/browse/YARN-638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-638: - Description: This was missed in YARN-581. After RM restart, RMDelegationTokens need to be added both in DelegationTokenRenewer (addressed in YARN-581) and in delegationTokenSecretManager (was: This was missed in YARN-581. After RM restart, delegation tokens need to be added both in DelegationTokenRenewer (addressed in YARN-581) and in delegationTokenSecretManager) Add RMDelegationTokens back to DelegationTokenSecretManager after RM Restart Key: YARN-638 URL: https://issues.apache.org/jira/browse/YARN-638 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-638.1.patch This was missed in YARN-581. After RM restart, RMDelegationTokens need to be added both in DelegationTokenRenewer (addressed in YARN-581) and in delegationTokenSecretManager -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-617) In unsecure mode, AM can fake resource requirements
[ https://issues.apache.org/jira/browse/YARN-617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi reassigned YARN-617: -- Assignee: Omkar Vinit Joshi (was: Vinod Kumar Vavilapalli) In unsecure mode, AM can fake resource requirements - Key: YARN-617 URL: https://issues.apache.org/jira/browse/YARN-617 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Priority: Minor Without security, it is impossible to completely avoid AMs faking resources. We can at least make it as difficult as possible by using the same container tokens and the RM-NM shared key mechanism over an unauthenticated RM-NM channel. At a minimum, this will avoid accidental bugs in AMs in unsecure mode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-617) In unsecure mode, AM can fake resource requirements
[ https://issues.apache.org/jira/browse/YARN-617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-617: --- Attachment: YARN-617.20130501.patch In unsecure mode, AM can fake resource requirements - Key: YARN-617 URL: https://issues.apache.org/jira/browse/YARN-617 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Priority: Minor Attachments: YARN-617.20130501.patch Without security, it is impossible to completely avoid AMs faking resources. We can at least make it as difficult as possible by using the same container tokens and the RM-NM shared key mechanism over an unauthenticated RM-NM channel. At a minimum, this will avoid accidental bugs in AMs in unsecure mode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-617) In unsecure mode, AM can fake resource requirements
[ https://issues.apache.org/jira/browse/YARN-617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-617: --- Attachment: YARN-617.20130501.1.patch In unsecure mode, AM can fake resource requirements - Key: YARN-617 URL: https://issues.apache.org/jira/browse/YARN-617 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Priority: Minor Attachments: YARN-617.20130501.1.patch, YARN-617.20130501.patch Without security, it is impossible to completely avoid AMs faking resources. We can at least make it as difficult as possible by using the same container tokens and the RM-NM shared key mechanism over an unauthenticated RM-NM channel. At a minimum, this will avoid accidental bugs in AMs in unsecure mode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-617) In unsecure mode, AM can fake resource requirements
[ https://issues.apache.org/jira/browse/YARN-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647211#comment-13647211 ] Omkar Vinit Joshi commented on YARN-617: I am attaching the patch (JUnit tests not included). I will update the patch with tests soon. * At present the master key is exchanged between the RM and NM only if the environment is secured. I am updating this to make sure that the RM and NM exchange the master key in both scenarios, secured and unsecured: ** during NM registration ** during NM heartbeat (status updater, only if the key is updated, as it is today) * At present the master key is not generated/sent during container launch in the unsecured case. Now making sure that it is sent as part of the payload from AMLauncher to the NodeManager. On the NodeManager this token will be used to verify the container start request: ** for the secured case, retrieving the token from remoteUgi ** for the unsecured case, retrieving the token from the passed-in container payload. There are some other changes related to this patch: * startContainer requires the UGI username to be that of the container-id ... I still have not understood why (ContainerLauncherImpl) * Making sure that NMContainerTokenSecretManager is created in both cases. In unsecure mode, AM can fake resource requirements - Key: YARN-617 URL: https://issues.apache.org/jira/browse/YARN-617 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Priority: Minor Attachments: YARN-617.20130501.1.patch, YARN-617.20130501.patch Without security, it is impossible to completely avoid AMs faking resources. We can at least make it as difficult as possible by using the same container tokens and the RM-NM shared key mechanism over an unauthenticated RM-NM channel. At a minimum, this will avoid accidental bugs in AMs in unsecure mode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
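A hedged sketch of the verification the NM can perform once it holds the RM's master key in both secure and unsecure mode: recompute the HMAC over the container token's identifier bytes and compare it to the token's password. The class, names, and HMAC algorithm below are placeholders, not the actual NMContainerTokenSecretManager code:

{code:java}
// Placeholder sketch: verify a container start request against the master
// key the RM shared at NM registration/heartbeat.
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class ContainerTokenCheckSketch {
  private final SecretKeySpec masterKey; // shared by the RM with this NM

  public ContainerTokenCheckSketch(byte[] masterKeyBytes) {
    this.masterKey = new SecretKeySpec(masterKeyBytes, "HmacSHA1");
  }

  /** True if the token's password matches the HMAC of its identifier. */
  public boolean verify(byte[] tokenIdentifier, byte[] tokenPassword)
      throws Exception {
    Mac mac = Mac.getInstance("HmacSHA1");
    mac.init(masterKey);
    byte[] expected = mac.doFinal(tokenIdentifier);
    return MessageDigest.isEqual(expected, tokenPassword); // constant-time
  }
}
{code}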
[jira] [Commented] (YARN-629) Make YarnRemoteException not be rooted at IOException
[ https://issues.apache.org/jira/browse/YARN-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647229#comment-13647229 ] Xuan Gong commented on YARN-629: bq: Need a specific test which validates that the final exception is same as the one thrown on the remote side. Can you check if TestNodeManagerResync and TestContainerManager can be changed to validate this? Look for refs to YARN-142 in those tests. Should it be YarnRemoteException? Make YarnRemoteException not be rooted at IOException - Key: YARN-629 URL: https://issues.apache.org/jira/browse/YARN-629 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-629.1.patch, YARN-629.2.patch After HADOOP-9343, it should be possible for YarnException to not be rooted at IOException -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-629) Make YarnRemoteException not be rooted at IOException
[ https://issues.apache.org/jira/browse/YARN-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647228#comment-13647228 ] Xuan Gong commented on YARN-629: bq: Need a specific test which validates that the final exception is same as the one thrown on the remote side. Can you check if TestNodeManagerResync and TestContainerManager can be changed to validate this? Look for refs to YARN-142 in those tests. bq: Please create a sister JIRA in MapReduce, I'll review MR changes there but commit that together with this patch. Created the MR ticket: MAPREDUCE-5204. bq: Investigated the test-failures? They are all passing now. bq: TestClientTokens: Explicitly catch YarnRemoteException and fail instead of instance checks? Can't understand the changes in TestClientRMService, explain please why we need to only now do e.getCause().getMessage() to capture the remote exception message. YarnRemoteException is not rooted at IOException now. ugi.doAs() throws IOException, InterruptedException, UndeclaredThrowableException, etc., but does not throw YarnRemoteException, so we cannot explicitly catch YarnRemoteException. I think if YarnRemoteException is thrown inside doAs(), it will be wrapped, so calling getCause() will get it back. Make YarnRemoteException not be rooted at IOException - Key: YARN-629 URL: https://issues.apache.org/jira/browse/YARN-629 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-629.1.patch, YARN-629.2.patch After HADOOP-9343, it should be possible for YarnException to not be rooted at IOException -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
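A hedged sketch of the unwrapping discussed above: once YarnRemoteException is no longer an IOException, UserGroupInformation.doAs() surfaces it as the cause of an UndeclaredThrowableException, so a test must look at getCause() to check the remote message. The helper below is illustrative, not the patch's test code:

{code:java}
// Illustrative helper: run an RPC call under a UGI and recover the
// message of a non-IOException thrown on the remote side.
import java.lang.reflect.UndeclaredThrowableException;
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

public class DoAsUnwrapSketch {
  public static String callAndGetRemoteMessage(UserGroupInformation ugi,
      PrivilegedExceptionAction<Void> rpcCall) throws Exception {
    try {
      ugi.doAs(rpcCall);
      return null; // no exception came back from the remote side
    } catch (UndeclaredThrowableException e) {
      // doAs() wraps checked exceptions it cannot rethrow; unwrap it.
      return e.getCause().getMessage();
    }
  }
}
{code}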
[jira] [Updated] (YARN-18) Make locality in YARN's container assignment and task scheduling pluggable for other deployment topology
[ https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-18: --- Attachment: YARN-18-v6.1.patch Addressed a minor javadoc issue in the v6.1 patch. Make locality in YARN's container assignment and task scheduling pluggable for other deployment topology - Key: YARN-18 URL: https://issues.apache.org/jira/browse/YARN-18 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.3-alpha Reporter: Junping Du Assignee: Junping Du Labels: features Attachments: HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, MAPREDUCE-4309.patch, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, MAPREDUCE-4309-v7.patch, Pluggable topologies with NodeGroup for YARN.pdf, YARN-18.patch, YARN-18-v2.patch, YARN-18-v3.1.patch, YARN-18-v3.2.patch, YARN-18-v3.patch, YARN-18-v4.1.patch, YARN-18-v4.2.patch, YARN-18-v4.3.patch, YARN-18-v4.patch, YARN-18-v5.1.patch, YARN-18-v5.patch, YARN-18-v6.1.patch, YARN-18-v6.patch Several classes in YARN’s container assignment and task scheduling algorithms that relate to data locality were updated to give preference to running a container on localities besides node-local and rack-local (like nodegroup-local). This proposes to make these data structures/algorithms pluggable, e.g. SchedulerNode, RMNodeImpl, etc. The inner class ScheduledRequests was made a package-level class so that it would be easier to create a subclass, ScheduledRequestsWithNodeGroup. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
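The pluggability being proposed boils down to letting a deployment insert an extra locality tier by subclassing. A self-contained toy model of that pattern follows; Host, Assigner, and the tier names are illustrative, while the real extension points are classes like SchedulerNode, RMNodeImpl, and ScheduledRequests.
{code:java}
// Toy model of pluggable locality tiers; not YARN's actual classes.
public class LocalityTiers {
  static class Host {
    final String name, nodeGroup, rack;
    Host(String name, String nodeGroup, String rack) {
      this.name = name; this.nodeGroup = nodeGroup; this.rack = rack;
    }
  }

  /** Default behaviour: node-local, then rack-local, then off-switch. */
  static class Assigner {
    String tierFor(Host node, Host dataHost) {
      if (node.name.equals(dataHost.name)) return "NODE_LOCAL";
      if (node.rack.equals(dataHost.rack)) return "RACK_LOCAL";
      return "OFF_SWITCH";
    }
  }

  /** Plugged-in subclass inserting a nodegroup tier between node and rack. */
  static class NodeGroupAssigner extends Assigner {
    @Override
    String tierFor(Host node, Host dataHost) {
      if (!node.name.equals(dataHost.name)
          && node.nodeGroup.equals(dataHost.nodeGroup)) {
        return "NODEGROUP_LOCAL";
      }
      return super.tierFor(node, dataHost);
    }
  }

  public static void main(String[] args) {
    Host n1 = new Host("n1", "g1", "r1");
    Host n2 = new Host("n2", "g1", "r1");
    System.out.println(new Assigner().tierFor(n1, n2));          // RACK_LOCAL
    System.out.println(new NodeGroupAssigner().tierFor(n1, n2)); // NODEGROUP_LOCAL
  }
}
{code}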
[jira] [Commented] (YARN-617) In unsecure mode, AM can fake resource requirements
[ https://issues.apache.org/jira/browse/YARN-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647242#comment-13647242 ] Vinod Kumar Vavilapalli commented on YARN-617: --
bq. Are you saying the goal is to auth container launches with the AM token too?
Yes. All communication with the NM is to be authenticated by the AMToken. We could keep it separate from startContainer() and stop/getStatus, but we want to solve YARN-613 too. Having the authentication via the container-token forces us to create a connection per container. You must have seen the gory MR ContainerLauncher resorting to tricks like creating lots of threads and opening and closing connections immediately to avoid hitting ulimits, etc. Some of that ugliness will go away if we perform all authentication using AMTokens and use ContainerTokens for authorization. Thanks for the tip on HADOOP-8783/HADOOP-8784.
In unsecure mode, AM can fake resource requirements - Key: YARN-617 URL: https://issues.apache.org/jira/browse/YARN-617 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Priority: Minor Attachments: YARN-617.20130501.1.patch, YARN-617.20130501.patch Without security, it is impossible to completely avoid AMs faking resources. We can at least make it as difficult as possible by using the same container tokens and the RM-NM shared key mechanism over the unauthenticated RM-NM channel. At a minimum, this will avoid accidental bugs in AMs in unsecure mode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
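Restated as code: a toy model of the authentication/authorization split being argued for, with one authentication check when the AM's connection to the NM is set up (standing in for DIGEST auth with the AMToken) and a cheap per-request check for startContainer() (the ContainerToken). Every name here is illustrative, not a real NM interface.
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model only: authenticate the channel once, authorize each launch,
// so many containers can be started over a single authenticated channel.
public class AuthnAuthzSplit {
  private final Map<String, String> amTokens =
      new ConcurrentHashMap<String, String>();

  /** Called on the RM-driven path to register an AM's secret with this NM. */
  public void registerAmToken(String appAttemptId, String amToken) {
    amTokens.put(appAttemptId, amToken);
  }

  /** Authentication: once, when the AM opens its connection to this NM. */
  public boolean authenticateConnection(String appAttemptId, String presented) {
    String expected = amTokens.get(appAttemptId);
    return expected != null && expected.equals(presented);
  }

  /** Authorization: per startContainer() request, via the container token. */
  public boolean authorizeStartContainer(byte[] tokenIdentifier,
                                         byte[] tokenPassword,
                                         TokenVerifier verifier) {
    return verifier.verify(tokenIdentifier, tokenPassword);
  }

  /** Pluggable verifier, e.g. the shared-key HMAC check sketched earlier. */
  public interface TokenVerifier {
    boolean verify(byte[] identifier, byte[] password);
  }
}
{code}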
[jira] [Commented] (YARN-18) Make locality in YARN's container assignment and task scheduling pluggable for other deployment topology
[ https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647249#comment-13647249 ] Hadoop QA commented on YARN-18: --- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581468/YARN-18-v6.1.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/857//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/857//console This message is automatically generated.
Make locality in YARN's container assignment and task scheduling pluggable for other deployment topology - Key: YARN-18 URL: https://issues.apache.org/jira/browse/YARN-18 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.3-alpha Reporter: Junping Du Assignee: Junping Du Labels: features Attachments: HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, MAPREDUCE-4309.patch, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, MAPREDUCE-4309-v7.patch, Pluggable topologies with NodeGroup for YARN.pdf, YARN-18.patch, YARN-18-v2.patch, YARN-18-v3.1.patch, YARN-18-v3.2.patch, YARN-18-v3.patch, YARN-18-v4.1.patch, YARN-18-v4.2.patch, YARN-18-v4.3.patch, YARN-18-v4.patch, YARN-18-v5.1.patch, YARN-18-v5.patch, YARN-18-v6.1.patch, YARN-18-v6.patch Several classes in YARN’s container assignment and task scheduling algorithms that relate to data locality were updated to give preference to running a container on localities besides node-local and rack-local (like nodegroup-local). This proposes to make these data structures/algorithms pluggable, e.g. SchedulerNode, RMNodeImpl, etc. The inner class ScheduledRequests was made a package-level class so that it would be easier to create a subclass, ScheduledRequestsWithNodeGroup. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-613) Create NM proxy per NM instead of per container
[ https://issues.apache.org/jira/browse/YARN-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647263#comment-13647263 ] Vinod Kumar Vavilapalli commented on YARN-613: --
bq. I just have general concerns with assuming the entire hadoop environment is trusted and thus introducing weaknesses at a global level. E.g. a weakness is introduced every time one entity shares a secret to validate a token created by another entity. Compromising one of hundreds or thousands of nodes shouldn't put the entire cluster at risk.
Agree with you in general. Read on.
bq. If I can gain access to one NM host and its keytab, I believe I can secretly launch a malicious NM?
That is true in general, and I am not sure how we can even contain such a break-in. I suppose going the way of the DataNode and starting the server on privileged ports would contain it [1]. If one can get hold of the keytab (owned by the YARN user), I suppose at that point one can launch the container-executor binary too, which gives root access. So it's all predicated on the secure setup not doing stupid things.
bq. NMs currently share a global key for container token secrets, but there is a JIRA to move to per-NM secrets, so sharing a global AM secret would be another step backwards.
Agreed.
bq. Exploring alternate avenues to avoid global trust: is passing, with the launch request, the AM token that is allowed to get status and stop the container not feasible?
Maybe it isn't clear in my proposal, but let me state it again anyway, mostly repeating what I just commented on YARN-617:
- Having the authentication via container-token forces us to create a connection per container.
- MR's ContainerLauncher, for example, resorts to tricks like creating lots of threads and opening and closing connections immediately to avoid hitting ulimits, etc.
- Most of that ugliness will go away if we perform all authentication using AMTokens for *all* AM-NM APIs and use ContainerTokens for authorization of startContainer() requests.
Maybe we should just do [1] above (privileged ports). To sum it up, I am open to suggestions. My fundamental requirements are:
- If possible, AMs should open only one connection (a secure one) to each NM, not one per container.
- All connections (all APIs) between the AM and NM should be authenticated (DIGEST-based at best here), and if possible without AMs having to latch on to things like ContainerTokens for potentially long periods.
Create NM proxy per NM instead of per container --- Key: YARN-613 URL: https://issues.apache.org/jira/browse/YARN-613 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Vinod Kumar Vavilapalli Currently a new NM proxy has to be created per container since the secure authentication is using a containertoken from the container. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
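The first requirement, one connection per NM, amounts to keying the proxy cache by NM address rather than by container. A hypothetical sketch of that caching, where NmProxy and newAuthenticatedProxy() stand in for the real RPC plumbing:
{code:java}
import java.net.InetSocketAddress;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative only: one cached, AM-token-authenticated proxy per NM
// address, not one per container.
public class NmProxyCache {
  private final ConcurrentMap<InetSocketAddress, NmProxy> cache =
      new ConcurrentHashMap<InetSocketAddress, NmProxy>();

  public NmProxy getProxy(InetSocketAddress nmAddress) {
    NmProxy proxy = cache.get(nmAddress);
    if (proxy == null) {
      NmProxy created = newAuthenticatedProxy(nmAddress);
      NmProxy raced = cache.putIfAbsent(nmAddress, created);
      proxy = (raced == null) ? created : raced;
    }
    // launching many containers on the same node now reuses this channel;
    // each startContainer() still carries its own ContainerToken, but for
    // authorization only
    return proxy;
  }

  private NmProxy newAuthenticatedProxy(InetSocketAddress addr) {
    // the real code would perform DIGEST authentication with the AM token
    // while setting up the connection
    return new NmProxy(addr);
  }

  /** Stand-in for an RPC proxy handle. */
  public static class NmProxy {
    public final InetSocketAddress address;
    NmProxy(InetSocketAddress address) { this.address = address; }
  }
}
{code}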
[jira] [Commented] (YARN-629) Make YarnRemoteException not be rooted at IOException
[ https://issues.apache.org/jira/browse/YARN-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647272#comment-13647272 ] Vinod Kumar Vavilapalli commented on YARN-629: --
bq. YarnRemoteException is not rooted at IOException now. For ugi.doAs(), it throws IOException, InterruptedException, UndeclaredThrowableException, etc., but does not throw YarnRemoteException, so we cannot explicitly catch YarnRemoteException.
Then why was this code catching YarnRemoteException *before* your patch? Try changing client.ping() to throw YarnRemoteException, like we do in real protocols.
bq. I think if YarnRemoteException is thrown inside doAs(), it will be wrapped, so calling getCause() will get it back.
Why didn't we need this before and only now?
bq. Should it be YarnRemoteException?
No, see NMNotYetReadyException and InvalidContainerException.
Make YarnRemoteException not be rooted at IOException - Key: YARN-629 URL: https://issues.apache.org/jira/browse/YARN-629 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-629.1.patch, YARN-629.2.patch After HADOOP-9343, it should be possible for YarnException to not be rooted at IOException -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
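For the "validate the final exception matches the remote one" test being asked for here, the shape could be as simple as the following self-contained JUnit 4 sketch; FakeClient.ping() is a hypothetical stand-in for a real RPC stub, and this does not use the actual TestNodeManagerResync/TestContainerManager machinery.
{code:java}
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.fail;

import org.junit.Test;

public class TestRemoteExceptionPropagation {
  private static final String REMOTE_MSG = "boom from the remote side";

  /** Stand-in that fails the way a real RPC stub would. */
  static class FakeClient {
    void ping() throws Exception {
      throw new Exception(REMOTE_MSG); // real code: YarnRemoteException
    }
  }

  @Test
  public void testRemoteMessageSurvivesTheTrip() {
    FakeClient client = new FakeClient();
    try {
      client.ping();
      fail("expected an exception from the remote side");
    } catch (Exception e) {
      // the client-side exception must carry the remote message intact
      assertEquals(REMOTE_MSG, e.getMessage());
    }
  }
}
{code}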