[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644327#comment-13644327 ]

Bikas Saha commented on YARN-45:
--------------------------------

My understanding is that the containers presented in the PreemptionMessage are going to be preempted by the RM some time in the near future if the RM cannot find free resources elsewhere. The AMs are not supposed to preempt the containers themselves, but they are encouraged to checkpoint and save work. The RM can always choose not to preempt these containers, so it would be sub-optimal for the AM to kill them.

If we want to add information beyond the set of containers-to-be-preempted, then I would prefer ResourceRequest (as in the original patch) and not Resource. Not only is that symmetric with the allocation path, it also allows the RM to provide additional information about where to free containers. A smarter RM could ask for resources to be preempted where the under-allocated job wants them, and a smart AM could help out by choosing containers close to the desired locations. Secondly, Resource is too amorphous by itself. Asking an AM to free 50GB does not tell it whether the RM needs 10*5GB or 50*1GB. Without that information the AM can end up freeing containers in a manner that does not help the RM meet the request of the under-allocated job, thus failing to meet quota and wasting work at the same time.

Scheduler feedback to AM to release containers
----------------------------------------------
Key: YARN-45
URL: https://issues.apache.org/jira/browse/YARN-45
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Chris Douglas
Assignee: Carlo Curino
Attachments: YARN-45.patch, YARN-45.patch, YARN-45.patch, YARN-45.patch, YARN-45.patch, YARN-45_summary_of_alternatives.pdf

The ResourceManager strikes a balance between cluster utilization and strict enforcement of resource invariants in the cluster. Individual allocations of containers must be reclaimed (or reserved) to restore the global invariants when cluster load shifts. In some cases, the ApplicationMaster can respond to fluctuations in resource availability without losing the work already completed by that task (MAPREDUCE-4584). Supplying it with this information would be helpful for overall cluster utilization [1]. To this end, we want to establish a protocol for the RM to ask the AM to release containers.

[1] http://research.yahoo.com/files/yl-2012-003.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
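Bikas's 50GB example can be made concrete with a small sketch (hypothetical helper names, not YARN API): a bare aggregate admits many container decompositions, which is exactly the ambiguity a ResourceRequest-style (capability, numContainers) pair removes.

```java
// Hypothetical sketch, not YARN API: why a bare aggregate Resource is ambiguous.
class PreemptionShape {
    /** Total memory (GB) freed by releasing 'count' containers of 'sizeGb' each. */
    static int totalGb(int count, int sizeGb) {
        return count * sizeGb;
    }

    /** A (count, size) pair, as a ResourceRequest-style message would carry. */
    static boolean satisfiesAggregate(int count, int sizeGb, int wantedGb) {
        return totalGb(count, sizeGb) >= wantedGb;
    }
}
```

Both (10, 5GB) and (50, 1GB) satisfy a bare "free 50GB" request, but only one of them yields containers that an under-allocated job wanting 5GB tasks can actually use.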
[jira] [Commented] (YARN-576) RM should not allow registrations from NMs that do not satisfy minimum scheduler allocations
[ https://issues.apache.org/jira/browse/YARN-576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644408#comment-13644408 ]

Hudson commented on YARN-576:
-----------------------------

Integrated in Hadoop-Yarn-trunk #198 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/198/])
YARN-576. Modified ResourceManager to reject NodeManagers that don't satisfy minimum resource requirements. Contributed by Kenji Kikushima. (Revision 1476824)

Result = SUCCESS
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1476824
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestNMExpiry.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestRMNMRPCResponseId.java

RM should not allow registrations from NMs that do not satisfy minimum scheduler allocations
--------------------------------------------------------------------------------------------
Key: YARN-576
URL: https://issues.apache.org/jira/browse/YARN-576
Project: Hadoop YARN
Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Kenji Kikushima
Labels: newbie
Fix For: 2.0.5-beta
Attachments: YARN-576-2.patch, YARN-576-3.patch, YARN-576-4.patch, YARN-576.patch

If the minimum resource allocation configured for the RM scheduler is 1 GB, the RM should drop all NMs that register with a total capacity of less than 1 GB.
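The check this change adds can be sketched roughly as follows (names are illustrative, not the actual ResourceTrackerService code): an NM whose advertised capacity cannot host even one minimum-sized container is refused at registration time.

```java
// Illustrative sketch of the YARN-576 registration check, not the actual
// ResourceTrackerService code: reject an NM whose total capacity is below
// the scheduler's configured minimum allocation.
class MinAllocationCheck {
    static boolean acceptRegistration(int nmMemoryMb, int minAllocMb) {
        // An NM that cannot fit one minimum-sized container is useless to
        // the scheduler, so its registration is rejected outright.
        return nmMemoryMb >= minAllocMb;
    }
}
```

With a 1 GB (1024 MB) minimum allocation, a node registering with 512 MB total capacity is dropped, while a 2048 MB node is accepted.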
[jira] [Commented] (YARN-576) RM should not allow registrations from NMs that do not satisfy minimum scheduler allocations
[ https://issues.apache.org/jira/browse/YARN-576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644462#comment-13644462 ]

Hudson commented on YARN-576:
-----------------------------

Integrated in Hadoop-Hdfs-trunk #1387 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1387/])
YARN-576. Modified ResourceManager to reject NodeManagers that don't satisfy minimum resource requirements. Contributed by Kenji Kikushima. (Revision 1476824)

Result = FAILURE
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1476824
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestNMExpiry.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestRMNMRPCResponseId.java
[jira] [Commented] (YARN-576) RM should not allow registrations from NMs that do not satisfy minimum scheduler allocations
[ https://issues.apache.org/jira/browse/YARN-576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644498#comment-13644498 ]

Hudson commented on YARN-576:
-----------------------------

Integrated in Hadoop-Mapreduce-trunk #1414 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1414/])
YARN-576. Modified ResourceManager to reject NodeManagers that don't satisfy minimum resource requirements. Contributed by Kenji Kikushima. (Revision 1476824)

Result = SUCCESS
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1476824
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestNMExpiry.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestRMNMRPCResponseId.java
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644626#comment-13644626 ]

Chris Douglas commented on YARN-45:
-----------------------------------

I'm also a fan of {{ResourceRequest}}, but we're not really using all its features yet. Similarly, {{Resource}} bakes in the fungibility of resources, which could be awkward as the RM accommodates richer requests (as in YARN-392). We could use {{ResourceRequest}}- so the API is there for extensions- but only populate the capability as an aggregate. With the convention that -1 containers can mean "packed as you see fit", it expresses {{Resource}} (which we need in practice, since the priorities for requests don't always [match the preemption order|https://issues.apache.org/jira/browse/YARN-569?focusedCommentId=13638825&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13638825]), which is sufficient for the current schedulers. If we're adding the contract back with the set of containers, the [semantics|https://issues.apache.org/jira/browse/YARN-45?focusedCommentId=13628950&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13628950] we discussed earlier still seem OK.
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644664#comment-13644664 ]

Carlo Curino commented on YARN-45:
----------------------------------

[~acmurthy] I see your point, which was in fact reflected more clearly in our initial proposal. The only caveat is not to make this a capacity-only protocol (which you are not doing, but I wanted to reiterate that there are other use cases). I like [~bikassaha]'s and [~chris.douglas]'s spin on it (i.e., using ResourceRequest), as it gives us the immediate capacity angle but will eventually allow us to evolve the implementations towards something richer (e.g., the preempt-on-behalf-of-a-specific-request that Bikas considered before) without impacting the protocols.

I think there is a slightly cleaner version of Chris's proposal: use ResourceRequest, and to represent a request that only cares about overall capacity, express the ResourceRequest as a multiple of the minimum allocation (i.e., if we want 100GB of RAM back and the minimum container size is 1GB, we ask for 100 x 1GB containers). This achieves Chris's proposal with a slightly prettier use of ResourceRequest. Note that there are size-matching issues (e.g., you have 1.5GB containers and I ask for 1x1GB containers), but we have very similar problems with Resource. As Chris pointed out, [these semantics|https://issues.apache.org/jira/browse/YARN-45?focusedCommentId=13628950&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13628950], plus the use of ResourceRequest I propose here as a minor variation on Chris's take, should cover Arun's and Bikas's comments (and, I believe, also the prior 45+ messages). Thoughts?
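Carlo's encoding of aggregate capacity as a multiple of the minimum allocation reduces to a ceiling division; a minimal sketch with a hypothetical helper (not committed YARN code):

```java
// Sketch of the proposal above (hypothetical helper, not committed YARN code):
// encode "free wantedMb of memory" as a ResourceRequest for N containers of
// the minimum allocation, where N = ceil(wantedMb / minAllocMb).
class CapacityAsRequest {
    static int containersFor(int wantedMb, int minAllocMb) {
        if (minAllocMb <= 0) {
            throw new IllegalArgumentException("minAllocMb must be positive");
        }
        return (wantedMb + minAllocMb - 1) / minAllocMb; // ceiling division
    }
}
```

Asking for 100 GB back with a 1 GB (1024 MB) minimum yields 100 containers. The size-matching slack Carlo mentions shows up when running containers are not exact multiples of the minimum, e.g. 1536 MB wanted still rounds up to 2 minimum-sized containers.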
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644675#comment-13644675 ]

Chris Douglas commented on YARN-45:
-----------------------------------

bq. we could express the ResourceRequest as a multiple of the minimum allocation

+1. This is better.
[jira] [Commented] (YARN-513) Verify all clients will wait for RM to restart
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644695#comment-13644695 ]

Xuan Gong commented on YARN-513:
--------------------------------

bq. We could try to reuse existing RetryPolicy etc inside RMClient as long as we maintain the RMClient abstraction.

Reused the RetryPolicy in the new patch. The RetryInvocationHandler provides the retry logic in its invoke method, so we can reuse that.

bq. Are we not missing an RMClient.disconnect()? This one would internally stop the proxy?

Yes, we need that. Added the disconnect code in the new patch.

bq. Looks like NMStatusUpdater.getRMClient() can be removed because createRMClient() is being overridden by all tests.

Removed in the new patch.

bq. Why are we throwing YARNException?

The original code throws YarnException, and I want to stay consistent. I think we will change the exception via YARN-142.

bq. Is any test explicitly testing the new code with a real RM? How about manually doing it?

Tested the new code on a single-node cluster.

Verify all clients will wait for RM to restart
----------------------------------------------
Key: YARN-513
URL: https://issues.apache.org/jira/browse/YARN-513
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Bikas Saha
Assignee: Xuan Gong
Attachments: YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch

When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up.
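The retry behavior under discussion (the patch itself reuses Hadoop's RetryPolicy via RetryInvocationHandler) has roughly this shape; a standalone, simplified sketch with made-up names, not the actual RMClient code:

```java
import java.util.concurrent.Callable;

// Simplified sketch of "clients wait for the RM to restart": bounded retries
// with a fixed sleep while the RM is unreachable. The real patch gets this
// behavior from Hadoop's RetryPolicy/RetryInvocationHandler machinery.
class RmRetry {
    static <T> T withRetries(Callable<T> call, int maxAttempts, long sleepMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();       // e.g. an RPC to the RM
            } catch (Exception e) {       // connection refused while RM is down
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMs);
                }
            }
        }
        throw last;                       // RM never came back within budget
    }
}
```

A caller wraps each RM-facing RPC in withRetries; if the RM comes back within the retry budget the call succeeds transparently, otherwise the last failure propagates.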
[jira] [Updated] (YARN-513) Verify all clients will wait for RM to restart
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuan Gong updated YARN-513:
---------------------------
Attachment: YARN-513.3.patch
[jira] [Updated] (YARN-326) Add multi-resource scheduling to the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated YARN-326:
----------------------------
Attachment: FairSchedulerDRFDesignDoc-1.pdf

Uploading a new design doc to reflect the discussion.

Add multi-resource scheduling to the fair scheduler
---------------------------------------------------
Key: YARN-326
URL: https://issues.apache.org/jira/browse/YARN-326
Project: Hadoop YARN
Issue Type: New Feature
Components: scheduler
Affects Versions: 2.0.2-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Attachments: FairSchedulerDRFDesignDoc-1.pdf, FairSchedulerDRFDesignDoc.pdf, YARN-326.patch, YARN-326.patch

With YARN-2 in, the capacity scheduler has the ability to schedule based on multiple resources, using dominant resource fairness. The fair scheduler should be able to do multiple resource scheduling as well, also using dominant resource fairness. More details to come on how the corner cases with fair scheduler configs such as min and max resources will be handled.
[jira] [Commented] (YARN-326) Add multi-resource scheduling to the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644734#comment-13644734 ]

Karthik Kambatla commented on YARN-326:
---------------------------------------

Sandy - thanks for updating the doc. The approach is clear and fairly straightforward. Nit: might want to add other DRF follow-up papers to the references.
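For reference, the core comparison dominant resource fairness builds on can be sketched as follows (a simplified illustration, not the FairScheduler code): a job's dominant share is its largest share of any single resource, and the scheduler favors the job with the smaller dominant share.

```java
// Simplified DRF sketch (illustrative, not FairScheduler code).
class DominantShare {
    /** A job's dominant share: the larger of its memory and CPU shares. */
    static double dominantShare(double usedMemMb, double usedVcores,
                                double clusterMemMb, double clusterVcores) {
        return Math.max(usedMemMb / clusterMemMb, usedVcores / clusterVcores);
    }

    /** Under DRF, the job with the smaller dominant share goes first. */
    static boolean scheduleABeforeB(double aDominantShare, double bDominantShare) {
        return aDominantShare < bDominantShare;
    }
}
```

For example, on a 100 GB / 10-vcore cluster, a job using 30 GB and 2 vcores is memory-dominated (share 0.3), while a job using 10 GB and 5 vcores is CPU-dominated (share 0.5); DRF would schedule the first job next.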
[jira] [Commented] (YARN-326) Add multi-resource scheduling to the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644753#comment-13644753 ]

Sandy Ryza commented on YARN-326:
---------------------------------

Uploaded a new patch that reflects the design changes.
[jira] [Commented] (YARN-326) Add multi-resource scheduling to the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644759#comment-13644759 ]

Hadoop QA commented on YARN-326:
--------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12581014/YARN-326-1.patch
against trunk revision .

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/837//console

This message is automatically generated.
[jira] [Commented] (YARN-582) Restore appToken and clientToken for app attempt after RM restart
[ https://issues.apache.org/jira/browse/YARN-582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644770#comment-13644770 ]

Hadoop QA commented on YARN-582:
--------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12581012/YARN-582.1.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/836//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/836//console

This message is automatically generated.

Restore appToken and clientToken for app attempt after RM restart
-----------------------------------------------------------------
Key: YARN-582
URL: https://issues.apache.org/jira/browse/YARN-582
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Bikas Saha
Assignee: Jian He
Attachments: YARN-582.1.patch

These need to be saved and restored on a per-app-attempt basis. This is required only when work-preserving restart is implemented for secure clusters. In a non-work-preserving restart, app attempts are killed, so this does not matter.
[jira] [Commented] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644784#comment-13644784 ]

Robert Joseph Evans commented on YARN-528:
------------------------------------------

Thanks for doing this Sid. I started pulling on the string and there was just too much involved, so I had to stop.

Make IDs read only
------------------
Key: YARN-528
URL: https://issues.apache.org/jira/browse/YARN-528
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
Attachments: y528_AppIdPart_01_Refactor.txt, y528_AppIdPart_02_AppIdChanges.txt, y528_AppIdPart_03_fixUsage.txt, y528_ApplicationIdComplete_WIP.txt, YARN-528.txt, YARN-528.txt

I really would like to rip out most if not all of the abstraction layer that sits in-between Protocol Buffers, the RPC, and the actual user code. We have no plans to support any other serialization type, and the abstraction layer just makes it more difficult to change protocols, makes changing them more error prone, and slows down the objects themselves. Completely doing that is a lot of work. This JIRA is a first step towards that. It makes the various ID objects immutable. If this patch is well received I will try to go through other objects/classes of objects and update them in a similar way. This is probably the last time we will be able to make a change like this before 2.0 stabilizes and YARN APIs will not be able to be changed.
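What "read only" means in practice can be sketched like this (an illustrative shape, not the actual YARN-528 patch): final fields set once in the constructor, getters only, and value-based equals/hashCode, so an ID can be shared across threads and used as a map key safely.

```java
// Illustrative immutable-ID sketch, not the actual YARN-528 code: no setters,
// final fields, value semantics. Once constructed it can never change.
final class ImmutableAppId {
    private final long clusterTimestamp;
    private final int id;

    ImmutableAppId(long clusterTimestamp, int id) {
        this.clusterTimestamp = clusterTimestamp;
        this.id = id;
    }

    long getClusterTimestamp() { return clusterTimestamp; }
    int getId() { return id; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ImmutableAppId)) return false;
        ImmutableAppId other = (ImmutableAppId) o;
        return clusterTimestamp == other.clusterTimestamp && id == other.id;
    }

    @Override
    public int hashCode() {
        return 31 * Long.hashCode(clusterTimestamp) + id;
    }
}
```

Beyond API safety, immutability lets callers drop the defensive copies that mutable PB-backed records force on them, which is part of the speedup the description alludes to.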
[jira] [Reopened] (YARN-579) Make ApplicationToken part of Container's token list to help RM-restart
[ https://issues.apache.org/jira/browse/YARN-579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp reopened YARN-579: -- This has broken secure clusters. The AM is unable to find the token to register with the RM. I've debugged it far enough to see that localization has put the token in the nm-private dir, so it looks like the AM has amnesia when it connects to the RM. {noformat} 2013-04-29 17:47:02,666 DEBUG [IPC Client (4914628) connection to $RM:8030 from $USER] org.apache.hadoop.ipc.Client: IPC Client (4914628) connection to $RM:8030 from $USER: stopped, remaining connections 1 2013-04-29 17:47:02,667 ERROR [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Exception while registering java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:128) at org.apache.hadoop.yarn.api.impl.pb.client.AMRMProtocolPBClientImpl.registerApplicationMaster(AMRMProtocolPBClientImpl.java:103) at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.register(RMCommunicator.java:153) at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.start(RMCommunicator.java:112) at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.start(RMContainerAllocator.java:211) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.start(MRAppMaster.java:797) at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.start(MRAppMaster.java:1014) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1369) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1365) at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1318) Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): SIMPLE authentication is not enabled. Available:[KERBEROS, DIGEST] at org.apache.hadoop.ipc.Client.call(Client.java:1229) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) at com.sun.proxy.$Proxy28.registerApplicationMaster(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.AMRMProtocolPBClientImpl.registerApplicationMaster(AMRMProtocolPBClientImpl.java:100) ... 12 more 2013-04-29 17:47:02,668 ERROR [main] org.apache.hadoop.yarn.service.CompositeService: Error starting services org.apache.hadoop.mapreduce.v2.app.MRAppMaster org.apache.hadoop.yarn.YarnException: java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.register(RMCommunicator.java:166) at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.start(RMCommunicator.java:112) at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.start(RMContainerAllocator.java:211) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.start(MRAppMaster.java:797) at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.start(MRAppMaster.java:1014) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1369) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1365) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1318) Caused by: java.lang.reflect.UndeclaredThrowableException at 
org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:128) at org.apache.hadoop.yarn.api.impl.pb.client.AMRMProtocolPBClientImpl.registerApplicationMaster(AMRMProtocolPBClientImpl.java:103) at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.register(RMCommunicator.java:153) ... 11 more Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): SIMPLE authentication is not enabled. Available:[KERBEROS, DIGEST] at org.apache.hadoop.ipc.Client.call(Client.java:1229) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) at com.sun.proxy.$Proxy28.registerApplicationMaster(Unknown Source)
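The stack trace above is easier to read with a model of the handshake: a secure RPC client only offers token-based (DIGEST) authentication if it can find a token matching the server's address, and otherwise falls back to SIMPLE, which a secure RM rejects. A minimal sketch of that selection logic follows; it is purely illustrative, not Hadoop's actual RPC code, and all names are hypothetical.

```python
# Conceptual sketch of why a "missing" AMRM token surfaces as
# "SIMPLE authentication is not enabled. Available:[KERBEROS, DIGEST]":
# with no token for the RM's address, the client can only offer SIMPLE.

def select_auth_method(client_tokens, server_address, server_methods):
    """Pick an auth method the way a secure handshake conceptually does."""
    if server_address in client_tokens and "DIGEST" in server_methods:
        return "DIGEST"  # token-based auth (e.g. the AMRM token)
    if "SIMPLE" in server_methods:
        return "SIMPLE"  # plain username, only offered in unsecure mode
    raise PermissionError(
        "SIMPLE authentication is not enabled. Available:%s" % server_methods)

# The AM's credentials are empty (the token was localized to disk but never
# loaded into the UGI), so negotiation against the secure RM fails:
tokens = {}
try:
    select_auth_method(tokens, "$RM:8030", ["KERBEROS", "DIGEST"])
except PermissionError as e:
    failure = str(e)
```

Under this model, the bug is entirely on the client side: the server's offer never changes, and the fix is making the AM load the localized token before registering.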
[jira] [Commented] (YARN-575) ContainerManager APIs should be user accessible
[ https://issues.apache.org/jira/browse/YARN-575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644793#comment-13644793 ] Daryn Sharp commented on YARN-575: -- I agree with your 2nd point; I think allowing users to directly stop containers will lead to problems. ContainerManager APIs should be user accessible --- Key: YARN-575 URL: https://issues.apache.org/jira/browse/YARN-575 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.0.4-alpha Reporter: Siddharth Seth Assignee: Vinod Kumar Vavilapalli Priority: Critical Auth for ContainerManager is based on the containerId being accessed - since this is what is used to launch containers (There's likely another jira somewhere to change this to not be containerId based). What this also means is that the API is effectively not usable with kerberos credentials. Also, it should be possible to use this API with some generic tokens (RMDelegation?), instead of with Container specific tokens. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-617) In unsecure mode, AM can fake resource requirements
[ https://issues.apache.org/jira/browse/YARN-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644795#comment-13644795 ] Daryn Sharp commented on YARN-617: -- Does there really need to be different NM behavior? I.e., why can't the NM always require container tokens regardless of the security setting? In unsecure mode, AM can fake resource requirements - Key: YARN-617 URL: https://issues.apache.org/jira/browse/YARN-617 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Minor Without security, it is impossible to completely avoid AMs faking resources. We can at least make it as difficult as possible by using the same container tokens and the RM-NM shared key mechanism over an unauthenticated RM-NM channel. At a minimum, this will avoid accidental bugs in AMs in unsecure mode.
[jira] [Commented] (YARN-582) Restore appToken and clientToken for app attempt after RM restart
[ https://issues.apache.org/jira/browse/YARN-582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644805#comment-13644805 ] Daryn Sharp commented on YARN-582: -- I've only glanced over the patch, but do these tokens actually need to be handled specially? Is it feasible to handle all tokens in an opaque credentials within the store? I think that may reduce the copy-n-paste code throughout the stores for restoring these tokens. Restore appToken and clientToken for app attempt after RM restart - Key: YARN-582 URL: https://issues.apache.org/jira/browse/YARN-582 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-582.1.patch These need to be saved and restored on a per app attempt basis. This is required only when work preserving restart is implemented for secure clusters. In non-preserving restart app attempts are killed and so this does not matter.
[jira] [Commented] (YARN-613) Create NM proxy per NM instead of per container
[ https://issues.apache.org/jira/browse/YARN-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644809#comment-13644809 ] Daryn Sharp commented on YARN-613: -- Question: How do you plan for NMs to authenticate the AM tokens? Create NM proxy per NM instead of per container --- Key: YARN-613 URL: https://issues.apache.org/jira/browse/YARN-613 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Vinod Kumar Vavilapalli Currently a new NM proxy has to be created per container since the secure authentication is using a containertoken from the container.
[jira] [Commented] (YARN-617) In unsecure mode, AM can fake resource requirements
[ https://issues.apache.org/jira/browse/YARN-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644814#comment-13644814 ] Vinod Kumar Vavilapalli commented on YARN-617: -- bq. Does there really need to be different NM behavior? I.e., why can't the NM always require container tokens regardless of the security setting? That is what I meant in my points above. ContainerTokens will always be sent irrespective of security and are used for *authorization*. I just put them as separate points to highlight that in secure mode, we also use ContainerTokens for *authentication*. In unsecure mode, AM can fake resource requirements - Key: YARN-617 URL: https://issues.apache.org/jira/browse/YARN-617 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Minor Without security, it is impossible to completely avoid AMs faking resources. We can at least make it as difficult as possible by using the same container tokens and the RM-NM shared key mechanism over an unauthenticated RM-NM channel. At a minimum, this will avoid accidental bugs in AMs in unsecure mode.
[jira] [Commented] (YARN-613) Create NM proxy per NM instead of per container
[ https://issues.apache.org/jira/browse/YARN-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644816#comment-13644816 ] Vinod Kumar Vavilapalli commented on YARN-613: -- bq. Question: How do you plan for NMs to authenticate the AM tokens? I thought I covered it but missed stating that - RM will share the underlying secret key corresponding to AM tokens as part of node-registration just like the one corresponding to ContainerTokens. Create NM proxy per NM instead of per container --- Key: YARN-613 URL: https://issues.apache.org/jira/browse/YARN-613 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Vinod Kumar Vavilapalli Currently a new NM proxy has to be created per container since the secure authentication is using a containertoken from the container.
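The shared-key scheme described in the comment above can be sketched as follows. This is a conceptual model, not YARN's implementation, and all names are hypothetical: the RM signs tokens with a master key and hands that key to the NM during node registration, after which the NM can authenticate one connection per AM rather than one per container.

```python
# Sketch of the shared-key idea (hypothetical names, illustrative only):
# the RM gives the NM a secret master key at node registration, and the NM
# can then authenticate any token whose HMAC was produced with that key.
import hashlib
import hmac
import os

MASTER_KEY = os.urandom(32)  # generated by the RM, one per key roll

def rm_issue_token(payload):
    """RM side: sign a token (e.g. an AM token) with the master key."""
    return payload, hmac.new(MASTER_KEY, payload, hashlib.sha256).digest()

def nm_verify_token(payload, signature, shared_key):
    """NM side: verify using the key received during node registration."""
    expected = hmac.new(shared_key, payload, hashlib.sha256).digest()
    return hmac.compare_digest(expected, signature)

# Node registration conceptually delivers MASTER_KEY to the NM, so a single
# NM-level connection can be authenticated instead of one per container.
payload, sig = rm_issue_token(b"appattempt_1234_0001 resource=2048MB")
assert nm_verify_token(payload, sig, MASTER_KEY)
assert not nm_verify_token(b"appattempt_1234_0001 resource=9999MB", sig, MASTER_KEY)
```

The same mechanism covers the YARN-617 point above: because verification is a cheap HMAC check, the NM can require tokens irrespective of the security setting.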
[jira] [Commented] (YARN-582) Restore appToken and clientToken for app attempt after RM restart
[ https://issues.apache.org/jira/browse/YARN-582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644820#comment-13644820 ] Vinod Kumar Vavilapalli commented on YARN-582: -- bq. Is it feasible to handle all tokens in an opaque credentials within the store? Agreed. But because there are two types of tokens - application level and application-attempt level, we should have two credential fields. Restore appToken and clientToken for app attempt after RM restart - Key: YARN-582 URL: https://issues.apache.org/jira/browse/YARN-582 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-582.1.patch These need to be saved and restored on a per app attempt basis. This is required only when work preserving restart is implemented for secure clusters. In non-preserving restart app attempts are killed and so this does not matter.
[jira] [Commented] (YARN-620) TestContainerLocalizer.testContainerLocalizerMain failed on branch-2
[ https://issues.apache.org/jira/browse/YARN-620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644826#comment-13644826 ] Jian He commented on YARN-620: -- checked, it works fine now TestContainerLocalizer.testContainerLocalizerMain failed on branch-2 - Key: YARN-620 URL: https://issues.apache.org/jira/browse/YARN-620 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-620.1.patch Argument(s) are different! Wanted: localFs.mkdir( /Users/jhe/hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestContainerLocalizer/0/usercache/yak/filecache, isA(org.apache.hadoop.fs.permission.FsPermission), false ); - at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestContainerLocalizer.testContainerLocalizerMain(TestContainerLocalizer.java:170) Actual invocation has different arguments: localFs.mkdir( file:/Users/jhe/hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestContainerLocalizer/0/usercache/yak/filecache, rwxr-xr-x, false ); - at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestContainerLocalizer.testContainerLocalizerMain(TestContainerLocalizer.java:162) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestContainerLocalizer.testContainerLocalizerMain(TestContainerLocalizer.java:170) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) -- 
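The Mockito failure in YARN-620 above is an argument-matcher mismatch: the test expected an unqualified path while the code passed a file:-qualified URI. A Python analogue of the same pitfall using unittest.mock, whose assert_called_with plays roughly the role of Mockito's verify (paths shortened for illustration):

```python
# Reproducing the shape of the failure: an exact-value expectation breaks
# when the code under test qualifies the path, while matching the actual
# (qualified) value with a wildcard for the permission argument passes.
from unittest import mock

fs = mock.Mock()
# What the code under test actually does: a fully qualified file: URI.
fs.mkdir("file:/target/usercache/yak/filecache", "rwxr-xr-x", False)

# Expecting the raw path fails, because the "file:" qualification differs:
try:
    fs.mkdir.assert_called_with("/target/usercache/yak/filecache",
                                "rwxr-xr-x", False)
    exact_match = True
except AssertionError:
    exact_match = False

# Matching the qualified value, with ANY standing in for the permission
# (like Mockito's isA(FsPermission)), succeeds:
fs.mkdir.assert_called_with("file:/target/usercache/yak/filecache",
                            mock.ANY, False)
```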
[jira] [Created] (YARN-624) Support gang scheduling in the AM RM protocol
Sandy Ryza created YARN-624: --- Summary: Support gang scheduling in the AM RM protocol Key: YARN-624 URL: https://issues.apache.org/jira/browse/YARN-624 Project: Hadoop YARN Issue Type: Sub-task Components: api, scheduler Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Per discussion on YARN-392 and elsewhere, gang scheduling, in which a scheduler runs a set of tasks when they can all be run at the same time, would be a useful feature for YARN schedulers to support. Currently, AMs can approximate this by holding on to containers until they get all the ones they need. However, this lends itself to deadlocks when different AMs are waiting on the same containers.
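The deadlock risk described in the issue, and the all-or-nothing check that gang scheduling would add, can be sketched with a toy model (hypothetical, not a scheduler API):

```python
# A minimal all-or-nothing ("gang") allocation check, versus AMs hoarding
# partial allocations until their full set arrives.

def gang_allocate(free_containers, gang_size):
    """Grant either the whole gang or nothing, so no AM strands a partial set."""
    return gang_size if free_containers >= gang_size else 0

# Two AMs each need 3 containers on a cluster with 4 free. With hoarding,
# each could grab 2 and wait forever on the other's containers (deadlock).
# With a gang check, one AM gets all 3 and can run to completion:
free = 4
granted_a = gang_allocate(free, 3)   # whole gang fits, granted
free -= granted_a
granted_b = gang_allocate(free, 3)   # refused outright rather than stranded
assert (granted_a, granted_b) == (3, 0)
```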
[jira] [Commented] (YARN-582) Restore appToken and clientToken for app attempt after RM restart
[ https://issues.apache.org/jira/browse/YARN-582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644828#comment-13644828 ] Jian He commented on YARN-582: -- Yes, the application-level token is stored along with the ApplicationSubmissionContext, so no additional handling is needed for it. Restore appToken and clientToken for app attempt after RM restart - Key: YARN-582 URL: https://issues.apache.org/jira/browse/YARN-582 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-582.1.patch These need to be saved and restored on a per app attempt basis. This is required only when work preserving restart is implemented for secure clusters. In non-preserving restart app attempts are killed and so this does not matter.
[jira] [Commented] (YARN-582) Restore appToken and clientToken for app attempt after RM restart
[ https://issues.apache.org/jira/browse/YARN-582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644845#comment-13644845 ] Bikas Saha commented on YARN-582: - The RMStore stores applications and their attempts, and it is used to restore applications and their attempts from the data that they had earlier stored. This allows the recovery code to follow existing code paths to the fullest extent and prevents recovery logic from diverging from the normal code path. So I would like to avoid storing tokens separately from apps/attempts and then having to manage their relationship later on during recovery. As for saving the appToken and clientToken, I agree it would be nice to have a single object store all attempt tokens in one place. The ApplicationSubmissionContext does that for app tokens. Restore appToken and clientToken for app attempt after RM restart - Key: YARN-582 URL: https://issues.apache.org/jira/browse/YARN-582 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-582.1.patch These need to be saved and restored on a per app attempt basis. This is required only when work preserving restart is implemented for secure clusters. In non-preserving restart app attempts are killed and so this does not matter.
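The "opaque credentials" idea under discussion can be sketched like this. Names are hypothetical and pickle stands in for Hadoop's credentials serialization: each attempt record carries a single serialized blob of all its tokens, saved and restored with the attempt itself rather than per token.

```python
# Sketch: the store treats an attempt's tokens as one opaque blob, so no
# per-token copy-and-paste logic is needed in each store implementation.
import pickle

class AttemptState:
    def __init__(self, attempt_id, credentials):
        self.attempt_id = attempt_id
        # Opaque to the store: just bytes, regardless of which tokens exist.
        self.credentials_blob = pickle.dumps(credentials)

    def restore_credentials(self):
        return pickle.loads(self.credentials_blob)

store = {}
state = AttemptState("appattempt_1_000001",
                     {"appToken": b"app-token-bytes",
                      "clientToken": b"client-token-bytes"})
store[state.attempt_id] = state  # saved alongside the attempt, not separately

restored = store["appattempt_1_000001"].restore_credentials()
assert set(restored) == {"appToken", "clientToken"}
```

Vinod's point above maps onto this sketch as two such blobs: one stored with the application record and one with each attempt record.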
[jira] [Commented] (YARN-579) Make ApplicationToken part of Container's token list to help RM-restart
[ https://issues.apache.org/jira/browse/YARN-579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644859#comment-13644859 ] Vinod Kumar Vavilapalli commented on YARN-579: -- I validated this on trunk; I can run it successfully even now. It seems like it is failing on branch-2. Something at the RPC level, I suppose; digging through... Make ApplicationToken part of Container's token list to help RM-restart --- Key: YARN-579 URL: https://issues.apache.org/jira/browse/YARN-579 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.4-alpha Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Fix For: 2.0.5-beta Attachments: YARN-579-20130422.1.txt, YARN-579-20130422.1_YARNChanges.txt Container is already persisted for helping RM restart. Instead of explicitly setting ApplicationToken in AM's env, if we change it to be in Container, we can avoid the env and can also help restart.
[jira] [Commented] (YARN-513) Verify all clients will wait for RM to restart
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644891#comment-13644891 ] Hadoop QA commented on YARN-513: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581001/YARN-513.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1366 javac compiler warnings (more than the trunk's current 1365 warnings). {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/838//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/838//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/838//console This message is automatically generated. 
Verify all clients will wait for RM to restart -- Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up.
[jira] [Updated] (YARN-618) Modify RM_INVALID_IDENTIFIER to a -ve number
[ https://issues.apache.org/jira/browse/YARN-618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-618: - Attachment: YARN-618.patch This patch changed RM_INVALID_IDENTIFIER to a -ve number and changed the tests accordingly. Modify RM_INVALID_IDENTIFIER to a -ve number - Key: YARN-618 URL: https://issues.apache.org/jira/browse/YARN-618 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-618.patch RM_INVALID_IDENTIFIER set to 0 doesn't sound right as many tests set it to 0. Probably a -ve number is what we want.
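The rationale for a negative sentinel can be shown in a few lines (illustrative only, assuming, as in YARN, that real RM identifiers derive from a startup timestamp and are therefore non-negative): 0 is a plausible real value, while any negative number is unambiguously invalid.

```python
# A negative sentinel cannot collide with a legitimate identifier,
# unlike 0, which tests were setting as a "real" value.
RM_INVALID_IDENTIFIER = -1

def is_valid_rm_id(rm_id):
    return rm_id != RM_INVALID_IDENTIFIER and rm_id >= 0

assert not is_valid_rm_id(RM_INVALID_IDENTIFIER)
assert is_valid_rm_id(0)              # a 0 sentinel would reject this real value
assert is_valid_rm_id(1367000000000)  # timestamp-like identifier
```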
[jira] [Commented] (YARN-618) Modify RM_INVALID_IDENTIFIER to a -ve number
[ https://issues.apache.org/jira/browse/YARN-618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644954#comment-13644954 ] Hadoop QA commented on YARN-618: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581054/YARN-618.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainersMonitor {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/839//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/839//console This message is automatically generated. Modify RM_INVALID_IDENTIFIER to a -ve number - Key: YARN-618 URL: https://issues.apache.org/jira/browse/YARN-618 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-618.patch RM_INVALID_IDENTIFIER set to 0 doesnt sound right as many tests set it to 0. Probably a -ve number is what we want. 
[jira] [Commented] (YARN-575) ContainerManager APIs should be user accessible
[ https://issues.apache.org/jira/browse/YARN-575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644964#comment-13644964 ] Siddharth Seth commented on YARN-575: - I'm fine going the route of getting container status from the RM - when required - assuming we keep the NM equivalent, though, for AMs to use. The AppTokens will be used for authentication as well as authorization for getContainerStatus calls? ContainerManager APIs should be user accessible --- Key: YARN-575 URL: https://issues.apache.org/jira/browse/YARN-575 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.0.4-alpha Reporter: Siddharth Seth Assignee: Vinod Kumar Vavilapalli Priority: Critical Auth for ContainerManager is based on the containerId being accessed - since this is what is used to launch containers (There's likely another jira somewhere to change this to not be containerId based). What this also means is that the API is effectively not usable with kerberos credentials. Also, it should be possible to use this API with some generic tokens (RMDelegation?), instead of with Container specific tokens.
[jira] [Commented] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644970#comment-13644970 ] Siddharth Seth commented on YARN-528: - bq. Thanks for doing this Sid. I started pulling on the string and there was just too much involved, so I had to stop. Any thoughts on the approach used in the patch? Making IDs immutable should be reasonably fast using this - changing the PB mechanisms for other classes is a different beast though. Make IDs read only -- Key: YARN-528 URL: https://issues.apache.org/jira/browse/YARN-528 Project: Hadoop YARN Issue Type: Sub-task Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Attachments: y528_AppIdPart_01_Refactor.txt, y528_AppIdPart_02_AppIdChanges.txt, y528_AppIdPart_03_fixUsage.txt, y528_ApplicationIdComplete_WIP.txt, YARN-528.txt, YARN-528.txt I really would like to rip out most if not all of the abstraction layer that sits in-between Protocol Buffers, the RPC, and the actual user code. We have no plans to support any other serialization type, and the abstraction layer just makes it more difficult to change protocols, makes changing them more error prone, and slows down the objects themselves. Completely doing that is a lot of work. This JIRA is a first step towards that. It makes the various ID objects immutable. If this patch is well received, I will try to go through other objects/classes of objects and update them in a similar way. This is probably the last time we will be able to make a change like this before 2.0 stabilizes, and YARN APIs will not be able to be changed.
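The immutability goal behind the patch can be illustrated with a frozen value type. This is a Python sketch of the idea only, not the PB-backed record classes the patch actually reworks, and the field names are assumptions:

```python
# An ID as an immutable value object: mutation is rejected at runtime,
# and instances are safely usable as map keys.
from dataclasses import FrozenInstanceError, dataclass

@dataclass(frozen=True)
class ApplicationId:
    cluster_timestamp: int
    id: int

app_id = ApplicationId(1367000000000, 42)
try:
    app_id.id = 43  # any write raises FrozenInstanceError
    mutated = True
except FrozenInstanceError:
    mutated = False
assert not mutated

# Value equality and hashability come for free with immutability:
assert {app_id: "running"}[ApplicationId(1367000000000, 42)] == "running"
```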
[jira] [Updated] (YARN-618) Modify RM_INVALID_IDENTIFIER to a -ve number
[ https://issues.apache.org/jira/browse/YARN-618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-618: - Attachment: YARN-618.1.patch Fixed the test failure. Modify RM_INVALID_IDENTIFIER to a -ve number - Key: YARN-618 URL: https://issues.apache.org/jira/browse/YARN-618 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-618.1.patch, YARN-618.patch RM_INVALID_IDENTIFIER set to 0 doesn't sound right as many tests set it to 0. Probably a -ve number is what we want.
[jira] [Updated] (YARN-326) Add multi-resource scheduling to the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-326: Attachment: YARN-326-1.patch Add multi-resource scheduling to the fair scheduler --- Key: YARN-326 URL: https://issues.apache.org/jira/browse/YARN-326 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: FairSchedulerDRFDesignDoc-1.pdf, FairSchedulerDRFDesignDoc.pdf, YARN-326-1.patch, YARN-326-1.patch, YARN-326.patch, YARN-326.patch With YARN-2 in, the capacity scheduler has the ability to schedule based on multiple resources, using dominant resource fairness. The fair scheduler should be able to do multiple resource scheduling as well, also using dominant resource fairness. More details to come on how the corner cases with fair scheduler configs such as min and max resources will be handled.
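Dominant resource fairness, the policy the design doc brings to the fair scheduler, ranks each job by its largest share across resource types and schedules the job whose dominant share is currently smallest. A small sketch with illustrative numbers (not the scheduler's actual data structures):

```python
# DRF in miniature: a job's dominant share is its biggest fractional claim
# on any single resource; the scheduler serves the lowest dominant share.

CLUSTER = {"memory": 100, "cpu": 100}

def dominant_share(usage):
    return max(usage[r] / CLUSTER[r] for r in CLUSTER)

def next_to_schedule(jobs):
    """Pick the job with the lowest dominant share."""
    return min(jobs, key=lambda j: dominant_share(jobs[j]))

# Job A is memory-heavy (dominant share 0.30), job B is cpu-heavy (0.20):
jobs = {"A": {"memory": 30, "cpu": 5},
        "B": {"memory": 5, "cpu": 20}}
assert dominant_share(jobs["A"]) == 0.30
assert next_to_schedule(jobs) == "B"  # B's 0.20 dominant share < A's 0.30
```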
[jira] [Commented] (YARN-506) Move to common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute
[ https://issues.apache.org/jira/browse/YARN-506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644996#comment-13644996 ] Hadoop QA commented on YARN-506: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12580366/YARN-506.commonfileutils.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/841//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/841//console This message is automatically generated. Move to common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute Key: YARN-506 URL: https://issues.apache.org/jira/browse/YARN-506 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: YARN-506.commonfileutils.2.patch, YARN-506.commonfileutils.patch Move to common utils described in HADOOP-9413 that work well cross-platform. -- This message is automatically generated by JIRA. 
[jira] [Commented] (YARN-618) Modify RM_INVALID_IDENTIFIER to a -ve number
[ https://issues.apache.org/jira/browse/YARN-618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644997#comment-13644997 ] Hadoop QA commented on YARN-618: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581060/YARN-618.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/840//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/840//console This message is automatically generated. Modify RM_INVALID_IDENTIFIER to a -ve number - Key: YARN-618 URL: https://issues.apache.org/jira/browse/YARN-618 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-618.1.patch, YARN-618.patch RM_INVALID_IDENTIFIER set to 0 doesnt sound right as many tests set it to 0. Probably a -ve number is what we want. -- This message is automatically generated by JIRA. 
[jira] [Updated] (YARN-513) Verify all clients will wait for RM to restart
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-513: --- Attachment: YARN-513.4.patch Fix -1 on javadoc warning Verify all clients will wait for RM to restart -- Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up.
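The behavior under test here, clients waiting for the RM to come back instead of failing fast, amounts to retry-with-deadline around the connect call. A minimal sketch with a hypothetical helper, not YARN's actual retry policy:

```python
# Retry connect() with a bounded wait, as NMs, AMs, and clients should do
# while the RM restarts. Intervals shortened for illustration.
import time

def wait_for_rm(connect, max_wait_s=1.0, interval_s=0.05):
    """Retry connect() until it succeeds or max_wait_s elapses."""
    deadline = time.monotonic() + max_wait_s
    while True:
        try:
            return connect()
        except ConnectionError:
            if time.monotonic() >= deadline:
                raise  # RM never came back within the wait window
            time.sleep(interval_s)

# Simulate an RM that becomes reachable on the third attempt:
attempts = {"n": 0}
def connect():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("RM restarting")
    return "registered"

assert wait_for_rm(connect) == "registered"
assert attempts["n"] == 3
```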
[jira] [Commented] (YARN-326) Add multi-resource scheduling to the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645015#comment-13645015 ] Hadoop QA commented on YARN-326: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581061/YARN-326-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/842//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/842//console This message is automatically generated. 
Add multi-resource scheduling to the fair scheduler --- Key: YARN-326 URL: https://issues.apache.org/jira/browse/YARN-326 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: FairSchedulerDRFDesignDoc-1.pdf, FairSchedulerDRFDesignDoc.pdf, YARN-326-1.patch, YARN-326-1.patch, YARN-326.patch, YARN-326.patch With YARN-2 in, the capacity scheduler has the ability to schedule based on multiple resources, using dominant resource fairness. The fair scheduler should be able to do multiple resource scheduling as well, also using dominant resource fairness. More details to come on how the corner cases with fair scheduler configs such as min and max resources will be handled. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
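Dominant resource fairness, the policy YARN-326 brings to the fair scheduler, orders applications by their largest per-resource share of the cluster. A minimal sketch of that computation; the vector layout and method names are illustrative, not the FairScheduler API:

```java
// Sketch of the DRF ordering criterion: an app's dominant share is
// max over resource types of usage[i] / clusterCapacity[i], and DRF
// offers the next container to the app with the smallest dominant share.
public class DrfSketch {
    static double dominantShare(double[] usage, double[] capacity) {
        double max = 0.0;
        for (int i = 0; i < usage.length; i++) {
            max = Math.max(max, usage[i] / capacity[i]);
        }
        return max;
    }

    public static void main(String[] args) {
        double[] capacity = {100.0, 200.0}; // e.g. 100 vCores, 200 GB
        double[] cpuHeavy = {30.0, 20.0};   // dominant in CPU: 30/100
        double[] memHeavy = {10.0, 80.0};   // dominant in memory: 80/200
        assert dominantShare(cpuHeavy, capacity) == 0.3;
        assert dominantShare(memHeavy, capacity) == 0.4;
        // cpuHeavy has the smaller dominant share, so DRF serves it next.
        System.out.println("cpuHeavy goes next under DRF");
    }
}
```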
[jira] [Commented] (YARN-506) Move to common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute
[ https://issues.apache.org/jira/browse/YARN-506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645023#comment-13645023 ] Hudson commented on YARN-506: - Integrated in Hadoop-trunk-Commit #3695 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3695/]) YARN-506. Move to common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute. Contributed by Ivan Mitic. (Revision 1477408) Result = SUCCESS suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1477408 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthScriptRunner.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutorWithMocks.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeHealthService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java Move to common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute Key: YARN-506 URL: https://issues.apache.org/jira/browse/YARN-506 Project: Hadoop YARN Issue Type: Bug 
Affects Versions: 3.0.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Fix For: 3.0.0 Attachments: YARN-506.commonfileutils.2.patch, YARN-506.commonfileutils.patch Move to common utils described in HADOOP-9413 that work well cross-platform. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-513) Verify all clients will wait for RM to restart
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645042#comment-13645042 ] Hadoop QA commented on YARN-513: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581065/YARN-513.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/843//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/843//console This message is automatically generated. 
Verify all clients will wait for RM to restart -- Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-142) Change YARN APIs to throw IOException
[ https://issues.apache.org/jira/browse/YARN-142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645060#comment-13645060 ] Siddharth Seth commented on YARN-142: - After HADOOP-9343, it should be possible for YarnException to not be rooted at IOException. So all methods can declare IOException and YarnException - and have the specializations of YarnException listed in the Javadoc. Change YARN APIs to throw IOException - Key: YARN-142 URL: https://issues.apache.org/jira/browse/YARN-142 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 0.23.3, 2.0.0-alpha Reporter: Siddharth Seth Assignee: Xuan Gong Priority: Blocker Attachments: YARN-142.1.patch, YARN-142.2.patch, YARN-142.3.patch, YARN-142.4.patch Ref: MAPREDUCE-4067 All YARN APIs currently throw YarnRemoteException. 1) This cannot be extended in its current form. 2) The RPC layer can throw IOExceptions. These end up showing up as UndeclaredThrowableExceptions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
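The API shape Siddharth describes can be sketched as follows. The stand-in YarnException and the submit/classify methods are hypothetical illustrations, not the real YARN protocol classes:

```java
import java.io.IOException;

// Sketch: once YarnException no longer extends IOException (post
// HADOOP-9343), a protocol method can declare both, so callers can tell
// RPC-level IO failures apart from YARN-level failures without hitting
// UndeclaredThrowableException.
public class ExceptionSketch {
    // Stand-in for the real YarnException, now rooted at Exception.
    static class YarnException extends Exception {
        YarnException(String msg) { super(msg); }
    }

    // A protocol method declares both exception types.
    static String submit(boolean rpcUp, boolean valid)
            throws IOException, YarnException {
        if (!rpcUp) throw new IOException("connection refused");
        if (!valid) throw new YarnException("invalid request");
        return "accepted";
    }

    // A caller can now distinguish the two failure modes cleanly.
    static String classify(boolean rpcUp, boolean valid) {
        try {
            return submit(rpcUp, valid);
        } catch (IOException e) {
            return "transport-error";
        } catch (YarnException e) {
            return "yarn-error";
        }
    }

    public static void main(String[] args) {
        assert classify(true, true).equals("accepted");
        assert classify(false, true).equals("transport-error");
        assert classify(true, false).equals("yarn-error");
        System.out.println("ok");
    }
}
```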
[jira] [Updated] (YARN-599) Refactoring submitApplication in ClientRMService and RMAppManager
[ https://issues.apache.org/jira/browse/YARN-599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-599: - Attachment: YARN-599.2.patch In the newer patch, I've updated the comments in ClientRMService and RMAppManager, and added audit logging for user and duplicate-ID exceptions. Refactoring submitApplication in ClientRMService and RMAppManager - Key: YARN-599 URL: https://issues.apache.org/jira/browse/YARN-599 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-599.1.patch, YARN-599.2.patch Currently, ClientRMService#submitApplication calls RMAppManager#handle, and consequently calls RMAppManager#submitApplication directly, though the code looks like scheduling an APP_SUBMIT event. In addition, the validation code before creating an RMApp instance is not well organized. Ideally, the dynamic validation, which depends on the RM's configuration, should be put in RMAppManager#submitApplication. RMAppManager#submitApplication is called by ClientRMService#submitApplication and RMAppManager#recover. Since the configuration may be changed after RM restarts, the validation needs to be done again even in recovery mode. Therefore, resource request validation, which is based on min/max resource limits, should be moved from ClientRMService#submitApplication to RMAppManager#submitApplication. On the other hand, the static validation, which is independent of the RM's configuration, should be put in ClientRMService#submitApplication, because it only needs to be done once, during the first submission. Furthermore, the try-catch flow in RMAppManager#submitApplication has a flaw: RMAppManager#submitApplication is not synchronized. 
If two application submissions with the same application ID enter the function, and one progresses to the completion of RMApp instantiation, and the other progresses to the completion of putting the RMApp instance into rmContext, the slower submission will cause an exception due to the duplicate application ID. However, the exception will cause the RMApp instance already in rmContext (belonging to the faster submission) to be rejected under the current code flow. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
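The duplicate-ID race described in this issue can be closed by making the check-and-register step atomic. A sketch with simplified stand-ins for RMContext and RMApp, not the actual fix in the patch:

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the race and one way to close it: use putIfAbsent so
// "check id, then register app" is a single atomic step, instead of a
// separate containsKey + put that two submissions can interleave.
public class SubmitRaceSketch {
    // Stand-in for rmContext's application map.
    static final ConcurrentHashMap<Integer, String> apps =
            new ConcurrentHashMap<>();

    // Returns true if this submission won; a duplicate id loses without
    // disturbing the app the faster submission already registered.
    static boolean submit(int appId, String app) {
        return apps.putIfAbsent(appId, app) == null;
    }

    public static void main(String[] args) {
        assert submit(42, "fast");          // first submission wins
        assert !submit(42, "slow");         // duplicate is rejected...
        assert apps.get(42).equals("fast"); // ...and the winner survives
        System.out.println("ok");
    }
}
```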
[jira] [Commented] (YARN-578) NodeManager should use SecureIOUtils for serving logs and intermediate outputs
[ https://issues.apache.org/jira/browse/YARN-578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645240#comment-13645240 ] Vinod Kumar Vavilapalli commented on YARN-578: -- Can you use this only for the YARN changes, i.e. serving logs, and open a separate MAPREDUCE ticket for the ShuffleHandler? For the YARN changes: - Remove the comment above the code which talks about SecureIOUtils ;) - I think we should separate the exception message to clearly say whether this was a permission issue or something else. NodeManager should use SecureIOUtils for serving logs and intermediate outputs -- Key: YARN-578 URL: https://issues.apache.org/jira/browse/YARN-578 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Attachments: yarn-578-20130426.patch Log servlets for serving logs and the ShuffleService for serving intermediate outputs both should use SecureIOUtils for avoiding symlink attacks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
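The symlink-attack defense this issue calls for can be illustrated roughly as: resolve the real path, then verify the file's owner before serving it, so a symlink planted in a log directory cannot leak another user's file. This is a sketch of the idea only, not the SecureIOUtils implementation; the method names and check are assumptions:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch of owner-checked file serving.
public class SecureServeSketch {
    static byte[] serveLog(Path p, String expectedOwner) throws IOException {
        // Resolve symlinks first so the ownership check applies to the
        // real target, not the link an attacker planted.
        Path real = p.toRealPath();
        String owner = Files.getOwner(real).getName();
        if (!owner.equals(expectedOwner)) {
            throw new IOException("Owner '" + owner + "' of " + real
                    + " does not match expected '" + expectedOwner + "'");
        }
        return Files.readAllBytes(real);
    }

    // Demo: serve a temp file as its own owner, and reject a mismatch.
    static boolean demo() {
        try {
            Path tmp = Files.createTempFile("log", ".txt");
            Files.write(tmp, "container log".getBytes());
            String me = Files.getOwner(tmp).getName();
            boolean served = new String(serveLog(tmp, me))
                    .equals("container log");
            boolean rejected;
            try {
                serveLog(tmp, me + "-imposter");
                rejected = false;
            } catch (IOException e) {
                rejected = true; // mismatched owner is refused
            }
            Files.delete(tmp);
            return served && rejected;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        assert demo();
        System.out.println("ok");
    }
}
```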
[jira] [Assigned] (YARN-621) RM triggers web auth failure before first job
[ https://issues.apache.org/jira/browse/YARN-621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli reassigned YARN-621: Assignee: Vinod Kumar Vavilapalli (was: Omkar Vinit Joshi) Allen, can you share your environment details? I am not able to reproduce this in my setup. RM triggers web auth failure before first job - Key: YARN-621 URL: https://issues.apache.org/jira/browse/YARN-621 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.4-alpha Reporter: Allen Wittenauer Assignee: Vinod Kumar Vavilapalli Priority: Critical On a secure YARN setup, before the first job is executed, going to the web interface of the resource manager triggers authentication errors. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-599) Refactoring submitApplication in ClientRMService and RMAppManager
[ https://issues.apache.org/jira/browse/YARN-599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645248#comment-13645248 ] Hadoop QA commented on YARN-599: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581118/YARN-599.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/844//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/844//console This message is automatically generated. Refactoring submitApplication in ClientRMService and RMAppManager - Key: YARN-599 URL: https://issues.apache.org/jira/browse/YARN-599 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-599.1.patch, YARN-599.2.patch Currently, ClientRMService#submitApplication calls RMAppManager#handle, and consequently calls RMAppManager#submitApplication directly, though the code looks like scheduling an APP_SUBMIT event. 
In addition, the validation code before creating an RMApp instance is not well organized. Ideally, the dynamic validation, which depends on the RM's configuration, should be put in RMAppManager#submitApplication. RMAppManager#submitApplication is called by ClientRMService#submitApplication and RMAppManager#recover. Since the configuration may be changed after RM restarts, the validation needs to be done again even in recovery mode. Therefore, resource request validation, which is based on min/max resource limits, should be moved from ClientRMService#submitApplication to RMAppManager#submitApplication. On the other hand, the static validation, which is independent of the RM's configuration, should be put in ClientRMService#submitApplication, because it only needs to be done once, during the first submission. Furthermore, the try-catch flow in RMAppManager#submitApplication has a flaw: RMAppManager#submitApplication is not synchronized. If two application submissions with the same application ID enter the function, and one progresses to the completion of RMApp instantiation, and the other progresses to the completion of putting the RMApp instance into rmContext, the slower submission will cause an exception due to the duplicate application ID. However, the exception will cause the RMApp instance already in rmContext (belonging to the faster submission) to be rejected under the current code flow. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-599) Refactoring submitApplication in ClientRMService and RMAppManager
[ https://issues.apache.org/jira/browse/YARN-599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645288#comment-13645288 ] Vinod Kumar Vavilapalli commented on YARN-599: -- Hm, it isn't straightforward to see that failures during RMAppManager.submitApplication() are properly recorded in the audit logs. But they are; I just verified. The latest patch looks good to me. +1, checking it in. Refactoring submitApplication in ClientRMService and RMAppManager - Key: YARN-599 URL: https://issues.apache.org/jira/browse/YARN-599 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-599.1.patch, YARN-599.2.patch Currently, ClientRMService#submitApplication calls RMAppManager#handle, and consequently calls RMAppManager#submitApplication directly, though the code looks like scheduling an APP_SUBMIT event. In addition, the validation code before creating an RMApp instance is not well organized. Ideally, the dynamic validation, which depends on the RM's configuration, should be put in RMAppManager#submitApplication. RMAppManager#submitApplication is called by ClientRMService#submitApplication and RMAppManager#recover. Since the configuration may be changed after RM restarts, the validation needs to be done again even in recovery mode. Therefore, resource request validation, which is based on min/max resource limits, should be moved from ClientRMService#submitApplication to RMAppManager#submitApplication. On the other hand, the static validation, which is independent of the RM's configuration, should be put in ClientRMService#submitApplication, because it only needs to be done once, during the first submission. Furthermore, the try-catch flow in RMAppManager#submitApplication has a flaw: RMAppManager#submitApplication is not synchronized. 
If two application submissions with the same application ID enter the function, and one progresses to the completion of RMApp instantiation, and the other progresses to the completion of putting the RMApp instance into rmContext, the slower submission will cause an exception due to the duplicate application ID. However, the exception will cause the RMApp instance already in rmContext (belonging to the faster submission) to be rejected under the current code flow. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-599) Refactoring submitApplication in ClientRMService and RMAppManager
[ https://issues.apache.org/jira/browse/YARN-599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645300#comment-13645300 ] Hudson commented on YARN-599: - Integrated in Hadoop-trunk-Commit #3698 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3698/]) YARN-599. Refactoring submitApplication in ClientRMService and RMAppManager to separate out various validation checks depending on whether they rely on RM configuration or not. Contributed by Zhijie Shen. (Revision 1477478) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1477478 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManagerEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManagerSubmitEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java Refactoring submitApplication in ClientRMService and RMAppManager - Key: YARN-599 URL: https://issues.apache.org/jira/browse/YARN-599 Project: Hadoop YARN Issue 
Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.0.5-beta Attachments: YARN-599.1.patch, YARN-599.2.patch Currently, ClientRMService#submitApplication calls RMAppManager#handle, and consequently calls RMAppManager#submitApplication directly, though the code looks like scheduling an APP_SUBMIT event. In addition, the validation code before creating an RMApp instance is not well organized. Ideally, the dynamic validation, which depends on the RM's configuration, should be put in RMAppManager#submitApplication. RMAppManager#submitApplication is called by ClientRMService#submitApplication and RMAppManager#recover. Since the configuration may be changed after RM restarts, the validation needs to be done again even in recovery mode. Therefore, resource request validation, which is based on min/max resource limits, should be moved from ClientRMService#submitApplication to RMAppManager#submitApplication. On the other hand, the static validation, which is independent of the RM's configuration, should be put in ClientRMService#submitApplication, because it only needs to be done once, during the first submission. Furthermore, the try-catch flow in RMAppManager#submitApplication has a flaw: RMAppManager#submitApplication is not synchronized. If two application submissions with the same application ID enter the function, and one progresses to the completion of RMApp instantiation, and the other progresses to the completion of putting the RMApp instance into rmContext, the slower submission will cause an exception due to the duplicate application ID. However, the exception will cause the RMApp instance already in rmContext (belonging to the faster submission) to be rejected under the current code flow. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira