[jira] [Commented] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221859#comment-14221859 ] Varun Saxena commented on YARN-2890: Oh. I saw it unassigned for several hours so assigned it to myself. You can assign it back to yourself, if you want. > MiniMRYarnCluster should turn on timeline service if configured to do so > > > Key: YARN-2890 > URL: https://issues.apache.org/jira/browse/YARN-2890 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Mit Desai >Assignee: Varun Saxena > Fix For: 2.6.1 > > > Currently the MiniMRYarnCluster does not consider the configuration value for > enabling timeline service before starting. The MiniYarnCluster should only > start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
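[Editor's note] The YARN-2890 description asks that the mini cluster start the timeline service only when the configuration enables it. A minimal, hypothetical sketch of that kind of guard (the `MiniClusterConfig` class is a self-contained stand-in for YARN's `Configuration`; the key string matches YARN's real `yarn.timeline-service.enabled` property, but the wiring here is not the actual MiniMRYarnCluster code):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a YARN Configuration object.
class MiniClusterConfig {
    private final Map<String, String> props = new HashMap<>();

    void set(String key, String value) { props.put(key, value); }

    boolean getBoolean(String key, boolean defaultValue) {
        String v = props.get(key);
        return v == null ? defaultValue : Boolean.parseBoolean(v);
    }
}

public class MiniClusterSketch {
    // Real YARN property name; the timeline service is off by default.
    static final String TIMELINE_SERVICE_ENABLED = "yarn.timeline-service.enabled";

    // Only start the timeline service when the config explicitly enables it.
    static boolean shouldStartTimelineService(MiniClusterConfig conf) {
        return conf.getBoolean(TIMELINE_SERVICE_ENABLED, false);
    }

    public static void main(String[] args) {
        MiniClusterConfig conf = new MiniClusterConfig();
        System.out.println(shouldStartTimelineService(conf)); // false: not configured
        conf.set(TIMELINE_SERVICE_ENABLED, "true");
        System.out.println(shouldStartTimelineService(conf)); // true: explicitly enabled
    }
}
```

The point of the JIRA is exactly this conditional: without the guard, the mini cluster unconditionally brings up the timeline service regardless of the setting.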
[jira] [Commented] (YARN-2139) [Umbrella] Support for Disk as a Resource in YARN
[ https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221783#comment-14221783 ] Karthik Kambatla commented on YARN-2139: Valid points, Bikas. [~ywskycn] and I will spend some time and propose a design that would allow plugging in these multiple dimensions. > [Umbrella] Support for Disk as a Resource in YARN > -- > > Key: YARN-2139 > URL: https://issues.apache.org/jira/browse/YARN-2139 > Project: Hadoop YARN > Issue Type: New Feature > Reporter: Wei Yan > Attachments: Disk_IO_Scheduling_Design_1.pdf, > Disk_IO_Scheduling_Design_2.pdf, YARN-2139-prototype-2.patch, > YARN-2139-prototype.patch > > > YARN should consider disk as another resource for (1) scheduling tasks on > nodes, (2) isolation at runtime, (3) spindle locality.
[jira] [Commented] (YARN-2877) Extend YARN to support distributed scheduling
[ https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221745#comment-14221745 ] Wangda Tan commented on YARN-2877: -- Thanks [~sriramsrao] for bringing up this great idea, and [~kkaranasos]/[~curino] for the explanations. We definitely need such mechanisms for low-latency container launching to support millisecond-latency tasks. Some questions about this: # Since the LocalRMs will be fully distributed, is it still possible to enforce capacity between queues? # Will such opportunistic containers come into the view of the central RM (used to schedule CONSERVATIVE containers)? ## If yes, can the central RM decide whether an opportunistic container is valid (say, when the number of containers exceeds the app's limit)? And will preemption still work for opportunistic containers? ## If no, should someone coordinate such containers? # Will central scheduler state (maybe not complete, but important info like per-queue used resources, etc.) be broadcast to the distributed LocalRMs? I think it might be useful for LocalRMs to decide which opportunistic container should go first. Thanks in advance! Wangda > Extend YARN to support distributed scheduling > - > > Key: YARN-2877 > URL: https://issues.apache.org/jira/browse/YARN-2877 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager > Reporter: Sriram Rao > > This is an umbrella JIRA that proposes to extend YARN to support distributed > scheduling. Briefly, some of the motivations for distributed scheduling are > the following: > 1. Improve cluster utilization by opportunistically executing tasks on > otherwise-idle resources on individual machines. > 2. Reduce allocation latency for tasks where the scheduling time dominates > (i.e., task execution time is much shorter than the time required for > obtaining a container from the RM).
[jira] [Commented] (YARN-2881) Implement PlanFollower for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221698#comment-14221698 ] Subru Krishnan commented on YARN-2881: -- I meant YARN-2738 :). > Implement PlanFollower for FairScheduler > > > Key: YARN-2881 > URL: https://issues.apache.org/jira/browse/YARN-2881 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler > Reporter: Anubhav Dhoot > Assignee: Anubhav Dhoot > Attachments: YARN-2881.prelim.patch
[jira] [Commented] (YARN-2881) Implement PlanFollower for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221695#comment-14221695 ] Subru Krishnan commented on YARN-2881: -- [~adhoot], thanks for the patch. It's good to see that the majority of the code can be reused between the Fair and Capacity Schedulers. A few comments: * Are you assuming that parent queue names are unique in FS? * _run()_ need not be synchronized. I know this is from previous code, but it would be good to clean it up since we are refactoring the code. * _getChildReservationQueues()_ could be implemented by the _AbstractSchedulerPlanFollower_ using _Queue::getQueueInfo_? * I think we can add a _getResourceCalculator_ to _YarnScheduler_ as it makes sense. Then we need not override _calculateTargetCapacity()_ and _isPlanResourcesLessThanReservations()_. * Minor: spurious blank lines in the imports of _CapacitySchedulerPlanFollower_ & _FairSchedulerPlanFollower_. We should be able to see the reservation system running end-to-end with this patch in conjunction with YARN-2378.
[jira] [Commented] (YARN-2669) FairScheduler: queue names shouldn't allow periods
[ https://issues.apache.org/jira/browse/YARN-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221671#comment-14221671 ] Hadoop QA commented on YARN-2669: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682990/YARN-2669-5.patch against trunk revision 1e9a3f4. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5904//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5904//console This message is automatically generated. 
> FairScheduler: queue names shouldn't allow periods > -- > > Key: YARN-2669 > URL: https://issues.apache.org/jira/browse/YARN-2669 > Project: Hadoop YARN > Issue Type: Improvement > Reporter: Wei Yan > Assignee: Wei Yan > Fix For: 2.7.0 > > Attachments: YARN-2669-1.patch, YARN-2669-2.patch, YARN-2669-3.patch, > YARN-2669-4.patch, YARN-2669-5.patch > > > For an allocation file like: > {noformat} > <allocations> > <queue name="root.q1"> > <minResources>4096mb,4vcores</minResources> > </queue> > </allocations> > {noformat} > Users may wish to configure minResources for a queue with the full path "root.q1". > However, right now, the fair scheduler will treat this configuration as applying to the > queue with the full name "root.root.q1". We need to print a warning message to > notify users about this.
[jira] [Commented] (YARN-2139) [Umbrella] Support for Disk as a Resource in YARN
[ https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221669#comment-14221669 ] Wangda Tan commented on YARN-2139: -- Thanks [~bikassaha] and [~kasha]. +1 for working on a branch; there might be a great amount of change across all the major modules, and frequent rebasing might be an issue if this were based on trunk. I also totally agree about having an abstract policy to wrap disk affinity / iops / bandwidth, etc.
[jira] [Commented] (YARN-2893) AMLauncher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221662#comment-14221662 ] Gera Shegalov commented on YARN-2893: - Here is the stack trace: {code} Got exception: java.io.EOFException at java.io.DataInputStream.readFully(DataInputStream.java:197) at java.io.DataInputStream.readFully(DataInputStream.java:169) at org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:189) at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.setupTokens(AMLauncher.java:225) at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.createAMContainerLaunchContext(AMLauncher.java:196) at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:107) at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:250) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} Since the launch context is corrupt, all subsequent app attempts (up to the max) fail as well. This is a non-deterministic Heisenbug that does not reproduce on job re-submission. > AMLauncher: sporadic job failures due to EOFException in readTokenStorageStream > -- > > Key: YARN-2893 > URL: https://issues.apache.org/jira/browse/YARN-2893 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager > Affects Versions: 2.4.0 > Reporter: Gera Shegalov > > MapReduce jobs on our clusters experience sporadic failures due to corrupt > tokens in the AM launch context.
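[Editor's note] The EOFException at the top of the stack comes from `DataInputStream.readFully`, which throws when the underlying stream ends before the requested number of bytes arrives; this is consistent with a truncated or corrupt token buffer in the launch context. A self-contained, JDK-only illustration of that failure mode (not Hadoop's `Credentials` code):

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public class TruncatedReadDemo {
    // Try to read 'expected' bytes from the buffer; report whether readFully succeeds.
    static boolean readFullySucceeds(byte[] buffer, int expected) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(buffer))) {
            in.readFully(new byte[expected]); // throws EOFException if the stream is shorter
            return true;
        } catch (EOFException e) {
            return false; // same failure mode as the AMLauncher stack trace above
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readFullySucceeds(new byte[8], 8)); // true: enough bytes
        System.out.println(readFullySucceeds(new byte[4], 8)); // false: truncated input
    }
}
```

This is why a corrupt launch context fails every retry: each attempt re-reads the same truncated bytes and hits the same EOFException.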
[jira] [Commented] (YARN-2056) Disable preemption at Queue level
[ https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221659#comment-14221659 ] Wangda Tan commented on YARN-2056: -- Thanks for [~jlowe]'s review. [~curino], wanna take a look? > Disable preemption at Queue level > - > > Key: YARN-2056 > URL: https://issues.apache.org/jira/browse/YARN-2056 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager > Affects Versions: 2.4.0 > Reporter: Mayank Bansal > Assignee: Eric Payne > Attachments: YARN-2056.201408202039.txt, YARN-2056.201408260128.txt, > YARN-2056.201408310117.txt, YARN-2056.201409022208.txt, > YARN-2056.201409181916.txt, YARN-2056.201409210049.txt, > YARN-2056.201409232329.txt, YARN-2056.201409242210.txt, > YARN-2056.201410132225.txt, YARN-2056.201410141330.txt, > YARN-2056.201410232244.txt, YARN-2056.201410311746.txt, > YARN-2056.201411041635.txt, YARN-2056.201411072153.txt, > YARN-2056.201411122305.txt, YARN-2056.201411132215.txt, > YARN-2056.201411142002.txt > > > We need to be able to disable preemption at the individual queue level
[jira] [Created] (YARN-2893) AMLauncher: sporadic job failures due to EOFException in readTokenStorageStream
Gera Shegalov created YARN-2893: --- Summary: AMLauncher: sporadic job failures due to EOFException in readTokenStorageStream Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context.
[jira] [Commented] (YARN-2669) FairScheduler: queue names shouldn't allow periods
[ https://issues.apache.org/jira/browse/YARN-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221627#comment-14221627 ] Hudson commented on YARN-2669: -- FAILURE: Integrated in Hadoop-trunk-Commit #6589 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6589/]) YARN-2669. FairScheduler: queue names shouldn't allow periods (Wei Yan via Sandy Ryza) (sandy: rev a128cca305cecb215a2eef2ef543d1bf9b23a41b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationFileLoaderService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestQueuePlacementPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestAllocationFileLoaderService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/PeriodGroupsMapping.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementRule.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java > FairScheduler: queue names shouldn't allow periods > -- > > Key: YARN-2669 > URL: 
https://issues.apache.org/jira/browse/YARN-2669 > Project: Hadoop YARN > Issue Type: Improvement > Reporter: Wei Yan > Assignee: Wei Yan > Fix For: 2.7.0 > > Attachments: YARN-2669-1.patch, YARN-2669-2.patch, YARN-2669-3.patch, > YARN-2669-4.patch, YARN-2669-5.patch
[jira] [Updated] (YARN-2669) FairScheduler: queue names shouldn't allow periods
[ https://issues.apache.org/jira/browse/YARN-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-2669: - Priority: Major (was: Minor)
[jira] [Updated] (YARN-2669) FairScheduler: queue names shouldn't allow periods
[ https://issues.apache.org/jira/browse/YARN-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-2669: - Summary: FairScheduler: queue names shouldn't allow periods (was: FairScheduler: queueName shouldn't allow periods the allocation.xml)
[jira] [Commented] (YARN-2669) FairScheduler: queueName shouldn't allow periods the allocation.xml
[ https://issues.apache.org/jira/browse/YARN-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221604#comment-14221604 ] Sandy Ryza commented on YARN-2669: -- +1
[jira] [Commented] (YARN-2139) [Umbrella] Support for Disk as a Resource in YARN
[ https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221591#comment-14221591 ] Bikas Saha commented on YARN-2139: -- Given that this design and possible implementation might go through unstable rounds and are currently not abstracted enough in the core code, doing this on a branch seems prudent. Given that SSDs are becoming common, thinking of storage as only spinning disks may be limiting. Multiple writers may affect each other more negatively on spinning disks vs. SSDs. It may be useful to see if the consideration of storage could be abstracted into a plugin so that storage could have a different resource allocation policy by storage type (e.g. allocate/share by spindle for spinning-disk storage vs. allocate/share by IOPS on SSD storage vs. allocate/share by network bandwidth for non-DAS storage). If we can abstract the policy into a plugin on trunk itself, then perhaps we would not need a branch. Secondly, it will probably take a long time to agree on what a common policy should be, and the consensus decision will probably not be a good fit for a large percentage of real clusters because of hardware variety. So making this a plugin would enable quicker development, trial, and usage of disk-based allocation compared to arriving at a grand unified allocation model for storage.
[jira] [Commented] (YARN-2056) Disable preemption at Queue level
[ https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221583#comment-14221583 ] Jason Lowe commented on YARN-2056: -- I'm +1 on the latest patch as well. I'll commit this sometime early next week unless there are objections.
[jira] [Updated] (YARN-2669) FairScheduler: queueName shouldn't allow periods the allocation.xml
[ https://issues.apache.org/jira/browse/YARN-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2669: -- Attachment: YARN-2669-5.patch Thanks, [~sandyr]. A new patch is uploaded.
[jira] [Commented] (YARN-2669) FairScheduler: queueName shouldn't allow periods the allocation.xml
[ https://issues.apache.org/jira/browse/YARN-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221573#comment-14221573 ] Hadoop QA commented on YARN-2669: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682965/YARN-2669-4.patch against trunk revision 23dacb3. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5903//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5903//console This message is automatically generated.
[jira] [Commented] (YARN-2801) Documentation development for Node labels requirement
[ https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221571#comment-14221571 ] Wangda Tan commented on YARN-2801: -- Also added this as a sub-task of YARN-2492. > Documentation development for Node labels requirement > > > Key: YARN-2801 > URL: https://issues.apache.org/jira/browse/YARN-2801 > Project: Hadoop YARN > Issue Type: Sub-task > Components: documentation > Reporter: Gururaj Shetty > > Documentation needs to be developed for the node label requirements.
[jira] [Commented] (YARN-2801) Documentation development for Node labels requirement
[ https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221568#comment-14221568 ] Wangda Tan commented on YARN-2801: -- [~gururaj], thanks for volunteering to do this, but I have a WIP patch for this already. Would you mind if I take over this task? Wangda
[jira] [Updated] (YARN-2801) Documentation development for Node labels requirement
[ https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2801: - Issue Type: Sub-task (was: New Feature) Parent: YARN-2492
[jira] [Commented] (YARN-2139) [Umbrella] Support for Disk as a Resource in YARN
[ https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221561#comment-14221561 ] Karthik Kambatla commented on YARN-2139: [~leftnoteasy] - I completely agree with both Arun and you on the spindle-locality/affinity front. The design doc hints at it but doesn't cover it in as much detail as it should. I am all for accomplishing that here too; I can work on fleshing out the locality/affinity pieces as we start getting the remaining parts in. I am considering starting development on a feature branch so we have a chance to change things before merging into trunk and branch-2. Are people okay with that?
[jira] [Commented] (YARN-2139) [Umbrella] Support for Disk as a Resource in YARN
[ https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221556#comment-14221556 ] Wangda Tan commented on YARN-2139: -- Thanks [~ywskycn] for the design doc and prototype. I have a similar feeling to what [~acmurthy] commented: disk as a resource is a little different from vcores. CPU is a shared resource; processes/threads can occupy CPU cores and can also easily be switched to other cores. But disk is not (RAID aside): if a process is writing to a file on a local disk (like Kafka), you cannot easily switch the file being written to another disk. We also need to consider that if multiple containers are scheduled onto the same physical disk, their total bandwidth may drop very fast. So I think scheduling for disks is more like *affinity* to disks (e.g. give disk #1, #2, #4 to the container) rather than just limiting the number of processes on each node. Any thoughts? Please feel free to correct me if I'm wrong. Thanks, Wangda
[jira] [Commented] (YARN-2669) FairScheduler: queueName shouldn't allow periods the allocation.xml
[ https://issues.apache.org/jira/browse/YARN-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221513#comment-14221513 ] Sandy Ryza commented on YARN-2669: -- This is looking good. A few comments. Can we add documentation for this behavior in FairScheduler.apt.vm? We should be doing the same conversion for group names, right? {code} + + " submitted by user " + user + " with an illegal queue name (" + + queueName + "). " {code} Nit: I think it's better not to surround the queue name with parentheses. {code} +return queueName + "." + convertUsername(user); {code} Can we call convertUsername something like cleanUsername to be a little more descriptive?
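[Editor's note] Since FairScheduler queue paths are period-delimited, a username containing periods (e.g. "first.last") must be escaped before being embedded in a queue name, or it would introduce extra queue-path levels. A hedged sketch of the `cleanUsername` idea discussed above; the `_dot_` replacement token mirrors what the FairScheduler placement rules use for periods, but treat the exact token and method names here as assumptions, not the committed patch:

```java
public class QueueNameUtil {
    // Escape periods so the username cannot add extra levels to the queue path.
    static String cleanUsername(String user) {
        return user.replace(".", "_dot_");
    }

    // Build the per-user queue name under a parent queue.
    static String userQueue(String parentQueue, String user) {
        return parentQueue + "." + cleanUsername(user);
    }

    public static void main(String[] args) {
        System.out.println(userQueue("root.users", "first.last"));
        // root.users.first_dot_last
    }
}
```

As Sandy notes, the same conversion should apply to group names, since group-based placement rules embed group names in queue paths the same way.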
[jira] [Commented] (YARN-2679) Add metric for container launch duration
[ https://issues.apache.org/jira/browse/YARN-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221514#comment-14221514 ] Hudson commented on YARN-2679: -- FAILURE: Integrated in Hadoop-trunk-Commit #6587 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6587/]) YARN-2679. Add metric for container launch duration. (Zhihai Xu via kasha) (kasha: rev 233b61e495e136a843dabb7315bbb9ea37e7adce) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/metrics/TestNodeManagerMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/metrics/NodeManagerMetrics.java > Add metric for container launch duration > > > Key: YARN-2679 > URL: https://issues.apache.org/jira/browse/YARN-2679 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Fix For: 2.7.0 > > Attachments: YARN-2679.000.patch, YARN-2679.001.patch, > YARN-2679.002.patch > > > add metrics in NodeManagerMetrics to get prepare time to launch container. > The prepare time is the duration between sending > ContainersLauncherEventType.LAUNCH_CONTAINER event and receiving > ContainerEventType.CONTAINER_LAUNCHED event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
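The metric this commit adds is the duration between sending the LAUNCH_CONTAINER event and receiving the CONTAINER_LAUNCHED event. A standalone sketch of the measurement (the real code records into a metrics2 type inside NodeManagerMetrics; the class and field names below are simplified stand-ins):

```java
// Minimal sketch of the container launch-duration metric: record a timestamp
// when the launch event is sent and compute the elapsed time when the
// launched event arrives. Names are illustrative, not from the patch.
public class LaunchDurationTracker {
    private long launchEventTimeMs;
    private long totalLaunchDurationMs;
    private int launches;

    public void onLaunchContainerEvent(long nowMs) {
        launchEventTimeMs = nowMs; // LAUNCH_CONTAINER sent
    }

    public void onContainerLaunchedEvent(long nowMs) {
        totalLaunchDurationMs += nowMs - launchEventTimeMs; // CONTAINER_LAUNCHED received
        launches++;
    }

    public long averageLaunchDurationMs() {
        return launches == 0 ? 0 : totalLaunchDurationMs / launches;
    }
}
```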
[jira] [Commented] (YARN-2664) Improve RM webapp to expose info about reservations.
[ https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221512#comment-14221512 ] Hadoop QA commented on YARN-2664: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682956/YARN-2664.5.patch against trunk revision 23dacb3. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 5 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebApp org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebAppFairScheduler org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5902//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5902//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5902//console This message is automatically generated. 
> Improve RM webapp to expose info about reservations. > > > Key: YARN-2664 > URL: https://issues.apache.org/jira/browse/YARN-2664 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Carlo Curino >Assignee: Matteo Mazzucchelli > Attachments: PlannerPage_screenshot.pdf, YARN-2664.1.patch, > YARN-2664.2.patch, YARN-2664.3.patch, YARN-2664.4.patch, YARN-2664.5.patch, > YARN-2664.patch > > > YARN-1051 provides a new functionality in the RM to ask for reservation on > resources. Exposing this through the webapp GUI is important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2675) the containersKilled metrics is not updated when the container is killed during localization.
[ https://issues.apache.org/jira/browse/YARN-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221509#comment-14221509 ] Karthik Kambatla commented on YARN-2675: Looks good. [~vinodkv] - do you want to take a look as well? > the containersKilled metrics is not updated when the container is killed > during localization. > - > > Key: YARN-2675 > URL: https://issues.apache.org/jira/browse/YARN-2675 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2675.000.patch, YARN-2675.001.patch, > YARN-2675.002.patch, YARN-2675.003.patch, YARN-2675.004.patch > > > The containersKilled metrics is not updated when the container is killed > during localization. We should add KILLING state in finished of > ContainerImpl.java to update killedContainer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2892) Unable to get AMRMToken in unmanaged AM when using a secure cluster
[ https://issues.apache.org/jira/browse/YARN-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sevada Abraamyan updated YARN-2892: --- Description: An AMRMToken is retrieved from the ApplicationReport by the YarnClient. When the RM creates the ApplicationReport and sends it back to the client it makes a simple security check whether it should include the AMRMToken in the report (See createAndGetApplicationReport in RMAppImpl). This security check verifies that the user who submitted the original application is the same user who is requesting the ApplicationReport. If they are indeed the same user then it includes the AMRMToken, otherwise it does not include it. The problem arises from the fact that when an application is submitted, the RM saves the short username of the user who created the application (See submitApplication in ClientRmService). Afterwards when the ApplicationReport is requested, the system tries to match the full username of the requester against the previously stored short username. In a secure cluster using Kerberos this check fails because the principal is stripped from the username when we request a short username. So for example the short username might be "Foo" whereas the full username is "f...@company.com" Note: A very similar problem has been previously reported ([Yarn-2232|https://issues.apache.org/jira/browse/YARN-2232]) was: An AMRMToken is retrieved from the ApplicationReport by the YarnClient. When the RM creates the ApplicationReport and sends it back to the client it makes a simple security check whether it should include the AMRMToken in the report (See createAndGetApplicationReport in RMAppImpl). This security check verifies that the user who submitted the original application is the same user who is requesting the ApplicationReport. If they are indeed the same user then it includes the AMRMToken, otherwise it does not include it. 
The problem arises from the fact that when an application is submitted, the RM saves the short username of the user who created the application (See submitApplication in ClientRmService). Afterwards when the ApplicationReport is requested, the system tries to match the full username of the requester against the previously stored short username. In a secure cluster using Kerberos this check fails because the principal is stripped from the username when we request a short username. So for example the short username might be "Foo" whereas the full username is "f...@company.com" Note: A very similar problem has previously been reported in [Yarn-2232|https://issues.apache.org/jira/browse/YARN-2232]. > Unable to get AMRMToken in unmanaged AM when using a secure cluster > --- > > Key: YARN-2892 > URL: https://issues.apache.org/jira/browse/YARN-2892 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Sevada Abraamyan > > An AMRMToken is retrieved from the ApplicationReport by the YarnClient. > When the RM creates the ApplicationReport and sends it back to the client it > makes a simple security check whether it should include the AMRMToken in the > report (See createAndGetApplicationReport in RMAppImpl). This security check > verifies that the user who submitted the original application is the same > user who is requesting the ApplicationReport. If they are indeed the same > user then it includes the AMRMToken, otherwise it does not include it. > The problem arises from the fact that when an application is submitted, the > RM saves the short username of the user who created the application (See > submitApplication in ClientRmService). Afterwards when the ApplicationReport > is requested, the system tries to match the full username of the requester > against the previously stored short username. 
> In a secure cluster using Kerberos this check fails because the principal is > stripped from the username when we request a short username. So for example > the short username might be "Foo" whereas the full username is > "f...@company.com" > Note: A very similar problem has been previously reported > ([Yarn-2232|https://issues.apache.org/jira/browse/YARN-2232]) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
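The failure mode described in YARN-2892 can be reproduced in miniature. In the sketch below, getShortName is a simplified stand-in for what Hadoop's UserGroupInformation.getShortUserName() does (it strips the realm and host part of the principal; the real method also applies auth_to_local rules). The point is only that comparing a stored short name against a full principal always fails:

```java
// Sketch of the YARN-2892 mismatch: the RM stores the short name at submit
// time but later compares it against the caller's full Kerberos principal.
public class AmrmTokenCheck {
    // Simplified stand-in for UserGroupInformation.getShortUserName():
    // strips "/host" and "@REALM" from a Kerberos principal.
    static String getShortName(String principal) {
        int slash = principal.indexOf('/');
        int at = principal.indexOf('@');
        int end = principal.length();
        if (slash >= 0) end = slash;
        else if (at >= 0) end = at;
        return principal.substring(0, end);
    }

    /** The buggy check: stored short name vs. the caller's full name. */
    static boolean buggyCheck(String storedShortName, String callerFullName) {
        return storedShortName.equals(callerFullName); // "foo" vs "foo@COMPANY.COM" never matches
    }

    /** One possible fix: normalize both sides to short names before comparing. */
    static boolean fixedCheck(String storedShortName, String callerFullName) {
        return storedShortName.equals(getShortName(callerFullName));
    }
}
```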
[jira] [Created] (YARN-2892) Unable to get AMRMToken in unmanaged AM when using a secure cluster
Sevada Abraamyan created YARN-2892: -- Summary: Unable to get AMRMToken in unmanaged AM when using a secure cluster Key: YARN-2892 URL: https://issues.apache.org/jira/browse/YARN-2892 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Sevada Abraamyan An AMRMToken is retrieved from the ApplicationReport by the YarnClient. When the RM creates the ApplicationReport and sends it back to the client it makes a simple security check whether it should include the AMRMToken in the report (See createAndGetApplicationReport in RMAppImpl). This security check verifies that the user who submitted the original application is the same user who is requesting the ApplicationReport. If they are indeed the same user then it includes the AMRMToken, otherwise it does not include it. The problem arises from the fact that when an application is submitted, the RM saves the short username of the user who created the application (See submitApplication in ClientRmService). Afterwards when the ApplicationReport is requested, the system tries to match the full username of the requester against the previously stored short username. In a secure cluster using Kerberos this check fails because the principal is stripped from the username when we request a short username. So for example the short username might be "Foo" whereas the full username is "f...@company.com" Note: A very similar problem has previously been reported in [Yarn-2232|https://issues.apache.org/jira/browse/YARN-2232]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2679) Add metric for container launch duration
[ https://issues.apache.org/jira/browse/YARN-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2679: --- Summary: Add metric for container launch duration (was: add container launch prepare time metrics to NM.) > Add metric for container launch duration > > > Key: YARN-2679 > URL: https://issues.apache.org/jira/browse/YARN-2679 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2679.000.patch, YARN-2679.001.patch, > YARN-2679.002.patch > > > add metrics in NodeManagerMetrics to get prepare time to launch container. > The prepare time is the duration between sending > ContainersLauncherEventType.LAUNCH_CONTAINER event and receiving > ContainerEventType.CONTAINER_LAUNCHED event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2669) FairScheduler: queueName shouldn't allow periods the allocation.xml
[ https://issues.apache.org/jira/browse/YARN-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2669: -- Attachment: YARN-2669-4.patch Updated the patch to handle periods in the user-specified queue name. For a queue name like ".A" or "A.", the scheduler will reject the job and print a message to the user. A queue name like "A.B" will be accepted. > FairScheduler: queueName shouldn't allow periods the allocation.xml > --- > > Key: YARN-2669 > URL: https://issues.apache.org/jira/browse/YARN-2669 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Wei Yan >Assignee: Wei Yan >Priority: Minor > Attachments: YARN-2669-1.patch, YARN-2669-2.patch, YARN-2669-3.patch, > YARN-2669-4.patch > > > For an allocation file like: > {noformat} > <queue name="root.q1"> > <minResources>4096mb,4vcores</minResources> > </queue> > {noformat} > Users may wish to config minResources for a queue with full path "root.q1". > However, right now, the fair scheduler will treat this configuration as being for the > queue with full name "root.root.q1". We need to print out a warning msg to > notify users about this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
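A sketch of the acceptance rule this comment describes (".A" and "A." rejected, "A.B" accepted as a path); the method name is illustrative, not from the patch:

```java
// Sketch of the YARN-2669 rule: a queue name with a leading or trailing
// period is illegal, while an interior period is a legal path separator.
public class QueueNameValidator {
    static boolean isLegalQueueName(String name) {
        if (name == null || name.isEmpty()) return false;
        return !name.startsWith(".") && !name.endsWith(".");
    }
}
```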
[jira] [Commented] (YARN-2765) Add leveldb-based implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221448#comment-14221448 ] Jason Lowe commented on YARN-2765: -- bq. Can't we do one "create if missing"? This is to distinguish between a state store that wasn't there (and thus needs to be created) vs. opening an empty, existing state store. We log different messages during startup so it's easy to distinguish between these cases. IMHO it's important to know when the state store wasn't there on startup and needed to be created. > Add leveldb-based implementation for RMStateStore > - > > Key: YARN-2765 > URL: https://issues.apache.org/jira/browse/YARN-2765 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: YARN-2765.patch, YARN-2765v2.patch > > > It would be nice to have a leveldb option to the resourcemanager recovery > store. Leveldb would provide some benefits over the existing filesystem store > such as better support for atomic operations, fewer I/O ops per state update, > and far fewer total files on the filesystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
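Jason's two-phase open can be sketched without the real leveldb bindings. Below, the `disk` map stands in for the filesystem and `openDb` for leveldb's open with `createIfMissing`; these are not the real leveldb JNI signatures. The point is that the first failed open is what tells the RM to log "created" rather than "loaded":

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the open-then-create pattern from this thread: try
// createIfMissing(false) first; only if the store is absent, create it.
// This distinguishes "recovered existing (possibly empty) store" from
// "no store was there, created a new one" at startup.
public class TwoPhaseOpen {
    static Map<String, byte[]> openDb(Map<String, Map<String, byte[]>> disk,
                                      String path, boolean createIfMissing) {
        Map<String, byte[]> db = disk.get(path);
        if (db == null) {
            if (!createIfMissing) throw new RuntimeException(path + " does not exist");
            db = new HashMap<>();
            disk.put(path, db);
        }
        return db;
    }

    static String startStore(Map<String, Map<String, byte[]>> disk, String path) {
        try {
            openDb(disk, path, false);
            return "Loaded existing state store at " + path;
        } catch (RuntimeException missing) {
            openDb(disk, path, true);
            return "Created new state store at " + path;
        }
    }
}
```

A single `createIfMissing(true)` open would work, but it cannot tell the two startup cases apart, which is exactly the distinction Jason wants in the logs.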
[jira] [Updated] (YARN-2664) Improve RM webapp to expose info about reservations.
[ https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Mazzucchelli updated YARN-2664: -- Attachment: YARN-2664.5.patch Hi Carlo, you are right. I have included the wrong library in the patch. > Improve RM webapp to expose info about reservations. > > > Key: YARN-2664 > URL: https://issues.apache.org/jira/browse/YARN-2664 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Carlo Curino >Assignee: Matteo Mazzucchelli > Attachments: PlannerPage_screenshot.pdf, YARN-2664.1.patch, > YARN-2664.2.patch, YARN-2664.3.patch, YARN-2664.4.patch, YARN-2664.5.patch, > YARN-2664.patch > > > YARN-1051 provides a new functionality in the RM to ask for reservation on > resources. Exposing this through the webapp GUI is important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2679) add container launch prepare time metrics to NM.
[ https://issues.apache.org/jira/browse/YARN-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221405#comment-14221405 ] Hadoop QA commented on YARN-2679: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682947/YARN-2679.002.patch against trunk revision 23dacb3. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5901//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5901//console This message is automatically generated. > add container launch prepare time metrics to NM. > > > Key: YARN-2679 > URL: https://issues.apache.org/jira/browse/YARN-2679 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2679.000.patch, YARN-2679.001.patch, > YARN-2679.002.patch > > > add metrics in NodeManagerMetrics to get prepare time to launch container. 
> The prepare time is the duration between sending > ContainersLauncherEventType.LAUNCH_CONTAINER event and receiving > ContainerEventType.CONTAINER_LAUNCHED event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2679) add container launch prepare time metrics to NM.
[ https://issues.apache.org/jira/browse/YARN-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221382#comment-14221382 ] Karthik Kambatla commented on YARN-2679: +1, pending Jenkins. > add container launch prepare time metrics to NM. > > > Key: YARN-2679 > URL: https://issues.apache.org/jira/browse/YARN-2679 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2679.000.patch, YARN-2679.001.patch, > YARN-2679.002.patch > > > add metrics in NodeManagerMetrics to get prepare time to launch container. > The prepare time is the duration between sending > ContainersLauncherEventType.LAUNCH_CONTAINER event and receiving > ContainerEventType.CONTAINER_LAUNCHED event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2679) add container launch prepare time metrics to NM.
[ https://issues.apache.org/jira/browse/YARN-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221354#comment-14221354 ] zhihai xu commented on YARN-2679: - Uploaded new patch YARN-2679.002.patch to change the Metric description to "Container launch duration". > add container launch prepare time metrics to NM. > > > Key: YARN-2679 > URL: https://issues.apache.org/jira/browse/YARN-2679 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2679.000.patch, YARN-2679.001.patch, > YARN-2679.002.patch > > > add metrics in NodeManagerMetrics to get prepare time to launch container. > The prepare time is the duration between sending > ContainersLauncherEventType.LAUNCH_CONTAINER event and receiving > ContainerEventType.CONTAINER_LAUNCHED event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2679) add container launch prepare time metrics to NM.
[ https://issues.apache.org/jira/browse/YARN-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2679: Attachment: YARN-2679.002.patch > add container launch prepare time metrics to NM. > > > Key: YARN-2679 > URL: https://issues.apache.org/jira/browse/YARN-2679 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2679.000.patch, YARN-2679.001.patch, > YARN-2679.002.patch > > > add metrics in NodeManagerMetrics to get prepare time to launch container. > The prepare time is the duration between sending > ContainersLauncherEventType.LAUNCH_CONTAINER event and receiving > ContainerEventType.CONTAINER_LAUNCHED event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221342#comment-14221342 ] Wangda Tan commented on YARN-2495: -- Hi [~Naganarasimha], Thanks for the update. Several minor comments: 1) As in ResourceTrackerService, it's better to have a field like isDecentralizedNodeLabelConfigurationEnabled (or some other name you like) in NodeStatusUpdaterImpl. That should be clearer than a statement like {code} +if (nodeLabelsProvider!=null) { {code} 2) In ResourceTrackerService, the message: {code} String message = "NodeManager from node " + host + "(cmPort: " + cmPort + " httpPort: " + httpPort + ") " + "registered with capability: " + capability -+ ", assigned nodeId " + nodeId; ++ ", assigned nodeId " + nodeId + ", node labels { " ++ StringUtils.join(",", nodeLabels) + " } "; {code} should add a check so that the node-labels part is only logged when the replace succeeded. Ideally you should build this with a StringBuilder. 3) A style suggestion: by convention, binary operators like "=", "!=", "+", etc. should have a space before and after them. I can see several occurrences in the patch. Wangda > Allow admin specify labels from each NM (Distributed configuration) > --- > > Key: YARN-2495 > URL: https://issues.apache.org/jira/browse/YARN-2495 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, > YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, > YARN-2495.20141119-1.patch, YARN-2495_20141022.1.patch > > > Target of this JIRA is to allow admin specify labels in each NM, this covers > - User can set labels in each NM (by setting yarn-site.xml or using script > suggested by [~aw]) > - NM will send labels to RM via ResourceTracker API > - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
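Wangda's second comment, sketched: build the registration log with a StringBuilder and append the node-labels portion only when the label replace actually succeeded. All names below are illustrative, not from the patch:

```java
// Sketch of the suggested logging change in ResourceTrackerService: the
// "node labels { ... }" suffix is appended only when labelsAccepted is true.
public class RegistrationLog {
    static String message(String host, int cmPort, String nodeId,
                          java.util.List<String> labels, boolean labelsAccepted) {
        StringBuilder sb = new StringBuilder();
        sb.append("NodeManager from node ").append(host)
          .append(" (cmPort: ").append(cmPort).append(") registered")
          .append(", assigned nodeId ").append(nodeId);
        if (labelsAccepted) {
            sb.append(", node labels { ").append(String.join(",", labels)).append(" }");
        }
        return sb.toString();
    }
}
```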
[jira] [Commented] (YARN-2404) Remove ApplicationAttemptState and ApplicationState class in RMStateStore class
[ https://issues.apache.org/jira/browse/YARN-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221334#comment-14221334 ] Jian He commented on YARN-2404: --- Tsuyoshi, thanks for updating the patch ! looks good overall, some minor comments - remove following in loadApplicationAttemptState {code} ApplicationAttemptId attemptId = ConverterUtils.toApplicationAttemptId(attemptIDStr); {code} - we may change the attemptTokens type to be Credentials. and do the convert from/to ByteBuffer inside the method, instead of the caller {code} public abstract ByteBuffer getAppAttemptTokens(); public abstract void setAppAttemptTokens(ByteBuffer attemptTokens); {code} - the following assert is always true {code} ApplicationId appId = appState.getApplicationSubmissionContext().getApplicationId(); // assert child node name is same as actual applicationId assert appId.equals( appState.getApplicationSubmissionContext().getApplicationId()); {code} - the credentials is not used. {code} Credentials credentials = null; if (attemptState.getAppAttemptTokens() != null) { credentials = new Credentials(); DataInputByteBuffer dibb = new DataInputByteBuffer(); dibb.reset(attemptState.getAppAttemptTokens()); credentials.readTokenStorageStream(dibb); } {code} > Remove ApplicationAttemptState and ApplicationState class in RMStateStore > class > > > Key: YARN-2404 > URL: https://issues.apache.org/jira/browse/YARN-2404 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Tsuyoshi OZAWA > Attachments: YARN-2404.1.patch, YARN-2404.2.patch, YARN-2404.3.patch, > YARN-2404.4.patch, YARN-2404.5.patch, YARN-2404.6.patch > > > We can remove ApplicationState and ApplicationAttemptState class in > RMStateStore, given that we already have ApplicationStateData and > ApplicationAttemptStateData records. we may just replace ApplicationState > with ApplicationStateData, similarly for ApplicationAttemptState. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2891) Failed Container Executor does not provide a clear error message
[ https://issues.apache.org/jira/browse/YARN-2891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dustin Cote updated YARN-2891: -- Issue Type: Improvement (was: Bug) > Failed Container Executor does not provide a clear error message > > > Key: YARN-2891 > URL: https://issues.apache.org/jira/browse/YARN-2891 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.5.1 > Environment: any >Reporter: Dustin Cote >Priority: Minor > > When checking access to directories, the container executor does not provide > clear information on which directory actually could not be accessed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2891) Failed Container Executor does not provide a clear error message
Dustin Cote created YARN-2891: - Summary: Failed Container Executor does not provide a clear error message Key: YARN-2891 URL: https://issues.apache.org/jira/browse/YARN-2891 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.1 Environment: any Reporter: Dustin Cote Priority: Minor When checking access to directories, the container executor does not provide clear information on which directory actually could not be accessed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2765) Add leveldb-based implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221326#comment-14221326 ] Zhijie Shen commented on YARN-2765: --- bq. I knew rocksdb could be used as a cache of data that came from HDFS or could be backed-up to HDFS, but I didn't think it could read/write directly to it as part of normal operations. Hmm... I must have misunderstood the feature. Thanks for the correction. One question about the patch: why is it necessary to try to create the DB with {{options.createIfMissing(false);}} and then {{options.createIfMissing(true);}} if the first attempt fails? Can't we do a single "create if missing"? > Add leveldb-based implementation for RMStateStore > - > > Key: YARN-2765 > URL: https://issues.apache.org/jira/browse/YARN-2765 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: YARN-2765.patch, YARN-2765v2.patch > > > It would be nice to have a leveldb option to the resourcemanager recovery > store. Leveldb would provide some benefits over the existing filesystem store > such as better support for atomic operations, fewer I/O ops per state update, > and far fewer total files on the filesystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2762) RMAdminCLI node-labels-related args should be trimmed and checked before sending to RM
[ https://issues.apache.org/jira/browse/YARN-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221310#comment-14221310 ] Wangda Tan commented on YARN-2762: -- [~rohithsharma], Thanks for the patch. I've updated the title a little to better describe what we want. Even though, with YARN-2843, all hosts/labels will be trimmed to ensure correctness, it is still good to have this check on the CLI side. Some suggestions: 1) Every label should be trimmed before being sent to the RM. 2) When no labels remain after trimming, we should use the same error message as {code} else if ("-addToClusterNodeLabels".equals(cmd)) { if (i >= args.length) { System.err.println("No cluster node-labels are specified"); exitCode = -1; } else { exitCode = addToClusterNodeLabels(args[i]); } } {code} to keep it consistent. 3) One error message is not correct: {code} else if ("-replaceLabelsOnNode".equals(cmd)) { if (i >= args.length) { System.err.println("No cluster node-labels are specified"); exitCode = -1; } else { exitCode = replaceLabelsOnNodes(args[i]); } {code} It should be "no node-labels are specified when trying to replace labels on node" or something similar; I suggest you address this together in your patch. Thanks, Wangda > RMAdminCLI node-labels-related args should be trimmed and checked before > sending to RM > -- > > Key: YARN-2762 > URL: https://issues.apache.org/jira/browse/YARN-2762 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Rohith >Assignee: Rohith >Priority: Minor > Attachments: YARN-2762.patch > > > All NodeLabel args validation's are done at server side. The same can be done > at RMAdminCLI so that unnecessary RPC calls can be avoided. > And for the input such as "x,y,,z,", no need to add empty string instead can > be skipped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
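Suggestion 1) amounts to trimming each label client-side and skipping empties, so input like "x,y,,z," yields exactly three labels. A minimal sketch (the method name is an assumption, not from the patch):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the CLI-side trimming discussed in this thread: split the
// argument, trim each label, and drop empty entries.
public class LabelArgs {
    static List<String> parseLabels(String arg) {
        List<String> labels = new ArrayList<>();
        for (String part : arg.split(",")) {
            String label = part.trim();
            if (!label.isEmpty()) labels.add(label); // skip "" from ",," and trailing ","
        }
        return labels;
    }
}
```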
[jira] [Commented] (YARN-2877) Extend YARN to support distributed scheduling
[ https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221309#comment-14221309 ] Sujeet Varakhedi commented on YARN-2877: +1 for distributed scheduling; SQL engines for Hadoop can greatly benefit from it. We also need to look at a design where we can give AMs more control over scheduling policies: the RM just acts as a source of overall cluster state, NMs have local queues, and based on NM queue wait times AMs can decide where to request tasks, similar to how Sparrow works. This kind of scheduling becomes important for services that need dedicated, non-shared clusters, like HBase and HAWQ. > Extend YARN to support distributed scheduling > - > > Key: YARN-2877 > URL: https://issues.apache.org/jira/browse/YARN-2877 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager >Reporter: Sriram Rao > > This is an umbrella JIRA that proposes to extend YARN to support distributed > scheduling. Briefly, some of the motivations for distributed scheduling are > the following: > 1. Improve cluster utilization by opportunistically executing tasks on otherwise > idle resources on individual machines. > 2. Reduce allocation latency. Tasks where the scheduling time dominates > (i.e., task execution time is much less compared to the time required for > obtaining a container from the RM). > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
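The Sparrow-like placement Sujeet describes reduces, in its simplest form, to an AM sending a task to the NM with the shortest reported local-queue wait. A toy sketch (the wait-time map and method name are assumptions, not any YARN API):

```java
import java.util.Map;

// Sketch of queue-wait-based placement: the RM only supplies cluster state,
// each NM reports its local queue wait, and the AM picks the minimum.
public class QueueWaitPlacement {
    static String pickNode(Map<String, Long> queueWaitMsByNode) {
        String best = null;
        long bestWait = Long.MAX_VALUE;
        for (Map.Entry<String, Long> e : queueWaitMsByNode.entrySet()) {
            if (e.getValue() < bestWait) {
                bestWait = e.getValue();
                best = e.getKey();
            }
        }
        return best; // null if no nodes reported
    }
}
```

Real Sparrow refines this with batch sampling and late binding rather than trusting a single stale snapshot, but the argmin above is the core placement decision.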
[jira] [Updated] (YARN-2762) RMAdminCLI node-labels-related args should be trimmed and checked before sending to RM
[ https://issues.apache.org/jira/browse/YARN-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2762: - Summary: RMAdminCLI node-labels-related args should be trimmed and checked before sending to RM (was: Provide RMAdminCLI args validation for NodeLabelManager operations) > RMAdminCLI node-labels-related args should be trimmed and checked before > sending to RM > -- > > Key: YARN-2762 > URL: https://issues.apache.org/jira/browse/YARN-2762 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Rohith >Assignee: Rohith >Priority: Minor > Attachments: YARN-2762.patch > > > All NodeLabel args validation's are done at server side. The same can be done > at RMAdminCLI so that unnecessary RPC calls can be avoided. > And for the input such as "x,y,,z,", no need to add empty string instead can > be skipped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2882) Introducing container types
[ https://issues.apache.org/jira/browse/YARN-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221288#comment-14221288 ] Konstantinos Karanasos commented on YARN-2882: -- [~Sujeet Varakhedi] The scope of pre-emption/killing is not only within an application. Whenever a guaranteed-start task arrives in an NM that cannot accommodate its execution due to running queueable tasks, it is allowed to pre-empt/kill one or more of those, even if they belong to another application. Clearly there can be policies that decide which of the running queueable tasks to pre-empt/kill (and one of them could be to avoid pre-empting/killing a task of another application, if there is a good reason for that). > Introducing container types > --- > > Key: YARN-2882 > URL: https://issues.apache.org/jira/browse/YARN-2882 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Konstantinos Karanasos > > This JIRA introduces the notion of container types. > We propose two initial types of containers: guaranteed-start and queueable > containers. > Guaranteed-start are the existing containers, which are allocated by the > central RM and are instantaneously started, once allocated. > Queueable is a new type of container, which allows containers to be queued in > the NM, thus their execution may be arbitrarily delayed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2727) In RMAdminCLI usage display, instead of "yarn.node-labels.fs-store.root-dir", "yarn.node-labels.fs-store.uri" is being displayed
[ https://issues.apache.org/jira/browse/YARN-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221281#comment-14221281 ] Wangda Tan commented on YARN-2727: -- [~Naganarasimha], I'll close this as a duplicate, thanks for pointing out this issue. > In RMAdminCLI usage display, instead of "yarn.node-labels.fs-store.root-dir", > "yarn.node-labels.fs-store.uri" is being displayed > > > Key: YARN-2727 > URL: https://issues.apache.org/jira/browse/YARN-2727 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-2727.20141023.1.patch > > > In the org.apache.hadoop.yarn.client.cli.RMAdminCLI usage display, instead of > "yarn.node-labels.fs-store.root-dir", "yarn.node-labels.fs-store.uri" is > being used. > Also includes some modifications to the description. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2880) Add a test in TestRMRestart to make sure node labels will be recovered if it is enabled
[ https://issues.apache.org/jira/browse/YARN-2880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221271#comment-14221271 ] Wangda Tan commented on YARN-2880: -- [~rohithsharma], Thanks for taking up this JIRA. bq. IIUC, as of now recovery is not yet supported till YARN-2800 is committed. Not really: YARN-2800 is just to improve the user experience; recovery is already supported now. bq. Any document available? Not yet, documentation work is still in progress. bq. How can I configure Nodelabels? Is it only rmadmin as of now? You can use rmadmin or the REST API to configure node labels. bq. I set labels to NM from rmadmin, but how do I make use of these labels? Before documentation is available, you can take a look at testQueueParsing...Label... You can also take a look at TestContainerAllocation#test..Labels; they're integration tests on the RM side. For an end-to-end test, you can take a look at TestDistributedShellWithNodeLabels. Please let me know if you have any other questions. Thanks, Wangda > Add a test in TestRMRestart to make sure node labels will be recovered if it > is enabled > --- > > Key: YARN-2880 > URL: https://issues.apache.org/jira/browse/YARN-2880 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Rohith > > As suggested by [~ozawa], > [link|https://issues.apache.org/jira/browse/YARN-2800?focusedCommentId=14217569&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14217569]. > We should have such a test to make sure there will be no regression -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2517) Implement TimelineClientAsync
[ https://issues.apache.org/jira/browse/YARN-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2517: -- Target Version/s: 2.7.0 > Implement TimelineClientAsync > - > > Key: YARN-2517 > URL: https://issues.apache.org/jira/browse/YARN-2517 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Tsuyoshi OZAWA > Attachments: YARN-2517.1.patch, YARN-2517.2.patch > > > In some scenarios, we'd like to put timeline entities in another thread so as not to > block the current one. > It's good to have a TimelineClientAsync like AMRMClientAsync and > NMClientAsync. It can buffer entities, put them in a separate thread, and > have callbacks to handle the responses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2517) Implement TimelineClientAsync
[ https://issues.apache.org/jira/browse/YARN-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221249#comment-14221249 ] Zhijie Shen commented on YARN-2517: --- I'm not sure Future is going to help the use case. Say I want to use TimelineClientAsync to do async put entity operations. With Future, putEntitiesAsync returns immediately. However, to know whether my put entity operation is successful or not, I still have to be blocked at Future#get(), or create a separate thread to wait for the response. But IMHO, one goal of TimelineClientAsync is to relieve users from multithreading details, such that Type (1) sounds better to me. Rethinking whether we should create a separate TimelineClientAsync or add async methods in TimelineClient: we have putEntities and putDomain, and in the future we will have more get APIs. For now, the API of most concern is putEntities, as we don't want it to block the normal execution logic of an app. Maybe a compromise for now is to add putEntitiesAsync to TimelineClient. In the future, let's see if we want to have a separate TimelineClientAsync that contains a bunch of async APIs. Thoughts? > Implement TimelineClientAsync > - > > Key: YARN-2517 > URL: https://issues.apache.org/jira/browse/YARN-2517 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Tsuyoshi OZAWA > Attachments: YARN-2517.1.patch, YARN-2517.2.patch > > > In some scenarios, we'd like to put timeline entities in another thread so as not to > block the current one. > It's good to have a TimelineClientAsync like AMRMClientAsync and > NMClientAsync. It can buffer entities, put them in a separate thread, and > have callbacks to handle the responses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
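As a rough illustration of the callback-style Type (1) approach favored above, here is a minimal, self-contained sketch in plain Java with no Hadoop dependencies. The class name, the putEntitiesAsync signature, and the response string are hypothetical stand-ins, not the actual TimelineClient API; the point is only that the caller hands over callbacks instead of blocking on Future#get().

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Hypothetical sketch of a callback-style async put client. A single
// dispatcher thread serializes the puts, so the caller never blocks and
// never deals with threading itself.
public class AsyncPutSketch {
    private final ExecutorService dispatcher = Executors.newSingleThreadExecutor();

    // Caller supplies success/failure callbacks instead of waiting on a Future.
    public void putEntitiesAsync(String entity, Consumer<String> onResponse,
                                 Consumer<Exception> onError) {
        dispatcher.submit(() -> {
            try {
                // Stand-in for the real blocking putEntities RPC call.
                onResponse.accept("PUT " + entity + " OK");
            } catch (Exception e) {
                onError.accept(e);
            }
        });
    }

    public void stop() { dispatcher.shutdown(); }
}
```

A caller would invoke `putEntitiesAsync("entity1", resp -> {...}, err -> {...})` and continue its normal execution logic immediately; the response (or error) is delivered on the dispatcher thread.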
[jira] [Commented] (YARN-2604) Scheduler should consider max-allocation-* in conjunction with the largest node
[ https://issues.apache.org/jira/browse/YARN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221245#comment-14221245 ] Hudson commented on YARN-2604: -- FAILURE: Integrated in Hadoop-trunk-Commit #6585 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6585/]) YARN-2604. Scheduler should consider max-allocation-* in conjunction with the largest node. (Robert Kanter via kasha) (kasha: rev 3114d4731dcca7cb6c16aaa7c7a6550b7dd7dccb) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestAbstractYarnScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java 
> Scheduler should consider max-allocation-* in conjunction with the largest > node > --- > > Key: YARN-2604 > URL: https://issues.apache.org/jira/browse/YARN-2604 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.5.1 >Reporter: Karthik Kambatla >Assignee: Robert Kanter > Attachments: YARN-2604.patch, YARN-2604.patch, YARN-2604.patch, > YARN-2604.patch, YARN-2604.patch, YARN-2604.patch > > > If the scheduler max-allocation-* values are larger than the resources > available on the largest node in the cluster, an application requesting > resources between the two values will be accepted by the scheduler but the > requests will never be satisfied. The app essentially hangs forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
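The idea behind the fix can be sketched in a few lines. This is an illustrative simplification, not the actual AbstractYarnScheduler code; it considers memory only (the real patch also covers vcores):

```java
// Cap the advertised maximum allocation by the largest node currently
// registered, so a request that no node can ever satisfy is rejected up
// front instead of being accepted and hanging forever.
public class MaxAllocSketch {
    public static long effectiveMaxMB(long configuredMaxMB, long[] nodeMemoriesMB) {
        long largestNode = 0;
        for (long m : nodeMemoriesMB) {
            largestNode = Math.max(largestNode, m);
        }
        // Never advertise more than the biggest node can actually host.
        return Math.min(configuredMaxMB, largestNode);
    }
}
```

For example, with max-allocation-mb set to 16384 but the largest node holding 8192 MB, the effective maximum becomes 8192, and a 12 GB request fails fast.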
[jira] [Commented] (YARN-2882) Introducing container types
[ https://issues.apache.org/jira/browse/YARN-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221232#comment-14221232 ] Sujeet Varakhedi commented on YARN-2882: Is the preemption idea only within the scope of an application? Can a guaranteed-start task result in preemption of a queued task of another application? > Introducing container types > --- > > Key: YARN-2882 > URL: https://issues.apache.org/jira/browse/YARN-2882 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Konstantinos Karanasos > > This JIRA introduces the notion of container types. > We propose two initial types of containers: guaranteed-start and queueable > containers. > Guaranteed-start are the existing containers, which are allocated by the > central RM and are instantaneously started, once allocated. > Queueable is a new type of container, which allows containers to be queued in > the NM, thus their execution may be arbitrarily delayed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2869) CapacityScheduler should trim sub queue names when parse configuration
[ https://issues.apache.org/jira/browse/YARN-2869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2869: - Attachment: YARN-2869-2.patch [~vinodkv], Thanks for the review. The "mvn eclipse:eclipse" failure is not related. Also added a test covering that nested queue parsing needs trimming. > CapacityScheduler should trim sub queue names when parse configuration > -- > > Key: YARN-2869 > URL: https://issues.apache.org/jira/browse/YARN-2869 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2869-1.patch, YARN-2869-2.patch > > > Currently, capacity scheduler doesn't trim sub queue names when parsing queue > names; for example, the configuration > {code} > <property> > <name>...root.queues</name> > <value>a, b , c</value> > </property> > <property> > <name>...root.b.capacity</name> > <value>100</value> > </property> > ... > {code} > Will fail with error: > {code} > java.lang.IllegalArgumentException: Illegal capacity of -1.0 for queue root. a > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.getCapacity(CapacitySchedulerConfiguration.java:332) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.getCapacityFromConf(LeafQueue.java:196) > > {code} > It will try to find queues with names " a", " b ", and " c", which is > apparently wrong; we should trim these sub queue names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
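The trimming described in the issue amounts to something like the following. This is an illustrative sketch, not the actual CapacitySchedulerConfiguration change; the empty-token skip mirrors the "x,y,,z," case from YARN-2762:

```java
import java.util.ArrayList;
import java.util.List;

// Split a configured sub-queue list and trim each token, so "a, b , c"
// yields queues a, b, c rather than " a", " b ", " c".
public class QueueNameTrim {
    public static List<String> parseQueues(String configured) {
        List<String> queues = new ArrayList<>();
        for (String name : configured.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {   // skip empty tokens, e.g. in "x,y,,z,"
                queues.add(trimmed);
            }
        }
        return queues;
    }
}
```

With this parsing, the "Illegal capacity of -1.0 for queue root. a" failure above cannot occur, because the capacity lookup for queue "b" uses the same trimmed name as the queue list.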
[jira] [Commented] (YARN-2604) Scheduler should consider max-allocation-* in conjunction with the largest node
[ https://issues.apache.org/jira/browse/YARN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221211#comment-14221211 ] Karthik Kambatla commented on YARN-2604: Looks good. Thanks for your patience through the reviews, Robert. +1, checking this in. > Scheduler should consider max-allocation-* in conjunction with the largest > node > --- > > Key: YARN-2604 > URL: https://issues.apache.org/jira/browse/YARN-2604 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.5.1 >Reporter: Karthik Kambatla >Assignee: Robert Kanter > Attachments: YARN-2604.patch, YARN-2604.patch, YARN-2604.patch, > YARN-2604.patch, YARN-2604.patch, YARN-2604.patch > > > If the scheduler max-allocation-* values are larger than the resources > available on the largest node in the cluster, an application requesting > resources between the two values will be accepted by the scheduler but the > requests will never be satisfied. The app essentially hangs forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2877) Extend YARN to support distributed scheduling
[ https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221203#comment-14221203 ] Sriram Rao commented on YARN-2877: -- [~airbots] The number of AMs running on any machine is configurable and small---on the order of a few tens---so the overhead on the LocalRM should be negligible. > Extend YARN to support distributed scheduling > - > > Key: YARN-2877 > URL: https://issues.apache.org/jira/browse/YARN-2877 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager >Reporter: Sriram Rao > > This is an umbrella JIRA that proposes to extend YARN to support distributed > scheduling. Briefly, some of the motivations for distributed scheduling are > the following: > 1. Improve cluster utilization by opportunistically executing tasks on otherwise > idle resources on individual machines. > 2. Reduce allocation latency for tasks where the scheduling time dominates > (i.e., task execution time is much less than the time required for > obtaining a container from the RM). > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2877) Extend YARN to support distributed scheduling
[ https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221174#comment-14221174 ] Chen He commented on YARN-2877: --- This is an interesting idea. Distributed scheduling and global scheduling have their own pros and cons. In short, global scheduling can achieve optimal matching between tasks and resources but may have scalability problems as the system becomes larger and larger. Distributed scheduling is scalable but may reach sub-optimal matchings if there is no communication between those distributed schedulers. The LocalRM can reduce the RM's burden by handling communication with local AMs. It is a good idea. IMHO, worker nodes are becoming increasingly powerful and large (more memory and cores). Is it possible that the LocalRM affects the NM's performance if there are many AMs running on a single server? > Extend YARN to support distributed scheduling > - > > Key: YARN-2877 > URL: https://issues.apache.org/jira/browse/YARN-2877 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager >Reporter: Sriram Rao > > This is an umbrella JIRA that proposes to extend YARN to support distributed > scheduling. Briefly, some of the motivations for distributed scheduling are > the following: > 1. Improve cluster utilization by opportunistically executing tasks on otherwise > idle resources on individual machines. > 2. Reduce allocation latency for tasks where the scheduling time dominates > (i.e., task execution time is much less than the time required for > obtaining a container from the RM). > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2604) Scheduler should consider max-allocation-* in conjunction with the largest node
[ https://issues.apache.org/jira/browse/YARN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221169#comment-14221169 ] Robert Kanter commented on YARN-2604: - My last comment should have said "vcores", not "scores". Apple _autocorrected_ it :) > Scheduler should consider max-allocation-* in conjunction with the largest > node > --- > > Key: YARN-2604 > URL: https://issues.apache.org/jira/browse/YARN-2604 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.5.1 >Reporter: Karthik Kambatla >Assignee: Robert Kanter > Attachments: YARN-2604.patch, YARN-2604.patch, YARN-2604.patch, > YARN-2604.patch, YARN-2604.patch, YARN-2604.patch > > > If the scheduler max-allocation-* values are larger than the resources > available on the largest node in the cluster, an application requesting > resources between the two values will be accepted by the scheduler but the > requests will never be satisfied. The app essentially hangs forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221052#comment-14221052 ] Hudson commented on YARN-2375: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #12 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/12/]) YARN-2375. Allow enabling/disabling timeline server per framework. (Mit Desai via jeagles) (jeagles: rev c298a9a845f89317eb9efad332e6657c56736a4d) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java > Allow enabling/disabling timeline server per framework > -- > > Key: YARN-2375 > URL: https://issues.apache.org/jira/browse/YARN-2375 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jonathan Eagles >Assignee: Mit Desai > Fix For: 2.7.0, 2.6.1 > > Attachments: YARN-2375.1.patch, YARN-2375.patch, YARN-2375.patch, > YARN-2375.patch, YARN-2375.patch > > > This JIRA is to remove the ats enabled flag check within the > TimelineClientImpl. Example where this fails is below. > While running secure timeline server with ats flag set to disabled on > resource manager, Timeline delegation token renewer throws an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2087) YARN proxy doesn't relay verbs other than GET
[ https://issues.apache.org/jira/browse/YARN-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved YARN-2087. -- Resolution: Duplicate > YARN proxy doesn't relay verbs other than GET > - > > Key: YARN-2087 > URL: https://issues.apache.org/jira/browse/YARN-2087 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, webapp >Affects Versions: 2.4.0 >Reporter: Steve Loughran > > the {{WebAppProxy}} class only proxies GET requests, REST verbs PUT, DELETE > and POST aren't handled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2031) YARN Proxy model doesn't support REST APIs in AMs
[ https://issues.apache.org/jira/browse/YARN-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-2031: - Issue Type: Sub-task (was: Bug) Parent: YARN-2084 > YARN Proxy model doesn't support REST APIs in AMs > - > > Key: YARN-2031 > URL: https://issues.apache.org/jira/browse/YARN-2031 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Steve Loughran >Assignee: Steve Loughran > > AMs can't support REST APIs because > # the AM filter redirects all requests to the proxy with a 302 response (not > 307) > # the proxy doesn't forward PUT/POST/DELETE verbs > Either the AM filter needs to return 307 and the proxy to forward the verbs, > or the AM filter should not filter the REST part of the web site -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2087) YARN proxy doesn't relay verbs other than GET
[ https://issues.apache.org/jira/browse/YARN-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221037#comment-14221037 ] Steve Loughran commented on YARN-2087: -- Filed (and forgotten about) as YARN-2031 > YARN proxy doesn't relay verbs other than GET > - > > Key: YARN-2087 > URL: https://issues.apache.org/jira/browse/YARN-2087 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, webapp >Affects Versions: 2.4.0 >Reporter: Steve Loughran > > the {{WebAppProxy}} class only proxies GET requests, REST verbs PUT, DELETE > and POST aren't handled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2031) YARN Proxy model doesn't support REST APIs in AMs
[ https://issues.apache.org/jira/browse/YARN-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran reassigned YARN-2031: Assignee: Steve Loughran > YARN Proxy model doesn't support REST APIs in AMs > - > > Key: YARN-2031 > URL: https://issues.apache.org/jira/browse/YARN-2031 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Steve Loughran >Assignee: Steve Loughran > > AMs can't support REST APIs because > # the AM filter redirects all requests to the proxy with a 302 response (not > 307) > # the proxy doesn't forward PUT/POST/DELETE verbs > Either the AM filter needs to return 307 and the proxy to forward the verbs, > or the AM filter should not filter the REST part of the web site -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221031#comment-14221031 ] Hudson commented on YARN-2375: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1964 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1964/]) YARN-2375. Allow enabling/disabling timeline server per framework. (Mit Desai via jeagles) (jeagles: rev c298a9a845f89317eb9efad332e6657c56736a4d) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * hadoop-yarn-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java > Allow enabling/disabling timeline server per framework > -- > > Key: YARN-2375 > URL: https://issues.apache.org/jira/browse/YARN-2375 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jonathan Eagles >Assignee: Mit Desai > Fix For: 2.7.0, 2.6.1 > > Attachments: YARN-2375.1.patch, YARN-2375.patch, YARN-2375.patch, > YARN-2375.patch, YARN-2375.patch > > > This JIRA is to remove the ats enabled flag check within the > TimelineClientImpl. Example where this fails is below. > While running secure timeline server with ats flag set to disabled on > resource manager, Timeline delegation token renewer throws an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2517) Implement TimelineClientAsync
[ https://issues.apache.org/jira/browse/YARN-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221027#comment-14221027 ] Mit Desai commented on YARN-2517: - Yes, I have been working on the timeline service recently. I will take a look. > Implement TimelineClientAsync > - > > Key: YARN-2517 > URL: https://issues.apache.org/jira/browse/YARN-2517 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Tsuyoshi OZAWA > Attachments: YARN-2517.1.patch, YARN-2517.2.patch > > > In some scenarios, we'd like to put timeline entities in another thread so as not to > block the current one. > It's good to have a TimelineClientAsync like AMRMClientAsync and > NMClientAsync. It can buffer entities, put them in a separate thread, and > have callbacks to handle the responses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221025#comment-14221025 ] Konstantinos Karanasos commented on YARN-2884: -- [~kasha], [~curino], [~subru], given that this proxy/agent will only focus on the AM-RM communication, we may also explicitly call it AMRMProxy or AMRMAgent (following the naming convention of the already existing AMRMClient* classes). [~djp] I just added a comment in the umbrella JIRA (YARN-2877), trying to give some more details. We are not proposing to substitute all scheduling decisions with distributed ones. The guaranteed-start containers will continue to be scheduled by the central RM. However, the queueable ones will be scheduled in a distributed fashion. The first candidate for queueable containers is short-running tasks, for which the overhead of contacting the central RM is a significant part of the overall task execution time. Scheduling these requests without contacting the central RM will reduce their latency, increase the utilization of the cluster (no idle resources waiting to contact the RM), and offload the central RM (which is good for scaling in big clusters). > Proxying all AM-RM communications > - > > Key: YARN-2884 > URL: https://issues.apache.org/jira/browse/YARN-2884 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Carlo Curino > > We introduce the notion of an RMProxy, running on each node (or once per > rack). Upon start, the AM is forced (via tokens and configuration) to direct > all its requests to a new service running on the NM that provides a proxy to > the central RM. > This gives us a place to: > 1) perform distributed scheduling decisions > 2) throttle mis-behaving AMs > 3) mask the access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2877) Extend YARN to support distributed scheduling
[ https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221012#comment-14221012 ] Konstantinos Karanasos commented on YARN-2877: -- Adding some more details, now that we have added the first sub-tasks. In YARN-2882 we introduce two *types of containers*: guaranteed-start and queueable. The former are the ones existing in YARN today (they are allocated by the central RM, and once allocated, are guaranteed to start). The latter make it possible to queue container requests in the NMs and will be used for distributed scheduling. The *queuing of (queueable) container requests* in the NMs is proposed in YARN-2883. Each NM will now also have a *LocalRM* (Local ResourceManager) that will receive all container requests from the AMs running on the same machine: - For the guaranteed-start container requests, the LocalRM acts as a proxy (YARN-2884), forwarding them to the central RM. - For the queueable container requests, the LocalRM is responsible for sending them directly to the NM queues (bypassing the central RM). Deciding the NMs where these requests are queued is based on the estimated waiting time in the NM queues, as discussed in YARN-2886. Based on some policy (YARN-2887), each AM will determine *what type of containers to ask for*: only guaranteed-start, only queueable, or a mix thereof. For instance, an AM may request guaranteed-start containers for its tasks that are expected to be long-running, whereas it may ask for queueable containers for its short tasks (in which the back-and-forth with the central RM may be longer than the task execution time). This way we reduce the scheduling latency, while increasing the utilization of the cluster (if we had to go to the central RM for all these short tasks, some resources of the cluster might remain idle in the meantime). To ensure the NM queues remain balanced, we propose *corrective mechanisms for NM queue rebalancing* in YARN-2888. 
Moreover, to ensure no AM is abusing the system by asking for too many queueable containers, we can impose a limit on the *number of queueable containers* that each AM can receive (YARN-2889). > Extend YARN to support distributed scheduling > - > > Key: YARN-2877 > URL: https://issues.apache.org/jira/browse/YARN-2877 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager >Reporter: Sriram Rao > > This is an umbrella JIRA that proposes to extend YARN to support distributed > scheduling. Briefly, some of the motivations for distributed scheduling are > the following: > 1. Improve cluster utilization by opportunistically executing tasks on otherwise > idle resources on individual machines. > 2. Reduce allocation latency for tasks where the scheduling time dominates > (i.e., task execution time is much less than the time required for > obtaining a container from the RM). > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
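The placement rule for queueable requests described in this comment---send each request to the NM with the smallest estimated queue waiting time---can be sketched as follows. This is a hypothetical illustration, not code from any of the sub-task patches; the node names and wait-time estimates are made up, and how the estimates are produced is exactly what YARN-2886 is about:

```java
import java.util.Map;

// Pick the NM whose queue has the smallest estimated wait time for a
// queueable container request. Ties are broken arbitrarily by map order.
public class LeastWaitPlacement {
    public static String pickNode(Map<String, Long> estimatedWaitMsByNode) {
        String best = null;
        long bestWait = Long.MAX_VALUE;
        for (Map.Entry<String, Long> e : estimatedWaitMsByNode.entrySet()) {
            if (e.getValue() < bestWait) {
                bestWait = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }
}
```

A LocalRM holding estimates {nm1: 500 ms, nm2: 120 ms, nm3: 900 ms} would queue the request at nm2; the rebalancing mechanisms of YARN-2888 then correct the skew that stale estimates inevitably cause.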
[jira] [Updated] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-2375: Fix Version/s: 2.6.1 > Allow enabling/disabling timeline server per framework > -- > > Key: YARN-2375 > URL: https://issues.apache.org/jira/browse/YARN-2375 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jonathan Eagles >Assignee: Mit Desai > Fix For: 2.7.0, 2.6.1 > > Attachments: YARN-2375.1.patch, YARN-2375.patch, YARN-2375.patch, > YARN-2375.patch, YARN-2375.patch > > > This JIRA is to remove the ats enabled flag check within the > TimelineClientImpl. Example where this fails is below. > While running secure timeline server with ats flag set to disabled on > resource manager, Timeline delegation token renewer throws an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220995#comment-14220995 ] Mit Desai commented on YARN-2375: - Thanks for the quick reviews [~jeagles] and [~zjshen]. > Allow enabling/disabling timeline server per framework > -- > > Key: YARN-2375 > URL: https://issues.apache.org/jira/browse/YARN-2375 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jonathan Eagles >Assignee: Mit Desai > Fix For: 2.7.0 > > Attachments: YARN-2375.1.patch, YARN-2375.patch, YARN-2375.patch, > YARN-2375.patch, YARN-2375.patch > > > This JIRA is to remove the ats enabled flag check within the > TimelineClientImpl. Example where this fails is below. > While running secure timeline server with ats flag set to disabled on > resource manager, Timeline delegation token renewer throws an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220971#comment-14220971 ] Mit Desai commented on YARN-2890: - Forgot to assign it to me. :P That's fine. You can carry on. > MiniMRYarnCluster should turn on timeline service if configured to do so > > > Key: YARN-2890 > URL: https://issues.apache.org/jira/browse/YARN-2890 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Mit Desai >Assignee: Varun Saxena > Fix For: 2.6.1 > > > Currently the MiniMRYarnCluster does not consider the configuration value for > enabling timeline service before starting. The MiniYarnCluster should only > start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220970#comment-14220970 ] Mit Desai commented on YARN-2890: - [~varun_saxena] I was already working on the issue. > MiniMRYarnCluster should turn on timeline service if configured to do so > > > Key: YARN-2890 > URL: https://issues.apache.org/jira/browse/YARN-2890 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Mit Desai >Assignee: Varun Saxena > Fix For: 2.6.1 > > > Currently the MiniMRYarnCluster does not consider the configuration value for > enabling timeline service before starting. The MiniYarnCluster should only > start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220965#comment-14220965 ] Hudson commented on YARN-2375: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1940 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1940/]) YARN-2375. Allow enabling/disabling timeline server per framework. (Mit Desai via jeagles) (jeagles: rev c298a9a845f89317eb9efad332e6657c56736a4d) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java * hadoop-yarn-project/CHANGES.txt > Allow enabling/disabling timeline server per framework > -- > > Key: YARN-2375 > URL: https://issues.apache.org/jira/browse/YARN-2375 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jonathan Eagles >Assignee: Mit Desai > Fix For: 2.7.0 > > Attachments: YARN-2375.1.patch, YARN-2375.patch, YARN-2375.patch, > YARN-2375.patch, YARN-2375.patch > > > This JIRA is to remove the ats enabled flag check within the > TimelineClientImpl. Example where this fails is below. > While running secure timeline server with ats flag set to disabled on > resource manager, Timeline delegation token renewer throws an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220940#comment-14220940 ] Hudson commented on YARN-2375: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #12 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/12/]) YARN-2375. Allow enabling/disabling timeline server per framework. (Mit Desai via jeagles) (jeagles: rev c298a9a845f89317eb9efad332e6657c56736a4d) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java > Allow enabling/disabling timeline server per framework > -- > > Key: YARN-2375 > URL: https://issues.apache.org/jira/browse/YARN-2375 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jonathan Eagles >Assignee: Mit Desai > Fix For: 2.7.0 > > Attachments: YARN-2375.1.patch, YARN-2375.patch, YARN-2375.patch, > YARN-2375.patch, YARN-2375.patch > > > This JIRA is to remove the ats enabled flag check within the > TimelineClientImpl. Example where this fails is below. > While running secure timeline server with ats flag set to disabled on > resource manager, Timeline delegation token renewer throws an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220818#comment-14220818 ] Hudson commented on YARN-2375: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #750 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/750/]) YARN-2375. Allow enabling/disabling timeline server per framework. (Mit Desai via jeagles) (jeagles: rev c298a9a845f89317eb9efad332e6657c56736a4d) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * hadoop-yarn-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java > Allow enabling/disabling timeline server per framework > -- > > Key: YARN-2375 > URL: https://issues.apache.org/jira/browse/YARN-2375 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jonathan Eagles >Assignee: Mit Desai > Fix For: 2.7.0 > > Attachments: YARN-2375.1.patch, YARN-2375.patch, YARN-2375.patch, > YARN-2375.patch, YARN-2375.patch > > > This JIRA is to remove the ats enabled flag check within the > TimelineClientImpl. Example where this fails is below. > While running secure timeline server with ats flag set to disabled on > resource manager, Timeline delegation token renewer throws an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220807#comment-14220807 ] Hudson commented on YARN-2375: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #12 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/12/]) YARN-2375. Allow enabling/disabling timeline server per framework. (Mit Desai via jeagles) (jeagles: rev c298a9a845f89317eb9efad332e6657c56736a4d) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * hadoop-yarn-project/CHANGES.txt > Allow enabling/disabling timeline server per framework > -- > > Key: YARN-2375 > URL: https://issues.apache.org/jira/browse/YARN-2375 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jonathan Eagles >Assignee: Mit Desai > Fix For: 2.7.0 > > Attachments: YARN-2375.1.patch, YARN-2375.patch, YARN-2375.patch, > YARN-2375.patch, YARN-2375.patch > > > This JIRA is to remove the ats enabled flag check within the > TimelineClientImpl. Example where this fails is below. > While running secure timeline server with ats flag set to disabled on > resource manager, Timeline delegation token renewer throws an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220784#comment-14220784 ] Junping Du commented on YARN-2637: -- Hi [~cwelch], thanks for your patch update. Could you please check whether the failed tests are related to your latest patch? Thanks! > maximum-am-resource-percent could be violated when resource of AM is > > minimumAllocation > > > Key: YARN-2637 > URL: https://issues.apache.org/jira/browse/YARN-2637 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Wangda Tan >Assignee: Craig Welch >Priority: Critical > Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.2.patch, > YARN-2637.6.patch > > > Currently, the number of AMs in a leaf queue is calculated in the following way: > {code} > max_am_resource = queue_max_capacity * maximum_am_resource_percent > #max_am_number = max_am_resource / minimum_allocation > #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor > {code} > And when a new application is submitted to the RM, it checks whether the app can be activated in the following way: > {code} > for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); > i.hasNext(); ) { > FiCaSchedulerApp application = i.next(); > > // Check queue limit > if (getNumActiveApplications() >= getMaximumActiveApplications()) { > break; > } > > // Check user limit > User user = getUser(application.getUser()); > if (user.getActiveApplications() < > getMaximumActiveApplicationsPerUser()) { > user.activateApplication(); > activeApplications.add(application); > i.remove(); > LOG.info("Application " + application.getApplicationId() + > " from user: " + application.getUser() + > " activated in queue: " + getQueueName()); > } > } > {code} > An example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum > resource that AMs can use is 200M. Assuming minimum_allocation = 1M, 200 AMs can be > launched, and if the user uses 5M for each AM (> minimum_allocation), all > apps can still be activated, and they will occupy all the resources of the queue > instead of only max_am_resource_percent of it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
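The arithmetic behind the bug described above can be checked directly. This is a minimal sketch of the admission math only, using the numbers from the example (1G queue, 20% AM share, 1M minimum allocation, 5M actual AM size); the class and method names are hypothetical, not YARN code.

```java
// Hypothetical sketch of the max-AM computation from the issue description.
public class AmLimitExample {

    /**
     * The AM count is derived from minimum_allocation, not from the AM's
     * actual resource request -- which is the flaw the JIRA points at.
     */
    static int maxActiveAms(long queueCapMb, double amPercent, long minAllocMb) {
        long maxAmResourceMb = (long) (queueCapMb * amPercent);
        return (int) (maxAmResourceMb / minAllocMb);
    }

    /** Actual AM resource consumed when every AM requests amSizeMb. */
    static long actualAmUseMb(int numAms, long amSizeMb) {
        return (long) numAms * amSizeMb;
    }
}
```

With the example's numbers: 1000M * 0.2 = 200M of AM headroom, 200M / 1M = 200 admissible AMs, but 200 AMs at 5M each consume 1000M, i.e. the whole queue rather than the intended 20%.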
[jira] [Commented] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220780#comment-14220780 ] Junping Du commented on YARN-2884: -- I don't think the name matters too much ... IMO, this sounds like a complicated effort. Before we go ahead, maybe we should analyze the motivation for "distributed scheduling decisions": what we could gain, and what we could potentially lose. > Proxying all AM-RM communications > - > > Key: YARN-2884 > URL: https://issues.apache.org/jira/browse/YARN-2884 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Carlo Curino > > We introduce the notion of an RMProxy, running on each node (or once per > rack). Upon start, the AM is forced (via tokens and configuration) to direct > all its requests to a new service running on the NM that provides a proxy to > the central RM. > This gives us a place to: > 1) perform distributed scheduling decisions > 2) throttle misbehaving AMs > 3) mask the access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
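One of the three roles listed for the proxy, throttling misbehaving AMs, can be sketched with a simple per-AM cap on outstanding requests. This is a hypothetical illustration of the idea only; `AmRequestThrottle` and its limit parameter are invented for the sketch and are not part of the proposal.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proxy's "throttle misbehaving AMs" role.
public class AmRequestThrottle {
    private final int maxOutstandingPerAm;
    private final Map<String, Integer> outstanding = new HashMap<>();

    AmRequestThrottle(int maxOutstandingPerAm) {
        this.maxOutstandingPerAm = maxOutstandingPerAm;
    }

    /** Returns true iff the AM's request may be forwarded to the central RM. */
    boolean admit(String amId) {
        int current = outstanding.getOrDefault(amId, 0);
        if (current >= maxOutstandingPerAm) {
            return false; // over the cap: delay or reject instead of forwarding
        }
        outstanding.put(amId, current + 1);
        return true;
    }
}
```

Because every AM request passes through the per-node proxy, such a policy can be enforced locally without any change to the central RM, which is exactly what makes the proxy a convenient interposition point.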
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220663#comment-14220663 ] Hadoop QA commented on YARN-2637: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682825/YARN-2637.6.patch against trunk revision c298a9a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 11 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.TestRMContainerImpl org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCResponseId org.apache.hadoop.yarn.server.resourcemanager.TestResourceManager org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerUtils org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates org.apache.hadoop.yarn.server.resourcemanager.reservation.TestCapacitySchedulerPlanFollower org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart org.apache.hadoop.yarn.server.resourcemanager.TestRM The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5900//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5900//console This message is automatically generated. 
> maximum-am-resource-percent could be violated when resource of AM is > > minimumAllocation > > > Key: YARN-2637 > URL: https://issues.apache.org/jira/browse/YARN-2637 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Wangda Tan >Assignee: Craig Welch >Priority: Critical > Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.2.patch, > YARN-2637.6.patch > > > Currently, number of AM in leaf queue will be calculated in following way: > {code} > max_am_resource = queue_max_capacity * maximum_am_resource_percent > #max_am_number = max_am_resource / minimum_allocation > #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor > {code} > And when submit new application to RM, it will check if an app can be > activated in following way: > {code} > for (Iterator i=pendingApplications.iterator(); > i.hasNext(); ) { > FiCaSchedulerApp application = i.next(); > > // Check queue limit > if (getNumActiveApplications() >= getMaximumActiveApplications()) { > break; > } > > // Check user limit > User user = getUser(application.getUser()); > if (user.getActiveApplications() < > getMaximumActiveApplicationsPerUser()) { > user.activateApplication(); > activeApplications.add(application); > i.remove(); > LOG.info(
[jira] [Assigned] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-2890: -- Assignee: Varun Saxena > MiniMRYarnCluster should turn on timeline service if configured to do so > > > Key: YARN-2890 > URL: https://issues.apache.org/jira/browse/YARN-2890 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Mit Desai >Assignee: Varun Saxena > Fix For: 2.6.1 > > > Currently the MiniMRYarnCluster does not consider the configuration value for > enabling timeline service before starting. The MiniYarnCluster should only > start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2188) Client service for cache manager
[ https://issues.apache.org/jira/browse/YARN-2188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220639#comment-14220639 ] Chris Trezzo commented on YARN-2188: Thanks for the comments! I will post an updated patch. > Client service for cache manager > > > Key: YARN-2188 > URL: https://issues.apache.org/jira/browse/YARN-2188 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Attachments: YARN-2188-trunk-v1.patch, YARN-2188-trunk-v2.patch, > YARN-2188-trunk-v3.patch, YARN-2188-trunk-v4.patch > > > Implement the client service for the shared cache manager. This service is > responsible for handling client requests to use and release resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)