[jira] [Created] (YARN-3163) admin support for YarnAuthorizationProvider
Sunil G created YARN-3163: - Summary: admin support for YarnAuthorizationProvider Key: YARN-3163 URL: https://issues.apache.org/jira/browse/YARN-3163 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Runtime configuration support for YarnAuthorizationProvider. Using admin commands, one should be able to set and get permissions from the YarnAuthorizationProvider. This mechanism will let users avoid updating config files and firing reload commands. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268756#comment-14268756 ] Sunil G commented on YARN-2933: --- Hi [~mayank_bansal] and [~wangda] This is a much-needed implementation w.r.t. node labels in the preemption scenario. However, I have a concern; please discard it if this has already been considered. An application's containers (if no labels were specified at submission time) may land on nodes that are either labelled or not labelled. Am I correct? If so, with this implementation, preemption will always happen to those containers which are running on a non-labelled node, which may not be accurate. So is it possible to do preemption only for applications which are submitted without any node labels? -Sunil Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have a preemption policy to support that. YARN-2498 is targeting preemption that respects node labels, but we have some gaps in the code base, e.g. queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially need a refactor of CS, which we need to spend some time thinking about carefully. For now, what we can do immediately is calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid a regression like: a cluster has some nodes with labels and some without; assume queueA isn't satisfied for resources without labels, but for now the preemption policy may preempt resources from nodes with labels for queueA, which is not correct. Again, this is just a short-term enhancement; YARN-2498 will consider preemption respecting node labels for the Capacity Scheduler, which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269868#comment-14269868 ] Sunil G commented on YARN-2933: --- Hi [~mayank_bansal] Thank you for the clarification. I have one more small nit in a test case:
{code}
if (setAMContainer && i == 0) {
  cLive.add(mockContainer(appAttId, cAlloc, unit, 0));
} else if (setLabeledContainer && i == 1) {
  cLive.add(mockContainer(appAttId, cAlloc, unit, 2));
} else {
  cLive.add(mockContainer(appAttId, cAlloc, unit, 1));
}
{code}
For *mockContainer*, the last parameter is an integer: 0 represents an AM container, 1 a normal container, and 2 a labelled container. Could we make this a named constant/enum and more generic? That way it will be easy to add a new container type in the future, and it improves readability. We can have an array of different container types, with entries added as needed later; this array index can be used with an Enum to create a mock container. Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch, YARN-2933-6.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have a preemption policy to support that. YARN-2498 is targeting preemption that respects node labels, but we have some gaps in the code base, e.g. queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially need a refactor of CS, which we need to spend some time thinking about carefully. For now, what we can do immediately is calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid a regression like: a cluster has some nodes with labels and some without; assume queueA isn't satisfied for resources without labels, but for now the preemption policy may preempt resources from nodes with labels for queueA, which is not correct. Again, this is just a short-term enhancement; YARN-2498 will consider preemption respecting node labels for the Capacity Scheduler, which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
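As an illustration of the enum suggestion above, a small sketch of what the refactor could look like is given below; the enum name and the mockContainer overload are assumptions for illustration, not part of the attached patches.
{code}
// Illustrative only: replaces the magic integers 0/1/2 in the test with a named type.
enum MockContainerType {
  AM,       // was 0
  NORMAL,   // was 1
  LABELED   // was 2
}

// The test could then read, assuming a mockContainer overload that accepts the enum:
// cLive.add(mockContainer(appAttId, cAlloc, unit, MockContainerType.AM));
// cLive.add(mockContainer(appAttId, cAlloc, unit, MockContainerType.LABELED));
// New container types can be appended to the enum as needed later.
{code}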
[jira] [Commented] (YARN-3016) (Refactoring) Merge internalAdd/Remove/ReplaceLabels to one method in CommonNodeLabelsManager
[ https://issues.apache.org/jira/browse/YARN-3016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268736#comment-14268736 ] Sunil G commented on YARN-3016: --- Hi [~wangda] Thanks for bringing this up. I have one doubt: do you mean the similar methods in CommonNodeLabelsManager and RMNodeLabelsManager? (Refactoring) Merge internalAdd/Remove/ReplaceLabels to one method in CommonNodeLabelsManager - Key: YARN-3016 URL: https://issues.apache.org/jira/browse/YARN-3016 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Now we have separate but similar implementations for add/remove/replace labels on node in CommonNodeLabelsManager; we should merge them into a single one so they are easier to modify and more readable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2896) Server side PB changes for Priority Label Manager and Admin CLI support
[ https://issues.apache.org/jira/browse/YARN-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2896: -- Attachment: 0002-YARN-2896.patch Uploading the common PB changes as a single patch. This covers all the PB-related changes at the ResourceManager level. Also added tests in PBImplRecords. I will upload a prototype patch in the parent JIRA, and will upload sub-JIRA patches; as per the comments, these can be brought to production quality. Server side PB changes for Priority Label Manager and Admin CLI support --- Key: YARN-2896 URL: https://issues.apache.org/jira/browse/YARN-2896 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2896.patch, 0002-YARN-2896.patch Common changes: * PB support changes required for Admin APIs * PB support for File System store (Priority Label Store) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3226) UI changes for decommissioning node
[ https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G reassigned YARN-3226: - Assignee: Sunil G UI changes for decommissioning node --- Key: YARN-3226 URL: https://issues.apache.org/jira/browse/YARN-3226 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Sunil G Some initial thought is: decommissioning nodes should still show up in the active nodes list since they are still running containers. A separate decommissioning tab to filter for those nodes would be nice, although I suppose users can also just use the jquery table to sort/search for nodes in that state from the active nodes list if it's too crowded to add yet another node state tab (or maybe get rid of some effectively dead tabs like the reboot state tab). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3224) Notify AM with containers (on decommissioning node) could be preempted after timeout.
[ https://issues.apache.org/jira/browse/YARN-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G reassigned YARN-3224: - Assignee: Sunil G Notify AM with containers (on decommissioning node) could be preempted after timeout. - Key: YARN-3224 URL: https://issues.apache.org/jira/browse/YARN-3224 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Sunil G -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328437#comment-14328437 ] Sunil G commented on YARN-3225: --- Another point: suppose we immediately fire the same command with a different timeout while the first timeout is still ongoing; do we need to update the timeout? New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Devaraj K New CLI (or existing CLI with parameters) should put each node on the decommission list into decommissioning status and track a timeout to terminate the nodes that haven't finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2693) Priority Label Manager in RM to manage application priority based on configuration
[ https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367477#comment-14367477 ] Sunil G commented on YARN-2693: --- Thank you [~wangda] for sharing the comments. As we move the queue-specific config inside scheduler.Queue, are we also moving ACLs back into the scheduler (ACLs w.r.t. priority)? It is better to control ACLs from outside via the YarnAuthorizer and keep only the config with the scheduler. Please share your thoughts. Regarding the methods in ApplicationPriorityManager, it looks fine overall, but I suggest we may also need * getClusterApplicationPriorities (if it is a range, that can be sent back) Priority Label Manager in RM to manage application priority based on configuration -- Key: YARN-2693 URL: https://issues.apache.org/jira/browse/YARN-2693 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch, 0006-YARN-2693.patch Focus of this JIRA is to have a centralized service to handle priority labels. Support operations such as * Add/Delete priority label to a specified queue * Manage integer mapping associated with each priority label * Support managing default priority label of a given queue * Expose interface to RM to validate priority label TO have simplified interface, Priority Manager will support only configuration file in contrast with admin cli and REST. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365604#comment-14365604 ] Sunil G commented on YARN-3136: --- Thank you [~jlowe] for pointing that out. I will fix it and upload a new patch. getTransferredContainers can be a bottleneck during AM registration --- Key: YARN-3136 URL: https://issues.apache.org/jira/browse/YARN-3136 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Sunil G Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, 0003-YARN-3136.patch, 0004-YARN-3136.patch, 0005-YARN-3136.patch, 0006-YARN-3136.patch While examining RM stack traces on a busy cluster I noticed a pattern of AMs stuck waiting for the scheduler lock trying to call getTransferredContainers. The scheduler lock is highly contended, especially on a large cluster with many nodes heartbeating, and it would be nice if we could find a way to eliminate the need to grab this lock during this call. We've already done similar work during AM allocate calls to make sure they don't needlessly grab the scheduler lock, and it would be good to do so here as well, if possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2003) Support to process Job priority from Submission Context in AppAttemptAddedSchedulerEvent [RM side]
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367465#comment-14367465 ] Sunil G commented on YARN-2003: --- Thank you [~leftnoteasy] for sharing the comments. Yes, YARN-2003 will focus on RM-related changes, excluding changes in the Scheduler. I will rearrange the code accordingly and update the patch. Support to process Job priority from Submission Context in AppAttemptAddedSchedulerEvent [RM side] -- Key: YARN-2003 URL: https://issues.apache.org/jira/browse/YARN-2003 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2003.patch, 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from Submission Context and store. Later this can be used by Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365408#comment-14365408 ] Sunil G commented on YARN-1963: --- Hi [~vinodkv], [~leftnoteasy], [~eepayne], [~jlowe] Using priority as an integer inside the schedulers will be the first target, with a manager acting as a single point of contact that translates from label to integer and vice versa. Yes, this is added complexity in the RM, but if it can be taken out of the Scheduler it removes much of the manipulation logic.
{noformat}
<alias>0:VERY_LOW, 1:LOW, 2:NORMAL, 3:HIGH, 4:VERY_HIGH</alias>
{noformat}
I feel we can keep such a label config in a common place which is accessible to any scheduler. Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Sunil G Attachments: 0001-YARN-1963-prototype.patch, YARN Application Priorities Design.pdf, YARN Application Priorities Design_01.pdf It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
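For context, a minimal sketch of the label-to-integer translation described in the comment above; the class and method names here are illustrative assumptions, not code from the attached prototype.
{code}
import java.util.HashMap;
import java.util.Map;

// Single point of contact that maps priority labels to the integers the
// scheduler actually works with, and back again.
public class PriorityLabelMapper {
  private final Map<String, Integer> labelToValue = new HashMap<String, Integer>();
  private final Map<Integer, String> valueToLabel = new HashMap<Integer, String>();

  // Parses an alias string such as "0:VERY_LOW,1:LOW,2:NORMAL,3:HIGH,4:VERY_HIGH".
  public PriorityLabelMapper(String alias) {
    for (String entry : alias.split(",")) {
      String[] parts = entry.trim().split(":");
      int value = Integer.parseInt(parts[0]);
      labelToValue.put(parts[1], value);
      valueToLabel.put(value, parts[1]);
    }
  }

  // Scheduler-facing direction: label to plain integer.
  public int toInteger(String label) {
    Integer v = labelToValue.get(label);
    if (v == null) {
      throw new IllegalArgumentException("Unknown priority label: " + label);
    }
    return v;
  }

  // User-facing direction: integer back to label.
  public String toLabel(int value) {
    return valueToLabel.get(value);
  }
}
{code}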
[jira] [Updated] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3136: -- Attachment: 0006-YARN-3136.patch getTransferredContainers can be a bottleneck during AM registration --- Key: YARN-3136 URL: https://issues.apache.org/jira/browse/YARN-3136 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Sunil G Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, 0003-YARN-3136.patch, 0004-YARN-3136.patch, 0005-YARN-3136.patch, 0006-YARN-3136.patch While examining RM stack traces on a busy cluster I noticed a pattern of AMs stuck waiting for the scheduler lock trying to call getTransferredContainers. The scheduler lock is highly contended, especially on a large cluster with many nodes heartbeating, and it would be nice if we could find a way to eliminate the need to grab this lock during this call. We've already done similar work during AM allocate calls to make sure they don't needlessly grab the scheduler lock, and it would be good to do so here as well, if possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353143#comment-14353143 ] Sunil G commented on YARN-3136: --- bq.createReleaseCache schedules a timer task that Sorry, I also missed that. I agree with making 'applications' a concurrent map. As it is private and unstable, it is fine to make it concurrent, but any scheduler which uses this will have to change accordingly. Do we need to document that? Also attaching a patch for the same. getTransferredContainers can be a bottleneck during AM registration --- Key: YARN-3136 URL: https://issues.apache.org/jira/browse/YARN-3136 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Sunil G Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, 0003-YARN-3136.patch, 0004-YARN-3136.patch, 0005-YARN-3136.patch While examining RM stack traces on a busy cluster I noticed a pattern of AMs stuck waiting for the scheduler lock trying to call getTransferredContainers. The scheduler lock is highly contended, especially on a large cluster with many nodes heartbeating, and it would be nice if we could find a way to eliminate the need to grab this lock during this call. We've already done similar work during AM allocate calls to make sure they don't needlessly grab the scheduler lock, and it would be good to do so here as well, if possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357241#comment-14357241 ] Sunil G commented on YARN-1963: --- Thank you [~vinodkv] and [~nroberts] for the comments. Considering usability, labels will be handy, and the scheduler must be agnostic of labels and should handle only integers, like in Linux. This adds complexity to the priority manager inside the RM, which will translate label to integer and vice versa. But a call can be taken after looking at all the possibilities, and the approach can be standardized so that a minimal working version can be pushed in by improving on the patches submitted (a working prototype was attached). Hoping [~leftnoteasy] and [~eepayne] will join the discussion. Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Sunil G Attachments: 0001-YARN-1963-prototype.patch, YARN Application Priorities Design.pdf, YARN Application Priorities Design_01.pdf It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14350165#comment-14350165 ] Sunil G commented on YARN-3136: --- Thank you [~jlowe] and [~jianhe]. Comments are added, and a few other occurrences of *applications.get* are replaced with the new API. The Fair/Fifo schedulers are also now overloaded with the new method. However, as Jian mentioned, we need not protect createReleaseCache as it is only used locally in serviceInit. I also feel, like Jian, that AbstractYarnScheduler can be made Private and Unstable; there is not much chance of compatibility issues. Uploaded a patch addressing these points. getTransferredContainers can be a bottleneck during AM registration --- Key: YARN-3136 URL: https://issues.apache.org/jira/browse/YARN-3136 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Sunil G Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, 0003-YARN-3136.patch, 0004-YARN-3136.patch, 0005-YARN-3136.patch While examining RM stack traces on a busy cluster I noticed a pattern of AMs stuck waiting for the scheduler lock trying to call getTransferredContainers. The scheduler lock is highly contended, especially on a large cluster with many nodes heartbeating, and it would be nice if we could find a way to eliminate the need to grab this lock during this call. We've already done similar work during AM allocate calls to make sure they don't needlessly grab the scheduler lock, and it would be good to do so here as well, if possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3136: -- Attachment: 0005-YARN-3136.patch getTransferredContainers can be a bottleneck during AM registration --- Key: YARN-3136 URL: https://issues.apache.org/jira/browse/YARN-3136 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Sunil G Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, 0003-YARN-3136.patch, 0004-YARN-3136.patch, 0005-YARN-3136.patch While examining RM stack traces on a busy cluster I noticed a pattern of AMs stuck waiting for the scheduler lock trying to call getTransferredContainers. The scheduler lock is highly contended, especially on a large cluster with many nodes heartbeating, and it would be nice if we could find a way to eliminate the need to grab this lock during this call. We've already done similar work during AM allocate calls to make sure they don't needlessly grab the scheduler lock, and it would be good to do so here as well, if possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359782#comment-14359782 ] Sunil G commented on YARN-3136: --- Thank you, Jian. All failed test cases pass locally; the failures are unrelated. getTransferredContainers can be a bottleneck during AM registration --- Key: YARN-3136 URL: https://issues.apache.org/jira/browse/YARN-3136 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Sunil G Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, 0003-YARN-3136.patch, 0004-YARN-3136.patch, 0005-YARN-3136.patch, 0006-YARN-3136.patch While examining RM stack traces on a busy cluster I noticed a pattern of AMs stuck waiting for the scheduler lock trying to call getTransferredContainers. The scheduler lock is highly contended, especially on a large cluster with many nodes heartbeating, and it would be nice if we could find a way to eliminate the need to grab this lock during this call. We've already done similar work during AM allocate calls to make sure they don't needlessly grab the scheduler lock, and it would be good to do so here as well, if possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3136: -- Attachment: 0009-YARN-3136.patch Hi [~jlowe] and [~jianhe] I used a ConcurrentMap for 'applications', but findbugs warnings are raised for non-synchronized access on this map. I hope that is acceptable; please share your opinion. getTransferredContainers can be a bottleneck during AM registration --- Key: YARN-3136 URL: https://issues.apache.org/jira/browse/YARN-3136 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Sunil G Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, 0003-YARN-3136.patch, 0004-YARN-3136.patch, 0005-YARN-3136.patch, 0006-YARN-3136.patch, 0007-YARN-3136.patch, 0008-YARN-3136.patch, 0009-YARN-3136.patch While examining RM stack traces on a busy cluster I noticed a pattern of AMs stuck waiting for the scheduler lock trying to call getTransferredContainers. The scheduler lock is highly contended, especially on a large cluster with many nodes heartbeating, and it would be nice if we could find a way to eliminate the need to grab this lock during this call. We've already done similar work during AM allocate calls to make sure they don't needlessly grab the scheduler lock, and it would be good to do so here as well, if possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2003) Support to process Job priority from Submission Context in AppAttemptAddedSchedulerEvent [RM side]
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2003: -- Attachment: 0005-YARN-2003.patch Uploading a patch as per the comments. [~leftnoteasy] kindly check. Support to process Job priority from Submission Context in AppAttemptAddedSchedulerEvent [RM side] -- Key: YARN-2003 URL: https://issues.apache.org/jira/browse/YARN-2003 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2003.patch, 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch, 0005-YARN-2003.patch AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from Submission Context and store. Later this can be used by Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14387114#comment-14387114 ] Sunil G commented on YARN-3136: --- Hi [~jlowe] [~jianhe]
{noformat}
Bug type IS2_INCONSISTENT_SYNC (click for details)
In class org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler
Field org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.applications
Synchronized 90% of the time
Unsynchronized access at AbstractYarnScheduler.java:[line 138]
Unsynchronized access at AbstractYarnScheduler.java:[line 165]
Unsynchronized access at AbstractYarnScheduler.java:[line 233]
{noformat}
As 'applications' is now a concurrent map, I feel we do not need a lock here. Kindly share your opinion. The test case failure is not related. getTransferredContainers can be a bottleneck during AM registration --- Key: YARN-3136 URL: https://issues.apache.org/jira/browse/YARN-3136 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Sunil G Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, 0003-YARN-3136.patch, 0004-YARN-3136.patch, 0005-YARN-3136.patch, 0006-YARN-3136.patch, 0007-YARN-3136.patch, 0008-YARN-3136.patch, 0009-YARN-3136.patch While examining RM stack traces on a busy cluster I noticed a pattern of AMs stuck waiting for the scheduler lock trying to call getTransferredContainers. The scheduler lock is highly contended, especially on a large cluster with many nodes heartbeating, and it would be nice if we could find a way to eliminate the need to grab this lock during this call. We've already done similar work during AM allocate calls to make sure they don't needlessly grab the scheduler lock, and it would be good to do so here as well, if possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
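For readers following this thread, a minimal sketch of the lock-free pattern being discussed; the class and field names below are simplified stand-ins, not the actual AbstractYarnScheduler code from the patches.
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Simplified stand-in types for illustration only; not the real YARN classes.
class AppId {}
class Container {}

class AppRecord {
  private final List<Container> transferred = new ArrayList<Container>();
  synchronized List<Container> getTransferredContainers() {
    // Copy under the per-application lock only.
    return new ArrayList<Container>(transferred);
  }
}

public class SimplifiedScheduler {
  // A concurrent map lets readers look up applications without holding a
  // scheduler-wide lock, so AM registration does not contend with the
  // node-heartbeat path.
  private final ConcurrentMap<AppId, AppRecord> applications =
      new ConcurrentHashMap<AppId, AppRecord>();

  // Note: no 'synchronized' on this method.
  public List<Container> getTransferredContainers(AppId appId) {
    AppRecord app = applications.get(appId);
    return app == null ? new ArrayList<Container>() : app.getTransferredContainers();
  }
}
{code}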
[jira] [Updated] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3136: -- Attachment: 0007-YARN-3136.patch Uploading patch to check findbugs warnings. getTransferredContainers can be a bottleneck during AM registration --- Key: YARN-3136 URL: https://issues.apache.org/jira/browse/YARN-3136 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Sunil G Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, 0003-YARN-3136.patch, 0004-YARN-3136.patch, 0005-YARN-3136.patch, 0006-YARN-3136.patch, 0007-YARN-3136.patch While examining RM stack traces on a busy cluster I noticed a pattern of AMs stuck waiting for the scheduler lock trying to call getTransferredContainers. The scheduler lock is highly contended, especially on a large cluster with many nodes heartbeating, and it would be nice if we could find a way to eliminate the need to grab this lock during this call. We've already done similar work during AM allocate calls to make sure they don't needlessly grab the scheduler lock, and it would be good to do so here as well, if possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-1963: -- Attachment: 0001-YARN-1963-prototype.patch Uploading a prototype version based on a configuration file. Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Sunil G Attachments: 0001-YARN-1963-prototype.patch, YARN Application Priorities Design.pdf, YARN Application Priorities Design_01.pdf It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2693) Priority Label Manager in RM to manage application priority based on configuration
[ https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2693: -- Attachment: (was: 0006-YARN-2693.patch) Priority Label Manager in RM to manage application priority based on configuration -- Key: YARN-2693 URL: https://issues.apache.org/jira/browse/YARN-2693 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch Focus of this JIRA is to have a centralized service to handle priority labels. Support operations such as * Add/Delete priority label to a specified queue * Manage integer mapping associated with each priority label * Support managing default priority label of a given queue * Expose interface to RM to validate priority label TO have simplified interface, Priority Manager will support only configuration file in contrast with admin cli and REST. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2003) Support to process Job priority from Submission Context in AppAttemptAddedSchedulerEvent [RM side]
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2003: -- Attachment: 0003-YARN-2003.patch Rebasing against YARN-2693 Support to process Job priority from Submission Context in AppAttemptAddedSchedulerEvent [RM side] -- Key: YARN-2003 URL: https://issues.apache.org/jira/browse/YARN-2003 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2003.patch, 0002-YARN-2003.patch, 0003-YARN-2003.patch AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from Submission Context and store. Later this can be used by Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2004: -- Attachment: 0003-YARN-2004.patch Updating the patch with minor variations. Priority scheduling support in Capacity scheduler - Key: YARN-2004 URL: https://issues.apache.org/jira/browse/YARN-2004 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2004.patch, 0002-YARN-2004.patch, 0003-YARN-2004.patch Based on the priority of the application, the Capacity Scheduler should be able to give preference to applications while doing scheduling. Comparator<FiCaSchedulerApp> applicationComparator can be changed as below. 1. Check for application priority. If priority is available, then return the highest priority job. 2. Otherwise continue with the existing logic such as App ID comparison and then timestamp comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
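A rough sketch of the comparator ordering described in the issue text above; the App stand-in class and field names are assumptions for illustration, not the actual FiCaSchedulerApp code.
{code}
import java.util.Comparator;

// Illustrative stand-in for FiCaSchedulerApp with only the fields needed here.
class App {
  Integer priority;      // null when the application has no priority set
  long applicationId;
  long submitTime;
}

// Prefer the higher-priority application; otherwise fall back to the existing
// App ID comparison and then the timestamp comparison.
class PriorityThenIdComparator implements Comparator<App> {
  @Override
  public int compare(App a1, App a2) {
    if (a1.priority != null && a2.priority != null
        && !a1.priority.equals(a2.priority)) {
      return a2.priority.compareTo(a1.priority); // higher priority sorts first
    }
    int byId = Long.compare(a1.applicationId, a2.applicationId);
    return byId != 0 ? byId : Long.compare(a1.submitTime, a2.submitTime);
  }
}
{code}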
[jira] [Updated] (YARN-2693) Priority Label Manager in RM to manage application priority based on configuration
[ https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2693: -- Attachment: 0006-YARN-2693.patch Uploading with minor corrections. The findbugs warnings seem unrelated. Priority Label Manager in RM to manage application priority based on configuration -- Key: YARN-2693 URL: https://issues.apache.org/jira/browse/YARN-2693 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch, 0006-YARN-2693.patch Focus of this JIRA is to have a centralized service to handle priority labels. Support operations such as * Add/Delete priority label to a specified queue * Manage integer mapping associated with each priority label * Support managing default priority label of a given queue * Expose interface to RM to validate priority label TO have simplified interface, Priority Manager will support only configuration file in contrast with admin cli and REST. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2693) Priority Label Manager in RM to manage application priority based on configuration
[ https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2693: -- Attachment: (was: 0006-YARN-2693.patch) Priority Label Manager in RM to manage application priority based on configuration -- Key: YARN-2693 URL: https://issues.apache.org/jira/browse/YARN-2693 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch Focus of this JIRA is to have a centralized service to handle priority labels. Support operations such as * Add/Delete priority label to a specified queue * Manage integer mapping associated with each priority label * Support managing default priority label of a given queue * Expose interface to RM to validate priority label TO have simplified interface, Priority Manager will support only configuration file in contrast with admin cli and REST. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2693) Priority Label Manager in RM to manage application priority based on configuration
[ https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2693: -- Attachment: 0006-YARN-2693.patch Priority Label Manager in RM to manage application priority based on configuration -- Key: YARN-2693 URL: https://issues.apache.org/jira/browse/YARN-2693 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch, 0006-YARN-2693.patch Focus of this JIRA is to have a centralized service to handle priority labels. Support operations such as * Add/Delete priority label to a specified queue * Manage integer mapping associated with each priority label * Support managing default priority label of a given queue * Expose interface to RM to validate priority label TO have simplified interface, Priority Manager will support only configuration file in contrast with admin cli and REST. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2693) Priority Label Manager in RM to manage application priority based on configuration
[ https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2693: -- Attachment: 0006-YARN-2693.patch Rebasing and uploading the patch. Priority Label Manager in RM to manage application priority based on configuration -- Key: YARN-2693 URL: https://issues.apache.org/jira/browse/YARN-2693 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch, 0006-YARN-2693.patch Focus of this JIRA is to have a centralized service to handle priority labels. Support operations such as * Add/Delete priority label to a specified queue * Manage integer mapping associated with each priority label * Support managing default priority label of a given queue * Expose interface to RM to validate priority label TO have simplified interface, Priority Manager will support only configuration file in contrast with admin cli and REST. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2693) Priority Label Manager in RM to manage application priority based on configuration
[ https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2693: -- Attachment: (was: 0006-YARN-2693.patch) Priority Label Manager in RM to manage application priority based on configuration -- Key: YARN-2693 URL: https://issues.apache.org/jira/browse/YARN-2693 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch, 0006-YARN-2693.patch Focus of this JIRA is to have a centralized service to handle priority labels. Support operations such as * Add/Delete priority label to a specified queue * Manage integer mapping associated with each priority label * Support managing default priority label of a given queue * Expose interface to RM to validate priority label TO have simplified interface, Priority Manager will support only configuration file in contrast with admin cli and REST. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347045#comment-14347045 ] Sunil G commented on YARN-3136: --- Hi [~jlowe] and [~jianhe] Could you please take a look at the patch? getTransferredContainers can be a bottleneck during AM registration --- Key: YARN-3136 URL: https://issues.apache.org/jira/browse/YARN-3136 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Sunil G Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, 0003-YARN-3136.patch While examining RM stack traces on a busy cluster I noticed a pattern of AMs stuck waiting for the scheduler lock trying to call getTransferredContainers. The scheduler lock is highly contended, especially on a large cluster with many nodes heartbeating, and it would be nice if we could find a way to eliminate the need to grab this lock during this call. We've already done similar work during AM allocate calls to make sure they don't needlessly grab the scheduler lock, and it would be good to do so here as well, if possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3136: -- Attachment: 0004-YARN-3136.patch getTransferredContainers can be a bottleneck during AM registration --- Key: YARN-3136 URL: https://issues.apache.org/jira/browse/YARN-3136 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Sunil G Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, 0003-YARN-3136.patch, 0004-YARN-3136.patch While examining RM stack traces on a busy cluster I noticed a pattern of AMs stuck waiting for the scheduler lock trying to call getTransferredContainers. The scheduler lock is highly contended, especially on a large cluster with many nodes heartbeating, and it would be nice if we could find a way to eliminate the need to grab this lock during this call. We've already done similar work during AM allocate calls to make sure they don't needlessly grab the scheduler lock, and it would be good to do so here as well, if possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2693) Priority Label Manager in RM to manage application priority based on configuration
[ https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2693: -- Attachment: 0006-YARN-2693.patch Rebasing against trunk. Errors look unrelated Priority Label Manager in RM to manage application priority based on configuration -- Key: YARN-2693 URL: https://issues.apache.org/jira/browse/YARN-2693 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch, 0006-YARN-2693.patch Focus of this JIRA is to have a centralized service to handle priority labels. Support operations such as * Add/Delete priority label to a specified queue * Manage integer mapping associated with each priority label * Support managing default priority label of a given queue * Expose interface to RM to validate priority label TO have simplified interface, Priority Manager will support only configuration file in contrast with admin cli and REST. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2693) Priority Label Manager in RM to manage application priority based on configuration
[ https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2693: -- Attachment: (was: 0006-YARN-2693.patch) Priority Label Manager in RM to manage application priority based on configuration -- Key: YARN-2693 URL: https://issues.apache.org/jira/browse/YARN-2693 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch Focus of this JIRA is to have a centralized service to handle priority labels. Support operations such as * Add/Delete priority label to a specified queue * Manage integer mapping associated with each priority label * Support managing default priority label of a given queue * Expose interface to RM to validate priority label TO have simplified interface, Priority Manager will support only configuration file in contrast with admin cli and REST. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2004: -- Attachment: 0004-YARN-2004.patch Rebasing the patch Priority scheduling support in Capacity scheduler - Key: YARN-2004 URL: https://issues.apache.org/jira/browse/YARN-2004 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2004.patch, 0002-YARN-2004.patch, 0003-YARN-2004.patch, 0004-YARN-2004.patch Based on the priority of the application, the Capacity Scheduler should be able to give preference to applications while doing scheduling. Comparator<FiCaSchedulerApp> applicationComparator can be changed as below. 1. Check for application priority. If priority is available, then return the highest priority job. 2. Otherwise continue with the existing logic such as App ID comparison and then timestamp comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2003) Support to process Job priority from Submission Context in AppAttemptAddedSchedulerEvent [RM side]
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2003: -- Attachment: 0004-YARN-2003.patch Rebasing as per the changes in the priority manager class. Support to process Job priority from Submission Context in AppAttemptAddedSchedulerEvent [RM side] -- Key: YARN-2003 URL: https://issues.apache.org/jira/browse/YARN-2003 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2003.patch, 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from Submission Context and store. Later this can be used by Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348975#comment-14348975 ] Sunil G commented on YARN-3136: --- The errors seem unrelated. getTransferredContainers can be a bottleneck during AM registration --- Key: YARN-3136 URL: https://issues.apache.org/jira/browse/YARN-3136 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Sunil G Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, 0003-YARN-3136.patch, 0004-YARN-3136.patch While examining RM stack traces on a busy cluster I noticed a pattern of AMs stuck waiting for the scheduler lock trying to call getTransferredContainers. The scheduler lock is highly contended, especially on a large cluster with many nodes heartbeating, and it would be nice if we could find a way to eliminate the need to grab this lock during this call. We've already done similar work during AM allocate calls to make sure they don't needlessly grab the scheduler lock, and it would be good to do so here as well, if possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2986) (Umbrella) Support hierarchical and unified scheduler configuration
[ https://issues.apache.org/jira/browse/YARN-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329949#comment-14329949 ] Sunil G commented on YARN-2986: --- Thank you [~leftnoteasy], this is a much-awaited ticket :) I have a few inputs on the same. 1.
{noformat}
<policy-properties>
  <resource-calculator>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</resource-calculator>
</policy-properties>
{noformat}
and
{noformat}
<policy-properties>
  <user-limit-factor>2</user-limit-factor>
{noformat}
This is inside a queue. Do you mean that non-repeating items are kept outside the loop of queues and changing items are kept inside each queue? However, if I have only one set of user-limit, node labels, etc., and I keep all of those policy-properties outside the *queue* section, will they then be applicable to all queues? If not, I suggest we can have a named policy-properties concept:
{noformat}
<queue name="default">
  <state>RUNNING</state>
  <acl_submit_applications>*</acl_submit_applications>
  <acl_administer_queue>*</acl_administer_queue>
  <accessible-node-labels>x</accessible-node-labels>
  <policy-properties>gpu</policy-properties>
</queue>
<queue name="queueA">
  <state>RUNNING</state>
  <acl_submit_applications>*</acl_submit_applications>
  <acl_administer_queue>*</acl_administer_queue>
  <accessible-node-labels>x</accessible-node-labels>
  <policy-properties>gpu</policy-properties>
</queue>
<policy-properties name="gpu">
  <user-limit-factor>2</user-limit-factor>
  <node-labels>
    <node-label name="x">
      <capacity>20</capacity>
      <maximum-capacity>50</maximum-capacity>
    </node-label>
  </node-labels>
{noformat}
It can be shared as needed across queues and will make the queue part more readable. (Umbrella) Support hierarchical and unified scheduler configuration --- Key: YARN-2986 URL: https://issues.apache.org/jira/browse/YARN-2986 Project: Hadoop YARN Issue Type: Improvement Reporter: Vinod Kumar Vavilapalli Assignee: Wangda Tan Attachments: YARN-2986.1.patch Today's scheduler configuration is fragmented and non-intuitive, and needs to be improved. Details in comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3250) Support admin cli interface in Application Priority Manager (server side)
Sunil G created YARN-3250: - Summary: Support admin cli interface in Application Priority Manager (server side) Key: YARN-3250 URL: https://issues.apache.org/jira/browse/YARN-3250 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Current Application Priority Manager supports only configuration via file. To support runtime configurations for admin cli and REST, a common management interface has to be added which can be shared with NodeLabelsManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335021#comment-14335021 ] Sunil G commented on YARN-3251: --- [~jlowe] Recent getAbsoluteMaxAvailCapacity changes cause this. CapacityScheduler deadlock when computing absolute max avail capacity - Key: YARN-3251 URL: https://issues.apache.org/jira/browse/YARN-3251 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jason Lowe Priority: Blocker The ResourceManager can deadlock in the CapacityScheduler when computing the absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2693) Priority Label Manager in RM to manage application priority based on configuration
[ https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2693: -- Description: Focus of this JIRA is to have a centralized service to handle priority labels. Support operations such as * Add/Delete priority label to a specified queue * Manage integer mapping associated with each priority label * Support managing default priority label of a given queue * Expose interface to RM to validate priority label TO have simplified interface, Priority Manager will support only configuration file in contrast with admin cli and REST. was: Focus of this JIRA is to have a centralized service to handle priority labels. Support operations such as * Add/Delete priority label to a specified queue * Manage integer mapping associated with each priority label * Support managing default priority label of a given queue * ACL support in queue level for priority label * Expose interface to RM to validate priority label Storage for this labels will be done in FileSystem and in Memory similar to NodeLabel * FileSystem Based : persistent across RM restart * Memory Based: non-persistent across RM restart Priority Label Manager in RM to manage application priority based on configuration -- Key: YARN-2693 URL: https://issues.apache.org/jira/browse/YARN-2693 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch Focus of this JIRA is to have a centralized service to handle priority labels. Support operations such as * Add/Delete priority label to a specified queue * Manage integer mapping associated with each priority label * Support managing default priority label of a given queue * Expose interface to RM to validate priority label TO have simplified interface, Priority Manager will support only configuration file in contrast with admin cli and REST. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2693) Priority Label Manager in RM to manage application priority based on configuration
[ https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2693: -- Summary: Priority Label Manager in RM to manage application priority based on configuration (was: Priority Label Manager in RM to manage priority labels) Priority Label Manager in RM to manage application priority based on configuration -- Key: YARN-2693 URL: https://issues.apache.org/jira/browse/YARN-2693 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch Focus of this JIRA is to have a centralized service to handle priority labels. Support operations such as * Add/Delete priority label to a specified queue * Manage integer mapping associated with each priority label * Support managing default priority label of a given queue * ACL support in queue level for priority label * Expose interface to RM to validate priority label Storage for this labels will be done in FileSystem and in Memory similar to NodeLabel * FileSystem Based : persistent across RM restart * Memory Based: non-persistent across RM restart -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335043#comment-14335043 ] Sunil G commented on YARN-3251: --- It is better to compute the available capacity during the call to root.assignContainers. In that scenario, a simple get will retrieve the available capacity. CapacityScheduler deadlock when computing absolute max avail capacity - Key: YARN-3251 URL: https://issues.apache.org/jira/browse/YARN-3251 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jason Lowe Priority: Blocker The ResourceManager can deadlock in the CapacityScheduler when computing the absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
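A rough sketch of the idea in the comment above, i.e. computing the value once during the assignContainers pass and exposing it through a simple getter; the class, field, and method names are assumptions for illustration, not the actual CSQueue code.
{code}
// Illustrative only: the queue caches its absolute max available capacity
// while assignContainers already holds the scheduling path, so later readers
// (user-limit / headroom computation) only need a volatile read and no extra
// lock ordering is introduced.
public class SimplifiedQueue {
  private volatile float absoluteMaxAvailCapacity;

  public void assignContainers(float clusterAvailable) {
    // ... existing allocation work would go here ...
    absoluteMaxAvailCapacity = computeAbsoluteMaxAvailCapacity(clusterAvailable);
  }

  // Simple, lock-free read for headroom/user-limit callers.
  public float getAbsoluteMaxAvailCapacity() {
    return absoluteMaxAvailCapacity;
  }

  private float computeAbsoluteMaxAvailCapacity(float clusterAvailable) {
    // Placeholder for the real hierarchy-aware computation.
    return clusterAvailable;
  }
}
{code}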
[jira] [Updated] (YARN-2693) Priority Label Manager in RM to manage application priority based on configuration
[ https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2693: -- Attachment: 0006-YARN-2693.patch Attaching a minimal version of the Application Priority Manager where only configuration support is present. YARN-3250 will handle admin CLI and REST support in the longer run. Priority Label Manager in RM to manage application priority based on configuration -- Key: YARN-2693 URL: https://issues.apache.org/jira/browse/YARN-2693 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch, 0006-YARN-2693.patch Focus of this JIRA is to have a centralized service to handle priority labels. Support operations such as * Add/Delete priority label to a specified queue * Manage integer mapping associated with each priority label * Support managing default priority label of a given queue * Expose interface to RM to validate priority label To have a simplified interface, Priority Manager will support only configuration file in contrast with admin cli and REST. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3075) NodeLabelsManager implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289432#comment-14289432 ] Sunil G commented on YARN-3075: --- Hi Varun
{code}
+    removeNodeFromLabels(nodeId, labels);
     host.labels.removeAll(labels);
+    for (Entry<NodeId, Node> nmEntry : host.nms.entrySet()) {
+      Node node = nmEntry.getValue();
       if (node.labels != null) {
         node.labels.removeAll(labels);
       }
+      removeNodeFromLabels(nmEntry.getKey(), labels);
     }
{code}
I think the first call to removeNodeFromLabels can be removed; the loop alone should be enough. NodeLabelsManager implementation to retrieve label to node mapping -- Key: YARN-3075 URL: https://issues.apache.org/jira/browse/YARN-3075 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3075.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3075) NodeLabelsManager implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289465#comment-14289465 ] Sunil G commented on YARN-3075: --- Thank you [~varun_saxena] for clarifying. As we discussed, you are saving hosts including port 0, hence my confusion. If possible, please try to keep the same storage structure; it will be easier to manage later. NodeLabelsManager implementation to retrieve label to node mapping -- Key: YARN-3075 URL: https://issues.apache.org/jira/browse/YARN-3075 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3075.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3118) clustering of ATS reader instances
[ https://issues.apache.org/jira/browse/YARN-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298212#comment-14298212 ] Sunil G commented on YARN-3118: --- Thank you Sangjin. I will also have a look at the comment given by Zhijie Shen on YARN-3047 and will work based on the same. clustering of ATS reader instances -- Key: YARN-3118 URL: https://issues.apache.org/jira/browse/YARN-3118 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sunil G YARN-3047 introduces the ATS reader basically as a single daemon. As a follow-up, we should consider clustering of ATS reader instances to be able to handle more traffic volume (large clusters, many use cases, etc.). It doesn't have to be in phase 1 (maybe for phase 2?). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3118) clustering of ATS reader instances
[ https://issues.apache.org/jira/browse/YARN-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G reassigned YARN-3118: - Assignee: Sunil G clustering of ATS reader instances -- Key: YARN-3118 URL: https://issues.apache.org/jira/browse/YARN-3118 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sunil G YARN-3047 introduces the ATS reader basically as a single daemon. As a follow-up, we should consider clustering of ATS reader instances to be able to handle more traffic volume (large clusters, many use cases, etc.). It doesn't have to be in phase 1 (maybe for phase 2?). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3118) clustering of ATS reader instances
[ https://issues.apache.org/jira/browse/YARN-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298122#comment-14298122 ] Sunil G commented on YARN-3118: --- Hi Sangjin, does this mean that a third-party monitor is also needed for managing all reader instances? I see this is not yet taken up, and I am interested in taking it. If it's not already planned, kindly let me know. clustering of ATS reader instances -- Key: YARN-3118 URL: https://issues.apache.org/jira/browse/YARN-3118 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee YARN-3047 introduces the ATS reader basically as a single daemon. As a follow-up, we should consider clustering of ATS reader instances to be able to handle more traffic volume (large clusters, many use cases, etc.). It doesn't have to be in phase 1 (maybe for phase 2?). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3089) LinuxContainerExecutor does not handle file arguments to deleteAsUser
[ https://issues.apache.org/jira/browse/YARN-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298371#comment-14298371 ] Sunil G commented on YARN-3089: --- Hi [~eepayne] Thank you for bringing this up. I have a comment on the same. {code} int subDirEmptyStr = (subdir == NULL || subdir[0] == 0); {code} I think strlen(subdir) also has to be checked against 0, correct? LinuxContainerExecutor does not handle file arguments to deleteAsUser - Key: YARN-3089 URL: https://issues.apache.org/jira/browse/YARN-3089 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Eric Payne Priority: Blocker Attachments: YARN-3089.v1.txt YARN-2468 added the deletion of individual logs that are aggregated, but this fails to delete log files when the LCE is being used. The LCE native executable assumes the paths being passed are paths and the delete fails. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3136: -- Attachment: 0008-YARN-3136.patch getTransferredContainers can be a bottleneck during AM registration --- Key: YARN-3136 URL: https://issues.apache.org/jira/browse/YARN-3136 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Sunil G Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, 0003-YARN-3136.patch, 0004-YARN-3136.patch, 0005-YARN-3136.patch, 0006-YARN-3136.patch, 0007-YARN-3136.patch, 0008-YARN-3136.patch While examining RM stack traces on a busy cluster I noticed a pattern of AMs stuck waiting for the scheduler lock trying to call getTransferredContainers. The scheduler lock is highly contended, especially on a large cluster with many nodes heartbeating, and it would be nice if we could find a way to eliminate the need to grab this lock during this call. We've already done similar work during AM allocate calls to make sure they don't needlessly grab the scheduler lock, and it would be good to do so here as well, if possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2004: -- Attachment: 0005-YARN-2004.patch Uploading CS changes. Hi [~leftnoteasy], YARN-2004 needs some changes in CS and LeafQueue, but dummy implementations of the same methods are added in YARN-2003. Hence YARN-2004 will depend on YARN-2003, but not the other way around. Kindly share your opinion. Priority scheduling support in Capacity scheduler - Key: YARN-2004 URL: https://issues.apache.org/jira/browse/YARN-2004 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2004.patch, 0002-YARN-2004.patch, 0003-YARN-2004.patch, 0004-YARN-2004.patch, 0005-YARN-2004.patch Based on the priority of the application, Capacity Scheduler should be able to give preference to applications while doing scheduling. Comparator<FiCaSchedulerApp> applicationComparator can be changed as below. 1.Check for Application priority. If priority is available, then return the highest priority job. 2.Otherwise continue with existing logic such as App ID comparison and then TimeStamp comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504219#comment-14504219 ] Sunil G commented on YARN-2004: --- Thank you [~jlowe]
{noformat}
@@ -327,6 +328,29 @@ private RMAppImpl createAndPopulateNewRMApp(
     ApplicationId applicationId = submissionContext.getApplicationId();
     ResourceRequest amReq = validateAndCreateResourceRequest(submissionContext, isRecovery);
+
+    Priority appPriority = submissionContext.getPriority();
+    if (null != appPriority) {
+      try {
+        rmContext.getScheduler().authenticateApplicationPriority(
+            submissionContext.getPriority(), user,
+            submissionContext.getQueue(), applicationId);
+      } catch (IOException e) {
+        throw RPCUtil.getRemoteException(e.getMessage());
+      }
+    } else {
+      // Get the default priority from Queue and set to Application
+      try {
+        appPriority = rmContext.getScheduler()
+            .getDefaultApplicationPriorityFromQueue(submissionContext.getQueue());
+      } catch (IOException e) {
{noformat}
The above code snippet is from YARN-2003, which is handling the RM and event changes for priority. When an app is submitted w/o a priority, we would like to fill in the default priority from the queue. bq. why would we want to limit which priorities are running within a queue? queueA: default=low queueB: default=medium The type of apps we run may vary from queueA to queueB, so keeping a different default priority for each queue will help handle such cases. Assume high-priority apps often run in queueA and medium-priority apps in queueB; making the default priority different can help here. [~leftnoteasy] Do you mean a global max priority which can help to limit the number associated with a priority? bq. we just cap it by max-priority-limit instead of throw exception? Yes. I will update this part to cap instead of throwing an exception. Priority scheduling support in Capacity scheduler - Key: YARN-2004 URL: https://issues.apache.org/jira/browse/YARN-2004 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2004.patch, 0002-YARN-2004.patch, 0003-YARN-2004.patch, 0004-YARN-2004.patch, 0005-YARN-2004.patch, 0006-YARN-2004.patch Based on the priority of the application, Capacity Scheduler should be able to give preference to applications while doing scheduling. Comparator<FiCaSchedulerApp> applicationComparator can be changed as below. 1.Check for Application priority. If priority is available, then return the highest priority job. 2.Otherwise continue with existing logic such as App ID comparison and then TimeStamp comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
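A hedged sketch of the submission-time handling discussed above (class and method names here are made up, not the actual YARN-2003/2004 code): if the app gives no priority, take the queue's default; if it asks for more than the allowed maximum, cap it instead of rejecting the submission.
{code}
// Hypothetical sketch of the priority resolution described in the comment above.
final class PriorityResolutionSketch {

  static int resolveApplicationPriority(Integer requestedPriority,
                                        int queueDefaultPriority,
                                        int maxAllowedPriority) {
    if (requestedPriority == null) {
      // App submitted without a priority: fall back to the queue default.
      return queueDefaultPriority;
    }
    // Cap by the maximum instead of throwing an exception.
    return Math.min(requestedPriority, maxAllowedPriority);
  }

  public static void main(String[] args) {
    System.out.println(resolveApplicationPriority(null, 1, 2)); // 1 (queue default)
    System.out.println(resolveApplicationPriority(5, 1, 2));    // 2 (capped at max)
    System.out.println(resolveApplicationPriority(2, 1, 2));    // 2 (accepted as-is)
  }
}
{code}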
[jira] [Commented] (YARN-3517) RM web ui for dumping scheduler logs should be for admins only
[ https://issues.apache.org/jira/browse/YARN-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505301#comment-14505301 ] Sunil G commented on YARN-3517: --- Thanks [~vvasudev] Patch looks good. RM web ui for dumping scheduler logs should be for admins only -- Key: YARN-3517 URL: https://issues.apache.org/jira/browse/YARN-3517 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, security Affects Versions: 2.7.0 Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Blocker Labels: security Attachments: YARN-3517.001.patch, YARN-3517.002.patch, YARN-3517.003.patch YARN-3294 allows users to dump scheduler logs from the web UI. This should be for admins only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3521) Support return structured NodeLabel objects in REST API when call getClusterNodeLabels
[ https://issues.apache.org/jira/browse/YARN-3521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G reassigned YARN-3521: - Assignee: Sunil G Support return structured NodeLabel objects in REST API when call getClusterNodeLabels -- Key: YARN-3521 URL: https://issues.apache.org/jira/browse/YARN-3521 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Sunil G In YARN-3413, yarn cluster CLI returns NodeLabel instead of String, we should make the same change in REST API side to make them consistency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3521) Support return structured NodeLabel objects in REST API when call getClusterNodeLabels
[ https://issues.apache.org/jira/browse/YARN-3521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505437#comment-14505437 ] Sunil G commented on YARN-3521: --- I have recently done some work in REST. I would like to take this over; please reassign otherwise. Support return structured NodeLabel objects in REST API when call getClusterNodeLabels -- Key: YARN-3521 URL: https://issues.apache.org/jira/browse/YARN-3521 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Sunil G In YARN-3413, yarn cluster CLI returns NodeLabel instead of String, we should make the same change in REST API side to make them consistency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507501#comment-14507501 ] Sunil G commented on YARN-1963: --- Thank you [~grey] for sharing the thoughts. As per the design, only integers will be used within the schedulers, hence all comparisons and operations can be done on integers. However, we can have a label mapping for each integer which can be used during application submission and for viewing in the UI, etc. Labels are added only as mappings to integers. Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Sunil G Attachments: 0001-YARN-1963-prototype.patch, YARN Application Priorities Design.pdf, YARN Application Priorities Design_01.pdf It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
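A minimal sketch of the label-to-integer mapping described above (names and values are illustrative only): the scheduler compares plain integers, while the label map is consulted only at submission time and for display.
{code}
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: labels are just a view over integers; the scheduler never compares labels.
final class PriorityLabelMapping {

  private static final Map<String, Integer> LABEL_TO_PRIORITY = new HashMap<>();
  static {
    LABEL_TO_PRIORITY.put("low", 0);
    LABEL_TO_PRIORITY.put("medium", 1);
    LABEL_TO_PRIORITY.put("high", 2);
  }

  // Used at submission time and when rendering the UI.
  static int toPriority(String label) {
    Integer p = LABEL_TO_PRIORITY.get(label);
    if (p == null) {
      throw new IllegalArgumentException("Unknown priority label: " + label);
    }
    return p;
  }

  // Scheduler-side ordering works purely on integers (higher priority first).
  static final Comparator<Integer> PRIORITY_ORDER = Comparator.reverseOrder();

  public static void main(String[] args) {
    // Negative result: "high" sorts before "low" under PRIORITY_ORDER.
    System.out.println(PRIORITY_ORDER.compare(toPriority("high"), toPriority("low")));
  }
}
{code}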
[jira] [Commented] (YARN-3534) Report node resource utilization
[ https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508467#comment-14508467 ] Sunil G commented on YARN-3534: --- bq. resource utilization of the nodes Is this going to be a configurable item, or will all resources be monitored? Report node resource utilization Key: YARN-3534 URL: https://issues.apache.org/jira/browse/YARN-3534 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Inigo Goiri Assignee: Inigo Goiri Original Estimate: 336h Remaining Estimate: 336h YARN should be aware of the resource utilization of the nodes when scheduling containers. For this, this task will implement the NodeResourceMonitor and send this information to the Resource Manager in the heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2003) Support to process Job priority from Submission Context in AppAttemptAddedSchedulerEvent [RM side]
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2003: -- Attachment: 00010-YARN-2003.patch Removing the dependency on YARN-2004. Support to process Job priority from Submission Context in AppAttemptAddedSchedulerEvent [RM side] -- Key: YARN-2003 URL: https://issues.apache.org/jira/browse/YARN-2003 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2003.patch, 00010-YARN-2003.patch, 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch, 0005-YARN-2003.patch, 0006-YARN-2003.patch, 0007-YARN-2003.patch, 0008-YARN-2003.patch, 0009-YARN-2003.patch AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from Submission Context and store. Later this can be used by Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503158#comment-14503158 ] Sunil G commented on YARN-2004: --- Thank you very much for the comments. bq. default of default-priority is -1 I have a similar opinion to [~jlowe]'s. If we are looking for a Linux-like priority with a range of (-N, N), we may need support for negatives. But for a simple comparison, either works. For maintainability, I also support the use of +ve integers and 0 as the default. bq. We don't need per-user settings to get the basic A user can submit an application with a given priority. This priority will be validated against 1) whether it is a valid priority as per the cluster priority list (0:Low, 1:Medium, 2:High) 2) whether it is valid for the given queue config (QueueA {default=Low, max=Medium}), hence Low and Medium are accessible for QueueA 3) ACLs (this will be done with a separate ticket) Now if the user didn't submit the app with a priority, we can take the default priority configured for the given queue (here, for QueueA, it is Low). In the earlier patch this point was not added; I will add the same in a subsequent patch. Coming to the point of discussion, I feel we can do the above design first, and then handle the per-user priority feature as a separate ticket. [~leftnoteasy] and [~jlowe], please share your thoughts. bq. There appear to be some missing NULL checks I am sorry for this; it will be fixed. As suggested, I will change the log part and will upload a new version of the patch. Priority scheduling support in Capacity scheduler - Key: YARN-2004 URL: https://issues.apache.org/jira/browse/YARN-2004 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2004.patch, 0002-YARN-2004.patch, 0003-YARN-2004.patch, 0004-YARN-2004.patch, 0005-YARN-2004.patch Based on the priority of the application, Capacity Scheduler should be able to give preference to applications while doing scheduling. Comparator<FiCaSchedulerApp> applicationComparator can be changed as below. 1.Check for Application priority. If priority is available, then return the highest priority job. 2.Otherwise continue with existing logic such as App ID comparison and then TimeStamp comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
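A hypothetical sketch of the validation steps listed in the comment above (the class, config shape, and method names are illustrative only, not the actual patch): check the submitted priority against the cluster-wide list and the queue's configured range, and fall back to the queue default when none is given.
{code}
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of the three-step validation described above.
final class PriorityValidationSketch {

  // 1) Cluster-wide priority list (e.g. 0:Low, 1:Medium, 2:High).
  private static final List<Integer> CLUSTER_PRIORITIES = Arrays.asList(0, 1, 2);

  // 2) Per-queue configuration: default and maximum allowed priority.
  static final class QueueConfig {
    final int defaultPriority;
    final int maxPriority;
    QueueConfig(int defaultPriority, int maxPriority) {
      this.defaultPriority = defaultPriority;
      this.maxPriority = maxPriority;
    }
  }

  static int validate(Integer requested, QueueConfig queue) {
    if (requested == null) {
      return queue.defaultPriority;            // no priority given: use the queue default
    }
    if (!CLUSTER_PRIORITIES.contains(requested)) {
      throw new IllegalArgumentException("Unknown priority: " + requested);
    }
    if (requested > queue.maxPriority) {
      throw new IllegalArgumentException("Priority " + requested + " is above the queue maximum");
    }
    // 3) ACL checks would be handled separately (a different ticket).
    return requested;
  }

  public static void main(String[] args) {
    QueueConfig queueA = new QueueConfig(0, 1);  // default=Low, max=Medium
    System.out.println(validate(null, queueA));  // 0: queue default
    System.out.println(validate(1, queueA));     // 1: accepted
  }
}
{code}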
[jira] [Updated] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2004: -- Attachment: 0006-YARN-2004.patch Updated the patch as per the comments. [~leftnoteasy] [~jlowe] Please check the same. bq. If a1.getApplicationPriority() returns non-null but a2.getApplicationPriority() returns null I have considered the default priority scenario: if the submitted app does not give any priority, then the default will be taken. So a null in the above scenario won't happen.
{noformat}
public int compare(FiCaSchedulerApp a1, FiCaSchedulerApp a2) {
  if (!a1.getApplicationPriority().equals(a2.getApplicationPriority())) {
    return a1.getApplicationPriority().compareTo(
        a2.getApplicationPriority());
  }
  return a1.getApplicationId().compareTo(a2.getApplicationId());
}
{noformat}
Priority scheduling support in Capacity scheduler - Key: YARN-2004 URL: https://issues.apache.org/jira/browse/YARN-2004 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2004.patch, 0002-YARN-2004.patch, 0003-YARN-2004.patch, 0004-YARN-2004.patch, 0005-YARN-2004.patch, 0006-YARN-2004.patch Based on the priority of the application, Capacity Scheduler should be able to give preference to applications while doing scheduling. Comparator<FiCaSchedulerApp> applicationComparator can be changed as below. 1.Check for Application priority. If priority is available, then return the highest priority job. 2.Otherwise continue with existing logic such as App ID comparison and then TimeStamp comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2268) Disallow formatting the RMStateStore when there is an RM running
[ https://issues.apache.org/jira/browse/YARN-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14506574#comment-14506574 ] Sunil G commented on YARN-2268: --- Hi [~rohithsharma] I feel we can keep the file in the statestore itself and take the file lock from there. But then, as you mentioned, we may be exposed to race conditions such as the RM being killed while the lock file remains, due to which *format* cannot be performed at all. Could we have *-soft* and *-hard* format options here? *-soft* would acquire the lock file to perform the operation; *-hard* would go in and format w/o caring about the lock. Please share your thoughts. Disallow formatting the RMStateStore when there is an RM running Key: YARN-2268 URL: https://issues.apache.org/jira/browse/YARN-2268 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Rohith Attachments: 0001-YARN-2268.patch YARN-2131 adds a way to format the RMStateStore. However, it can be a problem if we format the store while an RM is actively using it. It would be nice to fail the format if there is an RM running and using this store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
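A rough sketch of the proposed *-soft* / *-hard* semantics (entirely hypothetical; the lock location, method, and class names are made up and this is not an actual RMStateStore API): soft format refuses to proceed when the lock file exists, hard format ignores a possibly stale lock.
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration of soft vs. hard format handling of a state-store lock file.
final class FormatStateStoreSketch {

  private static final Path LOCK = Paths.get("/tmp/rm-state-store.lock"); // placeholder location

  static void format(boolean hard) throws IOException {
    if (!hard && Files.exists(LOCK)) {
      // -soft: an RM may still be running (or the lock is stale); refuse to format.
      throw new IOException("State store appears to be in use; use -hard to force the format");
    }
    // -hard (or no lock present): proceed with formatting, then clean up any stale lock.
    // ... the actual formatting of the store would happen here ...
    Files.deleteIfExists(LOCK);
  }
}
{code}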
[jira] [Commented] (YARN-3517) RM web ui for dumping scheduler logs should be for admins only
[ https://issues.apache.org/jira/browse/YARN-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505193#comment-14505193 ] Sunil G commented on YARN-3517: --- Hi [~vvasudev] I have a couple of comments on the same. 1.
{code}
+  if (callerUGI != null && adminACLsManager.isAdmin(callerUGI)) {
+    isAdmin = true;
+  }
{code}
If adminACLsManager.areACLsEnabled() is false, do we need the above check? 2. A minor nit: {code}.append( b = confirm(\Are you sure you wish to generate{code} can be changed to Are you sure to generate RM web ui for dumping scheduler logs should be for admins only -- Key: YARN-3517 URL: https://issues.apache.org/jira/browse/YARN-3517 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, security Affects Versions: 2.7.0 Reporter: Varun Vasudev Assignee: Varun Vasudev Labels: security Attachments: YARN-3517.001.patch, YARN-3517.002.patch YARN-3294 allows users to dump scheduler logs from the web UI. This should be for admins only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
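For context, a small sketch of the check being discussed (hypothetical helper, not the YARN-3517 patch itself): when ACLs are disabled every caller is effectively allowed, so the UGI/isAdmin check only matters when ACLs are enabled.
{code}
// Hypothetical illustration of the admin check discussed above.
final class AdminCheckSketch {

  interface AdminACLsManager {
    boolean areACLsEnabled();
    boolean isAdmin(String callerUGI);
  }

  static boolean isCallerAdmin(AdminACLsManager acls, String callerUGI) {
    if (!acls.areACLsEnabled()) {
      // ACLs disabled: there is no point in checking the caller; everyone is allowed.
      return true;
    }
    return callerUGI != null && acls.isAdmin(callerUGI);
  }
}
{code}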
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517310#comment-14517310 ] Sunil G commented on YARN-2004: --- Yes [~jlowe], you are correct. We cannot compare the highest priority across queues, and if we do not do that, there is not much meaning in keeping a MAX priority at the queue level. Initially I planned to change that part in another JIRA, where the max priority application running in a queue could also be taken into consideration while processing a node heartbeat [which tries to select which queue can be considered based on resource consumption]. But that makes things more complicated in CS right now. I will keep this max at the cluster level for now, so it is accessible across all queues, to keep things simple. [~jlowe] [~leftnoteasy] [~vinodkv], please share your thoughts. Priority scheduling support in Capacity scheduler - Key: YARN-2004 URL: https://issues.apache.org/jira/browse/YARN-2004 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2004.patch, 0002-YARN-2004.patch, 0003-YARN-2004.patch, 0004-YARN-2004.patch, 0005-YARN-2004.patch, 0006-YARN-2004.patch Based on the priority of the application, Capacity Scheduler should be able to give preference to applications while doing scheduling. Comparator<FiCaSchedulerApp> applicationComparator can be changed as below. 1.Check for Application priority. If priority is available, then return the highest priority job. 2.Otherwise continue with existing logic such as App ID comparison and then TimeStamp comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517317#comment-14517317 ] Sunil G commented on YARN-2004: --- Extremely sorry [~eepayne], I mistyped your name as Jason. Hope you understood my comment about the priority config across queues. Please let me know your thoughts. Priority scheduling support in Capacity scheduler - Key: YARN-2004 URL: https://issues.apache.org/jira/browse/YARN-2004 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2004.patch, 0002-YARN-2004.patch, 0003-YARN-2004.patch, 0004-YARN-2004.patch, 0005-YARN-2004.patch, 0006-YARN-2004.patch Based on the priority of the application, Capacity Scheduler should be able to give preference to applications while doing scheduling. Comparator<FiCaSchedulerApp> applicationComparator can be changed as below. 1.Check for Application priority. If priority is available, then return the highest priority job. 2.Otherwise continue with existing logic such as App ID comparison and then TimeStamp comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3521) Support return structured NodeLabel objects in REST API when call getClusterNodeLabels
[ https://issues.apache.org/jira/browse/YARN-3521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3521: -- Attachment: 0001-YARN-3521.patch Attaching an initial version. [~leftnoteasy], please check the same, as I have changed the method interfaces of *getClusterNodeLabels* and *addToClusterNodeLabels* to pass a *List<NodeLabelInfo>* argument. Support return structured NodeLabel objects in REST API when call getClusterNodeLabels -- Key: YARN-3521 URL: https://issues.apache.org/jira/browse/YARN-3521 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Sunil G Attachments: 0001-YARN-3521.patch In YARN-3413, yarn cluster CLI returns NodeLabel instead of String, we should make the same change in REST API side to make them consistency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3521) Support return structured NodeLabel objects in REST API when call getClusterNodeLabels
[ https://issues.apache.org/jira/browse/YARN-3521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516214#comment-14516214 ] Sunil G commented on YARN-3521: --- Yes, [~leftnoteasy], this change suggestion looks fine to me. I will update the patch accordingly. Also, I will rename the ticket based on the last update. Support return structured NodeLabel objects in REST API when call getClusterNodeLabels -- Key: YARN-3521 URL: https://issues.apache.org/jira/browse/YARN-3521 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Sunil G In YARN-3413, yarn cluster CLI returns NodeLabel instead of String, we should make the same change in REST API side to make them consistency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3521) Support return structured NodeLabel objects in REST API when call getClusterNodeLabels
[ https://issues.apache.org/jira/browse/YARN-3521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3521: -- Attachment: 0002-YARN-3521.patch Thank you [~leftnoteasy] for sharing the comments. Pls find an updated patch addressing comments. Pls check the same and let me know your thoughts. Support return structured NodeLabel objects in REST API when call getClusterNodeLabels -- Key: YARN-3521 URL: https://issues.apache.org/jira/browse/YARN-3521 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Sunil G Attachments: 0001-YARN-3521.patch, 0002-YARN-3521.patch In YARN-3413, yarn cluster CLI returns NodeLabel instead of String, we should make the same change in REST API side to make them consistency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3557) Support Intel Trusted Execution Technology(TXT) in YARN scheduler
[ https://issues.apache.org/jira/browse/YARN-3557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526399#comment-14526399 ] Sunil G commented on YARN-3557: --- bq. Currently for centralized node label configuration, it only supports admin configure node label through CLI. Apart from CLI and REST, do you mean exposing these configurations to a specific user (I assume this user will have some security approval in the cluster) so that this user can make the config via REST or APIs? Support Intel Trusted Execution Technology(TXT) in YARN scheduler - Key: YARN-3557 URL: https://issues.apache.org/jira/browse/YARN-3557 Project: Hadoop YARN Issue Type: New Feature Reporter: Dian Fu Attachments: Support TXT in YARN high level design doc.pdf Intel TXT defines platform-level enhancements that provide the building blocks for creating trusted platforms. A TXT aware YARN scheduler can schedule security sensitive jobs on TXT enabled nodes only. YARN-2492 provides the capacity to restrict YARN applications to run only on cluster nodes that have a specified node label. This is a good mechanism that can be utilized for a TXT aware YARN scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2305) When a container is in reserved state then total cluster memory is displayed wrongly.
[ https://issues.apache.org/jira/browse/YARN-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526423#comment-14526423 ] Sunil G commented on YARN-2305: --- Yes, this can be closed. I have checked, and it was not occurring. Still, I will perform a few more tests and, if it persists, I will reopen. Thank you [~leftnoteasy] When a container is in reserved state then total cluster memory is displayed wrongly. - Key: YARN-2305 URL: https://issues.apache.org/jira/browse/YARN-2305 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: J.Andreina Assignee: Sunil G Attachments: Capture.jpg ENV Details: = 3 queues : a(50%),b(25%),c(25%) --- All max utilization is set to 100 2 Node cluster with total memory as 16GB TestSteps: = Execute following 3 jobs with different memory configurations for Map , reducer and AM task ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=a -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=2048 /dir8 /preempt_85 (application_1405414066690_0023) ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=b -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.reduce.memory.mb=2048 /dir2 /preempt_86 (application_1405414066690_0025) ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=c -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=1024 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=1024 /dir2 /preempt_62 Issue = when 2GB memory is in reserved state total memory is shown as 15GB and used as 15GB ( while total memory is 16GB) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2293) Scoring for NMs to identify a better candidate to launch AMs
[ https://issues.apache.org/jira/browse/YARN-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526441#comment-14526441 ] Sunil G commented on YARN-2293: --- Hi [~zjshen] This work has moved to YARN-2005; I will share a basic prototype there soon. This can be marked as a duplicate of YARN-2005. Scoring for NMs to identify a better candidate to launch AMs Key: YARN-2293 URL: https://issues.apache.org/jira/browse/YARN-2293 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager, resourcemanager Reporter: Sunil G Assignee: Sunil G Container exit status from NM is giving indications of reasons for its failure. Some times, it may be because of container launching problems in NM. In a heterogeneous cluster, some machines with weak hardware may cause more failures. It will be better not to launch AMs there more often. Also I would like to clear that container failures because of buggy job should not result in decreasing score. As mentioned earlier, based on exit status if a scoring mechanism is added for NMs in RM, then NMs with better scores can be given for launching AMs. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2293) Scoring for NMs to identify a better candidate to launch AMs
[ https://issues.apache.org/jira/browse/YARN-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G resolved YARN-2293. --- Resolution: Duplicate Scoring for NMs to identify a better candidate to launch AMs Key: YARN-2293 URL: https://issues.apache.org/jira/browse/YARN-2293 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager, resourcemanager Reporter: Sunil G Assignee: Sunil G Container exit status from NM is giving indications of reasons for its failure. Some times, it may be because of container launching problems in NM. In a heterogeneous cluster, some machines with weak hardware may cause more failures. It will be better not to launch AMs there more often. Also I would like to clear that container failures because of buggy job should not result in decreasing score. As mentioned earlier, based on exit status if a scoring mechanism is added for NMs in RM, then NMs with better scores can be given for launching AMs. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2267) Auxiliary Service support in RM
[ https://issues.apache.org/jira/browse/YARN-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526446#comment-14526446 ] Sunil G commented on YARN-2267: --- It would be a good feature if we could plug in a few resource monitoring services to the RM, such as mentioned in *Scenario 1* above. Could you please share the design thoughts for the same? The main question is how this can be done in a controlled way. By this I mean that introducing a plugin should not conflict with the existing behavior of the schedulers, etc. Auxiliary Service support in RM --- Key: YARN-2267 URL: https://issues.apache.org/jira/browse/YARN-2267 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Reporter: Naganarasimha G R Assignee: Rohith Currently RM does not have a provision to run any Auxiliary services. For health/monitoring in RM, it's better to make a plugin mechanism in RM itself, similar to NM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-1662) Capacity Scheduler reservation issue cause Job Hang
[ https://issues.apache.org/jira/browse/YARN-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G resolved YARN-1662. --- Resolution: Invalid Capacity Scheduler reservation issue cause Job Hang --- Key: YARN-1662 URL: https://issues.apache.org/jira/browse/YARN-1662 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.2.0 Environment: Suse 11 SP1 + Linux Reporter: Sunil G There are 2 node managers in my cluster. NM1 with 8GB NM2 with 8GB I am submitting a Job with below details: AM with 2GB Map needs 5GB Reducer needs 3GB slowstart is enabled with 0.5 10maps and 50reducers are assigned. 5maps are completed. Now few reducers got scheduled. Now NM1 has 2GB AM and 3Gb Reducer_1[Used 5GB] NM2 has 3Gb Reducer_2 [Used 3GB] A Map has now reserved(5GB) in NM1 which has only 3Gb free. It hangs forever. Potential issue is, reservation is now blocked in NM1 for a Map which needs 5GB. But the Reducer_1 hangs by waiting for few map ouputs. Reducer side preemption also not happened as few headroom is still available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1662) Capacity Scheduler reservation issue cause Job Hang
[ https://issues.apache.org/jira/browse/YARN-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526473#comment-14526473 ] Sunil G commented on YARN-1662: --- Yes [~jianhe] we can close this issue. After YARN-1769, we have a better reservation too. I checked this and its not happening now. Capacity Scheduler reservation issue cause Job Hang --- Key: YARN-1662 URL: https://issues.apache.org/jira/browse/YARN-1662 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.2.0 Environment: Suse 11 SP1 + Linux Reporter: Sunil G There are 2 node managers in my cluster. NM1 with 8GB NM2 with 8GB I am submitting a Job with below details: AM with 2GB Map needs 5GB Reducer needs 3GB slowstart is enabled with 0.5 10maps and 50reducers are assigned. 5maps are completed. Now few reducers got scheduled. Now NM1 has 2GB AM and 3Gb Reducer_1[Used 5GB] NM2 has 3Gb Reducer_2 [Used 3GB] A Map has now reserved(5GB) in NM1 which has only 3Gb free. It hangs forever. Potential issue is, reservation is now blocked in NM1 for a Map which needs 5GB. But the Reducer_1 hangs by waiting for few map ouputs. Reducer side preemption also not happened as few headroom is still available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3521) Support return structured NodeLabel objects in REST API when call getClusterNodeLabels
[ https://issues.apache.org/jira/browse/YARN-3521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526747#comment-14526747 ] Sunil G commented on YARN-3521: --- 1. bq. Should be exclusivity. Yes, I updated the same. 2. bq. Did we ever call these APIs stable? No. I have changed it to a NodeLabelsInfo object and added a new getter which can supply a list/set of string names. 3. Why are we not dropping the name-only records? I have removed *NodeLabelsName* and instead use *NodeLabelsInfo*, and also added a new getter which can give back the String label names. NodeToLabelsName is renamed to NodeToLabelsInfo and internally it also uses NodeLabelInfo. Support return structured NodeLabel objects in REST API when call getClusterNodeLabels -- Key: YARN-3521 URL: https://issues.apache.org/jira/browse/YARN-3521 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Sunil G Attachments: 0001-YARN-3521.patch, 0002-YARN-3521.patch, 0003-YARN-3521.patch In YARN-3413, yarn cluster CLI returns NodeLabel instead of String, we should make the same change in REST API side to make them consistency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3521) Support return structured NodeLabel objects in REST API when call getClusterNodeLabels
[ https://issues.apache.org/jira/browse/YARN-3521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3521: -- Attachment: 0004-YARN-3521.patch [~vinodkv] and [~leftnoteasy], please share your thoughts on this updated patch. IMO the NodeLabelManager APIs can also use objects rather than Strings; the admin interface can take care of the conversion logic. Support return structured NodeLabel objects in REST API when call getClusterNodeLabels -- Key: YARN-3521 URL: https://issues.apache.org/jira/browse/YARN-3521 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Sunil G Attachments: 0001-YARN-3521.patch, 0002-YARN-3521.patch, 0003-YARN-3521.patch, 0004-YARN-3521.patch In YARN-3413, yarn cluster CLI returns NodeLabel instead of String, we should make the same change in REST API side to make them consistency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3521) Support return structured NodeLabel objects in REST API when call getClusterNodeLabels
[ https://issues.apache.org/jira/browse/YARN-3521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3521: -- Attachment: 0003-YARN-3521.patch Thank you [~leftnoteasy] for the comments. I am now uploading an interim patch which has the new test case. However, some more change is needed to correct the XML structure; for that I will upload another patch. Support return structured NodeLabel objects in REST API when call getClusterNodeLabels -- Key: YARN-3521 URL: https://issues.apache.org/jira/browse/YARN-3521 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Sunil G Attachments: 0001-YARN-3521.patch, 0002-YARN-3521.patch, 0003-YARN-3521.patch In YARN-3413, yarn cluster CLI returns NodeLabel instead of String, we should make the same change in REST API side to make them consistency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3521) Support return structured NodeLabel objects in REST API when call getClusterNodeLabels
[ https://issues.apache.org/jira/browse/YARN-3521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527904#comment-14527904 ] Sunil G commented on YARN-3521: --- [~leftnoteasy] Yes, it's not a valid point. replaceLabelsOnNode and removeFromClusterNodeLabels don't need a NodeLabel object; the name is enough. Please discard my earlier comment. Support return structured NodeLabel objects in REST API when call getClusterNodeLabels -- Key: YARN-3521 URL: https://issues.apache.org/jira/browse/YARN-3521 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Sunil G Attachments: 0001-YARN-3521.patch, 0002-YARN-3521.patch, 0003-YARN-3521.patch, 0004-YARN-3521.patch In YARN-3413, yarn cluster CLI returns NodeLabel instead of String, we should make the same change in REST API side to make them consistency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3579) getLabelsToNodes in CommonNodeLabelsManager should support NodeLabel instead of label name as String
[ https://issues.apache.org/jira/browse/YARN-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529764#comment-14529764 ] Sunil G commented on YARN-3579: --- Sure [~leftnoteasy] I will keep the REST changes in YARN-3521 itself. I will open another ticket for YarnClient. Thank you. getLabelsToNodes in CommonNodeLabelsManager should support NodeLabel instead of label name as String Key: YARN-3579 URL: https://issues.apache.org/jira/browse/YARN-3579 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Sunil G Assignee: Sunil G Priority: Minor CommonNodeLabelsManager#getLabelsToNodes returns label name as string. It is not passing information such as Exclusivity etc back to REST interface apis. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3583) Support of NodeLabel object instead of plain String in YarnClient side.
Sunil G created YARN-3583: - Summary: Support of NodeLabel object instead of plain String in YarnClient side. Key: YARN-3583 URL: https://issues.apache.org/jira/browse/YARN-3583 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.6.0 Reporter: Sunil G Assignee: Sunil G Similar to YARN-3521, use NodeLabel objects in its apis getLabelsToNodes/getNodeToLabels instead of using plain label name. This will help to bring other label details such as Exclusivity to client side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3521) Support return structured NodeLabel objects in REST API when call getClusterNodeLabels
[ https://issues.apache.org/jira/browse/YARN-3521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3521: -- Attachment: 0005-YARN-3521.patch Thank you [~leftnoteasy] for the detailed comments. I have updated the patch as per the same. A few notes: 1. The NodeLabelsName and NodeToLabelsName classes are needed for the replace and remove APIs; else, as you mentioned, the REST APIs won't be clean enough. I hope this is fine. 2. bq. getLabelsToNodes should use object Done 3. bq. replace/remove should use list of label name only Done Support return structured NodeLabel objects in REST API when call getClusterNodeLabels -- Key: YARN-3521 URL: https://issues.apache.org/jira/browse/YARN-3521 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Sunil G Attachments: 0001-YARN-3521.patch, 0002-YARN-3521.patch, 0003-YARN-3521.patch, 0004-YARN-3521.patch, 0005-YARN-3521.patch In YARN-3413, yarn cluster CLI returns NodeLabel instead of String, we should make the same change in REST API side to make them consistency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3592) Fix typos in RMNodeLabelsManager
[ https://issues.apache.org/jira/browse/YARN-3592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533159#comment-14533159 ] Sunil G commented on YARN-3592: --- testRMRestartGetApplicationList is passing locally. The test failure is unrelated. New test cases are not needed for this as it's a typo issue. Fix typos in RMNodeLabelsManager Key: YARN-3592 URL: https://issues.apache.org/jira/browse/YARN-3592 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Junping Du Assignee: Sunil G Labels: newbie Attachments: 0001-YARN-3592.patch acccessibleNodeLabels = accessibleNodeLabels in many places. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1019) YarnConfiguration validation for local disk path and http addresses.
[ https://issues.apache.org/jira/browse/YARN-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534149#comment-14534149 ] Sunil G commented on YARN-1019: --- I would like to take over this as part of bugBash. Please reassign the same if you are working in it. I am writing test case for this fix and will upload a patch soon. YarnConfiguration validation for local disk path and http addresses. Key: YARN-1019 URL: https://issues.apache.org/jira/browse/YARN-1019 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.5-alpha Reporter: Omkar Vinit Joshi Priority: Minor Labels: BB2015-05-TBR, newbie Attachments: YARN-1019.0.patch Today we are not validating certain configuration parameters set in yarn-site.xml. 1) Configurations related to paths... such as local-dirs, log-dirs.. Our NM crashes during startup if they are set to relative paths rather than absolute paths. To avoid such failures we can enforce checks (absolute paths) before startup . i.e. before we actually startup...( i.e. directory handler creating directories). 2) Also for all the parameters using hostname:port unless we are ok with default port. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-1019) YarnConfiguration validation for local disk path and http addresses.
[ https://issues.apache.org/jira/browse/YARN-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G reassigned YARN-1019: - Assignee: Sunil G YarnConfiguration validation for local disk path and http addresses. Key: YARN-1019 URL: https://issues.apache.org/jira/browse/YARN-1019 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.5-alpha Reporter: Omkar Vinit Joshi Assignee: Sunil G Priority: Minor Labels: BB2015-05-TBR, newbie Attachments: YARN-1019.0.patch Today we are not validating certain configuration parameters set in yarn-site.xml. 1) Configurations related to paths... such as local-dirs, log-dirs.. Our NM crashes during startup if they are set to relative paths rather than absolute paths. To avoid such failures we can enforce checks (absolute paths) before startup . i.e. before we actually startup...( i.e. directory handler creating directories). 2) Also for all the parameters using hostname:port unless we are ok with default port. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-64) Add cluster-level stats availabe via RPCs
[ https://issues.apache.org/jira/browse/YARN-64?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534025#comment-14534025 ] Sunil G commented on YARN-64: - Hi [~vinodkv] Now we have cluster stats in the UI and REST. Do we need to have them at the command line/RPC level too? Add cluster-level stats availabe via RPCs - Key: YARN-64 URL: https://issues.apache.org/jira/browse/YARN-64 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Ravi Teja Ch N V MAPREDUCE-2738 already added the stats to the UI. It'll be helpful to add them to YarnClusterMetrics and make them available via the command-line/RPC. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3592) Fix typos in RMNodeLabelsManager
[ https://issues.apache.org/jira/browse/YARN-3592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534259#comment-14534259 ] Sunil G commented on YARN-3592: --- Thank you [~devaraj.k] for committing the same, and thank you Junping Du. Fix typos in RMNodeLabelsManager Key: YARN-3592 URL: https://issues.apache.org/jira/browse/YARN-3592 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Junping Du Assignee: Sunil G Labels: newbie Fix For: 2.8.0 Attachments: 0001-YARN-3592.patch acccessibleNodeLabels = accessibleNodeLabels in many places. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3592) Fix typos in RMNodeLabelsManager
[ https://issues.apache.org/jira/browse/YARN-3592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532749#comment-14532749 ] Sunil G commented on YARN-3592: --- Hi [~djp] I will share a patch for same. Fix typos in RMNodeLabelsManager Key: YARN-3592 URL: https://issues.apache.org/jira/browse/YARN-3592 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Junping Du Assignee: Sunil G Labels: newbie acccessibleNodeLabels = accessibleNodeLabels in many places. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3592) Fix typos in RMNodeLabelsManager
[ https://issues.apache.org/jira/browse/YARN-3592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G reassigned YARN-3592: - Assignee: Sunil G Fix typos in RMNodeLabelsManager Key: YARN-3592 URL: https://issues.apache.org/jira/browse/YARN-3592 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Junping Du Assignee: Sunil G Labels: newbie acccessibleNodeLabels = accessibleNodeLabels in many places. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3579) getLabelsToNodes in CommonNodeLabelsManager should support NodeLabel instead of label name as String
[ https://issues.apache.org/jira/browse/YARN-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3579: -- Attachment: 0001-YARN-3579.patch Hi [~leftnoteasy] Attaching an initial version of the patch. * getNodeLabelsInfo is the new version of the API and can be used in the REST interface and on the admin side; it supports objects instead of strings. * getNodeLabels will still be used in the recover call flow, where a string is the right thing to use. * getLabelsInfoToNodes and getLabelsToNodes are 2 different versions; getLabelsToNodes is marked as Deprecated and can be removed once the REST and admin sides start using the new API. Please share your thoughts on the same. getLabelsToNodes in CommonNodeLabelsManager should support NodeLabel instead of label name as String Key: YARN-3579 URL: https://issues.apache.org/jira/browse/YARN-3579 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Sunil G Assignee: Sunil G Priority: Minor Attachments: 0001-YARN-3579.patch CommonNodeLabelsManager#getLabelsToNodes returns label name as string. It is not passing information such as Exclusivity etc back to REST interface apis. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
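A hedged sketch of the two API flavours described above (the signatures and the nested NodeLabel class here are illustrative stand-ins, not the actual YARN-3579 patch): the legacy string-based call stays for the recovery path, while a parallel object-returning call feeds REST/admin so that details such as exclusivity are not lost.
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; the real NodeLabel record has more fields and lives elsewhere.
final class NodeLabelsApiSketch {

  static final class NodeLabel {
    final String name;
    final boolean exclusive;
    NodeLabel(String name, boolean exclusive) { this.name = name; this.exclusive = exclusive; }
  }

  private final List<NodeLabel> labels =
      new ArrayList<>(Arrays.asList(new NodeLabel("gpu", true), new NodeLabel("ssd", false)));

  // Legacy, string-only view: still fine for the recovery call flow.
  Set<String> getNodeLabels() {
    Set<String> names = new HashSet<>();
    for (NodeLabel l : labels) { names.add(l.name); }
    return names;
  }

  // New object view for REST/admin: keeps exclusivity and any future attributes.
  List<NodeLabel> getNodeLabelsInfo() {
    return new ArrayList<>(labels);
  }
}
{code}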
[jira] [Commented] (YARN-3603) Application Attempts page confusing
[ https://issues.apache.org/jira/browse/YARN-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534860#comment-14534860 ] Sunil G commented on YARN-3603: --- I would like to work on this. Would you mind if I take this? Thank you. Application Attempts page confusing --- Key: YARN-3603 URL: https://issues.apache.org/jira/browse/YARN-3603 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.8.0 Reporter: Thomas Graves Assignee: Sunil G The application attempts page (http://RM:8088/cluster/appattempt/appattempt_1431101480046_0003_01) is a bit confusing on what is going on. I think the table of containers there is for only Running containers and when the app is completed or killed its empty. The table should have a label on it stating so. Also the AM Container field is a link when running but not when its killed. That might be confusing. There is no link to the logs in this page but there is in the app attempt table when looking at http:// rm:8088/cluster/app/application_1431101480046_0003 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3603) Application Attempts page confusing
[ https://issues.apache.org/jira/browse/YARN-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G reassigned YARN-3603: - Assignee: Sunil G Application Attempts page confusing --- Key: YARN-3603 URL: https://issues.apache.org/jira/browse/YARN-3603 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.8.0 Reporter: Thomas Graves Assignee: Sunil G The application attempts page (http://RM:8088/cluster/appattempt/appattempt_1431101480046_0003_01) is a bit confusing on what is going on. I think the table of containers there is for only Running containers and when the app is completed or killed its empty. The table should have a label on it stating so. Also the AM Container field is a link when running but not when its killed. That might be confusing. There is no link to the logs in this page but there is in the app attempt table when looking at http:// rm:8088/cluster/app/application_1431101480046_0003 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3579) getLabelsToNodes in CommonNodeLabelsManager should support NodeLabel instead of label name as String
Sunil G created YARN-3579: - Summary: getLabelsToNodes in CommonNodeLabelsManager should support NodeLabel instead of label name as String Key: YARN-3579 URL: https://issues.apache.org/jira/browse/YARN-3579 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Sunil G Assignee: Sunil G Priority: Minor CommonNodeLabelsManager#getLabelsToNodes returns label name as string. It is not passing information such as Exclusivity etc back to REST interface apis. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3579) getLabelsToNodes in CommonNodeLabelsManager should support NodeLabel instead of label name as String
[ https://issues.apache.org/jira/browse/YARN-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3579: -- Issue Type: Sub-task (was: Bug) Parent: YARN-2492 getLabelsToNodes in CommonNodeLabelsManager should support NodeLabel instead of label name as String Key: YARN-3579 URL: https://issues.apache.org/jira/browse/YARN-3579 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Sunil G Assignee: Sunil G Priority: Minor CommonNodeLabelsManager#getLabelsToNodes returns label name as string. It is not passing information such as Exclusivity etc back to REST interface apis. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3592) Fix typos in RMNodeLabelsManager
[ https://issues.apache.org/jira/browse/YARN-3592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3592: -- Attachment: 0001-YARN-3592.patch Fix typos in RMNodeLabelsManager Key: YARN-3592 URL: https://issues.apache.org/jira/browse/YARN-3592 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Junping Du Assignee: Sunil G Labels: newbie Attachments: 0001-YARN-3592.patch acccessibleNodeLabels = accessibleNodeLabels in many places. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3521) Support return structured NodeLabel objects in REST API when call getClusterNodeLabels
[ https://issues.apache.org/jira/browse/YARN-3521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532966#comment-14532966 ] Sunil G commented on YARN-3521: --- Hi [~leftnoteasy] A List of NodeToLabelsEntry will also have issues. By default, query params can only be primitive types. If we pass objects as a list, those objects need to have a constructor that takes a String. In this case, we would need both a String and a List<String> in the ctor. Support return structured NodeLabel objects in REST API when call getClusterNodeLabels -- Key: YARN-3521 URL: https://issues.apache.org/jira/browse/YARN-3521 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Sunil G Attachments: 0001-YARN-3521.patch, 0002-YARN-3521.patch, 0003-YARN-3521.patch, 0004-YARN-3521.patch, 0005-YARN-3521.patch In YARN-3413, yarn cluster CLI returns NodeLabel instead of String; we should make the same change on the REST API side to make them consistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
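For background, a hedged illustration of the JAX-RS restriction being referenced: a type used as a @QueryParam must be reconstructable from a single String (single-String constructor, or a static valueOf/fromString), so an entry that also needs a List<String> of labels cannot be bound as a plain query parameter. The class and parameter names below are hypothetical, not taken from the YARN-3521 patches:
{code}
import java.util.List;

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.QueryParam;

@Path("/sketch")
public class QueryParamSketch {

  /** Bindable: JAX-RS can call the single-String constructor. */
  public static class LabelName {
    private final String name;

    public LabelName(String raw) {
      this.name = raw;
    }

    public String getName() {
      return name;
    }
  }

  /** Not bindable as a @QueryParam: construction needs a node id AND a list of labels. */
  public static class NodeToLabelsEntrySketch {
    private final String nodeId;
    private final List<String> labels;

    public NodeToLabelsEntrySketch(String nodeId, List<String> labels) {
      this.nodeId = nodeId;
      this.labels = labels;
    }
  }

  @GET
  public String get(@QueryParam("label") LabelName label) {
    return label == null ? "" : label.getName();
  }
}
{code}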
[jira] [Commented] (YARN-1019) YarnConfiguration validation for local disk path and http addresses.
[ https://issues.apache.org/jira/browse/YARN-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534371#comment-14534371 ] Sunil G commented on YARN-1019: --- Looks like Configuration.java on the Windows machine has some UTF chars that are causing a file-saving problem. I will rebase from a Unix machine. YarnConfiguration validation for local disk path and http addresses. Key: YARN-1019 URL: https://issues.apache.org/jira/browse/YARN-1019 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.5-alpha Reporter: Omkar Vinit Joshi Assignee: Sunil G Priority: Minor Labels: BB2015-05-TBR, newbie Attachments: YARN-1019.0.patch Today we are not validating certain configuration parameters set in yarn-site.xml. 1) Configurations related to paths, such as local-dirs and log-dirs: our NM crashes during startup if they are set to relative paths rather than absolute paths. To avoid such failures we can enforce checks (absolute paths) before startup, i.e. before the directory handler creates directories. 2) The same applies to all parameters of the form hostname:port, unless we are ok with the default port. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
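A minimal sketch of the kind of startup check being proposed, assuming the standard YarnConfiguration keys; this is not the contents of YARN-1019.0.patch:
{code}
// Illustrative only: verify local/log dirs are absolute paths and that an
// address of the form hostname:port carries an explicit numeric port.
import java.io.File;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class YarnConfigValidatorSketch {

  public static void validate(Configuration conf) {
    // 1) Path-style parameters must be absolute so the NM does not crash
    //    when the directory handler creates directories at startup.
    for (String key : new String[] {
        YarnConfiguration.NM_LOCAL_DIRS, YarnConfiguration.NM_LOG_DIRS}) {
      for (String dir : conf.getTrimmedStrings(key)) {
        if (!new File(dir).isAbsolute()) {
          throw new IllegalArgumentException(
              key + " must use absolute paths, found: " + dir);
        }
      }
    }
    // 2) hostname:port style parameters must end in an explicit numeric port.
    String address = conf.get(YarnConfiguration.RM_ADDRESS,
        YarnConfiguration.DEFAULT_RM_ADDRESS);
    int colon = address.lastIndexOf(':');
    if (colon < 0 || !address.substring(colon + 1).matches("\\d+")) {
      throw new IllegalArgumentException(
          YarnConfiguration.RM_ADDRESS + " must be hostname:port, found: " + address);
    }
  }
}
{code}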
[jira] [Updated] (YARN-3521) Support return structured NodeLabel objects in REST API when call getClusterNodeLabels
[ https://issues.apache.org/jira/browse/YARN-3521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3521: -- Attachment: 0006-YARN-3521.patch Hi [~leftnoteasy] [~vinodkv] Uploading a newer version of the patch. I addressed all of Wangda's comments except the ones on NodeToLabelsName and NodeLabelsName. NodeToLabelsName now contains a List of NodeToLabelEntry, and I use this class as the parameter for the replaceNodeLabels api. I tried using a List of NodeToLabelEntry without the wrapper class NodeToLabelsName, but somehow I was not able to test this from the test case. Please share your thoughts on using this outer wrapper class rather than a List of NodeToLabelEntry. Support return structured NodeLabel objects in REST API when call getClusterNodeLabels -- Key: YARN-3521 URL: https://issues.apache.org/jira/browse/YARN-3521 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Sunil G Attachments: 0001-YARN-3521.patch, 0002-YARN-3521.patch, 0003-YARN-3521.patch, 0004-YARN-3521.patch, 0005-YARN-3521.patch, 0006-YARN-3521.patch In YARN-3413, yarn cluster CLI returns NodeLabel instead of String; we should make the same change on the REST API side to make them consistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
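A hedged sketch of the wrapper shape described in the comment above; the real classes live in the YARN-3521 patches, and the names and fields here are illustrative only. A JAXB root element wrapping the list gives the REST layer a single payload to (un)marshal for the replaceNodeLabels call, which is easier to bind and exercise from tests than a bare list:
{code}
import java.util.ArrayList;
import java.util.List;

import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement(name = "nodeToLabelsName")
@XmlAccessorType(XmlAccessType.FIELD)
public class NodeToLabelsNameSketch {

  @XmlElement(name = "nodeToLabels")
  private List<NodeToLabelsEntrySketch> entries = new ArrayList<>();

  public List<NodeToLabelsEntrySketch> getEntries() {
    return entries;
  }

  /** One node id with the label names to apply to it. */
  @XmlAccessorType(XmlAccessType.FIELD)
  public static class NodeToLabelsEntrySketch {
    @XmlElement(name = "nodeId")
    private String nodeId;
    @XmlElement(name = "labels")
    private List<String> labels = new ArrayList<>();
  }
}
{code}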
[jira] [Updated] (YARN-3579) getLabelsToNodes in CommonNodeLabelsManager should support NodeLabel instead of label name as String
[ https://issues.apache.org/jira/browse/YARN-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3579: -- Attachment: 0002-YARN-3579.patch Thank you [~leftnoteasy] for the comments. I updated the patch accordingly. I used a generic method to remove the code duplication; kindly take a look. getLabelsToNodes in CommonNodeLabelsManager should support NodeLabel instead of label name as String Key: YARN-3579 URL: https://issues.apache.org/jira/browse/YARN-3579 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Sunil G Assignee: Sunil G Priority: Minor Attachments: 0001-YARN-3579.patch, 0002-YARN-3579.patch CommonNodeLabelsManager#getLabelsToNodes returns label name as string. It is not passing information such as Exclusivity etc back to REST interface apis. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
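A hedged sketch of what such a generic helper could look like; the actual change is in 0002-YARN-3579.patch and may differ, and NodeLabel.newInstance(String) is assumed as the factory for the object-keyed view:
{code}
// Illustrative only: one type-parameterized routine builds the label-to-nodes
// map, and the String-keyed and NodeLabel-keyed callers differ only in the key
// they derive from each label name.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

import org.apache.hadoop.yarn.api.records.NodeId;
import org.apache.hadoop.yarn.api.records.NodeLabel;

public class LabelsToNodesSketch {

  /** Shared walk over the raw label-name map; only the key type varies. */
  private static <K> Map<K, Set<NodeId>> buildLabelsToNodes(
      Map<String, Set<NodeId>> byName, Function<String, K> keyMapper) {
    Map<K, Set<NodeId>> result = new HashMap<>();
    for (Map.Entry<String, Set<NodeId>> e : byName.entrySet()) {
      result.put(keyMapper.apply(e.getKey()), new HashSet<>(e.getValue()));
    }
    return result;
  }

  /** Deprecated String-keyed view. */
  public static Map<String, Set<NodeId>> getLabelsToNodes(
      Map<String, Set<NodeId>> byName) {
    return buildLabelsToNodes(byName, name -> name);
  }

  /** NodeLabel-keyed view that can carry attributes such as exclusivity. */
  public static Map<NodeLabel, Set<NodeId>> getLabelsInfoToNodes(
      Map<String, Set<NodeId>> byName) {
    return buildLabelsToNodes(byName, name -> NodeLabel.newInstance(name));
  }
}
{code}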