[jira] [Commented] (YARN-10633) setup yarn federation failed
[ https://issues.apache.org/jira/browse/YARN-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289299#comment-17289299 ]

Subramaniam Krishnan commented on YARN-10633:
---------------------------------------------

[~hanfrank], multiple other configs are required to enable federation as well. Can you follow the detailed steps in the Configuration section under:
https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/Federation.html

> setup yarn federation failed
> ----------------------------
>
> Key: YARN-10633
> URL: https://issues.apache.org/jira/browse/YARN-10633
> Project: Hadoop YARN
> Issue Type: Bug
> Components: federation
> Affects Versions: 3.2.2
> Reporter: yuguang
> Priority: Major
>
> Hi
> I am trying to set up yarn federation mode. But after I add the below configuration in etc/hadoop/yarn-site.xml
>
> yarn.federation.enabled
> true
>
> then when I run yarn node -list, I get the below error. The historyserver service cannot be started either.
> I am using hadoop-3.2.2 version.
> [root@yarna hadoop-3.2.2]# yarn node -list
> 2021-02-18 05:51:39,178 INFO service.AbstractService: Service org.apache.hadoop.yarn.client.api.impl.YarnClientImpl failed in state STARTED
> java.lang.ArrayIndexOutOfBoundsException: Index 0 out of bounds for length 0
>   at org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider.init(ConfiguredRMFailoverProxyProvider.java:62)
>   at org.apache.hadoop.yarn.client.RMProxy.createRMFailoverProxyProvider(RMProxy.java:175)
>   at org.apache.hadoop.yarn.client.RMProxy.newProxyInstance(RMProxy.java:130)
>   at org.apache.hadoop.yarn.client.RMProxy.createRMProxy(RMProxy.java:103)
>   at org.apache.hadoop.yarn.client.ClientRMProxy.createRMProxy(ClientRMProxy.java:72)
>   at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceStart(YarnClientImpl.java:233)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>   at org.apache.hadoop.yarn.client.cli.YarnCLI.createAndStartYarnClient(YarnCLI.java:55)
>   at org.apache.hadoop.yarn.client.cli.NodeCLI.run(NodeCLI.java:110)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>   at org.apache.hadoop.yarn.client.cli.NodeCLI.main(NodeCLI.java:62)
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 0 out of bounds for length 0
>   at org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider.init(ConfiguredRMFailoverProxyProvider.java:62)
>   at org.apache.hadoop.yarn.client.RMProxy.createRMFailoverProxyProvider(RMProxy.java:175)
>   at org.apache.hadoop.yarn.client.RMProxy.newProxyInstance(RMProxy.java:130)
>   at org.apache.hadoop.yarn.client.RMProxy.createRMProxy(RMProxy.java:103)
>   at org.apache.hadoop.yarn.client.ClientRMProxy.createRMProxy(ClientRMProxy.java:72)
>   at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceStart(YarnClientImpl.java:233)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>   at org.apache.hadoop.yarn.client.cli.YarnCLI.createAndStartYarnClient(YarnCLI.java:55)
>   at org.apache.hadoop.yarn.client.cli.NodeCLI.run(NodeCLI.java:110)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>   at org.apache.hadoop.yarn.client.cli.NodeCLI.main(NodeCLI.java:62)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
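The trace above ends in an index-0 access on a zero-length array inside ConfiguredRMFailoverProxyProvider.init. One plausible reading, offered here as an assumption that this thread does not confirm, is that the client takes the failover path once federation is enabled and then finds no ResourceManager IDs configured. The sketch below reproduces only that failure shape with plain JDK code; `RmIdLookup`, `parseRmIds`, and `firstRmId` are hypothetical names, not Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for illustration; this is not Hadoop code. */
final class RmIdLookup {

    /** Splits a comma-separated ID list; null or blank input yields an empty array. */
    static String[] parseRmIds(String raw) {
        List<String> ids = new ArrayList<>();
        if (raw != null) {
            for (String part : raw.split(",")) {
                String id = part.trim();
                if (!id.isEmpty()) {
                    ids.add(id);
                }
            }
        }
        return ids.toArray(new String[0]);
    }

    /** Returns the first ID, failing with a descriptive message instead of a bare index error. */
    static String firstRmId(String[] ids) {
        if (ids.length == 0) {
            throw new IllegalStateException("no ResourceManager IDs configured");
        }
        return ids[0];
    }

    public static void main(String[] args) {
        String[] none = parseRmIds(null);
        try {
            // Unguarded access on an empty array, the same shape as the reported failure.
            System.out.println(none[0]);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unguarded access failed: " + e.getMessage());
        }
        System.out.println(firstRmId(parseRmIds("rm1, rm2")));
    }
}
```

Whatever the actual root cause, the guarded variant shows why a descriptive configuration error is easier to act on than a raw index exception; the Federation documentation linked in the comment remains the authoritative list of required settings.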
[jira] [Commented] (YARN-10125) In Federation, kill application from client does not kill Unmanaged AM's and containers launched by Unmanaged AM
[ https://issues.apache.org/jira/browse/YARN-10125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289294#comment-17289294 ]

Subramaniam Krishnan commented on YARN-10125:
---------------------------------------------

Thanks [~brahmareddy] for looping me in; I agree it should be handled. I recall testing this in the initial version years back, not sure how it got dropped in the interim.

Thanks [~dmmkr] for working on this!

> In Federation, kill application from client does not kill Unmanaged AM's and containers launched by Unmanaged AM
> ----------------------------------------------------------------------------------------------------------------
>
> Key: YARN-10125
> URL: https://issues.apache.org/jira/browse/YARN-10125
> Project: Hadoop YARN
> Issue Type: Bug
> Components: client, federation, router
> Reporter: D M Murali Krishna Reddy
> Assignee: D M Murali Krishna Reddy
> Priority: Major
> Attachments: YARN-10125.001.patch
>
> In Federation, killing an application from the client using "bin/yarn application -kill " kills only the containers of the home subcluster; the Unmanaged AM and the containers launched in other subclusters are not killed, which blocks resources.
> The containers get killed after the task completes, and the Unmanaged AM gets killed 10 minutes after the application is killed, killing any remaining running containers in that subcluster.
[jira] [Commented] (YARN-1187) Add discrete event-based simulation to yarn scheduler simulator
[ https://issues.apache.org/jira/browse/YARN-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262891#comment-17262891 ]

Subramaniam Krishnan commented on YARN-1187:
--------------------------------------------

Thanks [~ywskycn], I have assigned it to [~afchung90].

> Add discrete event-based simulation to yarn scheduler simulator
> ---------------------------------------------------------------
>
> Key: YARN-1187
> URL: https://issues.apache.org/jira/browse/YARN-1187
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Wei Yan
> Assignee: Andrew Chung
> Priority: Major
>
> Follow the discussion in YARN-1021.
> Discrete event simulation decouples the running from any real-world clock.
> This allows users to step through the execution, set debug points, and definitely get a deterministic rexec.
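The issue description above says discrete event simulation decouples a run from any real-world clock, which is what makes stepping and deterministic re-execution possible. A minimal sketch of that idea follows, assuming a priority queue keyed on simulated time; `DiscreteEventSim` and its methods are hypothetical names and are not taken from the YARN scheduler simulator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch of a discrete event loop; not the YARN simulator's code. */
final class DiscreteEventSim {

    /** An event fires at a simulated time; seq breaks ties in insertion order. */
    private static final class Event implements Comparable<Event> {
        final long time;
        final long seq;
        final String name;

        Event(long time, long seq, String name) {
            this.time = time;
            this.seq = seq;
            this.name = name;
        }

        @Override
        public int compareTo(Event other) {
            int byTime = Long.compare(time, other.time);
            return byTime != 0 ? byTime : Long.compare(seq, other.seq);
        }
    }

    private final PriorityQueue<Event> queue = new PriorityQueue<>();
    private long now = 0;
    private long nextSeq = 0;

    void schedule(long time, String name) {
        if (time < now) {
            throw new IllegalArgumentException("cannot schedule in the past");
        }
        queue.add(new Event(time, nextSeq++, name));
    }

    /** Jumps simulated time to the next event and returns its name, or null if none remain. */
    String step() {
        Event e = queue.poll();
        if (e == null) {
            return null;
        }
        now = e.time;
        return e.name;
    }

    long now() {
        return now;
    }

    /** Runs to completion and returns "time:name" entries in firing order. */
    List<String> runAll() {
        List<String> fired = new ArrayList<>();
        String name;
        while ((name = step()) != null) {
            fired.add(now + ":" + name);
        }
        return fired;
    }

    public static void main(String[] args) {
        DiscreteEventSim sim = new DiscreteEventSim();
        sim.schedule(30, "containerFinished");
        sim.schedule(10, "appSubmitted");
        System.out.println(sim.runAll());
    }
}
```

Because time only advances inside `step()`, a debugger can pause between events without changing the outcome, and two runs with the same schedule fire events in the same order.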
[jira] [Assigned] (YARN-1187) Add discrete event-based simulation to yarn scheduler simulator
[ https://issues.apache.org/jira/browse/YARN-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Subramaniam Krishnan reassigned YARN-1187:
------------------------------------------
    Assignee: Andrew Chung  (was: Wei Yan)

> Add discrete event-based simulation to yarn scheduler simulator
> ---------------------------------------------------------------
>
> Key: YARN-1187
> URL: https://issues.apache.org/jira/browse/YARN-1187
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Wei Yan
> Assignee: Andrew Chung
> Priority: Major
>
> Follow the discussion in YARN-1021.
> Discrete event simulation decouples the running from any real-world clock.
> This allows users to step through the execution, set debug points, and definitely get a deterministic rexec.
[jira] [Updated] (YARN-2475) ReservationSystem: replan upon capacity reduction
[ https://issues.apache.org/jira/browse/YARN-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Subramaniam Krishnan updated YARN-2475:
---------------------------------------
    Attachment: YARN-2475.patch

Thanks [~chris.douglas] for your diligent review. I am attaching a patch that has the minor tweaks you suggested.

ReservationSystem: replan upon capacity reduction
-------------------------------------------------

Key: YARN-2475
URL: https://issues.apache.org/jira/browse/YARN-2475
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Carlo Curino
Assignee: Carlo Curino
Attachments: YARN-2475.patch, YARN-2475.patch, YARN-2475.patch

In the context of YARN-1051, if capacity of the cluster drops significantly upon machine failures we need to trigger a reorganization of the planned reservations. As reservations are absolute it is possible that they will not all fit, and some need to be rejected a-posteriori.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (YARN-1709) Admission Control: Reservation subsystem
[ https://issues.apache.org/jira/browse/YARN-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Subramaniam Krishnan updated YARN-1709:
---------------------------------------
    Attachment: YARN-1709.patch

Thanks [~chris.douglas] for your diligent review. I am attaching a patch that has the minor tweaks you suggested.

Admission Control: Reservation subsystem
----------------------------------------

Key: YARN-1709
URL: https://issues.apache.org/jira/browse/YARN-1709
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Carlo Curino
Assignee: Subramaniam Krishnan
Attachments: YARN-1709.patch, YARN-1709.patch, YARN-1709.patch, YARN-1709.patch, YARN-1709.patch, YARN-1709.patch, YARN-1709.patch

This JIRA is about the key data structure used to track resources over time to enable YARN-1051. The Reservation subsystem is conceptually a plan of how the scheduler will allocate resources over-time.
[jira] [Commented] (YARN-2475) ReservationSystem: replan upon capacity reduction
[ https://issues.apache.org/jira/browse/YARN-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14132158#comment-14132158 ]

Subramaniam Krishnan commented on YARN-2475:
--------------------------------------------

bq. Just a minor clarification: as this iterates over each instant of the plan, are others allowed to modify it?

I prefer the current approach to global locking of the plan, as users can still submit new reservation requests (or modify existing reservations) for the future. Additional requests within the replanning window will be rejected at the validation stage itself, even before they reach the plan, because execution of the replanner implies there is no spare capacity. Makes sense?

ReservationSystem: replan upon capacity reduction
-------------------------------------------------

Key: YARN-2475
URL: https://issues.apache.org/jira/browse/YARN-2475
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Carlo Curino
Assignee: Carlo Curino
Attachments: YARN-2475.patch, YARN-2475.patch, YARN-2475.patch

In the context of YARN-1051, if capacity of the cluster drops significantly upon machine failures we need to trigger a reorganization of the planned reservations. As reservations are absolute it is possible that they will not all fit, and some need to be rejected a-posteriori.
[jira] [Updated] (YARN-1708) Add a public API to reserve resources (part of YARN-1051)
[ https://issues.apache.org/jira/browse/YARN-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Subramaniam Krishnan updated YARN-1708:
---------------------------------------
    Attachment: YARN-1708.patch

Rebased patch after sync-ing branch yarn-1051 with trunk

Add a public API to reserve resources (part of YARN-1051)
---------------------------------------------------------

Key: YARN-1708
URL: https://issues.apache.org/jira/browse/YARN-1708
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Carlo Curino
Assignee: Subramaniam Krishnan
Attachments: YARN-1708.patch, YARN-1708.patch, YARN-1708.patch, YARN-1708.patch

This JIRA tracks the definition of a new public API for YARN, which allows users to reserve resources (think of time-bounded queues). This is part of the admission control enhancement proposed in YARN-1051.
[jira] [Updated] (YARN-1707) Making the CapacityScheduler more dynamic
[ https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Subramaniam Krishnan updated YARN-1707:
---------------------------------------
    Attachment: YARN-1707.10.patch

Rebased patch after sync-ing branch yarn-1051 with trunk

Making the CapacityScheduler more dynamic
-----------------------------------------

Key: YARN-1707
URL: https://issues.apache.org/jira/browse/YARN-1707
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacityscheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
Labels: capacity-scheduler
Attachments: YARN-1707.10.patch, YARN-1707.2.patch, YARN-1707.3.patch, YARN-1707.4.patch, YARN-1707.5.patch, YARN-1707.6.patch, YARN-1707.7.patch, YARN-1707.8.patch, YARN-1707.9.patch, YARN-1707.patch

The CapacityScheduler is rather static at the moment, and refreshqueue provides a rather heavy-handed way to reconfigure it. Moving towards long-running services (tracked in YARN-896) and to enable more advanced admission control and resource parcelling we need to make the CapacityScheduler more dynamic. This is instrumental to the umbrella jira YARN-1051.

Concretely this requires the following changes:
* create queues dynamically
* destroy queues dynamically
* dynamically change queue parameters (e.g., capacity)
* modify refreshqueue validation to enforce sum(child.getCapacity()) <= 100% instead of == 100%

We limit this to LeafQueues.
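The last change in the list above relaxes the queue-capacity validation. Read here as "the children's capacities may sum to at most 100%" (the comparison operator appears to have been lost in the archived text), the difference between the strict and relaxed checks can be sketched as follows. `QueueCapacityCheck`, the epsilon value, and the use of plain float percentages are all assumptions made for illustration; this is not the CapacityScheduler's validation code.

```java
/** Hypothetical sketch contrasting strict and relaxed child-capacity validation. */
final class QueueCapacityCheck {

    // Tolerance for floating-point rounding; the value is an assumption.
    private static final float EPSILON = 1e-4f;

    static float sum(float[] childCapacities) {
        float total = 0f;
        for (float c : childCapacities) {
            total += c;
        }
        return total;
    }

    /** Strict rule: children must add up to exactly 100 (within EPSILON). */
    static boolean validStrict(float[] childCapacities) {
        return Math.abs(sum(childCapacities) - 100f) <= EPSILON;
    }

    /** Relaxed rule: children may leave capacity unassigned but must not exceed 100. */
    static boolean validRelaxed(float[] childCapacities) {
        return sum(childCapacities) <= 100f + EPSILON;
    }

    public static void main(String[] args) {
        float[] partlyAssigned = {30f, 30f};
        System.out.println("strict:  " + validStrict(partlyAssigned));
        System.out.println("relaxed: " + validRelaxed(partlyAssigned));
    }
}
```

The relaxed rule is what lets queues be created and destroyed at runtime: a parent can temporarily have unassigned capacity without the configuration being rejected.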
[jira] [Updated] (YARN-1709) Admission Control: Reservation subsystem
[ https://issues.apache.org/jira/browse/YARN-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Subramaniam Krishnan updated YARN-1709:
---------------------------------------
    Attachment: YARN-1709.patch

Rebased patch after sync-ing branch yarn-1051 with trunk

Admission Control: Reservation subsystem
----------------------------------------

Key: YARN-1709
URL: https://issues.apache.org/jira/browse/YARN-1709
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Carlo Curino
Assignee: Subramaniam Krishnan
Attachments: YARN-1709.patch, YARN-1709.patch, YARN-1709.patch, YARN-1709.patch, YARN-1709.patch

This JIRA is about the key data structure used to track resources over time to enable YARN-1051. The Reservation subsystem is conceptually a plan of how the scheduler will allocate resources over-time.
[jira] [Updated] (YARN-1711) CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
[ https://issues.apache.org/jira/browse/YARN-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Subramaniam Krishnan updated YARN-1711:
---------------------------------------
    Attachment: YARN-1711.3.patch

Rebased patch after sync-ing branch yarn-1051 with trunk

CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
--------------------------------------------------------------------------

Key: YARN-1711
URL: https://issues.apache.org/jira/browse/YARN-1711
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Carlo Curino
Assignee: Carlo Curino
Labels: reservations
Attachments: YARN-1711.1.patch, YARN-1711.2.patch, YARN-1711.3.patch, YARN-1711.patch

This JIRA tracks the development of a policy that enforces user quotas (a time-extension of the notion of capacity) in the inventory subsystem discussed in YARN-1709.
[jira] [Updated] (YARN-2080) Admission Control: Integrate Reservation subsystem with ResourceManager
[ https://issues.apache.org/jira/browse/YARN-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Subramaniam Krishnan updated YARN-2080:
---------------------------------------
    Attachment: YARN-2080.patch

Rebased patch after sync-ing branch yarn-1051 with trunk

Admission Control: Integrate Reservation subsystem with ResourceManager
-----------------------------------------------------------------------

Key: YARN-2080
URL: https://issues.apache.org/jira/browse/YARN-2080
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Subramaniam Krishnan
Assignee: Subramaniam Krishnan
Attachments: YARN-2080.patch, YARN-2080.patch, YARN-2080.patch, YARN-2080.patch, YARN-2080.patch

This JIRA tracks the integration of Reservation subsystem data structures introduced in YARN-1709 with the YARN RM. This is essentially end2end wiring of YARN-1051.
[jira] [Updated] (YARN-1712) Admission Control: plan follower
[ https://issues.apache.org/jira/browse/YARN-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Subramaniam Krishnan updated YARN-1712:
---------------------------------------
    Attachment: YARN-1712.4.patch

Thanks [~leftnoteasy] for reviewing the patch. I have wrapped the debug logs with _isDebugEnabled()_. This patch is also rebased post sync-ing of branch yarn-1051 with trunk

Admission Control: plan follower
--------------------------------

Key: YARN-1712
URL: https://issues.apache.org/jira/browse/YARN-1712
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacityscheduler, resourcemanager
Reporter: Carlo Curino
Assignee: Carlo Curino
Labels: reservations, scheduler
Attachments: YARN-1712.1.patch, YARN-1712.2.patch, YARN-1712.3.patch, YARN-1712.4.patch, YARN-1712.patch

This JIRA tracks a thread that continuously propagates the current state of an inventory subsystem to the scheduler. As the inventory subsystem stores the plan of how the resources should be subdivided, the work we propose in this JIRA realizes such plan by dynamically instructing the CapacityScheduler to add/remove/resize queues to follow the plan.
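The comment above mentions wrapping debug logs with _isDebugEnabled()_. The point of such a guard is to skip building an expensive message when debug output is off. The sketch below shows the same pattern using only the JDK's java.util.logging, where `isLoggable(Level.FINE)` plays the role of the guard; this substitution, along with the `GuardedDebugLog` class and its `describe` helper, is an assumption made to keep the example self-contained, not the logging API used in the patch.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch of a guarded debug log using java.util.logging. */
final class GuardedDebugLog {

    private static final Logger LOG = Logger.getLogger(GuardedDebugLog.class.getName());

    // Counts how often the costly message is built, to make the guard's effect visible.
    static int describeCalls = 0;

    /** Stands in for an expensive string construction. */
    static String describe(int[] queueSizes) {
        describeCalls++;
        StringBuilder sb = new StringBuilder();
        for (int size : queueSizes) {
            sb.append(size).append(' ');
        }
        return sb.toString().trim();
    }

    static void logQueues(int[] queueSizes) {
        // Only build the message when FINE (debug-level) output is enabled.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("queue sizes: " + describe(queueSizes));
        }
    }

    public static void main(String[] args) {
        logQueues(new int[] {3, 5});
        System.out.println("describe() calls: " + describeCalls);
    }
}
```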
[jira] [Updated] (YARN-2475) ReservationSystem: replan upon capacity reduction
[ https://issues.apache.org/jira/browse/YARN-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Subramaniam Krishnan updated YARN-2475:
---------------------------------------
    Attachment: YARN-2475.patch

Thanks [~chris.douglas] for reviewing the patch. I am uploading a patch that addresses all your comments (skipping relisting them).

bq. Why is the enforcement window tied to CapacitySchedulerConfiguration?

The replanner can be configured per plan which in turn translates to a leaf queue in capacity scheduler configuration. Consequently the enforcement window is configured for the replanner via the capacity scheduler leaf queue configuration.

ReservationSystem: replan upon capacity reduction
-------------------------------------------------

Key: YARN-2475
URL: https://issues.apache.org/jira/browse/YARN-2475
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Carlo Curino
Assignee: Carlo Curino
Attachments: YARN-2475.patch, YARN-2475.patch

In the context of YARN-1051, if capacity of the cluster drops significantly upon machine failures we need to trigger a reorganization of the planned reservations. As reservations are absolute it is possible that they will not all fit, and some need to be rejected a-posteriori.
[jira] [Updated] (YARN-1709) Admission Control: Reservation subsystem
[ https://issues.apache.org/jira/browse/YARN-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Subramaniam Krishnan updated YARN-1709:
---------------------------------------
    Attachment: YARN-1709.patch

The patch has only one modification: an additional constructor in InMemoryPlan which takes in the clock, so that mock clocks can be passed in by test cases, as [suggested | https://issues.apache.org/jira/browse/YARN-2475?focusedCommentId=14129041] by [~chris.douglas]

Admission Control: Reservation subsystem
----------------------------------------

Key: YARN-1709
URL: https://issues.apache.org/jira/browse/YARN-1709
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Carlo Curino
Assignee: Subramaniam Krishnan
Attachments: YARN-1709.patch, YARN-1709.patch, YARN-1709.patch, YARN-1709.patch, YARN-1709.patch, YARN-1709.patch

This JIRA is about the key data structure used to track resources over time to enable YARN-1051. The Reservation subsystem is conceptually a plan of how the scheduler will allocate resources over-time.
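The change described above adds a constructor that accepts a clock so tests can control time. A minimal sketch of that constructor-injection pattern follows, using the JDK's `java.time.Clock` as a stand-in for whatever clock type the patch actually uses; `PlanWithClock` and `isActive` are hypothetical names and do not reflect InMemoryPlan's real interface.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.ZoneOffset;

/** Hypothetical sketch of clock injection; not the InMemoryPlan implementation. */
final class PlanWithClock {

    private final Clock clock;

    /** Production path: use the system UTC clock. */
    PlanWithClock() {
        this(Clock.systemUTC());
    }

    /** Test path: the caller supplies the clock, e.g. a fixed one. */
    PlanWithClock(Clock clock) {
        this.clock = clock;
    }

    long now() {
        return clock.millis();
    }

    /** True if the current time falls in the half-open interval [startMillis, endMillis). */
    boolean isActive(long startMillis, long endMillis) {
        long t = now();
        return startMillis <= t && t < endMillis;
    }

    public static void main(String[] args) {
        Clock fixed = Clock.fixed(Instant.ofEpochMilli(1_000L), ZoneOffset.UTC);
        PlanWithClock plan = new PlanWithClock(fixed);
        System.out.println(plan.isActive(500L, 1_500L));
    }
}
```

With a fixed clock, time-dependent logic can be tested at exact boundary instants without sleeping or relying on wall-clock time.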
[jira] [Updated] (YARN-2080) Admission Control: Integrate Reservation subsystem with ResourceManager
[ https://issues.apache.org/jira/browse/YARN-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Subramaniam Krishnan updated YARN-2080:
---------------------------------------
    Attachment: YARN-2080.patch

Thanks [~vinodkv] for reviewing the patch. I am uploading a new patch that includes your feedback:
* Renamed all Yarn config variables as you suggested. I prefer using the standalone configs as it gives us more flexibility.
* Removed duplicate logging in _ClientRMService_ and _ReservationInputValidator_. Consistently uses RMAuditLogger throughout.
* Fixes in AbstractReservationSystem as you suggested.
* Updated stale references to queues in Javadocs of _YarnClient.submitReservation()_
* _TestYarnClient_ and _TestClientRMService_ use newInstance instead of PBImpls
* Renamed _ReservationRequest.setLeaseDuration()_ to simply _setDuration()_
* Moved _CapacitySchedulerConfiguration_ to YARN-1711

bq. ReservationInputValidator: Deleting a request shouldn't need validateReservationUpdateRequest-validateReservationDefinition. We only need the ID validation

That's exactly what's being done. ReservationDefinitions are validated only for submission/update.

bq. checkReservationACLs: Today anyone who can submit applications can also submit reservations. We may want to separate them, if you agree, I'll file a ticket for future separation of these ACLs.

I agree. I have a set of follow-up enhancement JIRAs to YARN-1051 in mind, one of which was exactly to consider separation of ACLs as you pointed out.

Admission Control: Integrate Reservation subsystem with ResourceManager
-----------------------------------------------------------------------

Key: YARN-2080
URL: https://issues.apache.org/jira/browse/YARN-2080
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Subramaniam Krishnan
Assignee: Subramaniam Krishnan
Attachments: YARN-2080.patch, YARN-2080.patch, YARN-2080.patch, YARN-2080.patch

This JIRA tracks the integration of Reservation subsystem data structures introduced in YARN-1709 with the YARN RM. This is essentially end2end wiring of YARN-1051.
[jira] [Updated] (YARN-1712) Admission Control: plan follower
[ https://issues.apache.org/jira/browse/YARN-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Subramaniam Krishnan updated YARN-1712:
---------------------------------------
    Attachment: YARN-1712.3.patch

Thanks [~jianhe] for your detailed feedback. I am attaching a patch with the following updates:
* Made move apps logic synchronous and the move is to defReservationQueue (renamed)
* Removed the synchronized on scheduler as individual calls are already synchronized
* Fixed comment formatting and variable names
* Created a common method to calculate lhsRes and rhsRes
* Optimized the loop as suggested

Some clarifications:
* Exceptions are suppressed deliberately as PlanFollower is a background timer thread and we don't want it to exit
* _plan.getReservationsAtTime(now)_ is used by others like Replanners. We need the reservations and not just the names even in PlanFollower so leaving it as is
* Tried moving the default queue creation to when PlanQueue is initialized in CapacityScheduler but it was getting overly complex, mainly due to the relaxed constraint of child capacities <= 100% for PlanQueues. This is just an additional hashmap lookup with the code being much cleaner so not moving it for now. If it is still a concern, I can add a flag to Plan and check that instead of CapacityScheduler#getQueue

Admission Control: plan follower
--------------------------------

Key: YARN-1712
URL: https://issues.apache.org/jira/browse/YARN-1712
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacityscheduler, resourcemanager
Reporter: Carlo Curino
Assignee: Carlo Curino
Labels: reservations, scheduler
Attachments: YARN-1712.1.patch, YARN-1712.2.patch, YARN-1712.3.patch, YARN-1712.patch

This JIRA tracks a thread that continuously propagates the current state of an inventory subsystem to the scheduler. As the inventory subsystem stores the plan of how the resources should be subdivided, the work we propose in this JIRA realizes such plan by dynamically instructing the CapacityScheduler to add/remove/resize queues to follow the plan.
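One clarification in the comment above is that exceptions are suppressed deliberately so the background PlanFollower thread does not exit. The same concern applies to the JDK's `ScheduledExecutorService`, whose documentation states that if a periodic task throws, subsequent executions are suppressed. The sketch below catches failures inside `run()` for that reason; `ResilientFollower` and `synchronizePlan` are hypothetical names, and the every-second-call failure is simulated purely for illustration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a periodic task that survives its own failures. */
final class ResilientFollower implements Runnable {

    final AtomicInteger attempts = new AtomicInteger();
    final AtomicInteger failures = new AtomicInteger();

    /** Stand-in for one synchronization pass; fails on every second call. */
    void synchronizePlan() {
        if (attempts.incrementAndGet() % 2 == 0) {
            throw new IllegalStateException("simulated transient failure");
        }
    }

    @Override
    public void run() {
        try {
            synchronizePlan();
        } catch (RuntimeException e) {
            // Record and swallow so the periodic schedule keeps running.
            failures.incrementAndGet();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ResilientFollower follower = new ResilientFollower();
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(follower, 0, 10, TimeUnit.MILLISECONDS);
        Thread.sleep(200);
        timer.shutdownNow();
        System.out.println("attempts=" + follower.attempts + " failures=" + follower.failures);
    }
}
```

In real code the catch block would normally log the exception; silently discarding it makes failures hard to diagnose.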
[jira] [Updated] (YARN-1711) CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
[ https://issues.apache.org/jira/browse/YARN-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Subramaniam Krishnan updated YARN-1711:
---------------------------------------
    Attachment: YARN-1712.3.patch

Updated patch to include *CapacitySchedulerConfiguration* based on [~vinodkv]'s [suggestion | https://issues.apache.org/jira/browse/YARN-2080?focusedCommentId=14125994], as the _majority_ of the configurations are for enforcement policies

CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
--------------------------------------------------------------------------

Key: YARN-1711
URL: https://issues.apache.org/jira/browse/YARN-1711
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Carlo Curino
Assignee: Carlo Curino
Labels: reservations
Attachments: YARN-1711.1.patch, YARN-1711.patch, YARN-1712.3.patch

This JIRA tracks the development of a policy that enforces user quotas (a time-extension of the notion of capacity) in the inventory subsystem discussed in YARN-1709.
[jira] [Updated] (YARN-1711) CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
[ https://issues.apache.org/jira/browse/YARN-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Subramaniam Krishnan updated YARN-1711:
---------------------------------------
    Attachment:     (was: YARN-1712.3.patch)

CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
--------------------------------------------------------------------------

Key: YARN-1711
URL: https://issues.apache.org/jira/browse/YARN-1711
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Carlo Curino
Assignee: Carlo Curino
Labels: reservations
Attachments: YARN-1711.1.patch, YARN-1711.2.patch, YARN-1711.patch

This JIRA tracks the development of a policy that enforces user quotas (a time-extension of the notion of capacity) in the inventory subsystem discussed in YARN-1709.
[jira] [Updated] (YARN-1711) CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
[ https://issues.apache.org/jira/browse/YARN-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Subramaniam Krishnan updated YARN-1711:
---------------------------------------
    Attachment: YARN-1711.2.patch

CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
--------------------------------------------------------------------------

Key: YARN-1711
URL: https://issues.apache.org/jira/browse/YARN-1711
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Carlo Curino
Assignee: Carlo Curino
Labels: reservations
Attachments: YARN-1711.1.patch, YARN-1711.2.patch, YARN-1711.patch

This JIRA tracks the development of a policy that enforces user quotas (a time-extension of the notion of capacity) in the inventory subsystem discussed in YARN-1709.
[jira] [Updated] (YARN-1709) Admission Control: Reservation subsystem
[ https://issues.apache.org/jira/browse/YARN-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Subramaniam Krishnan updated YARN-1709:
---------------------------------------
    Attachment: YARN-1709.patch

Thanks [~chris.douglas] for your exhaustive review. I am uploading a patch that has the following fixes:
* Cloned _ZERO_RESOURCE_, _minimumAllocation_ and _maximumAllocation_ to prevent leaking of mutable data
* Removed MessageFormat. Have to concat strings in a few cases where they are both logged and included as part of the exception message
* Fixed the code readability and lock scope in _addReservation()_
* Added assertions for _isWriteLockedByCurrentThread()_ in private methods that assume locks
* Removed redundant _this_ in get methods
* toString uses StringBuilder instead of StringBuffer now
* Fixed Javadoc - content (_getEarliestStartTime()_) and whitespaces
* Made _ReservationInterval_ immutable, good catch

The ReservationSystem uses UTCClock (added as part of YARN-1708) to enforce UTC times.

Admission Control: Reservation subsystem
----------------------------------------

Key: YARN-1709
URL: https://issues.apache.org/jira/browse/YARN-1709
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Carlo Curino
Assignee: Subramaniam Krishnan
Attachments: YARN-1709.patch, YARN-1709.patch, YARN-1709.patch, YARN-1709.patch

This JIRA is about the key data structure used to track resources over time to enable YARN-1051. The Reservation subsystem is conceptually a plan of how the scheduler will allocate resources over-time.
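Two of the fixes listed above are general Java patterns: making a value class immutable, and asserting `isWriteLockedByCurrentThread()` in private helpers that assume the caller holds the write lock. The sketch below illustrates both with hypothetical classes `TimeInterval` and `LockedTotals`; they are not the ReservationInterval or InMemoryPlan classes from the patch, and the `[start, end)` string format is an arbitrary choice.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical immutable interval: final fields, no setters. */
final class TimeInterval {

    private final long start;
    private final long end;

    TimeInterval(long start, long end) {
        if (end < start) {
            throw new IllegalArgumentException("end before start");
        }
        this.start = start;
        this.end = end;
    }

    long getStart() {
        return start;
    }

    long getEnd() {
        return end;
    }

    boolean contains(long t) {
        return start <= t && t < end;
    }

    @Override
    public String toString() {
        // StringBuilder is unsynchronized, which is sufficient for a local variable.
        return new StringBuilder().append('[').append(start).append(", ").append(end).append(')').toString();
    }
}

/** Hypothetical lock-protected state with a helper that assumes the write lock. */
final class LockedTotals {

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private long total;

    void add(long amount) {
        lock.writeLock().lock();
        try {
            incrementTotal(amount);
        } finally {
            lock.writeLock().unlock();
        }
    }

    private void incrementTotal(long amount) {
        // Active only when the JVM runs with -ea; documents the locking precondition.
        assert lock.isWriteLockedByCurrentThread();
        total += amount;
    }

    long getTotal() {
        lock.readLock().lock();
        try {
            return total;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

An immutable interval can be shared between threads and returned from getters without defensive copies, which is the same concern the cloning fix addresses for mutable resources.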
[jira] [Updated] (YARN-1712) Admission Control: plan follower
[ https://issues.apache.org/jira/browse/YARN-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Subramaniam Krishnan updated YARN-1712:
---------------------------------------
    Attachment: YARN-1712.2.patch

[~leftnoteasy], good to hear that you got the full context. Thanks for reviewing the patch. I am uploading a new patch that has the following changes:
* Fix the Log message.
* Replace stale references to sessions with reservations, good catch.

The currentReservations might have new reservations which just start now and so were not active before. These will not yet have corresponding reservation queues in CapacityScheduler as we create them after sorting. This is done to ensure what you highlighted earlier - we never exceed total capacity.

Admission Control: plan follower
--------------------------------

Key: YARN-1712
URL: https://issues.apache.org/jira/browse/YARN-1712
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacityscheduler, resourcemanager
Reporter: Carlo Curino
Assignee: Carlo Curino
Labels: reservations, scheduler
Attachments: YARN-1712.1.patch, YARN-1712.2.patch, YARN-1712.patch

This JIRA tracks a thread that continuously propagates the current state of an inventory subsystem to the scheduler. As the inventory subsystem stores the plan of how the resources should be subdivided, the work we propose in this JIRA realizes such plan by dynamically instructing the CapacityScheduler to add/remove/resize queues to follow the plan.
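The comment above says queues for new reservations are created after sorting so that total capacity is never exceeded, but it does not state the sort key. One ordering that would give this guarantee, assumed here purely for illustration, is to apply shrinking resizes before growing ones. `ResizeOrdering`, `Resize`, `shrinkFirst`, and `peakTotal` are hypothetical names; capacities are plain integer percentages.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: order resizes so the running total never overshoots. */
final class ResizeOrdering {

    /** A queue moving from its current capacity to a target; new queues start at 0. */
    static final class Resize {
        final String queue;
        final int current;
        final int target;

        Resize(String queue, int current, int target) {
            this.queue = queue;
            this.current = current;
            this.target = target;
        }

        int delta() {
            return target - current;
        }
    }

    /** Returns a copy sorted by capacity change, most negative (largest shrink) first. */
    static List<Resize> shrinkFirst(List<Resize> resizes) {
        List<Resize> sorted = new ArrayList<>(resizes);
        sorted.sort(Comparator.comparingInt(Resize::delta));
        return sorted;
    }

    /** Applies the resizes in the given order and returns the highest total seen. */
    static int peakTotal(List<Resize> ordered) {
        int total = 0;
        for (Resize r : ordered) {
            total += r.current;
        }
        int peak = total;
        for (Resize r : ordered) {
            total += r.delta();
            peak = Math.max(peak, total);
        }
        return peak;
    }

    public static void main(String[] args) {
        List<Resize> resizes = new ArrayList<>();
        resizes.add(new Resize("new", 0, 30));
        resizes.add(new Resize("grow", 40, 50));
        resizes.add(new Resize("shrink", 60, 20));
        System.out.println("peak unsorted: " + peakTotal(resizes));
        System.out.println("peak sorted:   " + peakTotal(shrinkFirst(resizes)));
    }
}
```

If the starting and final totals are both within the limit, applying all decreases first keeps every intermediate total at or below the larger of the two.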
[jira] [Updated] (YARN-1707) Making the CapacityScheduler more dynamic
[ https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Subramaniam Krishnan updated YARN-1707:
---------------------------------------
    Attachment: YARN-1707.9.patch

Uploading a new patch with a minor change. Renamed ReservationQueue#changeCapacity to ReservationQueue#setEntitlement for consistency.

Making the CapacityScheduler more dynamic
-----------------------------------------

Key: YARN-1707
URL: https://issues.apache.org/jira/browse/YARN-1707
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacityscheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
Labels: capacity-scheduler
Attachments: YARN-1707.2.patch, YARN-1707.3.patch, YARN-1707.4.patch, YARN-1707.5.patch, YARN-1707.6.patch, YARN-1707.7.patch, YARN-1707.8.patch, YARN-1707.9.patch, YARN-1707.patch

The CapacityScheduler is rather static at the moment, and refreshqueue provides a rather heavy-handed way to reconfigure it. Moving towards long-running services (tracked in YARN-896) and to enable more advanced admission control and resource parcelling we need to make the CapacityScheduler more dynamic. This is instrumental to the umbrella jira YARN-1051.

Concretely this requires the following changes:
* create queues dynamically
* destroy queues dynamically
* dynamically change queue parameters (e.g., capacity)
* modify refreshqueue validation to enforce sum(child.getCapacity()) <= 100% instead of == 100%

We limit this to LeafQueues.
[jira] [Updated] (YARN-1708) Add a public API to reserve resources (part of YARN-1051)
[ https://issues.apache.org/jira/browse/YARN-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Subramaniam Krishnan updated YARN-1708:
---------------------------------------
    Attachment: YARN-1708.patch

Thanks [~vinodkv] for reviewing the patch. I am uploading a new patch that has the following fixes based on your comments:
* All the newInstance methods and setters in the Reservation*Response objects should be marked as private.
* Replaced hashCode with IDE generated one in ReservationId
* Renamed ReservationRequests.{set|get}Type to {set|get}Interpretor, also in ReservationRequestsProto.type.
* Renamed ReservationRequest.leaseDuration to simply duration to make it consistent with ReservationRequestProto.duration

Add a public API to reserve resources (part of YARN-1051)
---------------------------------------------------------

Key: YARN-1708
URL: https://issues.apache.org/jira/browse/YARN-1708
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Carlo Curino
Assignee: Subramaniam Krishnan
Attachments: YARN-1708.patch, YARN-1708.patch, YARN-1708.patch

This JIRA tracks the definition of a new public API for YARN, which allows users to reserve resources (think of time-bounded queues). This is part of the admission control enhancement proposed in YARN-1051.
[jira] [Commented] (YARN-1707) Making the CapacityScheduler more dynamic
[ https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14122234#comment-14122234 ] Subramaniam Krishnan commented on YARN-1707: Thanks [~jianhe] and [~leftnoteasy] for taking the time to do a thorough review. I am also proxying for [~curino], as he did most of the work for the patch. As discussed, we will commit this to the YARN-1051 branch once we have +1s for a few other sub-JIRAs. Making the CapacityScheduler more dynamic - Key: YARN-1707 URL: https://issues.apache.org/jira/browse/YARN-1707 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Carlo Curino Assignee: Carlo Curino Labels: capacity-scheduler Attachments: YARN-1707.2.patch, YARN-1707.3.patch, YARN-1707.4.patch, YARN-1707.5.patch, YARN-1707.6.patch, YARN-1707.7.patch, YARN-1707.8.patch, YARN-1707.9.patch, YARN-1707.patch The CapacityScheduler is rather static at the moment, and refreshqueue provides a rather heavy-handed way to reconfigure it. Moving towards long-running services (tracked in YARN-896) and to enable more advanced admission control and resource parcelling we need to make the CapacityScheduler more dynamic. This is instrumental to the umbrella jira YARN-1051. Concretely this requires the following changes: * create queues dynamically * destroy queues dynamically * dynamically change queue parameters (e.g., capacity) * modify refreshqueue validation to enforce sum(child.getCapacity()) <= 100% instead of == 100% We limit this to LeafQueues.
[jira] [Updated] (YARN-1707) Making the CapacityScheduler more dynamic
[ https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan updated YARN-1707: --- Attachment: YARN-1707.7.patch Thanks [~jianhe] for your comments. I am uploading a patch that has the following fixes: * Renamed dyQConf/sesConf to entitlement * userLimit is now reinitialized in ReservationQueue/PlanQueue * Indentation fixed * Renamed SchedulerConfigEditException to SchedulerDynamicEditException * Consistently used showReservationsAsQueues for both the method and the flag The newly parsed queues will have the maxApps* values, as CapacityScheduler#reinitialize() invokes parseQueues(), which is where they are updated. Making the CapacityScheduler more dynamic - Key: YARN-1707 URL: https://issues.apache.org/jira/browse/YARN-1707 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Carlo Curino Assignee: Carlo Curino Labels: capacity-scheduler Attachments: YARN-1707.2.patch, YARN-1707.3.patch, YARN-1707.4.patch, YARN-1707.5.patch, YARN-1707.6.patch, YARN-1707.7.patch, YARN-1707.patch The CapacityScheduler is rather static at the moment, and refreshqueue provides a rather heavy-handed way to reconfigure it. Moving towards long-running services (tracked in YARN-896) and to enable more advanced admission control and resource parcelling we need to make the CapacityScheduler more dynamic. This is instrumental to the umbrella jira YARN-1051. Concretely this requires the following changes: * create queues dynamically * destroy queues dynamically * dynamically change queue parameters (e.g., capacity) * modify refreshqueue validation to enforce sum(child.getCapacity()) <= 100% instead of == 100% We limit this to LeafQueues.
[jira] [Commented] (YARN-1712) Admission Control: plan follower
[ https://issues.apache.org/jira/browse/YARN-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14120729#comment-14120729 ] Subramaniam Krishnan commented on YARN-1712:
Thanks [~leftnoteasy] for taking a look at the patch. Since [~curino] is traveling, I'll try to answer your questions. Your understanding is very close, as your steps 1-7 are correct. There is some missing context that I will try to explain.
bq. Question: Why not do 2) after 4)? Is it better to do shrink after excluded expired reservations?
Shrinking might be required as reservations are absolute while queues express relative (% of cluster) capacity. We need to shrink first, as shrinking might result in additional expired reservations. The expired reservations are those that exist in the scheduler but are not currently active in the Plan (post shrinking, if required). I should add that shrinking is a rare exception case, needed only when we lose large chunks of cluster capacity.
bq. 6) Sort all reservations, from less to more satisfied, and set their new entitlement.
bq. Question: Is it possible totalAssignedCapacity > 1? Could you please explain how to avoid it happen?
We sort all reservations based on what was promised at this moment in time. That can vary because we support skylines for reservations, i.e. varied resource requirements over time. This is required to handle DAGs, as in the case of Tez, Oozie, Hive or Pig queries, since the nodes of the DAG will have different resource needs. This is explained in detail in the tech report we uploaded as part of YARN-1051. The totalAssignedCapacity will never exceed 1 because: 1) We always release all excess capacity before starting to allocate fresh capacity. 2) The reservations themselves are validated before being added to the Plan to ensure that they never exceed the total capacity of the Plan (YARN-1709 & YARN-1711). As mentioned above, shrinking will handle large transient cluster failures.
{quote} One comment is, Current compare and sort reservation is comparing (allocatedResource - guaranteedResource), one feeling at top of my mind is, this may make larger queue can get resource easier than small queue. Is it possible an app can get more resource than other by lying to RM that it needs more resource when fierce competition on resource? {quote}
To prevent exactly this, we do our allocations starting from the smallest to the largest reservation queue. We enforce sharing policies (YARN-1711) to prevent a single user/app from reserving the entire cluster's resources or causing starvation by hoarding resources. Hope this clarifies the logic. Feel free to follow up if you have any further questions.
Admission Control: plan follower Key: YARN-1712 URL: https://issues.apache.org/jira/browse/YARN-1712 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino Labels: reservations, scheduler Attachments: YARN-1712.1.patch, YARN-1712.patch This JIRA tracks a thread that continuously propagates the current state of an inventory subsystem to the scheduler. As the inventory subsystem stores the plan of how the resources should be subdivided, the work we propose in this JIRA realizes such a plan by dynamically instructing the CapacityScheduler to add/remove/resize queues to follow the plan.
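The argument in this comment for why totalAssignedCapacity stays at or below 1 (release excess first, then grow) can be illustrated with a small sketch. It assumes capacities are expressed as fractions of the plan queue and that the sum of promised capacities has already been validated to be at most 1. `PlanFollowerSketch`, `Res`, and `applyEntitlements` are hypothetical names, and the ordering used is the "less to more satisfied" sort quoted in the comment. This is not the plan follower code from the patch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative only; not the YARN-1712 implementation. */
final class PlanFollowerSketch {

    /** Hypothetical reservation queue; capacities are fractions in [0, 1]. */
    static final class Res {
        final String name;
        double current;      // capacity currently entitled in the scheduler
        final double target; // capacity promised by the plan at this instant

        Res(String name, double current, double target) {
            this.name = name;
            this.current = current;
            this.target = target;
        }
    }

    /** Applies the targets and returns the peak total capacity seen on the way. */
    static double applyEntitlements(List<Res> reservations) {
        double total = 0;
        for (Res r : reservations) {
            total += r.current;
        }
        double peak = total;

        // Phase 1: release all excess capacity before granting anything new.
        for (Res r : reservations) {
            if (r.current > r.target) {
                total -= r.current - r.target;
                r.current = r.target;
            }
        }

        // Phase 2: grow the remaining queues, least satisfied first.
        List<Res> order = new ArrayList<>(reservations);
        order.sort(Comparator.comparingDouble((Res r) -> r.current - r.target));
        for (Res r : order) {
            if (r.current < r.target) {
                total += r.target - r.current;
                r.current = r.target;
                peak = Math.max(peak, total);
            }
        }
        return peak;
    }
}
```

Because every shrink happens before any growth, the running total in phase 2 can only climb towards the sum of the targets, which the validation step is assumed to cap at 1.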
[jira] [Updated] (YARN-1707) Making the CapacityScheduler more dynamic
[ https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan updated YARN-1707: --- Attachment: YARN-1707.8.patch Thanks [~jianhe] for your insights. I am uploading a new patch that has the following fixes: * Removed display name * Reverted unnecessary visibility change & null check * Pass QueueEntitlement to changeCapacity() * Handling move to Plan Queue, including unit test case I have not removed YarnException from setEntitlement as it is thrown by getAndCheckLeafQueue(). Making the CapacityScheduler more dynamic - Key: YARN-1707 URL: https://issues.apache.org/jira/browse/YARN-1707 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Carlo Curino Assignee: Carlo Curino Labels: capacity-scheduler Attachments: YARN-1707.2.patch, YARN-1707.3.patch, YARN-1707.4.patch, YARN-1707.5.patch, YARN-1707.6.patch, YARN-1707.7.patch, YARN-1707.8.patch, YARN-1707.patch The CapacityScheduler is rather static at the moment, and refreshqueue provides a rather heavy-handed way to reconfigure it. Moving towards long-running services (tracked in YARN-896) and to enable more advanced admission control and resource parcelling we need to make the CapacityScheduler more dynamic. This is instrumental to the umbrella jira YARN-1051. Concretely this requires the following changes: * create queues dynamically * destroy queues dynamically * dynamically change queue parameters (e.g., capacity) * modify refreshqueue validation to enforce sum(child.getCapacity()) <= 100% instead of == 100% We limit this to LeafQueues.
[jira] [Updated] (YARN-1707) Making the CapacityScheduler more dynamic
[ https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan updated YARN-1707: --- Attachment: YARN-1707.6.patch Thanks [~jianhe] for your feedback. I am uploading a new patch that addresses your comments. Making the CapacityScheduler more dynamic - Key: YARN-1707 URL: https://issues.apache.org/jira/browse/YARN-1707 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Carlo Curino Assignee: Carlo Curino Labels: capacity-scheduler Attachments: YARN-1707.2.patch, YARN-1707.3.patch, YARN-1707.4.patch, YARN-1707.5.patch, YARN-1707.6.patch, YARN-1707.patch The CapacityScheduler is rather static at the moment, and refreshqueue provides a rather heavy-handed way to reconfigure it. Moving towards long-running services (tracked in YARN-896) and to enable more advanced admission control and resource parcelling we need to make the CapacityScheduler more dynamic. This is instrumental to the umbrella jira YARN-1051. Concretely this requires the following changes: * create queues dynamically * destroy queues dynamically * dynamically change queue parameters (e.g., capacity) * modify refreshqueue validation to enforce sum(child.getCapacity()) <= 100% instead of == 100% We limit this to LeafQueues.
[jira] [Updated] (YARN-1709) Admission Control: Reservation subsystem
[ https://issues.apache.org/jira/browse/YARN-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan updated YARN-1709: --- Attachment: YARN-1709.patch Updating the patch as a result of API changes based on [~vinodkv]'s [feedback|https://issues.apache.org/jira/browse/YARN-1708?focusedCommentId=14112669] on YARN-1708. Admission Control: Reservation subsystem Key: YARN-1709 URL: https://issues.apache.org/jira/browse/YARN-1709 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Subramaniam Krishnan Attachments: YARN-1709.patch, YARN-1709.patch, YARN-1709.patch This JIRA is about the key data structure used to track resources over time to enable YARN-1051. The Reservation subsystem is conceptually a plan of how the scheduler will allocate resources over time. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2080) Admission Control: Integrate Reservation subsystem with ResourceManager
[ https://issues.apache.org/jira/browse/YARN-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan updated YARN-2080: --- Attachment: YARN-2080.patch Uploading a new patch that adds a scheduler agnostic AbstractReservationSystem which is extended by the CapacityReservationSystem scheduler configuration as suggested by [~kasha]. CapacityReservationSystem essentially just loads configs from capacity scheduler xml. Attempted to converge this with Fair Scheduler as part of YARN-2386 but figured that it was not feasible. It has also minor changes as a result of API changes based on [~vinodkv] [feedback | https://issues.apache.org/jira/browse/YARN-1708?focusedCommentId=14112669] on YARN-1708. Admission Control: Integrate Reservation subsystem with ResourceManager --- Key: YARN-2080 URL: https://issues.apache.org/jira/browse/YARN-2080 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Subramaniam Krishnan Assignee: Subramaniam Krishnan Attachments: YARN-2080.patch, YARN-2080.patch, YARN-2080.patch This JIRA tracks the integration of Reservation subsystem data structures introduced in YARN-1709 with the YARN RM. This is essentially end2end wiring of YARN-1051. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2080) Admission Control: Integrate Reservation subsystem with ResourceManager
[ https://issues.apache.org/jira/browse/YARN-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115860#comment-14115860 ] Subramaniam Krishnan commented on YARN-2080: Typo in previous comment. Read it as: Uploading a new patch that adds a scheduler agnostic AbstractReservationSystem which is extended by the CapacityReservationSystem for capacity scheduler as suggested by [~kasha]. CapacityReservationSystem essentially just loads configs from capacity scheduler xml. Attempted to converge this with Fair Scheduler as part of YARN-2386 but figured that it was not feasible. It has also minor changes as a result of API changes based on [~vinodkv] [feedback | https://issues.apache.org/jira/browse/YARN-1708?focusedCommentId=14112669] on YARN-1708. Admission Control: Integrate Reservation subsystem with ResourceManager --- Key: YARN-2080 URL: https://issues.apache.org/jira/browse/YARN-2080 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Subramaniam Krishnan Assignee: Subramaniam Krishnan Attachments: YARN-2080.patch, YARN-2080.patch, YARN-2080.patch This JIRA tracks the integration of Reservation subsystem data structures introduced in YARN-1709 with the YARN RM. This is essentially end2end wiring of YARN-1051. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2385) Consider splitting getAppsinQueue to getRunningAppsInQueue + getPendingAppsInQueue
[ https://issues.apache.org/jira/browse/YARN-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115863#comment-14115863 ] Subramaniam Krishnan commented on YARN-2385: Thanks [~sunilg] for verifying. I am fine either way, i.e. whether you want to take up the splitting now or later, as we have currently ensured that the behavior of CS & FS is consistent for _getAppsInQueue_. [~leftnoteasy], [~zjshen], what do you think? Consider splitting getAppsinQueue to getRunningAppsInQueue + getPendingAppsInQueue -- Key: YARN-2385 URL: https://issues.apache.org/jira/browse/YARN-2385 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler Reporter: Subramaniam Krishnan Labels: abstractyarnscheduler Currently getAppsinQueue returns both pending & running apps. The purpose of the JIRA is to explore splitting it into getRunningAppsInQueue + getPendingAppsInQueue, which will provide more flexibility to callers.
[jira] [Updated] (YARN-1708) Add a public API to reserve resources (part of YARN-1051)
[ https://issues.apache.org/jira/browse/YARN-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan updated YARN-1708: --- Attachment: YARN-1708.patch
Thanks [~vinodkv] for taking the time to review and for the follow-up discussions. I am uploading a new API patch based on the consensus we reached. The summary of the changes is:
- Make all proto fields optional, with defaults, and add server-side code to check for required fields.
- Rename ReservationCreateRequestProto -> ReservationSubmissionRequestProto
- Rename ReservationDescriptionProto -> ReservationRequestsProto
- Rename ReservationResourceRequestProto -> ReservationRequestsProto
- Added a new ReservationRequestProto, which is used specifically to specify the resources to reserve instead of reusing ResourceRequestProto, as reservations currently do not use locality constraints. In the future we see convergence of both.
- Rename ReservationDescriptionInterpreterProto -> ReservationRequestInterpreterProto. Added examples for each reservation type in Javadoc.
- ReservationHandle is not needed.
- Add ReservationIdProto: ClusterTimeStamp + long id, similar to appIDs
- Rename ReservationCreateResponseProto -> ReservationSubmissionResponseProto
- ReservationUpdateRequestProto: No need to pass queue-name. Instead it should specify the ReservationID, and the effect will be to replace the existing reservation with the new one.
- ReservationUpdateResponseProto: Can just be empty
- Add a ReservationDeleteRequestProto with the ReservationId which will be deleted
- ReservationDeleteResponseProto: again can just be empty
Add a public API to reserve resources (part of YARN-1051) - Key: YARN-1708 URL: https://issues.apache.org/jira/browse/YARN-1708 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Subramaniam Krishnan Attachments: YARN-1708.patch, YARN-1708.patch This JIRA tracks the definition of a new public API for YARN, which allows users to reserve resources (think of time-bounded queues). This is part of the admission control enhancement proposed in YARN-1051.
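The change list above describes the reservation identifier as a cluster timestamp plus a long id, similar to application IDs. A minimal sketch of such a value type follows. `ReservationIdSketch` is a hypothetical stand-in, not the `ReservationId` class from the patch, and the string form shown in `toString` is an assumption made only for illustration.

```java
import java.util.Objects;

/** Hypothetical value type modelled on "ClusterTimeStamp + long id". */
final class ReservationIdSketch implements Comparable<ReservationIdSketch> {
    private final long clusterTimestamp; // start time of the RM that issued the id
    private final long id;               // counter that is unique within that RM run

    ReservationIdSketch(long clusterTimestamp, long id) {
        this.clusterTimestamp = clusterTimestamp;
        this.id = id;
    }

    long getClusterTimestamp() {
        return clusterTimestamp;
    }

    long getId() {
        return id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ReservationIdSketch)) {
            return false;
        }
        ReservationIdSketch other = (ReservationIdSketch) o;
        return clusterTimestamp == other.clusterTimestamp && id == other.id;
    }

    @Override
    public int hashCode() {
        // Both fields take part, so equal ids always hash alike.
        return Objects.hash(clusterTimestamp, id);
    }

    @Override
    public int compareTo(ReservationIdSketch other) {
        // Order by cluster timestamp first, then by the per-cluster counter.
        int byCluster = Long.compare(clusterTimestamp, other.clusterTimestamp);
        return byCluster != 0 ? byCluster : Long.compare(id, other.id);
    }

    @Override
    public String toString() {
        // Assumed format, chosen for readability only.
        return "reservation_" + clusterTimestamp + "_" + id;
    }
}
```

Pairing the counter with the cluster timestamp keeps ids unique across ResourceManager restarts, which is presumably why the comment compares it to appIDs.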
[jira] [Assigned] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.
[ https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan reassigned YARN-1051: -- Assignee: Subramaniam Krishnan (was: Carlo Curino) YARN Admission Control/Planner: enhancing the resource allocation model with time. -- Key: YARN-1051 URL: https://issues.apache.org/jira/browse/YARN-1051 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, resourcemanager, scheduler Reporter: Carlo Curino Assignee: Subramaniam Krishnan Attachments: YARN-1051-design.pdf, curino_MSR-TR-2013-108.pdf, techreport.pdf In this umbrella JIRA we propose to extend the YARN RM to handle time explicitly, allowing users to reserve capacity over time. This is an important step towards SLAs, long-running services, workflows, and helps for gang scheduling. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2385) Consider splitting getAppsinQueue to getRunningAppsInQueue + getPendingAppsInQueue
[ https://issues.apache.org/jira/browse/YARN-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111737#comment-14111737 ] Subramaniam Krishnan commented on YARN-2385: [~sunilg], the behavior of *getAppsInQueue* should be the same for both CS & FS unless I am missing something. As part of YARN-2378, I also added pending apps to CS#getAppsInQueue. Consider splitting getAppsinQueue to getRunningAppsInQueue + getPendingAppsInQueue -- Key: YARN-2385 URL: https://issues.apache.org/jira/browse/YARN-2385 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler Reporter: Subramaniam Krishnan Labels: abstractyarnscheduler Currently getAppsinQueue returns both pending & running apps. The purpose of the JIRA is to explore splitting it into getRunningAppsInQueue + getPendingAppsInQueue, which will provide more flexibility to callers.
[jira] [Updated] (YARN-2385) Consider splitting getAppsinQueue to getRunningAppsInQueue + getPendingAppsInQueue
[ https://issues.apache.org/jira/browse/YARN-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan updated YARN-2385: --- Summary: Consider splitting getAppsinQueue to getRunningAppsInQueue + getPendingAppsInQueue (was: Adding support for listing all applications in a queue) Consider splitting getAppsinQueue to getRunningAppsInQueue + getPendingAppsInQueue -- Key: YARN-2385 URL: https://issues.apache.org/jira/browse/YARN-2385 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler Reporter: Subramaniam Krishnan Labels: abstractyarnscheduler This JIRA proposes adding a method in AbstractYarnScheduler to get all the pending/active applications. Fair scheduler already supports moving a single application from one queue to another. Support for the same is being added to Capacity Scheduler as part of YARN-2378 and YARN-2248. So with the addition of this method, we can transparently add support for moving all applications from source queue to target queue and draining a queue, i.e. killing all applications in a queue as proposed by YARN-2389 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2385) Consider splitting getAppsinQueue to getRunningAppsInQueue + getPendingAppsInQueue
[ https://issues.apache.org/jira/browse/YARN-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan updated YARN-2385: --- Description: Currently getAppsinQueue returns both pending & running apps. The purpose of the JIRA is to explore splitting it into getRunningAppsInQueue + getPendingAppsInQueue, which will provide more flexibility to callers. (was: This JIRA proposes adding a method in AbstractYarnScheduler to get all the pending/active applications. Fair scheduler already supports moving a single application from one queue to another. Support for the same is being added to Capacity Scheduler as part of YARN-2378 and YARN-2248. So with the addition of this method, we can transparently add support for moving all applications from source queue to target queue and draining a queue, i.e. killing all applications in a queue as proposed by YARN-2389) Consider splitting getAppsinQueue to getRunningAppsInQueue + getPendingAppsInQueue -- Key: YARN-2385 URL: https://issues.apache.org/jira/browse/YARN-2385 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler Reporter: Subramaniam Krishnan Labels: abstractyarnscheduler Currently getAppsinQueue returns both pending & running apps. The purpose of the JIRA is to explore splitting it into getRunningAppsInQueue + getPendingAppsInQueue, which will provide more flexibility to callers.
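The proposed split can be sketched as follows. This is only an illustration of the shape of the API: `QueueAppsSketch` and its string application ids are hypothetical, and the real scheduler method signatures are not reproduced here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical queue view; not AbstractYarnScheduler's actual API. */
final class QueueAppsSketch {
    private final List<String> running = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();

    void addRunning(String appId) {
        running.add(appId);
    }

    void addPending(String appId) {
        pending.add(appId);
    }

    /** Only the applications that are currently running in the queue. */
    List<String> getRunningAppsInQueue() {
        return Collections.unmodifiableList(new ArrayList<>(running));
    }

    /** Only the applications that are still waiting in the queue. */
    List<String> getPendingAppsInQueue() {
        return Collections.unmodifiableList(new ArrayList<>(pending));
    }

    /** The existing combined behaviour, expressed via the two narrower calls. */
    List<String> getAppsInQueue() {
        List<String> all = new ArrayList<>(getRunningAppsInQueue());
        all.addAll(getPendingAppsInQueue());
        return all;
    }
}
```

A caller that wants to drain a queue could keep using the combined call, while one that only needs to move waiting applications could ask for the pending list alone.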
[jira] [Commented] (YARN-2385) Adding support for listing all applications in a queue
[ https://issues.apache.org/jira/browse/YARN-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101034#comment-14101034 ] Subramaniam Krishnan commented on YARN-2385: [~sunilg], [~leftnoteasy], [~zjshen] I suggest we either open a new JIRA to discuss splitting of getAppsinQueue to getRunningAppsInQueue + getPendingAppsInQueue or update the current JIRA to reflect the discussion? Adding support for listing all applications in a queue -- Key: YARN-2385 URL: https://issues.apache.org/jira/browse/YARN-2385 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler Reporter: Subramaniam Krishnan Assignee: Karthik Kambatla Labels: abstractyarnscheduler This JIRA proposes adding a method in AbstractYarnScheduler to get all the pending/active applications. Fair scheduler already supports moving a single application from one queue to another. Support for the same is being added to Capacity Scheduler as part of YARN-2378 and YARN-2248. So with the addition of this method, we can transparently add support for moving all applications from source queue to target queue and draining a queue, i.e. killing all applications in a queue as proposed by YARN-2389 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2385) Adding support for listing all applications in a queue
[ https://issues.apache.org/jira/browse/YARN-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan updated YARN-2385: --- Assignee: (was: Karthik Kambatla) Adding support for listing all applications in a queue -- Key: YARN-2385 URL: https://issues.apache.org/jira/browse/YARN-2385 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler Reporter: Subramaniam Krishnan Labels: abstractyarnscheduler This JIRA proposes adding a method in AbstractYarnScheduler to get all the pending/active applications. Fair scheduler already supports moving a single application from one queue to another. Support for the same is being added to Capacity Scheduler as part of YARN-2378 and YARN-2248. So with the addition of this method, we can transparently add support for moving all applications from source queue to target queue and draining a queue, i.e. killing all applications in a queue as proposed by YARN-2389 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (YARN-2386) Refactor common scheduler configurations into a base ResourceSchedulerConfig class
[ https://issues.apache.org/jira/browse/YARN-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan resolved YARN-2386. Resolution: Invalid Took a look into both the scheduler configs and unfortunately the configurations are so disparate that there isn't much in common to refactor out. Refactor common scheduler configurations into a base ResourceSchedulerConfig class -- Key: YARN-2386 URL: https://issues.apache.org/jira/browse/YARN-2386 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Subramaniam Krishnan Assignee: Subramaniam Krishnan As discussed with [~leftnoteasy], [~jianhe] and [~kasha], this JIRA proposes refactoring common configuration from the Capacity & Fair schedulers into a common base class to avoid duplicating configs. Currently the Capacity & Fair scheduler configs directly extend Configuration, and adding a common base Resource scheduler config class would also align with the Resource Scheduler hierarchy and enable other systems like the reservation system (YARN-2080) to be scheduler implementation agnostic.
[jira] [Commented] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.
[ https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14014565#comment-14014565 ] Subramaniam Krishnan commented on YARN-1051: We have posted patches for YARN-1709 and YARN-2080, looking for feedback. YARN Admission Control/Planner: enhancing the resource allocation model with time. -- Key: YARN-1051 URL: https://issues.apache.org/jira/browse/YARN-1051 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, resourcemanager, scheduler Reporter: Carlo Curino Assignee: Carlo Curino Attachments: YARN-1051-design.pdf, curino_MSR-TR-2013-108.pdf, techreport.pdf In this umbrella JIRA we propose to extend the YARN RM to handle time explicitly, allowing users to reserve capacity over time. This is an important step towards SLAs, long-running services, workflows, and helps for gang scheduling. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2080) Admission Control: Integrate Reservation subsystem with ResourceManager
[ https://issues.apache.org/jira/browse/YARN-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan updated YARN-2080: --- Attachment: YARN-2080.patch Attaching a patch file that wires the reservation APIs into existing YARN APIs. It introduces a new component *ReservationSystem* that essentially manages all the _Plans_ (#YARN-1709) configured in the ResourceSchedulers. The ReservationSystem is bootstrapped by ResourceManager if it is enabled in configuration. The ClientRMService has implementation of the reservation APIs which are additionally exposed via the YarnClient. Admission Control: Integrate Reservation subsystem with ResourceManager --- Key: YARN-2080 URL: https://issues.apache.org/jira/browse/YARN-2080 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Subramaniam Krishnan Assignee: Subramaniam Krishnan Attachments: YARN-2080.patch This JIRA tracks the integration of Reservation subsystem data structures introduced in YARN-1709 with the YARN RM. This is essentially end2end wiring of YARN-1051. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1709) Admission Control: Reservation subsystem
[ https://issues.apache.org/jira/browse/YARN-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan updated YARN-1709: --- Attachment: YARN-1709.patch The attached patch contains in-memory data structures to track reservations over time: * _Plan_ : It is the central data structure of a reservation system that maintains the agenda for the cluster, i.e. how client reservations that have been accepted so far will be honoured. * _ReservationAllocation_ : It represents a concrete instance of resources allocated over time to satisfy a single client reservation request. * _RLESparseResourceAllocation_ : It is a run-length encoded sparse data structure that maintains cumulative resource allocations over time. Admission Control: Reservation subsystem Key: YARN-1709 URL: https://issues.apache.org/jira/browse/YARN-1709 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Subramaniam Krishnan Attachments: YARN-1709.patch This JIRA is about the key data structure used to track resources over time to enable YARN-1051. The Reservation subsystem is conceptually a plan of how the scheduler will allocate resources over time.
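To make the idea of a run-length encoded, sparse record of cumulative allocations concrete, here is a minimal sketch. It stores only the instants at which the cumulative value changes, in a sorted map. `RleAllocationSketch`, its use of a plain `long` amount instead of a YARN resource object, and the half-open interval convention are all assumptions for illustration; this is not the `RLESparseResourceAllocation` class from the patch.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative step function over time; not the patch's data structure. */
final class RleAllocationSketch {
    // Key: instant at which the cumulative allocation changes.
    // Value: allocation in force from that instant until the next key.
    private final TreeMap<Long, Long> steps = new TreeMap<>();

    /** Adds `amount` to the cumulative allocation over [start, end). */
    void addInterval(long start, long end, long amount) {
        if (start >= end) {
            throw new IllegalArgumentException("empty interval");
        }
        // Create breakpoints at both ends, carrying over the value in force there.
        steps.putIfAbsent(start, capacityAt(start));
        steps.putIfAbsent(end, capacityAt(end));
        // Raise every run that starts inside [start, end).
        for (Map.Entry<Long, Long> e : steps.subMap(start, true, end, false).entrySet()) {
            e.setValue(e.getValue() + amount);
        }
    }

    /** Cumulative allocation at the given instant; 0 before the first breakpoint. */
    long capacityAt(long time) {
        Map.Entry<Long, Long> e = steps.floorEntry(time);
        return e == null ? 0L : e.getValue();
    }
}
```

The map grows with the number of change points rather than with the length of the time horizon, which is the point of the sparse encoding. This sketch does not merge adjacent runs that end up with equal values, which a fuller implementation would likely do.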
[jira] [Updated] (YARN-1709) Admission Control: Reservation subsystem
[ https://issues.apache.org/jira/browse/YARN-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan updated YARN-1709: --- Description: This JIRA is about the key data structure used to track resources over time to enable YARN-1051. The Reservation subsystem is conceptually a plan of how the scheduler will allocate resources over-time. (was: This JIRA is about the key data structure used to track resources over time to enable YARN-1051. The inventory subsystem is conceptually a plan of how the capacity scheduler will be configured over-time.) Summary: Admission Control: Reservation subsystem (was: Admission Control: inventory subsystem) Admission Control: Reservation subsystem Key: YARN-1709 URL: https://issues.apache.org/jira/browse/YARN-1709 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Subramaniam Krishnan This JIRA is about the key data structure used to track resources over time to enable YARN-1051. The Reservation subsystem is conceptually a plan of how the scheduler will allocate resources over-time. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2080) Admission Control: Integrate Reservation subsystem with ResourceManager
Subramaniam Krishnan created YARN-2080: -- Summary: Admission Control: Integrate Reservation subsystem with ResourceManager Key: YARN-2080 URL: https://issues.apache.org/jira/browse/YARN-2080 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Subramaniam Krishnan Assignee: Subramaniam Krishnan This JIRA is about the key data structure used to track resources over time to enable YARN-1051. The Reservation subsystem is conceptually a plan of how the scheduler will allocate resources over-time. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2080) Admission Control: Integrate Reservation subsystem with ResourceManager
[ https://issues.apache.org/jira/browse/YARN-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan updated YARN-2080: --- Description: This JIRA tracks the integration of Reservation subsystem data structures introduced in YARN-1709 with the YARN RM. This is essentially end2end wiring of YARN-1051. (was: This JIRA is about the key data structure used to track resources over time to enable YARN-1051. The Reservation subsystem is conceptually a plan of how the scheduler will allocate resources over-time.) Admission Control: Integrate Reservation subsystem with ResourceManager --- Key: YARN-2080 URL: https://issues.apache.org/jira/browse/YARN-2080 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Subramaniam Krishnan Assignee: Subramaniam Krishnan This JIRA tracks the integration of Reservation subsystem data structures introduced in YARN-1709 with the YARN RM. This is essentially end2end wiring of YARN-1051. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1708) Add a public API to reserve resources (part of YARN-1051)
[ https://issues.apache.org/jira/browse/YARN-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan updated YARN-1708: --- Attachment: YARN-1708.patch Add a public API to reserve resources (part of YARN-1051) - Key: YARN-1708 URL: https://issues.apache.org/jira/browse/YARN-1708 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Subramaniam Krishnan Attachments: YARN-1708.patch This JIRA tracks the definition of a new public API for YARN, which allows users to reserve resources (think of time-bounded queues). This is part of the admission control enhancement proposed in YARN-1051. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1708) Add a public API to reserve resources (part of YARN-1051)
[ https://issues.apache.org/jira/browse/YARN-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990239#comment-13990239 ] Subramaniam Krishnan commented on YARN-1708: Attaching the patch Add a public API to reserve resources (part of YARN-1051) - Key: YARN-1708 URL: https://issues.apache.org/jira/browse/YARN-1708 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Subramaniam Krishnan Attachments: YARN-1708.patch This JIRA tracks the definition of a new public API for YARN, which allows users to reserve resources (think of time-bounded queues). This is part of the admission control enhancement proposed in YARN-1051. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.
[ https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan updated YARN-1051: --- Attachment: techreport.pdf Attaching an updated Tech Report which enunciates more clearly what we intend to achieve, results from our P-o-C and also aligns with the design doc on how we propose to implement the same in YARN. YARN Admission Control/Planner: enhancing the resource allocation model with time. -- Key: YARN-1051 URL: https://issues.apache.org/jira/browse/YARN-1051 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, resourcemanager, scheduler Reporter: Carlo Curino Assignee: Carlo Curino Attachments: YARN-1051-design.pdf, curino_MSR-TR-2013-108.pdf, techreport.pdf In this umbrella JIRA we propose to extend the YARN RM to handle time explicitly, allowing users to reserve capacity over time. This is an important step towards SLAs, long-running services, workflows, and helps for gang scheduling. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-1709) Admission Control: inventory subsystem
[ https://issues.apache.org/jira/browse/YARN-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan reassigned YARN-1709: -- Assignee: Subramaniam Krishnan Admission Control: inventory subsystem -- Key: YARN-1709 URL: https://issues.apache.org/jira/browse/YARN-1709 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Subramaniam Krishnan This JIRA is about the key data structure used to track resources over time to enable YARN-1051. The inventory subsystem is conceptually a plan of how the capacity scheduler will be configured over-time. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (YARN-1710) Admission Control: agents to allocate reservation
[ https://issues.apache.org/jira/browse/YARN-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan reassigned YARN-1710: -- Assignee: Subramaniam Krishnan Admission Control: agents to allocate reservation - Key: YARN-1710 URL: https://issues.apache.org/jira/browse/YARN-1710 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Subramaniam Krishnan This JIRA tracks the algorithms used to allocate a user ReservationRequest coming in from the new reservation API (YARN-1708), in the inventory subsystem (YARN-1709) maintaining the current plan for the cluster. The focus of these agents is to quickly find a solution that satisfies both the constraints provided by the user and the physical constraints of the plan. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
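The YARN-1710 description above does not say which algorithms the agents use. Purely as an illustration of "quickly finding a solution" under user constraints and plan constraints, the following Python sketch does a greedy first-fit placement. The function name `place_earliest`, its parameters, and the model of the plan as a list of free capacity per time step are assumptions for this example, not part of the YARN API.

```python
def place_earliest(free, arrival, deadline, duration, amount):
    """Greedy first-fit placement (illustrative, not the YARN-1710 agent).

    free     -- list of free capacity per time step (the plan's constraint)
    arrival  -- earliest step the job may start (user constraint)
    deadline -- step by which the job must have finished (user constraint)
    duration -- number of consecutive steps needed
    amount   -- capacity needed during each of those steps

    Returns the chosen start step and deducts the capacity from `free`,
    or returns None and leaves `free` unchanged if nothing fits.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    last_start = min(deadline, len(free)) - duration
    for start in range(arrival, last_start + 1):
        window = free[start:start + duration]
        if all(capacity >= amount for capacity in window):
            for t in range(start, start + duration):
                free[t] -= amount
            return start
    return None


free = [4, 4, 4, 4, 4, 10, 10, 10]
a = place_earliest(free, arrival=0, deadline=8, duration=3, amount=5)
b = place_earliest(free, arrival=0, deadline=8, duration=2, amount=5)
c = place_earliest(free, arrival=0, deadline=4, duration=2, amount=5)
```

First-fit is chosen here only because it is simple and fast. It takes the earliest feasible window and makes no attempt to leave room for later requests.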
[jira] [Updated] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.
[ https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan updated YARN-1051: --- Attachment: YARN-1051-design.pdf Attaching the approach doc that describes the overall intent for interested readers. The doc also lists the breakdown into incremental sub-tasks. Any suggestions/thoughts are welcome; we will incorporate feedback as it comes in. YARN Admission Control/Planner: enhancing the resource allocation model with time. -- Key: YARN-1051 URL: https://issues.apache.org/jira/browse/YARN-1051 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, resourcemanager, scheduler Reporter: Carlo Curino Assignee: Carlo Curino Attachments: YARN-1051-design.pdf, curino_MSR-TR-2013-108.pdf In this umbrella JIRA we propose to extend the YARN RM to handle time explicitly, allowing users to reserve capacity over time. This is an important step towards SLAs, long-running services, workflows, and helps for gang scheduling. -- This message was sent by Atlassian JIRA (v6.1.5#6160)