[jira] [Updated] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-569: --- Attachment: YARN-569.3.patch

CapacityScheduler: support for preemption (using a capacity monitor)
Key: YARN-569
URL: https://issues.apache.org/jira/browse/YARN-569
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacityscheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
Attachments: 3queues.pdf, CapScheduler_with_preemption.pdf, preemption.2.patch, YARN-569.1.patch, YARN-569.2.patch, YARN-569.3.patch, YARN-569.patch, YARN-569.patch

There is a tension between the fast-paced, reactive role of the CapacityScheduler, which needs to respond quickly to application resource requests and node updates, and the more introspective, time-based considerations needed to observe and correct capacity imbalance. For this purpose, rather than modifying the delicate mechanisms of the CapacityScheduler directly, we opted to add support for preemption by means of a Capacity Monitor, which can optionally be run as a separate service (much like the NMLivelinessMonitor). The capacity monitor (similar to equivalent functionality in the fair scheduler) runs at intervals (e.g., every 3 seconds), observes the state of the assignment of resources to queues from the capacity scheduler, performs off-line computation to determine whether preemption is needed and how best to edit the current schedule to improve capacity, and generates events that produce four possible actions:
# Container de-reservations
# Resource-based preemptions
# Container-based preemptions
# Container killing
The actions listed above are progressively more costly, and it is up to the policy to use them as desired to achieve the rebalancing goals.
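The observe/compute/emit loop described above can be sketched as follows. This is purely illustrative Python, not YARN code; the names (`MonitorAction`, `run_monitor`, the callback signatures) are our own inventions for the sketch, and the real monitor dispatches events into the RM's event machinery rather than calling a function.

```python
import enum

class MonitorAction(enum.Enum):
    """The four possible actions, ordered from least to most costly."""
    DERESERVE_CONTAINER = 1
    PREEMPT_RESOURCES = 2
    PREEMPT_CONTAINER = 3
    KILL_CONTAINER = 4

def run_monitor(policy, snapshot_fn, apply_fn, rounds, interval_s=3.0, sleep_fn=None):
    """Invoke `policy` every `interval_s` seconds for `rounds` iterations.

    policy(snapshot) -> list of (MonitorAction, target) events. The monitor
    only observes and emits events; enforcement is left to `apply_fn`.
    """
    for _ in range(rounds):
        snapshot = snapshot_fn()         # observe queue/resource assignment
        for action, target in policy(snapshot):
            apply_fn(action, target)     # e.g., dispatch an event to the scheduler
        if sleep_fn is not None:
            sleep_fn(interval_s)         # e.g., time.sleep in a real service
```

The key design point this illustrates is that the monitor is read-mostly: it takes a snapshot, reasons off-line, and communicates only through events, which keeps it out of the scheduler's fast path.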
Note that due to the lag in the effect of these actions, the policy should operate at the macroscopic level (e.g., preempt tens of containers from a queue) and not try to tightly and continuously micromanage container allocations.

- Preemption policy (ProportionalCapacityPreemptionPolicy): -

Preemption policies are pluggable by design; in the following we present an initial policy (ProportionalCapacityPreemptionPolicy) we have been experimenting with. The ProportionalCapacityPreemptionPolicy behaves as follows:
# it gathers from the scheduler the state of the queues, in particular their current capacity, guaranteed capacity, and pending requests (*)
# if there are pending requests from queues that are under capacity, it computes a new ideal balanced state (**)
# it computes the set of preemptions needed to repair the current schedule and achieve capacity balance (accounting for natural completion rates, and respecting bounds on the amount of preemption we allow for each round)
# it selects which applications to preempt from each over-capacity queue (the last one in FIFO order)
# it removes reservations from the most recently assigned application until the amount of resources to reclaim is obtained, or until no more reservations exist
# (if not enough) it issues preemptions for containers from the same application (in reverse chronological order, last assigned container first), again until the target is met or until no containers except the AM container are left
# (if not enough) it moves on to unreserve and preempt from the next application
# containers that have been asked to preempt are tracked across executions; if a container remains on the to-be-preempted list for more than a certain time, it is moved to the list of containers to be forcibly killed.
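Steps 4-7 of the selection order above can be sketched as follows. This is illustrative Python with hypothetical data shapes (plain dicts with a scalar `mem`); the real policy operates on scheduler objects and multi-dimensional Resource arithmetic.

```python
def select_preemptions(apps_fifo, to_reclaim):
    """Choose de-reservations and preemptions to cover `to_reclaim` memory.

    apps_fifo: applications of one over-capacity queue, oldest first; each is
    {"reservations": [...], "containers": [...]} in assignment order, with the
    AM container first in "containers". Walks apps newest-first, and within an
    app drops reservations newest-first, then preempts containers newest-first,
    always sparing the AM container.
    """
    dereserve, preempt = [], []
    remaining = to_reclaim
    for app in reversed(apps_fifo):               # last application in FIFO order first
        for r in reversed(app["reservations"]):   # most recently assigned first
            if remaining <= 0:
                return dereserve, preempt
            dereserve.append(r["id"])
            remaining -= r["mem"]
        for c in reversed(app["containers"][1:]): # skip the AM container at index 0
            if remaining <= 0:
                return dereserve, preempt
            preempt.append(c["id"])
            remaining -= c["mem"]
    return dereserve, preempt
```

Note how de-reservations are always exhausted before any container of the same application is preempted, matching the "progressively more costly" ordering of actions.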
Notes:
(*) At the moment, in order to avoid double-counting of requests, we only look at the ANY part of pending resource requests, which means we might not preempt on behalf of AMs that ask only for specific locations but not ANY.
(**) The ideal balanced state is one in which each queue has at least its guaranteed capacity, and the spare capacity is distributed among queues (that want some) as a weighted fair share, where the weighting is based on the guaranteed capacity of a queue and the computation runs to a fixed point.

Tunables of the ProportionalCapacityPreemptionPolicy:
# observe-only mode (i.e., log the actions it would take, but behave as read-only)
# how frequently to run the policy
# how long to wait between preemption and kill of a container
# which fraction of the containers I would like to obtain I should preempt (has to do with the natural rate at which containers are returned)
# deadzone size, i.e., what % of over-capacity should be ignored (if we are off perfect balance by some small % we ignore it)
# overall amount of preemption allowed per round
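The fixed-point computation in (**) can be sketched as follows, in illustrative Python over scalar capacities (the real policy works on Resource vectors; the function name and signature are ours, not the patch's):

```python
def ideal_assignment(guaranteed, demand, total):
    """Ideal balanced state: each queue first gets up to its guarantee, then
    spare capacity is split among still-hungry queues in proportion to their
    guarantees, iterating until a fixed point (spare exhausted, or no queue
    still wants more)."""
    ideal = [min(g, d) for g, d in zip(guaranteed, demand)]
    spare = total - sum(ideal)
    while spare > 1e-9:
        hungry = [i for i, x in enumerate(ideal) if demand[i] - x > 1e-9]
        if not hungry:
            break                       # no queue wants more: spare stays idle
        weight = sum(guaranteed[i] for i in hungry)
        gave = 0.0
        for i in hungry:
            # weighted fair share of the spare, capped by remaining demand
            frac = guaranteed[i] / weight if weight > 0 else 1.0 / len(hungry)
            give = min(spare * frac, demand[i] - ideal[i])
            ideal[i] += give
            gave += give
        if gave <= 1e-9:
            break
        spare -= gave
    return ideal
```

Each pass may leave some spare behind (when a hungry queue's share exceeds its remaining demand), which is why the loop re-distributes until it reaches a fixed point rather than splitting once.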
[jira] [Updated] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-569: --- Attachment: YARN-569.4.patch Rebase after YARN-635, YARN-735, YARN-748, YARN-749. Fixed findbugs warnings.
[jira] [Updated] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-569: --- Attachment: YARN-569.6.patch
[jira] [Commented] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682571#comment-13682571 ] Chris Douglas commented on YARN-569: Thanks for the feedback; we revised the patch. We comment below on questions that required explanation, while all the small ones are addressed directly in the code following your suggestions.

bq. This doesnt seem to affect the fair scheduler or does it? If not, then it can be misleading for users.
bq. How do we envisage multiple policies working together without stepping on each other? Better off limiting to 1?

The intent was for orthogonal policies to interact with the scheduler, or, if conflicting, be coordinated by a composite policy. Though you're right, the naming toward preemption is confusing; the patch renames the properties to refer to monitors only. Because the only example is the {{ProportionalCapacityPreemptionPolicy}}, {{null}} seemed like the correct default. As for limiting to one monitor: we are experimenting with other policies that focus on different aspects of the schedule (e.g., deadlines and automatic tuning of queue capacity), and it seems possible for them to play nice with other policies (e.g., ProportionalCapacityPreemptionPolicy), so we would prefer the mechanism to remain capable of loading multiple monitors.

bq. Not joining the thread to make sure its cleaned up?

The contract for shutting down a monitor is not baked into the API yet. While the proportional policy runs quickly, it's not obvious whether other policies would be both long running and responsive to interrupts. By way of illustration, other monitors we've experimented with call into third-party code for CPU-intensive calculation. Since YARN-117 went in a few hours ago, this might be a chance to define that more crisply. Thoughts?

bq. Why no lock here when the other new methods have a lock? Do we not care that the app remains in applications during the duration of the operations?
The semantics of the {{\@Lock}} annotation were not entirely clear from the examples in the code, so it's possible the inconsistency is our application of it. We're probably making the situation worse, so we omitted the annotations in the updated patch. To answer your question: we don't care, because the selected container already exited (part of the natural termination factor in the policy).

bq. There is one critical difference between old and new behavior. The new code will not send the finish event to the container if its not part of the liveContainers. This probably is wrong.
bq. FicaSchedulerNode.unreserveResource(). Checks have been added for the reserved container but will the code reach that point if there was no reservation actually left on that node? In the same vein, can it happen that the node has a new reservation that was made out of band of the preemption logic cycle. Hence, the reserved container on the node would exist but could be from a different application.

Good catch, these are related. The change to boolean was necessary because we're calling the {{unreserve}} logic from a new context. Since only one application can have a single reservation on a node, and because we're freeing it through that application, we won't accidentally free another application's reservation. However, calling {{unreserve}} on a reservation that converted to a container will fail, so we need to know whether the state changed before updating the metric.

bq. Couldnt quite grok this. What is delta? What is 0.5? A percentage? Whats the math behind the calculation? Should it be even absent preemption instead of even absent natural termination? Is this applied before or after TOTAL_PREEMPTION_PER_ROUND?

The delta is the difference between the computed ideal capacity and the actual. A value of 0.5 would preempt only 50% of the containers the policy thinks should be preempted, as the rest are expected to exit naturally.
The comment is saying that, even without any containers exiting on their own, the policy will geometrically push capacity into the deadzone: at 50% per round, within 5 rounds the policy will be inside a 5% deadzone of the ideal capacity. It's applied before the total preemption per round; the latter proportionally affects all preemption targets. Because some containers will complete while the policy runs, it may make sense to tune it aggressively (or adjust it using observed completion rates), but we'll want to get some experience running with this.
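The geometric convergence claim is easy to check numerically. A small illustrative Python helper (our own, not part of the patch), assuming a fixed fraction of the remaining gap is preempted each round:

```python
def rounds_to_deadzone(preempt_fraction, deadzone):
    """Rounds until the remaining gap, shrinking by a factor of
    (1 - preempt_fraction) per round, falls within the deadzone (both
    expressed as fractions of the original over-capacity gap)."""
    assert 0 < preempt_fraction < 1 and 0 < deadzone < 1
    residual, rounds = 1.0, 0
    while residual > deadzone:
        residual *= 1.0 - preempt_fraction   # half the gap survives each round at 0.5
        rounds += 1
    return rounds, residual
```

With the 0.5 factor discussed above, the residual after 5 rounds is 0.5^5 = 3.125% of the original gap, inside a 5% deadzone, matching the comment.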
[jira] [Updated] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-569: --- Attachment: YARN-569.8.patch
[jira] [Commented] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13687515#comment-13687515 ] Chris Douglas commented on YARN-569: Updated patch, rebased on YARN-117, etc. On configuration, we didn't include the knobs for the proportional policy, but left it as a default with a warning to look at the config for the policy. Does that seem reasonable? We can add a section on it as part of YARN-650.

bq. We are setting values on the allocateresponse after replacing lastResponse in the responseMap. This entire section is guarded by the lastResponse value obtained from this map (questionable effectiveness perhaps but orthogonal). So we should probably be setting everything in the new response (the preemption stuff) before the new response replaces the lastResponse in the responseMap.

You're saying the block updating the {{responseMap}} probably belongs just before the return? That makes sense, though I haven't traced it explicitly.
[jira] [Updated] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-569: --- Attachment: YARN-569.9.patch

bq. One other thing to check would be if the preemption policy will use refreshed values when the capacity scheduler config is refreshed on the fly. Looks like cloneQueues() will take the absolute used and guaranteed numbers on every clone. So we should be good wrt that. Would be good to check other values the policy looks at.

*nod* Right now, the policy rebuilds its view of the scheduler at every pass, but it doesn't refresh its own config parameters.

bq. Noticed formatting issues with spaces in the patch. eg. cloneQueues()

Did another pass over the patch, fixed up spacing and formatting, and removed obvious whitespace changes. Sorry, did a few of these already, but missed a few. Also moved the check in the {{ApplicationMasterService}} as part of this patch.
[jira] [Updated] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-569: --- Attachment: YARN-569.10.patch
(**) The ideal balanced state is one in which each queue has at least its guaranteed capacity, and the spare capacity is distributed among the queues (that want some) as a weighted fair share, where the weighting is based on the guaranteed capacity of each queue; the computation runs to a fixed point.
Tunables of the ProportionalCapacityPreemptionPolicy:
# observe-only mode (i.e., log the actions it would take, but behave as read-only)
# how frequently to run the policy
# how long to wait between the preemption and the kill of a container
# what fraction of the containers I would like to obtain I should actually preempt (this has to do with the natural rate at which containers are returned)
# deadzone size, i.e., what % of over-capacity to ignore before preempting
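For reference, these tunables surfaced in Hadoop as configuration properties along the following lines (property names as documented for the CapacityScheduler preemption monitor in released versions; the values shown are illustrative defaults, not recommendations):

```xml
<!-- Enable the scheduling monitor and select the preemption policy -->
<property>
  <name>yarn.resourcemanager.scheduler.monitor.enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.monitor.policies</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy</value>
</property>
<!-- How frequently to run the policy (ms) -->
<property>
  <name>yarn.resourcemanager.monitor.capacity.preemption.monitoring_interval</name>
  <value>3000</value>
</property>
<!-- How long to wait between asking a container to preempt and killing it (ms) -->
<property>
  <name>yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill</name>
  <value>15000</value>
</property>
<!-- Bound on the fraction of resources preempted per round -->
<property>
  <name>yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round</name>
  <value>0.1</value>
</property>
<!-- Deadzone: fraction of over-capacity ignored before preempting -->
<property>
  <name>yarn.resourcemanager.monitor.capacity.preemption.max_ignored_over_capacity</name>
  <value>0.1</value>
</property>
```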
[jira] [Commented] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13692572#comment-13692572 ] Chris Douglas commented on YARN-569: {{TestAMAuthorization}} also fails on trunk, YARN-878
[jira] [Updated] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-569: --- Attachment: YARN-569.11.patch Rebase.
[jira] [Updated] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-569: --- Fix Version/s: 2.1.0-beta
[jira] [Commented] (YARN-1184) ClassCastException is thrown during preemption When a huge job is submitted to a queue B whose resources is used by a job in queueA
[ https://issues.apache.org/jira/browse/YARN-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769067#comment-13769067 ] Chris Douglas commented on YARN-1184: - I committed this. Thanks Bikas for the review.
ClassCastException is thrown during preemption When a huge job is submitted to a queue B whose resources is used by a job in queueA --- Key: YARN-1184 URL: https://issues.apache.org/jira/browse/YARN-1184 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Affects Versions: 2.1.0-beta Reporter: J.Andreina Assignee: Chris Douglas Fix For: 2.1.1-beta Attachments: Y1184-0.patch, Y1184-1.patch
Preemption is enabled. Queues = a, b; a capacity = 30%, b capacity = 70%.
Step 1: Assign a big job to queue a (so that job_a will utilize some resources from queue b).
Step 2: Assign a big job to queue b.
The following exception is thrown at the ResourceManager:
{noformat} 2013-09-12 10:42:32,535 ERROR [SchedulingMonitor (ProportionalCapacityPreemptionPolicy)] yarn.YarnUncaughtExceptionHandler (YarnUncaughtExceptionHandler.java:uncaughtException(68)) - Thread Thread[SchedulingMonitor (ProportionalCapacityPreemptionPolicy),5,main] threw an Exception.
java.lang.ClassCastException: java.util.Collections$UnmodifiableSet cannot be cast to java.util.NavigableSet at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.getContainersToPreempt(ProportionalCapacityPreemptionPolicy.java:403) at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(ProportionalCapacityPreemptionPolicy.java:202) at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.editSchedule(ProportionalCapacityPreemptionPolicy.java:173) at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.invokePolicy(SchedulingMonitor.java:72) at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor$PreemptionChecker.run(SchedulingMonitor.java:82) at java.lang.Thread.run(Thread.java:662) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
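The root cause in the trace is generic Java behavior, not YARN-specific: {{Collections.unmodifiableSet}} returns a wrapper that implements only {{Set}}, so casting it back to {{NavigableSet}} fails at runtime. A minimal reproduction (not the YARN code itself), including the type-preserving wrapper available since Java 8:

```java
import java.util.Collections;
import java.util.NavigableSet;
import java.util.Set;
import java.util.TreeSet;

public class UnmodifiableCastRepro {
    public static void main(String[] args) {
        NavigableSet<Integer> live = new TreeSet<>();
        live.add(1); live.add(2); live.add(3);

        // Collections$UnmodifiableSet implements Set but NOT NavigableSet,
        // so this cast throws the ClassCastException seen in the trace above.
        Set<Integer> wrapped = Collections.unmodifiableSet(live);
        boolean threw = false;
        try {
            NavigableSet<Integer> oops = (NavigableSet<Integer>) wrapped;
            oops.last();
        } catch (ClassCastException e) {
            threw = true;
        }
        System.out.println("cast threw: " + threw);  // cast threw: true

        // Since Java 8, a type-preserving wrapper avoids the cast entirely.
        NavigableSet<Integer> safe = Collections.unmodifiableNavigableSet(live);
        System.out.println("last = " + safe.last()); // last = 3
    }
}
```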
[jira] [Updated] (YARN-1184) ClassCastException is thrown during preemption When a huge job is submitted to a queue B whose resources is used by a job in queueA
[ https://issues.apache.org/jira/browse/YARN-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-1184: Attachment: Y1184-1.patch
[jira] [Commented] (YARN-2297) Preemption can hang in corner case by not allowing any task container to proceed.
[ https://issues.apache.org/jira/browse/YARN-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063086#comment-14063086 ] Chris Douglas commented on YARN-2297: - Are there realistic configurations where this creates a problem? If a queue is configured with less than a container's capacity, what is the intent?
Preemption can hang in corner case by not allowing any task container to proceed. Key: YARN-2297 URL: https://issues.apache.org/jira/browse/YARN-2297 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Affects Versions: 2.5.0 Reporter: Tassapol Athiapinya Assignee: Wangda Tan Priority: Critical
Preemption can cause a hang in a single-node cluster: only the AMs run; no task container can run.
h3. queue configuration
Queues A/B have 1% and 99% capacity, respectively. No max capacity.
h3. scenario
Turn on preemption. Configure 1 NM with 4 GB of memory. Use only 2 apps and 1 user. Submit app 1 to queue A: its AM needs 2 GB, and it has 1 task that needs 2 GB, occupying the entire cluster. Submit app 2 to queue B: its AM needs 2 GB, and it has 3 tasks that need 2 GB each. Instead of app 1 being preempted entirely, app 1's AM will stay, app 2's AM will launch, and no task of either app can proceed.
h3. commands
/usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar randomtextwriter -Dmapreduce.map.memory.mb=2000 -Dyarn.app.mapreduce.am.command-opts=-Xmx1800M -Dmapreduce.randomtextwriter.bytespermap=2147483648 -Dmapreduce.job.queuename=A -Dmapreduce.map.maxattempts=100 -Dmapreduce.am.max-attempts=1 -Dyarn.app.mapreduce.am.resource.mb=2000 -Dmapreduce.map.java.opts=-Xmx1800M -Dmapreduce.randomtextwriter.mapsperhost=1 -Dmapreduce.randomtextwriter.totalbytes=2147483648 dir1
/usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar sleep -Dmapreduce.map.memory.mb=2000 -Dyarn.app.mapreduce.am.command-opts=-Xmx1800M -Dmapreduce.job.queuename=B -Dmapreduce.map.maxattempts=100 -Dmapreduce.am.max-attempts=1 -Dyarn.app.mapreduce.am.resource.mb=2000 -Dmapreduce.map.java.opts=-Xmx1800M -m 1 -r 0 -mt 4000 -rt 0
-- This message was sent by Atlassian JIRA (v6.2#6252)
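To see why the 1%/99% split above stalls, the numbers from the report (4 GB node, 2 GB containers) make queue A's guarantee about 41 MB, roughly 50x smaller than any single allocation, so A is wildly over capacity the moment anything runs in it. A quick back-of-envelope check (values taken from the scenario; the "over-capacity factor" is illustrative arithmetic, not a YARN metric):

```java
public class TinyQueueMath {
    public static void main(String[] args) {
        int clusterMb = 4096;          // one NM with 4 GB
        double queueACapacity = 0.01;  // queue A guaranteed 1%
        int containerMb = 2048;        // AM and tasks each need 2 GB

        double guaranteedMb = clusterMb * queueACapacity;
        System.out.printf("guaranteed=%.0fMB, container=%dMB%n", guaranteedMb, containerMb);
        // One container puts queue A roughly 50x over its guarantee, so the
        // policy will always consider its containers preemptable.
        System.out.println("over-capacity factor ~ " + Math.round(containerMb / guaranteedMb));
    }
}
```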
[jira] [Commented] (YARN-2297) Preemption can hang in corner case by not allowing any task container to proceed.
[ https://issues.apache.org/jira/browse/YARN-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063115#comment-14063115 ] Chris Douglas commented on YARN-2297: - I'll try asking the question differently. Does this occur when the absolute guaranteed capacity of a queue is smaller than the minimum container size? If so, then what is the operator expressing with that configuration?
[jira] [Commented] (YARN-2297) Preemption can hang in corner case by not allowing any task container to proceed.
[ https://issues.apache.org/jira/browse/YARN-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063232#comment-14063232 ] Chris Douglas commented on YARN-2297: - The parameter defining the deadzone around the computed ideal [1] flattens out that jitter. When the guaranteed capacity for the queue is so vanishingly small that the deadzone is smaller than a single container allocation, then the deadzone (and guaranteed queue capacity) is effectively zero. [1] {{yarn.resourcemanager.monitor.capacity.preemption.max_ignored_over_capacity}}
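The comment above can be made concrete with a hedged sketch of the deadzone test. The names and the 10% value are illustrative, not the exact YARN formula: the idea is that usage is tolerated up to ideal * (1 + max_ignored_over_capacity), and for a vanishingly small queue that band is smaller than one container, so it absorbs nothing.

```java
public class DeadzoneSketch {
    // Tolerate usage up to ideal * (1 + deadzone) before acting; a simplified
    // stand-in for the max_ignored_over_capacity check discussed above.
    static boolean shouldPreempt(double usedMb, double idealMb, double deadzone) {
        return usedMb > idealMb * (1.0 + deadzone);
    }

    public static void main(String[] args) {
        double deadzone = 0.1;  // 10%, in the spirit of max_ignored_over_capacity
        // A 10 GB queue: the band tolerates ~1 GB of jitter, so small
        // fluctuations around the ideal never trigger preemption.
        System.out.println(shouldPreempt(10_500, 10_240, deadzone)); // false
        // A 41 MB queue (1% of 4 GB): the band is ~4 MB, far below any
        // container, so one 2 GB container immediately triggers preemption.
        System.out.println(shouldPreempt(2_048, 41, deadzone));      // true
    }
}
```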
[jira] [Updated] (YARN-2297) Preemption can hang when configured ridiculously
[ https://issues.apache.org/jira/browse/YARN-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-2297: Summary: Preemption can hang when configured ridiculously (was: Preemption can hang in corner case by not allowing any task container to proceed.)
commands
/usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar randomtextwriter -Dmapreduce.map.memory.mb=2000 -Dyarn.app.mapreduce.am.command-opts=-Xmx1800M -Dmapreduce.randomtextwriter.bytespermap=2147483648 -Dmapreduce.job.queuename=A -Dmapreduce.map.maxattempts=100 -Dmapreduce.am.max-attempts=1 -Dyarn.app.mapreduce.am.resource.mb=2000 -Dmapreduce.map.java.opts=-Xmx1800M -Dmapreduce.randomtextwriter.mapsperhost=1 -Dmapreduce.randomtextwriter.totalbytes=2147483648 dir1
/usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar sleep -Dmapreduce.map.memory.mb=2000 -Dyarn.app.mapreduce.am.command-opts=-Xmx1800M -Dmapreduce.job.queuename=B -Dmapreduce.map.maxattempts=100 -Dmapreduce.am.max-attempts=1 -Dyarn.app.mapreduce.am.resource.mb=2000 -Dmapreduce.map.java.opts=-Xmx1800M -m 1 -r 0 -mt 4000 -rt 0
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2297) Preemption can prevent progress in small queues
[ https://issues.apache.org/jira/browse/YARN-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-2297: Summary: Preemption can prevent progress in small queues (was: Preemption can hang when configured ridiculously) Preemption can prevent progress in small queues --- Key: YARN-2297 URL: https://issues.apache.org/jira/browse/YARN-2297 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Affects Versions: 2.5.0 Reporter: Tassapol Athiapinya Assignee: Wangda Tan Priority: Critical Preemption can cause a hang in a single-node cluster: only AMs run, and no task container can run. h3. queue configuration Queues A and B have 1% and 99% capacity, respectively. No max capacity. h3. scenario Turn on preemption. Configure 1 NM with 4 GB of memory. Use only 2 apps and 1 user. Submit app 1 to queue A; its AM needs 2 GB, and its 1 task needs 2 GB, occupying the entire cluster. Submit app 2 to queue B; its AM needs 2 GB, and it has 3 tasks that need 2 GB each. Instead of app 1 being preempted entirely, app 1's AM stays and app 2's AM launches, but no task of either app can proceed. h3. 
commands
/usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar randomtextwriter -Dmapreduce.map.memory.mb=2000 -Dyarn.app.mapreduce.am.command-opts=-Xmx1800M -Dmapreduce.randomtextwriter.bytespermap=2147483648 -Dmapreduce.job.queuename=A -Dmapreduce.map.maxattempts=100 -Dmapreduce.am.max-attempts=1 -Dyarn.app.mapreduce.am.resource.mb=2000 -Dmapreduce.map.java.opts=-Xmx1800M -Dmapreduce.randomtextwriter.mapsperhost=1 -Dmapreduce.randomtextwriter.totalbytes=2147483648 dir1
/usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar sleep -Dmapreduce.map.memory.mb=2000 -Dyarn.app.mapreduce.am.command-opts=-Xmx1800M -Dmapreduce.job.queuename=B -Dmapreduce.map.maxattempts=100 -Dmapreduce.am.max-attempts=1 -Dyarn.app.mapreduce.am.resource.mb=2000 -Dmapreduce.map.java.opts=-Xmx1800M -m 1 -r 0 -mt 4000 -rt 0
-- This message was sent by Atlassian JIRA (v6.2#6252)
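"Turn on preemption" in the scenarios above refers to enabling the capacity-scheduler monitor described in YARN-569. A typical yarn-site.xml fragment looks like the following (property names are from the Hadoop 2.x CapacityScheduler documentation; verify them against the version in use):

```xml
<!-- Enable the out-of-band scheduling monitor that runs the preemption policy -->
<property>
  <name>yarn.resourcemanager.scheduler.monitor.enable</name>
  <value>true</value>
</property>
<!-- The policy the monitor runs on each interval -->
<property>
  <name>yarn.resourcemanager.scheduler.monitor.policies</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy</value>
</property>
```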
[jira] [Updated] (YARN-2424) LCE should support non-cgroups, non-secure mode
[ https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-2424: Attachment: Y2424-1.patch Added a version with a log statement that warns on startup. [~tucu00], is this sufficient? The config docs are pretty clear about the effect of setting the parameter, and this should be noticed if someone is experimenting with LCE. LCE should support non-cgroups, non-secure mode --- Key: YARN-2424 URL: https://issues.apache.org/jira/browse/YARN-2424 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1 Reporter: Allen Wittenauer Priority: Blocker Attachments: Y2424-1.patch, YARN-2424.patch After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios. This is a fairly serious regression, as turning on LCE prior to turning on full-blown security is a fairly standard procedure. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2424) LCE should support non-cgroups, non-secure mode
[ https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-2424: Assignee: Allen Wittenauer (was: Chris Douglas) LCE should support non-cgroups, non-secure mode --- Key: YARN-2424 URL: https://issues.apache.org/jira/browse/YARN-2424 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Priority: Blocker Attachments: Y2424-1.patch, YARN-2424.patch After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios. This is a fairly serious regression, as turning on LCE prior to turning on full-blown security is a fairly standard procedure. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2470) A high value for yarn.nodemanager.delete.debug-delay-sec causes Nodemanager to crash. Slider needs this value to be high. Setting a very high value throws an exception a
[ https://issues.apache.org/jira/browse/YARN-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115581#comment-14115581 ] Chris Douglas commented on YARN-2470: - Failing to start is the correct behavior; that timeout is not valid. Is your intent to disable cleanup entirely? A high value for yarn.nodemanager.delete.debug-delay-sec causes Nodemanager to crash. Slider needs this value to be high. Setting a very high value throws an exception and nodemanager does not start -- Key: YARN-2470 URL: https://issues.apache.org/jira/browse/YARN-2470 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.4.1 Reporter: Shivaji Dutta Priority: Minor -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1709) Admission Control: Reservation subsystem
[ https://issues.apache.org/jira/browse/YARN-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123885#comment-14123885 ] Chris Douglas commented on YARN-1709: - Overall, the patch lgtm. Just a few minor tweaks, then I'm +1 * very minor: Javadoc could be compressed a bit (empty lines) {{InMemoryPlan}} * The {{ZERO_RESOURCE}} instance escapes via {{getConsumptionForUser}} * Some lines are more than 80 characters * The logging can use built-in substitution more efficiently. Instead of: {code} String errMsg = MessageFormat.format(
    "The specified Reservation with ID {0} does not exist in the plan",
    reservation.getReservationId());
LOG.error(errMsg); {code} Prefer: {code} LOG.error("The specified Reservation with ID {} does not exist in the plan",
    reservation.getReservationId()); {code} Some of the code already uses this construction, but a few places still use {{MessageFormat}}. * This form is harder to read: {code} InMemoryReservationAllocation inMemReservation = null;
if (reservation instanceof InMemoryReservationAllocation) {
  inMemReservation = (InMemoryReservationAllocation) reservation;
} else {
  // [snip] log error
  throw new RuntimeException(errMsg);
} {code} than the {{if (error) { throw; }}} construction used in the other checks. Is it an improvement over {{ClassCastException}}? * {{addReservation}} doesn't need to hold the write lock while it checks invariants on its arguments * The private methods that assume locks ({{incrementAllocation}}, {{decrementAllocation}}, {{removeReservation}}, etc.) 
are held should probably {{assert}} that precondition (e.g., {{RRWL::isWriteLockedByCurrentThread()}}) * {{getMinimumAllocation}} and {{getMaximumAllocation}} return mutable data that should probably be cloned {{InMemoryReservationAllocation}} * minor style: redundant {{this}} in get methods * {{toString}} should use {{StringBuilder}} instead of {{StringBuffer}} {{PlanView}} * Mismatched javadoc on {{getEarliestStartTime}} * {{getLastEndTime}} specifies UTC. Is that enforced in the implementation? {{ReservationInterval}} * Can this be made immutable? It's a key in several maps {{RLESparseResourceAllocation}} * As in some methods of {{InMemoryPlan}}, the {{ZERO_RESOURCE}} internal variable can escape via {{getCapacityAtTime}}. Admission Control: Reservation subsystem Key: YARN-1709 URL: https://issues.apache.org/jira/browse/YARN-1709 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Subramaniam Krishnan Attachments: YARN-1709.patch, YARN-1709.patch, YARN-1709.patch This JIRA is about the key data structure used to track resources over time to enable YARN-1051. The Reservation subsystem is conceptually a plan of how the scheduler will allocate resources over-time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
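The lock-precondition suggestion above (assert that the caller holds the write lock in private mutators) can be sketched as follows; class and method names are illustrative, not taken from the patch:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockAssertDemo {
    private final ReentrantReadWriteLock readWriteLock =
        new ReentrantReadWriteLock();
    private int allocated = 0;

    // Private method that assumes the write lock is held; the assert
    // documents the precondition and (when run with -ea) enforces it,
    // while being elided entirely in production runs.
    private void incrementAllocation() {
        assert readWriteLock.isWriteLockedByCurrentThread()
            : "caller must hold the write lock";
        allocated++;
    }

    public int addReservation() {
        readWriteLock.writeLock().lock();
        try {
            incrementAllocation();
            return allocated;
        } finally {
            readWriteLock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        LockAssertDemo plan = new LockAssertDemo();
        System.out.println(plan.addReservation()); // prints 1
    }
}
```

Unlike silently returning on a violated precondition, the assert makes tests fail loudly if a future change calls the mutator without the lock.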
[jira] [Commented] (YARN-1710) Admission Control: agents to allocate reservation
[ https://issues.apache.org/jira/browse/YARN-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129038#comment-14129038 ] Chris Douglas commented on YARN-1710: - {{GreedyReservationAgent}} * Consider {{@link}} for {{ReservationRequest}} in class javadoc * An inline comment could replace the {{adjustContract()}} method * Most of the javadoc on private methods can be cut * {{currentReservationStage}} does not need to be declared outside the loop * {{allocations}} cannot be null * An internal {{Resource(0, 0)}} could be reused * {{li}} should be part of the loop ({{for}} not {{while}}). Its initialization is unreadable; please use temp vars. * Generally, embedded calls are difficult to read: {code} if (findEarliestTime(allocations.keySet()) > earliestStart) {
  allocations.put(new ReservationInterval(earliestStart,
      findEarliestTime(allocations.keySet())),
      ReservationRequest.newInstance(Resource.newInstance(0, 0), 0));
  // consider adding trailing zeros at the end for symmetry
} {code} Assuming the {{ReservationRequest}} is never modified by the plan: {code} private final ReservationRequest ZERO_RSRC =
    ReservationRequest.newInstance(Resource.newInstance(0, 0), 0);
// ...
long allocStart = findEarliestTime(allocations.keySet());
if (allocStart > earliestStart) {
  ReservationInterval preAlloc =
      new ReservationInterval(earliestStart, allocStart);
  allocations.put(preAlloc, ZERO_RSRC);
} {code} * {{findEarliestTime(allocations.keySet())}} is called several times and should be memoized ** Would a {{TreeSet}} be more appropriate, given this access pattern? 
* Instead of: {code} boolean result = false;
if (oldReservation != null) {
  result = plan.updateReservation(capReservation);
} else {
  result = plan.addReservation(capReservation);
}
return result; {code} Consider: {code} if (oldReservation != null) {
  return plan.updateReservation(capReservation);
}
return plan.addReservation(capReservation); {code} * A comment unpacking the arithmetic for calculating {{curMaxGang}} would help readability {{TestGreedyReservationAgent}} * Instead of fixing the seed, consider setting and logging it for each run. * {{testStress}} is brittle, as it verifies only the timeout; {{testBig}} and {{testSmall}} don't verify anything. Both tests are useful, but probably not as part of the build. Dropping the annotation and adding a {{main()}} that calls each of them would be one alternative. Admission Control: agents to allocate reservation - Key: YARN-1710 URL: https://issues.apache.org/jira/browse/YARN-1710 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino Attachments: YARN-1710.1.patch, YARN-1710.patch This JIRA tracks the algorithms used to allocate a user ReservationRequest coming in from the new reservation API (YARN-1708), in the inventory subsystem (YARN-1709) maintaining the current plan for the cluster. The focus of these agents is to quickly find a solution for the set of constraints provided by the user, and the physical constraints of the plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
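The memoization and {{TreeSet}} point in the review above can be sketched like this: with an ordered set of interval start times, the earliest time is a single {{first()}} lookup rather than repeated linear scans of the key set. Names are illustrative, not from the patch:

```java
import java.util.TreeSet;

public class EarliestTimeDemo {
    public static void main(String[] args) {
        // Interval start times kept in sorted order.
        TreeSet<Long> starts = new TreeSet<>();
        starts.add(300L);
        starts.add(100L);
        starts.add(200L);

        long earliestStart = 50L;
        long allocStart = starts.first(); // single ordered lookup, O(log n)
        if (allocStart > earliestStart) {
            // pad the front with a zero-resource interval, as in the review
            starts.add(earliestStart);
        }
        System.out.println(starts.first()); // prints 50
    }
}
```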
[jira] [Commented] (YARN-2475) ReservationSystem: replan upon capacity reduction
[ https://issues.apache.org/jira/browse/YARN-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129041#comment-14129041 ] Chris Douglas commented on YARN-2475: - {{SimpleCapacityReplanner}} * The Clock can be initialized in the constructor, declared private and final * The exception refers to an InventorySizeAdjusmentPolicy * nit: redundant parenthesis in the main loop, exceeds 80 char * {{curSessions}} cannot be null; prefer {{!isEmpty()}} to {{size() > 0}} ** Is this check even necessary? {{sort}} and the following loop should be noops * A brief comment about the natural order of {{ReservationAllocations}} would help readability of this loop. It's in the class doc, but something inline would be helpful * An internal {{Resource(0,0)}} could be reused, instead of creating it in the loop * Could the inner loop be more readable? The embedded function calls in the {{Resource}} arithmetic are hard to read (pseudo): {code} ArrayList curSessions = new ArrayList(plan.getResourcesAtTime(t));
Collections.sort(curSessions);
for (Iterator i = curSessions.iterator(); i.hasNext() && excessCap > 0;) {
  InMemoryReservationAllocation a = (InMemoryReservationAllocation) i.next();
  plan.deleteReservation(a.getReservationId());
  excessCap -= a.getResourcesAtTime(t);
} {code} * Why is the enforcement window tied to {{CapacitySchedulerConfiguration}}? 
{{TestSimpleCapacityReplanner}} * Tests should not call {{Thread.sleep}}; instead update the mock * Passing in a mocked {{Clock}} to the cstr rather than assigning it in the test is cleaner * Instead of {{assertTrue(cond != null)}} use {{assertNotNull(cond)}} (same for positive null check) * The test should not catch and discard {{PlanningException}} ReservationSystem: replan upon capacity reduction - Key: YARN-2475 URL: https://issues.apache.org/jira/browse/YARN-2475 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino Attachments: YARN-2475.patch In the context of YARN-1051, if capacity of the cluster drops significantly upon machine failures we need to trigger a reorganization of the planned reservations. As reservations are absolute it is possible that they will not all fit, and some need to be rejected a-posteriori. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
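The "update the mock instead of {{Thread.sleep}}" suggestion above amounts to injecting a hand-controlled clock so the test advances time itself. A minimal sketch, with an illustrative interface rather than YARN's actual Clock:

```java
// A manually advanced clock in the spirit of the review note: the test
// "waits" by calling advance(), never by sleeping.
interface Clock {
    long getTime();
}

class ManualClock implements Clock {
    private long now = 0L;

    public long getTime() { return now; }

    public void advance(long millis) { now += millis; }
}

public class ClockDemo {
    public static void main(String[] args) {
        ManualClock clock = new ManualClock();
        long start = clock.getTime();
        clock.advance(5000); // "wait" five seconds instantly and deterministically
        System.out.println(clock.getTime() - start); // prints 5000
    }
}
```

Passing such a clock to the replanner's constructor (rather than assigning a field from the test) keeps the production code path identical to the tested one.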
[jira] [Commented] (YARN-1710) Admission Control: agents to allocate reservation
[ https://issues.apache.org/jira/browse/YARN-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131700#comment-14131700 ] Chris Douglas commented on YARN-1710: - bq. I am not memoizing findEarliestTime, as it would only save one invocation (the others are on diff sets, or updated version of the same set) I'm confused. There are three invocations: {code} if (findEarliestTime(allocations.keySet()) > earliestStart) {
  allocations.put(new ReservationInterval(earliestStart,
      findEarliestTime(allocations.keySet())), ZERO_RES);
}
ReservationAllocation capReservation =
    new InMemoryReservationAllocation(reservationId, contract, user,
        plan.getQueueName(), findEarliestTime(allocations.keySet()),
        findLatestTime(allocations.keySet()), allocations,
        plan.getResourceCalculator(), plan.getMinimumAllocation()); {code} Isn't the earliest time either the earliest in the set, or the start of the interval just added? Admission Control: agents to allocate reservation - Key: YARN-1710 URL: https://issues.apache.org/jira/browse/YARN-1710 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino Attachments: YARN-1710.1.patch, YARN-1710.2.patch, YARN-1710.patch This JIRA tracks the algorithms used to allocate a user ReservationRequest coming in from the new reservation API (YARN-1708), in the inventory subsystem (YARN-1709) maintaining the current plan for the cluster. The focus of these agents is to quickly find a solution for the set of constraints provided by the user, and the physical constraints of the plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2475) ReservationSystem: replan upon capacity reduction
[ https://issues.apache.org/jira/browse/YARN-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131713#comment-14131713 ] Chris Douglas commented on YARN-2475: - +1, other than a couple very minor nits: * the new cstr accepting {{Clock}} can be package-private, with the no-arg cstr calling {{this(new UTCClock());}} (comment unnecessary, or replace with {{@VisibleForTesting}}) * The unit test could have a more descriptive name than {{test()}}, declare {{PlanningException}} in its throws clause instead of calling {{Assert::fail()}} on catching it, and not declare {{InterruptedException}} which it no longer throws Just a minor clarification: as this iterates over each instant of the plan, are others allowed to modify it? ReservationSystem: replan upon capacity reduction - Key: YARN-2475 URL: https://issues.apache.org/jira/browse/YARN-2475 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino Attachments: YARN-2475.patch, YARN-2475.patch In the context of YARN-1051, if capacity of the cluster drops significantly upon machine failures we need to trigger a reorganization of the planned reservations. As reservations are absolute it is possible that they will not all fit, and some need to be rejected a-posteriori. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1709) Admission Control: Reservation subsystem
[ https://issues.apache.org/jira/browse/YARN-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131727#comment-14131727 ] Chris Douglas commented on YARN-1709: - Thanks for the updates. Just a few minor tweaks, then I'm +1 * In checking the preconditions: {code} if (!readWriteLock.isWriteLockedByCurrentThread()) { return; } {code} The intent was to {{assert}} and crash, so tests against this code can detect violations if the code is modified. When assertions are disabled, the check is elided * Instead of two cstr that assign all the final fields, the no-arg should call the other * Instead of explicitly throwing {{ClassCastException}}, this should just attempt the cast. The cause is implicit, and doesn't require a custom error string Admission Control: Reservation subsystem Key: YARN-1709 URL: https://issues.apache.org/jira/browse/YARN-1709 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Subramaniam Krishnan Attachments: YARN-1709.patch, YARN-1709.patch, YARN-1709.patch, YARN-1709.patch, YARN-1709.patch, YARN-1709.patch This JIRA is about the key data structure used to track resources over time to enable YARN-1051. The Reservation subsystem is conceptually a plan of how the scheduler will allocate resources over-time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
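The constructor-chaining point raised in these reviews (the no-arg constructor should call the other rather than duplicating final-field assignments) looks like this in miniature; names are illustrative:

```java
// Sketch of constructor chaining: every final field is assigned in
// exactly one constructor, and the no-arg form delegates via this(...).
public class ReplannerDemo {
    static final long DEFAULT_STEP_MS = 1000L;

    private final long stepMs;

    ReplannerDemo(long stepMs) { // package-private, handy for tests
        this.stepMs = stepMs;
    }

    public ReplannerDemo() {
        this(DEFAULT_STEP_MS); // delegate instead of repeating assignments
    }

    public long getStepMs() { return stepMs; }

    public static void main(String[] args) {
        System.out.println(new ReplannerDemo().getStepMs()); // prints 1000
    }
}
```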
[jira] [Commented] (YARN-2475) ReservationSystem: replan upon capacity reduction
[ https://issues.apache.org/jira/browse/YARN-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14132210#comment-14132210 ] Chris Douglas commented on YARN-2475: - Yes, that makes sense. Just curious about the contract. ReservationSystem: replan upon capacity reduction - Key: YARN-2475 URL: https://issues.apache.org/jira/browse/YARN-2475 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino Attachments: YARN-2475.patch, YARN-2475.patch, YARN-2475.patch In the context of YARN-1051, if capacity of the cluster drops significantly upon machine failures we need to trigger a reorganization of the planned reservations. As reservations are absolute it is possible that they will not all fit, and some need to be rejected a-posteriori. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1710) Admission Control: agents to allocate reservation
[ https://issues.apache.org/jira/browse/YARN-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14132466#comment-14132466 ] Chris Douglas commented on YARN-1710: - +1 lgtm. Thanks [~curino] for all the iterations on this Admission Control: agents to allocate reservation - Key: YARN-1710 URL: https://issues.apache.org/jira/browse/YARN-1710 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino Attachments: YARN-1710.1.patch, YARN-1710.2.patch, YARN-1710.3.patch, YARN-1710.4.patch, YARN-1710.patch This JIRA tracks the algorithms used to allocate a user ReservationRequest coming in from the new reservation API (YARN-1708), in the inventory subsystem (YARN-1709) maintaining the current plan for the cluster. The focus of this agents is to quickly find a solution for the set of contraints provided by the user, and the physical constraints of the plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1711) CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
[ https://issues.apache.org/jira/browse/YARN-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133406#comment-14133406 ] Chris Douglas commented on YARN-1711: - General - The public classes should be annotated with the correct visibility and stability annotations (probably {{@Public}} and {{@Unstable}}) {{SharingPolicy}} - Javadoc throws clause and some parameters not populated - In particular, the {{excludeList}} parameter could use some unpacking. {{CapacityOverTimePolicy}} - Just performing the cast, and throwing {{ClassCastException}} implicitly is equally clear - nit: spacing/concat: {{plan.getTotalCapacity() + ")by" + " accepting reservation: "}} - Just as an observation, no change requested: the assumption that the sharing policy holds the lock on the plan is probably OK, but since both are interfaces there may be a missing abstraction that associates compatible sets of interlocking components. - I can't think of a more appropriate solution to handling aggregates of {{Resource}}. Anything more correct doesn't really justify the complexity, certainly not before we get some more experience with planning. Since enabling this is optional, enforcement with the {{IntegralResource}} is a pragmatic tradeoff. {{*Exception}} - s/MismatchingUserException/MismatchedUserException/ - Are the subclasses of {{PlanningException}} intended for a caller to distinguish the cause of the rejected request, so it can refine it? If that's the case, should they contain diagnostic information as a payload, e.g., requested vs actual user? If the intent is to extract it, then some more easily parsed format for the message might be appropriate (e.g., JSON). {{NoOverCommitPolicy}} - The {{excludeList}} should probably be final, and cleared/populated with a clone of the set on calls to {{init()}} {{CapacitySchedulerConfiguration}} - Missing javadoc for the new parameters. 
{{TestNoOverCommitPolicy}} - Consider using {{@Test(expected = SomeException.class)}} instead of {{Assert::fail()}} and try/catch for {{testSingleFail()}} - Consider specifying the expected cause/subtype instead of generic {{PlanningException}} - {{testMultiTenantFail}} only verifies that a {{PlanningException}} is thrown, not that it fails as expected {{TestCapacityOverTimePolicy}} - Most of the tests don't verify that the failure occurs when and how its parameters specify, but only check that a {{PlanningException}} is thrown. CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709 -- Key: YARN-1711 URL: https://issues.apache.org/jira/browse/YARN-1711 Project: Hadoop YARN Issue Type: Sub-task Reporter: Carlo Curino Assignee: Carlo Curino Labels: reservations Attachments: YARN-1711.1.patch, YARN-1711.2.patch, YARN-1711.3.patch, YARN-1711.patch This JIRA tracks the development of a policy that enforces user quotas (a time-extension of the notion of capacity) in the inventory subsystem discussed in YARN-1709. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
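The "specify the expected cause/subtype" point above is about pinning down which failure mode a test exercises: catching the specific subclass asserts the cause, while catching the generic {{PlanningException}} would mask the wrong one. A stand-alone sketch with an illustrative exception hierarchy (not the patch's actual classes):

```java
public class ExpectedExceptionDemo {
    // Illustrative stand-ins for the exception hierarchy under review.
    static class PlanningException extends Exception {
        PlanningException(String m) { super(m); }
    }

    static class MismatchedUserException extends PlanningException {
        MismatchedUserException(String m) { super(m); }
    }

    static void validateUser(String owner, String caller)
            throws PlanningException {
        if (!owner.equals(caller)) {
            throw new MismatchedUserException(
                "reservation owned by " + owner + ", caller is " + caller);
        }
    }

    public static void main(String[] args) {
        boolean caughtSpecific = false;
        try {
            validateUser("alice", "bob");
        } catch (MismatchedUserException e) {
            // catching the subtype verifies the precise failure mode...
            caughtSpecific = true;
        } catch (PlanningException e) {
            // ...whereas landing here would mean the wrong cause fired
        }
        System.out.println(caughtSpecific); // prints true
    }
}
```

With JUnit 4, the same intent is expressed declaratively via {{@Test(expected = MismatchedUserException.class)}}, as the review suggests.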
[jira] [Commented] (YARN-1711) CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
[ https://issues.apache.org/jira/browse/YARN-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134718#comment-14134718 ] Chris Douglas commented on YARN-1711: - +1 Thanks for addressing the feedback on the patch CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709 -- Key: YARN-1711 URL: https://issues.apache.org/jira/browse/YARN-1711 Project: Hadoop YARN Issue Type: Sub-task Reporter: Carlo Curino Assignee: Carlo Curino Labels: reservations Attachments: YARN-1711.1.patch, YARN-1711.2.patch, YARN-1711.3.patch, YARN-1711.4.patch, YARN-1711.patch This JIRA tracks the development of a policy that enforces user quotas (a time-extension of the notion of capacity) in the inventory subsystem discussed in YARN-1709. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.
[ https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-1051: Fix Version/s: (was: 3.0.0) 2.6.0 YARN Admission Control/Planner: enhancing the resource allocation model with time. -- Key: YARN-1051 URL: https://issues.apache.org/jira/browse/YARN-1051 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, resourcemanager, scheduler Reporter: Carlo Curino Assignee: Carlo Curino Fix For: 2.6.0 Attachments: YARN-1051-design.pdf, YARN-1051.1.patch, YARN-1051.patch, curino_MSR-TR-2013-108.pdf, socc14-paper15.pdf, techreport.pdf In this umbrella JIRA we propose to extend the YARN RM to handle time explicitly, allowing users to reserve capacity over time. This is an important step towards SLAs, long-running services, workflows, and helps for gang scheduling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630893#comment-13630893 ] Chris Douglas commented on YARN-45: --- [~sandyr]: Yes, but the correct format/semantics for time are a complex discussion in themselves. To keep this easy to review and the discussion focused, we were going to file that separately. But I totally agree: for the AM to respond intelligently, the time before it's forced to give up the container is valuable input. [~bikash]: Agree almost completely. In YARN-569, the hysteresis you cite motivated several design points, including multiple dampers on actions taken by the preemption policy, out-of-band observation/enforcement, and no effort to fine-tune particular allocations. The role of preemption (to summarize what [~curino] discussed in detail in the prenominate JIRA) is to make coarse corrections around the core scheduler invariants (e.g., capacity, fairness). Rather than introducing new races or complexity, one could argue that preemption is a dual of allocation in an inconsistent environment. Your proposal matches case (1) in the above [comment|https://issues.apache.org/jira/browse/YARN-45?focusedCommentId=13628950&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13628950], where the RM specifies the set of containers in jeopardy and a contract (as {{ResourceRequest}}) for avoiding the kills, should the AM have cause to pick different containers. Further, your observation that the RM has enough information in priorities, etc. to make an educated guess at those containers is spot-on. IIRC, the policy uses allocation order when selecting containers, but that should be a secondary key after priority. The disputed point, and I'm not sure we actually disagree, is the claim that the AM should never kill things in response to this message. 
To be fair, that can be implemented by just ignoring the requests, so it's orthogonal to this particular protocol, but it's certainly an important best practice to discuss to ensure we're capturing the right thing. Certainly there are many cases where ignoring the message is correct; most CDFs of map task execution time show that over 80% finish in less than a minute, so the AM has few reasons to pessimistically kill them. There are a few scenarios where this isn't optimal. Take the case of YARN-415, where the AM is billed cumulatively for cluster time. Assume an AM knows (a) the container will not finish (reinforcing [~sandyr]'s point about including time in the preemption message) and (b) the work done is not worth checkpointing. It can conclude that killing the container is in its best interest, because squatting on the resource could affect its ability to get containers in the future (or simply cost more). Moreover, for long-lived services and speculative container allocation/retention, the AM may actually be holding the container only as an optimization or for a future execution, so it could release it at low cost to itself. Finally, the time allowed before the RM starts killing containers can be extended if AMs typically return resources before the deadline. It's also a mechanism for the RM to advise the AM about constraints that prevent it from granting its pending requests. The AM currently kills reducers if it can't get containers to regenerate lost map output. If the scheduler values some containers more than others, the AM's response to starvation can be improved from random killing. This is a case where the current implementation acknowledges the fact that it already runs in an inconsistent environment. 
Scheduler feedback to AM to release containers -- Key: YARN-45 URL: https://issues.apache.org/jira/browse/YARN-45 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Chris Douglas Assignee: Carlo Curino Attachments: YARN-45.patch, YARN-45.patch The ResourceManager strikes a balance between cluster utilization and strict enforcement of resource invariants in the cluster. Individual allocations of containers must be reclaimed- or reserved- to restore the global invariants when cluster load shifts. In some cases, the ApplicationMaster can respond to fluctuations in resource availability without losing the work already completed by that task (MAPREDUCE-4584). Supplying it with this information would be helpful for overall cluster utilization [1]. To this end, we want to establish a protocol for the RM to ask the AM to release containers. [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on
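The AM-side cost-benefit reasoning in the YARN-45 discussion above (finish if the work fits in the grace period, checkpoint if the partial work is worth saving, otherwise release rather than squat on the resource) can be sketched as a small decision function. This is a hypothetical illustration, not the MapReduce AM's actual logic:

```java
public class PreemptionResponseDemo {
    enum Action { FINISH, CHECKPOINT, RELEASE }

    // Given the grace period before the RM kills the container, decide
    // how to respond to a preemption request (illustrative heuristic).
    static Action decide(long remainingWorkMs, long graceMs,
                         boolean checkpointable) {
        if (remainingWorkMs <= graceMs) {
            // Most map tasks finish within a minute; just let it run.
            return Action.FINISH;
        }
        if (checkpointable) {
            // Preserve the work done so far (cf. MAPREDUCE-4584).
            return Action.CHECKPOINT;
        }
        // Squatting only costs the app under cumulative billing (cf. YARN-415).
        return Action.RELEASE;
    }

    public static void main(String[] args) {
        System.out.println(decide(30_000, 60_000, false));  // prints FINISH
        System.out.println(decide(600_000, 60_000, true));  // prints CHECKPOINT
        System.out.println(decide(600_000, 60_000, false)); // prints RELEASE
    }
}
```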
[jira] [Commented] (YARN-573) Shared data structures in Public Localizer and Private Localizer are not Thread safe.
[ https://issues.apache.org/jira/browse/YARN-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13631268#comment-13631268 ] Chris Douglas commented on YARN-573: Pardon? Shared data structures in Public Localizer and Private Localizer are not Thread safe. - Key: YARN-573 URL: https://issues.apache.org/jira/browse/YARN-573 Project: Hadoop YARN Issue Type: Sub-task Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi PublicLocalizer 1) pending accessed by addResource (part of event handling) and run method (as a part of PublicLocalizer.run() ). PrivateLocalizer 1) pending accessed by addResource (part of event handling) and findNextResource (i.remove()). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13632630#comment-13632630 ] Chris Douglas commented on YARN-45: --- bq. ResourceRequest is not actionable in the sense that neither of the schedulers can currently send a non-empty ResourceRequest to preempt. Both only do preemption by containers though they have some plumbing to send RR's if they want to do so. So I am not quite sure what you mean by
We indeed have code that exercises the ResourceRequest version of it. A prototype impl against MapReduce responds to {{ResourceRequest}} in the preempt message. We're currently polishing and splitting that up for review, but wanted to get consensus on the Yarn changes in case new requirements required reworking the rest. An RM impl that includes killing for {{ResourceRequest}} (or {{Resource}}) is a more invasive change, particularly because (a) the AM needs to reason about which recently finished containers are included in the message (i.e., it needs to reason about what the RM knows, so the RM needs to be consistent in what it tells the AM) and (b) the RM needs to track its previous preemption requests, timing them out in the context of existing allocations and exited containers (i.e., decisions to preempt need to incorporate subsequent information). To get experience before proposing anything drastic, we marked this API as experimental, wrote the enforcement policy against {{ContainerID}}, and tucked it behind a pluggable interface. This way, the AM can ignore stale requests for exited containers and the RM can time out particular containers it asked for easily; every computed preemption set is bound in a namespace that sidesteps the most disruptive impl issues on both sides.
bq. By not using location we are implicitly using the * location right? Might as well make it explicit. Non * locations will make sense when affinity based preemptions occur.
Yes, that's exactly the intent. 
The policy in YARN-569 doesn't attempt to bias the preemptions to match the requests in under-capacity queues, but that's a natural policy to implement against this protocol. {quote} The bare-minimum requirement seems: # RM should notify the AM that a certain amount of resources will need to be reclaimed (ala SIGTERM). # Thus, the AM gets an opportunity to *pick* which containers it will sacrifice to satisfy the RM's requirements. # Iff the AM doesn't act, the RM will go ahead and terminate some containers (probably the most-recently allocated ones); ala SIGKILL. Given the above, I feel that this is a set of changes we need to be conservative about - particularly since the really simple pre-emption i.e. SIGKILL alone on RM side is trivial (from an API perspective). {quote} Totally agreed. The symmetry of {{ResourceRequest}} in the ask-back is attractive, but it's not a sufficient condition. To it, I'd add all the familiar attributes of using them in allocation requests (economy, expressiveness, versatility). While {{Resource}} covers the current impl, it leaves little room for related improvements, or even refinements (e.g., preferring resources requested by under-capacity queues, prioritizing types of containers, and time). The API isn't that complex, but a strict implementation would change the RM more, adding risk. To mitigate that, but still encourage applications to write against the richer type while we get experience with it, [~curino]'s formulation [above|https://issues.apache.org/jira/browse/YARN-45?focusedCommentId=13628950page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13628950] seems like a decent set of semantics... We could add a new type that encodes a subset of the {{ResourceRequest}} type. It lacks symmetry, but it also allows them to evolve independently. 
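The SIGTERM/SIGKILL-style escalation described above (RM asks first; if the AM doesn't act within some grace period, the RM kills, tracking and timing out its own earlier requests) can be sketched as a small stand-in tracker. All class and method names below are hypothetical illustrations, not the actual RM implementation.

```java
import java.util.*;

// Hedged sketch of the ask-then-kill escalation: the RM records when each
// container was first asked to preempt, and kills only those whose request
// has aged past a grace period. Hypothetical names, not real YARN code.
public class PreemptionEscalator {
    private final long graceMillis;
    // containerId -> time the preemption request was first issued
    private final Map<String, Long> requestedAt = new HashMap<>();

    public PreemptionEscalator(long graceMillis) {
        this.graceMillis = graceMillis;
    }

    // Record (idempotently) that a container was asked to release resources.
    public void request(String containerId, long now) {
        requestedAt.putIfAbsent(containerId, now);
    }

    // Containers the AM ignored for longer than the grace period get killed.
    public List<String> toKill(long now) {
        List<String> kill = new ArrayList<>();
        for (Map.Entry<String, Long> e : requestedAt.entrySet()) {
            if (now - e.getValue() >= graceMillis) {
                kill.add(e.getKey());
            }
        }
        return kill;
    }

    public static void main(String[] args) {
        PreemptionEscalator esc = new PreemptionEscalator(5000);
        esc.request("c1", 0);
        esc.request("c2", 3000);
        // c1 exceeded the grace period by t=6000; c2 has not yet
        System.out.println(esc.toKill(6000));
    }
}
```

Keying the bookkeeping by container id is what makes stale requests cheap to reconcile on both sides, as the comment above notes: the AM can ignore requests for exited containers, and the RM can expire requests per container.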
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642163#comment-13642163 ] Chris Douglas commented on YARN-45: --- If everyone's OK with the current patch as a base, I'll commit it in the next couple days.
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644626#comment-13644626 ] Chris Douglas commented on YARN-45: --- I'm also a fan of {{ResourceRequest}}, but we're not really using all its features, yet. Similarly, {{Resource}} bakes in the fungibility of resources, which could be awkward as the RM accommodates richer requests (as in YARN-392). We could use {{ResourceRequest}}- so the API is there for extensions- but only populate the capability as an aggregate. With the convention that \-1 containers can mean packed as you see fit, it expresses {{Resource}} (which we need in practice, since the priorities for requests don't always [match the preemption order|https://issues.apache.org/jira/browse/YARN-569?focusedCommentId=13638825page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13638825]), which is sufficient for the current schedulers. If we're adding the contract back with the set of containers, the [semantics|https://issues.apache.org/jira/browse/YARN-45?focusedCommentId=13628950page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13628950] we discussed earlier still seem OK.
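The convention proposed above (reuse the request shape for the ask-back, with a container count of -1 meaning "reclaim this aggregate capability, packed as you see fit") can be sketched with stand-in types. These are illustrative records, not the real org.apache.hadoop.yarn.api classes.

```java
// Hedged sketch of expressing a plain Resource-style aggregate through a
// richer ResourceRequest-shaped record. Ask and its fields are hypothetical.
public class AggregateAsk {
    // Sentinel: the RM doesn't care how the reclaimed amount is partitioned.
    static final int PACK_AS_YOU_SEE_FIT = -1;

    static class Ask {
        final int memoryMb;      // aggregate capability to reclaim
        final int numContainers; // -1 means "any packing"
        Ask(int memoryMb, int numContainers) {
            this.memoryMb = memoryMb;
            this.numContainers = numContainers;
        }
    }

    // Express a fungible aggregate through the request type, keeping the API
    // open for later, more specific asks (locality, priority, ...).
    static Ask fromAggregate(int memoryMb) {
        return new Ask(memoryMb, PACK_AS_YOU_SEE_FIT);
    }

    static boolean isAggregate(Ask a) {
        return a.numContainers == PACK_AS_YOU_SEE_FIT;
    }

    public static void main(String[] args) {
        Ask a = fromAggregate(8192);
        System.out.println(isAggregate(a) + " " + a.memoryMb);
    }
}
```

The point of the sentinel is evolvability: current schedulers only need the aggregate form, but AMs written against the richer type need no changes when the RM later populates locality or priority.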
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644675#comment-13644675 ] Chris Douglas commented on YARN-45: --- bq. we could express the ResourceRequest as a multiple of the minimum allocation
+1 This is better
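The "multiple of the minimum allocation" idea endorsed above amounts to rounding the aggregate ask up to a whole number of minimum-allocation units. A minimal sketch (the helper name is hypothetical):

```java
// Hedged sketch: express an aggregate ask as a count of minimum-allocation
// units. Rounds up, since reclaiming slightly more than asked is safer than
// reclaiming less. Not actual YARN code.
public class MinAllocUnits {
    static int units(int aggregateMb, int minAllocMb) {
        // ceiling division without floating point
        return (aggregateMb + minAllocMb - 1) / minAllocMb;
    }

    public static void main(String[] args) {
        System.out.println(units(2048, 1024)); // 2
        System.out.println(units(2500, 1024)); // 3 (rounded up)
    }
}
```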
[jira] [Comment Edited] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13650527#comment-13650527 ] Chris Douglas edited comment on YARN-45 at 5/7/13 6:20 AM: --- bq. Would be great if you could add a version number to your patches. Sorry, we weren't sure of the current convention. {quote} - PreemptionMessage.strict should perhaps be named strictContract explicitly. You did name the setters and the getters verbosely which is good. - You should mark all the api getters and setters to be synchronized. There are similar locking bugs in other existing records too but we are tracking them elsewhere. - PreemptionContainer.getId() - Javadoc should refer to containers instead of Resource? - PreemptionContract.getContainers() - Javadoc referring to codeResourceManager/code may also include a @link PreemptionContract that, if satisfied, may replace these doesn't make sense to me. {quote} Fixed all of these; last one was a copy/paste of an older version of the code. Thanks for catching these. [~bikassaha]: we took another attempt at the javadoc, but it's probably still not sufficient. We opened YARN-650 to track documentation of this feature in the AM how-to, which we'll address presently. (thanks everyone for the great feedback!)
[jira] [Created] (YARN-650) User guide for preemption
Chris Douglas created YARN-650: -- Summary: User guide for preemption Key: YARN-650 URL: https://issues.apache.org/jira/browse/YARN-650 Project: Hadoop YARN Issue Type: Task Components: documentation Reporter: Chris Douglas Priority: Minor Fix For: 2.0.5-beta YARN-45 added a protocol for the RM to ask back resources. The docs on writing YARN applications should include a section on how to interpret this message.
[jira] [Updated] (YARN-567) RM changes to support preemption for FairScheduler and CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-567: --- Attachment: (was: YARN-567-1.patch) RM changes to support preemption for FairScheduler and CapacityScheduler Key: YARN-567 URL: https://issues.apache.org/jira/browse/YARN-567 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino Attachments: YARN-567.patch, YARN-567.patch A common tradeoff in scheduling jobs is between keeping the cluster busy and enforcing capacity/fairness properties. FairScheduler and CapacityScheduler take opposite stances on how to achieve this. The FairScheduler leverages task-killing to quickly reclaim resources from currently running jobs and redistribute them among new jobs, thus keeping the cluster busy but wasting useful work. The CapacityScheduler is typically tuned to limit the portion of the cluster used by each queue so that the likelihood of violating capacity is low; it never wastes work, but risks keeping the cluster underutilized or leaving jobs waiting to obtain their rightful capacity. By introducing the notion of work-preserving preemption we can remove this tradeoff. This requires a protocol for preemption (YARN-45), ApplicationMasters that can respond to preemption efficiently (e.g., by saving their intermediate state; this will be posted for MapReduce in a separate JIRA soon), and a scheduler that can issue preemption requests (discussed in separate JIRAs YARN-568 and YARN-569). The changes we track with this JIRA are common to FairScheduler and CapacityScheduler, and are mostly the propagation of preemption decisions through the ApplicationMastersService.
[jira] [Updated] (YARN-567) RM changes to support preemption for FairScheduler and CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-567: --- Attachment: YARN-567-1.patch
[jira] [Updated] (YARN-568) FairScheduler: support for work-preserving preemption
[ https://issues.apache.org/jira/browse/YARN-568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-568: --- Attachment: YARN-568-1.patch FairScheduler: support for work-preserving preemption -- Key: YARN-568 URL: https://issues.apache.org/jira/browse/YARN-568 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Carlo Curino Assignee: Carlo Curino Attachments: YARN-568-1.patch, YARN-568.patch, YARN-568.patch In the attached patch, we modified the FairScheduler to substitute its preemption-by-killing with a work-preserving version of preemption (followed by killing if the AMs do not respond quickly enough). This should allow running preemption checks more often while killing less often (proper tuning to be investigated). Depends on YARN-567 and YARN-45; related to YARN-569.
[jira] [Updated] (YARN-650) User guide for preemption
[ https://issues.apache.org/jira/browse/YARN-650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-650: --- Attachment: Y650-0.patch
[jira] [Commented] (YARN-568) FairScheduler: support for work-preserving preemption
[ https://issues.apache.org/jira/browse/YARN-568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13653234#comment-13653234 ] Chris Douglas commented on YARN-568: +1 I committed this. Thanks Carlo and Sandy
[jira] [Commented] (YARN-568) FairScheduler: support for work-preserving preemption
[ https://issues.apache.org/jira/browse/YARN-568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13661793#comment-13661793 ] Chris Douglas commented on YARN-568: bq. From the code in generatePreemptionMessage() the overlap between strict and fungible is not obvious. Can both be sent? Yes. From the discussion in YARN-45, it seemed the consensus was that the RM may want to send a mix of both requests. Does that still make sense? bq. Unused new member seems to have been added: recordFactory? Sorry, an artifact of a previous version. Cleaned up in a followup commit.
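The strict/fungible overlap discussed above (a single message may carry both a non-negotiable part and a contract the AM may satisfy with substitute resources) can be illustrated with a minimal stand-in. These are hypothetical types, not the real PreemptionMessage records in org.apache.hadoop.yarn.api.

```java
import java.util.*;

// Hedged sketch: one preemption message carrying both a strict part
// (containers the RM will reclaim regardless) and a fungible contract
// (a suggestion the AM may trade for equivalent resources). Stand-in types.
public class PreemptionMsg {
    final Set<String> strictContainers = new HashSet<>();   // non-negotiable
    final Set<String> contractContainers = new HashSet<>(); // AM may substitute

    boolean isEmpty() {
        return strictContainers.isEmpty() && contractContainers.isEmpty();
    }

    public static void main(String[] args) {
        PreemptionMsg m = new PreemptionMsg();
        m.strictContainers.add("c1");   // will be released no matter what
        m.contractContainers.add("c2"); // AM may release c2, or free the
                                        // equivalent resources another way
        System.out.println("both parts present: "
            + (!m.strictContainers.isEmpty() && !m.contractContainers.isEmpty()));
    }
}
```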
[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource
[ https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918548#comment-13918548 ] Chris Douglas commented on YARN-1771: - The simpler check doesn't seem to have any practical issues. Since the cache is keyed on Paths, the case where a user can refer to an object without access to it seems pretty esoteric. As long as the public cache runs with lowered privileges, the check isn't necessary to verify that the public resource isn't private to YARN. Copying with the user's HDFS credentials avoids that, though that seems like a heavyweight solution if reducing getFileStatus calls is the only motivation. many getFileStatus calls made from node manager for localizing a public distributed cache resource -- Key: YARN-1771 URL: https://issues.apache.org/jira/browse/YARN-1771 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical We're observing that the getFileStatus calls are putting a fair amount of load on the name node as part of checking the public-ness for localizing resources that belong in the public cache. We see 7 getFileStatus calls made for each of these resources. We should look into reducing the number of calls to the name node. One example: {noformat}
2014-02-27 18:07:27,351 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348 ...
2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724 ...
2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo src=/tmp ...
2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo src=/...
2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
2014-02-27 18:07:27,355 INFO audit: ... cmd=open src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
{noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
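The audit log above reflects the public-ness check walking the path: a resource is public only if it is world-readable and every ancestor directory is world-executable, which costs roughly one getFileStatus per path component (plus duplicate stats of the file itself from separate code paths). A sketch counting the per-component calls, under that assumption; the helper is hypothetical:

```java
// Hedged sketch: count the simulated NameNode stat calls a naive public-ness
// check issues, one per path component (file + each ancestor + root).
// Illustrative only, not the actual FSDownload logic.
public class PublicCheckCalls {
    static int statCalls(String path) {
        int calls = 1; // stat the file itself (world-readable check)
        // one stat per ancestor directory, excluding the root slash at index 0
        for (int i = path.lastIndexOf('/'); i > 0; i = path.lastIndexOf('/', i - 1)) {
            calls++;
        }
        calls++; // the root "/" (world-executable check)
        return calls;
    }

    public static void main(String[] args) {
        // components: /, tmp, temp-887708724, tmp883330348, foo-0.0.44.jar
        System.out.println(statCalls("/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar")); // 5
    }
}
```

A single-RPC API returning the FileStatus of every component (as discussed in a later comment) would collapse this walk to one call.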
[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource
[ https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918625#comment-13918625 ] Chris Douglas commented on YARN-1771: - bq. Orthogonal to this we have been discussing adding a FileStatus[] getFileStatus(Path f) API that returns FileStatus for each path component of f in a single RPC. Symlinks might be awkward to support, but that discussion is for a separate ticket. Do you have a JIRA ref? bq. So I think we need some kind of access check, either as the requesting user or explicit access checks like it does today, to avoid a malicious client obtaining access to private files via the NM. An HDFS nobody account? A cache would probably be correct in almost all cases, though. Since the check is only performed when the resource is localized, there could be cases where the filesystem is never in the cached state, but those are rare (and as Sandy points out, already in the current design). To attack the cache, the writer would need to take an unprotected directory, change its permissions, then populate it with private data (whose attributes are guessable). Expiring after short intervals and not populating the cache with failed localization attempts could help mitigate its effectiveness.
[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource
[ https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13932203#comment-13932203 ] Chris Douglas commented on YARN-1771: - I just skimmed the patch, but it lgtm. The LoadingCache impl is very clean, and only caching over the course of a container localization relieves one of any practical responsibility to limit the cache size (that said, might as well add something fixed). Only minor, optional nits: If a path is invalid/inaccessible, it might make sense to memoize the failure, also. {{FSDownload::isPublic}} can be package-private (and annotated w/ {{\@VisibleForTesting}} for the unit test), rather than public.
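The caching idea reviewed above, which memoizes per-path stat results for the duration of one localization so repeated ancestor checks hit the NameNode only once per path, can be sketched as follows. The real patch uses Guava's LoadingCache inside FSDownload; this stand-in uses ConcurrentHashMap and also memoizes failures, as suggested in the review.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hedged sketch of a short-lived stat cache. A String stands in for
// FileStatus; Optional.empty() models a failed lookup, so failures are
// memoized too. Hypothetical names, not the actual patch.
public class StatCache {
    private final ConcurrentMap<String, Optional<String>> cache =
        new ConcurrentHashMap<>();
    private int rpcCount = 0; // how many times we actually "called" the NN

    public Optional<String> stat(String path,
                                 Function<String, Optional<String>> nameNode) {
        // computeIfAbsent issues the lookup only on a cache miss
        return cache.computeIfAbsent(path, p -> {
            rpcCount++;
            return nameNode.apply(p);
        });
    }

    public int rpcCount() { return rpcCount; }

    public static void main(String[] args) {
        StatCache c = new StatCache();
        Function<String, Optional<String>> fakeNn =
            p -> p.startsWith("/tmp") ? Optional.of("status:" + p)
                                      : Optional.empty();
        c.stat("/tmp/a.jar", fakeNn);
        c.stat("/tmp/a.jar", fakeNn); // cache hit, no second RPC
        c.stat("/missing", fakeNn);   // failure is memoized too
        c.stat("/missing", fakeNn);
        System.out.println(c.rpcCount()); // 2
    }
}
```

Scoping the cache to a single localization sidesteps both the size-limiting and staleness concerns raised later in the thread, at the cost of re-statting across localizations.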
[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource
[ https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13932651#comment-13932651 ] Chris Douglas commented on YARN-1771: - bq. I'm also going to make changes to memoize failures. The only slight hesitation I have is normally that would be quite rare, but I think it's a good thing to have. Agreed, I doubt it will have a significant impact, here. In a shared/longer-lived cache it might be marginally more useful, but still rare. bq. I did think about making the stat cache longer-lived. But the complexity of managing its size as well as the values getting quite stale dissuaded me from it. Let me know if you agree... *nod* Since the goal is to reduce stress on the NN, deferring that complexity until necessary is a good plan. many getFileStatus calls made from node manager for localizing a public distributed cache resource -- Key: YARN-1771 URL: https://issues.apache.org/jira/browse/YARN-1771 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.3.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: yarn-1771.patch, yarn-1771.patch, yarn-1771.patch We're observing that the getFileStatus calls are putting a fair amount of load on the name node as part of checking the public-ness for localizing a resource that belong in the public cache. We see 7 getFileStatus calls made for each of these resource. We should look into reducing the number of calls to the name node. One example: {noformat} 2014-02-27 18:07:27,351 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348 ... 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724 ... 2014-02-27 18:07:27,353 INFO audit: ... 
cmd=getfileinfo src=/tmp ... 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo src=/... 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... 2014-02-27 18:07:27,355 INFO audit: ... cmd=open src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
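The audit log above shows the same paths being stat-ed repeatedly: deciding whether a resource is "public" requires checking the file and every ancestor directory, and each check hits the NameNode. A per-localization memo collapses the repeats. The sketch below is illustrative only (the class and the `StatFn` stand-in for `FileSystem.getFileStatus()` are invented names, not the yarn-1771 patch):

```java
import java.util.HashMap;
import java.util.Map;

class PublicCheckSketch {
  // Stand-in for FileSystem.getFileStatus(); each call would hit the NameNode.
  interface StatFn { boolean worldAccessible(String path); }

  private final Map<String, Boolean> statCache = new HashMap<>();
  private final StatFn fs;
  int remoteCalls = 0; // exposed only so the savings are observable

  PublicCheckSketch(StatFn fs) { this.fs = fs; }

  private boolean cachedStat(String path) {
    // Each distinct path is stat-ed at most once per localization pass.
    return statCache.computeIfAbsent(path, p -> {
      remoteCalls++;
      return fs.worldAccessible(p);
    });
  }

  /** A resource is public iff it and every ancestor are world accessible. */
  boolean isPublic(String path) {
    for (String p = path; p != null; p = parent(p)) {
      if (!cachedStat(p)) {
        return false;
      }
    }
    return true;
  }

  private static String parent(String path) {
    if (path.equals("/")) {
      return null;
    }
    int i = path.lastIndexOf('/');
    if (i < 0) {
      return null; // relative path: no further ancestors to check
    }
    return i == 0 ? "/" : path.substring(0, i);
  }
}
```

Keeping the map scoped to a single pass sidesteps the staleness and size-management concerns raised in the comment about a longer-lived cache.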
[jira] [Updated] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource
[ https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-1771: Issue Type: Improvement (was: Bug) many getFileStatus calls made from node manager for localizing a public distributed cache resource -- Key: YARN-1771 URL: https://issues.apache.org/jira/browse/YARN-1771 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.3.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: yarn-1771.patch, yarn-1771.patch, yarn-1771.patch, yarn-1771.patch
[jira] [Updated] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource
[ https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-1771: Fix Version/s: 2.5.0 many getFileStatus calls made from node manager for localizing a public distributed cache resource -- Key: YARN-1771 URL: https://issues.apache.org/jira/browse/YARN-1771 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.3.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Fix For: 3.0.0, 2.4.0, 2.5.0 Attachments: yarn-1771.patch, yarn-1771.patch, yarn-1771.patch, yarn-1771.patch
[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource
[ https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935310#comment-13935310 ] Chris Douglas commented on YARN-1771: - bq. It would be great if you could commit this to branch-2.4 too... Sure, np. Done many getFileStatus calls made from node manager for localizing a public distributed cache resource -- Key: YARN-1771 URL: https://issues.apache.org/jira/browse/YARN-1771 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.3.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Fix For: 3.0.0, 2.4.0, 2.5.0 Attachments: yarn-1771.patch, yarn-1771.patch, yarn-1771.patch, yarn-1771.patch
[jira] [Commented] (YARN-1927) Preemption message shouldn’t be created multiple times for same container-id in ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967685#comment-13967685 ] Chris Douglas commented on YARN-1927: - The decision to preempt a container may be reversed. The policy reiterates its request and only kills containers consistently recalled over a grace period. The application clears the containers requested in {{FiCaSchedulerApp::getAllocation}} after reporting them to the AM. [~curino], can you confirm that this is the intent? Preemption message shouldn’t be created multiple times for same container-id in ProportionalCapacityPreemptionPolicy Key: YARN-1927 URL: https://issues.apache.org/jira/browse/YARN-1927 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Priority: Minor Attachments: YARN-1927.patch Currently, after each editSchedule() call, a preemption message is created and sent to the scheduler. ProportionalCapacityPreemptionPolicy should only send the preemption message once for each container.
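The requested behavior — emit the preemption message at most once per container across editSchedule() rounds — reduces to remembering which container ids have already been notified. A minimal sketch, with invented names (this is not the actual ProportionalCapacityPreemptionPolicy code):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PreemptionDedupSketch {
  private final Set<String> alreadyRequested = new HashSet<>();

  /**
   * Called once per editSchedule() round with the candidate container ids;
   * returns only those not messaged in an earlier round.
   */
  List<String> toNotify(List<String> candidates) {
    List<String> fresh = new ArrayList<>();
    for (String containerId : candidates) {
      if (alreadyRequested.add(containerId)) { // add() is false on duplicates
        fresh.add(containerId);
      }
    }
    return fresh;
  }
}
```

Note the comment above also points out the flip side: since a preemption decision may be reversed, any real implementation must also expire entries when a container is no longer targeted, so it can be messaged again later.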
[jira] [Commented] (YARN-1957) ProportionalCapacitPreemptionPolicy handling of corner cases...
[ https://issues.apache.org/jira/browse/YARN-1957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983978#comment-13983978 ] Chris Douglas commented on YARN-1957: - +1 Enforcing {{maxCapacity}} in the calculation of the ideal capacity is a good fix, and distributing capacity over queues with zero capacity (with the config knob to restore the existing 0 == disabled behavior with aggressive preemption) makes sense. The code appears to effect this, also. There's a slight optimization that can separate the zero-capacity queues during cloning, but the overhead is negligible. ProportionalCapacitPreemptionPolicy handling of corner cases... --- Key: YARN-1957 URL: https://issues.apache.org/jira/browse/YARN-1957 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Carlo Curino Assignee: Carlo Curino Labels: capacity-scheduler, preemption Attachments: YARN-1957.patch, YARN-1957.patch, YARN-1957_test.patch The current version of ProportionalCapacityPreemptionPolicy should be improved to deal with the following two scenarios: 1) when rebalancing over-capacity allocations, it potentially preempts without considering the maxCapacity constraints of a queue (i.e., preempting possibly more than strictly necessary) 2) a zero-capacity queue is preempted even if there is no demand (consistent with the old use of zero capacity to disable queues) The proposed patch fixes both issues and introduces a few new test cases.
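Both corner cases reduce to how a queue's ideal (target) capacity is computed. The sketch below compresses the idea to scalar resources (the real policy operates over {{Resource}} objects and normalized shares; the method and parameter names are invented for illustration):

```java
class IdealAssignSketch {
  /**
   * Target allocation for one queue, reflecting the two fixes discussed:
   * 1) the target never exceeds maxCapacity, so rebalancing cannot preempt
   *    a queue below what its cap would force anyway;
   * 2) a zero-capacity queue with no pending demand gets a target of zero
   *    (the old "zero capacity == disabled" reading), rather than being
   *    handed capacity it never asked for.
   */
  static double target(double used, double pending, double guaranteed,
                       double maxCapacity) {
    if (guaranteed == 0 && pending == 0) {
      return 0; // disabled queue with no demand: fully preemptable
    }
    return Math.min(used + pending, maxCapacity); // never above the cap
  }
}
```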
[jira] [Updated] (YARN-1957) ProportionalCapacitPreemptionPolicy handling of corner cases...
[ https://issues.apache.org/jira/browse/YARN-1957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-1957: Fix Version/s: 3.0.0 2.5.0 ProportionalCapacitPreemptionPolicy handling of corner cases... --- Key: YARN-1957 URL: https://issues.apache.org/jira/browse/YARN-1957 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Carlo Curino Assignee: Carlo Curino Labels: capacity-scheduler, preemption Fix For: 3.0.0, 2.5.0, 2.4.1 Attachments: YARN-1957.patch, YARN-1957.patch, YARN-1957_test.patch
[jira] [Commented] (YARN-1709) Admission Control: Reservation subsystem
[ https://issues.apache.org/jira/browse/YARN-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025401#comment-14025401 ] Chris Douglas commented on YARN-1709: - First pass: {{RLE::addInterval/removeInterval}} * This can return true when totCap == 0; when there's no work to do * There are some redundant calls to {{isSameAsPrevious()}} and {{isSameAsNext()}} when adding a non-zero interval (it wouldn't be in the set if it were equal, so applying the same delta to each preserves this) * Allowing a min/max allocation for the RLE would make this data structure more general/reusable outside this context * These return true for all cases (the {{result}} variable is not necessary). This makes sense, until it adds invariants like min/max constraints. * {{removeInterval}}: is it sufficient to compare against Resource(0, 0) instead of each member separately? * {{removeInterval}}: doesn't roll back the transaction when it throws an exception, so it could leave an interval partially applied. {{RLE::getCapacityAtTime/get*}} * Many of the {{get*}} methods return mutable data, violating the locking. These should clone the objects before returning them. {{RLE::toMemJSONString}}/{{RLE::toString}} * Consider removing the spaces/newlines in the JSON representation * Please use a {{StringBuilder}} instead of concatenation * Please use one of the JSON libraries on the classpath * The {{toString()}} could be very verbose. Consider printing a summary (#steps, min/max, etc.) instead. {{InMemoryPlan}} * This should use a consistent clock time for the tick and archive, or it may archive ticks that have not been observed. * The logic using {{isSuccess}} is confusing. Instead, return false/throw as constraints and invariants are violated, and return true when successful. * {{ReentrantReadWriteLock}} is much slower w/ fair == true; are those semantics required? 
* Using {{Class::isInstance}} is unconventional; using the instanceof operator or equals() (if it requires an exact match) is more common * The {{*Cis}} fields and functions appear misnamed * The {{updateCis}} function should be two functions, rather than passing {{addOrRemove}} * {{updateCis}} would be easier to read if it established the invariant of the user in the collection, then called {{addInterval}} at the end. * It looks like users in {{userCis}} are not GC'd ** If this is fixed, there's a potential NPE on {{userCis.get(reservation.getUser())}} deref * {{getAllReservations}} should be package-private, {{@VisibleForTesting}}; remove from interface * {{headMap}} doesn't return null; these checks can be removed * This returns mutable {{Resource}} instances from some {{get*}} methods, violating locking * This creates many instances of {{Resource(0, 0)}}; can some of these be avoided? * This should probably clone the {{Resource}} passed to setTotalCapacity * Please remove the newlines in {{toString()}} {{InMemoryReservationAllocation}} * Fields can be final * Why are some fields protected? * remove newline from {{toString()}} * Since this implements {{compareTo}}, it should also implement {{equals()}} and (particularly since it's added to collections calling it) {{hashCode()}} General/nit * some lines are more than 80 characters * Javadocs contain empty lines * Instead of two lookups on the HashMap {{containsKey()}}/{{get()}}, this can call {{get()}} once and check for {{null}} Admission Control: Reservation subsystem Key: YARN-1709 URL: https://issues.apache.org/jira/browse/YARN-1709 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Subramaniam Krishnan Attachments: YARN-1709.patch This JIRA is about the key data structure used to track resources over time to enable YARN-1051. The Reservation subsystem is conceptually a plan of how the scheduler will allocate resources over time.
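For readers unfamiliar with the RLE structure this review keeps referring to: it stores capacity over time as a sorted map of change points, and {{addInterval}} applies a delta over a half-open window. A reduced sketch using an integer capacity (the real class works over {{Resource}} objects under locks; names here are illustrative):

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

class RleSketch {
  // Key: time of a change point; value: capacity from that point onward.
  private final TreeMap<Long, Integer> steps = new TreeMap<>();

  int capacityAt(long t) {
    Map.Entry<Long, Integer> e = steps.floorEntry(t);
    return e == null ? 0 : e.getValue();
  }

  /** Applies delta over [start, end). */
  void addInterval(long start, long end, int delta) {
    if (delta == 0 || start >= end) {
      return; // no work to do (cf. the totCap == 0 early-out noted above)
    }
    // Pin both boundaries so the delta stays confined to [start, end).
    steps.putIfAbsent(end, capacityAt(end));
    steps.putIfAbsent(start, capacityAt(start));
    for (Map.Entry<Long, Integer> e
        : steps.subMap(start, true, end, false).entrySet()) {
      e.setValue(e.getValue() + delta);
    }
    // Drop steps made redundant: isSameAsPrevious() in the review's terms.
    Integer prev = null;
    for (Iterator<Map.Entry<Long, Integer>> it = steps.entrySet().iterator();
        it.hasNext(); ) {
      Map.Entry<Long, Integer> e = it.next();
      if (e.getValue().equals(prev)) {
        it.remove();
      } else {
        prev = e.getValue();
      }
    }
  }
}
```

The review's rollback point is visible here too: if an invariant check (say, a max-capacity bound) threw partway through the update loop, the map would be left half-modified unless the changes were staged or undone.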
[jira] [Commented] (YARN-1374) Resource Manager fails to start due to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809874#comment-13809874 ] Chris Douglas commented on YARN-1374: - +1 lgtm Resource Manager fails to start due to ConcurrentModificationException -- Key: YARN-1374 URL: https://issues.apache.org/jira/browse/YARN-1374 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: Devaraj K Assignee: Karthik Kambatla Priority: Blocker Attachments: yarn-1374-1.patch, yarn-1374-1.patch Resource Manager is failing to start with the below ConcurrentModificationException. {code:xml} 2013-10-30 20:22:42,371 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list 2013-10-30 20:22:42,376 INFO org.apache.hadoop.service.AbstractService: Service ResourceManager failed in state INITED; cause: java.util.ConcurrentModificationException java.util.ConcurrentModificationException at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) at java.util.AbstractList$Itr.next(AbstractList.java:343) at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944) 2013-10-30 20:22:42,378 INFO org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: Transitioning to standby 2013-10-30 20:22:42,378 INFO org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: Transitioned to standby 2013-10-30 20:22:42,378 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager java.util.ConcurrentModificationException at 
java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) at java.util.AbstractList$Itr.next(AbstractList.java:343) at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944) 2013-10-30 20:22:42,379 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down ResourceManager at HOST-10-18-40-24/10.18.40.24 / {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
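The failure mode in the stack trace is the classic one: {{serviceInit}} iterates over the service list while a service's init registers another service, mutating the list mid-iteration. A minimal stand-alone reproduction (hypothetical stand-in, not the CompositeService code) and one way out:

```java
import java.util.ConcurrentModificationException;
import java.util.List;

class CmeDemo {
  /**
   * Adds two "services" — the first one registers a new child from inside its
   * own init — then runs them in iteration order. Returns whether iteration
   * failed with a ConcurrentModificationException.
   */
  static boolean initThrowsCme(List<Runnable> services) {
    services.add(() -> services.add(() -> { })); // init() that adds a child
    services.add(() -> { });
    try {
      for (Runnable s : services) {
        s.run();
      }
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }
}
```

With an {{ArrayList}} the iterator's comodification check fires on the next {{next()}} call; a {{CopyOnWriteArrayList}} iterates over a stable snapshot and does not throw (though children added mid-iteration are only seen on the next pass). Snapshotting the list before iterating is an equivalent fix.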
[jira] [Commented] (YARN-1324) NodeManager potentially causes unnecessary operations on all its disks
[ https://issues.apache.org/jira/browse/YARN-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809885#comment-13809885 ] Chris Douglas commented on YARN-1324: - bq. When does MR use multiple disks in the same task/container? Isn't the map output written to a single indexed partition file? Spills are spread across all volumes, but merged into a single file at the end. Would randomizing the order of disks be a reasonable short-term workaround for (1)? Future changes could weight/elide directories based on other criteria, but that's a simple change. So would changing the random selection to bias its search order using a hash of the task id (instead of disk usage when creating the spill), so the ShuffleHandler could search fewer directories on average. I agree with Vinod, it would be hard to prevent the search altogether... bq. Requiring apps to specify the number of disks for a container is also a viable solution and can be done in a back-compatible manner by changing MR to specify multiple disks and leaving the default to 1 for apps that don't care. This makes sense as a hint, but some users might interpret it as a constraint and be confused when a NM schedules them on a node that reports fewer local dirs (due to failure, or a heterogeneous config). NodeManager potentially causes unnecessary operations on all its disks -- Key: YARN-1324 URL: https://issues.apache.org/jira/browse/YARN-1324 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Bikas Saha Currently, for every container, the NM creates a directory on every disk and expects the container-task to choose 1 of them and load balance the use of the disks across all containers. 1) This may have worked fine in the MR world where MR tasks would randomly choose dirs, but in general we cannot expect every app/task writer to understand these nuances and randomly pick disks.
So we could end up overloading the first disk if most people decide to use the first disk. 2) This makes a number of NM operations to scan every disk (thus randomizing that disk) to locate the dir which the task has actually chosen to use for its files. Makes all these operations expensive for the NM as well as disruptive for users of disks that did not have the real task working dirs. I propose that NM should up-front decide the disk it is assigning to tasks. It could choose to do so randomly or weighted-randomly by looking at space and load on each disk. So it could do a better job of load balancing. Then, it would associate the chosen working directory with the container context so that subsequent operations on the NM can directly seek to the correct location instead of having to seek on every disk. -- This message was sent by Atlassian JIRA (v6.1#6144)
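The hash-based idea floated in the comment — bias disk selection by a hash of the task/container id rather than pure random choice — can be sketched in a few lines (illustrative only; this is not NM code, and a real scheme would also weight by free space and skip failed dirs):

```java
class DiskPickSketch {
  /**
   * Deterministically picks a local dir for a container from its id. Because
   * the choice is a pure function of the id, any later reader (the NM, or the
   * ShuffleHandler looking for map output) can recompute it and start its
   * search at the right disk instead of scanning every local dir.
   */
  static int pickDir(String containerId, int numLocalDirs) {
    // floorMod guards against negative hashCode() values.
    return Math.floorMod(containerId.hashCode(), numLocalDirs);
  }
}
```

Distinct ids spread roughly uniformly across dirs, which also addresses the "everyone picks the first disk" overload in (1).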
[jira] [Updated] (YARN-1471) The SLS simulator is not running the preemption policy for CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-1471: Attachment: (was: YARN-1471.patch.2) The SLS simulator is not running the preemption policy for CapacityScheduler Key: YARN-1471 URL: https://issues.apache.org/jira/browse/YARN-1471 Project: Hadoop YARN Issue Type: Bug Reporter: Carlo Curino Assignee: Carlo Curino Priority: Minor Attachments: SLSCapacityScheduler.java, YARN-1471.patch The simulator does not run the ProportionalCapacityPreemptionPolicy monitor. This is because the policy needs to interact with a CapacityScheduler, and the wrapping done by the simulator breaks this. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (YARN-1471) The SLS simulator is not running the preemption policy for CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-1471: Attachment: YARN-1471.patch The SLS simulator is not running the preemption policy for CapacityScheduler Key: YARN-1471 URL: https://issues.apache.org/jira/browse/YARN-1471 Project: Hadoop YARN Issue Type: Bug Reporter: Carlo Curino Assignee: Carlo Curino Priority: Minor Attachments: SLSCapacityScheduler.java, YARN-1471.patch, YARN-1471.patch
[jira] [Updated] (YARN-1471) The SLS simulator is not running the preemption policy for CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-1471: Attachment: YARN-1471.2.patch The SLS simulator is not running the preemption policy for CapacityScheduler Key: YARN-1471 URL: https://issues.apache.org/jira/browse/YARN-1471 Project: Hadoop YARN Issue Type: Bug Reporter: Carlo Curino Assignee: Carlo Curino Priority: Minor Attachments: SLSCapacityScheduler.java, YARN-1471.2.patch, YARN-1471.patch, YARN-1471.patch
[jira] [Commented] (YARN-1471) The SLS simulator is not running the preemption policy for CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852373#comment-13852373 ] Chris Douglas commented on YARN-1471: - I committed this. Thanks Carlo The SLS simulator is not running the preemption policy for CapacityScheduler Key: YARN-1471 URL: https://issues.apache.org/jira/browse/YARN-1471 Project: Hadoop YARN Issue Type: Bug Reporter: Carlo Curino Assignee: Carlo Curino Priority: Minor Fix For: 3.0.0 Attachments: SLSCapacityScheduler.java, YARN-1471.2.patch, YARN-1471.patch, YARN-1471.patch
[jira] [Created] (YARN-1518) Ensure CapacityScheduler remains compatible with SLS simulator
Chris Douglas created YARN-1518: --- Summary: Ensure CapacityScheduler remains compatible with SLS simulator Key: YARN-1518 URL: https://issues.apache.org/jira/browse/YARN-1518 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Chris Douglas Priority: Minor YARN-1471 added a workaround for the CapacityScheduler and monitors to work with the SLS simulator. This issue explores a cleaner integration, including tests to verify continued compatibility. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (YARN-2664) Improve RM webapp to expose info about reservations.
[ https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-2664: Assignee: Matteo Mazzucchelli Improve RM webapp to expose info about reservations. Key: YARN-2664 URL: https://issues.apache.org/jira/browse/YARN-2664 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Matteo Mazzucchelli Attachments: PlannerPage_screenshot.pdf, YARN-2664.1.patch, YARN-2664.patch YARN-1051 provides a new functionality in the RM to ask for reservation on resources. Exposing this through the webapp GUI is important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (YARN-2877) Extend YARN to support distributed scheduling
[ https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-2877: Comment: was deleted (was: (ignore that comment, was for YARN-2875)) Extend YARN to support distributed scheduling - Key: YARN-2877 URL: https://issues.apache.org/jira/browse/YARN-2877 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Sriram Rao This is an umbrella JIRA that proposes to extend YARN to support distributed scheduling. Briefly, some of the motivations for distributed scheduling are the following: 1. Improve cluster utilization by opportunistically executing tasks on otherwise idle resources on individual machines. 2. Reduce allocation latency for tasks where the scheduling time dominates (i.e., task execution time is much less than the time required for obtaining a container from the RM).
[jira] [Issue Comment Deleted] (YARN-2877) Extend YARN to support distributed scheduling
[ https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-2877: Comment: was deleted (was: Linking to HADOOP-11317 to cover project-wide use. I don't think yarn-common needs to explicitly declare a dependency on log4j, at least outside the test run. If you comment out that dependency, does everything still build?) Extend YARN to support distributed scheduling - Key: YARN-2877 URL: https://issues.apache.org/jira/browse/YARN-2877 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Sriram Rao
[jira] [Updated] (YARN-2877) Extend YARN to support distributed scheduling
[ https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-2877: Assignee: Konstantinos Karanasos Extend YARN to support distributed scheduling - Key: YARN-2877 URL: https://issues.apache.org/jira/browse/YARN-2877 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Sriram Rao Assignee: Konstantinos Karanasos
[jira] [Commented] (YARN-2718) Create a CompositeConatainerExecutor that combines DockerContainerExecutor and DefaultContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-2718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292258#comment-14292258 ] Chris Douglas commented on YARN-2718: - I share Allen's skepticism. Adding this to the CLC is an invasive change. If the purpose is debugging, wouldn't a composite CE that does the demux be sufficient? Are there other use cases this supports? Create a CompositeConatainerExecutor that combines DockerContainerExecutor and DefaultContainerExecutor --- Key: YARN-2718 URL: https://issues.apache.org/jira/browse/YARN-2718 Project: Hadoop YARN Issue Type: New Feature Reporter: Abin Shahab Attachments: YARN-2718.patch There should be a composite container executor that allows users to run their jobs in DockerContainerExecutor, but switch to DefaultContainerExecutor for debugging purposes.
[jira] [Commented] (YARN-3100) Make YARN authorization pluggable
[ https://issues.apache.org/jira/browse/YARN-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14306738#comment-14306738 ] Chris Douglas commented on YARN-3100: - Motivation for the conversion from {{QueueACL}} to the nearly identical, new {{YarnAuthorizationProvider.AccessType}} (like the introduction of {{PrivilegedEntity}}) is not obvious. Are these pluggable types? Are there other, future entities besides queues? Should the authorizer plugin perform the mapping from {{QueueACL}}? Just trying to understand the design... For the {{Default\*}} impl, partial updates for {{refreshQueues}} that become visible during the update and after a partial, failed update are hard to reason about. While it's a noop for external services, aren't these different semantics from the current implementation? Readers are blocked, so there are no locks necessary for modifications by {{setPermission}}? Make YARN authorization pluggable - Key: YARN-3100 URL: https://issues.apache.org/jira/browse/YARN-3100 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-3100.1.patch, YARN-3100.2.patch The goal is to make the YARN ACL model pluggable so as to integrate other authorization tools such as Apache Ranger or Sentry. Currently, we have - admin ACL - queue ACL - application ACL - timeline domain ACL - service ACL The proposal is to create a YarnAuthorizationProvider interface. The current implementation will be the default implementation. A Ranger or Sentry plug-in can implement this interface. Benefit: - Unify the code base. With the default implementation, we can get rid of each specific ACL manager such as AdminAclManager, ApplicationACLsManager, QueueAclsManager etc. - Enable Ranger or Sentry to do authorization for YARN.
[jira] [Commented] (YARN-3100) Make YARN authorization pluggable
[ https://issues.apache.org/jira/browse/YARN-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14306177#comment-14306177 ] Chris Douglas commented on YARN-3100: - [~aw], have you read through the patch? What it implements looks like a pretty straightforward application of the common ACL libraries to queues and applications. It just routes some of the YARN checks to a configurable component. Is there functionality implemented in the common libs that's not being used? A few quick questions: * What is the behavior of {{refreshQueues}}? It looks like the provider class remains fixed (should it throw an exception if the class in the conf doesn't match the singleton?), but every queue's ACLs get reset from the config. The refresh isn't transactional, though... if it fails partway through, the ACLs could be partially refreshed in the provider. Is that correct? If the provider is {{Configurable}}, then it also doesn't get reconfigured, as it will return the singleton from the first call to {{getInstance()}} * Could we avoid pluggable implementations with a {{Default\*}} class? A descriptive name is easier to change and... well, descriptive. * {{PrivilegedEntity}} is an odd class. Would it be possible to expand on its definition in the javadoc, and (as a public class) add annotations for its intended audience (HADOOP-5073)? Make YARN authorization pluggable - Key: YARN-3100 URL: https://issues.apache.org/jira/browse/YARN-3100 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-3100.1.patch, YARN-3100.2.patch
[jira] [Commented] (YARN-3100) Make YARN authorization pluggable
[ https://issues.apache.org/jira/browse/YARN-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308559#comment-14308559 ] Chris Douglas commented on YARN-3100: - bq. I agree with you that if construction of Q' fails, we possibly get a mix of Q' and Q ACLs, which happens in the existing code. I think the existing code doesn't have this property. ACLs [parsed|https://git1-us-west.apache.org/repos/asf?p=hadoop.git;a=blob;f=hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java;h=c1432101510b30cab5979223c4a52b813cfc7aee;hb=HEAD#l156] from the config are stored in a [member field|https://git1-us-west.apache.org/repos/asf?p=hadoop.git;a=blob;f=hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java;h=e4c26658b0bf5301892ce7c618402ece3a6ea360;hb=HEAD#l273]. If construction fails, those ACLs aren't installed. The patch moves enforcement to the authorizer:
{noformat}
 public boolean hasAccess(QueueACL acl, UserGroupInformation user) {
   synchronized (this) {
-    if (acls.get(acl).isUserAllowed(user)) {
+    if (authorizer.checkPermission(toAccessType(acl), queueEntity, user)) {
       return true;
     }
   }
{noformat}
Which is updated during construction of the replacement queue hierarchy.
[jira] [Commented] (YARN-3100) Make YARN authorization pluggable
[ https://issues.apache.org/jira/browse/YARN-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309909#comment-14309909 ] Chris Douglas commented on YARN-3100: - Agreed; definitely a separate JIRA. As state is copied from the old queues, some of the methods called in {{CSQueueUtils}} throw exceptions, similar to the case you found in {{LeafQueue}}.
[jira] [Commented] (YARN-3100) Make YARN authorization pluggable
[ https://issues.apache.org/jira/browse/YARN-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308126#comment-14308126 ] Chris Douglas commented on YARN-3100: - bq. The reinitializeQueues looks to be transactional, it instantiates all new sub queues first and then update the root queue and child queues accordingly. And the checkAccess chain will compete the same scheduler lock with the refreshQueue. If there's a queue with root _Q_, say we're constructing _Q'_. In the current patch, the {{YarnAuthorizationProvider}} singleton instance will get calls to {{setPermission()}} during construction of _Q'_. These (1) will be observable by readers of _Q_ who share the instance. I agree that if construction of _Q'_ fails then it won't get installed, but (2) _Q_ will run with a mix of _Q'_ and _Q_ ACLs because each call to {{setPermission()}} overwrites what was installed for _Q_. I'm curious if (1) and (2) are an artifact of the new plugin architecture or if this also happens in the existing code. Not for external implementations, but for the {{Default\*}} one. bq. Alternatively, the plug-in can choose to add new acl via the setPermission when refreshQueue is invoked, but not to replace existing acl. Also, whether to add new or update or no, this is something that plug-in itself can decide or make it configurable by user. Maybe I'm being dense, but I don't see how a plugin could implement those semantics cleanly. {{YarnAuthorizationProvider}} forces the instance to be a singleton, and it gets some sequence of calls to {{setPermission()}}. Since queues can't be deleted in the CS, I suppose it could track the sequence of calls that install ACLs and only publish new ACLs when it's received updates for everything, but that could still yield (2) if the refresh adds new queues before the refresh fails. 
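The partial-refresh concern in (1) and (2) can be made concrete with a toy model. This is a hypothetical sketch, not the patch's code: {{MixedAclDemo}}, its {{setPermission()}}, and {{failingRefresh()}} are invented stand-ins for a singleton provider backed by a {{ConcurrentHashMap}}. It shows how a refresh that fails partway leaves readers observing a mix of old and new ACLs.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the shared-singleton problem: setPermission()
// overwrites entries one queue at a time, so a refresh that fails partway
// leaves readers of the live hierarchy seeing a mix of old and new ACLs.
public class MixedAclDemo {
    static final Map<String, String> ACLS = new ConcurrentHashMap<>();

    static void setPermission(String queue, String acl) {
        ACLS.put(queue, acl); // immediately visible to all readers of the singleton
    }

    /** Simulates refreshQueues() failing after updating only some queues. */
    static void failingRefresh(List<String> queues, String newAcl, int failAfter) {
        int updated = 0;
        for (String q : queues) {
            if (updated++ == failAfter) {
                throw new IllegalStateException("refresh failed partway");
            }
            setPermission(q, newAcl);
        }
    }

    public static void main(String[] args) {
        setPermission("root.a", "old");
        setPermission("root.b", "old");
        try {
            failingRefresh(List.of("root.a", "root.b"), "new", 1);
        } catch (IllegalStateException e) {
            // the failed refresh is NOT rolled back
        }
        // root.a now carries the new ACL while root.b still has the old one
        System.out.println(ACLS.get("root.a") + "," + ACLS.get("root.b"));
    }
}
```

Under these assumptions, nothing short of staging the whole map and swapping it atomically avoids the mixed state.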
[jira] [Commented] (YARN-3100) Make YARN authorization pluggable
[ https://issues.apache.org/jira/browse/YARN-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14306937#comment-14306937 ] Chris Douglas commented on YARN-3100: - bq. I'm not sure if I get your point, the DefaultYarnAuthorizer currently uses a concurrentHashMap to store the acls, setPermission is currently used on queue initialization. So I think lock on setPermission is not needed ? Could the RM be in a state where the old version of ACLs are applied to one queue, but a new version is applied to another (a client observes the new ACLs while they're being installed)? I think this is true of scenarios where {{refreshQueues()}} fails, but I don't know if intermediate states are observable.
[jira] [Commented] (YARN-3100) Make YARN authorization pluggable
[ https://issues.apache.org/jira/browse/YARN-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308987#comment-14308987 ] Chris Douglas commented on YARN-3100: - Looking through {{AbstractCSQueue}} and {{CSQueueUtils}}, it looks like there are many misconfigurations that leave queues in an inconsistent state...
[jira] [Commented] (YARN-3074) Nodemanager dies when localizer runner tries to write to a full disk
[ https://issues.apache.org/jira/browse/YARN-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284587#comment-14284587 ] Chris Douglas commented on YARN-3074: - bq. catch FSError since it will be a common and recoverable error in this case. +1 Nodemanager dies when localizer runner tries to write to a full disk Key: YARN-3074 URL: https://issues.apache.org/jira/browse/YARN-3074 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Varun Saxena When a LocalizerRunner tries to write to a full disk it can bring down the nodemanager process. Instead of failing the whole process we should fail only the container and make a best attempt to keep going.
[jira] [Commented] (YARN-3177) Fix the order of the parameters in YarnConfiguration
[ https://issues.apache.org/jira/browse/YARN-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14318874#comment-14318874 ] Chris Douglas commented on YARN-3177: - [~brahmareddy] moving code for readability is completely reasonable. In this particular instance, {{YarnConfiguration}} is a set of fields... Javadoc orders them and devs will look up the symbol directly. Those two cover basically all the users of the class; it's almost never read. Restructuring it offers a low payoff, compared to maintaining the history of when and why that field was added to {{YarnConfiguration}}. Of course that's still available, but this adds another lookup for a developer, which is more common. Fix the order of the parameters in YarnConfiguration Key: YARN-3177 URL: https://issues.apache.org/jira/browse/YARN-3177 Project: Hadoop YARN Issue Type: Improvement Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Minor Attachments: YARN-3177.patch *1. Keep process principal and keytab in one place (NM and RM are not placed in order)*
{code}
public static final String RM_AM_MAX_ATTEMPTS = RM_PREFIX + "am.max-attempts";
public static final int DEFAULT_RM_AM_MAX_ATTEMPTS = 2;
/** The keytab for the resource manager. */
public static final String RM_KEYTAB = RM_PREFIX + "keytab";
/** The kerberos principal to be used for spnego filter for RM. */
public static final String RM_WEBAPP_SPNEGO_USER_NAME_KEY = RM_PREFIX + "webapp.spnego-principal";
/** The kerberos keytab to be used for spnego filter for RM. */
public static final String RM_WEBAPP_SPNEGO_KEYTAB_FILE_KEY = RM_PREFIX + "webapp.spnego-keytab-file";
{code}
*2. RM webapp address and port are not in order*
[jira] [Resolved] (YARN-3192) Empty handler for exception: java.lang.InterruptedException #WebAppProxy.java and #/ResourceManager.java
[ https://issues.apache.org/jira/browse/YARN-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas resolved YARN-3192. - Resolution: Not a Problem Calling {{System.exit(-1)}} is not an acceptable way to shut down the RM. Please review the surrounding code. I'm going to close this until we can tie a bug to this code. Graceful shutdown is difficult to effect, and this issue's scope is too narrow to contribute to it. [~brahmareddy], many of the JIRAs you're filing appear to be detected by automated tools. If the interrupt handling here can cause hangs, HA bugs, inconsistent replies to users, etc. then please file reports on the consequences, citing this as the source. Empty handler for exception: java.lang.InterruptedException #WebAppProxy.java and #/ResourceManager.java Key: YARN-3192 URL: https://issues.apache.org/jira/browse/YARN-3192 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Attachments: YARN-3192.patch The InterruptedException is completely ignored. As a result, any events causing this interrupt will be lost. File: org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
{code}
try {
  event = eventQueue.take();
} catch (InterruptedException e) {
  LOG.error("Returning, interrupted : " + e);
  return; // TODO: Kill RM.
}
{code}
File: org/apache/hadoop/yarn/server/webproxy/WebAppProxy.java
{code}
public void join() {
  if (proxyServer != null) {
    try {
      proxyServer.join();
    } catch (InterruptedException e) {
    }
  }
}
{code}
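For contrast with the empty catch block quoted above, the idiomatic alternative is to restore the thread's interrupt status instead of swallowing it. The sketch below is self-contained and hypothetical; {{JoinDemo}} is an invented stand-in, not the WebAppProxy code.

```java
// Minimal sketch of interrupt-preserving join(): instead of an empty
// catch block, re-set the interrupt flag so callers can still observe
// that the wait was interrupted.
public class JoinDemo {
    private final Thread worker;

    JoinDemo(Runnable task) {
        worker = new Thread(task);
        worker.setDaemon(true); // let the JVM exit even if the task is still running
        worker.start();
    }

    /** Waits for the worker; preserves interrupt status instead of dropping it. */
    public void join() {
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // do not swallow the interrupt
        }
    }

    public static void main(String[] args) {
        JoinDemo d = new JoinDemo(() -> {
            try { Thread.sleep(5_000); } catch (InterruptedException ignored) { }
        });
        Thread.currentThread().interrupt(); // pending interrupt on the joining thread
        d.join();                           // join() is interrupted, flag is restored
        System.out.println("interrupted=" + Thread.currentThread().isInterrupted());
    }
}
```

With an empty catch block the final check would report false; preserving the flag is what lets a caller like {{WebAppProxyServer}} decide how to shut down.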
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14294529#comment-14294529 ] Chris Douglas commented on YARN-1039: - Requiring accurate estimates is not realistic, but no service runs forever in the same container(s). If container leases can be renewed/refreshed, that's a manageable and realistic guarantee for the user (couldn't find a JIRA; it must exist). Migration, decommission, OS upgrades, and other operations-in-time on containers seem necessary to support long-running services, since preemption is comparably heavy-handed. Specifying a precise duration may be a little pedantic for the existing use cases, but it seems like the right abstraction. Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node
[jira] [Commented] (YARN-1983) Support heterogeneous container types at runtime on YARN
[ https://issues.apache.org/jira/browse/YARN-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14300975#comment-14300975 ] Chris Douglas commented on YARN-1983: - As in YARN-2718: can't this be implemented as a composite CE, rather than changing the CLC? Managing versions of the CE, selecting a compatible CE, matching in the scheduler, etc. will require more than the classname to match. Configuring multiple CEs covers some useful cases, but if a composite CE is sufficient to experiment, then we can avoid a kludge in the protocol. Support heterogeneous container types at runtime on YARN Key: YARN-1983 URL: https://issues.apache.org/jira/browse/YARN-1983 Project: Hadoop YARN Issue Type: Improvement Reporter: Junping Du Attachments: YARN-1983.2.patch, YARN-1983.patch Different container types (default, LXC, docker, VM box, etc.) have different semantics on isolation of security, namespace/env, performance, etc. Per discussions in YARN-1964, we have some good thoughts on supporting different types of containers running on YARN and specified by application at runtime, which largely enhances YARN's flexibility to meet heterogeneous apps' requirements on isolation at runtime.
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14294613#comment-14294613 ] Chris Douglas commented on YARN-1039: - [~cwelch] YARN shouldn't understand the lifecycle for a service or the progress/dependencies for task containers. As proposed, an AM will receive a lease on a container for some duration. Before the lease expires, it can relinquish the lease or request that it be renewed. While this adds some complexity in the AM implementation- it needs to track and renew its container leases- it's mostly library code that admits straightforward, naive implementations. The most obvious strawman would request all resources at the longest possible duration and always renew. Mapping an enumeration expressing an AM lifecycle into a policy for requesting, refreshing, and managing resources is an excellent client-side abstraction. Even if an implementation of YARN only receives (and only issues) leases from a fixed set of values, the underlying abstraction can admit arbitrary durations. An enumeration is a good API for applications, but the RM framework could have a more fine-grained substrate. Leases actually help services run under YARN. By way of example, refusing to renew a lease could signal that the node will be decommissioned, or that some cluster-wide invariant- like balanced utilization or fairness- is better met by (re)moving that container. Refusing to renew a lease- or renewing it for a shorter period- could signal the service to request new containers.
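The "obvious strawman" AM described in the comment above, which requests every resource at the longest possible duration and always renews, can be sketched against a purely hypothetical lease API. Nothing like {{NaiveLeaseAm}} or its {{Lease}} type exists in YARN; the sketch only illustrates how little logic the naive policy needs.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Hypothetical lease API: invented types illustrating the strawman policy
// of always requesting the maximum duration and renewing before expiry.
public class NaiveLeaseAm {
    static final Duration MAX_LEASE = Duration.ofHours(24); // assumed platform cap

    record Lease(String containerId, Duration remaining) {}

    final List<Lease> held = new ArrayList<>();

    Lease request(String containerId) {
        Lease l = new Lease(containerId, MAX_LEASE); // always ask for the max
        held.add(l);
        return l;
    }

    /** Renew anything close to expiry; the naive policy never relinquishes. */
    void renewTick(Duration threshold) {
        for (int i = 0; i < held.size(); i++) {
            Lease l = held.get(i);
            if (l.remaining().compareTo(threshold) < 0) {
                held.set(i, new Lease(l.containerId(), MAX_LEASE));
            }
        }
    }

    public static void main(String[] args) {
        NaiveLeaseAm am = new NaiveLeaseAm();
        am.request("container_01");
        // simulate time passing: the lease is nearly expired
        am.held.set(0, new Lease("container_01", Duration.ofMinutes(5)));
        am.renewTick(Duration.ofMinutes(10));
        System.out.println(am.held.get(0).remaining()); // renewed back to MAX_LEASE
    }
}
```

The point of the sketch is that the "complexity" the comment mentions is a periodic loop over held leases; a smarter policy only changes what {{renewTick}} decides.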
[jira] [Commented] (YARN-3100) Make YARN authorization pluggable
[ https://issues.apache.org/jira/browse/YARN-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313843#comment-14313843 ] Chris Douglas commented on YARN-3100: - Sorry, I didn't get to the patch over the weekend. Thanks for addressing the review feedback. Are there JIRAs following some of the types to be added to PrivilegedEntity? Just curious.
[jira] [Commented] (YARN-1983) Support heterogeneous container types at runtime on YARN
[ https://issues.apache.org/jira/browse/YARN-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313831#comment-14313831 ] Chris Douglas commented on YARN-1983: - bq. We still need a way to demux the executor to support the case of YARN cluster with a mix of executors. That'd mean some impact on the CLC, no? Policies that select the appropriate executor could demux on the contents of the CLC and not a dedicated field. A simple, static dispatch from an admin-configured list is a great place to start, but adding a string to the CLC that selects the executor class by name is difficult to evolve. Since the same semantics are available without changes to the platform, why bake these in? bq. I think my current patch is intrusive indeed but more general, right? I'm not sure I follow. How is it more general?
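The composite-CE idea in the comment above can be sketched without any change to the CLC: dispatch on something already in the launch context (here, an environment variable) from an admin-configured table. All names below ({{CompositeExecutorDemo}}, the {{CONTAINER_TYPE}} key) are invented for illustration and are not YARN APIs.

```java
import java.util.Map;

// Hypothetical composite executor: selects a delegate from the contents
// of the launch context (an env map here) instead of a new CLC field.
public class CompositeExecutorDemo {
    interface Executor { String launch(Map<String, String> env); }

    static class DefaultExecutor implements Executor {
        public String launch(Map<String, String> env) { return "default"; }
    }
    static class DockerExecutor implements Executor {
        public String launch(Map<String, String> env) { return "docker"; }
    }

    static class CompositeExecutor implements Executor {
        private final Map<String, Executor> byType; // admin-configured, static table
        private final Executor fallback;

        CompositeExecutor(Map<String, Executor> byType, Executor fallback) {
            this.byType = byType;
            this.fallback = fallback;
        }

        public String launch(Map<String, String> env) {
            // Demux key read from the launch context, not a dedicated protocol field
            String type = env.get("CONTAINER_TYPE");
            Executor e = (type == null) ? fallback : byType.getOrDefault(type, fallback);
            return e.launch(env);
        }
    }

    public static void main(String[] args) {
        Executor composite = new CompositeExecutor(
            Map.of("docker", new DockerExecutor()), new DefaultExecutor());
        System.out.println(composite.launch(Map.of("CONTAINER_TYPE", "docker")));
        System.out.println(composite.launch(Map.of())); // no key: fallback executor
    }
}
```

Because the dispatch table lives entirely in the NM's configuration, experimenting with new executor types requires no protocol change, which is the argument being made against adding the classname to the CLC.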
[jira] [Commented] (YARN-3192) Empty handler for exception: java.lang.InterruptedException #WebAppProxy.java and #/ResourceManager.java
[ https://issues.apache.org/jira/browse/YARN-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320645#comment-14320645 ] Chris Douglas commented on YARN-3192: - bq. w.r.t the WebAppProxy path; we could change the join() method to simply pass up the exception; the sole place it is used is WebAppProxyServer.main, which catches all throwables and exits with a (-1) AFAICT, there is no graceful shutdown for {{WebAppProxyServer}}; the intent is to exit on interrupt. This would print an error message, "Error starting Proxy server", when the proxy is shut down instead of silently exiting. Though catching the {{InterruptedException}} in {{WebAppProxyServer}} is arguably more correct, so throwing out of {{WebAppProxy::join()}} could be a useful change if there are ever other users of {{WebAppProxy}}. That said, I'm still not clear what this would achieve.
[jira] [Updated] (YARN-3369) Missing NullPointer check in AppSchedulingInfo causes RM to die
[ https://issues.apache.org/jira/browse/YARN-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-3369: Description: In AppSchedulingInfo.java the method checkForDeactivation() has these 2 consecutive lines:
{code}
ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY);
if (request.getNumContainers() > 0) {
{code}
The first line calls getResourceRequest and it can return null.
{code}
synchronized public ResourceRequest getResourceRequest(
    Priority priority, String resourceName) {
  Map<String, ResourceRequest> nodeRequests = requests.get(priority);
  return (nodeRequests == null) ? null : nodeRequests.get(resourceName);
}
{code}
The second line dereferences the pointer directly without a check. If the pointer is null, the RM dies.
{quote}2015-03-17 14:14:04,757 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler java.lang.NullPointerException
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.checkForDeactivation(AppSchedulingInfo.java:383)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.decrementOutstanding(AppSchedulingInfo.java:375)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateOffSwitch(AppSchedulingInfo.java:360)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:270)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:142)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1559)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1384)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1263)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:816)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:588)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:449)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1017)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1059)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:739)
at java.lang.Thread.run(Thread.java:722)
{color:red} *2015-03-17 14:14:04,758 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..*{color} {quote}
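The null-safe version of the dereference described above can be sketched with simplified stand-in types. This is not the actual scheduler code; {{NullCheckDemo}}, its string-keyed {{requests}} map, and {{hasOutstanding()}} are invented for illustration of the guard the report asks for.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the missing null check: treat an absent ResourceRequest
// as zero outstanding containers instead of dereferencing null.
public class NullCheckDemo {
    record ResourceRequest(int numContainers) {}

    static final Map<String, Map<String, ResourceRequest>> requests = new HashMap<>();

    // Mirrors the lookup that can return null when no request is registered
    static ResourceRequest getResourceRequest(String priority, String resourceName) {
        Map<String, ResourceRequest> nodeRequests = requests.get(priority);
        return (nodeRequests == null) ? null : nodeRequests.get(resourceName);
    }

    /** Guarded version of the checkForDeactivation() dereference. */
    static boolean hasOutstanding(String priority) {
        ResourceRequest request = getResourceRequest(priority, "ANY");
        return request != null && request.numContainers() > 0; // null-safe
    }

    public static void main(String[] args) {
        System.out.println(hasOutstanding("p1")); // no request registered: no NPE
        requests.put("p1", Map.of("ANY", new ResourceRequest(3)));
        System.out.println(hasOutstanding("p1"));
    }
}
```

With the unguarded `request.getNumContainers() > 0` from the description, the first call above would throw the NullPointerException shown in the stack trace.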
[jira] [Commented] (YARN-3338) Exclude jline dependency from YARN
[ https://issues.apache.org/jira/browse/YARN-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358963#comment-14358963 ] Chris Douglas commented on YARN-3338: - +1 lgtm Exclude jline dependency from YARN -- Key: YARN-3338 URL: https://issues.apache.org/jira/browse/YARN-3338 Project: Hadoop YARN Issue Type: Bug Components: build Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-3338.1.patch It was fixed in YARN-2815, but is broken again by YARN-1514.
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298135#comment-14298135 ] Chris Douglas commented on YARN-1039: - bq. That's not necessarily so, there are some cases where the type of life cycle for an application is important, for example, when determining whether or not it is open-ended (service) or a batch process which entails a notion of progress (session), at least for purposes of display. That's a fair distinction. Would you agree the YARN _scheduler_ should not use detailed information about progress, task dependencies, or service lifecycles? If an AM registers with a tag that affects the attributes displayed in dashboards, then issues like YARN-1079 can be resolved cleanly, as you and Zhijie propose. Steve has a point about mixed-mode AMs that run both long and short-lived containers (e.g., a long-lived service supporting a workflow composed of short tasks). If it's solely for display, then an enum seems adequate, but I'd like to better understand the use cases.
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549505#comment-14549505 ] Chris Douglas commented on YARN-1039: - The semantics of a boolean flag are opaque. The policies enforced by different RM configurations (and versions) will not be, and cannot be made to be, consistent. Application and container priority are already encoded (or in progress, YARN-1963), so it's not just preemption priority or cost. Affinity and anti-affinity are also covered by different features. Discussion has been wide-ranging because it is unclear what _long-lived_ guarantees across existing features (beyond removing the progress bar from the UI, which I hope we can stop mentioning). An implementation that only recognizes infinite and undefined leases could be mapped into a duration. Lease duration could also be used to communicate when security tokens cannot be renewed, short-lived guarantees for YARN-2877 containers, boundaries of YARN-1051 reservations, and planned decommissioning. In contrast, the long-lived flag cannot be used for these cases. We could expose probabilistic guarantees (which are what we give in reality), but that's a later issue. Considering the blockers more concretely: bq. (a) reservations (b) white-listed requests or (c) node-label requests getting stuck on a node used by other services' containers that don't exit. Aren't these handled by adding a timeout to allocations, which would also catch cases where this flag is _not_ set? The timeout value could be set across the scheduler to start, but could even be user-visible in later versions... All said, I don't have time to work on this, agree the API can be evolved from the flag, and am -0 on it.
Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node
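The lease-duration argument above can be made concrete with a small sketch. The names below are hypothetical illustrations, not proposed YARN API: a duration encoding subsumes the boolean flag (infinite vs. undefined) while also answering policy questions the flag cannot, such as whether a container outlives a token-renewal window.

```java
// Hypothetical sketch: a lease duration subsumes the boolean long-lived flag.
// None of these names are YARN API; values are illustrative.
class LeaseDurations {
    static final long INFINITE = Long.MAX_VALUE;
    static final long UNDEFINED = -1L;

    // An implementation that only recognizes the boolean flag maps cleanly
    // into the duration encoding; the reverse mapping loses information.
    static long fromLongLivedFlag(boolean longLived) {
        return longLived ? INFINITE : UNDEFINED;
    }

    // A question a duration can answer but a flag cannot: will the container
    // outlive the window in which its security tokens remain renewable?
    static boolean outlivesTokenWindow(long leaseMillis, long windowMillis) {
        return leaseMillis == INFINITE
            || (leaseMillis != UNDEFINED && leaseMillis > windowMillis);
    }
}
```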
[jira] [Commented] (YARN-3806) Proposal of Generic Scheduling Framework for YARN
[ https://issues.apache.org/jira/browse/YARN-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598163#comment-14598163 ] Chris Douglas commented on YARN-3806: - [~wshao] Please don't delete obsoleted versions of the design doc, as it orphans discussion about them. Also, as you're making updates, please note the changes so people don't have to diff the docs. Proposal of Generic Scheduling Framework for YARN - Key: YARN-3806 URL: https://issues.apache.org/jira/browse/YARN-3806 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Wei Shao Attachments: ProposalOfGenericSchedulingFrameworkForYARN-V1.05.pdf, ProposalOfGenericSchedulingFrameworkForYARN-V1.06.pdf Currently, a typical YARN cluster runs many different kinds of applications: production applications, ad hoc user applications, long-running services, and so on. Different YARN scheduling policies may be suitable for different applications. For example, capacity scheduling can manage production applications well, since each application gets a guaranteed resource share; fair scheduling can manage ad hoc user applications well, since it can enforce fairness among users. However, the current YARN scheduling framework doesn't have a mechanism for multiple scheduling policies to work hierarchically in one cluster. YARN-3306 discussed many issues with today's YARN scheduling framework and proposed a per-queue policy-driven framework. Specifically, it supports different scheduling policies for leaf queues; however, support for different scheduling policies at upper-level queues has not yet been seriously considered. A generic scheduling framework is proposed here to address these limitations. It supports different policies (fair, capacity, FIFO, and so on) for any queue consistently. The proposal tries to solve many other issues in the current YARN scheduling framework as well.
Two newly proposed scheduling policies, YARN-3807 and YARN-3808, are based on the generic scheduling framework brought up in this proposal.
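The hierarchical-composition idea in the proposal can be sketched as a toy model; every name below is illustrative and none of it is the YARN-3806 design. Each parent queue orders its children with its own pluggable comparator (fair, FIFO, capacity-style), so different policies compose level by level as container assignment descends the queue tree.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Toy model of per-queue pluggable policies composing hierarchically.
// Illustrative only; not the proposed framework's actual types.
class PolicyQueue {
    final String name;
    final double usedFraction;              // fraction of guaranteed capacity in use
    final List<PolicyQueue> children = new ArrayList<>();
    Comparator<PolicyQueue> policy;         // this queue's ordering over its children

    PolicyQueue(String name, double usedFraction) {
        this.name = name;
        this.usedFraction = usedFraction;
    }

    // Descend through the hierarchy, at each level letting that queue's own
    // policy choose which child is offered the next container.
    PolicyQueue nextToAssign() {
        if (children.isEmpty()) {
            return this;
        }
        return Collections.min(children, policy).nextToAssign();
    }
}
```

A "fair" parent would use a least-used-first comparator, a "fifo" parent a submission-time comparator, and so on, without the levels needing to agree on one policy.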
[jira] [Commented] (YARN-3119) Memory limit check need not be enforced unless aggregate usage of all containers is near limit
[ https://issues.apache.org/jira/browse/YARN-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588429#comment-14588429 ] Chris Douglas commented on YARN-3119: - Systems that embrace more forgiving resource enforcement are difficult to tune, particularly when jobs run in multiple environments with different constraints (as is common when moving from research/test to production). If jobs silently and implicitly use more resources than requested, then users only learn that their container is under-provisioned when the cluster workload shifts and their pipelines start to fail. I agree with [~aw]'s [feedback|https://issues.apache.org/jira/browse/YARN-3119?focusedCommentId=14303956&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14303956]. If this workaround is committed, it should be disabled by default and strongly discouraged. Memory limit check need not be enforced unless aggregate usage of all containers is near limit -- Key: YARN-3119 URL: https://issues.apache.org/jira/browse/YARN-3119 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3119.prelim.patch Today we kill any container preemptively even if the total usage of containers on that node is well within the limit for YARN. Instead, if we enforce the memory limit only when the total usage of all containers is close to some configurable ratio of the overall memory assigned to containers, we can allow for flexibility in container memory usage without adverse effects. This is similar in principle to how cgroups uses soft_limit_in_bytes.
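The enforcement rule the issue proposes, in the spirit of cgroups' soft_limit_in_bytes, can be sketched roughly as below. The names and method shape are illustrative, not NodeManager code: a container exceeding its own limit is only killed once aggregate usage of all containers nears a configurable ratio of the node's memory.

```java
// Rough sketch of the proposed soft-limit policy; illustrative names only.
class SoftMemoryLimit {
    // True if the container should be killed under the proposed policy:
    // over its own limit AND the node as a whole is under memory pressure.
    static boolean shouldKill(long containerUsage, long containerLimit,
                              long aggregateUsage, long nodeMemory,
                              double softLimitRatio) {
        boolean overLimit = containerUsage > containerLimit;
        boolean nodeUnderPressure = aggregateUsage >= (long) (softLimitRatio * nodeMemory);
        return overLimit && nodeUnderPressure;
    }
}
```

The tuning hazard raised in the comment is visible here: whether an over-limit container survives depends on `aggregateUsage`, i.e., on what everyone else on the node is doing at that moment.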
[jira] [Commented] (YARN-1983) Support heterogeneous container types at runtime on YARN
[ https://issues.apache.org/jira/browse/YARN-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584272#comment-14584272 ] Chris Douglas commented on YARN-1983: - (sorry for the delayed reply; missed this) bq. I was proposing we continue the same without adding a new CLC field. Are we both saying the same thing then? Yeah, I think we agree. We don't need to extend the CLC definition for this use case, because it's less invasive to add a composite CE that can inspect the CLC and demux on a set of rules. I scanned the patch on YARN-1964, and maybe I'm being dense, but I couldn't find the demux. It does some validation using patterns... Support heterogeneous container types at runtime on YARN Key: YARN-1983 URL: https://issues.apache.org/jira/browse/YARN-1983 Project: Hadoop YARN Issue Type: Improvement Reporter: Junping Du Attachments: YARN-1983.2.patch, YARN-1983.patch Different container types (default, LXC, Docker, VM box, etc.) have different semantics for isolation of security, namespace/env, performance, etc. Per discussions in YARN-1964, we have some good thoughts on supporting different types of containers running on YARN, specified by the application at runtime, which largely enhances YARN's flexibility to meet heterogeneous apps' isolation requirements at runtime.
[jira] [Commented] (YARN-3820) Collect disks usages on the node
[ https://issues.apache.org/jira/browse/YARN-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605889#comment-14605889 ] Chris Douglas commented on YARN-3820: - [~aw] Is there a corresponding part of the datanode already monitoring these resources? I looked, but found only the metrics. This JIRA and YARN-3819 only extend the monitoring. As Karthik pointed out in YARN-2745, refactoring for more unified resource monitoring is in YARN-3332. On the patch: looks good, though why does the disk need a {{forcedRead}} parameter? Collect disks usages on the node Key: YARN-3820 URL: https://issues.apache.org/jira/browse/YARN-3820 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 3.0.0 Reporter: Robert Grandl Assignee: Robert Grandl Labels: yarn-common, yarn-util Attachments: YARN-3820-1.patch, YARN-3820-2.patch, YARN-3820-3.patch, YARN-3820-4.patch In this JIRA we propose to collect disks usages on a node. This JIRA is part of a larger effort of monitoring resource usages on the nodes.
[jira] [Commented] (YARN-3820) Collect disks usages on the node
[ https://issues.apache.org/jira/browse/YARN-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605891#comment-14605891 ] Chris Douglas commented on YARN-3820: - Oh, didn't see YARN-3819. Will continue there. Collect disks usages on the node Key: YARN-3820 URL: https://issues.apache.org/jira/browse/YARN-3820 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 3.0.0 Reporter: Robert Grandl Assignee: Robert Grandl Labels: yarn-common, yarn-util Attachments: YARN-3820-1.patch, YARN-3820-2.patch, YARN-3820-3.patch, YARN-3820-4.patch In this JIRA we propose to collect disks usages on a node. This JIRA is part of a larger effort of monitoring resource usages on the nodes.
[jira] [Commented] (YARN-3819) Collect network usage on the node
[ https://issues.apache.org/jira/browse/YARN-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606014#comment-14606014 ] Chris Douglas commented on YARN-3819: - bq. I think that if we decide to move this to Common, we should move the whole ResourceCalculator; otherwise, just finish this one here. I'm willing to start the JIRA in Common (or reuse if anybody knows about a JIRA already pushing for that) to have the whole ResourceCalculator there. +1 Let's just do this and move on. Collect network usage on the node - Key: YARN-3819 URL: https://issues.apache.org/jira/browse/YARN-3819 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 3.0.0 Reporter: Robert Grandl Assignee: Robert Grandl Labels: yarn-common, yarn-util Attachments: YARN-3819-1.patch, YARN-3819-2.patch, YARN-3819-3.patch, YARN-3819-4.patch, YARN-3819-5.patch In this JIRA we propose to collect the network usage on a node. This JIRA is part of a larger effort of monitoring resource usages on the nodes.
[jira] [Commented] (YARN-3820) Collect disks usages on the node
[ https://issues.apache.org/jira/browse/YARN-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606047#comment-14606047 ] Chris Douglas commented on YARN-3820: - I understand its function; I'm curious why it was added (CPU doesn't include this). Did you notice an overhead? Collect disks usages on the node Key: YARN-3820 URL: https://issues.apache.org/jira/browse/YARN-3820 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 3.0.0 Reporter: Robert Grandl Assignee: Robert Grandl Labels: yarn-common, yarn-util Attachments: YARN-3820-1.patch, YARN-3820-2.patch, YARN-3820-3.patch, YARN-3820-4.patch In this JIRA we propose to collect disks usages on a node. This JIRA is part of a larger effort of monitoring resource usages on the nodes.
[jira] [Commented] (YARN-3784) Indicate preemption timout along with the list of containers to AM (preemption message)
[ https://issues.apache.org/jira/browse/YARN-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606205#comment-14606205 ] Chris Douglas commented on YARN-3784: - Minor: - Docs for the timeout don't include units - Many whitespace changes in {{FiCaSchedulerApp}} - change nested if to {{&&}} at: {noformat} +if (this.preemptionTimeout != 0) { + if (timeout > this.preemptionTimeout) { {noformat} - Would it be possible to test more than that the reported timeout is non-zero? If this used a {{Clock}} instead of calling {{System.currentTimeMillis}} directly, the unit test could be easier to write... If containers are preempted for multiple causes (e.g., over-capacity, NM decommission), then the time to preempt could vary widely. The ProportionalCPP also limits the preempted capacity per round, so a global timeout will be very pessimistic. Would it make sense to change {{timeout}} to be {{nextkill}}? More general solutions would be significantly more work... Indicate preemption timout along with the list of containers to AM (preemption message) --- Key: YARN-3784 URL: https://issues.apache.org/jira/browse/YARN-3784 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-3784.patch Currently during preemption, the AM is notified with a list of containers which are marked for preemption. Introducing a timeout duration along with this container list lets the AM know how much time it has to do a graceful shutdown of its containers (assuming a preemption policy is loaded in the AM). This will help in decommissioning-NM scenarios, where the NM will be decommissioned after a timeout (also killing containers on it). This timeout will indicate to the AM that those containers can be killed by the RM forcefully after the timeout.
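The Clock suggestion in the comment can be sketched as follows. Hadoop does ship a Clock abstraction (org.apache.hadoop.yarn.util.Clock) for exactly this purpose; the PreemptionTimer class here is a hypothetical stand-in for the patch's logic, not the actual code, shown only to illustrate why an injectable clock makes the timeout check testable without real waiting.

```java
// Minimal clock abstraction, in the shape of Hadoop's Clock interface.
interface Clock {
    long getTime();
}

// Hypothetical stand-in for the patch's timeout logic (not YARN code).
class PreemptionTimer {
    private final Clock clock;
    private final long preemptionTimeoutMs; // 0 disables the check
    private final long markedAtMs;          // when the container was marked

    PreemptionTimer(Clock clock, long preemptionTimeoutMs) {
        this.clock = clock;
        this.preemptionTimeoutMs = preemptionTimeoutMs;
        this.markedAtMs = clock.getTime();
    }

    // The nested ifs from the patch, collapsed into a single && condition.
    boolean expired() {
        return preemptionTimeoutMs != 0
            && clock.getTime() - markedAtMs > preemptionTimeoutMs;
    }
}
```

A unit test can then advance a fake clock deterministically instead of asserting only that the reported timeout is non-zero.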
[jira] [Commented] (YARN-3877) YarnClientImpl.submitApplication swallows exceptions
[ https://issues.apache.org/jira/browse/YARN-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627026#comment-14627026 ] Chris Douglas commented on YARN-3877: - * Not sure I understand this change: {noformat} +conf.setLong(YarnConfiguration. +YARN_CLIENT_APPLICATION_CLIENT_PROTOCOL_POLL_TIMEOUT_MS, 2000); {noformat} It seems like it would introduce timing bugs rather than prevent them. The {{\@Test}} timeout should prevent the test from hanging; if the poll timeout fires before the interrupt is triggered, then the unit test will fail. Does the config enforce a property that would be unverified without it? * If necessary, then it should probably also be relative to {{pollIntervalMs}} * This should probably be a separate test, instead of a subsection of {{testSubmitApplication}} YarnClientImpl.submitApplication swallows exceptions Key: YARN-3877 URL: https://issues.apache.org/jira/browse/YARN-3877 Project: Hadoop YARN Issue Type: Improvement Components: client Affects Versions: 2.7.2 Reporter: Steve Loughran Assignee: Varun Saxena Priority: Minor Attachments: YARN-3877.01.patch When {{YarnClientImpl.submitApplication}} spins waiting for the application to be accepted, any interruption during its sleep() calls is logged and swallowed. This makes it hard to interrupt the thread during shutdown. Really it should throw some form of exception and let the caller deal with it.
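The fix the issue asks for follows the standard interrupt idiom, sketched below. The method and class names are illustrative, not the actual YarnClientImpl code: instead of logging and swallowing the interrupt inside the poll loop, restore the thread's interrupt status and surface an exception so callers can shut down promptly.

```java
import java.io.IOException;
import java.util.function.BooleanSupplier;

// Illustrative shape of the submission poll loop; not YarnClientImpl itself.
class SubmitPoll {
    static void pollUntilAccepted(BooleanSupplier accepted, long pollIntervalMs)
            throws IOException {
        while (!accepted.getAsBoolean()) {
            try {
                Thread.sleep(pollIntervalMs);
            } catch (InterruptedException e) {
                // Restore the interrupt status and let the caller decide,
                // rather than logging and continuing to spin.
                Thread.currentThread().interrupt();
                throw new IOException("Interrupted while waiting for acceptance", e);
            }
        }
    }
}
```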
[jira] [Commented] (YARN-3612) Resource calculation in child tasks is CPU-heavy
[ https://issues.apache.org/jira/browse/YARN-3612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617391#comment-14617391 ] Chris Douglas commented on YARN-3612: - bq. Moreover, I have not added the config as I do not see anyone disabling it. Thoughts ? The config change could be useful for MR jobs that want to avoid the CPU overhead. Suggest changing calls from {{currentTimeMillis}} to {{nanoTime}}, since it's measuring durations. +1 overall, even without the change to {{Task}}. Resource calculation in child tasks is CPU-heavy Key: YARN-3612 URL: https://issues.apache.org/jira/browse/YARN-3612 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Todd Lipcon Labels: BB2015-05-RFC, performance Attachments: MAPREDUCE-4469.patch, MAPREDUCE-4469_rev2.patch, MAPREDUCE-4469_rev3.patch, MAPREDUCE-4469_rev4.patch, MAPREDUCE-4469_rev5.patch, YARN-3612.01.patch, YARN-3612.02.patch In doing some benchmarking on a hadoop-1 derived codebase, I noticed that each of the child tasks was doing a ton of syscalls. Upon stracing, I noticed that it's spending a lot of time looping through all the files in /proc to calculate resource usage. As a test, I added a flag to disable use of the ResourceCalculatorPlugin within the tasks. On a CPU-bound 500G-sort workload, this improved total job runtime by about 10% (map slot-seconds by 14%, reduce slot seconds by 8%)
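The nanoTime suggestion above rests on a standard distinction: System.currentTimeMillis tracks wall-clock time and can jump under NTP adjustment, while System.nanoTime is monotonic and intended for elapsed-time measurement. A minimal sketch (the helper name is illustrative, not from the patch):

```java
// Measure durations with the monotonic nanoTime source, not wall-clock time.
class Durations {
    static long elapsedMillis(long startNanos, long endNanos) {
        return (endNanos - startNanos) / 1_000_000L;
    }
}
```

Typical use: record `long start = System.nanoTime();` before the measured section and pass `System.nanoTime()` as the end when it completes.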
[jira] [Updated] (YARN-666) [Umbrella] Support rolling upgrades in YARN
[ https://issues.apache.org/jira/browse/YARN-666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-666: --- Assignee: (was: Brook Zhou) [Umbrella] Support rolling upgrades in YARN --- Key: YARN-666 URL: https://issues.apache.org/jira/browse/YARN-666 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Siddharth Seth Fix For: 2.6.0 Attachments: YARN_Rolling_Upgrades.pdf, YARN_Rolling_Upgrades_v2.pdf Jira to track changes required in YARN to allow rolling upgrades, including documentation and possible upgrade routes.