wilfred-s commented on code in PR #457: URL: https://github.com/apache/yunikorn-site/pull/457#discussion_r1682486677
########## docs/user_guide/preemption.md ##########

@@ -228,4 +228,31 @@ In this example, two imbalances are observed:
 | `rt.ten-a.queue-2` | 0 | 0 |
 | `rt.ten-b` | 15 | 10 |
 | `rt.ten-b.queue-3` | 15 | 10 |
-| `rt.sys` | 0 | 10 |
\ No newline at end of file
+| `rt.sys` | 0 | 10 |
+
+### Redistribution of Quota and Preemption Storm
+
+#### Redistribution of Quota
+
+Setting up guaranteed resources for the queue present at a higher level in the whole queue hierarchy helps to re-distribute the quota among different groups especially when workloads of the same priority run in different groups. Unlike the default scheduler, Yunikorn preempts even the workloads of the same priority to free up resources for pending workloads who deserve to get the resources as per guaranteed quota. At times, one needs this kind of queue set up in a real production cluster for redistribution.
+
+For example, root.region[1-N].country[1-N].state[1-N]
+
+
+
+This queue set up has N regions under “root”, each region has N countries. If administrators want to redistribute the workloads of the same priority among different regions, then it is better to define the guaranteed quota for each region so that preemption helps to reach the situation of running the workloads by redistribution based on the guaranteed quota each region is supposed to get. That way each region uses the resources it deserves to get at the maximum possible level from the overall cluster resources.
+
+#### Preemption Storm
+
+With setup like above, there is a side effect of increasing the possibilities of preemption storm or loop happening within the specific region between different state queues (siblings belonging to same parent).
+
+ReplicaSets are a good example to look at for looping and circular preemption. Each time a pod from a replica set is removed the ReplicaSet controller will create a new pod to make sure the set is complete. That auto-recreation could trigger loops as described below.
+
+
+
+Replica set <i>State1 Repl</i> runs in queue <i>State1</i>. Replica set <i>State2 Repl</i> runs in the queue <i>State2</i>. Both queues belong to the same parent queue (they are siblings), <i>Country1</i>. The pods all run with the same settings for priority and preemption. There is no space left on the cluster. <i>State1</i> has no guaranteed quota, 4 pods of each vcores:1 are running and multiple pods of each vcores:1 of the replica set are pending. <i>State2</i> has no guaranteed quota, 4 pods of each vcores:1 are running and multiple pods of each vcores:1 of the replica set are pending. Both region, <i>region1</i> and country, <i>country1</i> queue usage is vcores:4. Since <i>region1</i> has a guaranteed quota of vcores:10 and usage of vcores:8 lower than its guaranteed quota leading to starvation of resources. All the queues (including both direct or indirect) below the parent queue are starving as it inherits the “under guaranteed” behavior from above said parent queue, <i>region1</i> calculation unless each state (leaf) queue has its own guaranteed quota. Now, either one of these state queues can trigger preemption.
+
+Let's say, <i>state1</i> triggers preemption to meet resource requirements for pending pods.
+To make room for a <i>State1 Repl</i> pod, a pod from the <i>State2 Repl</i> set is preempted. Now, the pending <i>State1 Repl</i> pod moves from pending to running. Now, the next scheduling cycle comes. Let's say, <i>State2</i> triggers preemption to meet resource requirements for its pending pods. In addition to already existing pending pods, pod preempted (killed) in earlier scheduling cycles would have been recreated automatically by this time as it is a replica set. To make room for a <i>State2 Repl</i> pod, a pod from the <i>State1 Repl</i> set is preempted. Now, the pending <i>State2 Repl</i> pod moves from pending to running and preempted (killed) pod belonging to <i>State1 Repl</i> set would be recreated again. Now, the next scheduling cycle comes. Again, the whole loop repeats killing each other from the siblings without going anywhere leading to a preemption storm causing instability of the queues.
+
+Defining guaranteed resources at queues at lower level or at end leaf queues can avoid the preemption storm or loop happening in the cluster. Administrators should be aware of the side effects of setting up guaranteed resources at any specific location in the whole queue hierarchy to reap the best possible outcomes of the preemption process.

Review Comment (on "Setting up guaranteed resources for the queue present at a higher level ..."):
- drop 'whole' from "the whole queue"
- start new sentence at "especially"
- Yunikorn with a capital K
- drop 'even the' from "even the workloads"
- add "the queues' " to "as per the queues' guaranteed quota"
- set up is one word "setup"

Review Comment (on "This queue set up has N regions under “root” ..."):
- set up is one word "setup"

Review Comment (on "Let's say, <i>state1</i> triggers preemption ... causing instability of the queues."):
- Add an extension on this: it could even happen for a child queue below country 2 that gets caught in the preemption storm.

Review Comment (on "With setup like above, there is a side effect ..."):
- add "a" to "With a setup"
- replace "possibilities" with "chance"
- add "a" to "a preemption storm or loop"

Review Comment (on "Replica set <i>State1 Repl</i> runs in queue <i>State1</i> ..."):
- use backquotes for names like `State1 Repl` instead of `<i>State1 repl</i>` to stick with the md flow and not make it HTML
- Describe the state of each queue as a bullet list

Review Comment (on "Defining guaranteed resources at queues at lower level ..."):
- add "from" to "or loop from happening"
- drop "whole" from "the whole queue"

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
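For the `root.region[1-N].country[1-N].state[1-N]` hierarchy the new doc section describes, a region-level guarantee could be expressed in a YuniKorn `queues.yaml` roughly as follows. This is a sketch, not part of the PR: the partition/queue nesting follows the scheduler's config layout, but the queue names (`region1`, `country1`, `state1`, `state2`) are taken from the storm example and the `vcore` figure is illustrative (check your shim's vcore units before copying values).

```yaml
partitions:
  - name: default
    queues:
      - name: root
        queues:
          - name: region1
            # Guarantee set only at the region level, as the doc suggests:
            # preemption then rebalances same-priority workloads between
            # regions. Value is illustrative (the example uses vcores:10).
            resources:
              guaranteed:
                vcore: 10
            queues:
              - name: country1
                queues:
                  - name: state1   # leaf queues with no guarantee of their
                  - name: state2   # own inherit the region's starvation state
```

Note how `state1` and `state2` carry no `guaranteed` block of their own; that is exactly the configuration the Preemption Storm section warns can make siblings preempt each other in a loop.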
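The oscillation between `State1` and `State2` that the new doc text walks through can be sketched as a toy simulation. This is plain Python, not YuniKorn code; the queue names and pod counts come from the example (4 running, several pending per queue), and the strict alternation of preemptor/victim is a simplifying assumption:

```python
# Toy model of the sibling-queue preemption loop: two leaf queues on a
# full cluster, each running a ReplicaSet. Each scheduling cycle the
# starving queue preempts one pod from its sibling, and the ReplicaSet
# controller immediately recreates the victim, so nothing converges.

def simulate(cycles):
    running = {"state1": 4, "state2": 4}  # pods currently running per queue
    pending = {"state1": 2, "state2": 2}  # pods waiting per queue
    kills = 0
    for cycle in range(cycles):
        # Siblings alternate as preemptor (asker) and victim.
        asker, victim = ("state1", "state2") if cycle % 2 == 0 else ("state2", "state1")
        if pending[asker] > 0 and running[victim] > 0:
            running[victim] -= 1   # victim pod preempted (killed)
            kills += 1
            running[asker] += 1    # asker's pending pod starts in the freed slot
            pending[asker] -= 1
            pending[victim] += 1   # ReplicaSet controller recreates the killed pod
    return running, pending, kills

running, pending, kills = simulate(10)
# After an even number of cycles every queue is back exactly where it
# started, yet a pod was killed in every cycle: a preemption storm.
```

Guaranteed quotas on the leaf queues break the loop in this model: a queue at or above its own guarantee is no longer "starving", so neither side keeps qualifying as a preemptor.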
