Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature
On Sun, Nov 13, 2016 at 12:53:28PM -0600, Christoph Lameter wrote:
> On Fri, 11 Nov 2016, Peter Zijlstra wrote:
>
> > On Fri, Nov 11, 2016 at 12:46:37PM -0600, Christoph Lameter wrote:
> > > On Thu, 10 Nov 2016, Daniel Vacek wrote:
> > >
> > > > I believe Daniel's patches are the best thing we can do in the
> > > > current situation, as the behavior now seems rather buggy and does
> > > > not provide the expectations set when rt throttling was merged with
> > > > a default budget of 95% of CPU time. And if you configure it so
> > > > that it does (by disabling RT_RUNTIME_SHARE), it forces CPUs to go
> > > > idle needlessly while there can still be an rt task running that is
> > > > not really starving anyone. At least until a proper rework of rt
> > > > scheduling with a DL server is implemented.
> > >
> > > This looks like a fix for a bug and the company I work for is
> > > suffering as a result. Could we please merge that ASAP?
> >
> > What bug? And no, the patch has as many technical issues as it has
> > conceptual ones.
>
> There is a deadlock, Peter!!!

Describe please? Also, have you tried disabling RT_RUNTIME_SHARE?
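For reference, RT_RUNTIME_SHARE is a sched feature bit that can be
toggled at runtime through debugfs; a minimal sketch, assuming debugfs
is mounted at /sys/kernel/debug:

  # cat /sys/kernel/debug/sched_features   # lists RT_RUNTIME_SHARE or NO_RT_RUNTIME_SHARE
  # echo NO_RT_RUNTIME_SHARE > /sys/kernel/debug/sched_features

Writing a feature name with the NO_ prefix clears that bit; writing the
bare name sets it again.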
Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature
On Fri, 11 Nov 2016, Peter Zijlstra wrote:

> On Fri, Nov 11, 2016 at 12:46:37PM -0600, Christoph Lameter wrote:
> > On Thu, 10 Nov 2016, Daniel Vacek wrote:
> >
> > > I believe Daniel's patches are the best thing we can do in the
> > > current situation, as the behavior now seems rather buggy and does
> > > not provide the expectations set when rt throttling was merged with
> > > a default budget of 95% of CPU time. And if you configure it so
> > > that it does (by disabling RT_RUNTIME_SHARE), it forces CPUs to go
> > > idle needlessly while there can still be an rt task running that is
> > > not really starving anyone. At least until a proper rework of rt
> > > scheduling with a DL server is implemented.
> >
> > This looks like a fix for a bug and the company I work for is
> > suffering as a result. Could we please merge that ASAP?
>
> What bug? And no, the patch has as many technical issues as it has
> conceptual ones.

There is a deadlock, Peter!!!
Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature
On Fri, Nov 11, 2016 at 12:46:37PM -0600, Christoph Lameter wrote:
> On Thu, 10 Nov 2016, Daniel Vacek wrote:
>
> > I believe Daniel's patches are the best thing we can do in the current
> > situation, as the behavior now seems rather buggy and does not provide
> > the expectations set when rt throttling was merged with a default
> > budget of 95% of CPU time. And if you configure it so that it does (by
> > disabling RT_RUNTIME_SHARE), it forces CPUs to go idle needlessly
> > while there can still be an rt task running that is not really
> > starving anyone. At least until a proper rework of rt scheduling with
> > a DL server is implemented.
>
> This looks like a fix for a bug and the company I work for is suffering
> as a result. Could we please merge that ASAP?

What bug? And no, the patch has as many technical issues as it has
conceptual ones.
Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature
On Thu, 10 Nov 2016, Daniel Vacek wrote:

> I believe Daniel's patches are the best thing we can do in the current
> situation, as the behavior now seems rather buggy and does not provide
> the expectations set when rt throttling was merged with a default budget
> of 95% of CPU time. And if you configure it so that it does (by
> disabling RT_RUNTIME_SHARE), it forces CPUs to go idle needlessly while
> there can still be an rt task running that is not really starving
> anyone. At least until a proper rework of rt scheduling with a DL server
> is implemented.

This looks like a fix for a bug and the company I work for is suffering
as a result. Could we please merge that ASAP?
Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature
On 11/08/2016 08:50 PM, Peter Zijlstra wrote:
> > The problem is that using RT_RUNTIME_SHARE a CPU will almost always
> > borrow enough runtime to make a CPU-intensive rt task run forever...
> > well, not forever, but until the system crashes because a kworker
> > starved on this CPU. Kworkers are sched fair by design and users do
> > not always have a way to avoid them on an isolated CPU, for example.
> >
> > The user can then disable RT_RUNTIME_SHARE, but then the CPU will go
> > idle for (period - runtime) at each period... throwing CPU time in
> > the trash.
>
> So why is this a problem? You really should not be running that many
> FIFO tasks to begin with.

I agree that a spinning real-time task is not a good practice, but there
are people using it and they have their own reasons/metrics/evaluations
motivating them.

> So I'm willing to take out (or at least default disable)
> RT_RUNTIME_SHARE. But other than this, this never really worked to
> begin with. So it cannot be a regression. And we've lived this long
> with the 'problem'.

I agree! It would work perfectly in the absence of tasks pinned to a
processor, but that is not the case. Trying to serve the users that want
as much CPU time for -rt tasks as possible, the proposed patch seems to
be a better solution than RT_RUNTIME_SHARE, and it is way simpler! Even
though it is not as good as a DL server would be in the future, it seems
to be useful until then...

> We really should be doing the right thing here, not make a bigger mess.

I see, agree, and am anxious to have it! :-). Tommaso and I discussed DL
servers implementing such rt throttling. The most complicated point so
far (as Rostedt pointed out in another e-mail) will be to have DL servers
with arbitrary affinity, or serving tasks with arbitrary affinity. For
example, one DL server pinned to each CPU providing bandwidth for fair
tasks to run for (rt_period - rt_runtime) us at each rt_period... it
will take some time until someone proposes it.

-- Daniel
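For reference, the runtime/period budget discussed above is exposed as
two sysctls; a sketch of inspecting and adjusting the default 95% budget
(values in microseconds):

  # cat /proc/sys/kernel/sched_rt_period_us    # 1000000 by default
  # cat /proc/sys/kernel/sched_rt_runtime_us   # 950000 by default, i.e. 95%
  # echo 980000 > /proc/sys/kernel/sched_rt_runtime_us
  # echo -1 > /proc/sys/kernel/sched_rt_runtime_us   # disables rt throttling entirely

With the defaults and RT_RUNTIME_SHARE disabled, a CPU running only rt
tasks sits idle for period - runtime = 50 ms out of every second.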
Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature
On Mon, 7 Nov 2016, Steven Rostedt wrote:

> On Mon, 7 Nov 2016 21:06:50 +0100
> Daniel Bristot de Oliveira wrote:
>
> > The throttling allowed the kworker to run, but once the kworker went
> > to sleep, the RT tasks started to work again. In the previous
> > behavior, the system would either go idle, or the kworker would
> > starve because the runtime became infinite for RR tasks.
>
> I'm confused? Are you saying that RR tasks don't get throttled in the
> current code? That sounds like a bug to me.

Good. That's what I wanted to hear after all these justifications that
the system is just behaving as designed.
Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature
On Tue, Nov 08, 2016 at 08:29:49PM +0100, Daniel Bristot de Oliveira wrote:
> On 11/08/2016 07:05 PM, Peter Zijlstra wrote:
> > > I know what we want to do, but there's some momentous problems that
> > > need to be solved first.
> >
> > Like what?
>
> The problem is that using RT_RUNTIME_SHARE a CPU will almost always
> borrow enough runtime to make a CPU-intensive rt task run forever...
> well, not forever, but until the system crashes because a kworker
> starved on this CPU. Kworkers are sched fair by design and users do not
> always have a way to avoid them on an isolated CPU, for example.
>
> The user can then disable RT_RUNTIME_SHARE, but then the CPU will go
> idle for (period - runtime) at each period... throwing CPU time in the
> trash.

So why is this a problem? You really should not be running that many
FIFO tasks to begin with.

So I'm willing to take out (or at least default disable)
RT_RUNTIME_SHARE. But other than this, this never really worked to begin
with. So it cannot be a regression. And we've lived this long with the
'problem'.

And that means this is a 'feature', and that means I say no. We really
should be doing the right thing here, not make a bigger mess.
Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature
On 11/08/2016 07:05 PM, Peter Zijlstra wrote:
> > I know what we want to do, but there's some momentous problems that
> > need to be solved first.
>
> Like what?

The problem is that using RT_RUNTIME_SHARE a CPU will almost always
borrow enough runtime to make a CPU-intensive rt task run forever...
well, not forever, but until the system crashes because a kworker starved
on this CPU. Kworkers are sched fair by design and users do not always
have a way to avoid them on an isolated CPU, for example.

The user can then disable RT_RUNTIME_SHARE, but then the CPU will go idle
for (period - runtime) at each period... throwing CPU time in the trash.

> > Until then, we may be forced to continue with
> > hacks.
>
> Well, the more ill-specified hacks we put in, the harder it will be to
> replace them, because people will end up depending on them.

The proposed patch seems to be the expected behavior for users of rt
throttling - they want a safeguard for fair tasks while allowing -rt
tasks to run as much as possible.

I see (and completely agree) that a DL server for fair/rt tasks would be
the best way to deal with this problem, but it will take some time until
such a solution exists :-(. We even discussed this at Retis today, but
yeah, it will take some time even in the best case.

(thinking aloud... a DL server would react like the proposed patch, in
the sense that it would not be activated without tasks to run, and would
return the CPU to other tasks if the tasks inside the server finish
their job before the end of the DL server runtime...)

-- Daniel
Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature
On Tue, Nov 08, 2016 at 12:17:10PM -0500, Steven Rostedt wrote:
> On Tue, 8 Nov 2016 17:51:33 +0100
> Peter Zijlstra wrote:
>
> > You really should already know this.
>
> I know what we want to do, but there's some momentous problems that
> need to be solved first.

Like what?

> Until then, we may be forced to continue with
> hacks.

Well, the more ill-specified hacks we put in, the harder it will be to
replace them, because people will end up depending on them.
Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature
On Tue, 8 Nov 2016 17:51:33 +0100
Peter Zijlstra wrote:

> You really should already know this.

I know what we want to do, but there's some momentous problems that need
to be solved first. Until then, we may be forced to continue with hacks.

> As stands, the current rt cgroup code (and all this throttling code) is
> a giant mess (as in, it's not actually correct from an RT PoV). We
> should not make it worse by adding random hacks to it.
>
> The right way to go about doing this is by replacing it with something
> better, like the proposed DL server for FIFO tasks -- which is entirely
> non-trivial as well; see the existing discussion on that.

Right. The biggest issue that I see is how to assign affinities to FIFO
tasks and use a DL server to keep them from starving other tasks.

> I'm not entirely sure what this patch was supposed to fix, but it could
> be running CFS tasks with higher priority than RT for a window, instead

I'm a bit confused by the above sentence. Do you mean that this patch
causes CFS tasks to run for a period with a higher priority than RT?
Well, currently we have both CFS tasks and the "idle" task run higher
than RT, but this patch changes that to be just CFS tasks.

> of throttling RT tasks. This seems fairly ill specified, but something
> like that could easily be done with an explicit or slack-time DL server
> for CFS tasks.

If we can have a DL scheduler that can handle arbitrary affinities, then
all of this could be solved with that.

-- Steve
Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature
On Tue, Nov 08, 2016 at 09:07:40AM -0500, Steven Rostedt wrote:
> On Tue, 8 Nov 2016 12:59:58 +0100
> Peter Zijlstra wrote:
>
> > No, none of this stands a chance of being accepted.
> >
> > This is making bad code worse.
>
> Peter,
>
> Instead of a flat-out rejection, can you please provide some
> constructive criticism to let those that are working on this know what
> would be accepted? And what their next steps should be.
>
> There's obviously a problem with the current code; what steps do you
> recommend to fix it?

You really should already know this.

As stands, the current rt cgroup code (and all this throttling code) is
a giant mess (as in, it's not actually correct from an RT PoV). We
should not make it worse by adding random hacks to it.

The right way to go about doing this is by replacing it with something
better, like the proposed DL server for FIFO tasks -- which is entirely
non-trivial as well; see the existing discussion on that.

I'm not entirely sure what this patch was supposed to fix, but it could
be running CFS tasks with higher priority than RT for a window, instead
of throttling RT tasks. This seems fairly ill specified, but something
like that could easily be done with an explicit or slack-time DL server
for CFS tasks.
Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature
On Tue, 8 Nov 2016 12:59:58 +0100
Peter Zijlstra wrote:

> No, none of this stands a chance of being accepted.
>
> This is making bad code worse.

Peter,

Instead of a flat-out rejection, can you please provide some constructive
criticism to let those that are working on this know what would be
accepted? And what their next steps should be.

There's obviously a problem with the current code; what steps do you
recommend to fix it?

-- Steve
Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature
No, none of this stands a chance of being accepted.

This is making bad code worse.
Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature
Hi Daniel,

On 07/11/16 14:51, Daniel Bristot de Oliveira wrote:
> On 11/07/2016 11:31 AM, Tommaso Cucinotta wrote:

[...]

> > -) only issue might be that, if a non-RT task wakes up after the
> > unthrottle, it will have to wait, but worst-case it will have a
> > chance in the next throttling window
>
> In the current default behavior (RT_RUNTIME_SHARE), in a domain with
> more than two CPUs, the worst case easily becomes "infinity," because a
> CPU can borrow runtime from another CPU. There is no guarantee of
> minimum latency for non-rt tasks. Anyway, if the user wants to provide
> such a guarantee, they just need not enable this feature, while
> disabling RT_RUNTIME_SHARE (or run the non-rt task as a deadline
> task ;-))

I could only skim through the patch, so please forgive me if I'm talking
gibberish, but I think what Tommaso is saying is that with your current
approach, if an unlucky OTHER task wakes up just after you unthrottled an
rt_rq (by replenishment), it will have to wait until the next throttling
event. I agree that this is still better than the current status, and
that you can still configure the system to avoid this from happening.

What I'm wondering, though, is if we could modify your implementation and
only do the replenishment when the replenishment timer actually fires,
but let RT tasks continue to run, while their rt_rq is throttled, if no
OTHER task is present or wakes up. I guess this will complicate things,
and maybe doesn't buy us much; just an idea. :)

Otherwise, the patch looks good and useful to me.

Best,

- Juri
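For reference, Daniel's aside about running the non-rt task as a
deadline task can be done from the shell with chrt (util-linux 2.27 or
newer); a sketch with illustrative 10 ms runtime / 100 ms period values
(times in nanoseconds; ./non_rt_task is a placeholder):

  # chrt --deadline --sched-runtime 10000000 --sched-deadline 100000000 \
         --sched-period 100000000 0 ./non_rt_task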
Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature
Hi all,

since GRUB reclaiming has been mentioned, I am going to add some comments
on it :)

On Mon, 7 Nov 2016 14:51:37 +0100 Daniel Bristot de Oliveira wrote:

[...]

> The sum of allocated runtime for all DL tasks will not be greater than
> the RT throttling enforcement runtime. The DL scheduler admission
> control already avoids this by limiting the amount of CPU time all DL
> tasks can consume (see init_dl_bw()). So, DL tasks avoid the "global"
> throttling beforehand - in the admission control.
>
> GRUB might implement something <> for the DEADLINE scheduler.
> With GRUB, a deadline task will have more runtime than previously
> set/granted. But I am quite sure it will still be bounded by the sum of
> the already allocated DL runtime, which will continue being smaller
> than "to_ratio(global_rt_period(), global_rt_runtime())".

Well, it's not exactly like this... In the original GRUB algorithm [1]
(which was uni-processor only), the tasks were able to reclaim 100% of
the CPU time (in other words: with the original GRUB algorithm,
SCHED_DEADLINE tasks can starve all of the non-deadline tasks).

But in the patchset I submitted, I modified the algorithm to reclaim
only a specified fraction of the CPU time [2] (so that some CPU time is
left for non-deadline tasks). See patch 5/6 in my latest submission
(v3). I set the percentage of reclaimable CPU time equal to
"to_ratio(global_rt_period(), global_rt_runtime())" (so deadline tasks
can consume up to this fraction), but this can be changed if needed.

Finally, notice that if we are interested in hard schedulability (hard
respect of all the deadlines - this is not something that SCHED_DEADLINE
currently does), then the reclaiming algorithm must be modified and can
reclaim a smaller amount of CPU time (see [3,4] for details).

Luca

[1] Lipari, G., & Baruah, S. (2000). Greedy reclamation of unused
bandwidth in constant-bandwidth servers. In Real-Time Systems, 2000.
Euromicro RTS 2000. 12th Euromicro Conference on (pp. 193-200). IEEE.
[2] Abeni, L., Lelli, J., Scordino, C., & Palopoli, L. (2014, October).
Greedy CPU reclaiming for SCHED_DEADLINE. In Proceedings of the
Real-Time Linux Workshop (RTLWS), Dusseldorf, Germany.
[3] Abeni, L., Lipari, G., Parri, A., & Sun, Y. (2016, April). Multicore
CPU reclaiming: parallel or sequential?. In Proceedings of the 31st
Annual ACM Symposium on Applied Computing (pp. 1877-1884). ACM.
[4] https://arxiv.org/abs/1512.01984

> Am I missing something?
>
> > -) only issue might be that, if a non-RT task wakes up after the
> > unthrottle, it will have to wait, but worst-case it will have a
> > chance in the next throttling window
>
> In the current default behavior (RT_RUNTIME_SHARE), in a domain with
> more than two CPUs, the worst case easily becomes "infinity," because a
> CPU can borrow runtime from another CPU. There is no guarantee of
> minimum latency for non-rt tasks. Anyway, if the user wants to provide
> such a guarantee, they just need not enable this feature, while
> disabling RT_RUNTIME_SHARE (or run the non-rt task as a deadline
> task ;-))
>
> > -) an alternative to unthrottling might be temporary class downgrade
> > to sched_other, but that might be much more complex; instead this
> > one of Daniel's looks quite simple
>
> Yeah, decreasing the priority of the task would be something way more
> complicated and prone to errors. RT tasks would need to reduce their
> priority to a level higher than the IDLE task, but lower than
> SCHED_IDLE...
>
> > -) when considering also DEADLINE tasks, it might be good to think
> > about how we'd like the throttling of DEADLINE and RT tasks to
> > inter-relate, e.g.:
>
> Currently, DL tasks are limited (in the bw control) to the global RT
> throttling limit...
>
> I think that this might be an extension to GRUB... that is, extending
> the current behavior... so... things for the future - and IMHO it is
> another topic - way more challenging.
>
> Comments are welcome :-)
>
> -- Daniel
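For reference, to_ratio() expresses the runtime/period ratio as a
fixed-point fraction (scaled by 2^20), so with the default global values
the bound Luca and Daniel refer to is simply

  U_max = global_rt_runtime() / global_rt_period()
        = 950000 us / 1000000 us = 0.95

i.e. DL tasks (and, with [2], GRUB reclaiming) are admitted only up to
95% of the CPU time.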
Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature
On Mon, 7 Nov 2016 21:33:02 +0100
Daniel Bristot de Oliveira wrote:

> On 11/07/2016 09:16 PM, Steven Rostedt wrote:
> > I'm confused? Are you saying that RR tasks don't get throttled in the
> > current code? That sounds like a bug to me.
>
> If RT_RUNTIME_SHARE is enabled, the CPU on which the RR tasks are
> running (and pinned) will borrow RT runtime from another CPU, allowing
> the RR tasks to run forever. For example:
>
> [root@kiron debug]# cat /proc/sched_debug | grep rt_runtime
>   .rt_runtime: 950.00
>   .rt_runtime: 950.00
>   .rt_runtime: 950.00
>   .rt_runtime: 950.00
> [root@kiron debug]# echo RT_RUNTIME_SHARE > sched_features
> [root@kiron debug]# taskset -c 2 chrt -r 5 /home/bristot/f &
> [1] 23908
> [root@kiron debug]# taskset -c 2 chrt -r 5 /home/bristot/f &
> [2] 23915
> [root@kiron debug]# cat /proc/sched_debug | grep rt_runtime
>   .rt_runtime: 900.00
>   .rt_runtime: 950.00
>   .rt_runtime: 1000.00
>   .rt_runtime: 950.00
>
> You see? The rt_runtime of CPU 2 borrowed time from CPU 0.
>
> It is not a BUG but a feature (no jokes haha). With RT_RUNTIME_SHARE,
> the rt_runtime is effectively a global runtime. It works fine for tasks
> that can migrate... but that is not the case for per-cpu kworkers.

This still looks like a bug, or not the expected result. Perhaps we
shouldn't share when tasks are pinned. It doesn't make sense. It's like
pinning two deadline tasks to the same CPU, giving them 100% of that CPU,
and saying that it's really just 1/nr_cpus of usage, which would have the
same effect.

OK, it appears this is specific to RT_RUNTIME_SHARE, which is what causes
this strange behavior, and even more rationale to make this a default
option and perhaps even turn RT_RUNTIME_SHARE off by default.

-- Steve
Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature
On 11/07/2016 09:16 PM, Steven Rostedt wrote:
> I'm confused? Are you saying that RR tasks don't get throttled in the
> current code? That sounds like a bug to me.

If RT_RUNTIME_SHARE is enabled, the CPU on which the RR tasks are running
(and pinned) will borrow RT runtime from another CPU, allowing the RR
tasks to run forever. For example:

[root@kiron debug]# cat /proc/sched_debug | grep rt_runtime
  .rt_runtime: 950.00
  .rt_runtime: 950.00
  .rt_runtime: 950.00
  .rt_runtime: 950.00
[root@kiron debug]# echo RT_RUNTIME_SHARE > sched_features
[root@kiron debug]# taskset -c 2 chrt -r 5 /home/bristot/f &
[1] 23908
[root@kiron debug]# taskset -c 2 chrt -r 5 /home/bristot/f &
[2] 23915
[root@kiron debug]# cat /proc/sched_debug | grep rt_runtime
  .rt_runtime: 900.00
  .rt_runtime: 950.00
  .rt_runtime: 1000.00
  .rt_runtime: 950.00

You see? The rt_runtime of CPU 2 borrowed time from CPU 0.

It is not a BUG but a feature (no jokes haha). With RT_RUNTIME_SHARE, the
rt_runtime is effectively a global runtime. It works fine for tasks that
can migrate... but that is not the case for per-cpu kworkers.

-- Daniel
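For reproduction, /home/bristot/f above is presumably just a CPU-bound
busy loop; a stand-in sketch using only standard tools:

  # cat > f <<'EOF'
  #!/bin/sh
  while :; do :; done
  EOF
  # chmod +x f
  # taskset -c 2 chrt -r 5 ./f &   # pin a SCHED_RR prio-5 spinner to CPU 2
  # taskset -c 2 chrt -r 5 ./f &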
Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature
On Mon, 7 Nov 2016 21:06:50 +0100
Daniel Bristot de Oliveira wrote:

> The throttling allowed the kworker to run, but once the kworker went to
> sleep, the RT tasks started to work again. In the previous behavior,
> the system would either go idle, or the kworker would starve because
> the runtime became infinite for RR tasks.

I'm confused? Are you saying that RR tasks don't get throttled in the
current code? That sounds like a bug to me.

-- Steve
Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature
On 11/07/2016 09:00 PM, Steven Rostedt wrote:
> On Mon, 7 Nov 2016 13:54:12 -0600 (CST)
> Christoph Lameter wrote:
>
> > On Mon, 7 Nov 2016, Steven Rostedt wrote:
> >
> > > On Mon, 7 Nov 2016 13:30:15 -0600 (CST)
> > > Christoph Lameter wrote:
> > >
> > > > SCHED_RR tasks alternately running on one cpu can cause endless
> > > > deferral of kworker threads. With the global effect of the OS
> > > > processing reserved, it may be the case that the processor we are
> > > > executing on never gets any time. And if that kworker thread's
> > > > role is releasing a mutex (like the cgroup_lock), then deadlocks
> > > > can result.
> > >
> > > I believe SCHED_RR tasks will still throttle if they use up too
> > > much of the CPU. But I still don't see how this patch helps your
> > > situation.
> >
> > The kworker thread will be able to make progress? Or am I not reading
> > this correctly?
>
> If the kworker is SCHED_OTHER, then it will be able to make progress if
> the RT tasks are throttled.
>
> What Daniel's patch does is turn off throttling of the RT tasks if
> there's no other task on the run queue.

Here is an example of two spinning RR tasks (f-22466 & f-22473) and an
other task (o-22506):

f-22466 [002] d... 79045.641364: sched_switch: prev_comm=f prev_pid=22466 prev_prio=94 prev_state=R ==> next_comm=o next_pid=22506 next_prio=120
o-22506 [002] d... 79045.690379: sched_switch: prev_comm=o prev_pid=22506 prev_prio=120 prev_state=R ==> next_comm=f next_pid=22466 next_prio=94
f-22466 [002] d... 79045.725359: sched_switch: prev_comm=f prev_pid=22466 prev_prio=94 prev_state=R ==> next_comm=f next_pid=22473 next_prio=94
f-22473 [002] d... 79045.825356: sched_switch: prev_comm=f prev_pid=22473 prev_prio=94 prev_state=R ==> next_comm=f next_pid=22466 next_prio=94
f-22466 [002] d... 79045.925350: sched_switch: prev_comm=f prev_pid=22466 prev_prio=94 prev_state=R ==> next_comm=f next_pid=22473 next_prio=94
f-22473 [002] d... 79046.025346: sched_switch: prev_comm=f prev_pid=22473 prev_prio=94 prev_state=R ==> next_comm=f next_pid=22466 next_prio=94
f-22466 [002] d... 79046.125346: sched_switch: prev_comm=f prev_pid=22466 prev_prio=94 prev_state=R ==> next_comm=f next_pid=22473 next_prio=94
f-22473 [002] d... 79046.225337: sched_switch: prev_comm=f prev_pid=22473 prev_prio=94 prev_state=R ==> next_comm=f next_pid=22466 next_prio=94
f-22466 [002] d... 79046.325333: sched_switch: prev_comm=f prev_pid=22466 prev_prio=94 prev_state=R ==> next_comm=f next_pid=22473 next_prio=94
f-22473 [002] d... 79046.425328: sched_switch: prev_comm=f prev_pid=22473 prev_prio=94 prev_state=R ==> next_comm=f next_pid=22466 next_prio=94
f-22466 [002] d... 79046.525324: sched_switch: prev_comm=f prev_pid=22466 prev_prio=94 prev_state=R ==> next_comm=f next_pid=22473 next_prio=94
f-22473 [002] d... 79046.625319: sched_switch: prev_comm=f prev_pid=22473 prev_prio=94 prev_state=R ==> next_comm=f next_pid=22466 next_prio=94
f-22466 [002] d... 79046.641320: sched_switch: prev_comm=f prev_pid=22466 prev_prio=94 prev_state=R ==> next_comm=o next_pid=22506 next_prio=120
o-22506 [002] d... 79046.690335: sched_switch: prev_comm=o prev_pid=22506 prev_prio=120 prev_state=R ==> next_comm=f next_pid=22466 next_prio=94

The throttling is per-rq, so even if the RR tasks keep switching between
each other, the throttling will take place if there is any sched other
task. In Christoph's case, the other task will be the kworker, like
below:

f-22466 [002] d... 79294.430542: sched_switch: prev_comm=f prev_pid=22466 prev_prio=94 prev_state=R ==> next_comm=f next_pid=22473 next_prio=94
f-22473 [002] d... 79294.483539: sched_switch: prev_comm=f prev_pid=22473 prev_prio=94 prev_state=R ==> next_comm=kworker/2:1 next_pid=22198 next_prio=120
kworker/2:1-22198 [002] d... 79294.483544: sched_switch: prev_comm=kworker/2:1 prev_pid=22198 prev_prio=120 prev_state=S ==> next_comm=f next_pid=22473 next_prio=94
f-22473 [002] d... 79294.530537: sched_switch: prev_comm=f prev_pid=22473 prev_prio=94 prev_state=R ==> next_comm=f next_pid=22466 next_prio=94
f-22466 [002] d... 79294.630541: sched_switch: prev_comm=f prev_pid=22466 prev_prio=94 prev_state=R ==> next_comm=f next_pid=22473 next_prio=94

The throttling allowed the kworker to run, but once the kworker went to
sleep, the RT tasks started to work again. In the previous behavior, the
system would either go idle, or the kworker would starve because the
runtime became infinite for RR tasks.

-- Daniel
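For reference, traces like the above come from the sched_switch
tracepoint and can be captured with ftrace; a sketch, assuming debugfs
is mounted:

  # cd /sys/kernel/debug/tracing
  # echo 1 > events/sched/sched_switch/enable
  # cat trace_pipe      # the [002] column identifies the CPU
  # echo 0 > events/sched/sched_switch/enable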
Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature
On Mon, 7 Nov 2016 13:30:15 -0600 (CST)
Christoph Lameter wrote:

> SCHED_RR tasks alternately running on one cpu can cause endless
> deferral of kworker threads. With the global effect of the OS
> processing reserved, it may be the case that the processor we are
> executing on never gets any time. And if that kworker thread's role is
> releasing a mutex (like the cgroup_lock), then deadlocks can result.

I believe SCHED_RR tasks will still throttle if they use up too much of
the CPU. But I still don't see how this patch helps your situation.

-- Steve
Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature
On Mon, 7 Nov 2016 13:54:12 -0600 (CST)
Christoph Lameter wrote:

> On Mon, 7 Nov 2016, Steven Rostedt wrote:
>
> > On Mon, 7 Nov 2016 13:30:15 -0600 (CST)
> > Christoph Lameter wrote:
> >
> > > SCHED_RR tasks alternately running on one cpu can cause endless
> > > deferral of kworker threads. With the global effect of the OS
> > > processing reserved, it may be the case that the processor we are
> > > executing on never gets any time. And if that kworker thread's role
> > > is releasing a mutex (like the cgroup_lock), then deadlocks can
> > > result.
> >
> > I believe SCHED_RR tasks will still throttle if they use up too much
> > of the CPU. But I still don't see how this patch helps your
> > situation.
>
> The kworker thread will be able to make progress? Or am I not reading
> this correctly?

If the kworker is SCHED_OTHER, then it will be able to make progress if
the RT tasks are throttled.

What Daniel's patch does is turn off throttling of the RT tasks if
there's no other task on the run queue.

-- Steve
Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature
On Mon, 7 Nov 2016, Steven Rostedt wrote: > On Mon, 7 Nov 2016 13:30:15 -0600 (CST) > Christoph Lameter wrote: > > > SCHED_RR tasks alternately running on one cpu can cause endless deferral of > > kworker threads. With the global effect of the OS processing reserved > > it may be the case that the processor we are executing on never gets any > > time. And if that kworker thread's role is releasing a mutex (like the > > cgroup_lock) then deadlocks can result. > > I believe SCHED_RR tasks will still throttle if they use up too much of > the CPU. But I still don't see how this patch helps your situation. The kworker thread will be able to make progress? Or am I not reading this correctly?
Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature
On Mon, 7 Nov 2016, Steven Rostedt wrote: > On Mon, 7 Nov 2016 10:55:38 -0600 (CST) > Christoph Lameter wrote: > > > On Mon, 7 Nov 2016, Daniel Bristot de Oliveira wrote: > > > > > With these two options set, the user will guarantee some runtime > > > for non-rt-tasks on all CPUs, while keeping real-time tasks running > > > as much as possible. > > > > Excellent, this would improve the situation with deadlocks as a result of > > cgroup_locks not being released due to lack of workqueue processing. > > ?? What deadlocks do you see? I mean, can you show the situation that > throttling RT tasks will cause deadlock? > > Sorry, but I'm just not seeing it. SCHED_RR tasks alternately running on one cpu can cause endless deferral of kworker threads. With the global effect of the OS processing reserved it may be the case that the processor we are executing on never gets any time. And if that kworker thread's role is releasing a mutex (like the cgroup_lock) then deadlocks can result.
Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature
On Mon, 7 Nov 2016 19:49:03 +0100 Daniel Bristot de Oliveira wrote: > On 11/07/2016 07:32 PM, Steven Rostedt wrote: > >> Excellent, this would improve the situation with deadlocks as a result of > >> > cgroup_locks not being released due to lack of workqueue processing. > > ?? What deadlocks do you see? I mean, can you show the situation that > > throttling RT tasks will cause deadlock? > > > > Sorry, but I'm just not seeing it. > > It is not a deadlock in the theoretical sense of the word, but rather > a side effect of the starvation - one that looks like a deadlock. > > There is a case where the removal of a cgroup dir calls > lru_add_drain_all(), which might schedule a kworker on the CPU that is > running the spinning-rt task. The kworker will starve - because kworkers are > SCHED_OTHER by design - and lru_add_drain_all() will wait forever while > holding the cgroup lock, and this will cause a lot of problems for other > tasks. I understand the issue with not throttling an RT task, but this patch is about not throttling! That is, what scenario is there that will cause a "deadlock" or deadlock-like situation to happen when we *do* throttle, where not throttling will work better, as this patch would have? -- Steve
Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature
On 11/07/2016 07:32 PM, Steven Rostedt wrote: >> Excellent, this would improve the situation with deadlocks as a result of >> > cgroup_locks not being released due to lack of workqueue processing. > ?? What deadlocks do you see? I mean, can you show the situation that > throttling RT tasks will cause deadlock? > > Sorry, but I'm just not seeing it. It is not a deadlock in the theoretical sense of the word, but rather a side effect of the starvation - one that looks like a deadlock. There is a case where the removal of a cgroup dir calls lru_add_drain_all(), which might schedule a kworker on the CPU that is running the spinning-rt task. The kworker will starve - because kworkers are SCHED_OTHER by design - and lru_add_drain_all() will wait forever while holding the cgroup lock, and this will cause a lot of problems for other tasks. This problem was fixed on -rt using an -rt specific locking mechanism, but the problem still exists in the non-rt kernel. Btw, this is just one example of the side effects of the starvation of non-rt tasks. We could list more examples... -- Daniel
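A sketch of how that starvation can be observed, while an RT hog like the spinner shown earlier monopolizes a CPU. The cgroup mount point and the use of the v1 memory controller are assumptions for illustration, not details from this thread; adjust the path for your system and run as root.

#include <stdio.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
	/* hypothetical test directory; adjust to your cgroup mount */
	const char *dir = "/sys/fs/cgroup/memory/greed-test";
	struct timespec t0, t1;

	if (mkdir(dir, 0755) == -1) {
		perror("mkdir");
		return 1;
	}
	clock_gettime(CLOCK_MONOTONIC, &t0);
	/*
	 * The removal path may block in lru_add_drain_all() waiting for
	 * per-CPU kworkers; with an unthrottled RT hog it never returns.
	 */
	if (rmdir(dir) == -1)
		perror("rmdir");
	clock_gettime(CLOCK_MONOTONIC, &t1);
	printf("rmdir took %.3f s\n",
	       (double)(t1.tv_sec - t0.tv_sec) +
	       (double)(t1.tv_nsec - t0.tv_nsec) / 1e9);
	return 0;
}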
Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature
On Mon, 7 Nov 2016 13:30:46 -0500 Steven Rostedt wrote: > On Mon, 7 Nov 2016 12:22:21 -0600 > Clark Williams wrote: > > > I'm still reviewing the patch, but I have to wonder why bother with making > > it a scheduler feature? > > > > The SCHED_FIFO definition allows a fifo thread to starve others > > because a fifo task will run until it yields. Throttling was added as > > a safety valve to allow starved SCHED_OTHER tasks to get some cpu > > time. Adding this unconditionally gets us a safety valve for > > throttling a badly written fifo task, but allows the fifo task to > > continue to consume cpu cycles if it's not starving anyone. > > > > Or am I missing something that's blazingly obvious? > > Or I say make it the default. If people want the old behavior, they can > modify SCHED_FEATURES to do so. > Ok, I can see wanting the previous behavior. Clark
Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature
On 11/07/2016 07:30 PM, Steven Rostedt wrote: >> I'm still reviewing the patch, but I have to wonder why bother with making >> it a scheduler feature? >> > >> > The SCHED_FIFO definition allows a fifo thread to starve others >> > because a fifo task will run until it yields. Throttling was added as >> > a safety valve to allow starved SCHED_OTHER tasks to get some cpu >> > time. Adding this unconditionally gets us a safety valve for >> > throttling a badly written fifo task, but allows the fifo task to >> > continue to consume cpu cycles if it's not starving anyone. >> > >> > Or am I missing something that's blazingly obvious? > Or I say make it the default. If people want the old behavior, they can > modify SCHED_FEATURES to do so. I added it as a feature to keep the current behavior by default. Currently, we have two throttling modes: With RT_RUNTIME_SHARE (default): before throttling, try to borrow some runtime from another CPU. Without RT_RUNTIME_SHARE: throttle the RT task, even if there is nothing else to do. The problem with the first is that a CPU can easily borrow enough runtime to let the spinning rt task run forever, allowing the starvation of non-rt tasks and hence invalidating the mechanism. The problem with the second is that (with the default values) the CPU will be idle 5% of the time. IMHO, the balanced behavior is the GREED option with RT_RUNTIME_SHARE disabled: the non-rt tasks will be able to run, while avoiding the CPU going idle. We can turn it on by default by changing the default feature flags. Moreover, AFAICS, these sched options are static keys, so they are very low overhead conditions. -- Daniel
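For concreteness, the 5% figure comes from the default global knobs, sched_rt_runtime_us = 950000 and sched_rt_period_us = 1000000. A small helper (a sketch, not part of the patch) that prints the resulting budget:

#include <stdio.h>

static int read_long(const char *path, long *val)
{
	FILE *f = fopen(path, "r");
	int ok = 0;

	if (f) {
		ok = (fscanf(f, "%ld", val) == 1);
		fclose(f);
	}
	return ok;
}

int main(void)
{
	long runtime, period;

	if (!read_long("/proc/sys/kernel/sched_rt_runtime_us", &runtime) ||
	    !read_long("/proc/sys/kernel/sched_rt_period_us", &period)) {
		fprintf(stderr, "could not read the RT bandwidth sysctls\n");
		return 1;
	}
	if (runtime == -1)	/* -1 disables RT throttling entirely */
		printf("RT throttling disabled\n");
	else
		printf("RT budget: %ld/%ld us = %.1f%% of each period\n",
		       runtime, period, 100.0 * runtime / period);
	return 0;
}

With the defaults this prints a 95.0% budget; without RT_RUNTIME_SHARE the remaining 5% is exactly the per-period idle gap that RT_RUNTIME_GREED reclaims when no SCHED_OTHER task is runnable.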
Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature
On Mon, 7 Nov 2016 10:55:38 -0600 (CST) Christoph Lameter wrote: > On Mon, 7 Nov 2016, Daniel Bristot de Oliveira wrote: > > > With these two options set, the user will guarantee some runtime > > for non-rt-tasks on all CPUs, while keeping real-time tasks running > > as much as possible. > > Excellent, this would improve the situation with deadlocks as a result of > cgroup_locks not being released due to lack of workqueue processing. ?? What deadlocks do you see? I mean, can you show the situation that throttling RT tasks will cause deadlock? Sorry, but I'm just not seeing it. -- Steve
Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature
On Mon, 7 Nov 2016 12:22:21 -0600 Clark Williams wrote: > I'm still reviewing the patch, but I have to wonder why bother with making it > a scheduler feature? > > The SCHED_FIFO definition allows a fifo thread to starve others > because a fifo task will run until it yields. Throttling was added as > a safety valve to allow starved SCHED_OTHER tasks to get some cpu > time. Adding this unconditionally gets us a safety valve for > throttling a badly written fifo task, but allows the fifo task to > continue to consume cpu cycles if it's not starving anyone. > > Or am I missing something that's blazingly obvious? Or I say make it the default. If people want the old behavior, they can modify SCHED_FEATURES to do so. -- Steve
Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature
On Mon, 7 Nov 2016 19:03:08 +0100 Tommaso Cucinotta wrote: > On 07/11/2016 14:51, Daniel Bristot de Oliveira wrote: > > Hi Tommaso, > > Hi, > > I'm cc-ing Luca for GRUB et al., pls find a few further notes below... Thanks Tommaso! I've seen the email on the linux-rt-users mailing list, and I'll reply there about GRUB... Thanks, Luca > > > On 11/07/2016 11:31 AM, Tommaso Cucinotta wrote: > >> as anticipated live to Daniel: > >> -) +1 for the general concept, we'd need something similar also for > >> SCHED_DEADLINE > > > > In short: the sum of the runtime of deadline tasks will not be > > greater than the "to_ratio(global_rt_period(), > > global_rt_runtime())" - see init_dl_bw(). Therefore, the DL rq will > > not be throttled by the RT throttling mechanism. > > > > Extended: RT tasks' throttling aims to bound, for all CPUs of a > > domain - when RT_RUNTIME_SHARE is enabled; or per-rq - > > when RT_RUNTIME_SHARE is disabled; the amount of time that RT > > tasks can run continuously, in such a way as to provide some CPU time > > for non-real-time tasks to run. RT tasks need this global/local > > throttling mechanism to avoid the starvation of non-rt tasks, > > because RT tasks do not have a limited runtime - an RT task (or > > taskset) can run for an infinite runtime. > > > > DL tasks' throttling has another meaning. DL tasks' throttling aims > > to prevent *a* DL task from running for more than *its own* > > pre-allocated runtime. > > sure, and having an option to let it run for longer, if there's > nothing else running in the system, is still interesting for pretty > much similar reasons to those being discussed in this thread ... > > > The sum of allocated runtime for all DL tasks will not be greater > > than the RT throttling enforcement runtime. The DL scheduler admission > > control already ensures this by limiting the amount of CPU time all > > DL tasks can consume (see init_dl_bw()). So, DL tasks avoid > > the "global" throttling beforehand - in the admission control. > > > > GRUB might implement something <> for the DEADLINE > > scheduler. With GRUB, a deadline task will have more runtime than > > previously set/granted. > > yes, the main difference being: GRUB will let a DL task run for longer > than its own runtime, but still let it starve anything below (RT as > well as OTHER tasks); perhaps Luca (cc) has some further comment on > this... > > Thanks, > > T. > > > But I am quite sure it will still be bounded by the sum > > of the already allocated DL runtime, which will continue being > > smaller than "to_ratio(global_rt_period(), global_rt_runtime())". > > > > Am I missing something? > > > >> -) only issue might be that, if a non-RT task wakes up after the > >> unthrottle, it will have to wait, but worst-case it will have a > >> chance in the next throttling window > > > > In the current default behavior (RT_RUNTIME_SHARE), in a domain > > with more than two CPUs, the worst case easily becomes "infinity," > > because a CPU can borrow runtime from another CPU. There is no > > guarantee of minimum latency for non-rt tasks. Anyway, if the user > > wants to provide such a guarantee, they just need not enable this > > feature, while disabling RT_RUNTIME_SHARE (or run the non-rt task > > as a deadline task ;-)) > > > >> -) an alternative to unthrottling might be temporary class > >> downgrade to sched_other, but that might be much more complex, > >> instead this Daniel's one looks quite simple > > > > Yeah, decreasing the priority of the task would be way more > > complicated and prone to errors. RT tasks would need to reduce their > > priority to a level higher than the IDLE task, but lower than > > SCHED_IDLE... > > > >> -) when considering also DEADLINE tasks, it might be good to think > >> about how we'd like the throttling of DEADLINE and RT tasks to > >> inter-relate, e.g.: > > > > Currently, DL tasks are limited (in the bw control) to the global RT > > throttling limit... > > > > I think that this might be an extension to GRUB... that is > > extending the current behavior... so... things for the future - and > > IMHO it is another topic - way more challenging. > > > > Comments are welcome :-) > > > > -- Daniel
Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature
On Mon, 7 Nov 2016 09:17:55 +0100 Daniel Bristot de Oliveira wrote: > The rt throttling mechanism prevents the starvation of non-real-time > tasks by CPU intensive real-time tasks. In terms of percentage, > the default behavior allows real-time tasks to run up to 95% of a > given period, leaving the other 5% of the period for non-real-time > tasks. In the absence of non-rt tasks, the system goes idle for 5% > of the period. > > Although this behavior works fine for the purpose of avoiding > bad real-time tasks that can hang the system, some greedy users > want to allow the real-time task to continue running when no > non-real-time tasks are starving. In other words, they do not want to > see the system going idle. > > This patch implements the RT_RUNTIME_GREED scheduler feature for greedy > users (TM). When enabled, this feature will check if non-rt tasks are > starving before throttling the real-time task. If the real-time task > becomes throttled, it will be unthrottled as soon as the system goes > idle, or when the next period starts, whichever comes first. > > This feature is enabled with the following command: > # echo RT_RUNTIME_GREED > /sys/kernel/debug/sched_features > I'm still reviewing the patch, but I have to wonder why bother with making it a scheduler feature? The SCHED_FIFO definition allows a fifo thread to starve others because a fifo task will run until it yields. Throttling was added as a safety valve to allow starved SCHED_OTHER tasks to get some cpu time. Adding this unconditionally gets us a safety valve for throttling a badly written fifo task, but allows the fifo task to continue to consume cpu cycles if it's not starving anyone. Or am I missing something that's blazingly obvious? Clark
Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature
On 07/11/2016 14:51, Daniel Bristot de Oliveira wrote: Hi Tommaso, Hi, I'm cc-ing Luca for GRUB et al., pls find a few further notes below... On 11/07/2016 11:31 AM, Tommaso Cucinotta wrote: as anticipated live to Daniel: -) +1 for the general concept, we'd need something similar also for SCHED_DEADLINE In short: the sum of the runtime of deadline tasks will not be greater than the "to_ratio(global_rt_period(), global_rt_runtime())" - see init_dl_bw(). Therefore, the DL rq will not be throttled by the RT throttling mechanism. Extended: RT tasks' throttling aims to bound, for all CPUs of a domain - when RT_RUNTIME_SHARE is enabled; or per-rq - when RT_RUNTIME_SHARE is disabled; the amount of time that RT tasks can run continuously, in such a way as to provide some CPU time for non-real-time tasks to run. RT tasks need this global/local throttling mechanism to avoid the starvation of non-rt tasks, because RT tasks do not have a limited runtime - an RT task (or taskset) can run for an infinite runtime. DL tasks' throttling has another meaning. DL tasks' throttling aims to prevent *a* DL task from running for more than *its own* pre-allocated runtime. sure, and having an option to let it run for longer, if there's nothing else running in the system, is still interesting for pretty much similar reasons to those being discussed in this thread ... The sum of allocated runtime for all DL tasks will not be greater than the RT throttling enforcement runtime. The DL scheduler admission control already ensures this by limiting the amount of CPU time all DL tasks can consume (see init_dl_bw()). So, DL tasks avoid the "global" throttling beforehand - in the admission control. GRUB might implement something <> for the DEADLINE scheduler. With GRUB, a deadline task will have more runtime than previously set/granted. yes, the main difference being: GRUB will let a DL task run for longer than its own runtime, but still let it starve anything below (RT as well as OTHER tasks); perhaps Luca (cc) has some further comment on this... Thanks, T. But I am quite sure it will still be bounded by the sum of the already allocated DL runtime, which will continue being smaller than "to_ratio(global_rt_period(), global_rt_runtime())". Am I missing something? -) only issue might be that, if a non-RT task wakes up after the unthrottle, it will have to wait, but worst-case it will have a chance in the next throttling window In the current default behavior (RT_RUNTIME_SHARE), in a domain with more than two CPUs, the worst case easily becomes "infinity," because a CPU can borrow runtime from another CPU. There is no guarantee of minimum latency for non-rt tasks. Anyway, if the user wants to provide such a guarantee, they just need not enable this feature, while disabling RT_RUNTIME_SHARE (or run the non-rt task as a deadline task ;-)) -) an alternative to unthrottling might be temporary class downgrade to sched_other, but that might be much more complex, instead this Daniel's one looks quite simple Yeah, decreasing the priority of the task would be way more complicated and prone to errors. RT tasks would need to reduce their priority to a level higher than the IDLE task, but lower than SCHED_IDLE... -) when considering also DEADLINE tasks, it might be good to think about how we'd like the throttling of DEADLINE and RT tasks to inter-relate, e.g.: Currently, DL tasks are limited (in the bw control) to the global RT throttling limit... I think that this might be an extension to GRUB... that is extending the current behavior... so... things for the future - and IMHO it is another topic - way more challenging. Comments are welcome :-) -- Daniel -- Tommaso Cucinotta, Computer Engineering PhD Associate Professor at the Real-Time Systems Laboratory (ReTiS) Scuola Superiore Sant'Anna, Pisa, Italy http://retis.sssup.it/people/tommaso
Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature
On Mon, 7 Nov 2016, Daniel Bristot de Oliveira wrote: > With these two options set, the user will guarantee some runtime > for non-rt-tasks on all CPUs, while keeping real-time tasks running > as much as possible. Excellent, this would improve the situation with deadlocks as a result of cgroup_locks not being released due to lack of workqueue processing.
Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature
Hi Tommaso, On 11/07/2016 11:31 AM, Tommaso Cucinotta wrote: > as anticipated live to Daniel: > -) +1 for the general concept, we'd need something similar also for > SCHED_DEADLINE In short: the sum of the runtime of deadline tasks will not be greater than the "to_ratio(global_rt_period(), global_rt_runtime())" - see init_dl_bw(). Therefore, the DL rq will not be throttled by the RT throttling mechanism. Extended: RT tasks' throttling aims to bound, for all CPUs of a domain - when RT_RUNTIME_SHARE is enabled; or per-rq - when RT_RUNTIME_SHARE is disabled; the amount of time that RT tasks can run continuously, in such a way as to provide some CPU time for non-real-time tasks to run. RT tasks need this global/local throttling mechanism to avoid the starvation of non-rt tasks, because RT tasks do not have a limited runtime - an RT task (or taskset) can run for an infinite runtime. DL tasks' throttling has another meaning. DL tasks' throttling aims to prevent *a* DL task from running for more than *its own* pre-allocated runtime. The sum of allocated runtime for all DL tasks will not be greater than the RT throttling enforcement runtime. The DL scheduler admission control already ensures this by limiting the amount of CPU time all DL tasks can consume (see init_dl_bw()). So, DL tasks avoid the "global" throttling beforehand - in the admission control. GRUB might implement something <> for the DEADLINE scheduler. With GRUB, a deadline task will have more runtime than previously set/granted. But I am quite sure it will still be bounded by the sum of the already allocated DL runtime, which will continue being smaller than "to_ratio(global_rt_period(), global_rt_runtime())". Am I missing something? > -) only issue might be that, if a non-RT task wakes up after the > unthrottle, it will have to wait, but worst-case it will have a chance > in the next throttling window In the current default behavior (RT_RUNTIME_SHARE), in a domain with more than two CPUs, the worst case easily becomes "infinity," because a CPU can borrow runtime from another CPU. There is no guarantee of minimum latency for non-rt tasks. Anyway, if the user wants to provide such a guarantee, they just need not enable this feature, while disabling RT_RUNTIME_SHARE (or run the non-rt task as a deadline task ;-)) > -) an alternative to unthrottling might be temporary class downgrade to > sched_other, but that might be much more complex, instead this Daniel's > one looks quite simple Yeah, decreasing the priority of the task would be way more complicated and prone to errors. RT tasks would need to reduce their priority to a level higher than the IDLE task, but lower than SCHED_IDLE... > -) when considering also DEADLINE tasks, it might be good to think about > how we'd like the throttling of DEADLINE and RT tasks to inter-relate, > e.g.: Currently, DL tasks are limited (in the bw control) to the global RT throttling limit... I think that this might be an extension to GRUB... that is extending the current behavior... so... things for the future - and IMHO it is another topic - way more challenging. Comments are welcome :-) -- Daniel
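To make the admission-control point concrete: a task requests DL bandwidth with sched_setattr(), and the kernel refuses (EBUSY) any request that would push the total DL utilization over the cap derived from the global RT knobs. A hedged sketch follows, with the struct spelled out by hand since glibc provides no wrapper; it assumes kernel headers that define __NR_sched_setattr, and the 10ms/100ms parameters are arbitrary illustration values.

#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SCHED_DEADLINE
#define SCHED_DEADLINE 6
#endif

struct sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;		/* all three in nanoseconds */
	uint64_t sched_deadline;
	uint64_t sched_period;
};

int main(void)
{
	struct sched_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.sched_policy   = SCHED_DEADLINE;
	attr.sched_runtime  =  10 * 1000 * 1000;	/* 10 ms every ... */
	attr.sched_deadline = 100 * 1000 * 1000;	/* ... 100 ms      */
	attr.sched_period   = 100 * 1000 * 1000;

	if (syscall(__NR_sched_setattr, 0, &attr, 0) == -1) {
		/* EBUSY: the requested bandwidth exceeds the DL cap */
		fprintf(stderr, "sched_setattr: %s\n", strerror(errno));
		return 1;
	}
	for (;;)
		sched_yield();	/* stand-in for the periodic DL job body */
}

Because init_dl_bw() derives the cap from global_rt_runtime()/global_rt_period(), the sum of all admitted runtimes stays below the RT throttling budget, which is why the DL rq never hits the "global" throttle.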
Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature
as anticipated live to Daniel: -) +1 for the general concept, we'd need something similar also for SCHED_DEADLINE -) only issue might be that, if a non-RT task wakes up after the unthrottle, it will have to wait, but worst-case it will have a chance in the next throttling window -) an alternative to unthrottling might be temporary class downgrade to sched_other, but that might be much more complex, instead this Daniel's one looks quite simple -) when considering also DEADLINE tasks, it might be good to think about how we'd like the throttling of DEADLINE and RT tasks to inter-relate, e.g.: a) DEADLINE unthrottles if there's no RT nor OTHER tasks? what if there's an unthrottled RT? b) DEADLINE throttles by downgrading to OTHER? c) DEADLINE throttles by downgrading to RT (RR/FIFO and what prio?) My2c, thanks! T. On 07/11/2016 09:17, Daniel Bristot de Oliveira wrote: The rt throttling mechanism prevents the starvation of non-real-time tasks by CPU intensive real-time tasks. In terms of percentage, the default behavior allows real-time tasks to run up to 95% of a given period, leaving the other 5% of the period for non-real-time tasks. In the absence of non-rt tasks, the system goes idle for 5% of the period. Although this behavior works fine for the purpose of avoiding bad real-time tasks that can hang the system, some greedy users want to allow the real-time task to continue running when no non-real-time tasks are starving. In other words, they do not want to see the system going idle. This patch implements the RT_RUNTIME_GREED scheduler feature for greedy users (TM). When enabled, this feature will check if non-rt tasks are starving before throttling the real-time task. If the real-time task becomes throttled, it will be unthrottled as soon as the system goes idle, or when the next period starts, whichever comes first. This feature is enabled with the following command: # echo RT_RUNTIME_GREED > /sys/kernel/debug/sched_features The user might also want to disable the RT_RUNTIME_SHARE logic, to keep all CPUs with the same rt_runtime: # echo NO_RT_RUNTIME_SHARE > /sys/kernel/debug/sched_features With these two options set, the user will guarantee some runtime for non-rt-tasks on all CPUs, while keeping real-time tasks running as much as possible. The feature is disabled by default, keeping the current behavior.
Signed-off-by: Daniel Bristot de Oliveira
Reviewed-by: Steven Rostedt
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Steven Rostedt
Cc: Christoph Lameter
Cc: linux-rt-users
Cc: LKML

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 42d4027..c4c62ee 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3275,7 +3275,8 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct pin_cookie cookie
 		if (unlikely(!p))
 			p = idle_sched_class.pick_next_task(rq, prev, cookie);

-		return p;
+		if (likely(p != RETRY_TASK))
+			return p;
 	}

 again:
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 69631fa..3bd7a6d 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -66,6 +66,7 @@ SCHED_FEAT(RT_PUSH_IPI, true)
 SCHED_FEAT(FORCE_SD_OVERLAP, false)
 SCHED_FEAT(RT_RUNTIME_SHARE, true)
+SCHED_FEAT(RT_RUNTIME_GREED, false)
 SCHED_FEAT(LB_MIN, false)
 SCHED_FEAT(ATTACH_AGE_LOAD, true)
diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
index 5405d3f..0f23e06 100644
--- a/kernel/sched/idle_task.c
+++ b/kernel/sched/idle_task.c
@@ -26,6 +26,10 @@ static void check_preempt_curr_idle(struct rq *rq, struct task_struct *p, int fl
 static struct task_struct *
 pick_next_task_idle(struct rq *rq, struct task_struct *prev, struct pin_cookie cookie)
 {
+	if (sched_feat(RT_RUNTIME_GREED))
+		if (try_to_unthrottle_rt_rq(&rq->rt))
+			return RETRY_TASK;
+
 	put_prev_task(rq, prev);
 	update_idle_core(rq);
 	schedstat_inc(rq->sched_goidle);
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 2516b8d..a6961a5 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -631,6 +631,22 @@ static inline struct rt_bandwidth *sched_rt_bandwidth(struct rt_rq *rt_rq)

 #endif /* CONFIG_RT_GROUP_SCHED */

+static inline void unthrottle_rt_rq(struct rt_rq *rt_rq)
+{
+	rt_rq->rt_time = 0;
+	rt_rq->rt_throttled = 0;
+	sched_rt_rq_enqueue(rt_rq);
+}
+
+int try_to_unthrottle_rt_rq(struct rt_rq *rt_rq)
+{
+	if (rt_rq_throttled(rt_rq)) {
+		unthrottle_rt_rq(rt_rq);
+		return 1;
+	}
+	return 0;
+}
+
 bool sched_rt_bandwidth_account(struct rt_rq *rt_rq)
 {
 	struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
@@ -920,6 +936,18 @@ static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq)
 	 * but accrue some time due to boosting.
 	 */
 	if