Re: CFS scheduler unfairly prefers pinned tasks
I wrote:

>> The Linux CFS scheduler prefers pinned tasks and unfairly gives more
>> CPU time to tasks that have set CPU affinity. ...
>
> I believe I have now solved the problem, simply by setting:
>
>   for n in /proc/sys/kernel/sched_domain/cpu*/domain0/min_interval; do echo 0 > $n; done
>   for n in /proc/sys/kernel/sched_domain/cpu*/domain0/max_interval; do echo 1 > $n; done

Testing with real-life jobs, I found I needed min_interval and
max_interval for domain1 also, and a couple of other non-default values,
so:

  for n in /proc/sys/kernel/sched_domain/cpu*/dom*/min_interval; do echo 0 > $n; done
  for n in /proc/sys/kernel/sched_domain/cpu*/dom*/max_interval; do echo 1 > $n; done
  echo 10 > /proc/sys/kernel/sched_latency_ns
  echo 10 > /proc/sys/kernel/sched_min_granularity_ns
  echo 1 > /proc/sys/kernel/sched_wakeup_granularity_ns

and then things seem fair and my users are happy.

Thanks, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney, Australia

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
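Paul's settings above write straight into the live /proc. As a sketch (not from the thread), the same settings can be wrapped in a function with a configurable root, so they can be exercised against a scratch copy of the /proc layout before being applied for real; the `apply_tuning` helper and the dry-run directory are illustrative names, and the knob paths are kernel-version dependent:

```shell
# apply_tuning ROOT: apply the thread's settings under ROOT/proc/...
# Call as `apply_tuning ""` (as root) to touch the real /proc.
apply_tuning() {
    root="$1"
    for n in "$root"/proc/sys/kernel/sched_domain/cpu*/dom*/min_interval; do
        [ -f "$n" ] && echo 0 > "$n"
    done
    for n in "$root"/proc/sys/kernel/sched_domain/cpu*/dom*/max_interval; do
        [ -f "$n" ] && echo 1 > "$n"
    done
    echo 10 > "$root/proc/sys/kernel/sched_latency_ns"
    echo 10 > "$root/proc/sys/kernel/sched_min_granularity_ns"
    echo 1 > "$root/proc/sys/kernel/sched_wakeup_granularity_ns"
}

# Dry run against a scratch copy mimicking the /proc layout:
demo=$(mktemp -d)
mkdir -p "$demo/proc/sys/kernel/sched_domain/cpu0/domain0"
echo 8 > "$demo/proc/sys/kernel/sched_domain/cpu0/domain0/min_interval"
echo 64 > "$demo/proc/sys/kernel/sched_domain/cpu0/domain0/max_interval"
echo 24000000 > "$demo/proc/sys/kernel/sched_latency_ns"
echo 3000000 > "$demo/proc/sys/kernel/sched_min_granularity_ns"
echo 4000000 > "$demo/proc/sys/kernel/sched_wakeup_granularity_ns"
apply_tuning "$demo"
cat "$demo/proc/sys/kernel/sched_latency_ns"   # prints 10
```

Note the granularity values here are nanoseconds; 10 ns is far below anything the scheduler can honour, which is why Paul calls the settings "extreme".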
Re: CFS scheduler unfairly prefers pinned tasks
I wrote:

> The Linux CFS scheduler prefers pinned tasks and unfairly gives more
> CPU time to tasks that have set CPU affinity.

I believe I have now solved the problem, simply by setting:

  for n in /proc/sys/kernel/sched_domain/cpu*/domain0/min_interval; do echo 0 > $n; done
  for n in /proc/sys/kernel/sched_domain/cpu*/domain0/max_interval; do echo 1 > $n; done

I am not sure what the domain1 values would be for (those I see exist on
my 4*E5-4627v2 server). So far I do not see any negative effects of
using these (extreme?) settings. (Explanation of what these things are
meant for, or pointers to documentation, would be appreciated.)

Thanks for the insightful discussion. (Scary, isn't it?)

Thanks, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney, Australia
Re: CFS scheduler unfairly prefers pinned tasks
On 10/10/15 11:59 AM, Wanpeng Li wrote:
> Hi Paul,
>
> On 10/8/15 4:19 PM, Mike Galbraith wrote:
>> On Tue, 2015-10-06 at 04:45 +0200, Mike Galbraith wrote:
>>> On Tue, 2015-10-06 at 08:48 +1100, paul.sz...@sydney.edu.au wrote:
>>>> The Linux CFS scheduler prefers pinned tasks and unfairly gives
>>>> more CPU time to tasks that have set CPU affinity. This effect is
>>>> observed with or without CGROUP controls.
>>>>
>>>> To demonstrate: on an otherwise idle machine, as some user run
>>>> several processes pinned to each CPU, one for each CPU (as many as
>>>> CPUs present in the system), e.g. for a quad-core non-HyperThreaded
>>>> machine:
>>>>
>>>>   taskset -c 0 perl -e 'while(1){1}' &
>>>>   taskset -c 1 perl -e 'while(1){1}' &
>>>>   taskset -c 2 perl -e 'while(1){1}' &
>>>>   taskset -c 3 perl -e 'while(1){1}' &
>>>>
>>>> and (as that same or some other user) run some without pinning:
>>>>
>>>>   perl -e 'while(1){1}' &
>>>>   perl -e 'while(1){1}' &
>>>>
>>>> and use e.g. top to observe that the pinned processes get more CPU
>>>> time than "fair".
>
> Interesting, I can reproduce it w/ your simple script. However, they
> are fair when the number of pinned perl tasks is equal to the number
> of unpinned perl tasks. I will dig into it more deeply.

For the pinned tasks, when I set the task affinity to all the available
CPUs instead of one separate CPU each as in your test, there is fairness
between pinned tasks and unpinned tasks. So I suspect it is the overhead
associated with the migration machinery.

Regards,
Wanpeng Li
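For reference, the split that Paul's quad-core demonstration should produce under global fairness can be worked out on the back of an envelope: 4 pinned hogs plus 2 unpinned hogs is 6 busy loops competing for 4 CPUs. The sketch below (an illustration, not part of the thread) contrasts that with a sticky placement where the balancer parks each unpinned hog on one CPU and leaves it there:

```shell
awk 'BEGIN {
    cpus = 4; pinned = 4; unpinned = 2
    # A globally fair scheduler gives each of the 6 tasks 400/6 = 66.7%.
    fair = 100 * cpus / (pinned + unpinned)
    printf "globally fair share: %.1f%% per task\n", fair
    # If the two unpinned hogs are parked on two CPUs and never moved,
    # those two CPUs are split 50/50 with their pinned resident, while
    # the other two pinned hogs keep a whole CPU each:
    printf "sticky placement:    2 x 100%% and 4 x 50%%\n"
}'
```

Both layouts consume all 400% of CPU; the complaint in the thread is precisely the gap between 100% for some pinned tasks and the 66.7% a fair split would give every task.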
Re: CFS scheduler unfairly prefers pinned tasks
Hi Paul,

On 10/8/15 4:19 PM, Mike Galbraith wrote:
> On Tue, 2015-10-06 at 04:45 +0200, Mike Galbraith wrote:
>> On Tue, 2015-10-06 at 08:48 +1100, paul.sz...@sydney.edu.au wrote:
>>> The Linux CFS scheduler prefers pinned tasks and unfairly gives
>>> more CPU time to tasks that have set CPU affinity. This effect is
>>> observed with or without CGROUP controls.
>>>
>>> To demonstrate: on an otherwise idle machine, as some user run
>>> several processes pinned to each CPU, one for each CPU (as many as
>>> CPUs present in the system), e.g. for a quad-core non-HyperThreaded
>>> machine:
>>>
>>>   taskset -c 0 perl -e 'while(1){1}' &
>>>   taskset -c 1 perl -e 'while(1){1}' &
>>>   taskset -c 2 perl -e 'while(1){1}' &
>>>   taskset -c 3 perl -e 'while(1){1}' &
>>>
>>> and (as that same or some other user) run some without pinning:
>>>
>>>   perl -e 'while(1){1}' &
>>>   perl -e 'while(1){1}' &
>>>
>>> and use e.g. top to observe that the pinned processes get more CPU
>>> time than "fair".

Interesting, I can reproduce it w/ your simple script. However, they are
fair when the number of pinned perl tasks is equal to the number of
unpinned perl tasks. I will dig into it more deeply.

Regards,
Wanpeng Li
Re: CFS scheduler unfairly prefers pinned tasks
On Fri, 2015-10-09 at 08:55 +1100, paul.sz...@sydney.edu.au wrote:
> >> Good to see that you agree ...
> > Weeell, we've disagreed on pretty much everything ...
> Sorry I disagree: we do agree on the essence. :-)

P.S. To some extent. If the essence is $subject, nope, we definitely
disagree. If the essence is that _group_ scheduling is not strictly
fair, then we agree. The "must be fixed" bit I also disagree with.
"Maybe wants fixing" I can agree with ;-)

-Mike
Re: CFS scheduler unfairly prefers pinned tasks
On Fri, 2015-10-09 at 08:55 +1100, paul.sz...@sydney.edu.au wrote:
> Dear Mike,
>
> >>> I see a fairness issue ... but one opposite to your complaint.
> >> Why is that opposite? ...
> >
> > Well, not exactly opposite, only opposite in that the one pert task
> > also receives MORE than its fair share when unpinned. Two 100% hogs
> > sharing one CPU should each get 50% of that CPU. ...
>
> But you are using CGROUPs, grouping all oinks into one group, and the
> one pert into another: requesting each group to get the same total
> CPU. Since pert has one process only, the most he can get is 100%
> (not 400%), and it is quite OK for the oinks together to get 700%.

Well, that of course depends on what you call fair. I realize why and
where it happens. I told weight adjustment to keep its grubby mitts off
of autogroups, and of course the "problem" went away.

Back to the viewpoint thing: with two users, each having been _placed_
in a group, I can well imagine a user who is trying to use all of his
authorized bandwidth raising an eyebrow when he sees one of his tasks
getting 24 whole milliseconds per second with an allegedly fair
scheduler. I can see it both ways. What's going to come out of this is
probably going to be "tough titty, yes, group scheduling has side
effects, and this is one". I already know it does. The question is only
whether the weight adjustment gears are spinning as intended or not.

> > IFF ... massively parallel and synchronized ...
>
> You would be making the assumption that you had the machine to
> yourself: might be the wrong thing to assume.

Yup, it would be a doomed attempt to run a load which cannot thrive in a
shared environment in such an environment. Are any of the compute loads
you're having trouble with.. in the math department.. perhaps doing oh,
say complex math goop that feeds the output of one parallel computation
into the next parallel computation? :)

-Mike
Re: CFS scheduler unfairly prefers pinned tasks
Dear Mike,

>>> I see a fairness issue ... but one opposite to your complaint.
>> Why is that opposite? ...
>
> Well, not exactly opposite, only opposite in that the one pert task
> also receives MORE than its fair share when unpinned. Two 100% hogs
> sharing one CPU should each get 50% of that CPU. ...

But you are using CGROUPs, grouping all oinks into one group, and the
one pert into another: requesting each group to get the same total CPU.
Since pert has one process only, the most he can get is 100% (not 400%),
and it is quite OK for the oinks together to get 700%.

> IFF ... massively parallel and synchronized ...

You would be making the assumption that you had the machine to yourself:
might be the wrong thing to assume.

>> Good to see that you agree ...
> Weeell, we've disagreed on pretty much everything ...

Sorry, I disagree: we do agree on the essence. :-)

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney, Australia
Re: CFS scheduler unfairly prefers pinned tasks
On Thu, 2015-10-08 at 21:54 +1100, paul.sz...@sydney.edu.au wrote:
> Dear Mike,
>
> > I see a fairness issue ... but one opposite to your complaint.
>
> Why is that opposite? I think it would be fair for the one pert
> process to get 100% CPU, the many oink processes can get everything
> else. That one oink is at a lowly 10% (when others are at 100%) is of
> no consequence.

Well, not exactly opposite, only opposite in that the one pert task also
receives MORE than its fair share when unpinned. Two 100% hogs sharing
one CPU should each get 50% of that CPU. The fact that the oink group
contains 8 tasks vs 1 for the pert group should be irrelevant, but what
that last oinker is getting is 1/9 of a CPU, and there just happen to be
9 runnable tasks total: 1 in group pert, and 8 in group oink.

IFF that ratio were to prove to be a constant, AND the oink group were a
massively parallel and synchronized compute job on a huge box, that
entire compute job would not just be slowed down by the factor of 2 that
a fair distribution would do to it; on say a 1000-core box it'd be..
utterly dead, because you'd put it out of your misery.

vogelweide:~/:[0]# cgexec -g cpu:foo bash
vogelweide:~/:[0]# for i in `seq 0 63`; do taskset -c $i cpuhog& done
[1] 8025
[2] 8026
...
vogelweide:~/:[130]# cgexec -g cpu:bar bash
vogelweide:~/:[130]# taskset -c 63 pert 10
(report every 10 seconds)
2260.91 MHZ CPU
perturbation threshold 0.024 usecs.
pert/s: 255 >2070.76us: 38 min: 0.05 max:4065.46 avg: 93.83 sum/s: 23946us overhead: 2.39%
pert/s: 255 >2070.32us: 37 min: 1.32 max:4039.94 avg: 92.82 sum/s: 23744us overhead: 2.37%
pert/s: 253 >2069.85us: 38 min: 0.05 max:4036.44 avg: 94.89 sum/s: 24054us overhead: 2.41%

Hm, that's a kinda odd looking number from my 64-core box, but whatever,
it's far from fair according to my definition thereof. Poor little oink
plus all other cycles not spent in pert's tight loop add up to ~24ms/s.

> Good to see that you agree on the fairness issue... it MUST be fixed!
> CFS might be wrong or wasteful, but never unfair.

Weeell, we've disagreed on pretty much everything we've talked about so
far, but I can well imagine that what I see in the share update business
_could_ be part of your massive compute job woes.

-Mike
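The 88.8% / 11.1% split Mike measures drops out of the group-share arithmetic. Each cgroup carries the same weight (1024 by default for cpu.shares), and a group's weight is divided among its runnable tasks, so pert's single task carries its group's whole weight while each oink carries one eighth. On the CPU the two share, the simplified model below (an illustration, not kernel code) reproduces the observed numbers:

```shell
awk 'BEGIN {
    gw = 1024            # default cpu.shares per group
    pert = gw / 1        # pert group weight concentrated in 1 task
    oink = gw / 8        # oink group weight spread over 8 tasks
    # CFS splits a CPU between two entities in proportion to weight:
    printf "pert: %.1f%%  oink: %.1f%%\n",
           100 * pert / (pert + oink), 100 * oink / (pert + oink)
}'
# prints: pert: 88.9%  oink: 11.1%
```

That 8:1 ratio matches the 88.83% pert and 11.07% oink in the top output above, i.e. the shared CPU is being split per-task-weight, not 50/50 per group.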
Re: CFS scheduler unfairly prefers pinned tasks
On Thu, Oct 08, 2015 at 09:54:21PM +1100, paul.sz...@sydney.edu.au wrote:
> Good to see that you agree on the fairness issue... it MUST be fixed!
> CFS might be wrong or wasteful, but never unfair.

I've not yet had time to look at the case at hand, but there are what
are called 'infeasible weight' scenarios, for which it is impossible to
be fair.

Also, CFS must remain a practical scheduler, which places bounds on the
amount of weird cases we can deal with.
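An 'infeasible weight' arises because no task can consume more than one CPU: if a task's weighted share of the machine exceeds 1 CPU, the surplus has to go to the others and the requested ratio cannot be honoured. A minimal illustration (my example, not Peter's): two tasks with weights 3:1 on 2 CPUs.

```shell
awk 'BEGIN {
    cpus = 2
    # Weighted shares: task1 would get 2 * 3/4 = 1.5 CPUs -- infeasible.
    s1 = cpus * 3 / 4
    s2 = cpus * 1 / 4
    # Cap task1 at one CPU and hand the surplus to task2:
    surplus = s1 - 1; s1 = 1; s2 += surplus
    printf "task1: %.2f CPUs  task2: %.2f CPUs\n", s1, s2
}'
# prints: task1: 1.00 CPUs  task2: 1.00 CPUs
```

The feasible outcome is 1:1 even though the weights say 3:1; no scheduler can be "fair" to those weights on that machine.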
Re: CFS scheduler unfairly prefers pinned tasks
Dear Mike,

> I see a fairness issue ... but one opposite to your complaint.

Why is that opposite? I think it would be fair for the one pert process
to get 100% CPU; the many oink processes can get everything else. That
one oink is at a lowly 10% (when others are at 100%) is of no
consequence.

What happens when you un-pin pert: does it get 100%? What if you run two
perts? Have you reproduced my observations?

Good to see that you agree on the fairness issue... it MUST be fixed!
CFS might be wrong or wasteful, but never unfair.

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney, Australia
Re: CFS scheduler unfairly prefers pinned tasks
On Tue, 2015-10-06 at 04:45 +0200, Mike Galbraith wrote:
> On Tue, 2015-10-06 at 08:48 +1100, paul.sz...@sydney.edu.au wrote:
> > The Linux CFS scheduler prefers pinned tasks and unfairly gives
> > more CPU time to tasks that have set CPU affinity. This effect is
> > observed with or without CGROUP controls.
> >
> > To demonstrate: on an otherwise idle machine, as some user run
> > several processes pinned to each CPU, one for each CPU (as many as
> > CPUs present in the system), e.g. for a quad-core non-HyperThreaded
> > machine:
> >
> >   taskset -c 0 perl -e 'while(1){1}' &
> >   taskset -c 1 perl -e 'while(1){1}' &
> >   taskset -c 2 perl -e 'while(1){1}' &
> >   taskset -c 3 perl -e 'while(1){1}' &
> >
> > and (as that same or some other user) run some without pinning:
> >
> >   perl -e 'while(1){1}' &
> >   perl -e 'while(1){1}' &
> >
> > and use e.g. top to observe that the pinned processes get more CPU
> > time than "fair".

I see a fairness issue with pinned tasks and group scheduling, but one
opposite to your complaint. Two task groups, one with 8 hogs (oink), one
with 1 (pert), all are pinned.

  PID USER  PR  NI  VIRT  RES  SHR S  %CPU  %MEM    TIME+ P COMMAND
 3269 root  20   0  4060  724  648 R 100.0 0.004  1:00.02 1 oink
 3270 root  20   0  4060  652  576 R 100.0 0.004  0:59.84 2 oink
 3271 root  20   0  4060  692  616 R 100.0 0.004  0:59.95 3 oink
 3274 root  20   0  4060  608  532 R 100.0 0.004  1:00.01 6 oink
 3273 root  20   0  4060  728  652 R 99.90 0.005  0:59.98 5 oink
 3272 root  20   0  4060  644  568 R 99.51 0.004  0:59.80 4 oink
 3268 root  20   0  4060  612  536 R 99.41 0.004  0:59.67 0 oink
 3279 root  20   0  8312  804  708 R 88.83 0.005  0:53.06 7 pert
 3275 root  20   0  4060  656  580 R 11.07 0.004  0:06.98 7 oink

That group share math would make a huge compute group with progress
checkpoints sharing an SGI monster with one other hog amusing to watch.

-Mike
Re: CFS scheduler unfairly prefers pinned tasks
On Wed, 2015-10-07 at 07:44 +1100, paul.sz...@sydney.edu.au wrote:
> I agree that pinning may be bad... should not the kernel penalize the
> badly pinned processes?

I didn't say pinning is bad; I said that what you're seeing is not a bug.

-Mike
Re: CFS scheduler unfairly prefers pinned tasks
Dear Mike,

>> ... the CFS is meant to be fair, using things like vruntime
>> to preempt, and throttling. Why are those pinned tasks not preempted or
>> throttled?
>
> Imagine you own an 8192 CPU box for a moment, all CPUs having one pinned
> task, plus one extra unpinned task, and ponder what would have to happen
> in order to meet your utilization expectation. ...

Sorry, but the kernel contradicts that. As per my original report,
things are "fair" in one case:
- with CGROUP controls and the two kinds of processes run by different
  users, when there is just one un-pinned process
and that is so on my quad-core i5-3470 baby and on my 32-core
4*E5-4627v2 server (and everywhere else that I tested). The kernel is
smart and gets it right for one un-pinned process: why not for two?

Now re-testing further (on some machines, with CGROUP): on the i5-3470
things are still fair with one un-pinned process (and become un-fair
with two); on the 4*E5-4627v2 they are still fair with 4 un-pinned
(and become un-fair with 5). Does this suggest that the kernel does
things right within each physical CPU, but breaks across several (or
the exact contrary)? Maybe not: on a 2*E5530 machine, things are fair
with just one un-pinned process and un-fair with 2 already.

> What you're seeing is not a bug. No task can occupy more than one CPU
> at a time, making space reservation on multiple CPUs a very bad idea.

I agree that pinning may be bad... should not the kernel penalize the
badly pinned processes?

Cheers, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney, Australia
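[One way to probe the "fair within a physical CPU, unfair across them"
hypothesis is to look at the scheduling domains the kernel actually
built on each machine. This is a sketch, assuming a kernel with
CONFIG_SCHED_DEBUG (as on the 3.16 kernels discussed here); the domain
count and names vary with the machine's topology, and the sysctl tree
may simply be absent:]

```shell
# Inspect the scheduler-domain hierarchy as seen from CPU 0, to find
# where the per-socket (and NUMA) boundaries lie. Skips silently if
# the sched_domain sysctl tree does not exist on this kernel/config.
for d in /proc/sys/kernel/sched_domain/cpu0/domain*; do
  [ -d "$d" ] || continue                  # tree absent: nothing to show
  name=$(cat "$d/name" 2>/dev/null || echo '?')
  echo "$d: $name"
done
```

[On a 4-socket box one would expect a domain level spanning each socket
and a higher level spanning the whole machine, which lines up with the
"fair up to 4 un-pinned tasks on the 4*E5-4627v2" observation above.]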
Re: CFS scheduler unfairly prefers pinned tasks
On Tue, 2015-10-06 at 21:06 +1100, paul.sz...@sydney.edu.au wrote:
> And further... the CFS is meant to be fair, using things like vruntime
> to preempt, and throttling. Why are those pinned tasks not preempted or
> throttled?

Imagine you own an 8192 CPU box for a moment, all CPUs having one pinned
task, plus one extra unpinned task, and ponder what would have to happen
in order to meet your utilization expectation. Right. What you're seeing
is not a bug. No task can occupy more than one CPU at a time, making
space reservation on multiple CPUs a very bad idea.

-Mike
Re: CFS scheduler unfairly prefers pinned tasks
Dear Mike,

>> .. CFS ... unfairly gives more CPU time to [pinned] tasks ...
>
> If they can all migrate, load balancing can move any of them to try to
> fix the permanent imbalance, so they'll all bounce about sharing a CPU
> with some other hog, and it all kinda sorta works out.
>
> When most are pinned, to make it work out long term you'd have to be
> short term unfair, walking the unpinned minority around the box in a
> carefully orchestrated dance... and have omniscient powers that assure
> that none of the tasks you're trying to equalize is gonna do something
> rude like leave, sleep, fork or whatever, and muck up the grand plan.

Could not your argument be turned around: for a pinned task it is harder
to find an idle CPU, so it should get less time?

But really... those pinned tasks do not hog the CPU forever. Whatever
kicks them off: could not that be done just a little earlier?

And further... the CFS is meant to be fair, using things like vruntime
to preempt, and throttling. Why are those pinned tasks not preempted or
throttled?

Thanks, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney, Australia
Re: CFS scheduler unfairly prefers pinned tasks
On Tue, 2015-10-06 at 08:48 +1100, paul.sz...@sydney.edu.au wrote:
> The Linux CFS scheduler prefers pinned tasks and unfairly
> gives more CPU time to tasks that have set CPU affinity.
> This effect is observed with or without CGROUP controls.
>
> To demonstrate: on an otherwise idle machine, as some user
> run several processes pinned to each CPU, one for each CPU
> (as many as CPUs present in the system) e.g. for a quad-core
> non-HyperThreaded machine:
>
> taskset -c 0 perl -e 'while(1){1}' &
> taskset -c 1 perl -e 'while(1){1}' &
> taskset -c 2 perl -e 'while(1){1}' &
> taskset -c 3 perl -e 'while(1){1}' &
>
> and (as that same or some other user) run some without
> pinning:
>
> perl -e 'while(1){1}' &
> perl -e 'while(1){1}' &
>
> and use e.g. top to observe that the pinned processes get
> more CPU time than "fair".
>
> Fairness is obtained when either:
> - there are as many un-pinned processes as CPUs; or
> - with CGROUP controls and the two kinds of processes run by
>   different users, when there is just one un-pinned process; or
> - if the pinning is turned off for these processes (or they
>   are started without).
>
> Any insight is welcome!

If they can all migrate, load balancing can move any of them to try to
fix the permanent imbalance, so they'll all bounce about sharing a CPU
with some other hog, and it all kinda sorta works out.

When most are pinned, to make it work out long term you'd have to be
short term unfair, walking the unpinned minority around the box in a
carefully orchestrated dance... and have omniscient powers that assure
that none of the tasks you're trying to equalize is gonna do something
rude like leave, sleep, fork or whatever, and muck up the grand plan.

-Mike
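[The imbalance being described can be made concrete with some
back-of-envelope shell arithmetic for the 4-CPU demo. The assumption
that the balancer simply parks each unpinned hog on one CPU, rather
than walking it around the box, is an illustrative simplification of
the quoted explanation, not a measurement:]

```shell
# Sketch for the 4-CPU demo: 4 pinned hogs plus 2 unpinned hogs.
# Assumes the balancer leaves each unpinned hog sharing one CPU with
# one pinned hog; numbers are illustrative, not measured.
cpus=4; pinned=4; unpinned=2
total=$((cpus * 100))
tasks=$((pinned + unpinned))
fair=$((total / tasks))        # strict fairness: 400% over 6 tasks
echo "strictly fair share: ${fair}%"
# Two CPUs end up with a pinned+unpinned pair (50% each); the other
# two pinned hogs keep a whole CPU (100% each).
pinned_avg=$(( (100 + 100 + 50 + 50) / pinned ))
unpinned_each=50
echo "pinned average: ${pinned_avg}%, unpinned each: ${unpinned_each}%"
```

[That is, the pinned hogs average 75% against a strictly fair 66%,
which matches the direction of the unfairness reported in the demo,
without any scheduler bug being involved.]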
CFS scheduler unfairly prefers pinned tasks
The Linux CFS scheduler prefers pinned tasks and unfairly gives more
CPU time to tasks that have set CPU affinity. This effect is observed
with or without CGROUP controls.

To demonstrate: on an otherwise idle machine, as some user run several
processes pinned to each CPU, one for each CPU (as many as CPUs present
in the system), e.g. for a quad-core non-HyperThreaded machine:

taskset -c 0 perl -e 'while(1){1}' &
taskset -c 1 perl -e 'while(1){1}' &
taskset -c 2 perl -e 'while(1){1}' &
taskset -c 3 perl -e 'while(1){1}' &

and (as that same or some other user) run some without pinning:

perl -e 'while(1){1}' &
perl -e 'while(1){1}' &

and use e.g. top to observe that the pinned processes get more CPU time
than "fair".

Fairness is obtained when either:
- there are as many un-pinned processes as CPUs; or
- with CGROUP controls and the two kinds of processes run by different
  users, when there is just one un-pinned process; or
- if the pinning is turned off for these processes (or they are started
  without).

Any insight is welcome!

---

I would appreciate replies direct to me as I am not subscribed to the
linux-kernel mailing list (but will try to watch the archives).

This bug is also reported to Debian, please see
http://bugs.debian.org/800945
I use Debian with the 3.16 kernel, have not yet tried 4.* kernels.

Thanks, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney, Australia