Re: [PATCH v8 3/6] cpuset: Add cpuset.sched.load_balance flag to v2

2018-05-28 Thread Peter Zijlstra
On Thu, May 24, 2018 at 02:55:25PM -0400, Waiman Long wrote:
> On 05/24/2018 11:43 AM, Peter Zijlstra wrote:

> > I'm confused... why exactly do we have both domain and load_balance ?
> 
> The domain is for partitioning the CPUs only. It doesn't change the load
> balancing state. So the load_balance flag is still need to turn on and
> off load balancing.

OK, so we have to two boolean flags, giving 4 possible states. Lets just
go through them one by on:

A) domain:0 load_balance:0 -- we have no exclusive domain, but have
   load-balancing disabled across them. AFAICT this should be an invalid
   state.

B) domain:0 load_balance:1 -- we have no exclusive domain, but have
   load-balancing enabled. AFAICT this is the default state and is a
   no-op.

C) domain:1 load_balance:0 -- we have an exclusive domain, and have
   load-balancing disabled across it. This is, AFAICT, identical to
   having a bunch of sub/sibling groups each with a single CPU domain.

D) domain:1 load_balance:1 -- we have an exclusive domain, and have
   load-balancing enabled. This is a partition.

Now, I think I've overlooked the fact that load_balance==1 only really
means something when the parent's load_balance==0, but I'm not sure that
really changes anything.

So, afaict, the above only have two useful states: B and D. Which again
raises the question, why two knobs? What useful configurations does it
allow?
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 3/6] cpuset: Add cpuset.sched.load_balance flag to v2

2018-05-25 Thread Waiman Long
On 05/25/2018 05:40 AM, Patrick Bellasi wrote:
> On 24-May 11:22, Waiman Long wrote:
>> On 05/24/2018 11:16 AM, Juri Lelli wrote:
>>> On 24/05/18 11:09, Waiman Long wrote:
 On 05/24/2018 10:36 AM, Juri Lelli wrote:
> On 17/05/18 16:55, Waiman Long wrote:
>
> [...]
>
>> +A parent cgroup cannot distribute all its CPUs to child
>> +scheduling domain cgroups unless its load balancing flag is
>> +turned off.
>> +
>> +  cpuset.sched.load_balance
>> +A read-write single value file which exists on non-root
>> +cpuset-enabled cgroups.  It is a binary value flag that accepts
>> +either "0" (off) or a non-zero value (on).  This flag is set
>> +by the parent and is not delegatable.
>> +
>> +When it is on, tasks within this cpuset will be load-balanced
>> +by the kernel scheduler.  Tasks will be moved from CPUs with
>> +high load to other CPUs within the same cpuset with less load
>> +periodically.
>> +
>> +When it is off, there will be no load balancing among CPUs on
>> +this cgroup.  Tasks will stay in the CPUs they are running on
>> +and will not be moved to other CPUs.
>> +
>> +The initial value of this flag is "1".  This flag is then
>> +inherited by child cgroups with cpuset enabled.  Its state
>> +can only be changed on a scheduling domain cgroup with no
>> +cpuset-enabled children.
> [...]
>
>> +/*
>> + * On default hierachy, a load balance flag change is only 
>> allowed
>> + * in a scheduling domain with no child cpuset.
>> + */
>> +if (cgroup_subsys_on_dfl(cpuset_cgrp_subsys) && 
>> balance_flag_changed &&
>> +   (!is_sched_domain(cs) || css_has_online_children(>css))) 
>> {
>> +err = -EINVAL;
>> +goto out;
>> +}
> The rule is actually
>
>  - no child cpuset
>  - and it must be a scheduling domain
> I always a bit confused by the usage of "scheduling domain", which
> overlaps with the SD concept from the scheduler standpoint.

It is supposed to mimic SD concept of scheduler.

>
> AFAIU a cpuset sched domain is not granted to be turned into an
> actual scheduler SD, am I wrong?
>
> If that's the case, why not better disambiguate these two concept by
> calling the cpuset one a "cpus partition" or eventually "cpuset domain"?

Good point. Peter has similar comment. I will probably change the name
and clarifying it better in the documentation.

Cheers,
Longman

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 3/6] cpuset: Add cpuset.sched.load_balance flag to v2

2018-05-25 Thread Patrick Bellasi
On 24-May 11:22, Waiman Long wrote:
> On 05/24/2018 11:16 AM, Juri Lelli wrote:
> > On 24/05/18 11:09, Waiman Long wrote:
> >> On 05/24/2018 10:36 AM, Juri Lelli wrote:
> >>> On 17/05/18 16:55, Waiman Long wrote:
> >>>
> >>> [...]
> >>>
>  +A parent cgroup cannot distribute all its CPUs to child
>  +scheduling domain cgroups unless its load balancing flag is
>  +turned off.
>  +
>  +  cpuset.sched.load_balance
>  +A read-write single value file which exists on non-root
>  +cpuset-enabled cgroups.  It is a binary value flag that accepts
>  +either "0" (off) or a non-zero value (on).  This flag is set
>  +by the parent and is not delegatable.
>  +
>  +When it is on, tasks within this cpuset will be load-balanced
>  +by the kernel scheduler.  Tasks will be moved from CPUs with
>  +high load to other CPUs within the same cpuset with less load
>  +periodically.
>  +
>  +When it is off, there will be no load balancing among CPUs on
>  +this cgroup.  Tasks will stay in the CPUs they are running on
>  +and will not be moved to other CPUs.
>  +
>  +The initial value of this flag is "1".  This flag is then
>  +inherited by child cgroups with cpuset enabled.  Its state
>  +can only be changed on a scheduling domain cgroup with no
>  +cpuset-enabled children.
> >>> [...]
> >>>
>  +/*
>  + * On default hierachy, a load balance flag change is only 
>  allowed
>  + * in a scheduling domain with no child cpuset.
>  + */
>  +if (cgroup_subsys_on_dfl(cpuset_cgrp_subsys) && 
>  balance_flag_changed &&
>  +   (!is_sched_domain(cs) || css_has_online_children(>css))) 
>  {
>  +err = -EINVAL;
>  +goto out;
>  +}
> >>> The rule is actually
> >>>
> >>>  - no child cpuset
> >>>  - and it must be a scheduling domain

I always a bit confused by the usage of "scheduling domain", which
overlaps with the SD concept from the scheduler standpoint.

AFAIU a cpuset sched domain is not granted to be turned into an
actual scheduler SD, am I wrong?

If that's the case, why not better disambiguate these two concept by
calling the cpuset one a "cpus partition" or eventually "cpuset domain"?

-- 
#include 

Patrick Bellasi
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 3/6] cpuset: Add cpuset.sched.load_balance flag to v2

2018-05-24 Thread Waiman Long
On 05/24/2018 11:43 AM, Peter Zijlstra wrote:
> On Thu, May 17, 2018 at 04:55:42PM -0400, Waiman Long wrote:
>> The sched.load_balance flag is needed to enable CPU isolation similar to
>> what can be done with the "isolcpus" kernel boot parameter. Its value
>> can only be changed in a scheduling domain with no child cpusets. On
>> a non-scheduling domain cpuset, the value of sched.load_balance is
>> inherited from its parent.
>>
>> This flag is set by the parent and is not delegatable.
>>
>> Signed-off-by: Waiman Long 
>> ---
>>  Documentation/cgroup-v2.txt | 24 
>>  kernel/cgroup/cpuset.c  | 53 
>> +
>>  2 files changed, 73 insertions(+), 4 deletions(-)
>>
>> diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
>> index 54d9e22..071b634d 100644
>> --- a/Documentation/cgroup-v2.txt
>> +++ b/Documentation/cgroup-v2.txt
>> @@ -1536,6 +1536,30 @@ Cpuset Interface Files
>>  CPUs of the parent cgroup. Once it is set, this flag cannot be
>>  cleared if there are any child cgroups with cpuset enabled.
>>  
>> +A parent cgroup cannot distribute all its CPUs to child
>> +scheduling domain cgroups unless its load balancing flag is
>> +turned off.
>> +
>> +  cpuset.sched.load_balance
>> +A read-write single value file which exists on non-root
>> +cpuset-enabled cgroups.  It is a binary value flag that accepts
>> +either "0" (off) or a non-zero value (on).  This flag is set
>> +by the parent and is not delegatable.
>> +
>> +When it is on, tasks within this cpuset will be load-balanced
>> +by the kernel scheduler.  Tasks will be moved from CPUs with
>> +high load to other CPUs within the same cpuset with less load
>> +periodically.
>> +
>> +When it is off, there will be no load balancing among CPUs on
>> +this cgroup.  Tasks will stay in the CPUs they are running on
>> +and will not be moved to other CPUs.
>> +
>> +The initial value of this flag is "1".  This flag is then
>> +inherited by child cgroups with cpuset enabled.  Its state
>> +can only be changed on a scheduling domain cgroup with no
>> +cpuset-enabled children.
> I'm confused... why exactly do we have both domain and load_balance ?

The domain is for partitioning the CPUs only. It doesn't change the load
balancing state. So the load_balance flag is still need to turn on and
off load balancing.

Cheers,
Longman

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 3/6] cpuset: Add cpuset.sched.load_balance flag to v2

2018-05-24 Thread Peter Zijlstra
On Thu, May 17, 2018 at 04:55:42PM -0400, Waiman Long wrote:
> The sched.load_balance flag is needed to enable CPU isolation similar to
> what can be done with the "isolcpus" kernel boot parameter. Its value
> can only be changed in a scheduling domain with no child cpusets. On
> a non-scheduling domain cpuset, the value of sched.load_balance is
> inherited from its parent.
> 
> This flag is set by the parent and is not delegatable.
> 
> Signed-off-by: Waiman Long 
> ---
>  Documentation/cgroup-v2.txt | 24 
>  kernel/cgroup/cpuset.c  | 53 
> +
>  2 files changed, 73 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
> index 54d9e22..071b634d 100644
> --- a/Documentation/cgroup-v2.txt
> +++ b/Documentation/cgroup-v2.txt
> @@ -1536,6 +1536,30 @@ Cpuset Interface Files
>   CPUs of the parent cgroup. Once it is set, this flag cannot be
>   cleared if there are any child cgroups with cpuset enabled.
>  
> + A parent cgroup cannot distribute all its CPUs to child
> + scheduling domain cgroups unless its load balancing flag is
> + turned off.
> +
> +  cpuset.sched.load_balance
> + A read-write single value file which exists on non-root
> + cpuset-enabled cgroups.  It is a binary value flag that accepts
> + either "0" (off) or a non-zero value (on).  This flag is set
> + by the parent and is not delegatable.
> +
> + When it is on, tasks within this cpuset will be load-balanced
> + by the kernel scheduler.  Tasks will be moved from CPUs with
> + high load to other CPUs within the same cpuset with less load
> + periodically.
> +
> + When it is off, there will be no load balancing among CPUs on
> + this cgroup.  Tasks will stay in the CPUs they are running on
> + and will not be moved to other CPUs.
> +
> + The initial value of this flag is "1".  This flag is then
> + inherited by child cgroups with cpuset enabled.  Its state
> + can only be changed on a scheduling domain cgroup with no
> + cpuset-enabled children.

I'm confused... why exactly do we have both domain and load_balance ?
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 3/6] cpuset: Add cpuset.sched.load_balance flag to v2

2018-05-24 Thread Waiman Long
On 05/24/2018 11:16 AM, Juri Lelli wrote:
> On 24/05/18 11:09, Waiman Long wrote:
>> On 05/24/2018 10:36 AM, Juri Lelli wrote:
>>> On 17/05/18 16:55, Waiman Long wrote:
>>>
>>> [...]
>>>
 +  A parent cgroup cannot distribute all its CPUs to child
 +  scheduling domain cgroups unless its load balancing flag is
 +  turned off.
 +
 +  cpuset.sched.load_balance
 +  A read-write single value file which exists on non-root
 +  cpuset-enabled cgroups.  It is a binary value flag that accepts
 +  either "0" (off) or a non-zero value (on).  This flag is set
 +  by the parent and is not delegatable.
 +
 +  When it is on, tasks within this cpuset will be load-balanced
 +  by the kernel scheduler.  Tasks will be moved from CPUs with
 +  high load to other CPUs within the same cpuset with less load
 +  periodically.
 +
 +  When it is off, there will be no load balancing among CPUs on
 +  this cgroup.  Tasks will stay in the CPUs they are running on
 +  and will not be moved to other CPUs.
 +
 +  The initial value of this flag is "1".  This flag is then
 +  inherited by child cgroups with cpuset enabled.  Its state
 +  can only be changed on a scheduling domain cgroup with no
 +  cpuset-enabled children.
>>> [...]
>>>
 +  /*
 +   * On default hierachy, a load balance flag change is only allowed
 +   * in a scheduling domain with no child cpuset.
 +   */
 +  if (cgroup_subsys_on_dfl(cpuset_cgrp_subsys) && balance_flag_changed &&
 + (!is_sched_domain(cs) || css_has_online_children(>css))) {
 +  err = -EINVAL;
 +  goto out;
 +  }
>>> The rule is actually
>>>
>>>  - no child cpuset
>>>  - and it must be a scheduling domain
>>>
>>> Right?
>> Yes, because it doesn't make sense to have a cpu in one cpuset that has
>> loading balance off while, at the same time, in another cpuset with load
>> balancing turned on. This restriction is there to make sure that the
>> above condition will not happen. I may be wrong if there is a realistic
>> use case where the above condition is desired.
> Yep, makes sense to me.
>
> Maybe add the second condition to the comment and documentation.

Sure. Will do.

-Longman

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 3/6] cpuset: Add cpuset.sched.load_balance flag to v2

2018-05-24 Thread Juri Lelli
On 24/05/18 11:09, Waiman Long wrote:
> On 05/24/2018 10:36 AM, Juri Lelli wrote:
> > On 17/05/18 16:55, Waiman Long wrote:
> >
> > [...]
> >
> >> +  A parent cgroup cannot distribute all its CPUs to child
> >> +  scheduling domain cgroups unless its load balancing flag is
> >> +  turned off.
> >> +
> >> +  cpuset.sched.load_balance
> >> +  A read-write single value file which exists on non-root
> >> +  cpuset-enabled cgroups.  It is a binary value flag that accepts
> >> +  either "0" (off) or a non-zero value (on).  This flag is set
> >> +  by the parent and is not delegatable.
> >> +
> >> +  When it is on, tasks within this cpuset will be load-balanced
> >> +  by the kernel scheduler.  Tasks will be moved from CPUs with
> >> +  high load to other CPUs within the same cpuset with less load
> >> +  periodically.
> >> +
> >> +  When it is off, there will be no load balancing among CPUs on
> >> +  this cgroup.  Tasks will stay in the CPUs they are running on
> >> +  and will not be moved to other CPUs.
> >> +
> >> +  The initial value of this flag is "1".  This flag is then
> >> +  inherited by child cgroups with cpuset enabled.  Its state
> >> +  can only be changed on a scheduling domain cgroup with no
> >> +  cpuset-enabled children.
> > [...]
> >
> >> +  /*
> >> +   * On default hierachy, a load balance flag change is only allowed
> >> +   * in a scheduling domain with no child cpuset.
> >> +   */
> >> +  if (cgroup_subsys_on_dfl(cpuset_cgrp_subsys) && balance_flag_changed &&
> >> + (!is_sched_domain(cs) || css_has_online_children(>css))) {
> >> +  err = -EINVAL;
> >> +  goto out;
> >> +  }
> > The rule is actually
> >
> >  - no child cpuset
> >  - and it must be a scheduling domain
> >
> > Right?
> 
> Yes, because it doesn't make sense to have a cpu in one cpuset that has
> loading balance off while, at the same time, in another cpuset with load
> balancing turned on. This restriction is there to make sure that the
> above condition will not happen. I may be wrong if there is a realistic
> use case where the above condition is desired.

Yep, makes sense to me.

Maybe add the second condition to the comment and documentation.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 3/6] cpuset: Add cpuset.sched.load_balance flag to v2

2018-05-24 Thread Juri Lelli
On 17/05/18 16:55, Waiman Long wrote:

[...]

> + A parent cgroup cannot distribute all its CPUs to child
> + scheduling domain cgroups unless its load balancing flag is
> + turned off.
> +
> +  cpuset.sched.load_balance
> + A read-write single value file which exists on non-root
> + cpuset-enabled cgroups.  It is a binary value flag that accepts
> + either "0" (off) or a non-zero value (on).  This flag is set
> + by the parent and is not delegatable.
> +
> + When it is on, tasks within this cpuset will be load-balanced
> + by the kernel scheduler.  Tasks will be moved from CPUs with
> + high load to other CPUs within the same cpuset with less load
> + periodically.
> +
> + When it is off, there will be no load balancing among CPUs on
> + this cgroup.  Tasks will stay in the CPUs they are running on
> + and will not be moved to other CPUs.
> +
> + The initial value of this flag is "1".  This flag is then
> + inherited by child cgroups with cpuset enabled.  Its state
> + can only be changed on a scheduling domain cgroup with no
> + cpuset-enabled children.

[...]

> + /*
> +  * On default hierachy, a load balance flag change is only allowed
> +  * in a scheduling domain with no child cpuset.
> +  */
> + if (cgroup_subsys_on_dfl(cpuset_cgrp_subsys) && balance_flag_changed &&
> +(!is_sched_domain(cs) || css_has_online_children(>css))) {
> + err = -EINVAL;
> + goto out;
> + }

The rule is actually

 - no child cpuset
 - and it must be a scheduling domain

Right?
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v8 3/6] cpuset: Add cpuset.sched.load_balance flag to v2

2018-05-17 Thread Waiman Long
The sched.load_balance flag is needed to enable CPU isolation similar to
what can be done with the "isolcpus" kernel boot parameter. Its value
can only be changed in a scheduling domain with no child cpusets. On
a non-scheduling domain cpuset, the value of sched.load_balance is
inherited from its parent.

This flag is set by the parent and is not delegatable.

Signed-off-by: Waiman Long 
---
 Documentation/cgroup-v2.txt | 24 
 kernel/cgroup/cpuset.c  | 53 +
 2 files changed, 73 insertions(+), 4 deletions(-)

diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index 54d9e22..071b634d 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -1536,6 +1536,30 @@ Cpuset Interface Files
CPUs of the parent cgroup. Once it is set, this flag cannot be
cleared if there are any child cgroups with cpuset enabled.
 
+   A parent cgroup cannot distribute all its CPUs to child
+   scheduling domain cgroups unless its load balancing flag is
+   turned off.
+
+  cpuset.sched.load_balance
+   A read-write single value file which exists on non-root
+   cpuset-enabled cgroups.  It is a binary value flag that accepts
+   either "0" (off) or a non-zero value (on).  This flag is set
+   by the parent and is not delegatable.
+
+   When it is on, tasks within this cpuset will be load-balanced
+   by the kernel scheduler.  Tasks will be moved from CPUs with
+   high load to other CPUs within the same cpuset with less load
+   periodically.
+
+   When it is off, there will be no load balancing among CPUs on
+   this cgroup.  Tasks will stay in the CPUs they are running on
+   and will not be moved to other CPUs.
+
+   The initial value of this flag is "1".  This flag is then
+   inherited by child cgroups with cpuset enabled.  Its state
+   can only be changed on a scheduling domain cgroup with no
+   cpuset-enabled children.
+
 
 Device controller
 -
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index e1a1af0..368e1b7 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -510,7 +510,7 @@ static int validate_change(struct cpuset *cur, struct 
cpuset *trial)
 
par = parent_cs(cur);
 
-   /* On legacy hiearchy, we must be a subset of our parent cpuset. */
+   /* On legacy hierarchy, we must be a subset of our parent cpuset. */
ret = -EACCES;
if (!is_in_v2_mode() && !is_cpuset_subset(trial, par))
goto out;
@@ -1061,6 +1061,14 @@ static int update_isolated_cpumask(struct cpuset *cpuset,
goto out;
 
/*
+* A parent can't distribute all its CPUs to child scheduling
+* domain cpusets unless load balancing is off.
+*/
+   if (adding & !deleting && is_sched_load_balance(parent) &&
+   cpumask_equal(addmask, parent->effective_cpus))
+   goto out;
+
+   /*
 * Check if any CPUs in addmask or delmask are in a sibling cpuset.
 * An empty sibling cpus_allowed means it is the same as parent's
 * effective_cpus. This checking is skipped if the cpuset is dying.
@@ -1531,6 +1539,16 @@ static int update_flag(cpuset_flagbits_t bit, struct 
cpuset *cs,
 
domain_flag_changed = (is_sched_domain(cs) != is_sched_domain(trialcs));
 
+   /*
+* On default hierachy, a load balance flag change is only allowed
+* in a scheduling domain with no child cpuset.
+*/
+   if (cgroup_subsys_on_dfl(cpuset_cgrp_subsys) && balance_flag_changed &&
+  (!is_sched_domain(cs) || css_has_online_children(>css))) {
+   err = -EINVAL;
+   goto out;
+   }
+
if (domain_flag_changed) {
err = turning_on
? update_isolated_cpumask(cs, NULL, cs->cpus_allowed)
@@ -2187,6 +2205,14 @@ static s64 cpuset_read_s64(struct cgroup_subsys_state 
*css, struct cftype *cft)
.flags = CFTYPE_NOT_ON_ROOT,
},
 
+   {
+   .name = "sched.load_balance",
+   .read_u64 = cpuset_read_u64,
+   .write_u64 = cpuset_write_u64,
+   .private = FILE_SCHED_LOAD_BALANCE,
+   .flags = CFTYPE_NOT_ON_ROOT,
+   },
+
{ } /* terminate */
 };
 
@@ -2200,19 +2226,38 @@ static s64 cpuset_read_s64(struct cgroup_subsys_state 
*css, struct cftype *cft)
 cpuset_css_alloc(struct cgroup_subsys_state *parent_css)
 {
struct cpuset *cs;
+   struct cgroup_subsys_state *errptr = ERR_PTR(-ENOMEM);
 
if (!parent_css)
return _cpuset.css;
 
cs = kzalloc(sizeof(*cs), GFP_KERNEL);
if (!cs)
-   return ERR_PTR(-ENOMEM);
+   return errptr;
if (!alloc_cpumask_var(>cpus_allowed, GFP_KERNEL))
goto free_cs;
if