Re: [PATCH v2 1/2] sched: fix init NOHZ_IDLE flag

2013-02-19 Thread Vincent Guittot
On 19 February 2013 11:29, Vincent Guittot  wrote:
> On 18 February 2013 16:40, Frederic Weisbecker  wrote:
>> 2013/2/18 Vincent Guittot :
>>> On 18 February 2013 15:38, Frederic Weisbecker  wrote:
>>>> I pasted the original at: http://pastebin.com/DMm5U8J8
>>>
>>> We can clear the idle flag only in the nohz_kick_needed which will not
>>> be called if the sched_domain is NULL so the sequence will be
>>>
>>> = CPU 0 == CPU 1=
>>>
>>> detach_and_destroy_domain {
>>> rcu_assign_pointer(cpu1_dom, NULL);
>>> }
>>>
>>> dom = new_domain(...) {
>>>  nr_cpus_busy = 0;
>>>  set_idle(CPU 1);
>>> }
>>> dom =
>>> rcu_dereference(cpu1_dom)
>>> //dom == NULL, return
>>>
>>> rcu_assign_pointer(cpu1_dom, dom);
>>>
>>> dom =
>>> rcu_dereference(cpu1_dom)
>>> //dom != NULL,
>>> nohz_kick_needed {
>>>
>>> set_idle(CPU 1)
>>>dom
>>> = rcu_dereference(cpu1_dom)
>>>
>>> //dec nr_cpus_busy,
>>> }
>>>
>>> Vincent
>>
>> Ok but CPU 0 can assign NULL to the domain of cpu1 while CPU 1 is
>> already in the middle of nohz_kick_needed().
>
> Yes nothing prevents the sequence below to occur
>
> = CPU 0 == CPU 1=
> dom =
> rcu_dereference(cpu1_dom)
> //dom != NULL
> detach_and_destroy_domain {
> rcu_assign_pointer(cpu1_dom, NULL);
> }
>
> dom = new_domain(...) {
>  nr_cpus_busy = 0;
>  //nr_cpus_busy in the new_dom
>  set_idle(CPU 1);
> }
> nohz_kick_needed {
>  clear_idle(CPU 1)
>  dom =
> rcu_dereference(cpu1_dom)
>
> //cpu1_dom == old_dom
>  inc nr_cpus_busy,
>
> //nr_cpus_busy in the old_dom
> }
>
> rcu_assign_pointer(cpu1_dom, dom);
> //cpu1_dom == new_dom

The sequence above is not correct, in addition to becoming unreadable
after going through gmail.

The correct and readable version is at:
https://pastebin.linaro.org/1750/

Vincent

>
> I'm not sure that this can happen in practice because CPU1 is in
> interrupt handler but we don't have any mechanism to prevent the
> sequence.
>
> The NULL sched_domain can be used to detect this situation and the
> set_cpu_sd_state_busy function can be modified like below
>
> inline void set_cpu_sd_state_busy
>  {
> struct sched_domain *sd;
> int cpu = smp_processor_id();
> +   int clear = 0;
>
> if (!test_bit(NOHZ_IDLE, nohz_flags(cpu)))
> return;
> -   clear_bit(NOHZ_IDLE, nohz_flags(cpu));
>
> rcu_read_lock();
> for_each_domain(cpu, sd) {
> atomic_inc(&sd->groups->sgp->nr_busy_cpus);
> +   clear = 1;
> }
> rcu_read_unlock();
> +
> +   if (likely(clear))
> +   clear_bit(NOHZ_IDLE, nohz_flags(cpu));
>  }
>
> The NOHZ_IDLE flag will not be clear if we have a NULL sched_domain
> attached to the CPU.
> With this implementation, we still don't need to get the sched_domain
> for testing the NOHZ_IDLE flag which occurs each time CPU becomes idle
>
> The patch 2 become useless
>
> Vincent
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 1/2] sched: fix init NOHZ_IDLE flag

2013-02-19 Thread Vincent Guittot
On 18 February 2013 16:40, Frederic Weisbecker  wrote:
> 2013/2/18 Vincent Guittot :
>> On 18 February 2013 15:38, Frederic Weisbecker  wrote:
>>> I pasted the original at: http://pastebin.com/DMm5U8J8
>>
>> We can clear the idle flag only in the nohz_kick_needed which will not
>> be called if the sched_domain is NULL so the sequence will be
>>
>> = CPU 0 == CPU 1=
>>
>> detach_and_destroy_domain {
>> rcu_assign_pointer(cpu1_dom, NULL);
>> }
>>
>> dom = new_domain(...) {
>>  nr_cpus_busy = 0;
>>  set_idle(CPU 1);
>> }
>> dom =
>> rcu_dereference(cpu1_dom)
>> //dom == NULL, return
>>
>> rcu_assign_pointer(cpu1_dom, dom);
>>
>> dom =
>> rcu_dereference(cpu1_dom)
>> //dom != NULL,
>> nohz_kick_needed {
>>
>> set_idle(CPU 1)
>>dom
>> = rcu_dereference(cpu1_dom)
>>
>> //dec nr_cpus_busy,
>> }
>>
>> Vincent
>
> Ok but CPU 0 can assign NULL to the domain of cpu1 while CPU 1 is
> already in the middle of nohz_kick_needed().

Yes, nothing prevents the sequence below from occurring:

= CPU 0 == CPU 1=
dom =
rcu_dereference(cpu1_dom)
//dom != NULL
detach_and_destroy_domain {
rcu_assign_pointer(cpu1_dom, NULL);
}

dom = new_domain(...) {
 nr_cpus_busy = 0;
 //nr_cpus_busy in the new_dom
 set_idle(CPU 1);
}
nohz_kick_needed {
 clear_idle(CPU 1)
 dom =
rcu_dereference(cpu1_dom)

//cpu1_dom == old_dom
 inc nr_cpus_busy,

//nr_cpus_busy in the old_dom
}

rcu_assign_pointer(cpu1_dom, dom);
//cpu1_dom == new_dom

I'm not sure that this can happen in practice, because CPU 1 is in an
interrupt handler, but we don't have any mechanism to prevent the
sequence.

The NULL sched_domain can be used to detect this situation, and the
set_cpu_sd_state_busy function can be modified as below:

inline void set_cpu_sd_state_busy
 {
        struct sched_domain *sd;
        int cpu = smp_processor_id();
+       int clear = 0;

        if (!test_bit(NOHZ_IDLE, nohz_flags(cpu)))
                return;
-       clear_bit(NOHZ_IDLE, nohz_flags(cpu));

        rcu_read_lock();
        for_each_domain(cpu, sd) {
                atomic_inc(&sd->groups->sgp->nr_busy_cpus);
+               clear = 1;
        }
        rcu_read_unlock();
+
+       if (likely(clear))
+               clear_bit(NOHZ_IDLE, nohz_flags(cpu));
 }

The NOHZ_IDLE flag will not be cleared if a NULL sched_domain is
attached to the CPU.
With this implementation, we still don't need to get the sched_domain
when testing the NOHZ_IDLE flag, which occurs each time a CPU becomes idle.

Patch 2 becomes useless.
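
For readers following the thread, the effect of the change above can be sketched in plain userspace C. This is only a model under stated assumptions: `struct model_domain`, `struct model_cpu` and `model_set_cpu_sd_state_busy()` are hypothetical stand-ins, a plain `int` replaces the kernel's atomic counter, and a plain pointer replaces the RCU-protected one, so it shows the ordering of the flag update and nothing about real concurrency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, not the kernel's structures. */
struct model_domain {
	int nr_busy_cpus;             /* stands in for sd->groups->sgp->nr_busy_cpus */
	struct model_domain *parent;  /* next level walked by for_each_domain() */
};

struct model_cpu {
	bool nohz_idle;               /* stands in for the NOHZ_IDLE bit */
	struct model_domain *sd;      /* NULL while the domains are being rebuilt */
};

/* Returns true if the CPU was accounted as busy and its flag was cleared. */
static bool model_set_cpu_sd_state_busy(struct model_cpu *cpu)
{
	bool clear = false;
	struct model_domain *sd;

	assert(cpu != NULL);

	if (!cpu->nohz_idle)
		return false;

	for (sd = cpu->sd; sd != NULL; sd = sd->parent) {
		sd->nr_busy_cpus++;
		clear = true;
	}

	/* Drop the flag only if at least one counter was incremented. */
	if (clear)
		cpu->nohz_idle = false;

	return clear;
}
```

With a NULL domain the call leaves both the flag and the counters untouched, so the next call made after a domain is attached still performs the increment.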

Vincent


Re: [PATCH v2 1/2] sched: fix init NOHZ_IDLE flag

2013-02-18 Thread Frederic Weisbecker
2013/2/18 Vincent Guittot :
> On 18 February 2013 15:38, Frederic Weisbecker  wrote:
>> I pasted the original at: http://pastebin.com/DMm5U8J8
>
> We can clear the idle flag only in the nohz_kick_needed which will not
> be called if the sched_domain is NULL so the sequence will be
>
> = CPU 0 == CPU 1=
>
> detach_and_destroy_domain {
> rcu_assign_pointer(cpu1_dom, NULL);
> }
>
> dom = new_domain(...) {
>  nr_cpus_busy = 0;
>  set_idle(CPU 1);
> }
> dom =
> rcu_dereference(cpu1_dom)
> //dom == NULL, return
>
> rcu_assign_pointer(cpu1_dom, dom);
>
> dom =
> rcu_dereference(cpu1_dom)
> //dom != NULL,
> nohz_kick_needed {
>
> set_idle(CPU 1)
>dom
> = rcu_dereference(cpu1_dom)
>
> //dec nr_cpus_busy,
> }
>
> Vincent

Ok but CPU 0 can assign NULL to the domain of cpu1 while CPU 1 is
already in the middle of nohz_kick_needed().


Re: [PATCH v2 1/2] sched: fix init NOHZ_IDLE flag

2013-02-18 Thread Vincent Guittot
On 18 February 2013 15:38, Frederic Weisbecker  wrote:
> 2013/2/18 Frederic Weisbecker :
>> 2013/2/8 Vincent Guittot :
>>> On 8 February 2013 16:35, Frederic Weisbecker  wrote:
>>>> What if the following happen (inventing function names but you get the
>>>> idea):
>>>>
>>>> CPU 0                                   CPU 1
>>>>
>>>> dom = new_domain(...) {
>>>>    nr_cpus_busy = 0;
>>>>    set_idle(CPU 1);                     old_dom =get_dom()
>>>>                                         clear_idle(CPU 1)
>>>> }
>>>> rcu_assign_pointer(cpu1_dom, dom);
>>>>
>>>>
>>>> Can this scenario happen?
>>>
>>> This scenario will be:
>>>
>>>  CPU 0   CPU 1
>>>
>>>  detach_and_destroy_domain {
>>> rcu_assign_pointer(cpu1_dom, NULL);
>>>  }
>>>
>>>  dom = new_domain(...) {
>>> nr_cpus_busy = 0;
>>> set_idle(CPU 1);  old_dom =get_dom()
>>>   old_dom is null
>>>   //clear_idle(CPU
>>> 1) can't happen because a null domain is attached so we will never
>>> call nohz_kick_needed which is the only place where we can clear_idle
>>>  }
>>>  rcu_assign_pointer(cpu1_dom, dom);
>>
>> So is the following possible?
>>
>> = CPU 0 =   = CPU 1=
>>
>> detach_and_destroy_domain {
>> rcu_assign_pointer(cpu1_dom, NULL);
>> }
>>
>> dom = new_domain(...) {
>>  nr_cpus_busy = 0;
>>  set_idle(CPU 1);
>> }
>>
>> clear_idle(CPU 1)
>>
>> dom = rcu_dereference(cpu1_dom)
>>
>> //dom == NULL, return
>>
>> rcu_assign_pointer(cpu1_dom, NULL);
>>
>>
>> set_idle(CPU 1)
>>
>> dom = rcu_dereference(cpu1_dom)
>>
>> //dec nr_cpus_busy, making it negative
>
> Sorry, gmail messed up as usual.
>
> I pasted the original at: http://pastebin.com/DMm5U8J8

We can clear the idle flag only in nohz_kick_needed, which will not
be called if the sched_domain is NULL, so the sequence will be:

= CPU 0 =                                  = CPU 1 =

detach_and_destroy_domain {
    rcu_assign_pointer(cpu1_dom, NULL);
}

dom = new_domain(...) {
    nr_cpus_busy = 0;
    set_idle(CPU 1);
}
                                           dom = rcu_dereference(cpu1_dom)
                                           //dom == NULL, return

rcu_assign_pointer(cpu1_dom, dom);

                                           dom = rcu_dereference(cpu1_dom)
                                           //dom != NULL,
                                           nohz_kick_needed {
                                               set_idle(CPU 1)
                                               dom = rcu_dereference(cpu1_dom)
                                               //dec nr_cpus_busy,
                                           }

Vincent


Re: [PATCH v2 1/2] sched: fix init NOHZ_IDLE flag

2013-02-18 Thread Frederic Weisbecker
2013/2/18 Frederic Weisbecker :
> 2013/2/8 Vincent Guittot :
>> On 8 February 2013 16:35, Frederic Weisbecker  wrote:
>>> What if the following happen (inventing function names but you get the 
>>> idea):
>>>
>>> CPU 0   CPU 1
>>>
>>> dom = new_domain(...) {
>>>nr_cpus_busy = 0;
>>>set_idle(CPU 1);  old_dom =get_dom()
>>>  clear_idle(CPU 1)
>>> }
>>> rcu_assign_pointer(cpu1_dom, dom);
>>>
>>>
>>> Can this scenario happen?
>>
>> This scenario will be:
>>
>>  CPU 0   CPU 1
>>
>>  detach_and_destroy_domain {
>> rcu_assign_pointer(cpu1_dom, NULL);
>>  }
>>
>>  dom = new_domain(...) {
>> nr_cpus_busy = 0;
>> set_idle(CPU 1);  old_dom =get_dom()
>>   old_dom is null
>>   //clear_idle(CPU
>> 1) can't happen because a null domain is attached so we will never
>> call nohz_kick_needed which is the only place where we can clear_idle
>>  }
>>  rcu_assign_pointer(cpu1_dom, dom);
>
> So is the following possible?
>
> = CPU 0 =   = CPU 1=
>
> detach_and_destroy_domain {
> rcu_assign_pointer(cpu1_dom, NULL);
> }
>
> dom = new_domain(...) {
>  nr_cpus_busy = 0;
>  set_idle(CPU 1);
> }
>
> clear_idle(CPU 1)
>
> dom = rcu_dereference(cpu1_dom)
>
> //dom == NULL, return
>
> rcu_assign_pointer(cpu1_dom, NULL);
>
>
> set_idle(CPU 1)
>
> dom = rcu_dereference(cpu1_dom)
>
> //dec nr_cpus_busy, making it negative

Sorry, gmail messed up as usual.

I pasted the original at: http://pastebin.com/DMm5U8J8


Re: [PATCH v2 1/2] sched: fix init NOHZ_IDLE flag

2013-02-18 Thread Frederic Weisbecker
2013/2/8 Vincent Guittot :
> On 8 February 2013 16:35, Frederic Weisbecker  wrote:
>> What if the following happen (inventing function names but you get the idea):
>>
>> CPU 0   CPU 1
>>
>> dom = new_domain(...) {
>>nr_cpus_busy = 0;
>>set_idle(CPU 1);  old_dom =get_dom()
>>  clear_idle(CPU 1)
>> }
>> rcu_assign_pointer(cpu1_dom, dom);
>>
>>
>> Can this scenario happen?
>
> This scenario will be:
>
>  CPU 0   CPU 1
>
>  detach_and_destroy_domain {
> rcu_assign_pointer(cpu1_dom, NULL);
>  }
>
>  dom = new_domain(...) {
> nr_cpus_busy = 0;
> set_idle(CPU 1);  old_dom =get_dom()
>   old_dom is null
>   //clear_idle(CPU
> 1) can't happen because a null domain is attached so we will never
> call nohz_kick_needed which is the only place where we can clear_idle
>  }
>  rcu_assign_pointer(cpu1_dom, dom);

So is the following possible?

= CPU 0 =                                  = CPU 1 =

detach_and_destroy_domain {
    rcu_assign_pointer(cpu1_dom, NULL);
}

dom = new_domain(...) {
    nr_cpus_busy = 0;
    set_idle(CPU 1);
}
                                           clear_idle(CPU 1)
                                           dom = rcu_dereference(cpu1_dom)
                                           //dom == NULL, return

rcu_assign_pointer(cpu1_dom, NULL);

                                           set_idle(CPU 1)
                                           dom = rcu_dereference(cpu1_dom)
                                           //dec nr_cpus_busy, making it negative
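
The interleaving above can be replayed step by step in a single thread to see where the counter ends up. The sketch below is a userspace model, not kernel code: `model_clear_idle()`, `model_set_idle()` and `replay_rebuild_race()` are hypothetical names, the busy path clears the flag before looking up the domain (the unpatched ordering), and the second assignment on CPU 0 is assumed to attach the newly built domain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, not the kernel's structures. */
struct model_domain { int nr_busy_cpus; };
struct model_cpu { bool nohz_idle; struct model_domain *sd; };

/* Unguarded busy path: the flag is cleared before the domain lookup. */
static void model_clear_idle(struct model_cpu *cpu)
{
	assert(cpu != NULL);
	if (!cpu->nohz_idle)
		return;
	cpu->nohz_idle = false;
	if (cpu->sd != NULL)          /* dom == NULL: return without counting */
		cpu->sd->nr_busy_cpus++;
}

/* Idle path: set the flag, then decrement the attached domain's counter. */
static void model_set_idle(struct model_cpu *cpu)
{
	assert(cpu != NULL);
	if (cpu->nohz_idle)
		return;
	cpu->nohz_idle = true;
	if (cpu->sd != NULL)
		cpu->sd->nr_busy_cpus--;
}

/* Replays the interleaving in order and returns the final counter. */
static int replay_rebuild_race(void)
{
	struct model_domain new_dom;
	struct model_cpu cpu1 = { false, NULL };  /* CPU 0: domain detached */

	new_dom.nr_busy_cpus = 0;   /* CPU 0: new_domain() resets the counter */
	cpu1.nohz_idle = true;      /* CPU 0: set_idle(CPU 1) */
	model_clear_idle(&cpu1);    /* CPU 1: dom == NULL, nothing is counted */
	cpu1.sd = &new_dom;         /* CPU 0: attach (assumed: the new domain) */
	model_set_idle(&cpu1);      /* CPU 1: decrements the fresh counter */

	return new_dom.nr_busy_cpus;
}
```

In this model the flag is cleared without a matching increment, so the later decrement takes the fresh counter below zero.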


Re: [PATCH v2 1/2] sched: fix init NOHZ_IDLE flag

2013-02-08 Thread Vincent Guittot
On 8 February 2013 16:35, Frederic Weisbecker  wrote:
> 2013/2/4 Vincent Guittot :
>> On 1 February 2013 19:03, Frederic Weisbecker  wrote:
>>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>>> index 257002c..fd41924 100644
>>>> --- a/kernel/sched/core.c
>>>> +++ b/kernel/sched/core.c
>>>> @@ -5884,6 +5884,7 @@ static void init_sched_groups_power(int cpu, struct sched_domain *sd)
>>>>
>>>> update_group_power(sd, cpu);
>>>> atomic_set(&sg->sgp->nr_busy_cpus, sg->group_weight);
>>>> +   clear_bit(NOHZ_IDLE, nohz_flags(cpu));
>>>
>>> So that's a real issue indeed.  nr_busy_cpus was never correct.
>>>
>>> Now I'm still a bit worried with this solution. What if an idle task
>>> started in smp_init() has not yet stopped its tick, but is about to do
>>> so? The domains are not yet available to the task but the nohz flags
>>> are. When it later restarts the tick, it's going to erroneously
>>> increase nr_busy_cpus.
>>
>> My 1st idea was to clear NOHZ_IDLE flag and nr_busy_cpus in
>> init_sched_groups_power instead of setting them as it is done now. If
>> a CPU enters idle during the init sequence, the flag is already
>> cleared, and nohz_flags and nr_busy_cpus will stay synced and cleared
>> while a NULL sched_domain is attached to the CPU thanks to patch 2.
>> This should solve all use cases ?
>
> This may work on smp_init(). But the per cpu domain can be changed 
> concurrently
> anytime on cpu hotplug, with a new sched group power struct, right?

During a cpu hotplug, a null domain is attached to each CPU of the
partition because we have to build new sched_domains, so we have
behavior similar to smp_init.
So if we clear the NOHZ_IDLE flag and nr_busy_cpus in
init_sched_groups_power, we should be safe for init and hotplug.

More generally speaking, if the sched_domains of a group of CPUs must
be rebuilt, a NULL sched_domain is attached to these CPUs during the
build.
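
The detach, build, attach ordering described here can be sketched as three steps plus a reader-side check. Everything in the block is a hypothetical userspace model (`model_detach()`, `model_build()`, `model_attach()` and `model_can_account()` are invented names, and a plain pointer stands in for the RCU-protected one); it only illustrates that a reader sees NULL for the whole build window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, not the kernel's structures. */
struct model_domain { int nr_busy_cpus; };
struct model_cpu { struct model_domain *sd; };

/* Step 1: publish NULL so that readers stop using the old domain. */
static void model_detach(struct model_cpu *cpu)
{
	assert(cpu != NULL);
	cpu->sd = NULL;
}

/* Step 2: initialise the replacement while no reader can reach it. */
static void model_build(struct model_domain *fresh)
{
	assert(fresh != NULL);
	fresh->nr_busy_cpus = 0;
}

/* Step 3: publish the replacement. */
static void model_attach(struct model_cpu *cpu, struct model_domain *fresh)
{
	assert(cpu != NULL && fresh != NULL);
	cpu->sd = fresh;
}

/* A reader may only touch the busy accounting when a domain is attached. */
static bool model_can_account(const struct model_cpu *cpu)
{
	assert(cpu != NULL);
	return cpu->sd != NULL;
}
```

A reader that checks `model_can_account()` between the detach and attach steps gets false and therefore never touches a counter that is still being initialised.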

>
> What if the following happen (inventing function names but you get the idea):
>
> CPU 0   CPU 1
>
> dom = new_domain(...) {
>nr_cpus_busy = 0;
>set_idle(CPU 1);  old_dom =get_dom()
>  clear_idle(CPU 1)
> }
> rcu_assign_pointer(cpu1_dom, dom);
>
>
> Can this scenario happen?

This scenario will be:

 CPU 0                                   CPU 1

 detach_and_destroy_domain {
    rcu_assign_pointer(cpu1_dom, NULL);
 }

 dom = new_domain(...) {
    nr_cpus_busy = 0;
    set_idle(CPU 1);                     old_dom =get_dom()
                                         old_dom is null
                                         //clear_idle(CPU 1) can't happen because
                                         //a null domain is attached so we will
                                         //never call nohz_kick_needed which is
                                         //the only place where we can clear_idle
 }
 rcu_assign_pointer(cpu1_dom, dom);

>
>
>>>
>>> It probably won't happen in practice. But then there is more: sched
>>> domains can be concurrently rebuild anytime, right?  So what if we
>>> call set_cpu_sd_state_idle() and decrease nr_busy_cpus while the
>>> domain is switched concurrently. Are we having a new sched group along
>>> the way? If so we have a bug here as well because we can have
>>> NOHZ_IDLE set but nr_busy_cpus accounting the CPU.
>>
>> When the sched_domain are rebuilt, we set a null sched_domain during
>> the rebuild sequence and a new sched_group_power is created as well
>
> So at that time we may race with a CPU setting/clearing its NOHZ_IDLE flag
> as in my above scenario?

Unless I have missed a use case, we always have a null domain attached
to a CPU while we build the new one. So patch 2/2 should protect
us against clearing NOHZ_IDLE while the new nr_busy_cpus is not
yet attached.

I'm going to send a new version which sets the NOHZ_IDLE bit and clears
nr_busy_cpus during the build of a sched_domain.
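
The initial state proposed here (every CPU flagged idle, counter at zero) can be checked against a simple invariant: the counter should always equal the number of CPUs whose flag is clear. The sketch below is a hypothetical userspace model; the group size of four and all `model_*` names are assumptions made for illustration, and it ignores concurrency entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4  /* hypothetical group size */

struct model_group {
	int nr_busy_cpus;
	bool nohz_idle[MODEL_NR_CPUS];
};

/* Build-time state: every CPU flagged idle, nothing counted as busy. */
static void model_init_group(struct model_group *g)
{
	assert(g != NULL);
	g->nr_busy_cpus = 0;
	for (size_t i = 0; i < MODEL_NR_CPUS; i++)
		g->nohz_idle[i] = true;
}

/* Busy transition: only an idle-flagged CPU is counted, and only once. */
static void model_cpu_busy(struct model_group *g, size_t cpu)
{
	assert(g != NULL && cpu < MODEL_NR_CPUS);
	if (!g->nohz_idle[cpu])
		return;
	g->nohz_idle[cpu] = false;
	g->nr_busy_cpus++;
}

/* Idle transition: only a busy-flagged CPU is uncounted, and only once. */
static void model_cpu_idle(struct model_group *g, size_t cpu)
{
	assert(g != NULL && cpu < MODEL_NR_CPUS);
	if (g->nohz_idle[cpu])
		return;
	g->nohz_idle[cpu] = true;
	g->nr_busy_cpus--;
}

/* Invariant: the counter equals the number of CPUs whose flag is clear. */
static bool model_in_sync(const struct model_group *g)
{
	int busy = 0;

	assert(g != NULL);
	for (size_t i = 0; i < MODEL_NR_CPUS; i++)
		if (!g->nohz_idle[i])
			busy++;
	return busy == g->nr_busy_cpus;
}
```

Starting from this state, repeated busy or idle transitions on the same CPU are no-ops, so flag and counter stay in step as long as each transition updates both.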

Vincent


Re: [PATCH v2 1/2] sched: fix init NOHZ_IDLE flag

2013-02-08 Thread Frederic Weisbecker
2013/2/4 Vincent Guittot :
> On 1 February 2013 19:03, Frederic Weisbecker  wrote:
>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>> index 257002c..fd41924 100644
>>> --- a/kernel/sched/core.c
>>> +++ b/kernel/sched/core.c
>>> @@ -5884,6 +5884,7 @@ static void init_sched_groups_power(int cpu, struct 
>>> sched_domain *sd)
>>>
>>> update_group_power(sd, cpu);
>>> atomic_set(&sg->sgp->nr_busy_cpus, sg->group_weight);
>>> +   clear_bit(NOHZ_IDLE, nohz_flags(cpu));
>>
>> So that's a real issue indeed.  nr_busy_cpus was never correct.
>>
>> Now I'm still a bit worried with this solution. What if an idle task
>> started in smp_init() has not yet stopped its tick, but is about to do
>> so? The domains are not yet available to the task but the nohz flags
>> are. When it later restarts the tick, it's going to erroneously
>> increase nr_busy_cpus.
>
> My 1st idea was to clear NOHZ_IDLE flag and nr_busy_cpus in
> init_sched_groups_power instead of setting them as it is done now. If
> a CPU enters idle during the init sequence, the flag is already
> cleared, and nohz_flags and nr_busy_cpus will stay synced and cleared
> while a NULL sched_domain is attached to the CPU thanks to patch 2.
> This should solve all use cases ?

This may work on smp_init(). But the per cpu domain can be changed concurrently
anytime on cpu hotplug, with a new sched group power struct, right?

What if the following happen (inventing function names but you get the idea):

CPU 0   CPU 1

dom = new_domain(...) {
   nr_cpus_busy = 0;
   set_idle(CPU 1);  old_dom =get_dom()
 clear_idle(CPU 1)
}
rcu_assign_pointer(cpu1_dom, dom);


Can this scenario happen?
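
One way to read the scenario above is as a sequential replay in which CPU 1 still holds a pointer to the old domain while CPU 0 prepares the new one. The sketch below is a userspace model with invented names (`replay_stale_domain()`, `struct replay_result`); the old domain starting with a zero count is an assumption, and no claim is made that the kernel can actually reach this interleaving.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, not the kernel's structures. */
struct model_domain { int nr_busy_cpus; };
struct model_cpu { bool nohz_idle; struct model_domain *sd; };

struct replay_result {
	int old_busy;         /* counter of the domain being replaced */
	int new_busy;         /* counter of the freshly built domain */
	bool cpu1_idle_flag;  /* final state of CPU 1's idle flag */
};

/* Replays the two-CPU interleaving in a single thread, step by step. */
static struct replay_result replay_stale_domain(void)
{
	struct model_domain old_dom = { 0 };  /* assumed initial count */
	struct model_domain new_dom;
	struct model_cpu cpu1 = { true, &old_dom };
	struct model_domain *seen;
	struct replay_result r;

	new_dom.nr_busy_cpus = 0;  /* CPU 0: new_domain() resets the counter */
	cpu1.nohz_idle = true;     /* CPU 0: set_idle(CPU 1) */
	seen = cpu1.sd;            /* CPU 1: old_dom = get_dom() */
	assert(seen != NULL);
	cpu1.nohz_idle = false;    /* CPU 1: clear_idle(CPU 1) ... */
	seen->nr_busy_cpus++;      /* ... accounted in the old domain */
	cpu1.sd = &new_dom;        /* CPU 0: rcu_assign_pointer(cpu1_dom, dom) */

	r.old_busy = old_dom.nr_busy_cpus;
	r.new_busy = new_dom.nr_busy_cpus;
	r.cpu1_idle_flag = cpu1.nohz_idle;
	return r;
}
```

In this replay CPU 1 ends up flagged busy while the newly attached domain counts no busy CPU, which is the mismatch the question is probing for.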


>>
>> It probably won't happen in practice. But then there is more: sched
>> domains can be concurrently rebuild anytime, right?  So what if we
>> call set_cpu_sd_state_idle() and decrease nr_busy_cpus while the
>> domain is switched concurrently. Are we having a new sched group along
>> the way? If so we have a bug here as well because we can have
>> NOHZ_IDLE set but nr_busy_cpus accounting the CPU.
>
> When the sched_domain are rebuilt, we set a null sched_domain during
> the rebuild sequence and a new sched_group_power is created as well

So at that time we may race with a CPU setting/clearing its NOHZ_IDLE flag
as in my above scenario?


Re: [PATCH v2 1/2] sched: fix init NOHZ_IDLE flag

2013-02-08 Thread Vincent Guittot
On 8 February 2013 16:35, Frederic Weisbecker fweis...@gmail.com wrote:
> 2013/2/4 Vincent Guittot vincent.guit...@linaro.org:
>> On 1 February 2013 19:03, Frederic Weisbecker fweis...@gmail.com wrote:
>>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>>> index 257002c..fd41924 100644
>>>> --- a/kernel/sched/core.c
>>>> +++ b/kernel/sched/core.c
>>>> @@ -5884,6 +5884,7 @@ static void init_sched_groups_power(int cpu, struct sched_domain *sd)
>>>>
>>>>         update_group_power(sd, cpu);
>>>>         atomic_set(&sg->sgp->nr_busy_cpus, sg->group_weight);
>>>> +       clear_bit(NOHZ_IDLE, nohz_flags(cpu));
>>>
>>> So that's a real issue indeed.  nr_busy_cpus was never correct.
>>>
>>> Now I'm still a bit worried with this solution. What if an idle task
>>> started in smp_init() has not yet stopped its tick, but is about to do
>>> so? The domains are not yet available to the task but the nohz flags
>>> are. When it later restarts the tick, it's going to erroneously
>>> increase nr_busy_cpus.
>>
>> My 1st idea was to clear NOHZ_IDLE flag and nr_busy_cpus in
>> init_sched_groups_power instead of setting them as it is done now. If
>> a CPU enters idle during the init sequence, the flag is already
>> cleared, and nohz_flags and nr_busy_cpus will stay synced and cleared
>> while a NULL sched_domain is attached to the CPU thanks to patch 2.
>> This should solve all use cases ?
>
> This may work on smp_init(). But the per cpu domain can be changed concurrently
> anytime on cpu hotplug, with a new sched group power struct, right?

During a cpu hotplug, a null domain is attached to each CPU of the
partition because we have to build new sched_domains, so we have a
behavior similar to smp_init.
So if we clear NOHZ_IDLE flag and nr_busy_cpus in
init_sched_groups_power, we should be safe for init and hotplug.

More generally speaking, if the sched_domains of a group of CPUs must
be rebuilt, a NULL sched_domain is attached to these CPUs during the
build.


> What if the following happen (inventing function names but you get the idea):
>
> CPU 0                                   CPU 1
>
> dom = new_domain(...) {
>    nr_cpus_busy = 0;
>    set_idle(CPU 1);                     old_dom = get_dom()
>                                         clear_idle(CPU 1)
> }
> rcu_assign_pointer(cpu1_dom, dom);
>
>
> Can this scenario happen?

This scenario will be:

CPU 0                                   CPU 1

detach_and_destroy_domain {
   rcu_assign_pointer(cpu1_dom, NULL);
}

dom = new_domain(...) {
   nr_cpus_busy = 0;
   set_idle(CPU 1);                     old_dom = get_dom()
                                        // old_dom is NULL
                                        // clear_idle(CPU 1) can't happen:
                                        // with a NULL domain attached we
                                        // never call nohz_kick_needed,
                                        // the only place that can clear_idle
}
rcu_assign_pointer(cpu1_dom, dom);




>>> It probably won't happen in practice. But then there is more: sched
>>> domains can be concurrently rebuild anytime, right?  So what if we
>>> call set_cpu_sd_state_idle() and decrease nr_busy_cpus while the
>>> domain is switched concurrently. Are we having a new sched group along
>>> the way? If so we have a bug here as well because we can have
>>> NOHZ_IDLE set but nr_busy_cpus accounting the CPU.
>>
>> When the sched_domain are rebuilt, we set a null sched_domain during
>> the rebuild sequence and a new sched_group_power is created as well
>
> So at that time we may race with a CPU setting/clearing its NOHZ_IDLE flag
> as in my above scenario?

Unless I have missed a use case, we always have a null domain attached
to a CPU while we build the new one. So patch 2/2 should protect
us against clearing NOHZ_IDLE while the new nr_busy_cpus is not
yet attached.
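
The ordering argued for here can be replayed the same way. This is only a
sketch of that one ordering, where CPU 1 reads the pointer after the detach;
it says nothing about a CPU that read the pointer earlier. All names
(struct dom, struct cpu_state, busy_tick(), replay_rebuild()) are
hypothetical userspace stand-ins, and busy_tick() only models the idea that
the flag is cleared from nohz_kick_needed, which is skipped for a NULL
domain.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the scheduler structures, not kernel code. */
struct dom {
	int nr_cpus_busy;
};

struct cpu_state {
	bool nohz_idle;		/* models the NOHZ_IDLE bit */
	struct dom *dom;	/* models the RCU-published domain pointer */
};

/* Models the tick path: with a NULL domain the flag is never cleared. */
static void busy_tick(struct cpu_state *cpu)
{
	struct dom *d = cpu->dom;

	if (!d)
		return;
	if (cpu->nohz_idle) {
		cpu->nohz_idle = false;
		d->nr_cpus_busy++;
	}
}

/* Replays detach, rebuild, attach; true if flag and counter always agree. */
static bool replay_rebuild(void)
{
	struct dom old_dom = { .nr_cpus_busy = 1 };
	struct dom new_dom;
	struct cpu_state cpu1 = { .nohz_idle = false, .dom = &old_dom };
	bool ok_during, ok_after;

	cpu1.dom = NULL;		/* CPU 0: detach_and_destroy_domain */
	new_dom.nr_cpus_busy = 0;	/* CPU 0: new_domain() */
	cpu1.nohz_idle = true;		/* CPU 0: set_idle(CPU 1) */
	busy_tick(&cpu1);		/* CPU 1: sees NULL, does nothing */
	cpu1.dom = &new_dom;		/* CPU 0: rcu_assign_pointer() */
	ok_during = cpu1.nohz_idle && new_dom.nr_cpus_busy == 0;

	busy_tick(&cpu1);		/* CPU 1: first tick on the new domain */
	ok_after = !cpu1.nohz_idle && new_dom.nr_cpus_busy == 1;

	return ok_during && ok_after;
}
```

In this model replay_rebuild() returns true: the tick that lands during the
rebuild is a no-op, and the first tick after the attach moves the flag and
the counter together.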

I'm going to send a new version which sets the NOHZ_IDLE bit and clears
nr_busy_cpus during the build of a sched_domain.

Vincent


Re: [PATCH v2 1/2] sched: fix init NOHZ_IDLE flag

2013-02-04 Thread Vincent Guittot
On 1 February 2013 19:03, Frederic Weisbecker  wrote:
> 2013/1/29 Vincent Guittot :
>> On my smp platform which is made of 5 cores in 2 clusters,I have the
>> nr_busy_cpu field of sched_group_power struct that is not null when the
>> platform is fully idle. The root cause seems to be:
>> During the boot sequence, some CPUs reach the idle loop and set their
>> NOHZ_IDLE flag while waiting for others CPUs to boot. But the nr_busy_cpus
>> field is initialized later with the assumption that all CPUs are in the busy
>> state whereas some CPUs have already set their NOHZ_IDLE flag.
>> We clear the NOHZ_IDLE flag when nr_busy_cpus is initialized in order to
>> have a coherent configuration.
>>
>> Signed-off-by: Vincent Guittot 
>> ---
>>  kernel/sched/core.c |1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 257002c..fd41924 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -5884,6 +5884,7 @@ static void init_sched_groups_power(int cpu, struct sched_domain *sd)
>>
>>         update_group_power(sd, cpu);
>>         atomic_set(&sg->sgp->nr_busy_cpus, sg->group_weight);
>> +       clear_bit(NOHZ_IDLE, nohz_flags(cpu));
>
> So that's a real issue indeed.  nr_busy_cpus was never correct.
>
> Now I'm still a bit worried with this solution. What if an idle task
> started in smp_init() has not yet stopped its tick, but is about to do
> so? The domains are not yet available to the task but the nohz flags
> are. When it later restarts the tick, it's going to erroneously
> increase nr_busy_cpus.

My 1st idea was to clear NOHZ_IDLE flag and nr_busy_cpus in
init_sched_groups_power instead of setting them as it is done now. If
a CPU enters idle during the init sequence, the flag is already
cleared, and nohz_flags and nr_busy_cpus will stay synced and cleared
while a NULL sched_domain is attached to the CPU thanks to patch 2.
This should solve all use cases ?

>
> It probably won't happen in practice. But then there is more: sched
> domains can be concurrently rebuild anytime, right?  So what if we
> call set_cpu_sd_state_idle() and decrease nr_busy_cpus while the
> domain is switched concurrently. Are we having a new sched group along
> the way? If so we have a bug here as well because we can have
> NOHZ_IDLE set but nr_busy_cpus accounting the CPU.

When the sched_domains are rebuilt, we set a null sched_domain during
the rebuild sequence and a new sched_group_power is created as well.

>
> May be we need to set the per cpu nohz flags on the child leaf sched
> domain? This way it's initialized and stored on the same RCU pointer
> and we nohz_flags and nr_busy_cpus become sync.
>
> Also we probably still need the first patch of your previous round.
> Because the current patch may introduce situations where we have idle
> CPUs with NOHZ_IDLE flags cleared.



Re: [PATCH v2 1/2] sched: fix init NOHZ_IDLE flag

2013-02-01 Thread Frederic Weisbecker
2013/1/29 Vincent Guittot :
> On my smp platform which is made of 5 cores in 2 clusters,I have the
> nr_busy_cpu field of sched_group_power struct that is not null when the
> platform is fully idle. The root cause seems to be:
> During the boot sequence, some CPUs reach the idle loop and set their
> NOHZ_IDLE flag while waiting for others CPUs to boot. But the nr_busy_cpus
> field is initialized later with the assumption that all CPUs are in the busy
> state whereas some CPUs have already set their NOHZ_IDLE flag.
> We clear the NOHZ_IDLE flag when nr_busy_cpus is initialized in order to
> have a coherent configuration.
>
> Signed-off-by: Vincent Guittot 
> ---
>  kernel/sched/core.c |1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 257002c..fd41924 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5884,6 +5884,7 @@ static void init_sched_groups_power(int cpu, struct sched_domain *sd)
>
>         update_group_power(sd, cpu);
>         atomic_set(&sg->sgp->nr_busy_cpus, sg->group_weight);
> +       clear_bit(NOHZ_IDLE, nohz_flags(cpu));

So that's a real issue indeed.  nr_busy_cpus was never correct.

Now I'm still a bit worried with this solution. What if an idle task
started in smp_init() has not yet stopped its tick, but is about to do
so? The domains are not yet available to the task but the nohz flags
are. When it later restarts the tick, it's going to erroneously
increase nr_busy_cpus.

It probably won't happen in practice. But then there is more: sched
domains can be concurrently rebuilt anytime, right?  So what if we
call set_cpu_sd_state_idle() and decrease nr_busy_cpus while the
domain is switched concurrently? Do we get a new sched group along
the way? If so we have a bug here as well because we can have
NOHZ_IDLE set but nr_busy_cpus still accounting the CPU.

Maybe we need to set the per cpu nohz flags on the child leaf sched
domain? This way it's initialized and stored on the same RCU pointer
and nohz_flags and nr_busy_cpus stay in sync.
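
A rough userspace sketch of that idea, under loud assumptions: struct
leaf_dom, struct group_power, in_sync(), clear_idle() and
replay_with_flag_in_domain() are all hypothetical, the group holds a single
CPU so that "idle" is the same thing as nr_busy_cpus == 0, and a plain
pointer stands in for the RCU-published one. The point is only that when
the flag lives in the object reached through the domain pointer, a
transition can never update the flag of one domain and the counter of
another.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout: the idle flag travels with the leaf domain. */
struct group_power {
	int nr_busy_cpus;
};

struct leaf_dom {
	bool nohz_idle;			/* flag stored in the domain itself */
	struct group_power *sgp;
};

/* Single-CPU group assumed: idle flag must match "zero busy CPUs". */
static bool in_sync(const struct leaf_dom *d)
{
	return d->nohz_idle == (d->sgp->nr_busy_cpus == 0);
}

/* Busy transition on whichever domain the CPU dereferenced. */
static void clear_idle(struct leaf_dom *d)
{
	assert(d != NULL);
	if (d->nohz_idle) {
		d->nohz_idle = false;
		d->sgp->nr_busy_cpus++;
	}
}

/* Replays a rebuild racing with a busy transition on the old domain. */
static bool replay_with_flag_in_domain(void)
{
	struct group_power old_sgp = { .nr_busy_cpus = 0 };
	struct group_power new_sgp;
	struct leaf_dom old_dom = { .nohz_idle = true, .sgp = &old_sgp };
	struct leaf_dom new_dom;
	struct leaf_dom *cpu1_dom = &old_dom;	/* models the RCU pointer */
	struct leaf_dom *seen;

	/* CPU 0: build the new pair, flag and counter set together. */
	new_sgp.nr_busy_cpus = 0;
	new_dom.nohz_idle = true;
	new_dom.sgp = &new_sgp;

	seen = cpu1_dom;	/* CPU 1: dereferences the old domain */
	clear_idle(seen);	/* CPU 1: touches the old pair only */
	cpu1_dom = &new_dom;	/* CPU 0: publishes the new domain */

	return in_sync(&old_dom) && in_sync(cpu1_dom);
}
```

In this model both pairs stay self-consistent. The new domain still claims
CPU 1 is idle until its next transition, so this only keeps the flag and
the counter in sync with each other, which is the property asked for above.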

Also we probably still need the first patch of your previous round.
Because the current patch may introduce situations where we have idle
CPUs with NOHZ_IDLE flags cleared.



[PATCH v2 1/2] sched: fix init NOHZ_IDLE flag

2013-01-29 Thread Vincent Guittot
On my SMP platform, which is made of 5 cores in 2 clusters, the
nr_busy_cpus field of the sched_group_power struct is not zero when the
platform is fully idle. The root cause seems to be:
During the boot sequence, some CPUs reach the idle loop and set their
NOHZ_IDLE flag while waiting for other CPUs to boot. But the nr_busy_cpus
field is initialized later with the assumption that all CPUs are in the busy
state whereas some CPUs have already set their NOHZ_IDLE flag.
We clear the NOHZ_IDLE flag when nr_busy_cpus is initialized in order to
have a coherent configuration.

Signed-off-by: Vincent Guittot 
---
 kernel/sched/core.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 257002c..fd41924 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5884,6 +5884,7 @@ static void init_sched_groups_power(int cpu, struct sched_domain *sd)
 
 	update_group_power(sd, cpu);
 	atomic_set(&sg->sgp->nr_busy_cpus, sg->group_weight);
+	clear_bit(NOHZ_IDLE, nohz_flags(cpu));
 }
 
 int __weak arch_sd_sibling_asym_packing(void)
-- 
1.7.9.5
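
For readers who want to see the boot-order problem from the changelog in
isolation, here is a small userspace model of it. It is a sketch under
stated assumptions, not the kernel code: struct group, struct cpu,
enter_idle(), init_group() and busy_count_when_idle() are hypothetical, the
group has two CPUs, and the idle-entry path is assumed to skip the
decrement when the flag is already set. The clear_flags argument models the
one-line change in this patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not kernel code. */
struct group {
	int nr_busy_cpus;
	int group_weight;
};

struct cpu {
	bool nohz_idle;		/* models the NOHZ_IDLE bit */
	struct group *grp;	/* NULL until the domains are built */
};

/* Idle entry: assumed to do nothing if the flag is already set. */
static void enter_idle(struct cpu *c)
{
	if (c->nohz_idle)
		return;
	c->nohz_idle = true;
	if (c->grp)
		c->grp->nr_busy_cpus--;
}

/* Models the init: every CPU of the group is assumed to be busy. */
static void init_group(struct group *g, struct cpu *cpus, int n,
		       bool clear_flags)
{
	int i;

	g->group_weight = n;
	g->nr_busy_cpus = n;
	for (i = 0; i < n; i++) {
		cpus[i].grp = g;
		if (clear_flags)	/* the change made by the patch */
			cpus[i].nohz_idle = false;
	}
}

/* Returns nr_busy_cpus once every CPU of the group has gone idle. */
static int busy_count_when_idle(bool clear_flags)
{
	struct group g;
	struct cpu cpus[2] = { { false, NULL }, { false, NULL } };

	enter_idle(&cpus[1]);	/* CPU 1 idles before the domains exist */
	init_group(&g, cpus, 2, clear_flags);
	enter_idle(&cpus[0]);	/* platform becomes fully idle */
	enter_idle(&cpus[1]);

	return g.nr_busy_cpus;
}
```

In this model busy_count_when_idle(false) leaves one CPU accounted as busy
on a fully idle platform, while busy_count_when_idle(true) reaches zero.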


